Equipment Rental Cost Prediction using Bayesian Regression in ML
FREE Online Courses: Click, Learn, Succeed, Start Now!
Equipment‑rental companies and project planners need to estimate the daily rental cost of heavy machinery before committing to a rental contract—using factors such as machine age, prior hours of use, manufacturer, model category, and region. Rental rates escalate nonlinearly with machine wear (higher‐hour units command lower rates) and with specialized categories (e.g., hydraulic cranes cost more per day than standard excavators). Moreover, regional demand and fleet availability introduce uncertainty. By applying Bayesian Regression, we can produce both:
1. A point forecast of the daily rental rate.
2. A credible interval around that forecast—capturing our uncertainty from data sparsity and market fluctuation.
Libraries Required
import pandas as pd # data I/O import numpy as np # numerical ops import matplotlib.pyplot as plt # plotting import seaborn as sns # visualization import pymc3 as pm # Bayesian modeling import arviz as az # posterior analysis from sklearn.model_selection import train_test_split from sklearn.preprocessing import StandardScaler from sklearn.metrics import mean_absolute_error
Dataset
Step-by-Step Code Implementation
Data Loading & Feature Engineering
We convert the sale price into a rental‐rate proxy at 0.1%/day—an industry heuristic.
Encoding & scaling:
- One‑hot encode Manufacturer and Model Category.
- Standardise year and hours so Bayesian priors apply uniformly.
import pandas as pd
# Load the dataset
df = pd.read_csv("data/heavy_equipment.csv")
# Select and rename relevant columns
# Assume columns: 'Year Made', 'Machine Hours', 'Manufacturer', 'Model Category', 'SalePrice'
df = df.rename(columns={
'Year Made': 'year',
'Machine Hours': 'hours',
'SalePrice': 'price'
})
df = df[['year','hours','Manufacturer','Model Category','price']].dropna()
# Create a proxy for rental cost: assume daily rate ≈ 0.1% of sale price
df['daily_rate'] = df['price'] * 0.001
# One‐hot encode categorical features
df = pd.get_dummies(df, columns=['Manufacturer','Model Category'], drop_first=True)
# Define predictors and target
X = df.drop(columns=['price','daily_rate']).values
y = df['daily_rate'].values
Train/Test Split & Standardisation
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Split into train and test (80/20)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Standardize numeric columns (year, hours)
scaler = StandardScaler().fit(X_train[:, :2])
X_train_s = X_train.copy()
X_train_s[:, :2] = scaler.transform(X_train[:, :2])
X_test_s = X_test.copy()
X_test_s[:, :2] = scaler.transform(X_test[:, :2])
Define & Fit Bayesian Regression Model
Model:
- Weakly informative Normal priors on intercept (α) and slopes (β).
- HalfNormal prior on σ to enforce positive noise scale.
- Observed daily rental rates modelled as Normal(α + X·β, σ).
Sampling: 2,000 posterior draws (after 1,000 tuning) with target_accept=0.9 for robust convergence.
import pymc3 as pm
with pm.Model() as rental_model:
# Priors
α = pm.Normal("α", mu=0, sigma=1) # intercept
β = pm.Normal("β", mu=0, sigma=1, shape=X_train_s.shape[1])
σ = pm.HalfNormal("σ", sigma=1) # noise scale
# Linear predictor
μ = α + pm.math.dot(X_train_s, β)
# Likelihood: daily_rate observed
Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)
# MCMC sampling
trace = pm.sample(
draws=2000, # posterior draws
tune=1000, # burn‑in
target_accept=0.9,
return_inferencedata=True
)
Posterior Analysis & Point Predictions
- Posterior predictive: Generates predictive distributions, from which we compute point forecasts and a 94% Highest Posterior Density interval.
- Evaluation: Mean Absolute Error on held‑out machines quantifies the average deviation of our rate predictions.
import arviz as az
from sklearn.metrics import mean_absolute_error
# Summarize posterior
az.summary(trace, round_to=2)
# Posterior predictive sampling
with rental_model:
ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])
# Posterior means for α and β
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values
# Point forecasts
y_pred = α_post + X_test_s.dot(β_post)
# Evaluate MAE
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f} per day")
Visualise Predictions & Credible Intervals
By varying machine‐hours (wear) and holding other features constant, we plot both the posterior mean rental rate curve and its 94% credible band—showing how higher‑hour machines command lower rates and how uncertain that relationship is.
import numpy as np
import matplotlib.pyplot as plt
# Sweep 'hours' (machine wear) from min to max
hours_grid = np.linspace(X_train_s[:,1].min(), X_train_s[:,1].max(), 100)
grid = np.median(X_train_s, axis=0)[None,:].repeat(100, axis=0)
grid[:,1] = hours_grid
with rental_model:
ppc_grid = pm.sample_posterior_predictive(trace,
var_names=["Y_obs"],
samples=1000)
preds = ppc_grid["Y_obs"]
mean_pred = preds.mean(axis=0)
hpd = az.hdi(preds, hdi_prob=0.94)
# Back-transform hours
hours_orig = scaler.inverse_transform(grid)[:,1]
plt.figure(figsize=(8,5))
plt.plot(hours_orig, mean_pred, label="Posterior mean")
plt.fill_between(hours_orig, hpd[:,0], hpd[:,1], alpha=0.3,
label="94% credible interval")
plt.scatter(
scaler.inverse_transform(X_test_s)[:,1],
y_test, color="k", alpha=0.5, label="Test data"
)
plt.xlabel("Machine Hours")
plt.ylabel("Estimated Daily Rental Rate (USD)")
plt.title("Bayesian Regression: Rate vs. Machine Hours")
plt.legend()
plt.tight_layout()
plt.show()
Summary
This Bayesian Regression workflow for Equipment Rental Cost Prediction delivers:
1. Point forecasts of daily rental rates from machine attributes and wear metrics.
2. Credible intervals that quantify prediction uncertainty—key for contract negotiations and fleet utilization planning.
3. Actionable insights: rental companies can price equipment under uncertainty, plan maintenance cycles, and optimize fleet deployment with risk‑aware rate estimates.