Equipment Rental Cost Prediction using Bayesian Regression in ML

FREE Online Courses: Knowledge Awaits – Click for Free Access!

Equipment‑rental companies and project planners need to estimate the daily rental cost of heavy machinery before committing to a rental contract—using factors such as machine age, prior hours of use, manufacturer, model category, and region. Rental rates escalate nonlinearly with machine wear (higher‐hour units command lower rates) and with specialized categories (e.g., hydraulic cranes cost more per day than standard excavators). Moreover, regional demand and fleet availability introduce uncertainty. By applying Bayesian Regression, we can produce both:

1. A point forecast of the daily rental rate.

2. A credible interval around that forecast—capturing our uncertainty from data sparsity and market fluctuation.

Libraries Required

import pandas as pd                              # data I/O  				
import numpy as np                               # numerical ops  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

import pymc3 as pm                               # Bayesian modeling  
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error

Dataset

Heavy Equipment Pricing Data

Step-by-Step Code Implementation

Data Loading & Feature Engineering

We convert the sale price into a rental‐rate proxy at 0.1%/day—an industry heuristic.

Encoding & scaling:

One‑hot encode Manufacturer and Model Category.
Standardise year and hours so Bayesian priors apply uniformly.

import pandas as pd

# Load the dataset
df = pd.read_csv("data/heavy_equipment.csv")

# Select and rename relevant columns
# Assume columns: 'Year Made', 'Machine Hours', 'Manufacturer', 'Model Category', 'SalePrice'
df = df.rename(columns={
    'Year Made': 'year',
    'Machine Hours': 'hours',
    'SalePrice': 'price'
})
df = df[['year','hours','Manufacturer','Model Category','price']].dropna()

# Create a proxy for rental cost: assume daily rate ≈ 0.1% of sale price
df['daily_rate'] = df['price'] * 0.001

# One‐hot encode categorical features
df = pd.get_dummies(df, columns=['Manufacturer','Model Category'], drop_first=True)

# Define predictors and target
X = df.drop(columns=['price','daily_rate']).values
y = df['daily_rate'].values

Train/Test Split & Standardisation

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Split into train and test (80/20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize numeric columns (year, hours)
scaler = StandardScaler().fit(X_train[:, :2])
X_train_s = X_train.copy()
X_train_s[:, :2] = scaler.transform(X_train[:, :2])
X_test_s = X_test.copy()
X_test_s[:, :2] = scaler.transform(X_test[:, :2])

Define & Fit Bayesian Regression Model

Model:

Weakly informative Normal priors on intercept (α) and slopes (β).
HalfNormal prior on σ to enforce positive noise scale.
Observed daily rental rates modelled as Normal(α + X·β, σ).

Sampling: 2,000 posterior draws (after 1,000 tuning) with target_accept=0.9 for robust convergence.

import pymc3 as pm

with pm.Model() as rental_model:
    # Priors
    α = pm.Normal("α", mu=0, sigma=1)                      # intercept
    β = pm.Normal("β", mu=0, sigma=1, shape=X_train_s.shape[1])
    σ = pm.HalfNormal("σ", sigma=1)                         # noise scale

    # Linear predictor
    μ = α + pm.math.dot(X_train_s, β)

    # Likelihood: daily_rate observed
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)

    # MCMC sampling
    trace = pm.sample(
        draws=2000,       # posterior draws
        tune=1000,        # burn‑in
        target_accept=0.9,
        return_inferencedata=True
    )

Posterior Analysis & Point Predictions

Posterior predictive: Generates predictive distributions, from which we compute point forecasts and a 94% Highest Posterior Density interval.
Evaluation: Mean Absolute Error on held‑out machines quantifies the average deviation of our rate predictions.

import arviz as az
from sklearn.metrics import mean_absolute_error

# Summarize posterior
az.summary(trace, round_to=2)

# Posterior predictive sampling
with rental_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

# Posterior means for α and β
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values

# Point forecasts
y_pred = α_post + X_test_s.dot(β_post)

# Evaluate MAE
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f} per day")

Visualise Predictions & Credible Intervals

By varying machine‐hours (wear) and holding other features constant, we plot both the posterior mean rental rate curve and its 94% credible band—showing how higher‑hour machines command lower rates and how uncertain that relationship is.

import numpy as np
import matplotlib.pyplot as plt

# Sweep 'hours' (machine wear) from min to max
hours_grid = np.linspace(X_train_s[:,1].min(), X_train_s[:,1].max(), 100)
grid = np.median(X_train_s, axis=0)[None,:].repeat(100, axis=0)
grid[:,1] = hours_grid

with rental_model:
    ppc_grid = pm.sample_posterior_predictive(trace,
                                              var_names=["Y_obs"],
                                              samples=1000)

preds     = ppc_grid["Y_obs"]
mean_pred = preds.mean(axis=0)
hpd       = az.hdi(preds, hdi_prob=0.94)

# Back-transform hours
hours_orig = scaler.inverse_transform(grid)[:,1]

plt.figure(figsize=(8,5))
plt.plot(hours_orig, mean_pred, label="Posterior mean")
plt.fill_between(hours_orig, hpd[:,0], hpd[:,1], alpha=0.3,
                 label="94% credible interval")
plt.scatter(
    scaler.inverse_transform(X_test_s)[:,1],
    y_test, color="k", alpha=0.5, label="Test data"
)
plt.xlabel("Machine Hours")
plt.ylabel("Estimated Daily Rental Rate (USD)")
plt.title("Bayesian Regression: Rate vs. Machine Hours")
plt.legend()
plt.tight_layout()
plt.show()

Summary

This Bayesian Regression workflow for Equipment Rental Cost Prediction delivers:

1. Point forecasts of daily rental rates from machine attributes and wear metrics.

2. Credible intervals that quantify prediction uncertainty—key for contract negotiations and fleet utilization planning.

3. Actionable insights: rental companies can price equipment under uncertainty, plan maintenance cycles, and optimize fleet deployment with risk‑aware rate estimates.

Did we exceed your expectations?
If Yes, share your valuable feedback on Google | Facebook

Equipment Rental Cost Prediction using Bayesian Regression in ML

Libraries Required

Dataset

Step-by-Step Code Implementation

Data Loading & Feature Engineering

Train/Test Split & Standardisation

Define & Fit Bayesian Regression Model