Vehicle Maintenance Cost Prediction using Bayesian Regression in ML
Fleet-management teams must forecast each vehicle's next maintenance cost before scheduling service, using early indicators such as vehicle age, odometer reading, engine hours, prior maintenance costs, and usage intensity. Maintenance costs tend to increase nonlinearly with age and usage (e.g., via exponential wear-out), and uncertainty arises from variability in driving conditions and fluctuations in repair costs. By applying Bayesian Regression, we obtain not only a point estimate of expected cost but also a credible interval that quantifies our uncertainty, which enables data-driven budgeting and proactive parts procurement. The model fitted here is linear in the standardized features, so it should be read as a first approximation to that nonlinear trend.
Libraries Required
import pandas as pd                # data manipulation
import numpy as np                 # numerical ops
import matplotlib.pyplot as plt    # plotting
import seaborn as sns              # visualization
import pymc3 as pm                 # Bayesian modeling
import arviz as az                 # posterior analysis
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error
Dataset
Logistics Vehicle Maintenance History Dataset
Step-by-Step Code Implementation
Import Libraries & Load Data
import pandas as pd
# Load dataset
df = pd.read_csv("data/logistics-vehicle-maintenance-history-dataset.csv")
# Preview relevant columns
df.head()[[
'Age','Mileage','EngineHours','PrevMaintenanceCost','MaintenanceCost'
]]
Preprocessing & Train/Test Split
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Select features and target
X = df[['Age','Mileage','EngineHours','PrevMaintenanceCost']].values
y = df['MaintenanceCost'].values # USD
# Random 80/20 split (prefer a chronological split if the records are time-ordered)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Standardize predictors for stable MCMC sampling
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
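If the maintenance records carry a timestamp, a chronological split avoids training on services that happened after the ones being predicted. The following is a minimal sketch on synthetic data using only NumPy. The `service_dates` array is a hypothetical field, since the dataset above is not known to include one, and the names `X_demo`, `y_demo`, `X_tr` and `X_te` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Synthetic stand-ins: 4 features, a cost target, and hypothetical service dates
X_demo = rng.normal(size=(n, 4))
y_demo = rng.normal(loc=500.0, scale=100.0, size=n)
day_offsets = rng.integers(0, 365, size=n).astype("timedelta64[D]")
service_dates = np.datetime64("2023-01-01") + day_offsets

# Sort by date and hold out the most recent 20% of records
order = np.argsort(service_dates, kind="stable")
cut = int(0.8 * n)
train_idx, test_idx = order[:cut], order[cut:]

# Standardize with training statistics only, mirroring StandardScaler().fit(X_train)
mu = X_demo[train_idx].mean(axis=0)
sd = X_demo[train_idx].std(axis=0)   # population std (ddof=0), as StandardScaler uses
X_tr = (X_demo[train_idx] - mu) / sd
X_te = (X_demo[test_idx] - mu) / sd
```

Fitting the scaling statistics on the training rows alone keeps information from the held-out period out of the model, which is the same reason the scaler above is fit on `X_train` only.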
Define & Fit Bayesian Regression Model
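- Priors (α, β, σ): We use normal priors for the intercept and slopes and a half-normal prior for the noise scale. They are weakly informative only relative to the scale of the target: because costs are left in USD, these prior scales should be widened (or the target centered and rescaled) if typical costs are much larger than a few hundred dollars.
- Model: μ = α + β·X_standardized is linear in the standardized features; the Bayesian treatment attaches uncertainty to every coefficient and prediction.
- Likelihood: Observed costs are assumed to be normally distributed around μ with standard deviation σ.
- Sampling: NUTS draws 2,000 posterior samples per chain after 1,000 tuning steps. Setting target_accept=0.9 (above the default of 0.8) makes the sampler take smaller steps, which usually reduces divergences; convergence should still be checked with R-hat and effective sample size.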
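Written out, the model described in the bullets above can be stated as follows. The prior scales are the ones used in the code, the index i runs over vehicles, and j runs over the four standardized features.

```latex
\begin{aligned}
\alpha &\sim \mathcal{N}(0,\ 100^2), \\
\beta_j &\sim \mathcal{N}(0,\ 50^2), \quad j = 1, \dots, 4, \\
\sigma &\sim \text{HalfNormal}(50), \\
\mu_i &= \alpha + \sum_{j=1}^{4} \beta_j \, x_{ij}, \\
y_i &\sim \mathcal{N}(\mu_i,\ \sigma^2).
\end{aligned}
```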
import pymc3 as pm
with pm.Model() as model:
    # Register the design matrix as a data container so that new inputs
    # can be swapped in later with pm.set_data
    X = pm.Data("X", X_train_s)
    # Priors: intercept α, weights β, and noise scale σ
    α = pm.Normal("α", mu=0, sigma=100)
    β = pm.Normal("β", mu=0, sigma=50, shape=X_train_s.shape[1])
    σ = pm.HalfNormal("σ", sigma=50)
    # Linear predictor
    μ = α + pm.math.dot(X, β)
    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)
    # MCMC sampling (NUTS)
    trace = pm.sample(
        draws=2000, tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
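Whether these priors are sensible for a given fleet can be checked before fitting with a prior predictive simulation. The sketch below does this in plain NumPy rather than through PyMC3, using a synthetic stand-in `X_s` for the standardized design matrix; the prior scales are copied from the model, and every other name and size is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sims, n_vehicles, n_features = 1000, 200, 4

# Stand-in for the standardized design matrix (zero mean, unit variance)
X_s = rng.normal(size=(n_vehicles, n_features))

# Draw parameters from the priors used in the model
alpha = rng.normal(0.0, 100.0, size=n_sims)
beta = rng.normal(0.0, 50.0, size=(n_sims, n_features))
sigma = np.abs(rng.normal(0.0, 50.0, size=n_sims))  # HalfNormal(50) is |Normal(0, 50)|

# Simulate one vector of costs per prior draw
mu = alpha[:, None] + beta @ X_s.T                   # shape (n_sims, n_vehicles)
y_sim = mu + sigma[:, None] * rng.normal(size=mu.shape)

lo, hi = np.percentile(y_sim, [1, 99])
print(f"98% of prior-simulated costs fall in [{lo:.0f}, {hi:.0f}] USD")
```

If the observed costs sit far outside the printed range, the priors are pulling the intercept toward zero and should be widened or the target rescaled. The simulation also makes visible that a normal likelihood centered near zero places about half of its prior mass on negative costs.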
Posterior Analysis & Point Predictions
- Posterior predictive: We simulate costs from the fitted model and, for new inputs, derive highest posterior density (HPD) intervals that quantify forecast uncertainty.
- Point estimates & MAE: Posterior means of α and β yield point forecasts; MAE on the hold-out set measures the average absolute prediction error in USD.
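import arviz as az
from sklearn.metrics import mean_absolute_error
# Summarize posterior distributions (check r_hat and ess columns for convergence)
az.summary(trace, round_to=2)
# Posterior predictive sampling for the inputs currently in the model (the training set)
with model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])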
# Compute posterior means for α and β
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values
# Point predictions on test set
y_pred = α_post + X_test_s.dot(β_post)
# Evaluate mean absolute error
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f}")
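MAE scores only the point forecasts. To judge the intervals as well, one can check how often held-out costs fall inside them. The sketch below builds predictive draws by hand from posterior draws of α, β and σ and reports empirical coverage of a 94% equal-tailed interval. The "posterior draws" here are synthetic stand-ins scattered around known true values, not output from the model above, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_draws, n_test, n_features = 2000, 200, 4

# Synthetic test set generated from known parameters (illustration only)
true_alpha = 500.0
true_beta = np.array([40.0, 60.0, 25.0, 80.0])
true_sigma = 30.0
X_te = rng.normal(size=(n_test, n_features))
y_te = true_alpha + X_te @ true_beta + rng.normal(0.0, true_sigma, size=n_test)

# Stand-in posterior draws: tight clouds around the true values
alpha_draws = true_alpha + rng.normal(0.0, 2.0, size=n_draws)
beta_draws = true_beta + rng.normal(0.0, 2.0, size=(n_draws, n_features))
sigma_draws = np.abs(true_sigma + rng.normal(0.0, 1.5, size=n_draws))

# One simulated cost per posterior draw and test vehicle
mu_draws = alpha_draws[:, None] + beta_draws @ X_te.T        # (n_draws, n_test)
y_rep = mu_draws + sigma_draws[:, None] * rng.normal(size=mu_draws.shape)

# Point forecast, error, and 94% equal-tailed predictive interval per vehicle
y_hat = mu_draws.mean(axis=0)
mae = np.mean(np.abs(y_te - y_hat))
lower, upper = np.percentile(y_rep, [3, 97], axis=0)
coverage = np.mean((y_te >= lower) & (y_te <= upper))
print(f"MAE: {mae:.2f} USD, 94% interval coverage: {coverage:.2%}")
```

With a real fit, the draw arrays could plausibly be taken from the trace, for example by flattening `trace.posterior["β"].values` over its chain and draw axes. Coverage well below the nominal 94% suggests the model is overconfident, while coverage near 100% suggests intervals that are wider than necessary.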
Visualise Predictions & Credible Intervals
Plotting cost vs mileage (holding other features fixed) with credible bands illustrates both expected cost trend and our uncertainty, guiding risk‑aware maintenance budgeting.
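import numpy as np
import matplotlib.pyplot as plt
# Example: vary Mileage, hold the other features at their training mean
# (which is 0 after standardization)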
mile_grid = np.linspace(X_train_s[:,1].min(), X_train_s[:,1].max(), 100)
grid = np.column_stack([
np.full_like(mile_grid, X_train_s[:,0].mean()), # Age
mile_grid,
np.full_like(mile_grid, X_train_s[:,2].mean()), # EngineHours
np.full_like(mile_grid, X_train_s[:,3].mean()) # PrevCost
])
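with model:
    # Swap the grid in for the training inputs ("X" must be a pm.Data container in the model)
    pm.set_data({"X": grid})
    ppc_grid = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])
preds = ppc_grid["Y_obs"]  # shape: (n_draws, n_grid_points)
mean_pred = preds.mean(axis=0)
# Add a leading chain axis so ArviZ reads the array as (chain, draw, grid point)
hpd = az.hdi(preds[np.newaxis, ...], hdi_prob=0.94)
# Transform mileage back to original units
mile_orig = scaler.inverse_transform(grid)[:, 1]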
plt.figure(figsize=(8,5))
plt.plot(mile_orig, mean_pred, label="Posterior mean")
plt.fill_between(mile_orig, hpd[:,0], hpd[:,1], alpha=0.3, label="94% HDI")
plt.scatter(X_test[:,1], y_test,
            color="k", alpha=0.5, label="Test data")
plt.xlabel("Mileage")
plt.ylabel("Maintenance Cost (USD)")
plt.title("Bayesian Regression: Cost vs. Mileage")
plt.legend()
plt.show()
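The shaded band is a highest density interval: the narrowest range that contains the requested share of the draws. The sketch below computes one from scratch with NumPy and compares it with an equal-tailed percentile interval on a skewed sample. The helper `hdi_1d` is an illustrative function written for this example, not part of ArviZ, and it assumes a unimodal distribution.

```python
import numpy as np

def hdi_1d(samples, prob=0.94):
    """Return the narrowest interval that contains `prob` of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    span = int(np.floor(prob * n))        # index distance between interval ends
    widths = x[span:] - x[: n - span]     # width of every candidate interval
    start = int(np.argmin(widths))
    return x[start], x[start + span]

rng = np.random.default_rng(3)
skewed = rng.exponential(scale=1.0, size=10_000)  # right-skewed stand-in for cost draws

hdi_lo, hdi_hi = hdi_1d(skewed, prob=0.94)
eti_lo, eti_hi = np.percentile(skewed, [3, 97])
print(f"HDI: [{hdi_lo:.2f}, {hdi_hi:.2f}]  equal-tailed: [{eti_lo:.2f}, {eti_hi:.2f}]")
```

For a symmetric predictive distribution, such as the normal likelihood used here, the two intervals nearly coincide. They diverge for skewed distributions, where the highest density interval shifts toward the region with the most mass.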
Summary
By leveraging Bayesian Regression for vehicle maintenance cost forecasting, we achieve:
1. Point estimates of upcoming maintenance spending based on early indicators (age, mileage, engine hours).
2. Credible intervals that reflect uncertainty from usage variability and repair‐price fluctuations.
3. Actionable insights: fleet managers can budget with upper and lower bounds, proactively schedule parts procurement, and identify vehicles with high forecasting uncertainty for closer monitoring.
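As a rough sketch of how the third point might be put into practice, predictive draws can be turned into per-vehicle interval widths and a fleet-level budget range. The `cost_draws` array below is a synthetic stand-in for posterior predictive output with shape (draws, vehicles), and the per-vehicle spreads are invented for illustration. Under the constant-σ model in this article, real interval widths would differ across vehicles only through parameter uncertainty.

```python
import numpy as np

rng = np.random.default_rng(4)
n_draws, n_vehicles = 2000, 25

# Synthetic stand-in for posterior predictive draws, shape (n_draws, n_vehicles)
expected = rng.uniform(300.0, 1500.0, size=n_vehicles)
spread = rng.uniform(20.0, 200.0, size=n_vehicles)
cost_draws = expected + spread * rng.normal(size=(n_draws, n_vehicles))

# Per-vehicle 94% interval and its width
lower, upper = np.percentile(cost_draws, [3, 97], axis=0)
width = upper - lower
most_uncertain = np.argsort(width)[::-1][:5]   # vehicle indices, widest interval first

# Fleet budget: sum costs within each draw, then take percentiles of the totals
fleet_totals = cost_draws.sum(axis=1)
fleet_lo, fleet_hi = np.percentile(fleet_totals, [3, 97])
print(f"Fleet budget, 94% interval: [{fleet_lo:,.0f}, {fleet_hi:,.0f}] USD")
print("Vehicles to monitor:", most_uncertain.tolist())
```

Summing within each draw before taking percentiles matters: adding up the per-vehicle upper bounds would assume every vehicle hits its worst case at once and overstate the budget needed.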