Shipping Rate Cost Prediction using Bayesian Regression in ML
Logistics managers need to forecast per-shipment cost before tendering loads to carriers, using indicators that are known early: distance, freight weight, vehicle type, and route complexity (e.g., number of stops). Shipping costs scale nonlinearly (longer distances sometimes earn volume discounts; heavy loads incur surcharges) and vary with fuel-price volatility and carrier-rate fluctuations. A point estimate alone hides this uncertainty, risking underbidding or margin erosion. Applying Bayesian Regression gives us:
1. A point estimate of shipment cost.
2. A credible interval quantifying forecasting uncertainty—enabling risk‑aware pricing and carrier negotiations.
Libraries Required
import pandas as pd  # data loading & manipulation
import numpy as np  # numerical operations
import matplotlib.pyplot as plt  # plotting
import seaborn as sns  # visualization
import pymc3 as pm  # Bayesian modeling
import arviz as az  # posterior analysis
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error
Dataset
Cost Prediction for Logistic Company
Step-by-Step Code Implementation
Import Libraries & Load Data
import pandas as pd
# Load the competition data
df = pd.read_csv("data/train.csv")
# Inspect key columns
df[['Distance','Trip_Cost']].head()
Feature Engineering & Train/Test Split
- We train on Distance only; additional covariates (e.g., weight, stops) can easily be added.
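- We z-score Distance (using training-set mean and standard deviation only) so that β is expressed per standard deviation of distance. This keeps the prior scale on β interpretable and helps the sampler converge reliably.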
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Select predictors and target
# Here we'll use Distance; you can add Weight, Stops, etc., if available
X = df[['Distance']].values
y = df['Trip_Cost'].values # USD
# Split into train (80%) and test (20%)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Standardize distance for stable MCMC
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
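Because Distance is z-scored, the fitted slope is in USD per standard deviation of distance rather than per distance unit. The following is a minimal sketch of the scaling and of how standardized coefficients convert back to the original scale. The distances and the coefficient values (`alpha_std`, `beta_std`) are made-up numbers for illustration, not results from the dataset.

```python
import numpy as np

# Hypothetical training and test distances (same units as the Distance column)
dist_train = np.array([120.0, 340.0, 560.0, 780.0, 1000.0])
dist_test = np.array([200.0, 900.0])

# Fit the scaling on the training split only, then reuse it for the test split
mu = dist_train.mean()
sd = dist_train.std()  # population std (ddof=0), as StandardScaler uses
z_train = (dist_train - mu) / sd
z_test = (dist_test - mu) / sd

# Hypothetical intercept and slope on the standardized scale
alpha_std, beta_std = 500.0, 150.0

# Equivalent coefficients on the original distance scale:
# alpha_std + beta_std * (d - mu) / sd == alpha_raw + beta_raw * d
beta_raw = beta_std / sd
alpha_raw = alpha_std - beta_std * mu / sd

pred_std = alpha_std + beta_std * z_test
pred_raw = alpha_raw + beta_raw * dist_test
print("Predictions agree:", np.allclose(pred_std, pred_raw))
print(f"Cost per distance unit: {beta_raw:.3f} USD")
```

Reporting `beta_raw` alongside the standardized slope makes the result easier to communicate, since it reads directly as cost per unit of distance.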
Define & Fit Bayesian Regression Model
Model Specification:
- α ∼ Normal(0, 100) as a broad prior for baseline cost.
- β ∼ Normal(0, 50) capturing how cost scales per standardized distance‐unit.
- σ ∼ HalfNormal(50) enforces positive residual noise.
- Observations follow Trip_Cost ∼ Normal(α + β·Distance_std, σ).
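One way to sanity-check these priors before fitting is a prior predictive simulation: draw parameters from the priors and see which costs they imply. The sketch below uses NumPy only and evaluates the model at a few standardized distances; the grid `z` and the number of simulations are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims = 1000

# Draw parameters from the priors listed above
alpha = rng.normal(0.0, 100.0, size=n_sims)
beta = rng.normal(0.0, 50.0, size=n_sims)
sigma = np.abs(rng.normal(0.0, 50.0, size=n_sims))  # HalfNormal(50) = |Normal(0, 50)|

# Standardized distances to evaluate at (-2 to +2 standard deviations)
z = np.linspace(-2.0, 2.0, 5)

# Simulated costs: one row per prior draw, one column per distance
cost = rng.normal(alpha[:, None] + beta[:, None] * z[None, :], sigma[:, None])

# Central 94% range of prior-simulated costs at each distance
lo, hi = np.percentile(cost, [3, 97], axis=0)
for zi, l, h in zip(z, lo, hi):
    print(f"z={zi:+.1f}: 94% of prior-simulated costs in [{l:.0f}, {h:.0f}] USD")
```

If the observed Trip_Cost values fall well outside these ranges, wider priors, or a prior for α centred on the training mean of Trip_Cost, may be worth considering.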
Inference: We draw 2,000 posterior samples per chain after 1,000 tuning steps. Setting target_accept=0.9 makes the NUTS sampler take smaller steps, which reduces divergent transitions at the cost of somewhat slower sampling.
import pymc3 as pm
with pm.Model() as shipping_model:
    # Priors
    α = pm.Normal("α", mu=0, sigma=100)  # intercept prior
    β = pm.Normal("β", mu=0, sigma=50)  # slope prior
    σ = pm.HalfNormal("σ", sigma=50)  # residual scale
    # Expected cost linear predictor
    μ = α + β * X_train_s.flatten()
    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)
    # MCMC sampling
    trace = pm.sample(
        draws=2000,  # number of posterior draws
        tune=1000,  # burn-in
        target_accept=0.9,
        return_inferencedata=True
    )
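After sampling, it is worth checking that the chains agree with each other. ArviZ reports this as the `r_hat` column of `az.summary(trace)`, next to the effective sample size columns `ess_bulk` and `ess_tail`. To show what the statistic measures, here is a simplified hand-rolled version of the classic Gelman-Rubin R-hat applied to synthetic chains. The function `simple_rhat` is a hypothetical helper, and ArviZ uses a rank-normalized split variant, so its values will differ slightly.

```python
import numpy as np

def simple_rhat(chains):
    """Classic Gelman-Rubin statistic for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    n_chains, n_draws = chains.shape
    chain_means = chains.mean(axis=1)
    within = chains.var(axis=1, ddof=1).mean()   # W: mean within-chain variance
    between = n_draws * chain_means.var(ddof=1)  # B: between-chain variance
    var_hat = (n_draws - 1) / n_draws * within + between / n_draws
    return float(np.sqrt(var_hat / within))

rng = np.random.default_rng(1)

# Four synthetic chains sampling the same distribution
good = rng.normal(0.0, 1.0, size=(4, 2000))

# Same chains, but one is shifted as if it were stuck in a different region
bad = good.copy()
bad[0] += 3.0

print(f"well-mixed chains: R-hat = {simple_rhat(good):.3f}")
print(f"one stuck chain:   R-hat = {simple_rhat(bad):.3f}")
```

Values close to 1 indicate that the chains are exploring the same distribution; a common rule of thumb treats values above roughly 1.01 as a sign that more tuning or a reparameterization is needed.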
Posterior Analysis & Point Predictions
- Posterior Predictive: We draw posterior predictive samples of Y_obs for the training observations, which can be compared against the observed costs as a model check. Point forecasts for the held-out test distances are then computed from the posterior means of α and β.
- Evaluation: Mean Absolute Error (MAE) on the test set quantifies the average forecasting error in USD per trip.
import arviz as az
from sklearn.metrics import mean_absolute_error
# Summarize posterior distributions
az.summary(trace, round_to=2)
# Posterior predictive sampling
with shipping_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])
# Extract posterior means
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean().item()
# Compute point predictions on standardized test set
y_pred = α_post + β_post * X_test_s.flatten()
# Evaluate performance
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f}")
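MAE scores only the point forecast. To check the uncertainty estimates as well, one option is to build a predictive interval for each held-out shipment and count how often the actual cost falls inside it. The sketch below does this with NumPy; all arrays are synthetic stand-ins (assumptions), and with the real model the parameter draws would come from `trace.posterior`. It uses an equal-tailed percentile interval rather than an HDI, which is a simplification.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-ins for flattened posterior draws; with the real trace these would be
# trace.posterior["α"].values.flatten(), and likewise for β and σ
alpha_draws = rng.normal(500.0, 5.0, size=4000)
beta_draws = rng.normal(150.0, 5.0, size=4000)
sigma_draws = np.abs(rng.normal(40.0, 2.0, size=4000))

# Stand-ins for X_test_s.flatten() and y_test
z_test = rng.normal(0.0, 1.0, size=200)
y_test = 500.0 + 150.0 * z_test + rng.normal(0.0, 40.0, size=200)

# Posterior draws of the expected cost: shape (n_draws, n_test)
mu_draws = alpha_draws[:, None] + beta_draws[:, None] * z_test[None, :]

# Posterior predictive draws add observation noise to each expected cost
y_draws = rng.normal(mu_draws, sigma_draws[:, None])

# Point forecast and equal-tailed 94% predictive interval per shipment
y_point = y_draws.mean(axis=0)
lower, upper = np.percentile(y_draws, [3, 97], axis=0)

# Share of held-out costs that fall inside their interval
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"Empirical coverage of the 94% interval: {coverage:.1%}")
```

Coverage far below 94% would suggest the intervals are too narrow (for example, because a linear mean or constant σ does not fit the data); coverage far above it would suggest they are wider than needed.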
Visualise Predictions & Credible Intervals
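Plotting cost vs. distance with a credible band shows how expected cost scales with distance and how uncertain that fitted line is. The band below is the 94% HDI of the expected cost α + β·Distance_std, so it reflects parameter uncertainty only. It does not include the residual noise σ, and individual shipments will scatter more widely than the band.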
import numpy as np
import matplotlib.pyplot as plt
# Create a grid of distance values
dist_grid_s = np.linspace(X_train_s.min(), X_train_s.max(), 100)
# Posterior draws of the expected cost on the grid, shape (n_draws, 100)
α_draws = trace.posterior["α"].values.flatten()
β_draws = trace.posterior["β"].values.flatten()
μ_grid = α_draws[:, None] + β_draws[:, None] * dist_grid_s[None, :]
# Compute mean and 94% HDI of the expected cost
mean_pred = μ_grid.mean(axis=0)
hpd = az.hdi(μ_grid, hdi_prob=0.94)
# Back-transform distances
dist_grid = scaler.inverse_transform(dist_grid_s.reshape(-1,1)).flatten()
plt.figure(figsize=(8,5))
plt.plot(dist_grid, mean_pred, label="Posterior mean")
plt.fill_between(dist_grid, hpd[:,0], hpd[:,1], alpha=0.3,
                 label="94% credible interval")
plt.scatter(X_test.flatten(), y_test, color="k", alpha=0.5,
            label="Test data")
plt.xlabel("Distance")
plt.ylabel("Trip Cost (USD)")
plt.title("Bayesian Regression: Shipping Cost vs. Distance")
plt.legend()
plt.tight_layout()
plt.show()
Summary
This Bayesian Regression workflow for Shipping Rate Cost Prediction provides:
1. Point estimates of per‑shipment cost from early trip indicators.
2. Credible intervals that quantify forecasting uncertainty—vital for risk‑aware tendering and pricing.
3. Actionable insights: logistics managers can set bids, negotiate carrier rates, and budget fuel and labour with confidence bounds—optimising both margin and service reliability.
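As an illustration of the third point, a bid can be read off the predictive distribution as a quantile chosen to match the acceptable risk of the actual cost exceeding it. This is a sketch under stated assumptions: `cost_draws` is a synthetic stand-in for posterior predictive cost draws for one shipment, and `bid_for_risk` and its `margin` parameter are hypothetical names, not part of the workflow above.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for posterior predictive cost draws for one shipment (USD)
cost_draws = rng.normal(650.0, 45.0, size=4000)

def bid_for_risk(draws, loss_prob, margin=0.0):
    """Bid such that the cost exceeds the pre-margin bid with probability loss_prob."""
    base = np.quantile(draws, 1.0 - loss_prob)
    return base * (1.0 + margin)

for p in (0.5, 0.2, 0.05):
    print(f"P(cost > bid) = {p:.0%}: bid = ${bid_for_risk(cost_draws, p):.2f}")
```

A lower tolerated probability of loss pushes the bid toward the upper tail of the distribution, which makes the trade-off between winning the load and protecting margin explicit.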