Supply Chain Cost Prediction using Bayesian Regression in ML

Logistics and procurement teams need to forecast per-shipment supply-chain costs before committing to carriers or negotiating rates, using early indicators such as distance, weight, volume, number of stops, and mode of transport (e.g., road, rail, sea). Costs can scale nonlinearly with these drivers (longer distances can unlock volume discounts, and multimodal routes incur handling surcharges), and they are subject to uncertainty from fuel-price volatility and carrier availability. Bayesian regression gives us both:

1. A point estimate of expected shipment cost.

2. A credible interval quantifying our forecast uncertainty, enabling data-driven tendering, budgeting, and risk management.
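Both outputs are summaries of the same set of posterior predictive draws. As a minimal numpy-only sketch, assuming we already hold an array of cost draws for one shipment (simulated here from a hypothetical normal distribution, not taken from the dataset), the point estimate is the mean of the draws and the highest density interval (HDI) is the narrowest window of sorted draws that contains the requested share of them:

```python
import numpy as np


def hdi_from_draws(draws, prob=0.94):
    """Narrowest interval containing `prob` of the draws (assumes a unimodal distribution)."""
    sorted_draws = np.sort(np.asarray(draws, dtype=float))
    n = sorted_draws.size
    window = int(np.floor(prob * n))                      # index gap spanned by each candidate
    widths = sorted_draws[window:] - sorted_draws[: n - window]
    start = int(np.argmin(widths))                        # narrowest candidate window
    return float(sorted_draws[start]), float(sorted_draws[start + window])


# Hypothetical predictive draws for a single shipment (illustrative only)
rng = np.random.default_rng(0)
cost_draws = rng.normal(loc=1200.0, scale=150.0, size=4000)

point_estimate = cost_draws.mean()
lower, upper = hdi_from_draws(cost_draws, prob=0.94)
print(f"Expected cost ≈ ${point_estimate:.0f}, 94% HDI ≈ [${lower:.0f}, ${upper:.0f}]")
```

For a skewed cost distribution the HDI is shifted toward the bulk of the draws, which is why it is preferred over an equal-tailed interval when budgeting around the most plausible costs.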

Libraries Required

import pandas as pd                              # data loading & manipulation  
import numpy as np                               # numerical operations  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

import pymc3 as pm                               # Bayesian modeling (PyMC3 API; newer releases are imported as `pymc`)
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error

Dataset

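Supply Chain Shipment Pricing Data

The code below expects a CSV with the columns Distance, Weight, Volume, Stops, Transport_Mode, and Cost (USD per shipment).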

Step-by-Step Code Implementation

Data Loading & Preprocessing

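We one-hot encode Transport_Mode (dropping one level, which becomes the baseline absorbed by the intercept α) and z-score the numerical drivers, so that a single prior scale on β is reasonable for every predictor and the NUTS sampler works on a better-conditioned posterior, which helps MCMC converge reliably.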

import pandas as pd

# Load the shipment pricing data
df = pd.read_csv("data/supply-chain-shipment-pricing-data.csv")

# Keep relevant features
df = df[['Distance','Weight','Volume','Stops','Transport_Mode','Cost']].dropna()

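# One-hot encode transport mode; the dropped level becomes the baseline in α.
# dtype=float keeps the dummies numeric (recent pandas returns bool columns)
df = pd.get_dummies(df, columns=['Transport_Mode'], drop_first=True, dtype=float)

# Define predictors and target as float arrays, so the standardized values
# written back into X_train_s / X_test_s below are not truncated to integers
X = df.drop(columns='Cost').to_numpy(dtype=float)
y = df['Cost'].to_numpy(dtype=float)  # USD per shipment

# Random 80/20 split (for time-stamped shipments, a chronological split
# would better mimic forecasting future costs)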
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize the numeric features (first 4 columns) for stable MCMC.
# The scaler is fit on training rows only, so no test information leaks in.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(X_train[:, :4])
X_train_s = X_train.copy()
X_train_s[:, :4] = scaler.transform(X_train[:, :4])
X_test_s = X_test.copy()
X_test_s[:, :4] = scaler.transform(X_test[:, :4])
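StandardScaler subtracts the training mean and divides by the training standard deviation (the population form, ddof=0), then reuses those same training statistics on the test rows. A numpy-only sketch of that arithmetic, on a small hypothetical matrix whose first two columns play the role of numeric drivers and whose last column is an unscaled 0/1 mode indicator, also shows why the matrix must be floating point: writing scaled values into an integer array silently truncates them.

```python
import numpy as np

# Hypothetical toy design matrix: [Distance, Weight, mode_indicator] (illustrative values)
X_train = np.array([[120, 800, 0],
                    [450, 1500, 1],
                    [300, 950, 0],
                    [800, 2200, 1]], dtype=float)
X_test = np.array([[500, 1000, 1]], dtype=float)

n_numeric = 2  # here only the leading 2 columns are numeric (4 in the tutorial)

# Statistics come from the training rows only (population std, ddof=0)
mean = X_train[:, :n_numeric].mean(axis=0)
std = X_train[:, :n_numeric].std(axis=0, ddof=0)

X_train_s = X_train.copy()
X_train_s[:, :n_numeric] = (X_train[:, :n_numeric] - mean) / std
X_test_s = X_test.copy()
X_test_s[:, :n_numeric] = (X_test[:, :n_numeric] - mean) / std

# The inverse transform recovers original units (as used later for plotting)
recovered = X_test_s[:, :n_numeric] * std + mean

# Pitfall: assigning floats into an integer array truncates them toward zero
X_int = X_train.astype(int)
X_int[:, :n_numeric] = (X_train[:, :n_numeric] - mean) / std
print(X_train_s.round(2))
print(X_int)
```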

Define & Fit Bayesian Regression Model

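Priors (the second argument is the standard deviation, matching PyMC's sigma):

α ∼ Normal(0, 1,000) accommodates baseline cost scales.
βᵢ ∼ Normal(0, 100) reflects moderate uncertainty per coefficient: the cost change per standard deviation of a numeric driver, or per switch from the baseline transport mode for a one-hot column.
σ ∼ HalfNormal(500) keeps the residual noise positive and on the scale of shipment costs.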

Model:

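Costᵢ ∼ Normal(α + xᵢ·β, σ), where xᵢ is the vector of standardised numeric drivers plus one-hot mode indicators for shipment i.

Because the mean is linear in xᵢ, nonlinear effects such as long-distance discounts are only approximated by a straight line unless transformed features (e.g., log distance) are added.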
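Before fitting, it can help to check what these priors imply about costs by simulating from the generative model (a prior predictive check). A minimal numpy sketch, assuming a hypothetical design with 4 numeric drivers and 2 mode indicators (6 coefficients) and a hypothetical shipment whose numeric drivers all sit one standard deviation above average:

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 5000
n_features = 6                     # hypothetical: 4 numeric drivers + 2 mode indicators

# Parameter draws from the stated priors (second argument = standard deviation)
alpha = rng.normal(0.0, 1000.0, size=n_draws)
beta = rng.normal(0.0, 100.0, size=(n_draws, n_features))
sigma = np.abs(rng.normal(0.0, 500.0, size=n_draws))   # HalfNormal(500) = |Normal(0, 500)|

# Hypothetical shipment: every numeric driver 1 SD above average, baseline mode
x_example = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0])

mu = alpha + beta @ x_example                 # implied expected cost per prior draw
cost_prior = rng.normal(mu, sigma)            # prior predictive cost draws

low, high = np.percentile(cost_prior, [3, 97])
print(f"94% of prior-predicted costs fall in [{low:.0f}, {high:.0f}] USD")
print(f"share of negative prior-predicted costs: {np.mean(cost_prior < 0):.2f}")
```

Because the prior on α is centred at zero, roughly half of these simulated costs are negative and the range spans several thousand dollars, so the priors are deliberately weak; if that range is implausible for your lanes, the prior scales can be tightened.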

Inference:

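We draw 2,000 posterior samples per chain after 1,000 tuning steps (used to adapt the NUTS step size and mass matrix, then discarded). Setting target_accept=0.9 makes the sampler take smaller steps, which reduces divergent transitions at the cost of slightly slower sampling.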

Posterior predictive:

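Sampling from the posterior predictive distribution of Y_obs yields a full distribution of plausible costs for each shipment; from it we extract the posterior mean forecast and a 94% highest density interval (HDI, ArviZ's default probability) to quantify forecast uncertainty.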
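Conceptually, each posterior predictive draw pairs one posterior sample of (α, β, σ) with fresh Gaussian noise, so a predictive band for a single shipment is wider than the band for the expected cost μ alone. A numpy sketch of that two-step recipe, using hypothetical stand-in posterior samples rather than a real trace, and equal-tailed percentiles instead of an HDI for brevity:

```python
import numpy as np

rng = np.random.default_rng(2)
S, p = 4000, 3                      # posterior draws, number of coefficients (hypothetical)

# Stand-in posterior samples (in practice: trace.posterior["α"], ["β"], ["σ"])
alpha_s = rng.normal(1000.0, 10.0, size=S)
beta_s = rng.normal([120.0, 60.0, -40.0], 5.0, size=(S, p))
sigma_s = np.abs(rng.normal(150.0, 5.0, size=S))

X_new = np.array([[0.0, 0.0, 0.0],          # hypothetical standardised shipments
                  [1.5, 0.5, 1.0]])

# Step 1: expected cost μ for every (draw, shipment) pair -> parameter uncertainty only
mu_draws = alpha_s[:, None] + beta_s @ X_new.T           # shape (S, 2)
# Step 2: add observation noise -> posterior predictive draws of cost
y_draws = rng.normal(mu_draws, sigma_s[:, None])         # shape (S, 2)

point = y_draws.mean(axis=0)
mu_band = np.percentile(mu_draws, [3, 97], axis=0)       # 94% band for the mean cost
y_band = np.percentile(y_draws, [3, 97], axis=0)         # 94% band for one shipment's cost
print("point forecasts:", point.round(0))
print("94% band width for μ:", (mu_band[1] - mu_band[0]).round(0))
print("94% band width for y:", (y_band[1] - y_band[0]).round(0))
```

The credible band plotted later in this tutorial is built from predictive draws, so it corresponds to the wider `y_band`, not `mu_band`.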

import pymc3 as pm

with pm.Model() as scm_model:
    # Priors: intercept and coefficients
    α = pm.Normal("α", mu=0, sigma=1e3)
    β = pm.Normal("β", mu=0, sigma=100, shape=X_train_s.shape[1])
    σ = pm.HalfNormal("σ", sigma=500)

    # Linear predictor
    μ = α + pm.math.dot(X_train_s, β)

    # Likelihood: observed shipment cost
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)

    # MCMC sampling
    trace = pm.sample(
        draws=2000,
        tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
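Before using the draws, it is worth confirming that the chains agree; az.summary reports this as r_hat. As an illustrative numpy sketch of the classic split-R̂ statistic (each chain is split in half, then between-chain and within-chain variances are compared), applied to hypothetical chains rather than the fitted trace; note that ArviZ computes a rank-normalised variant, so its values differ slightly:

```python
import numpy as np


def split_rhat(chains):
    """Classic split-R-hat for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    half = chains.shape[1] // 2
    # Split every chain into two halves so drift within a chain is detected too
    splits = np.concatenate([chains[:, :half], chains[:, half:2 * half]], axis=0)
    m, n = splits.shape
    chain_means = splits.mean(axis=1)
    W = splits.var(axis=1, ddof=1).mean()          # mean within-chain variance
    B = n * chain_means.var(ddof=1)                # between-chain variance
    var_hat = (n - 1) / n * W + B / n              # pooled posterior variance estimate
    return float(np.sqrt(var_hat / W))


rng = np.random.default_rng(3)
good = rng.normal(0.0, 1.0, size=(4, 2000))                  # well-mixed chains
bad = good + np.array([[0.0], [0.0], [0.0], [3.0]])          # one chain stuck elsewhere
print(f"R-hat (mixed chains): {split_rhat(good):.3f}")
print(f"R-hat (stuck chain):  {split_rhat(bad):.3f}")
```

With the fitted model, the same function could be applied to a scalar parameter as `split_rhat(trace.posterior["σ"].values)`, whose array has shape (chain, draw); values close to 1 (commonly below 1.01) indicate the chains have mixed.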

Posterior Analysis & Point Predictions

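We form point forecasts from the posterior means of α and β, then measure Mean Absolute Error (MAE) on the held-out shipments, i.e. the average absolute forecast error in USD per shipment.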

import arviz as az
from sklearn.metrics import mean_absolute_error

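# Summarize posterior distributions (means, HDIs, r_hat, effective sample sizes)
print(az.summary(trace, round_to=2))

# In-sample posterior predictive draws (one row per posterior draw, one column
# per training shipment), useful for posterior predictive checks against y_train
with scm_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

# Extract posterior means of intercept and coefficients
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values

# Point forecasts on test set (the model is linear, so this equals the posterior mean of μ)
y_pred = α_post + X_test_s.dot(β_post)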

# Evaluate MAE
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f} per shipment")
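Plugging posterior means into the linear predictor, as the code above does, is exact for this model: because μ = α + xβ is linear in the parameters, averaging μ over posterior draws gives the same result as evaluating it at the averaged parameters. A small numpy check with hypothetical draws and stand-in test data, with MAE written out explicitly:

```python
import numpy as np

rng = np.random.default_rng(4)
S, p, n = 2000, 5, 8                              # draws, coefficients, test rows (hypothetical)
alpha_s = rng.normal(900.0, 20.0, size=S)
beta_s = rng.normal(50.0, 10.0, size=(S, p))
X = rng.normal(size=(n, p))                        # stand-in standardised test features
y_true = rng.normal(1000.0, 100.0, size=n)         # stand-in observed costs

# Route 1: average the linear predictor over posterior draws
mu_per_draw = alpha_s[:, None] + beta_s @ X.T      # shape (S, n)
pred_avg = mu_per_draw.mean(axis=0)
# Route 2: plug in the posterior means of α and β
pred_plugin = alpha_s.mean() + X @ beta_s.mean(axis=0)

# MAE: mean of absolute errors (the same definition as sklearn's mean_absolute_error)
mae = np.mean(np.abs(y_true - pred_plugin))
print(np.allclose(pred_avg, pred_plugin), round(float(mae), 2))
```

This equivalence would not hold for a nonlinear link (for example, a model of log-cost), where the forecast should be averaged over draws instead.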

Visualise Predictions & Credible Intervals

Sweeping Distance while holding the other features at their training medians, we plot the expected cost curve alongside its 94% credible band, revealing both the driver's effect and the uncertainty around predictions.

import numpy as np
import matplotlib.pyplot as plt

# Sweep Distance while holding other features at median
dist_grid = np.linspace(X_train_s[:,0].min(), X_train_s[:,0].max(), 100)
grid = np.median(X_train_s, axis=0)[None,:].repeat(100, axis=0)
grid[:,0] = dist_grid

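# The model was built with X_train_s baked in, so pm.sample_posterior_predictive
# would return draws at the training rows, not at the grid. Instead, build
# predictive draws for the grid directly from the posterior samples.
post = trace.posterior.stack(sample=("chain", "draw"))
α_s = post["α"].values              # (S,)
β_s = post["β"].values              # (n_features, S)
σ_s = post["σ"].values              # (S,)

rng = np.random.default_rng(42)
idx = rng.choice(α_s.size, size=1000, replace=False)   # 1,000 random posterior draws

mu_grid   = α_s[idx, None] + (grid @ β_s[:, idx]).T    # expected cost, (1000, 100)
preds     = rng.normal(mu_grid, σ_s[idx, None])        # predictive draws, (1000, 100)
mean_pred = preds.mean(axis=0)
hpd       = az.hdi(preds[None, ...], hdi_prob=0.94)    # (100, 2): lower, upper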

# Back‑transform Distance
dist_orig = scaler.inverse_transform(grid[:, :4])[:,0]

plt.figure(figsize=(8,5))
plt.plot(dist_orig, mean_pred, label="Posterior mean")
plt.fill_between(dist_orig, hpd[:,0], hpd[:,1], alpha=0.3,
                 label="94% credible interval")
plt.scatter(
    scaler.inverse_transform(X_test_s[:, :4])[:,0],
    y_test, color="k", alpha=0.5, label="Test data"
)
plt.xlabel("Distance")
plt.ylabel("Shipment Cost (USD)")
plt.title("Bayesian Regression: Cost vs. Distance")
plt.legend()
plt.tight_layout()
plt.show()
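The slope of the posterior-mean curve in this plot is the Distance coefficient rescaled to original units: since x_std = (x − mean) / scale, one extra unit of Distance changes expected cost by β_Distance / scale. A numpy sketch with hypothetical numbers, where `distance_mean` and `distance_scale` stand in for StandardScaler's `scaler.mean_[0]` and `scaler.scale_[0]`:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical posterior draws of the Distance coefficient (USD per 1 SD of Distance)
beta_distance_std = rng.normal(240.0, 15.0, size=4000)
# Hypothetical scaler statistics (stand-ins for scaler.mean_[0] and scaler.scale_[0])
distance_mean, distance_scale = 650.0, 400.0

# Effect of one extra unit of Distance, with a 94% equal-tailed interval
beta_per_unit = beta_distance_std / distance_scale
lo, hi = np.percentile(beta_per_unit, [3, 97])
print(f"≈ {beta_per_unit.mean():.3f} USD per unit distance (94%: {lo:.3f} to {hi:.3f})")

# Cross-check against the slope of a posterior-mean curve drawn in original units
baseline = 1100.0                                  # hypothetical expected cost at mean Distance
grid_orig = np.linspace(100.0, 1500.0, 5)
grid_std = (grid_orig - distance_mean) / distance_scale
curve = baseline + grid_std * beta_distance_std.mean()
slope_from_curve = np.diff(curve) / np.diff(grid_orig)
print(slope_from_curve)
```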

Summary

This Bayesian Regression workflow for Supply Chain Cost Prediction delivers:

1. Point estimates of per-shipment costs from early logistical drivers, with accuracy measured by MAE on held-out shipments.

2. Credible intervals quantifying uncertainty from market and operational variability, which is critical for risk-aware tendering and budgeting.

3. Actionable insights: supply-chain managers can set carrier bids, negotiate contracts, and allocate contingency funds with explicit confidence bounds, improving cost efficiency and service reliability.
