Shipping Rate Cost Prediction using Bayesian Regression in ML


Logistics managers need to forecast per-shipment cost before tendering loads to carriers, using early-trip indicators such as distance, freight weight, vehicle type, and route complexity (e.g. number of stops). Shipping costs do not scale linearly: longer distances can earn volume discounts, heavy loads incur surcharges, and fuel-price volatility and carrier-rate fluctuations add further variation. A point estimate alone hides this uncertainty and risks underbidding or margin erosion. Applying Bayesian regression gives us:

1. A point estimate of shipment cost.

2. A credible interval that quantifies forecast uncertainty, enabling risk-aware pricing and carrier negotiations.

Libraries Required

import pandas as pd                              # data loading & manipulation  
import numpy as np                               # numerical operations  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

import pymc3 as pm                               # Bayesian modeling (PyMC3 API; v4+ is published as pymc)
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error

Dataset

Cost Prediction for Logistic Company 

Step-by-Step Code Implementation

Import Libraries & Load Data

import pandas as pd

# Load the competition data
df = pd.read_csv("data/train.csv")

# Inspect key columns
df[['Distance','Trip_Cost']].head()

Feature Engineering & Train/Test Split

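We train on Distance only; additional covariates (e.g. weight, stops) can be added, each with its own slope coefficient in the model.
We z-score Distance (subtract the training mean, divide by the training standard deviation) so that the prior on β is expressed per standard deviation of distance rather than in raw units, and so that the sampler explores the posterior more efficiently.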

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Select predictors and target
# Here we'll use Distance; you can add Weight, Stops, etc., if available
X = df[['Distance']].values
y = df['Trip_Cost'].values  # USD

# Split into train (80%) and test (20%)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize distance using training-set statistics only (avoids test-set leakage)
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s  = scaler.transform(X_test)
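Because the model is fitted on standardized distance, α and β live on that scale: α is the expected cost at the mean training distance and β is the cost change per standard deviation of distance. A minimal NumPy sketch of converting them back to cost per raw distance unit follows. The helper name to_raw_scale and the synthetic data are hypothetical, and the z-score mirrors StandardScaler (population standard deviation, ddof=0).

```python
import numpy as np

def to_raw_scale(alpha_s, beta_s, mean, std):
    """Convert an intercept/slope fitted on z-scored x back to raw-x units.

    y = alpha_s + beta_s * (x - mean) / std
      = (alpha_s - beta_s * mean / std) + (beta_s / std) * x
    Works element-wise, so it also accepts arrays of posterior draws.
    """
    beta_raw = beta_s / std
    alpha_raw = alpha_s - beta_s * mean / std
    return alpha_raw, beta_raw

# Hypothetical training distances and costs with a known linear relation
rng = np.random.default_rng(0)
x = rng.uniform(10, 500, size=200)
y = 40.0 + 1.5 * x + rng.normal(0, 5, size=200)

# z-score the way StandardScaler does (ddof=0)
mean, std = x.mean(), x.std()
x_s = (x - mean) / std

# Least-squares fit on the standardized scale (stand-in for posterior means)
beta_s, alpha_s = np.polyfit(x_s, y, deg=1)

alpha_raw, beta_raw = to_raw_scale(alpha_s, beta_s, mean, std)
print(f"standardized: alpha={alpha_s:.2f}, beta={beta_s:.2f}")
print(f"raw units:    alpha={alpha_raw:.2f}, beta={beta_raw:.4f} USD per distance unit")
```

With the fitted model, the same function can take whole arrays of draws, e.g. to_raw_scale(trace.posterior["α"].values, trace.posterior["β"].values, scaler.mean_[0], scaler.scale_[0]), giving a posterior for the cost per raw distance unit.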

Define & Fit Bayesian Regression Model

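Model specification (the second argument of each distribution is its standard deviation):

α ∼ Normal(0, 100): a prior on the baseline cost in USD, i.e. the expected cost at the mean training distance.
β ∼ Normal(0, 50): the change in cost (USD) per one standard deviation of distance.
σ ∼ HalfNormal(50): the residual standard deviation, constrained to be positive.
Likelihood: Trip_Cost ∼ Normal(α + β·Distance_std, σ).
Whether these priors are genuinely broad depends on the scale of Trip_Cost: a prior standard deviation of 100 USD on α is weak only if typical trip costs are within a few hundred USD. For larger costs, widen the priors or standardize Trip_Cost as well.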
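Before fitting, it helps to check what these priors imply about trip costs. Below is a minimal NumPy sketch of a prior predictive simulation: it draws α, β and σ from the priors above and simulates costs at a few standardized distances. The chosen distances, number of draws and function name are arbitrary illustrations; PyMC3 also offers pm.sample_prior_predictive for the same purpose.

```python
import numpy as np

def prior_predictive(x_std, n_draws=4000, seed=1):
    """Simulate Trip_Cost from the priors alone (no data).

    alpha ~ Normal(0, 100), beta ~ Normal(0, 50), sigma ~ HalfNormal(50),
    cost ~ Normal(alpha + beta * x_std, sigma).
    Returns an array of shape (n_draws, len(x_std)).
    """
    rng = np.random.default_rng(seed)
    alpha = rng.normal(0.0, 100.0, size=n_draws)
    beta = rng.normal(0.0, 50.0, size=n_draws)
    # HalfNormal(50) is the absolute value of a Normal(0, 50) draw
    sigma = np.abs(rng.normal(0.0, 50.0, size=n_draws))
    mu = alpha[:, None] + beta[:, None] * x_std[None, :]
    return mu + sigma[:, None] * rng.standard_normal(mu.shape)

x_std = np.array([-2.0, 0.0, 2.0])  # two SDs below, at, and above the mean distance
sims = prior_predictive(x_std)

# Central 94% range of simulated costs at each distance
lo, hi = np.percentile(sims, [3, 97], axis=0)
for x, a, b in zip(x_std, lo, hi):
    print(f"x_std={x:+.0f}: 94% of prior costs in [{a:.0f}, {b:.0f}] USD")
```

If these ranges contain many negative costs, or never reach the costs actually observed in the data, the prior scales are worth revisiting.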

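Inference: pm.sample uses the NUTS sampler by default for this continuous model. Each chain runs 1,000 tuning (warm-up) steps, which are discarded, followed by 2,000 posterior draws. Raising target_accept to 0.9 (the default is 0.8) forces smaller step sizes, which reduces divergent transitions at the cost of slower sampling; it does not by itself guarantee convergence.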
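Whether the chains converged is judged afterwards with R-hat, which az.summary reports in its r_hat column; values close to 1.0 are desired. ArviZ computes a rank-normalized split R-hat, whereas the NumPy sketch below implements the simpler classic Gelman-Rubin split-R-hat to show the idea, so its numbers will differ slightly from ArviZ's. The chains here are synthetic and the function name is hypothetical.

```python
import numpy as np

def split_rhat(chains):
    """Classic split-R-hat for an array of shape (n_chains, n_draws).

    Each chain is cut in half so that drift within a chain also inflates R-hat.
    """
    chains = np.asarray(chains, dtype=float)
    n_draws = chains.shape[1] // 2
    # Split every chain into two halves -> shape (2 * n_chains, n_draws)
    halves = np.concatenate([chains[:, :n_draws], chains[:, -n_draws:]], axis=0)
    chain_means = halves.mean(axis=1)
    within = halves.var(axis=1, ddof=1).mean()        # W: mean within-chain variance
    between = n_draws * chain_means.var(ddof=1)        # B: between-chain variance
    var_hat = (n_draws - 1) / n_draws * within + between / n_draws
    return np.sqrt(var_hat / within)

rng = np.random.default_rng(42)
good = rng.normal(0.0, 1.0, size=(4, 2000))            # four well-mixed chains
bad = good + np.array([0.0, 0.0, 0.0, 3.0])[:, None]   # one chain stuck elsewhere

print(f"well-mixed chains: R-hat = {split_rhat(good):.3f}")
print(f"one shifted chain: R-hat = {split_rhat(bad):.3f}")
```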

import pymc3 as pm

with pm.Model() as shipping_model:
    # Priors
    α = pm.Normal("α", mu=0, sigma=100)                        # intercept prior
    β = pm.Normal("β", mu=0, sigma=50)                         # slope prior
    σ = pm.HalfNormal("σ", sigma=50)                           # residual scale

    # Expected cost linear predictor
    μ = α + β * X_train_s.flatten()

    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)

    # MCMC sampling
    trace = pm.sample(
        draws=2000,       # number of posterior draws
        tune=1000,        # warm-up (tuning) steps, discarded
        target_accept=0.9,
        return_inferencedata=True
    )
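As a sanity check on the sampler: if σ were known, the Normal priors on α and β would be conjugate to the Normal likelihood, so the posterior of (α, β) would be Gaussian in closed form. The NumPy sketch below computes it. Holding σ fixed (for instance at its posterior mean) is a simplifying assumption, so the result should roughly match, not exactly reproduce, the MCMC posterior means. The data are synthetic and the function name is hypothetical.

```python
import numpy as np

def conjugate_posterior(x_std, y, sigma, prior_sd=(100.0, 50.0)):
    """Posterior mean and covariance of (alpha, beta) with sigma held fixed.

    Prior: alpha ~ N(0, prior_sd[0]^2), beta ~ N(0, prior_sd[1]^2), independent.
    Likelihood: y ~ N(alpha + beta * x_std, sigma^2).
    """
    phi = np.column_stack([np.ones_like(x_std), x_std])   # design matrix [1, x]
    prior_precision = np.diag(1.0 / np.square(prior_sd))
    post_precision = prior_precision + phi.T @ phi / sigma**2
    post_cov = np.linalg.inv(post_precision)
    post_mean = post_cov @ (phi.T @ y) / sigma**2          # prior mean is zero
    return post_mean, post_cov

rng = np.random.default_rng(7)
x_std = rng.standard_normal(500)
y = 120.0 + 35.0 * x_std + rng.normal(0.0, 20.0, size=500)

mean, cov = conjugate_posterior(x_std, y, sigma=20.0)
sd = np.sqrt(np.diag(cov))
print(f"alpha: {mean[0]:.2f} ± {sd[0]:.2f}")
print(f"beta:  {mean[1]:.2f} ± {sd[1]:.2f}")
```

With the tutorial's variables, this would be called as conjugate_posterior(X_train_s.flatten(), y_train, sigma=trace.posterior["σ"].mean().item()) and compared with the output of az.summary(trace).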

Posterior Analysis & Point Predictions

Posterior predictive: we draw posterior predictive samples for the training observations (useful for posterior predictive checks) and compute point forecasts for the held-out test distances from the posterior means of α and β. The plotting step below adds 94% highest density intervals (HDIs) for the expected cost over a grid of distances.
Evaluation: mean absolute error (MAE) on the test set measures the average forecast error in USD per trip.
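For a single quantity, a highest density interval is the narrowest interval that contains the requested share of posterior draws. The NumPy sketch below uses the same sort-and-scan idea as az.hdi in its default (unimodal) mode; the function name is hypothetical and the draws are synthetic. For a skewed posterior, the HDI is narrower than the equal-tailed (percentile) interval and shifted towards the mode.

```python
import numpy as np

def hdi_1d(samples, prob=0.94):
    """Narrowest interval containing `prob` of the samples (unimodal case)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    k = int(np.floor(prob * n))          # number of sorted steps the interval spans
    widths = x[k:] - x[: n - k]          # width of every candidate interval
    i = int(np.argmin(widths))           # start index of the narrowest one
    return x[i], x[i + k]

rng = np.random.default_rng(3)
normal_draws = rng.normal(500.0, 50.0, size=20_000)    # symmetric posterior
skewed_draws = rng.lognormal(6.0, 0.5, size=20_000)    # right-skewed posterior

print("normal HDI:      ", hdi_1d(normal_draws))
print("skewed HDI:      ", hdi_1d(skewed_draws))
print("skewed 3-97% ETI:", tuple(np.percentile(skewed_draws, [3, 97])))
```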

import arviz as az
from sklearn.metrics import mean_absolute_error

# Summarize posterior distributions
az.summary(trace, round_to=2)

# Posterior predictive draws for the training observations (for posterior predictive checks)
with shipping_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

# Extract posterior means
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean().item()

# Point predictions on the standardized test set: μ is linear in α and β,
# so plugging in their posterior means gives the posterior mean of μ
y_pred = α_post + β_post * X_test_s.flatten()

# Evaluate performance
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f}")
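MAE only scores the point forecast. To check whether the uncertainty estimates are honest, one can measure empirical coverage: the share of test trips whose actual cost falls inside the 94% posterior predictive interval, which should be close to 94%. This is a minimal NumPy sketch, assuming 1-D arrays of posterior draws for α, β and σ (simulated here, not taken from the tutorial's trace) and using equal-tailed percentile intervals rather than HDIs for simplicity. Adding the σ noise term is also what turns the mean band in the plot below into a predictive band.

```python
import numpy as np

def predictive_coverage(alpha, beta, sigma, x_std, y, prob=0.94, seed=0):
    """Share of y inside the equal-tailed posterior predictive interval.

    alpha, beta, sigma: 1-D arrays of posterior draws (same length).
    x_std, y: standardized test inputs and observed costs.
    """
    rng = np.random.default_rng(seed)
    mu = alpha[:, None] + beta[:, None] * x_std[None, :]          # (draws, trips)
    y_rep = mu + sigma[:, None] * rng.standard_normal(mu.shape)   # add trip-level noise
    tail = 100 * (1 - prob) / 2
    lo, hi = np.percentile(y_rep, [tail, 100 - tail], axis=0)
    return np.mean((y >= lo) & (y <= hi))

# Simulated stand-ins for posterior draws and a test set from the same model
rng = np.random.default_rng(11)
n_draws, n_test = 2000, 500
alpha = rng.normal(120.0, 1.0, n_draws)
beta = rng.normal(35.0, 1.0, n_draws)
sigma = np.abs(rng.normal(20.0, 0.5, n_draws))
x_test = rng.standard_normal(n_test)
y_test = 120.0 + 35.0 * x_test + rng.normal(0.0, 20.0, n_test)

coverage = predictive_coverage(alpha, beta, sigma, x_test, y_test)
print(f"94% predictive interval coverage: {coverage:.3f}")
```

Passing zeros for sigma reproduces a band for the mean only; its coverage of individual trips is far below 94%, which is why a mean band should not be read as a range for single shipments.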

Visualise Predictions & Credible Intervals

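Plotting cost against distance with a credible band shows how the expected cost scales with distance and how uncertain that expectation is. The band reflects uncertainty in α and β only; it excludes the residual noise σ, so individual trips (the test points) can fall well outside it.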

import numpy as np
import matplotlib.pyplot as plt

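# Create a grid of standardized distances spanning the training range
dist_grid_s = np.linspace(X_train_s.min(), X_train_s.max(), 100)

# Posterior draws of the expected cost μ at each grid point
# (shape: n_draws × 100; parameter uncertainty only, no residual noise σ)
α_draws = trace.posterior["α"].values.flatten()
β_draws = trace.posterior["β"].values.flatten()
μ_grid = α_draws[:, None] + β_draws[:, None] * dist_grid_s[None, :]

# Posterior mean and 94% HDI of μ at each grid point
mean_pred = μ_grid.mean(axis=0)
hpd = az.hdi(μ_grid[np.newaxis, ...], hdi_prob=0.94)  # add a chain axis; result shape (100, 2)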

# Back-transform distances
dist_grid = scaler.inverse_transform(dist_grid_s.reshape(-1,1)).flatten()

plt.figure(figsize=(8,5))
plt.plot(dist_grid, mean_pred, label="Posterior mean")
plt.fill_between(dist_grid, hpd[:,0], hpd[:,1], alpha=0.3,
                 label="94% credible interval")
plt.scatter(X_test.flatten(), y_test, color="k", alpha=0.5,
            label="Test data")
plt.xlabel("Distance")
plt.ylabel("Trip Cost (USD)")
plt.title("Bayesian Regression: Shipping Cost vs. Distance")
plt.legend()
plt.tight_layout()
plt.show()

Summary

This Bayesian Regression workflow for Shipping Rate Cost Prediction provides:

1. Point estimates of per‑shipment cost from early trip indicators.

2. Credible intervals that quantify forecast uncertainty, which is vital for risk-aware tendering and pricing.

3. Actionable insights: logistics managers can set bids, negotiate carrier rates, and budget fuel and labour against credible bounds rather than single numbers, optimising both margin and service reliability.
