Freight Cost Prediction using Quantile Regression in ML
While traditional freight‑cost models forecast the average shipping expense, logistics planners must also anticipate variability—from low‑cost bulk shipments (10th percentile) to high‑cost expedited deliveries (90th percentile).
In this project, we predict the 10th, 50th, and 90th percentiles of per‑shipment cost (USD) from shipment attributes such as distance (km), weight (kg), volume (m³), transport mode (road vs. rail), and service level (standard vs. expedited). Fitting a separate quantile regression model for each percentile shows how each feature’s influence shifts across the cost distribution, which helps supply‑chain teams set conservative budgets, target typical expenses, and provision for peak‑cost scenarios.
Libraries Required
import pandas as pd  # Data loading & manipulation
import numpy as np  # Numerical operations
import statsmodels.formula.api as smf  # Quantile regression via formula API
from sklearn.model_selection import train_test_split  # Train/test split
from sklearn.metrics import mean_pinball_loss  # Proper loss for quantile forecasts
Dataset
Supply Chain Shipment Pricing Data
Step-by-Step Code Implementation
Load & Inspect Data
We load the pricing dataset from Kaggle, which includes trip-level features (Distance_km, Weight_kg, Volume_m3), categorical fields (Transport_Mode, Service_Level), and the observed cost (Cost_USD) for ~50,000 shipments. We then examine the schema and the cost summary to identify the range and skew of costs.
# Load the “Supply Chain Shipment Pricing” dataset from Kaggle
df = pd.read_csv("supply_chain_shipment_pricing_data.csv")
# Inspect structure and cost distribution
print(df.head())
print(df.info())
print(df['Cost_USD'].describe())
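To make "range and skew" concrete, a minimal sketch like the following compares the mean with the 10th, 50th, and 90th percentiles and computes the sample skewness. It runs on synthetic, log-normally distributed costs, an assumption made only so the snippet is self-contained; on the real data you would use df['Cost_USD'] in place of the generated series.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df['Cost_USD'] (assumption): right-skewed, as cost data often is
rng = np.random.default_rng(42)
cost = pd.Series(rng.lognormal(mean=6.0, sigma=0.6, size=5_000), name="Cost_USD")

pcts = cost.quantile([0.10, 0.50, 0.90])  # the three percentiles modelled later
skewness = cost.skew()                    # positive values indicate a long right tail

print(pcts.round(2))
print(f"mean={cost.mean():.2f}  median={cost.median():.2f}  skew={skewness:.2f}")
```

A mean well above the median, together with positive skewness, suggests that a single average-cost model would describe the expensive tail poorly, which is the motivation for modelling separate quantiles.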
Preprocessing & Feature Engineering
- We drop any incomplete records in our key variables.
- We convert Transport_Mode (e.g., Road/Rail) and Service_Level (Standard/Expedited) into binary dummy variables, dropping one category to avoid multicollinearity.
- We assemble features: numeric predictors (Distance_km, Weight_kg, Volume_m3) and the two dummies. We rename the cost column to Cost for brevity.
# Drop missing rows in key columns
df = df.dropna(subset=[
    'Distance_km', 'Weight_kg', 'Volume_m3',
    'Transport_Mode', 'Service_Level', 'Cost_USD'
])
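# Encode each categorical field as an explicit 0/1 indicator.
# pd.get_dummies(..., drop_first=True) drops the alphabetically first label,
# so with Road/Rail and Standard/Expedited it would keep *_Road and *_Standard
# rather than the columns used below.
# Assumes the labels are exactly 'Rail' and 'Expedited'; verify with .unique().
df['Transport_Mode_Rail'] = (df['Transport_Mode'] == 'Rail').astype(int)
df['Service_Level_Expedited'] = (df['Service_Level'] == 'Expedited').astype(int)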
# Define predictors and target
features = [
    'Distance_km', 'Weight_kg', 'Volume_m3',
    'Transport_Mode_Rail', 'Service_Level_Expedited'
]
data = df[features + ['Cost_USD']].rename(
    columns={'Cost_USD': 'Cost'}
)
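The name of the dummy column that survives drop_first=True depends on the alphabetical order of the category labels, so it is worth checking before hard-coding feature names. A minimal sketch on a toy frame, assuming the labels are exactly "Road"/"Rail" and "Standard"/"Expedited" (the real dataset's labels may differ), shows which columns pd.get_dummies keeps and how explicit indicators pin the names down:

```python
import pandas as pd

# Toy frame with assumed labels; the real dataset may use different spellings
toy = pd.DataFrame({
    "Transport_Mode": ["Road", "Rail", "Road"],
    "Service_Level": ["Standard", "Expedited", "Standard"],
})

# drop_first=True removes the alphabetically first label of each column
dummies = pd.get_dummies(
    toy, columns=["Transport_Mode", "Service_Level"], drop_first=True
)
print(list(dummies.columns))

# Explicit indicators make the baseline (Road, Standard) a deliberate choice
explicit = pd.DataFrame({
    "Transport_Mode_Rail": (toy["Transport_Mode"] == "Rail").astype(int),
    "Service_Level_Expedited": (toy["Service_Level"] == "Expedited").astype(int),
})
print(explicit)
```

With these labels, get_dummies keeps Transport_Mode_Road and Service_Level_Standard, because "Rail" and "Expedited" sort first and are dropped. Either encoding is valid for the regression; only the baseline category and the sign of the coefficient change.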
Train/Test Split
We randomly hold out 20% of shipments for evaluation, so we can check how well the quantile models generalize to shipments they were not fitted on.
# Reserve 20% for evaluation
train, test = train_test_split(
    data, test_size=0.2, random_state=42
)
Fit Quantile Regression Models
For each percentile (10th, 50th, 90th):
- We build a formula, e.g. “Cost ~ Distance_km + Weight_kg + …”.
- We fit a QuantReg model on the training set at that quantile.
- We print the coefficient table, revealing how each predictor’s effect varies—e.g., Distance_km may contribute less to the lower‑cost tail than to the upper tail.
quantiles = [0.10, 0.50, 0.90]
results = {}
formula = "Cost ~ " + " + ".join(features)
for q in quantiles:
    model = smf.quantreg(formula, train)
    res = model.fit(q=q)
    results[q] = res
    print(f"\n--- {int(q*100)}th Percentile Coefficients ---")
    print(res.summary().tables[1])  # show coefficient estimates
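Reading three separate summary tables makes it hard to see how a coefficient moves across quantiles. One option is to collect the fitted params into a single table. The sketch below does this on synthetic data (an assumption, so that it runs without the CSV) in which the noise grows with distance; the name toy and the numbers used to generate it are illustrative only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic shipments (assumption): cost spread widens as distance grows
rng = np.random.default_rng(0)
n = 2000
distance = rng.uniform(50, 1500, n)
noise = rng.normal(0, 1, n) * (0.05 * distance)
toy = pd.DataFrame({"Distance_km": distance, "Cost": 40 + 0.8 * distance + noise})

# One column of coefficients per quantile, indexed by term name
coef_table = pd.DataFrame({
    q: smf.quantreg("Cost ~ Distance_km", toy).fit(q=q).params
    for q in (0.10, 0.50, 0.90)
})
print(coef_table.round(3))
```

Because the noise scales with distance here, the Distance_km slope should rise from the 10th to the 90th percentile. With the real training data, the same pattern applies by building the table from the results dictionary, for example pd.DataFrame({q: r.params for q, r in results.items()}).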
Evaluation with Pinball Loss
- We generate quantile‑specific cost forecasts on the test set.
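- We compute pinball loss for each quantile, which weights over‑ and under‑predictions asymmetrically according to that percentile. A lower pinball loss indicates a more accurate forecast of that quantile; losses are only comparable between models evaluated at the same quantile, not across different quantiles.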
for q, res in results.items():
    preds = res.predict(test[features])
    loss = mean_pinball_loss(test['Cost'], preds, alpha=q)
    print(f"{int(q*100)}th quantile pinball loss: {loss:.2f}")
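To see what the pinball loss measures, it can help to compute it by hand next to a simple coverage check (the share of actual costs at or below the predicted quantile). The sketch below uses made-up numbers; pinball_loss and coverage are hypothetical helper names written for this illustration, not scikit-learn functions.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss at level q."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-prediction (diff > 0) costs q per unit; over-prediction costs (1 - q) per unit
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

def coverage(y_true, y_pred):
    """Share of actual values at or below the predicted quantile."""
    return float(np.mean(np.asarray(y_true) <= np.asarray(y_pred)))

# Toy numbers (hypothetical): actual costs and a constant 90th-percentile forecast
actual = np.array([100.0, 120.0, 150.0, 200.0, 400.0])
forecast_q90 = np.full(5, 250.0)

print(pinball_loss(actual, forecast_q90, q=0.9))
print(coverage(actual, forecast_q90))
```

Here four of the five actual costs fall below the forecast, so coverage is 0.8, and the single under-prediction (400 vs. 250) dominates the loss of roughly 35.6 because it is weighted by 0.9. For a well-calibrated model at quantile q, coverage on the test set should be close to q.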
Summary
By modelling the 10th, 50th, and 90th percentiles of freight cost, we gain distribution‑aware insights into shipping expenses:
- The 10th‑percentile model supports budgeting for the cheapest bulk shipments, avoiding over‑provisioning.
- The median (50th‑percentile) model predicts typical shipping costs for everyday planning.
- The 90th‑percentile model prepares for high‑cost expedited or long‑distance shipments, ensuring financial buffers for peak scenarios.
These quantile forecasts give logistics and finance teams more precise cost estimates, helping them optimise rate negotiations, route planning, and working‑capital allocation amid demand and operational uncertainty.