Freight Cost Prediction using Quantile Regression in ML

Traditional freight‑cost models forecast the average shipping expense, but logistics planners must also anticipate variability, from the low end of the cost distribution (10th percentile, e.g. low‑cost bulk shipments) to the high end (90th percentile, e.g. high‑cost expedited deliveries).

In this machine learning project, we will predict the 10th, 50th, and 90th percentiles of per‑shipment cost (USD) using shipment attributes such as distance (km), weight (kg), volume (m³), transport mode (road vs. rail), and service level (standard vs. expedited). By fitting a separate quantile regression model for each percentile, we’ll uncover how each feature’s influence shifts across the cost distribution, helping supply‑chain teams set conservative budgets, target typical expenses, and provision for peak‑cost scenarios.
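
For reference, the standard quantile regression objective can be written as below. The notation is ours and is not taken from the dataset or the code in this tutorial: q is the quantile level (0.10, 0.50, or 0.90), y_i is the cost of shipment i, x_i is its feature vector, and n is the number of training shipments.

```latex
\hat{\beta}_q = \arg\min_{\beta} \sum_{i=1}^{n} \rho_q\!\left(y_i - x_i^{\top}\beta\right),
\qquad
\rho_q(u) = u \,\bigl(q - \mathbf{1}\{u < 0\}\bigr)
```

With q = 0.5 the loss is half the absolute error, so the fit is a median regression. With q = 0.9, under-predicting the cost is penalised nine times more heavily than over-predicting it, which pushes the fitted line towards the upper end of the cost distribution.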

Libraries Required

import pandas as pd                      # Data loading & manipulation  
import numpy as np                       # Numerical operations  
import statsmodels.formula.api as smf    # Quantile regression via formula API  
from sklearn.model_selection import train_test_split  # Train/test split  
from sklearn.metrics import mean_pinball_loss        # Proper loss for quantile forecasts 

Dataset

Supply Chain Shipment Pricing Data

Step-by-Step Code Implementation

Load & Inspect Data

We load the pricing dataset from Kaggle, which includes trip-level features (distance, weight, volume), categorical fields (Transport_Mode, Service_Level), and the observed cost (Cost_USD) for ~50,000 shipments. We then examine the schema and the summary statistics of the cost column to identify its range and skew.

# Load the “Supply Chain Shipment Pricing” dataset from Kaggle
df = pd.read_csv("supply_chain_shipment_pricing_data.csv")

# Inspect structure and cost distribution
print(df.head())
df.info()   # info() prints directly and returns None, so no print() is needed
print(df['Cost_USD'].describe())
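
To put a number on the skew, one option is to compare the mean with the median and look at a few empirical percentiles. The sketch below is only illustrative: it uses synthetic log-normal costs as a stand-in for the real Cost_USD column, and the names demo and summary are ours.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed costs standing in for the real Cost_USD column
rng = np.random.default_rng(0)
demo = pd.DataFrame({"Cost_USD": rng.lognormal(mean=6.0, sigma=0.6, size=5_000)})

cost = demo["Cost_USD"]
summary = {
    "mean": cost.mean(),
    "median": cost.median(),
    "p10": cost.quantile(0.10),
    "p90": cost.quantile(0.90),
    "skew": cost.skew(),  # a value above 0 means a long right tail
}
for name, value in summary.items():
    print(f"{name:>6}: {value:,.2f}")
```

A mean well above the median, together with a positive skew, suggests that an average-cost model alone would describe typical shipments poorly, which is the motivation for modelling several percentiles.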

Preprocessing & Feature Engineering

  • We drop any records with missing values in our key variables.
  • We convert Transport_Mode (e.g., Road/Rail) and Service_Level (Standard/Expedited) into binary dummy variables, leaving one category of each field out of the model as the baseline to avoid perfect multicollinearity.
  • We assemble the features: numeric predictors (Distance_km, Weight_kg, Volume_m3) and the two dummies. We rename the cost column to Cost for brevity.

# Drop missing rows in key columns
df = df.dropna(subset=[
    'Distance_km','Weight_kg','Volume_m3',
    'Transport_Mode','Service_Level','Cost_USD'
])

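# Map categorical features to 0/1 dummies. drop_first=True would drop the
# alphabetically first level (Rail, Expedited), i.e. the very indicators
# selected below, so we create every level and pick the ones we need.
df = pd.get_dummies(df,
    columns=['Transport_Mode','Service_Level'],
    dtype=int
)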

# Define predictors and target
features = [
    'Distance_km','Weight_kg','Volume_m3',
    'Transport_Mode_Rail','Service_Level_Expedited'
]
data = df[features + ['Cost_USD']].rename(
    columns={'Cost_USD':'Cost'}
)
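
Which category ends up as the baseline depends on how the dummies are generated, so it is worth confirming the resulting column names before listing them as features. The toy frame below is hypothetical and assumes the labels are spelled exactly Road, Rail, Standard, and Expedited.

```python
import pandas as pd

# Toy frame using the category labels described above (assumed spellings)
toy = pd.DataFrame({
    "Transport_Mode": ["Road", "Rail", "Road", "Rail"],
    "Service_Level": ["Standard", "Expedited", "Expedited", "Standard"],
})
cols = ["Transport_Mode", "Service_Level"]

# drop_first=True drops the alphabetically first level of each field
dropped = pd.get_dummies(toy, columns=cols, drop_first=True, dtype=int)
print(list(dropped.columns))

# Without drop_first, every level gets its own 0/1 column
full = pd.get_dummies(toy, columns=cols, dtype=int)
print(list(full.columns))
```

With these labels, drop_first=True keeps the Road and Standard indicators, while the full encoding also contains Transport_Mode_Rail and Service_Level_Expedited. Passing dtype=int keeps the dummies numeric, which is convenient for the formula interface used later.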

Train/Test Split

We randomly hold out 20% of shipments for evaluation, so we can check how well our quantile models generalize to unseen routes and cargo profiles.

# Reserve 20% for evaluation
train, test = train_test_split(
    data, test_size=0.2, random_state=42
)
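
A quick sanity check on a random split is to compare the cost percentiles of the two parts, since a very different tail in the hold-out set would make the 10th and 90th percentile scores hard to interpret. This is a minimal sketch on synthetic costs; demo, train_demo, and test_demo are illustrative names, not part of the project code.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic costs standing in for the real Cost column
rng = np.random.default_rng(1)
demo = pd.DataFrame({"Cost": rng.lognormal(mean=6.0, sigma=0.6, size=1_000)})

train_demo, test_demo = train_test_split(demo, test_size=0.2, random_state=42)

# Compare the percentiles we intend to model across the two parts
levels = [0.10, 0.50, 0.90]
comparison = pd.DataFrame({
    "train": train_demo["Cost"].quantile(levels),
    "test": test_demo["Cost"].quantile(levels),
})
print(comparison.round(2))
```

If the two columns differ sharply, a larger hold-out set or a different random seed may give a more representative evaluation.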

Fit Quantile Regression Models

For each percentile (10th, 50th, 90th):

  • We build a formula, e.g. “Cost ~ Distance_km + Weight_kg + …”.
  • We fit a QuantReg model on the training set at that quantile.
  • We print the coefficient table, which shows how each predictor’s effect varies across quantiles. For example, Distance_km may contribute less to the lower‑cost tail than to the upper tail.

quantiles = [0.10, 0.50, 0.90]
results   = {}
formula   = "Cost ~ " + " + ".join(features)

for q in quantiles:
    model = smf.quantreg(formula, train)
    res   = model.fit(q=q)
    results[q] = res
    print(f"\n--- {int(q*100)}th Percentile Coefficients ---")
    print(res.summary().tables[1])   # show coefficient estimates
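
To compare the fitted effects side by side instead of reading three separate tables, the coefficients can be collected into one DataFrame. The sketch below runs on synthetic shipments whose cost noise grows with distance; that pattern is an assumption made purely for illustration and says nothing about the Kaggle data. The names demo and coef_table are ours.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic shipments: the spread of cost widens with distance (assumed)
rng = np.random.default_rng(2)
n = 2_000
distance = rng.uniform(50, 1_000, size=n)
cost = 50 + 0.8 * distance + 0.2 * distance * rng.standard_normal(n)
demo = pd.DataFrame({"Distance_km": distance, "Cost": cost})

# Fit one model per quantile and collect the coefficients side by side
coef_table = pd.DataFrame({
    f"q={q:.2f}": smf.quantreg("Cost ~ Distance_km", demo).fit(q=q).params
    for q in (0.10, 0.50, 0.90)
})
print(coef_table.round(3))
```

In this synthetic setup the Distance_km coefficient rises from the 10th to the 90th percentile, which is the kind of shift across the distribution that a single mean regression would hide.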

Evaluation with Pinball Loss

  • We generate quantile‑specific cost forecasts on the test set.
  • We compute pinball loss for each quantile, which weights under‑ and over‑prediction asymmetrically according to that percentile. Lower pinball loss indicates a more accurate forecast of that quantile, so compare losses between models at the same quantile rather than across quantiles.

for q, res in results.items():
    preds = res.predict(test[features])
    loss  = mean_pinball_loss(test['Cost'], preds, alpha=q)
    print(f"{int(q*100)}th quantile pinball loss: {loss:.2f}")
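
To make the asymmetry concrete, here is a small NumPy sketch of the pinball loss, plus a simple coverage check (the share of actual costs at or below the predicted quantile, which should be close to q for a well-calibrated model). Both helper functions and the toy numbers are ours, written for illustration only.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss at quantile level q (0 < q < 1)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-prediction (diff > 0) costs q per unit; over-prediction costs 1 - q
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

def coverage(y_true, y_pred):
    """Share of actual values at or below the predicted quantile."""
    return float(np.mean(np.asarray(y_true) <= np.asarray(y_pred)))

actual = np.array([100.0, 100.0, 100.0, 100.0])

under = pinball_loss(actual, actual - 10, q=0.9)  # forecasts 10 USD too low
over = pinball_loss(actual, actual + 10, q=0.9)   # forecasts 10 USD too high
print(under, over)

# Two of the four actual costs fall at or below a flat forecast of 100
print(coverage([80, 95, 120, 150], [100, 100, 100, 100]))
```

At q = 0.9, under-predicting by 10 USD costs about nine times as much as over-predicting by the same amount, which is why the 90th‑percentile model leans towards higher forecasts.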

Summary

By modelling the 10th, 50th, and 90th percentiles of freight cost, we gain distribution‑aware insights into shipping expenses:

  • The 10th‑percentile model supports budgeting for the cheapest bulk shipments, avoiding over‑provisioning.
  • The median (50th‑percentile) model predicts typical shipping costs for everyday planning.
  • The 90th‑percentile model prepares for high‑cost expedited or long‑distance shipments, ensuring financial buffers for peak scenarios.

These quantile forecasts give logistics and finance teams more precise cost estimates, helping them optimise rate negotiations, route planning, and working‑capital allocation amid demand and operational uncertainty.

ProjectGurukul Team
