Shipping Rate Prediction using Linear Regression in ML

Freight forwarders and logistics start‑ups must quote door‑to‑door shipping rates in seconds, yet each quote depends on weight, distance band, transport mode, service speed, and fuel surcharges. Overpricing scares customers away, while underpricing erodes margins.

In this walkthrough, we build a linear-regression baseline that predicts a shipment's all-in rate (USD) from routinely captured booking information: package weight, shipment volume, origin-to-destination distance band, service tier (standard / express), transport mode (air / sea / truck), shipment type (parcel / pallet / container), declared value, and a fuel-price index.

A transparent linear model reveals first‑order cost drivers and serves as the yardstick before you move on to gradient‑boosted trees or network‑pricing engines.

Libraries Required

pandas # data wrangling
numpy # numerical helpers
matplotlib.pyplot # optional quick plots
scikit‑learn # preprocessing, model, metrics
joblib # save the trained pipeline

Dataset Link

Supply‑Chain Shipment Pricing Data

Step-by-Step Implementation

Why linear regression? Freight tariffs are typically a base fee plus additional surcharges for weight, distance, service speed, and special handling. OLS captures this additive logic and outputs coefficients that pricing managers can sanity‑check.
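The additive logic described above can be sketched as a small function. This is only an illustration: the base fee, per-kilogram rate, and surcharge amounts are made-up numbers, not values from the dataset or from any real tariff.

```python
# Hypothetical tariff: every number below is invented for illustration.
BASE_FEE = 50.0            # flat fee per shipment (USD)
RATE_PER_KG = 1.2          # weight surcharge (USD per kg)
MODE_SURCHARGE = {"Air": 420.0, "Road": 0.0, "Sea": -80.0}
EXPRESS_SURCHARGE = 150.0  # extra charge for the express tier (USD)


def additive_rate(weight_kg: float, mode: str, express: bool) -> float:
    """Return base fee plus independent surcharges, the form OLS assumes."""
    rate = BASE_FEE + RATE_PER_KG * weight_kg + MODE_SURCHARGE[mode]
    if express:
        rate += EXPRESS_SURCHARGE
    return rate


# A 100 kg express air shipment: 50 + 120 + 420 + 150
print(round(additive_rate(100, "Air", express=True), 2))
```

Linear regression estimates numbers like these from historical shipments instead of having them set by hand.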

1. Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error
import joblib

2. Load the dataset

df = pd.read_csv("SCMS_Delivery_History_Dataset.csv")   # path after un‑zip
print(df.head())

Typical columns in Supply‑Chain Shipment Pricing Data

column	example
FreightCostUSD	2 145
Weight_kgs	780
Volume_m3	5.3
DistanceGroup	150‑500 km
TransportMode	Air / Road / Sea
ServiceLevel	Express / Standard
ShipmentType	Parcel / Pallet / Container
DeclaredValueUSD	54 000
FuelPriceIndex	0.87
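Before going further, it may help to confirm that the loaded file actually contains the columns listed in the table above. The helper below is a minimal sketch; `missing_columns` and the tiny `demo` frame are hypothetical names used only for this example.

```python
import pandas as pd

# Column names taken from the table above.
EXPECTED_COLUMNS = [
    "FreightCostUSD", "Weight_kgs", "Volume_m3", "DistanceGroup",
    "TransportMode", "ServiceLevel", "ShipmentType",
    "DeclaredValueUSD", "FuelPriceIndex",
]


def missing_columns(frame: pd.DataFrame) -> list:
    """Return the expected columns that the frame does not contain."""
    return [col for col in EXPECTED_COLUMNS if col not in frame.columns]


# Tiny stand-in frame (hypothetical) that lacks most columns on purpose.
demo = pd.DataFrame({"FreightCostUSD": [2145], "Weight_kgs": [780]})
print(missing_columns(demo))
```

If the list is non-empty for your own file, rename or derive the missing columns before running the later steps.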

3. Minimal cleaning & feature lists

Standard scaling (applied in the next step) rescales kilograms, cubic metres, declared value, and fuel index to zero mean and unit variance, so each numeric coefficient reads as dollars per one-standard-deviation (1 σ) shift.

core = ['FreightCostUSD','Weight_kgs','Volume_m3','DistanceGroup',
        'TransportMode','ServiceLevel','ShipmentType',
        'DeclaredValueUSD','FuelPriceIndex']
df   = df.dropna(subset=core).copy()

num_cols = ['Weight_kgs','Volume_m3','DeclaredValueUSD','FuelPriceIndex']
cat_cols = ['DistanceGroup','TransportMode','ServiceLevel','ShipmentType']
target   = 'FreightCostUSD'

X = df[num_cols + cat_cols]
y = df[target]
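Real shipment exports often store numbers as text, sometimes mixed with notes, in which case `dropna` alone will not catch the bad rows. The sketch below shows one possible way to coerce such columns before dropping missing values; the sample strings and the `to_number` helper are assumptions made for illustration, not values from the dataset.

```python
import pandas as pd

# Hypothetical raw values: numeric strings mixed with free-text notes.
raw = pd.DataFrame({
    "Weight_kgs": ["780", "1,250", "See DN-304"],
    "FreightCostUSD": ["2145", "980.5", "Freight Included"],
})


def to_number(series: pd.Series) -> pd.Series:
    """Strip thousands separators, then turn non-numeric text into NaN."""
    cleaned = series.astype(str).str.replace(",", "", regex=False)
    return pd.to_numeric(cleaned, errors="coerce")


for col in ["Weight_kgs", "FreightCostUSD"]:
    raw[col] = to_number(raw[col])

# Rows whose values could not be parsed are now NaN and can be dropped.
clean = raw.dropna(subset=["Weight_kgs", "FreightCostUSD"])
print(clean)
```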

4. Pre‑processing & model pipeline

One-hot encoding avoids imposing a fake numeric order on service tiers or distance buckets and gives each level its own dollar offset.

preproc = ColumnTransformer([
        ('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols),
        ('num', StandardScaler(),                      num_cols)
])

linreg = LinearRegression()

pipe = Pipeline([
        ('prep',  preproc),
        ('model', linreg)
])

5. Train‑test split & training

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, shuffle=True)

pipe.fit(X_train, y_train)

6. Evaluation

Performance metrics – R² indicates the proportion of cost variance that the simple formula captures; MAE (e.g., ±$180) informs sales teams about their typical quoting error band.

y_pred = pipe.predict(X_test)
print(f"R²  : {r2_score(y_test, y_pred):.3f}")
print(f"MAE : ${mean_absolute_error(y_test, y_pred):,.0f} per shipment")
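For readers who want to see what the two metrics measure, the sketch below recomputes them by hand on four made-up shipments and checks the result against scikit-learn. The numbers are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up actual and predicted freight costs (USD).
y_true = np.array([1000.0, 1500.0, 2000.0, 2500.0])
y_hat = np.array([1100.0, 1400.0, 2100.0, 2300.0])

# MAE: average absolute quoting error in dollars.
mae_manual = np.mean(np.abs(y_true - y_hat))

# R²: 1 minus (residual sum of squares / total sum of squares).
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"MAE : {mae_manual:.1f}  (sklearn: {mean_absolute_error(y_true, y_hat):.1f})")
print(f"R²  : {r2_manual:.3f}  (sklearn: {r2_score(y_true, y_hat):.3f})")
```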

7. Interpret rate drivers

The coefficient table surfaces high-impact levers at a glance: for example, TransportMode_Air might add $420, while DistanceGroup_0-150 km might subtract $115, which is direct input for discount matrices.

ohe_feats = pipe.named_steps['prep']\
                .named_transformers_['cat']\
                .get_feature_names_out(cat_cols)

all_feats = list(ohe_feats) + num_cols
coef = (pd.Series(pipe.named_steps['model'].coef_, index=all_feats)
        .sort_values())

print("\nCost‑reducing factors (negative coefficients):")
print(coef.head(8))
print("\nCost‑increasing factors (positive coefficients):")
print(coef.tail(8))

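Numeric coefficients are expressed as $/shipment for a 1 σ change. One-hot coefficients are dollar offsets, but because the encoder above keeps every level of each feature, there is no single reference category and the individual values are not uniquely determined; what is well defined is the difference between two levels of the same feature (for example, Air minus Road). To get offsets against an explicit reference category, drop one level per feature in the encoder, e.g. OneHotEncoder(drop='first').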
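A per-σ coefficient can be converted back to dollars per original unit by dividing it by the standard deviation that the scaler learned. The sketch below demonstrates this on synthetic data where the true rate is known to be $1.50 per kg; the data and the `toy` pipeline are assumptions for illustration, not the article's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic shipments: cost is exactly 50 + 1.5 * weight (no noise).
rng = np.random.default_rng(0)
weight = rng.uniform(10, 1000, size=200)
cost = 50 + 1.5 * weight

toy = Pipeline([("scale", StandardScaler()), ("model", LinearRegression())])
toy.fit(weight.reshape(-1, 1), cost)

per_sigma = toy.named_steps["model"].coef_[0]   # USD per 1 σ of weight
sigma = toy.named_steps["scale"].scale_[0]      # σ of weight in kg
per_kg = per_sigma / sigma                      # USD per kg

print(f"per σ : {per_sigma:.2f} USD")
print(f"per kg: {per_kg:.2f} USD")
```

In the main pipeline the same division would use the scaler stored under `named_transformers_['num']`, matched to the columns in `num_cols`.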

8. Persist the trained pipeline

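Joblib persistence freezes preprocessing and coefficients together; a quoting API can later call joblib.load("shipping_rate_linreg.pkl"), convert a JSON payload of booking details into a one-row DataFrame, and return a price in milliseconds. Load the file with the same scikit-learn version that saved it, since pickled models are not guaranteed to work across versions.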

joblib.dump(pipe, "shipping_rate_linreg.pkl")
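The round trip from saved pipeline to a single quote might look like the sketch below. To stay self-contained it trains a tiny stand-in pipeline on four made-up shipments and writes it to a temporary file; `pipe_demo`, `booking`, and the file name are hypothetical, and a real service would load the pipeline saved above instead.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up training rows, only to have something to save and reload.
train = pd.DataFrame({
    "Weight_kgs": [100, 200, 300, 400],
    "TransportMode": ["Air", "Road", "Air", "Sea"],
    "FreightCostUSD": [900, 400, 1300, 500],
})
prep = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["TransportMode"]),
    ("num", StandardScaler(), ["Weight_kgs"]),
])
pipe_demo = Pipeline([("prep", prep), ("model", LinearRegression())])
pipe_demo.fit(train[["Weight_kgs", "TransportMode"]], train["FreightCostUSD"])

path = os.path.join(tempfile.mkdtemp(), "demo_linreg.pkl")
joblib.dump(pipe_demo, path)

# One booking, e.g. parsed from a JSON request body.
booking = {"Weight_kgs": 250, "TransportMode": "Air"}

loaded = joblib.load(path)
# The pipeline expects a DataFrame with the training column names.
quote = float(loaded.predict(pd.DataFrame([booking]))[0])
print(f"Quoted rate: ${quote:,.0f}")
```

Wrapping the booking dict in a list gives a one-row DataFrame whose column names match those used at training time, which is what the ColumnTransformer selects on.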

Summary

With a few dozen lines of Python, we've turned raw shipment logs into an explainable shipping-rate engine:

Instant, data‑backed quotes for sales reps and self‑serve booking portals.
Crystal‑clear surcharges & discounts that reveal precisely how weight, distance, mode, and service tier tug cost up or down.

This transparent linear baseline is your benchmark—every boosted tree, neural tariff model, or optimisation engine you test next must beat its MAE while still telling a story that the pricing team can trust.
