Delivery Time Curve Prediction with Polynomial Regression in ML


Logistics managers and last‑mile operations teams need to forecast package delivery time (in minutes) from indicators available at dispatch, so that they can proactively adjust routing, staffing, and customer notifications. Historical delivery records show that delivery time depends nonlinearly on factors such as pickup‑to‑dropoff distance, the number of stops on the route, traffic congestion level, time of day, and driver experience.

A simple linear model cannot represent curvature (e.g., a time‑per‑kilometre rate that changes at longer distances) and fails to capture interactions (e.g., rush‑hour distance penalties). At the same time, an unregularised high‑degree polynomial overfits noise. By applying Polynomial Regression on engineered features with Ridge (L2) regularisation, we can learn smooth, interpretable delivery‑time curves that generalise well to new routes and conditions.
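
The trade-off described above can be sketched on synthetic data. Everything in this block is an assumption made for illustration: the generated `distance` and `minutes` arrays, the noise level, and the degree and alpha values are not taken from the delivery dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic, illustrative data: time grows faster than linearly with distance.
rng = np.random.default_rng(0)
distance = rng.uniform(1, 30, size=200).reshape(-1, 1)
d = distance.ravel()
minutes = 10 + 1.5 * d + 0.08 * d ** 2 + rng.normal(0, 3, size=200)


def train_rmse(model):
    """Fit on the synthetic data and return the training RMSE."""
    model.fit(distance, minutes)
    resid = minutes - model.predict(distance)
    return float(np.sqrt(np.mean(resid ** 2)))


linear = make_pipeline(StandardScaler(), LinearRegression())
quadratic = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
rmse_linear = train_rmse(linear)        # misses the curvature
rmse_quadratic = train_rmse(quadratic)  # can follow the curvature


def ridge_coef_norm(alpha):
    """Coefficient norm of a degree-6 Ridge fit for a given penalty strength."""
    model = make_pipeline(
        StandardScaler(),
        PolynomialFeatures(degree=6, include_bias=False),
        Ridge(alpha=alpha),
    )
    model.fit(distance, minutes)
    return float(np.linalg.norm(model[-1].coef_))


norm_weak = ridge_coef_norm(1e-3)     # almost unregularised
norm_strong = ridge_coef_norm(100.0)  # heavily shrunk

print(f"Train RMSE, linear   : {rmse_linear:.2f}")
print(f"Train RMSE, quadratic: {rmse_quadratic:.2f}")
print(f"Coefficient norm, alpha=0.001: {norm_weak:.2f}")
print(f"Coefficient norm, alpha=100  : {norm_strong:.2f}")
```

The quadratic model fits the curved relationship more closely than the straight line, and a larger alpha pulls the high‑degree coefficients towards zero. Which degree and alpha work best on real data is an empirical question that cross‑validation has to answer.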

Libraries Required

Purpose	Library
Data handling	pandas, numpy
Visualization	matplotlib, seaborn
ML pipeline	scikit‑learn → ColumnTransformer, StandardScaler, OneHotEncoder, PolynomialFeatures, Pipeline
Regression model	Ridge
Model selection	train_test_split, GridSearchCV
Evaluation	mean_squared_error, r2_score

Dataset

Package Delivery Time

Step-by-Step Code Implementation

Import Libraries & Load Data

import pandas as pd
import numpy as np

# Load training data (after downloading and unzipping)
df = pd.read_csv("data/train.csv", parse_dates=["pickup_time","dropoff_time"])

# Preview key columns
df.head()[[
    "pickup_time","dropoff_time","distance_km",
    "num_stops","traffic_level","driver_experience_yrs"
]]

Feature Engineering & Target Creation

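This step derives the target (delivery time in minutes) and the hour‑of‑day feature. The squared and interaction terms (e.g., distance_km², distance_km×traffic_level_2) that model curvature and cross‑effects are generated later, by PolynomialFeatures inside the pipeline.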

# Compute delivery time in minutes
df["delivery_time_min"] = (df["dropoff_time"] - df["pickup_time"]) \
                            .dt.total_seconds() / 60

# Extract time‑of‑day as a categorical feature (hour)
df["hour_of_day"] = df["pickup_time"].dt.hour

# Select and clean features

# - distance_km: straight‑line distance
# - num_stops: number of scheduled stops before dropoff
# - traffic_level: categorical indicator (1=low,2=medium,3=high)
# - driver_experience_yrs: years of experience
# - hour_of_day: captures rush‑hour effects
features = ["distance_km","num_stops","traffic_level",
            "driver_experience_yrs","hour_of_day"]

df = df.dropna(subset=features + ["delivery_time_min"])
X = df[features]
y = df["delivery_time_min"]
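
As a quick check of the target computation, the same arithmetic can be run on a tiny hand‑made frame. The two rows below are hypothetical; only the column names follow the tutorial.

```python
import pandas as pd

# Hypothetical two-row sample using the tutorial's column names.
sample = pd.DataFrame({
    "pickup_time": pd.to_datetime(["2024-01-01 08:00:00", "2024-01-01 17:30:00"]),
    "dropoff_time": pd.to_datetime(["2024-01-01 08:45:00", "2024-01-01 18:50:00"]),
})

# Same computation as above: timestamp difference converted to minutes.
sample["delivery_time_min"] = (
    sample["dropoff_time"] - sample["pickup_time"]
).dt.total_seconds() / 60
sample["hour_of_day"] = sample["pickup_time"].dt.hour

# Sanity check: non-positive durations usually indicate bad timestamps.
bad_rows = sample[sample["delivery_time_min"] <= 0]

print(sample[["delivery_time_min", "hour_of_day"]])
print("Rows with non-positive duration:", len(bad_rows))
```

A similar non‑positive‑duration check on the real data may be worth running before training, since dropna alone does not catch swapped or corrupted timestamps.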

Build a Polynomial Regression Pipeline

StandardScaler on numeric features ensures the Ridge penalty treats them uniformly.
OneHotEncoder on traffic_level and hour_of_day captures categorical effects without ordinality assumptions.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge

# Separate numeric vs categorical
num_cols = ["distance_km","num_stops","driver_experience_yrs"]
cat_cols = ["traffic_level","hour_of_day"]

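preprocessor = ColumnTransformer([
    ("num", StandardScaler(), num_cols),
    # handle_unknown="ignore" keeps prediction from failing on a category
    # (e.g., an hour) absent from a training fold; combining it with
    # drop="first" requires scikit-learn >= 1.0
    ("cat", OneHotEncoder(drop="first", handle_unknown="ignore"), cat_cols)
])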

pipe = Pipeline([
    ("prep", preprocessor),
    ("poly", PolynomialFeatures(include_bias=False)),
    ("ridge", Ridge(max_iter=20000, random_state=42))
])

Train/Test Split & Hyperparameter Search

The grid search explores polynomial degrees 1–3 and α values from 0.001 to 100.
5‑fold cross‑validation identifies the combination that minimises RMSE.
L2 regularisation (controlled by alpha) shrinks noisy high‑order coefficients, which limits overfitting.

from sklearn.model_selection import train_test_split, GridSearchCV

# Temporal split isn’t critical here; random split suffices for cross‑sectional data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    "poly__degree": [1, 2, 3],
    "ridge__alpha": np.logspace(-3, 2, 6)  # 0.001 → 100
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)

print("Best degree   :", gs.best_params_["poly__degree"])
print("Best alpha    :", gs.best_params_["ridge__alpha"])
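
The cost of the search can be estimated before running it. The sketch below rebuilds the same grid under a hypothetical name (`param_grid_demo`) and counts candidates with scikit‑learn's ParameterGrid.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same grid as in the tutorial, rebuilt here so the block is self-contained.
param_grid_demo = {
    "poly__degree": [1, 2, 3],
    "ridge__alpha": np.logspace(-3, 2, 6),  # 0.001, 0.01, 0.1, 1, 10, 100
}

n_candidates = len(ParameterGrid(param_grid_demo))  # degree x alpha combinations
n_fits = n_candidates * 5                           # one fit per CV fold

print("Alphas     :", param_grid_demo["ridge__alpha"])
print("Candidates :", n_candidates)
print("Total fits :", n_fits)
```

Because the scorer is a negated RMSE, `-gs.best_score_` gives the cross‑validated RMSE of the winning combination, and `gs.cv_results_` holds the per‑candidate scores for closer inspection.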

Evaluate Model

from sklearn.metrics import mean_squared_error, r2_score

y_pred = gs.predict(X_test)
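# The squared=False argument is deprecated since scikit-learn 1.4; taking the root works in every version
rmse = np.sqrt(mean_squared_error(y_test, y_pred))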
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE       : {rmse:.2f} minutes")
print(f"Test R²         : {r2:.3f}")
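
An RMSE figure is easier to judge next to a naive baseline. The sketch below defines a small helper (`rmse`, a hypothetical name) and scores a baseline that always predicts the training mean; the numbers are illustrative, not results from the delivery dataset.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error, computed directly with NumPy."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Illustrative values only (minutes).
y_train_demo = np.array([30.0, 40.0, 50.0])
y_test_demo = np.array([35.0, 45.0])

# Baseline: predict the training-set mean for every test row.
baseline_pred = np.full_like(y_test_demo, y_train_demo.mean())
baseline_rmse = rmse(y_test_demo, baseline_pred)

print(f"Baseline RMSE: {baseline_rmse:.2f} minutes")
```

On the real data, the same comparison would use `y_train.mean()` and `y_test`; a model whose test RMSE is not clearly below this baseline has learned little.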

Inspect Key Polynomial Coefficients

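Inspecting the largest‑magnitude coefficients reveals which nonlinear and interaction effects (such as a higher penalty for long‑distance travel during high‑traffic hours) most influence delivery time predictions. The plot ranks coefficients by absolute value, so it shows the strength of each effect but not its direction.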

# Retrieve feature names after preprocessing & expansion
prep = gs.best_estimator_.named_steps["prep"]
num_features = num_cols
cat_features = prep.named_transformers_["cat"] \
                    .get_feature_names_out(cat_cols).tolist()
input_feats = num_features + cat_features

poly = gs.best_estimator_.named_steps["poly"]
feat_names = poly.get_feature_names_out(input_features=input_feats)

# Rank coefficients by absolute size (pandas is already imported as pd above)
coefs = gs.best_estimator_.named_steps["ridge"].coef_
imp = pd.Series(coefs, index=feat_names) \
            .abs().sort_values(ascending=False).head(10)

import matplotlib.pyplot as plt
plt.figure(figsize=(8,5))
imp.plot(kind="barh")
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Delivery Time")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()

Summary

This Polynomial Regression pipeline with Ridge regularisation provides:

Smooth modelling of delivery‑time dynamics that captures nonlinear distance effects and distance‑by‑traffic interactions.
Controlled complexity via grid‑searched polynomial degree and α, which limits overfitting to noise.
Interpretable insights through the top polynomial features, which point logistics teams to the route and time‑of‑day effects that matter most for on‑time deliveries.
