Delivery Time Curve Prediction with Polynomial Regression in ML

Logistics managers and last‑mile operations teams need to forecast the delivery time (minutes) of packages—based on early indicators available at dispatch—so they can proactively adjust routing, staffing, and customer notifications. Historical delivery records show that time depends nonlinearly on factors such as pickup‑to‑dropoff distance, the number of stops on the route, traffic congestion level, time of day, and driver experience.

A simple linear model misses curvature (e.g., diminishing returns on speed at longer distances) and fails to capture interactions (e.g., rush‑hour distance penalties). At the same time, an unregularised high‑degree polynomial overfits noise. By applying Polynomial Regression on engineered features with Ridge (ℓ²) regularisation, we can learn smooth, interpretable delivery‑time curves that generalise well to new routes and conditions.
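
The trade-off described above can be sketched on synthetic data with NumPy alone. Everything in this block is an assumption made for illustration: the ground-truth curve in true_minutes, the noise level, the sample sizes, and the helper names powers, fit_ridge and rmse are hypothetical and do not come from the delivery dataset. The ridge fit uses the standard closed-form solution rather than scikit-learn, so the numbers will not match the pipeline built later.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_minutes(d_km):
    # Hypothetical ground truth: a curved relationship between distance and time
    return 10 + 3 * d_km + 0.1 * d_km**2

def powers(d_km, degree):
    # Polynomial features z, z^2, ..., z^degree on distance rescaled to roughly [0, 1]
    z = d_km / 30.0
    return np.column_stack([z**k for k in range(1, degree + 1)])

def fit_ridge(X, y, alpha):
    # Closed-form ridge; centring the data keeps the intercept out of the penalty
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

def rmse(X, y, w, b):
    return float(np.sqrt(np.mean((X @ w + b - y) ** 2)))

# Small noisy training set, larger test set drawn from the same curve
d_train = rng.uniform(1, 30, 25)
d_test = rng.uniform(1, 30, 500)
y_train = true_minutes(d_train) + rng.normal(0, 3, d_train.size)
y_test = true_minutes(d_test) + rng.normal(0, 3, d_test.size)

settings = [
    ("linear", 1, 1e-8),        # straight line, effectively unregularised
    ("poly10_unreg", 10, 1e-8), # degree 10, effectively unregularised
    ("poly10_ridge", 10, 1e-3), # degree 10 with a ridge penalty
]

summary = {}
for label, degree, alpha in settings:
    w, b = fit_ridge(powers(d_train, degree), y_train, alpha)
    summary[label] = {
        "train_rmse": rmse(powers(d_train, degree), y_train, w, b),
        "test_rmse": rmse(powers(d_test, degree), y_test, w, b),
        "weight_norm": float(np.linalg.norm(w)),
    }
    print(label, summary[label])
```

Comparing train_rmse with test_rmse for each row shows how far each model's training fit carries over to unseen points, and weight_norm shows how the penalty shrinks the coefficients. The exact values depend on the seed and on the assumed curve.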

Libraries Required

Purpose            Library
Data handling      pandas, numpy
Visualization      matplotlib, seaborn
ML pipeline        scikit‑learn: ColumnTransformer, StandardScaler, OneHotEncoder, PolynomialFeatures, Pipeline
Regression model   Ridge
Model selection    train_test_split, GridSearchCV
Evaluation         mean_squared_error, r2_score

Dataset

Package Delivery Time

Step-by-Step Code Implementation

Import Libraries & Load Data

import pandas as pd
import numpy as np

# Load training data (after downloading and unzipping)
df = pd.read_csv("data/train.csv", parse_dates=["pickup_time","dropoff_time"])

# Preview key columns
df.head()[[
    "pickup_time","dropoff_time","distance_km",
    "num_stops","traffic_level","driver_experience_yrs"
]]

Feature Engineering & Target Creation

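Computes the target (delivery time in minutes) from the pickup and dropoff timestamps and derives hour_of_day. The squared and interaction terms (e.g., distance_km², distance_km×traffic_level_2) that model curvature and cross‑effects are generated later by PolynomialFeatures inside the pipeline.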

# Compute delivery time in minutes
df["delivery_time_min"] = (df["dropoff_time"] - df["pickup_time"]) \
                            .dt.total_seconds() / 60

# Extract time‑of‑day as a categorical feature (hour)
df["hour_of_day"] = df["pickup_time"].dt.hour

# Select and clean features

# - distance_km: straight‑line distance
# - num_stops: number of scheduled stops before dropoff
# - traffic_level: categorical indicator (1=low,2=medium,3=high)
# - driver_experience_yrs: years of experience
# - hour_of_day: captures rush‑hour effects
features = ["distance_km","num_stops","traffic_level",
            "driver_experience_yrs","hour_of_day"]

df = df.dropna(subset=features + ["delivery_time_min"])
X = df[features]
y = df["delivery_time_min"]
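
To check the target computation in isolation, here is a small sketch on three made-up trips. The timestamps are invented for illustration, and the final filter on non-positive durations is an extra sanity check assumed here, not a step from the tutorial's dataset preparation.

```python
import pandas as pd

# Three hypothetical trips; the last one has a dropoff before its pickup
trips = pd.DataFrame({
    "pickup_time": pd.to_datetime([
        "2024-01-05 08:15:00", "2024-01-05 17:40:00", "2024-01-05 12:00:00",
    ]),
    "dropoff_time": pd.to_datetime([
        "2024-01-05 08:52:30", "2024-01-05 18:25:00", "2024-01-05 11:50:00",
    ]),
})

# Timedelta -> seconds -> minutes, as in the step above
trips["delivery_time_min"] = (
    (trips["dropoff_time"] - trips["pickup_time"]).dt.total_seconds() / 60
)
trips["hour_of_day"] = trips["pickup_time"].dt.hour

# Assumed sanity check: drop rows whose duration is not positive
clean_trips = trips[trips["delivery_time_min"] > 0]

print(trips[["delivery_time_min", "hour_of_day"]])
print(len(clean_trips), "rows kept")
```

The first two trips come out at 37.5 and 45.0 minutes with pickup hours 8 and 17; the third yields -10.0 minutes and is removed by the filter.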

Build a Polynomial Regression Pipeline

  • StandardScaler on numeric features ensures the Ridge penalty treats them uniformly.
  • OneHotEncoder on traffic_level and hour_of_day captures categorical effects without ordinality assumptions.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge

# Separate numeric vs categorical
num_cols = ["distance_km","num_stops","driver_experience_yrs"]
cat_cols = ["traffic_level","hour_of_day"]

preprocessor = ColumnTransformer([
    ("num", StandardScaler(), num_cols),
    ("cat", OneHotEncoder(drop="first"), cat_cols)
])

pipe = Pipeline([
    ("prep", preprocessor),
    ("poly", PolynomialFeatures(include_bias=False)),
    ("ridge", Ridge(max_iter=20000, random_state=42))
])

Train/Test Split & Hyperparameter Search

  • Explores polynomial degrees 1–3 and α from 0.001 to 100.
  • 5‑fold cross‑validation identifies the combination that minimises RMSE.
  • Applies ℓ² regularisation (controlled by alpha) to shrink noisy high‑order coefficients, preventing overfitting.
from sklearn.model_selection import train_test_split, GridSearchCV

# Temporal split isn’t critical here; random split suffices for cross‑sectional data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    "poly__degree": [1, 2, 3],
    "ridge__alpha": np.logspace(-3, 2, 6)  # 0.001 → 100
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)

print("Best degree   :", gs.best_params_["poly__degree"])
print("Best alpha    :", gs.best_params_["ridge__alpha"])
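
Beyond the best parameters, GridSearchCV keeps the score of every combination in its cv_results_ attribute. The following self-contained sketch shows one way to read it as a table. It runs on synthetic data with a single feature; the data-generating curve, the smaller grid, and the names toy_pipe, toy_gs and cv_table are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the delivery data (hypothetical curve plus noise)
rng = np.random.default_rng(42)
distance = rng.uniform(1, 30, 200)
minutes = 10 + 3 * distance + 0.1 * distance**2 + rng.normal(0, 3, 200)
X_toy = distance.reshape(-1, 1)

toy_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(include_bias=False)),
    ("ridge", Ridge()),
])
toy_grid = {"poly__degree": [1, 2, 3], "ridge__alpha": [0.01, 1.0, 100.0]}

toy_gs = GridSearchCV(toy_pipe, toy_grid, cv=5,
                      scoring="neg_root_mean_squared_error")
toy_gs.fit(X_toy, minutes)

# The scorer is negated RMSE, so flip the sign to get RMSE in minutes
cv_table = (
    pd.DataFrame(toy_gs.cv_results_)
      [["param_poly__degree", "param_ridge__alpha", "mean_test_score"]]
      .assign(cv_rmse=lambda t: -t["mean_test_score"])
      .sort_values("cv_rmse")
)
print(cv_table)
```

Reading the full table, rather than only best_params_, shows whether neighbouring settings score almost as well, which helps when choosing a simpler degree at little cost in RMSE.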

Evaluate Model

from sklearn.metrics import mean_squared_error, r2_score

y_pred = gs.predict(X_test)
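# Take the square root explicitly; the squared=False argument is deprecated in recent scikit-learn releases
rmse = np.sqrt(mean_squared_error(y_test, y_pred))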
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE       : {rmse:.2f} minutes")
print(f"Test R²         : {r2:.3f}")
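
For reference, both metrics can be computed by hand. The sketch below uses four made-up actual and predicted delivery times (not results from the model) and adds a mean-prediction baseline, an assumed extra that gives the RMSE a point of comparison.

```python
import numpy as np

# Hypothetical actual vs predicted delivery times in minutes
y_true = np.array([30.0, 45.0, 60.0, 25.0])
y_hat = np.array([33.0, 41.0, 62.0, 26.0])

residuals = y_true - y_hat

# RMSE: square root of the mean squared residual
rmse_manual = np.sqrt(np.mean(residuals**2))

# R²: 1 - (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum(residuals**2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# Baseline: always predict the mean delivery time
baseline_rmse = np.sqrt(np.mean((y_true - y_true.mean()) ** 2))

print(f"RMSE          : {rmse_manual:.2f} minutes")
print(f"R²            : {r2_manual:.3f}")
print(f"Baseline RMSE : {baseline_rmse:.2f} minutes")
```

Here the squared residuals sum to 30 and the total sum of squares is 750, so RMSE is √7.5 ≈ 2.74 minutes and R² is 0.96, against a baseline RMSE of about 13.69 minutes.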

Inspect Key Polynomial Coefficients

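Inspecting the largest‑magnitude coefficients reveals which nonlinear and interaction effects (such as a higher penalty for long-distance travel during high‑traffic hours) most influence delivery time predictions. Because the numeric inputs are standardised before expansion, the magnitudes are roughly comparable across features.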

# Retrieve feature names after preprocessing & expansion
prep = gs.best_estimator_.named_steps["prep"]
num_features = num_cols
cat_features = prep.named_transformers_["cat"] \
                    .get_feature_names_out(cat_cols).tolist()
input_feats = num_features + cat_features

poly = gs.best_estimator_.named_steps["poly"]
feat_names = poly.get_feature_names_out(input_features=input_feats)

coefs = gs.best_estimator_.named_steps["ridge"].coef_
imp = pd.Series(coefs, index=feat_names) \
            .abs().sort_values(ascending=False).head(10)

import matplotlib.pyplot as plt
plt.figure(figsize=(8,5))
imp.plot(kind="barh")
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Delivery Time")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()
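
Ranking by absolute value hides whether a feature adds or removes minutes. One possible variation, sketched below, ranks by magnitude but keeps the sign. The coefficient values in coef_demo are invented for illustration and are not outputs of the fitted model.

```python
import pandas as pd

# Hypothetical coefficients, for illustration only
coef_demo = pd.Series({
    "distance_km": 12.4,
    "distance_km^2": 3.1,
    "driver_experience_yrs": -4.5,
    "distance_km traffic_level_3": 5.6,
    "num_stops": 4.0,
})

# Order by magnitude, then look the signed values back up in that order
order = coef_demo.abs().sort_values(ascending=False).index
top_signed = coef_demo.reindex(order).head(3)

print(top_signed)
```

In this made-up example the top three are distance_km, distance_km traffic_level_3 and driver_experience_yrs, and the negative sign on the last one would read as experience reducing predicted time.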

Summary

This Polynomial Regression pipeline with Ridge regularisation provides:

  • Accurate, smooth modelling of delivery‑time dynamics, capturing nonlinear distance and traffic interactions.
  • Controlled complexity via grid‑searched polynomial degree and α, limiting overfitting to noise.
  • Interpretable insights through top polynomial features—guiding logistics teams on the most critical route‑ and time‑of‑day effects to manage for on‑time deliveries.

ProjectGurukul Team

The ProjectGurukul Team delivers project-based tutorials on programming, machine learning, and web development. We simplify learning by providing hands-on projects to help you master real-world skills. We also provide free major and minor projects for engineering students.
