Urban Delivery Time Prediction using Quantile Regression in ML


Logistics managers need to understand not just the average package delivery time but also the range of possible outcomes—anticipating both fast deliveries (10th percentile) and slow, delayed ones (90th percentile).

In this project, we’ll predict the 10th, 50th, and 90th percentiles of last‑mile delivery time (in minutes) using features such as pickup–dropoff distance, package weight, vehicle type, traffic level, and time of day.

By fitting separate quantile regression models, we’ll uncover how each factor influences light, typical, and heavy‑delay scenarios—enabling planners to set realistic SLAs, provision buffer time, and optimize routing under uncertainty.

Libraries Required

import pandas as pd  
import numpy as np  
import statsmodels.formula.api as smf       # Quantile regression via formula API  
from sklearn.model_selection import train_test_split  
from sklearn.metrics import mean_pinball_loss  # Proper loss for quantile forecasts  

Dataset

Package Delivery Time

Step-by-Step Code Implementation

Load & Inspect Data

We load a city‐scale last‑mile delivery dataset—including pickup–dropoff distance (distance_km), package weight, vehicle type (van/truck), traffic congestion level (1–5), pickup hour, and actual delivery_time_min—and inspect its shape and summary statistics to understand central tendency and skew.

# Load the Package Delivery Time dataset
# CSV contains: delivery_id, distance_km, weight_kg, vehicle_type, 
#               traffic_level (1–5), pickup_hour (0–23), delivery_time_min
df = pd.read_csv("package_delivery_time.csv")

print(df.head())
df.info()  # info() prints its report directly and returns None
print(df['delivery_time_min'].describe())
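
Before modelling, it can help to look at the empirical 10th, 50th, and 90th percentiles of the target directly, since these are the quantities the models will later predict. The sketch below uses Python's standard library on a small made-up sample (the numbers are illustrative and not taken from the dataset). On the real DataFrame, df['delivery_time_min'].quantile([0.1, 0.5, 0.9]) serves the same purpose.

```python
import statistics

# Hypothetical delivery times in minutes (illustrative only, not from the dataset)
sample_times = [12, 15, 18, 20, 22, 25, 28, 35, 50, 90]

# Nine cut points that split the sample into ten equal-probability groups
deciles = statistics.quantiles(sample_times, n=10, method="inclusive")
p10, p50, p90 = deciles[0], deciles[4], deciles[8]

mean_time = statistics.mean(sample_times)
median_time = statistics.median(sample_times)

print(f"p10={p10:.1f}  p50={p50:.1f}  p90={p90:.1f}")
print(f"mean={mean_time:.1f}  median={median_time:.1f}")

# A mean well above the median signals a right-skewed target
right_skewed = mean_time > median_time
```

When the mean sits well above the median, a few very slow deliveries are pulling the average up, which is the situation where separate quantile models tend to be more informative than a single mean forecast.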

Preprocessing & Feature Engineering

  • We remove any incomplete records to maintain modelling integrity.
  • We one‑hot encode vehicle_type to capture mode effects.
  • We rename delivery_time_min to DeliveryTime and assemble our predictor list: four continuous/ordinal features plus vehicle‐type dummies.

# Drop any rows missing key fields
df = df.dropna(subset=[
    'distance_km','weight_kg','vehicle_type',
    'traffic_level','pickup_hour','delivery_time_min'
])

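# One‑hot encode categorical features; dtype=int keeps the dummies numeric
# (recent pandas versions default to bool, which the formula API would treat as categorical)
df = pd.get_dummies(df,
    columns=['vehicle_type'],
    drop_first=True,
    dtype=int
)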

# Define predictors and response
features = [
    'distance_km','weight_kg','traffic_level','pickup_hour'
] + [c for c in df.columns if c.startswith('vehicle_type_')]

df = df.rename(columns={'delivery_time_min':'DeliveryTime'})
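
Because these column names are later joined into a formula string, a name that is not a valid Python identifier would break the formula. With only van and truck in the data this does not arise, so the sketch below assumes a hypothetical extra category, "cargo bike", to show one way to guard against it using patsy's Q() quoting. The helper name formula_term is made up for this example.

```python
def formula_term(column: str) -> str:
    """Return a column name that is safe to place in a patsy-style formula."""
    # Plain identifiers can be used as-is; anything else is wrapped in Q("...")
    return column if column.isidentifier() else f'Q("{column}")'

# 'vehicle_type_cargo bike' is a hypothetical dummy column, not from the dataset
example_features = [
    "distance_km", "weight_kg", "traffic_level", "pickup_hour",
    "vehicle_type_van", "vehicle_type_cargo bike",
]

safe_formula = "DeliveryTime ~ " + " + ".join(
    formula_term(c) for c in example_features
)
print(safe_formula)
```

An alternative with the same effect is to clean the category values (for example, replacing spaces with underscores) before calling get_dummies.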

Train/Test Split

An 80/20 random split holds out 20% of deliveries for out‑of‑sample evaluation, so we can check how well the quantile regression models generalise to deliveries they were not trained on.

# Reserve 20% of deliveries for out‑of‑sample evaluation
train, test = train_test_split(
    df[features + ['DeliveryTime']],
    test_size=0.2,
    random_state=42
)

Fit Quantile Regression Models

For each target percentile (10th, 50th, 90th):

  • We build a formula string (e.g. “DeliveryTime ~ distance_km + weight_kg + …”).
  • We fit a QuantReg model at that quantile on the training set.
  • We print the coefficient table (.tables[1]), which shows how each feature’s marginal effect varies across fast, typical, and slow delivery scenarios (e.g., each additional traffic level may add a small delay at the 10th percentile but a much larger one at the 90th).

quantiles = [0.10, 0.50, 0.90]
models    = {}
formula   = "DeliveryTime ~ " + " + ".join(features)

for q in quantiles:
    mod = smf.quantreg(formula, train)
    res = mod.fit(q=q)
    models[q] = res
    print(f"\n--- {int(q*100)}th Percentile Coefficients ---")
    print(res.summary().tables[1])   # coefficient estimates only
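
To see what fitting "at a quantile" means, it helps to strip the model down to a single constant. The sketch below, in plain Python with made-up delivery times, searches for the constant that minimises the total pinball (check) loss at quantile q and shows that it lands on the corresponding sample quantile. Quantile regression applies the same objective with a linear function of the features in place of the constant. The function names are illustrative and not part of statsmodels.

```python
def pinball(y: float, pred: float, q: float) -> float:
    """Check loss: weight q when the prediction is too low, 1 - q when too high."""
    diff = y - pred
    return q * diff if diff >= 0 else (q - 1) * diff

def best_constant(values: list[float], q: float) -> float:
    """Brute-force the observed value that minimises total pinball loss."""
    return min(values, key=lambda c: sum(pinball(y, c, q) for y in values))

# Hypothetical delivery times in minutes (illustrative only)
times = [12, 15, 18, 20, 22, 25, 28, 35, 50, 70, 90]

for q in (0.10, 0.50, 0.90):
    print(q, best_constant(times, q))
```

At q = 0.50 the minimiser is the median of the sample. Raising q makes under-prediction more expensive, which pushes the best constant toward the slow tail.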

Evaluation with Pinball Loss

  • We generate quantile‑specific delivery time forecasts on the test set.
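  • We compute pinball loss for each quantile. This loss function is tailored to quantile regression and penalises errors asymmetrically: at quantile q, under‑predictions are weighted by q and over‑predictions by 1 − q. Lower pinball loss indicates a better forecast of that quantile; values are only comparable between models evaluated at the same quantile.

for q, res in models.items():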
    preds = res.predict(test[features])
    loss  = mean_pinball_loss(test['DeliveryTime'], preds, alpha=q)
    print(f"{int(q*100)}th percentile pinball loss: {loss:.2f}")
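
A complementary check is calibration: roughly 10% of actual delivery times should fall below the 10th‑percentile forecast, and roughly 80% should fall between the 10th‑ and 90th‑percentile forecasts. The sketch below is plain Python on made-up numbers; the helper names and sample values are assumptions for illustration. On the real test set, the inputs would be test['DeliveryTime'] and the predictions from models[0.10] and models[0.90].

```python
def fraction_below(actual, predicted):
    """Share of observations that fall strictly below their forecast."""
    hits = sum(y < p for y, p in zip(actual, predicted))
    return hits / len(actual)

def interval_coverage(actual, lower, upper):
    """Share of observations inside their [lower, upper] forecast interval."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(actual, lower, upper))
    return hits / len(actual)

# Hypothetical actual times and quantile forecasts in minutes (illustrative only)
actual   = [22, 31, 18, 45, 27, 60, 24, 35, 29, 41]
pred_p10 = [18, 25, 19, 30, 20, 35, 19, 26, 22, 30]
pred_p90 = [35, 48, 30, 62, 40, 55, 36, 50, 42, 58]

below_p10 = fraction_below(actual, pred_p10)
inside_80 = interval_coverage(actual, pred_p10, pred_p90)

print("share below p10 forecast:", below_p10)
print("share inside [p10, p90]:", inside_80)
```

Coverage far from these nominal shares suggests the quantile forecasts are systematically too tight or too wide, even when the pinball loss looks acceptable.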

Summary

By applying quantile regression to urban delivery data, we obtain distribution‑aware forecasts:

  • The 10th‑percentile model describes fast deliveries and supports optimistic SLA settings; only about one delivery in ten is expected to beat its forecast.
  • The median (50th‑percentile) model informs typical delivery performance, guiding day‑to‑day scheduling.
  • The 90th‑percentile model captures heavy‑delay scenarios, guiding buffer time and reserve capacity for high-traffic or heavy-load conditions. It is not a true worst case: about one delivery in ten is still expected to take longer.

These quantile forecasts empower logistics planners with nuanced insights—optimising route design, staffing, and customer expectations under variable urban conditions.


ProjectGurukul Team

The ProjectGurukul Team delivers project-based tutorials on programming, machine learning, and web development. We simplify learning by providing hands-on projects to help you master real-world skills. We also provide free major and minor projects for engineering students.
