Delivery Time Prediction using Linear Regression in ML


Quick and reliable delivery-time estimates keep customers happy and help platforms schedule riders efficiently. Using the open Food Delivery Time dataset, we build a linear regression baseline that predicts the total minutes from order placement to drop-off for each order. A transparent linear fit exposes the first-order drivers (distance, prep delay, rider ratings, weather, traffic) and gives every fancier model a hard benchmark to beat.

Libraries Required

pandas # data wrangling
numpy # numeric helpers
matplotlib.pyplot # sanity‑check visuals
scikit‑learn # preprocessing, model, metrics
joblib # save the trained pipeline

Dataset Link

Food Delivery Time Prediction

Step-by-Step Code Implementation

Why linear regression? Within normal operating ranges, delivery duration rises roughly linearly with distance and prep delay, while traffic or weather adds near‑constant penalties. A straight‑line fit exposes each driver’s minute‑per‑unit contribution and is trivial to explain to ops teams.
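
To make the "minutes per unit" idea concrete, here is a minimal sketch on synthetic data. The ground-truth numbers (10 minutes of fixed overhead, 2.5 minutes per km, 1 minute per prep minute) are assumptions chosen for illustration, not values from the Food Delivery Time dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500

# Synthetic predictors (assumed ranges, for illustration only)
distance_km = rng.uniform(1, 15, n)
prep_minutes = rng.uniform(5, 20, n)

# Assumed ground truth: 10 min overhead + 2.5 min/km + 1 min per prep minute + noise
time_taken = 10 + 2.5 * distance_km + 1.0 * prep_minutes + rng.normal(0, 2, n)

X_toy = np.column_stack([distance_km, prep_minutes])
toy_model = LinearRegression().fit(X_toy, time_taken)

# Each coefficient reads directly as "minutes added per extra unit"
print("intercept (min):", round(float(toy_model.intercept_), 2))
print("min per km, min per prep minute:", toy_model.coef_.round(2))
```

The fitted coefficients should land close to the assumed 2.5 and 1.0, which is the property that makes a linear baseline easy to explain.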

1. Import libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error
import joblib

2. Load the data

(Download the CSV first—and adjust the path.)

df = pd.read_csv("food_delivery_time.csv")
print(df.head())

3.  Minimal cleaning

# strip leading/trailing spaces in categorical columns
cat_cols_raw = ['Weather_conditions', 'Road_traffic_density',
                'Festival', 'City']
for col in cat_cols_raw:
    df[col] = df[col].str.strip()

# drop rows with missing critical fields
df = df.dropna(subset=['Order_Date', 'Time_Orderd',
                       'Time_Order_picked', 'Restaurant_latitude',
                       'Restaurant_longitude',
                       'Delivery_location_latitude',
                       'Delivery_location_longitude',
                       'Time_taken (min)'])
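
One caveat: dropna only catches real missing values. Some copies of delivery datasets store missing entries as the literal string "NaN " and load numeric columns as text; whether your CSV does this is an assumption you should check with df.dtypes. The sketch below, on a tiny made-up frame, shows one way to normalise such values before modelling.

```python
import numpy as np
import pandas as pd

# Tiny stand-in frame; the values are invented for illustration
raw = pd.DataFrame({
    "Delivery_person_Age": ["34", "NaN ", "27"],
    "Delivery_person_Ratings": ["4.7", "4.2", "NaN "],
    "Weather_conditions": [" Sunny", "NaN ", "Fog "],
})

# Strip whitespace, then turn the literal string "NaN" into a real missing value
clean = raw.apply(lambda s: s.str.strip()).replace("NaN", np.nan)

# Convert text columns that should be numeric; unparseable entries become NaN
for col in ["Delivery_person_Age", "Delivery_person_Ratings"]:
    clean[col] = pd.to_numeric(clean[col], errors="coerce")

# Keep only rows that are complete across the columns used here
clean = clean.dropna()
print(clean)
print(clean.dtypes)
```

LinearRegression raises an error on missing values, so any predictor column that can contain them needs either a step like this or an imputer inside the pipeline.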

5.  Feature engineering

Preparation lag (order → pickup) is often the biggest uncertainty knob. Computing it in minutes decouples kitchen speed from travel speed so that the model can weigh them separately.
Haversine distance converts raw lat/long pairs into a single, physically meaningful kilometre feature that is highly predictive yet cheap to compute.
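Calendar cue (order_dayofweek) adds a day-of-week signal without requiring additional data feeds. Note that as a single numeric column (0 = Monday to 6 = Sunday) it can only contribute a straight-line trend across the week; capturing a mid-week lull and weekend spikes separately would require one-hot encoding it instead.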

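# ---- 3.4.1 Order → pickup delay (minutes) ----
# Times are clock strings ("HH:MM:SS") with no date attached, so a pickup
# just after midnight would give a negative delay; the modulo wraps it
# back into a single day.
df['prep_minutes'] = (
    (pd.to_timedelta(df['Time_Order_picked']) -
     pd.to_timedelta(df['Time_Orderd'])
    ).dt.total_seconds() / 60.0
) % (24 * 60)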

# ---- 3.4.2 Haversine distance (km) ----
def haversine(lat1, lon1, lat2, lon2):
    R = 6371  # Earth radius in km
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = np.radians(lat2 - lat1)
    dl   = np.radians(lon2 - lon1)
    a = np.sin(dphi/2)**2 + np.cos(phi1)*np.cos(phi2)*np.sin(dl/2)**2
    return 2*R*np.arcsin(np.sqrt(a))

df['distance_km'] = haversine(df['Restaurant_latitude'],
                              df['Restaurant_longitude'],
                              df['Delivery_location_latitude'],
                              df['Delivery_location_longitude'])

# ---- 3.4.3 Encode order day‑of‑week ----
df['order_dayofweek'] = pd.to_datetime(df['Order_Date']).dt.dayofweek
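
Engineered features are worth a quick sanity check before training. The sketch below re-declares the haversine function so that it runs on its own, checks it against a known distance (one degree of latitude is about 111.19 km for an Earth radius of 6371 km), and shows how a clock-time difference behaves across midnight. The sample times are made up.

```python
import numpy as np
import pandas as pd

def haversine(lat1, lon1, lat2, lon2):
    R = 6371  # Earth radius in km
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = np.radians(lat2 - lat1)
    dl = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dl / 2) ** 2
    return 2 * R * np.arcsin(np.sqrt(a))

# Known reference: one degree of latitude along a meridian
one_degree = haversine(0.0, 0.0, 1.0, 0.0)
print(f"1 degree of latitude = {one_degree:.2f} km")

# Invented sample: the first order is placed before midnight, picked up after
ordered = pd.to_timedelta(pd.Series(["23:55:00", "12:00:00"]))
picked = pd.to_timedelta(pd.Series(["00:05:00", "12:15:00"]))

raw_minutes = (picked - ordered).dt.total_seconds() / 60.0
wrapped_minutes = raw_minutes % (24 * 60)  # wrap negative delays into one day

print("raw     :", raw_minutes.tolist())
print("wrapped :", wrapped_minutes.tolist())
```

On the real frame, df[['prep_minutes', 'distance_km']].describe() is a cheap way to spot negative delays or implausibly long distances caused by bad coordinates.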

6. Define predictors & label

num_cols = ['Delivery_person_Age', 'Delivery_person_Ratings',
            'Vehicle_condition', 'multiple_deliveries',
            'prep_minutes', 'distance_km', 'order_dayofweek']

cat_cols = ['Weather_conditions', 'Road_traffic_density',
            'Festival', 'City']

target   = 'Time_taken (min)'

X = df[num_cols + cat_cols]
y = df[target]

7.  Pre‑processing & model pipeline

ColumnTransformer + Pipeline bundle scaling, one-hot encoding, and the regressor into a single serializable object, so exactly the same preprocessing is applied at training and at prediction time.

preproc = ColumnTransformer([
        ('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols),
        ('num', StandardScaler(),                      num_cols)
])

linreg = LinearRegression()

pipe = Pipeline([
        ('prep',  preproc),
        ('model', linreg)
])

8. Train‑test split & training

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

pipe.fit(X_train, y_train)
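
A single 80/20 split gives one estimate of error. If you want a sense of how much that estimate varies, k-fold cross-validation is a common complement. The sketch below uses synthetic data and a simplified two-step pipeline (both are assumptions for illustration).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-in: two numeric predictors and a noisy linear target
X_demo = rng.uniform(1, 15, size=(300, 2))
y_demo = 12 + 2.0 * X_demo[:, 0] + 0.8 * X_demo[:, 1] + rng.normal(0, 3, 300)

demo_pipe = make_pipeline(StandardScaler(), LinearRegression())

# scikit-learn maximises scores, so MAE is reported as a negative number
neg_mae = cross_val_score(demo_pipe, X_demo, y_demo, cv=5,
                          scoring="neg_mean_absolute_error")
cv_mae = -neg_mae

print("MAE per fold:", cv_mae.round(2))
print(f"mean MAE: {cv_mae.mean():.2f} (std {cv_mae.std():.2f})")
```

With the tutorial's objects, the equivalent call would be cross_val_score(pipe, X, y, cv=5, scoring="neg_mean_absolute_error"). Because the preprocessing lives inside the pipeline, it is refit on each training fold.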

9.  Metrics chosen

R² shows the share of variance explained; MAE speaks the language of operations (for example, “our average miss is about 4.3 min”, an illustrative figure).

y_pred = pipe.predict(X_test)
print(f"R²  : {r2_score(y_test, y_pred):.3f}")
print(f"MAE : {mean_absolute_error(y_test, y_pred):.1f} minutes")
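
Both metrics are easier to judge next to a trivial reference. The sketch below compares linear regression with a predictor that always outputs the training mean, using scikit-learn's DummyRegressor on synthetic data. The data-generating numbers are assumptions.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic stand-in: delivery minutes driven by distance plus noise
dist = rng.uniform(1, 15, size=(400, 1))
minutes = 15 + 2.0 * dist[:, 0] + rng.normal(0, 3, 400)

Xtr, Xte, ytr, yte = train_test_split(dist, minutes,
                                      test_size=0.2, random_state=42)

results = {}
candidates = [("mean baseline", DummyRegressor(strategy="mean")),
              ("linear regression", LinearRegression())]
for name, est in candidates:
    pred = est.fit(Xtr, ytr).predict(Xte)
    results[name] = (r2_score(yte, pred), mean_absolute_error(yte, pred))
    print(f"{name:18s} R2 = {results[name][0]:6.3f}   "
          f"MAE = {results[name][1]:5.2f} min")
```

The mean baseline scores an R² near zero by construction, so the gap between the two MAE values is the improvement the features actually buy.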

10. Inspect top coefficients

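Coefficient inspection points to likely bottlenecks. A fitted weight of, say, +7 minutes for High traffic or +5 minutes for rainy weather (illustrative figures, not results from this run) can guide dispatch rules even before complex models are rolled out. Because the numeric columns pass through StandardScaler, their coefficients are in minutes per standard deviation of the feature, not minutes per raw unit. The one-hot coefficients are in minutes per category and are best read relative to one another within the same feature.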

# recover encoded feature names
ohe_feats = pipe.named_steps['prep']\
                .named_transformers_['cat']\
                .get_feature_names_out(cat_cols)
feature_names = list(ohe_feats) + num_cols

coefs = pd.Series(pipe.named_steps['model'].coef_,
                  index=feature_names).sort_values()

print("\nFast‑delivery factors:")
print(coefs.head(8))
print("\nDelay‑inducing factors:")
print(coefs.tail(8))
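
When a numeric feature has been standardised, its coefficient can be converted back to raw units by dividing by the standard deviation the scaler learned. A minimal sketch on synthetic data, where the true slope of 2.5 minutes per km is an assumption:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)

# Synthetic stand-in with an assumed true slope of 2.5 min per km
distance_km = rng.uniform(1, 15, 500)
minutes = 10 + 2.5 * distance_km + rng.normal(0, 2, 500)

demo = Pipeline([("scale", StandardScaler()), ("model", LinearRegression())])
demo.fit(distance_km.reshape(-1, 1), minutes)

coef_per_sd = demo.named_steps["model"].coef_[0]   # minutes per 1 SD of distance
sd_km = demo.named_steps["scale"].scale_[0]        # SD of distance learned by the scaler
coef_per_km = coef_per_sd / sd_km                  # minutes per km

print(f"minutes per SD : {coef_per_sd:.2f}")
print(f"SD of distance : {sd_km:.2f} km")
print(f"minutes per km : {coef_per_km:.2f}")
```

In the tutorial's pipeline the fitted scaler sits at pipe.named_steps['prep'].named_transformers_['num'], and its scale_ array follows the order of num_cols.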

11. Persist the trained pipeline

joblib.dump(pipe, "delivery_time_linreg.pkl")
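
Reloading works the same way in reverse. The sketch below trains a toy pipeline on invented rows, saves it to a temporary file, loads it back, and scores a new order. The column names mirror the tutorial, but the data and the file name are hypothetical.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Invented training rows
train = pd.DataFrame({
    "Road_traffic_density": ["Low", "High", "Jam", "Low", "High", "Jam"],
    "distance_km": [2.0, 5.0, 8.0, 3.0, 6.0, 9.0],
    "minutes": [15.0, 30.0, 45.0, 18.0, 33.0, 50.0],
})
features = ["Road_traffic_density", "distance_km"]

toy_pipe = Pipeline([
    ("prep", ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Road_traffic_density"]),
        ("num", StandardScaler(), ["distance_km"]),
    ])),
    ("model", LinearRegression()),
])
toy_pipe.fit(train[features], train["minutes"])

# Round-trip through disk
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_delivery_time_linreg.pkl")
    joblib.dump(toy_pipe, path)
    restored = joblib.load(path)

# A new order must be a DataFrame with the same column names used in training
new_order = pd.DataFrame([{"Road_traffic_density": "High", "distance_km": 4.0}])
eta_original = toy_pipe.predict(new_order)[0]
eta_restored = restored.predict(new_order)[0]
print(f"ETA before save: {eta_original:.1f} min, after load: {eta_restored:.1f} min")
```

Two practical notes: joblib files are pickles, so load them only from sources you trust, and load them with the same scikit-learn version that wrote them where possible.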

Summary

In ~70 lines of Python, we turned raw order logs into an explainable delivery‑time estimator. The linear model delivers:

Actionable forecasts for customer ETAs and rider scheduling.
Clear per-feature coefficients that show how distance, kitchen prep, traffic, and weather shift delivery duration.

Use this interpretable baseline as your compass; when you upgrade to gradient‑boosted trees or neural networks, you’ll know exactly how much extra predictive punch the complexity buys.
