Food Order Time Prediction using Linear Regression in ML


Hungry customers expect their meals to arrive when the app promises they will. If the promised time and the actual time drift apart, satisfaction (and tips) plummet. In this hands‑on project, we build a linear‑regression baseline that predicts the total minutes from order placement to customer hand‑off (the order time) using information available by the time the rider picks up the order: rider age, rider rating, vehicle condition, number of stacked deliveries, order distance, kitchen prep delay, traffic, weather, festival days, city type, and day of week. While modern platforms often rely on gradient‑boosted trees or deep networks, a transparent linear fit reveals the first‑order levers that speed up or slow down service and provides a benchmark for any future model to beat.

Libraries Required

pandas # tidy data handling
numpy # numerical helpers
matplotlib.pyplot # quick sanity plots
scikit‑learn # preprocessing, model, metrics
joblib # persist the trained pipeline

Dataset Link

Food Delivery Time Prediction

Step-by-Step Code Implementation

Why linear regression? Order‑to‑door time grows roughly linearly with physical distance and kitchen prep delay, while traffic or weather tend to add near‑fixed penalties. A straight‑line fit expresses each lever as an additive number of minutes.
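To see what "each lever as an additive number of minutes" means in practice, here is a minimal sketch on synthetic orders (not the Kaggle data). It assumes an invented additive rule, fits scikit-learn's LinearRegression, and checks that the fitted coefficients recover the minutes baked into the rule.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented "true" rule for illustration only: 10 base minutes,
# +2 min per km, +1 min per prep minute, +7 min under high traffic.
rng = np.random.default_rng(0)
n = 500
distance_km = rng.uniform(0.5, 15, n)
prep_min = rng.uniform(5, 25, n)
high_traffic = rng.integers(0, 2, n)          # 1 = high traffic, 0 = otherwise
minutes = (10 + 2 * distance_km + 1 * prep_min + 7 * high_traffic
           + rng.normal(0, 1.0, n))            # unexplained jitter

X = np.column_stack([distance_km, prep_min, high_traffic])
model = LinearRegression().fit(X, minutes)

# Each coefficient should land close to the per-unit minutes in the rule.
for name, c in zip(["distance_km", "prep_min", "high_traffic"], model.coef_):
    print(f"{name:>12}: {c:+.2f} min")
print(f"{'intercept':>12}: {model.intercept_:.2f} min")
```

On real orders the rule is unknown and the noise is larger, but the reading is the same: a coefficient is the extra minutes per unit of its feature, holding the others fixed.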

1. Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error
import joblib

2. Load the data

Download the CSV from Kaggle and point to its path:

df = pd.read_csv("food_delivery_time.csv")       # file name after unzip
print(df.head())

3. Minimal cleaning (stray spaces, missing values)

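# strip stray spaces in categoricals
for col in ['Weather_conditions', 'Road_traffic_density',
            'Festival', 'City']:
    df[col] = df[col].str.strip()

# coerce numeric fields that may load as text; unparseable values become NaN
for col in ['Delivery_person_Age', 'Delivery_person_Ratings',
            'Vehicle_condition', 'multiple_deliveries']:
    df[col] = pd.to_numeric(df[col], errors='coerce')

# drop rows missing critical fields (LinearRegression cannot handle NaN)
df = df.dropna(subset=['Time_Orderd', 'Time_Order_picked',
                       'Delivery_location_latitude', 'Delivery_location_longitude',
                       'Restaurant_latitude', 'Restaurant_longitude',
                       'Delivery_person_Age', 'Delivery_person_Ratings',
                       'Vehicle_condition', 'multiple_deliveries',
                       'Order_Date', 'Time_taken (min)'])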
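Why bother stripping whitespace? A one-hot encoder treats "Sunny" and "Sunny " as two different categories, which splits one real effect across two coefficients. The toy sketch below uses invented values to show how str.strip collapses such duplicates, while NaN passes through untouched.

```python
import numpy as np
import pandas as pd

# Toy column with the kind of stray whitespace the cleaning step targets
# (values invented for illustration).
weather = pd.Series(["Sunny", "Sunny ", " Sunny", "Stormy", np.nan])

print(weather.nunique())        # 4: stray spaces create duplicate labels
cleaned = weather.str.strip()   # NaN passes through .str methods unchanged
print(cleaned.nunique())        # 2: "Sunny" and "Stormy"
```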

4. Feature engineering

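Three features are derived here: the kitchen prep delay, the straight‑line (haversine) distance between restaurant and customer, and a day‑of‑week calendar cue that catches weekend surges without requiring an external holiday calendar.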

# kitchen prep delay in minutes (order placed -> order picked up);
# the modulo wraps orders that cross midnight into a positive delay
prep = (pd.to_timedelta(df['Time_Order_picked']) -
        pd.to_timedelta(df['Time_Orderd']))
df['prep_min'] = (prep.dt.total_seconds() / 60.0) % (24 * 60)

# physical distance in km (haversine, mean Earth radius 6371 km)
def hav(lat1, lon1, lat2, lon2):
    R = 6371
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = np.radians(lat2 - lat1)
    dlam = np.radians(lon2 - lon1)
    a = np.sin(dphi/2)**2 + np.cos(phi1)*np.cos(phi2)*np.sin(dlam/2)**2
    return 2 * R * np.arcsin(np.sqrt(a))

df['distance_km'] = hav(df['Restaurant_latitude'],
                        df['Restaurant_longitude'],
                        df['Delivery_location_latitude'],
                        df['Delivery_location_longitude'])

# calendar cue: day of week (0 = Monday ... 6 = Sunday);
# pass dayfirst=True to pd.to_datetime if dates are stored as DD-MM-YYYY
df['order_day'] = pd.to_datetime(df['Order_Date']).dt.dayofweek
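Two quick checks on these features, sketched on toy inputs. First, the haversine formula should return about 2π·6371/360 ≈ 111.19 km for one degree of latitude. Second, a single 0 to 6 day code forces a straight-line trend across the week, whereas a weekend surge is a step. A binary flag (the `is_weekend` column below is a hypothetical alternative, not part of the original pipeline) expresses that step directly.

```python
import numpy as np
import pandas as pd

def hav(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (same haversine formula as above)."""
    R = 6371  # mean Earth radius in km
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = np.radians(lat2 - lat1)
    dlam = np.radians(lon2 - lon1)
    a = np.sin(dphi/2)**2 + np.cos(phi1)*np.cos(phi2)*np.sin(dlam/2)**2
    return 2 * R * np.arcsin(np.sqrt(a))

# Sanity check: one degree of latitude along a meridian.
one_degree = hav(0.0, 0.0, 1.0, 0.0)
print(f"{one_degree:.2f} km")   # 111.19 km

# Hypothetical weekend flag (2022-03-19 is a Saturday, 2022-03-21 a Monday).
orders = pd.DataFrame({"Order_Date": ["2022-03-19", "2022-03-20", "2022-03-21"]})
day = pd.to_datetime(orders["Order_Date"]).dt.dayofweek   # Mon=0 ... Sun=6
orders["is_weekend"] = (day >= 5).astype(int)
print(orders)
```

If you adopt the flag, it would replace `order_day` in `num_cols`. Alternatively, `order_day` could be moved to `cat_cols` so that each day gets its own one-hot coefficient.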

5. Select predictors & label

Standardising the numeric columns puts age, distance, and prep delay on a common scale. Each numeric coefficient then reads as minutes per one‑σ (one standard deviation) shift, which is handy for operations teams comparing levers.

num_cols = ['Delivery_person_Age', 'Delivery_person_Ratings',
            'Vehicle_condition', 'multiple_deliveries',
            'prep_min', 'distance_km', 'order_day']

cat_cols = ['Weather_conditions', 'Road_traffic_density',
            'Festival', 'City']

target = 'Time_taken (min)'

X = df[num_cols + cat_cols]
y = df[target]

6. Pre‑processing & model pipeline

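One-hot encoding treats weather or traffic labels as pure categories, so no false numeric hierarchy sneaks in. Dropping the first level of each categorical gives every remaining flag an explicit baseline, which keeps the dummy coefficients identifiable alongside the intercept.

pre = ColumnTransformer([
        ('cat', OneHotEncoder(drop='first', handle_unknown='ignore'), cat_cols),
        ('num', StandardScaler(),                                     num_cols)
])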

lin = LinearRegression()

pipe = Pipeline([
        ('prep',  pre),
        ('model', lin)
])

7. Train‑test split & training

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

pipe.fit(X_train, y_train)
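A single 80/20 split yields one MAE, which can shift with `random_state`. As a hedged complement to this walkthrough, 5-fold cross-validation on the whole pipeline reports the spread across folds. Because the scaler sits inside the pipeline, it is refit on each training fold, so no test-fold statistics leak in. The sketch below uses synthetic data (an assumed additive rule) in place of the tutorial's `X` and `y`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the order table (assumed rule + noise, sd 3 minutes).
rng = np.random.default_rng(42)
X = rng.uniform(0, 15, size=(400, 2))          # e.g. distance_km, prep_min
y = 12 + 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0, 3, 400)

pipe = Pipeline([("scale", StandardScaler()), ("model", LinearRegression())])
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# scikit-learn maximises scores, so MAE is exposed as its negative.
mae = -cross_val_score(pipe, X, y, cv=cv, scoring="neg_mean_absolute_error")
print("fold MAE (min):", np.round(mae, 2))
print(f"mean {mae.mean():.2f} ± {mae.std():.2f} minutes")
```

On the real data, the same call accepts the tutorial's `pipe`, `X`, and `y` directly.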

9. Evaluation metrics

R² is the share of variance in delivery time that the model explains. MAE, in minutes, tells product managers the typical size of an ETA miss (e.g., an MAE of 4.2 means predictions are off by about 4.2 minutes on average).

y_pred = pipe.predict(X_test)
print(f"R²  : {r2_score(y_test, y_pred):.3f}")
print(f"MAE : {mean_absolute_error(y_test, y_pred):.1f} minutes")
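To make both numbers concrete, here is the arithmetic behind them on four toy orders (values invented), checked against scikit-learn. MAE is the mean of |actual − predicted|, and R² is 1 − SS_res / SS_tot.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

y_true = np.array([30, 25, 40, 20])   # actual minutes (toy values)
y_pred = np.array([28, 27, 35, 21])   # model ETAs (toy values)

mae_manual = np.mean(np.abs(y_true - y_pred))       # (2 + 2 + 5 + 1) / 4 = 2.5
ss_res = np.sum((y_true - y_pred) ** 2)             # 4 + 4 + 25 + 1 = 34
ss_tot = np.sum((y_true - y_true.mean()) ** 2)      # spread around mean 28.75 = 218.75
r2_manual = 1 - ss_res / ss_tot                     # 1 - 34 / 218.75

print(f"MAE: {mae_manual:.2f} min (sklearn {mean_absolute_error(y_true, y_pred):.2f})")
print(f"R² : {r2_manual:.3f} (sklearn {r2_score(y_true, y_pred):.3f})")
```

Both lines print matching values (MAE 2.50, R² 0.845).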

10. Inspect influential features

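The coefficient table reveals the big hitters at a glance. For example, a Rainy flag adding around 5 minutes or a High‑traffic flag adding around 7 would directly inform rider allocation and promo timing. One‑hot coefficients are already in minutes, whereas numeric coefficients are in minutes per standard deviation.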

# grab names after encoding
ohe_names = pipe.named_steps['prep'].named_transformers_['cat']\
                        .get_feature_names_out(cat_cols)
all_feats = list(ohe_names) + num_cols

coef = pd.Series(pipe.named_steps['model'].coef_, index=all_feats)\
           .sort_values()

print("\nSpeed‑boost factors (negative coefficients):")
print(coef.head(8))
print("\nDelay drivers (positive coefficients):")
print(coef.tail(8))
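Because the numeric columns were standardised, their coefficients are minutes per one-σ shift. Dividing each one by its column's σ (StandardScaler's `scale_` attribute) converts it back to minutes per km or per prep minute. The sketch below shows this on synthetic data with an assumed rule. In the full pipeline, the σ values would come from `pipe.named_steps['prep'].named_transformers_['num'].scale_`, in the same order as `num_cols`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
X = np.column_stack([rng.uniform(0.5, 15, 300),    # distance_km
                     rng.uniform(5, 25, 300)])     # prep_min
y = 10 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 300)  # assumed rule

pipe = Pipeline([("scale", StandardScaler()), ("model", LinearRegression())])
pipe.fit(X, y)

per_sigma = pipe.named_steps["model"].coef_        # minutes per 1-σ shift
sigma = pipe.named_steps["scale"].scale_           # σ of each training column
per_unit = per_sigma / sigma                       # minutes per km / per prep minute

for name, s, u in zip(["distance_km", "prep_min"], per_sigma, per_unit):
    print(f"{name:>11}: {s:+.2f} min per σ, {u:+.3f} min per unit")
```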

11. Persist the pipeline

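Joblib persistence packages the scaler, encoder, and regression weights into a single .pkl file. A serving API can load it once, score fresh orders, and return ETAs quickly. Load the file with the same scikit‑learn version used for training, since pickled estimators are not guaranteed to work across versions.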

joblib.dump(pipe, "food_order_time_linreg.pkl")
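On the serving side, `joblib.load` restores the whole pipeline, so raw order fields go in and an ETA comes out. The self-contained sketch below trains a tiny stand-in pipeline on invented rows, saves it to a temporary directory, reloads it, and scores one hypothetical new order. The column names mirror the tutorial's, but the data does not.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny stand-in training set (invented values): one numeric, one categorical column.
train = pd.DataFrame({
    "distance_km": [2.0, 5.0, 8.0, 3.5, 10.0, 6.0],
    "Road_traffic_density": ["Low", "High", "Jam", "Low", "High", "Medium"],
})
minutes = [18, 32, 45, 20, 41, 29]

pipe = Pipeline([
    ("prep", ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Road_traffic_density"]),
        ("num", StandardScaler(), ["distance_km"]),
    ])),
    ("model", LinearRegression()),
]).fit(train, minutes)

path = os.path.join(tempfile.mkdtemp(), "food_order_time_linreg.pkl")
joblib.dump(pipe, path)

# Later, e.g. in a serving process: load once, then score incoming orders
# (same column order as in training).
served = joblib.load(path)
new_order = pd.DataFrame({"distance_km": [4.2], "Road_traffic_density": ["High"]})
eta = served.predict(new_order)[0]
print(f"ETA: {eta:.1f} min")
```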

Summary

In under a hundred lines, we transformed raw order logs into an explainable food‑order time predictor. The model:

Produces on-the-spot ETAs that front‑end apps can surface, with the test MAE indicating how far to trust them.
Spots operational levers (distance, prep speed, traffic, weather) that teams can tackle to trim delays.

Keep this interpretable baseline as your yardstick. When you upgrade to boosted trees or sequence models, you’ll know precisely how much real‑world accuracy the extra complexity buys.
