Food Order Time Prediction using Linear Regression in ML
Hungry customers expect their meals to arrive when the app promises they will. If the promised time and the actual time drift apart, satisfaction (and tips) plummet. In this hands‑on project, we build a linear‑regression baseline that predicts the order time, meaning the total minutes from order placement to customer hand‑off. The predictors are rider age, rider rating, vehicle condition, order distance, kitchen prep delay, traffic, weather, and city type. All of these are known when a ticket is confirmed except the prep delay, which is derived from the pickup timestamp and so becomes available only once the rider collects the order. While modern platforms rely on gradient‑boosted trees or deep networks, a transparent linear fit reveals the first‑order levers that speed up or slow down service and provides a benchmark for any future model to beat.
Libraries Required
- pandas # tidy data handling
- numpy # numerical helpers
- matplotlib.pyplot # quick sanity plots
- scikit‑learn # preprocessing, model, metrics
- joblib # persist the trained pipeline
Dataset Link
Step-by-Step Code Implementation
Why linear regression? Order‑to‑door time grows almost linearly with physical distance and kitchen prep delay; traffic or weather usually tack on near‑fixed penalties. The straight‑line fit quantifies each lever in plain minutes.
1. Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error
import joblib
2. Load the data
Download the CSV from Kaggle and point to its path:
df = pd.read_csv("food_delivery_time.csv") # file name after unzip
print(df.head())
3. Minimal cleaning & de‑spurring
# strip stray spaces in categoricals
for col in ['Weather_conditions', 'Road_traffic_density', 'Festival', 'City']:
    df[col] = df[col].str.strip()

# drop rows missing critical fields
df = df.dropna(subset=['Time_Orderd', 'Time_Order_picked',
                       'Delivery_location_latitude', 'Delivery_location_longitude',
                       'Restaurant_latitude', 'Restaurant_longitude',
                       'Time_taken (min)'])
4. Feature engineering
Three features are derived here: kitchen prep delay, straight‑line distance, and day of week. The calendar cue catches weekend surges without requiring an external holiday calendar.
# kitchen prep delay in minutes
df['prep_min'] = (
    pd.to_timedelta(df['Time_Order_picked']) - pd.to_timedelta(df['Time_Orderd'])
).dt.total_seconds() / 60.0

# physical distance (haversine)
def hav(lat1, lon1, lat2, lon2):
    R = 6371  # mean Earth radius in km
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = np.radians(lat2 - lat1)
    dlam = np.radians(lon2 - lon1)
    a = np.sin(dphi/2)**2 + np.cos(phi1)*np.cos(phi2)*np.sin(dlam/2)**2
    return 2 * R * np.arcsin(np.sqrt(a))

df['distance_km'] = hav(df['Restaurant_latitude'], df['Restaurant_longitude'],
                        df['Delivery_location_latitude'], df['Delivery_location_longitude'])

# calendar cue: day of week (Monday = 0)
df['order_day'] = pd.to_datetime(df['Order_Date']).dt.dayofweek
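Two quick checks are worth running on these engineered features. The sketch below repeats the same haversine formula and confirms that one degree of longitude on the equator comes out near 111.19 km. It also shows one possible way to handle orders picked up after midnight, where subtracting time-of-day values yields a negative gap. The toy rows and the add-one-day rule are assumptions for illustration, not something the dataset documentation confirms.

```python
import numpy as np
import pandas as pd

def hav(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, same formula as in the step above."""
    R = 6371
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = np.radians(lat2 - lat1)
    dlam = np.radians(lon2 - lon1)
    a = np.sin(dphi/2)**2 + np.cos(phi1)*np.cos(phi2)*np.sin(dlam/2)**2
    return 2 * R * np.arcsin(np.sqrt(a))

# One degree of longitude on the equator is about 111.19 km for R = 6371.
one_degree_km = hav(0.0, 0.0, 0.0, 1.0)
print(f"1 degree at the equator: {one_degree_km:.2f} km")

# Hypothetical toy rows: the second order is placed before midnight
# and picked up after it.
toy = pd.DataFrame({
    "Time_Orderd": ["11:30:00", "23:55:00"],
    "Time_Order_picked": ["11:45:00", "00:05:00"],
})
prep_min = (
    pd.to_timedelta(toy["Time_Order_picked"])
    - pd.to_timedelta(toy["Time_Orderd"])
).dt.total_seconds() / 60.0

# Assumption: a negative gap means the pickup crossed midnight, so add one day.
prep_min = prep_min.where(prep_min >= 0, prep_min + 24 * 60)
print(prep_min.tolist())
```

If the real data contains negative prep delays, a correction along these lines (or simply dropping those rows) keeps them from distorting the fitted slope.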
5. Select predictors & label
Standardising numerics places age, distance, and prep delay on equal footing, so coefficients read as minutes per one‑σ shift—handy for operations teams.
num_cols = ['Delivery_person_Age', 'Delivery_person_Ratings',
            'Vehicle_condition', 'multiple_deliveries',
            'prep_min', 'distance_km', 'order_day']
cat_cols = ['Weather_conditions', 'Road_traffic_density',
            'Festival', 'City']
target = 'Time_taken (min)'
X = df[num_cols + cat_cols]
y = df[target]
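LinearRegression and StandardScaler both raise an error on missing values, and the cleaning step only drops rows that lack timestamps, coordinates, or the label. If the numeric predictors arrive as text or contain placeholders, something like the following sketch may be needed before fitting. The toy frame and the "NaN " placeholder are hypothetical; check the actual file to see how gaps are encoded.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: numeric fields stored as text, with "NaN " placeholders.
toy = pd.DataFrame({
    "Delivery_person_Age": ["29", "NaN ", "34"],
    "Delivery_person_Ratings": ["4.7", "4.9", "NaN "],
    "Time_taken (min)": [24, 31, 19],
})
num_like = ["Delivery_person_Age", "Delivery_person_Ratings"]

# Coerce to numbers; anything unparseable becomes a real NaN.
toy[num_like] = toy[num_like].apply(pd.to_numeric, errors="coerce")

# Drop rows where a numeric predictor is missing (imputation is the alternative).
clean = toy.dropna(subset=num_like).reset_index(drop=True)
print(clean)
```

Dropping rows is the simplest choice for a baseline; an imputer inside the pipeline would keep more data at the cost of an extra modelling assumption.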
6. Pre‑processing & model pipeline
One-hot encoding treats weather or traffic labels as pure categories, ensuring no false numeric hierarchy sneaks in.
pre = ColumnTransformer([
    ('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols),
    ('num', StandardScaler(), num_cols)
])
lin = LinearRegression()
pipe = Pipeline([
    ('prep', pre),
    ('model', lin)
])
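The handle_unknown='ignore' setting matters at serving time. As a small illustration with made-up weather labels, a category the encoder never saw during fitting is encoded as all zeros rather than raising an error:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical weather labels seen during training.
train = pd.DataFrame({"Weather_conditions": ["Sunny", "Fog", "Sunny"]})
enc = OneHotEncoder(handle_unknown="ignore").fit(train)

# A label the encoder never saw at fit time.
new = pd.DataFrame({"Weather_conditions": ["Hail"]})
encoded = enc.transform(new).toarray()

print(enc.categories_)  # the learned categories
print(encoded)          # one row of zeros, one column per learned category
```

In the linear model, an all-zero row means the unseen label contributes nothing beyond the intercept, which is a safe if uninformative default.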
7. Train‑test split & training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
pipe.fit(X_train, y_train)
9. Evaluation metrics
R² indicates the share of variance the model explains, while MAE in minutes tells product managers the typical ETA error (e.g., ±4.2 minutes).
y_pred = pipe.predict(X_test)
print(f"R² : {r2_score(y_test, y_pred):.3f}")
print(f"MAE : {mean_absolute_error(y_test, y_pred):.1f} minutes")
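An MAE figure is easier to judge next to a naive reference. The sketch below uses synthetic data (not the Kaggle file) to compare a linear fit against scikit-learn's DummyRegressor, which always predicts the training median. The slope, intercept, and noise level are invented for illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic orders: delivery time grows with distance, plus noise.
rng = np.random.default_rng(0)
distance_km = rng.uniform(1, 20, size=500)
minutes = 10 + 2.0 * distance_km + rng.normal(0, 3, size=500)

X = distance_km.reshape(-1, 1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, minutes, test_size=0.2, random_state=42)

# Naive reference: always predict the median training time.
naive = DummyRegressor(strategy="median").fit(X_tr, y_tr)
linear = LinearRegression().fit(X_tr, y_tr)

mae_naive = mean_absolute_error(y_te, naive.predict(X_te))
mae_linear = mean_absolute_error(y_te, linear.predict(X_te))
print(f"MAE naive : {mae_naive:.1f} minutes")
print(f"MAE linear: {mae_linear:.1f} minutes")
```

The same comparison can be run on the real split by fitting the dummy model on X_train and y_train; the gap between the two MAE values shows how much the features actually contribute.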
10. Inspect influential features
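The coefficient table quickly reveals the big hitters. For example, a Rainy flag adding ~5 minutes or a High‑traffic flag adding ~7 would be large enough to influence rider allocation and promo timing. Because every category gets its own one‑hot column alongside an intercept, read the categorical coefficients relative to one another rather than as absolute minutes.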
# grab names after encoding
ohe_names = pipe.named_steps['prep'].named_transformers_['cat']\
    .get_feature_names_out(cat_cols)
all_feats = list(ohe_names) + num_cols
coef = pd.Series(pipe.named_steps['model'].coef_, index=all_feats)\
    .sort_values()
print("\nSpeed‑boost factors (negative coefficients):")
print(coef.head(8))
print("\nDelay drivers (positive coefficients):")
print(coef.tail(8))
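Because the numeric columns are standardised, their coefficients are in minutes per one standard deviation. Dividing by the scaler's stored standard deviation converts them back to minutes per original unit. A minimal sketch on synthetic, noise-free data (3 minutes per km by construction; the toy pipeline and its step names are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, noise-free data: exactly 3 minutes per kilometre.
rng = np.random.default_rng(0)
distance_km = rng.uniform(1, 20, size=200)
minutes = 10 + 3.0 * distance_km

toy_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LinearRegression()),
])
toy_pipe.fit(distance_km.reshape(-1, 1), minutes)

# Coefficient on the standardised feature: minutes per one-sigma shift.
per_sigma = toy_pipe.named_steps["model"].coef_[0]
# Standard deviation the scaler learned from the training data.
sigma = toy_pipe.named_steps["scale"].scale_[0]
# Back to minutes per kilometre.
per_km = per_sigma / sigma

print(f"minutes per sigma: {per_sigma:.2f}")
print(f"minutes per km   : {per_km:.2f}")
```

On the tutorial's pipeline the scaler lives inside the ColumnTransformer, so it would be reached through named_transformers_['num'] rather than directly as a pipeline step.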
11. Persist the pipeline
Joblib persistence packages the scaler, encoder, and regression weights into a single .pkl file, so tomorrow's API can load it, score fresh orders, and return ETAs in milliseconds.
joblib.dump(pipe, "food_order_time_linreg.pkl")
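Loading works the same way in reverse. The following round-trip sketch uses a tiny stand-in pipeline and a temporary file so that it runs on its own; the feature values, file name, and the new order are all made up for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Tiny stand-in for the trained pipeline (hypothetical values).
toy = pd.DataFrame({
    "distance_km": [1.0, 5.0, 10.0, 15.0],
    "prep_min": [5.0, 10.0, 10.0, 15.0],
})
y = [15.0, 28.0, 40.0, 55.0]
toy_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LinearRegression()),
]).fit(toy, y)

# Save to a temporary location, then load it back as a serving process would.
path = os.path.join(tempfile.mkdtemp(), "toy_linreg.pkl")
joblib.dump(toy_pipe, path)
loaded = joblib.load(path)

# Score a fresh order; column names must match those used in training.
new_order = pd.DataFrame({"distance_km": [7.5], "prep_min": [12.0]})
eta_loaded = loaded.predict(new_order)
eta_original = toy_pipe.predict(new_order)
print(f"Predicted ETA: {eta_loaded[0]:.1f} minutes")
```

The loaded object should be used with the same scikit-learn version that produced it, since pickled estimators are not guaranteed to be compatible across versions.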
Summary
In under a hundred lines, we transformed raw order logs into an explainable food‑order time predictor. The model:
- Delivers on-the-spot ETAs that front‑end apps can surface with confidence.
- Spots operational levers (distance, prep speed, traffic, weather) that teams can tackle to trim delays.
Hold this interpretable baseline as your yardstick; when you upgrade to boosted trees or sequence models, you’ll know precisely how much real‑world accuracy the complexity buys.