Meal Preparation Time Prediction with Polynomial Regression in ML


Home‑cooking apps and meal‑kit services need to estimate a recipe’s total preparation time (in minutes) before users begin cooking. They must do this from high‑level recipe attributes such as the number of ingredients, total ingredient weight, number of steps, cuisine type, and number of servings. Prep time tends to grow nonlinearly with ingredient count and step complexity, and these effects can differ by cuisine (e.g., multi‑component dishes). A plain linear model cannot represent this curvature, while an unconstrained high‑degree polynomial overfits idiosyncratic recipes. Applying Polynomial Regression with Ridge (ℓ²) regularisation to a carefully engineered feature set yields a smooth, interpretable mapping from recipe metadata to prep time, which can support user guidance and kitchen staffing forecasts.
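To make the underfitting argument concrete, here is a minimal sketch on synthetic data. The quadratic formula, noise level, and sample size are assumptions chosen for illustration, not properties of any recipe dataset. It compares the 5‑fold cross‑validated RMSE of a plain linear model with a degree‑2 polynomial plus Ridge pipeline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic, illustrative data: prep time grows quadratically with ingredient count
rng = np.random.default_rng(0)
n_ingredients = rng.integers(3, 25, size=500)
prep_time = 5 + 0.3 * n_ingredients ** 2 + rng.normal(0, 5, size=500)
X = n_ingredients.reshape(-1, 1).astype(float)

models = {
    "linear": LinearRegression(),
    "poly+ridge": make_pipeline(
        StandardScaler(),
        PolynomialFeatures(degree=2, include_bias=False),
        Ridge(alpha=1.0),
    ),
}

results = {}
for name, model in models.items():
    # cross_val_score returns negated RMSE (higher is better), so flip the sign
    scores = cross_val_score(model, X, prep_time, cv=5,
                             scoring="neg_root_mean_squared_error")
    results[name] = -scores.mean()
    print(f"{name:>10}: CV RMSE = {results[name]:.1f} min")
```

On data with real curvature the polynomial pipeline should show a clearly lower RMSE; on data that is close to linear, the two scores would converge.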

Dataset

Food.com Recipes and reviews 

Recipes dataset with images 

Step-by-Step Code Implementation

1. Libraries Required

import pandas as pd                   # data loading & handling  
import numpy as np                    # numerical operations  

import matplotlib.pyplot as plt       # plotting  
import seaborn as sns                 # enhanced visualization  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import StandardScaler, PolynomialFeatures, OneHotEncoder  
from sklearn.compose import ColumnTransformer  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score

2. Load Data & Inspect

import pandas as pd

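df = pd.read_csv("data/recipes.csv")  # example path; point this at your copy of the dataset
# Expected columns: 'prep_time', 'n_ingredients', 'n_steps', 'cuisine', 'servings'
# (rename your dataset's columns to these names if they differ)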
df = df.dropna(subset=['prep_time','n_ingredients','n_steps','cuisine','servings'])
print(df[['prep_time','n_ingredients','n_steps','cuisine','servings']].head())
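Real recipe exports rarely arrive this clean. The helper below is a hypothetical sketch: the name clean_recipes and its rules are assumptions, not part of either dataset. It checks that the expected columns exist, coerces numeric columns that were read as text, and drops rows with missing values or non‑positive prep times.

```python
import pandas as pd

REQUIRED = ["prep_time", "n_ingredients", "n_steps", "cuisine", "servings"]
NUMERIC = ["prep_time", "n_ingredients", "n_steps", "servings"]

def clean_recipes(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: validate columns, coerce numerics, drop unusable rows."""
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    out = df[REQUIRED].copy()
    # Text such as "45" becomes 45.0; anything unparseable becomes NaN
    out[NUMERIC] = out[NUMERIC].apply(pd.to_numeric, errors="coerce")
    # Strip stray whitespace so " Italian" and "Italian" map to one category
    out["cuisine"] = out["cuisine"].str.strip()
    out = out.dropna()
    # Keep only strictly positive prep times
    out = out[out["prep_time"] > 0]
    return out.reset_index(drop=True)

# Toy input with a text number, a negative time, and a missing time
raw = pd.DataFrame({
    "prep_time": [30, "45", -5, None],
    "n_ingredients": [8, 12, 5, 7],
    "n_steps": [6, 10, 3, 4],
    "cuisine": [" Italian", "Mexican", "Thai", "Indian"],
    "servings": [4, 2, 2, 6],
})
clean = clean_recipes(raw)
print(clean)
```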

3. Feature Engineering & EDA

import seaborn as sns
import matplotlib.pyplot as plt

# Visualize the nonlinear relationship (sample at most 5,000 rows so the plot stays readable)
sns.scatterplot(x='n_ingredients', y='prep_time',
                data=df.sample(min(5000, len(df)), random_state=42), alpha=0.3)
plt.title("Ingredients vs Prep Time")
plt.xlabel("Number of Ingredients")
plt.ylabel("Preparation Time (min)")
plt.show()
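The scatter plot is a visual check. A numeric companion, sketched here on a synthetic stand‑in for df (the generating formula is an assumption), is to compare median prep time across ingredient‑count bins. Roughly equal step‑to‑step increases suggest a linear trend, while increases that grow from bin to bin point to curvature.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df; swap in the real recipes DataFrame to run the same check
rng = np.random.default_rng(1)
demo = pd.DataFrame({"n_ingredients": rng.integers(3, 25, size=2000)})
demo["prep_time"] = 5 + 0.3 * demo["n_ingredients"] ** 2 + rng.normal(0, 5, size=2000)

# Median prep time within ingredient-count bins
bins = pd.cut(demo["n_ingredients"], bins=[0, 6, 12, 18, 24])
medians = demo.groupby(bins, observed=True)["prep_time"].median()
print(medians)

# Linear trend: roughly constant differences; curvature: growing differences
steps = medians.diff().dropna()
print(steps)
```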

4. Define Features & Target

PolynomialFeatures (added in the pipeline in step 5) expands these inputs into squares and interactions (e.g., n_ingredients², n_steps × servings, cuisine_Italian × n_steps), which lets a linear model capture nonlinear prep‑time drivers.

# Target
y = df['prep_time']

# Features
X = df[['n_ingredients','n_steps','servings','cuisine']]

# Identify categorical and numeric
cat_cols = ['cuisine']
num_cols = ['n_ingredients','n_steps','servings']

5. Build a Polynomial Regression Pipeline

  • StandardScaler z‑scores the numeric features (n_ingredients, n_steps, servings) so they enter the polynomial expansion on comparable scales.
  • OneHotEncoder converts cuisine into binary flags, capturing cuisine‑specific complexity.
  • PolynomialFeatures generates squared, cubic, and interaction terms up to the chosen degree.
  • Ridge applies an ℓ² penalty (alpha) that shrinks noisy high‑order coefficients, reducing overfitting to outlier recipes.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge

preprocessor = ColumnTransformer([
    ('num', StandardScaler(), num_cols),
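    # handle_unknown='ignore': cuisines unseen in a training fold become all-zero flags
    # instead of raising an error; Ridge's penalty copes with the redundant dummy column
    ('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols)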
])

pipe = Pipeline([
    ('prep', preprocessor),
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge(random_state=42))
])
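The shrinkage effect of alpha can be checked directly. This minimal sketch fits a degree‑3 polynomial plus Ridge pipeline on synthetic data (the target formula is an assumption) for several alpha values and prints the ℓ² norm of the learned coefficients, which falls as alpha grows.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic, illustrative data: three numeric inputs with a square and an interaction
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
y = 3 * X[:, 0] + 2 * X[:, 0] ** 2 + X[:, 1] * X[:, 2] + rng.normal(0, 1, size=300)

norms = {}
for alpha in [0.01, 1.0, 100.0, 10000.0]:
    model = make_pipeline(
        StandardScaler(),
        PolynomialFeatures(degree=3, include_bias=False),
        Ridge(alpha=alpha),
    )
    model.fit(X, y)
    coef = model[-1].coef_          # coefficients of the final Ridge step
    norms[alpha] = np.linalg.norm(coef)
    print(f"alpha={alpha:>8}: ||coef||_2 = {norms[alpha]:.3f}")
```

Very small alpha leaves the fit close to ordinary least squares, while very large alpha pushes all coefficients toward zero; the grid search in the next step looks for the range in between.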

6. Train/Test Split & Hyperparameter Search

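  • poly__degree sets the maximum polynomial order (1 = linear, 2 = quadratic, 3 = cubic).
  • ridge__alpha sets the regularisation strength, searched over seven log‑spaced values from 10⁻³ to 10³.
  • GridSearchCV runs 5‑fold cross‑validation on the training set and selects the combination with the lowest mean RMSE.
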
from sklearn.model_selection import train_test_split, GridSearchCV
import numpy as np

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7)
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)

print("Best degree:", gs.best_params_['poly__degree'])
print("Best alpha :", gs.best_params_['ridge__alpha'])
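Beyond best_params_, GridSearchCV stores every candidate's cross‑validated score in cv_results_. This self‑contained sketch runs a smaller grid on synthetic data (the data and grid values are assumptions) and converts the negated scores back into RMSE so candidates can be compared in minutes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic, illustrative data with a quadratic trend
rng = np.random.default_rng(3)
X = rng.uniform(3, 25, size=(400, 1))
y = 5 + 0.3 * X[:, 0] ** 2 + rng.normal(0, 5, size=400)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(include_bias=False)),
    ("ridge", Ridge()),
])
param_grid = {"poly__degree": [1, 2, 3], "ridge__alpha": [0.1, 10.0]}
gs = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_root_mean_squared_error")
gs.fit(X, y)

# One row per candidate; scores are negated RMSE, so flip the sign
results = pd.DataFrame(gs.cv_results_)[
    ["param_poly__degree", "param_ridge__alpha", "mean_test_score"]
].copy()
results["cv_rmse"] = -results["mean_test_score"]
print(results.sort_values("cv_rmse").to_string(index=False))

best_rmse = -gs.best_score_
print(f"Best CV RMSE: {best_rmse:.2f} min with {gs.best_params_}")
```

With the tutorial's own gs object, the same DataFrame construction works unchanged and shows how close the runner‑up candidates are to the winner.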

7. Evaluate Model

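from sklearn.metrics import mean_squared_error, r2_score
import numpy as np

y_pred = gs.predict(X_test)
# The squared=False argument was deprecated in scikit-learn 1.4 and later removed,
# so take the square root of the MSE explicitly
rmse = np.sqrt(mean_squared_error(y_test, y_pred))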
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE: {rmse:.2f} minutes")
print(f"Test R²  : {r2:.3f}")
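Both metrics are simple to compute by hand, which helps when sanity‑checking the printed values. This worked example uses three made‑up prep times and checks the manual formulas RMSE = √(mean squared residual) and R² = 1 − SS_res / SS_tot against scikit‑learn.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([10.0, 20.0, 30.0])   # toy prep times in minutes
y_hat = np.array([12.0, 18.0, 33.0])    # toy predictions

residuals = y_true - y_hat                      # [-2, 2, -3]
rmse = np.sqrt(np.mean(residuals ** 2))         # sqrt(17 / 3) ≈ 2.380
ss_res = np.sum(residuals ** 2)                 # 4 + 4 + 9 = 17
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # 100 + 0 + 100 = 200
r2 = 1 - ss_res / ss_tot                        # 1 - 17/200 = 0.915

print(f"RMSE by hand: {rmse:.3f}, sklearn: {np.sqrt(mean_squared_error(y_true, y_hat)):.3f}")
print(f"R² by hand  : {r2:.3f}, sklearn: {r2_score(y_true, y_hat):.3f}")
```

An R² of 0 corresponds to always predicting the mean prep time, so the test R² measures improvement over that baseline, while RMSE stays in the target's own unit (minutes).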

8. Inspect Key Polynomial Coefficients

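The coefficients with the largest absolute values, such as squared ingredient counts or step × cuisine interactions, point to the nonlinear terms that most strongly move predicted preparation time up or down. Treat this ranking as a rough guide: the polynomial terms are not rescaled after expansion, and correlated terms can share or trade weight under Ridge.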

# Retrieve feature names post-expansion
prep = gs.best_estimator_.named_steps['prep']
num_feats = num_cols
cat_feats = prep.named_transformers_['cat'].get_feature_names_out(cat_cols).tolist()
input_feats = num_feats + cat_feats

poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=input_feats)
coefs = gs.best_estimator_.named_steps['ridge'].coef_

import pandas as pd
imp = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False).head(10)

import matplotlib.pyplot as plt
plt.figure(figsize=(8,5))
imp.plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Prep Time")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()
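A complementary view to the top‑10 bar chart is how much total coefficient weight sits at each polynomial order. The sketch below uses the powers_ attribute of PolynomialFeatures, where each row holds the exponents of one output term, so its row sum is that term's degree. It runs on synthetic data (an assumption) so that it is self‑contained; with the tutorial's pipeline, the same grouping can be applied to gs.best_estimator_.named_steps['poly'].powers_ and the Ridge coef_.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic, illustrative data dominated by second-order terms
rng = np.random.default_rng(4)
X = rng.normal(size=(500, 3))
y = X[:, 0] + 2 * X[:, 0] ** 2 + 1.5 * X[:, 1] * X[:, 2] + rng.normal(0, 0.5, size=500)

model = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=3, include_bias=False),
    Ridge(alpha=1.0),
)
model.fit(X, y)
poly = model.named_steps["polynomialfeatures"]
coefs = model.named_steps["ridge"].coef_

# powers_[i, j] is the exponent of input j in output term i; row sum = term degree
term_degree = poly.powers_.sum(axis=1)
mass = pd.Series(np.abs(coefs)).groupby(term_degree).sum()
print(mass)   # total |coefficient| per polynomial order
```

If most of the weight sits at order 3, that is a hint the cubic terms may be fitting noise and a lower degree or larger alpha deserves a second look.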

Summary

This Polynomial Regression pipeline with Ridge regularisation provides an interpretable model for predicting meal prep times:

  1. Captures nonlinear effects of ingredient count, step complexity, and cuisine on prep time.
  2. Controls model complexity through a grid search over polynomial degree and ℓ² penalty, selected by cross‑validated RMSE and checked with test‑set RMSE and R².
  3. Provides actionable insights: the top polynomial features can guide recipe design and user expectations, for example by revealing whether adding many small steps in “baking” cuisines sharply increases prep time.


ProjectGurukul Team

ProjectGurukul Team specializes in creating project-based learning resources for programming, Java, Python, Android, AI, web development and machine learning. Our mission is to help learners build practical skills through engaging, hands-on projects. We also offer free major and minor projects with source code for engineering students.
