Restaurant Profit Prediction using Linear Regression in ML

Independent owners and franchise groups alike ask the same question before expanding or redesigning a concept:

“Given a restaurant’s location, concept type, opening age and local demographics, how much net profit will it make in its next full fiscal year?”

Being able to forecast annual profit (USD) early allows investors to size loans, landlords to negotiate leases, and operators to plan headcount. Here we build a linear‑regression baseline that predicts a unit’s profit from readily available data:

  • restaurant age (years since opening)
  • city and city‑group size (big city / other)
  • concept type (food‑court / inline / drive‑thru / mobile)
  • 37 anonymised site variables supplied by the franchisor (P1 … P37)
  • a simple industry‑average margin to convert revenue into profit.

The dataset comes from the classic “Restaurant Revenue Prediction” Kaggle competition. It provides yearly revenue; we assume a conservative 15 % net margin to obtain an approximate profit target.
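One consequence of a fixed margin is worth seeing in numbers: because profit is just revenue times a constant, a linear model fits both targets equally well, and only the dollar-scale metrics shrink by the margin. The sketch below uses made-up features and revenue figures (nothing here comes from the Kaggle file) to illustrate that.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error

NET_MARGIN = 0.15                          # the article's assumed net margin

rng = np.random.default_rng(1)
X_toy = rng.normal(size=(50, 3))           # made-up features
revenue = (4_000_000
           + X_toy @ np.array([300_000, -150_000, 80_000])
           + rng.normal(0, 100_000, size=50))
profit = revenue * NET_MARGIN              # same conversion as in the article

rev_pred = LinearRegression().fit(X_toy, revenue).predict(X_toy)
prof_pred = LinearRegression().fit(X_toy, profit).predict(X_toy)

r2_rev = r2_score(revenue, rev_pred)
r2_prof = r2_score(profit, prof_pred)
mae_ratio = (mean_absolute_error(profit, prof_pred)
             / mean_absolute_error(revenue, rev_pred))

print(r2_rev, r2_prof)    # identical R² for both targets
print(mae_ratio)          # MAE scales by the margin, i.e. about 0.15
```

So the R² reported later says as much about revenue as about profit; per-unit margin data would be needed before the two targets tell different stories.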

Libraries Required

  • pandas # tabular wrangling
  • numpy # numerical helpers
  • matplotlib.pyplot # quick sanity plots
  • scikit‑learn # preprocessing, model, metrics
  • joblib # persist the trained pipeline

Dataset Link

Restaurant Revenue Prediction Dataset

Step-by-Step Code Implementation

Why linear regression? Restaurant EBIT often follows an additive recipe: baseline margin on sales + uplifts (prime location, upscale concept) – penalties (ageing décor, poor demographics). OLS captures those additive effects and yields coefficients managers can sanity‑check.
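To make the additive recipe concrete, here is a minimal sketch on synthetic data. The dollar figures and the `prime_location` and `age_years` drivers are invented for illustration and are not columns of the dataset; the only point is that OLS recovers uplifts and penalties when the data really are additive.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500

# Hypothetical drivers (illustrative only, not columns from the dataset)
prime_location = rng.integers(0, 2, size=n)     # 1 = prime site
age_years = rng.uniform(0, 20, size=n)          # years since opening

# Assumed additive recipe: baseline + uplift - ageing penalty + noise
profit = (200_000
          + 50_000 * prime_location
          - 3_000 * age_years
          + rng.normal(0, 5_000, size=n))

X_demo = np.column_stack([prime_location, age_years])
demo_model = LinearRegression().fit(X_demo, profit)

print("baseline :", round(demo_model.intercept_))
print("uplift   :", round(demo_model.coef_[0]))
print("penalty  :", round(demo_model.coef_[1]))
```

The fitted numbers land close to the 200,000 / 50,000 / 3,000 used to generate the data, which is the kind of sanity check a manager can run against intuition.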

1. Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error
import joblib

2. Load data & quick look

df = pd.read_csv("restaurant_revenue_train.csv")  # file from Kaggle
print(df.head())

Key columns in the file

Column        Description
Open Date     day the unit opened
City          literal city name
City Group    Big Cities / Other
Type          FC (food‑court) / IL (inline) / DT (drive‑thru) / MB (mobile)
P1 … P37      anonymised site metrics
revenue       yearly revenue (obfuscated scale)
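Before going further it can help to confirm that the loaded file really has these columns. The helper below is a hypothetical convenience (the name `missing_columns` is ours, not part of pandas), shown on a tiny stand-in frame with made-up values rather than on the real CSV.

```python
import pandas as pd

# Column names as listed in the table above
EXPECTED_COLS = (["Open Date", "City", "City Group", "Type", "revenue"]
                 + [f"P{i}" for i in range(1, 38)])

def missing_columns(frame: pd.DataFrame) -> list:
    """Return the expected columns that are absent from `frame`."""
    return [c for c in EXPECTED_COLS if c not in frame.columns]

# Stand-in frame with made-up values that deliberately lacks the P-columns
toy = pd.DataFrame({
    "Open Date":  ["07/17/2009"],
    "City":       ["Springfield"],
    "City Group": ["Other"],
    "Type":       ["IL"],
    "revenue":    [1_000_000.0],
})

gaps = missing_columns(toy)
print(len(gaps), "columns missing, starting with", gaps[:3])
```

On the real data you would call `missing_columns(df)` right after `pd.read_csv` and expect an empty list.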

3. Convert revenue → profit & feature engineer age

  • Profit derivation – We multiply the revenue target by an industry‑average 15 % net margin. If actual per‑unit margin data becomes available, swap in that column and retrain; the pipeline stays identical.
  • Standard scaling (applied later, inside the pipeline) puts the P‑features and age on comparable variance, so their coefficients read as $ profit per 1 σ change.

NET_MARGIN   = 0.15                # industry‑average net margin (assumed)
df['profit'] = df['revenue'] * NET_MARGIN

# years open as of dataset snapshot (assume snapshot on 2025‑01‑01)
snapshot = datetime(2025, 1, 1)
df['Open Date'] = pd.to_datetime(df['Open Date'])
df['YearsOpen'] = (snapshot - df['Open Date']).dt.days / 365.25
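The snapshot date is an assumption, so it is fair to ask how much it matters. Moving the snapshot shifts every restaurant's age by the same constant, and a constant shift disappears once the column is z‑scored. A small sketch with made-up opening dates (the month/day/year format is also an assumption here):

```python
import pandas as pd
from datetime import datetime
from sklearn.preprocessing import StandardScaler

# Made-up opening dates, written month/day/year (format is an assumption)
toy = pd.DataFrame({"Open Date": ["01/15/2005", "06/30/2010", "11/01/2013"]})
opened = pd.to_datetime(toy["Open Date"], format="%m/%d/%Y")

def years_open(snapshot: datetime) -> pd.Series:
    """Age in years at the given snapshot date."""
    return (snapshot - opened).dt.days / 365.25

age_2025 = years_open(datetime(2025, 1, 1))
age_2015 = years_open(datetime(2015, 1, 1))

# Raw ages differ by the same constant for every row ...
shift = age_2025 - age_2015
print(shift.round(3).tolist())

# ... but the z-scored column that reaches the model is identical
z_2025 = StandardScaler().fit_transform(age_2025.to_frame())
z_2015 = StandardScaler().fit_transform(age_2015.to_frame())
max_gap = abs(z_2025 - z_2015).max()
print(max_gap)
```

In other words, the choice of snapshot changes the intercept's bookkeeping but not the fitted effect of age.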

4. Define predictors & target

YearsOpen translates the Open Date string into a numeric age. Older outlets often earn higher profits due to brand equity, but may also suffer from cost creep; the coefficient reveals which side prevails.

num_cols = ['YearsOpen'] + [f'P{i}' for i in range(1, 38)]
cat_cols = ['City Group', 'Type', 'City']
target   = 'profit'

X = df[num_cols + cat_cols]
y = df[target]
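One thing to keep an eye on with this predictor list: one-hot encoding City creates one column per distinct city, and plain OLS tends to fit the training set almost perfectly and generalise poorly once the encoded width approaches the number of training rows. The helper below is a hypothetical sketch (the name `encoded_width` is ours) that counts the columns the model will see, shown on a toy frame.

```python
import pandas as pd

def encoded_width(frame, num_cols, cat_cols):
    """Columns the model will see after one-hot encoding the categoricals."""
    return len(num_cols) + sum(frame[c].nunique() for c in cat_cols)

# Toy frame: 4 rows, 1 numeric column, 2 categorical columns
toy = pd.DataFrame({
    "YearsOpen":  [3.0, 7.5, 12.0, 1.2],
    "City Group": ["Big Cities", "Other", "Other", "Big Cities"],
    "City":       ["A", "B", "C", "D"],
})

width = encoded_width(toy, ["YearsOpen"], ["City Group", "City"])
print(f"{width} encoded columns for {len(toy)} rows")
```

On the real data, `encoded_width(df, num_cols, cat_cols)` compared with `len(df)` gives a quick feel for whether dropping City, or grouping rare cities together, is worth trying.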

5. Pre‑processing & regression pipeline

One-hot encoding allows each city group or concept type to carry its own fixed profit offset without imposing a “distance” between categories.

preproc = ColumnTransformer([
 ('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols),
 ('num', StandardScaler(),                      num_cols)
])

linreg = LinearRegression()

pipe = Pipeline([
 ('prep',  preproc),
 ('model', linreg)
])

6. Train‑test split & training

X_train, X_test, y_train, y_test = train_test_split(
 X, y, test_size=0.2, random_state=42, shuffle=True)

pipe.fit(X_train, y_train)

7. Evaluation

Performance metrics – R² shows the share of profit variation our simple recipe explains on held‑out restaurants; MAE in dollars gives finance teams a tangible average miss per unit (e.g., about $38 k).

y_pred = pipe.predict(X_test)
print(f"R²  : {r2_score(y_test, y_pred):.3f}")
print(f"MAE : ${mean_absolute_error(y_test, y_pred):,.0f}")
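If the two metrics feel abstract, the following sketch computes them by hand on four made-up profit figures and checks the result against scikit-learn.

```python
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_error

# Made-up profits in USD, for illustration only
y_true = np.array([100_000, 150_000, 200_000, 250_000], dtype=float)
y_hat = np.array([110_000, 140_000, 190_000, 270_000], dtype=float)

# MAE: average absolute miss
mae_manual = np.mean(np.abs(y_true - y_hat))

# R²: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"MAE : ${mae_manual:,.0f}")    # $12,500
print(f"R²  : {r2_manual:.3f}")       # 0.944
```

Note that R² can go negative on a test set when the model does worse than predicting the mean, which is not unusual for small datasets.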

8. Understand profit drivers

The coefficient table quickly spotlights profit levers: a large positive bump for Type_FC might confirm that food courts are cash cows; a negative weight on City Group_Other quantifies the drag of small-town sites.

# recover encoded feature names
ohe_feats = (pipe.named_steps['prep']
 .named_transformers_['cat']
 .get_feature_names_out(cat_cols))

all_feats = list(ohe_feats) + num_cols
coef      = pd.Series(pipe.named_steps['model'].coef_, index=all_feats)\
 .sort_values()

print("\nNegative‑impact factors (lower profit):")
print(coef.head(8))

print("\nPositive‑impact factors (raise profit):")
print(coef.tail(8))

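Because numeric inputs are z‑scored, coefficients on YearsOpen or P‑features read as “profit change for a one‑σ shift.” One-hot coefficients are dollar offsets per category, but the encoder above does not drop a level, so there is no single reference category: compare coefficients within a group (for example Type_FC minus Type_IL) rather than reading each one in isolation.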

9. Persist the trained pipeline

Once saved with joblib, the pipeline is ready for your BI tool or quoting webform: call joblib.load(…), pass a new restaurant’s spec as a one‑row pandas DataFrame, and get a profit forecast back in milliseconds.

joblib.dump(pipe, "restaurant_profit_linreg.pkl")
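A round trip might look like the sketch below. It trains a deliberately tiny stand-in pipeline on made-up rows (two features instead of the full set) and writes to a temporary directory, so the file name and the data are placeholders rather than the real model.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy training data standing in for the real feature frame
toy = pd.DataFrame({
    "YearsOpen": [2.0, 5.0, 9.0, 14.0],
    "Type":      ["FC", "IL", "FC", "DT"],
})
toy_profit = [120_000.0, 150_000.0, 135_000.0, 160_000.0]

toy_pipe = Pipeline([
    ("prep", ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Type"]),
        ("num", StandardScaler(), ["YearsOpen"]),
    ])),
    ("model", LinearRegression()),
]).fit(toy, toy_profit)

# Save, reload, and score one new restaurant passed as a one-row DataFrame
path = os.path.join(tempfile.mkdtemp(), "toy_profit_linreg.pkl")
joblib.dump(toy_pipe, path)
loaded = joblib.load(path)

new_unit = pd.DataFrame([{"YearsOpen": 6.0, "Type": "IL"}])
forecast = loaded.predict(new_unit)[0]
print(f"Forecast profit: ${forecast:,.0f}")
```

The new row must carry the same column names the pipeline was trained on. Also, only load pickle files from sources you trust, and keep the scikit-learn version consistent between saving and loading.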

Summary

With about 50 lines of Python, we turned an open revenue dataset into an explainable restaurant‑profit forecaster:

  • Instant, data-backed profit projections assist in site selection, loan sizing, and concept tweaks.
  • Transparent marginal effects show how age, concept type, city size, and site metrics are associated with the bottom line.

Keep this transparent baseline in your toolbox; every gradient‑boosted tree, Bayesian network, or causal‑impact model you test next must beat its mean‑absolute‑error and still make business sense to the CFO.

ProjectGurukul Team
