Consumer Spending Growth Prediction using Polynomial Regression in ML


Financial analysts and marketing teams want to forecast week‑over‑week consumer spending growth (%) for key product categories such as groceries, apparel, and electronics, using only early‑week indicators (daily spend totals, promotional flags, price indices, and store traffic). Empirical patterns show nonlinear effects: a small promotion can trigger an outsized lift when base spend is low but saturates quickly at higher volumes, and holiday impacts interact with price indices. A plain linear model underfits this curvature, while an unregularised high‑degree polynomial overfits noise. Polynomial regression on the engineered features, combined with Ridge regularisation, can capture smooth, interpretable nonlinear growth dynamics and produce forecasts that guide pricing and inventory decisions.

Libraries Required

import pandas as pd  
import numpy as np  

import matplotlib.pyplot as plt  
import seaborn as sns  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import StandardScaler, PolynomialFeatures  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score  

Dataset

Spending Patterns Dataset

Step-by-Step Code Implementation

Load Data & Libraries

import pandas as pd
import numpy as np

# Load synthetic spending data
df = pd.read_csv("data/spending_habits.csv")

# Preview columns
df.head()[['date','category','daily_spend','promo_flag','price_index','store_traffic']]

Feature Engineering

  • Data aggregation: We roll daily spend into weekly totals per category, derive promo presence, average price index, and foot traffic, then compute week‑over‑week growth as our target.
  • PolynomialFeatures: Applied later, inside the pipeline, to expand the inputs with squares and interaction terms (e.g. daily_spend², daily_spend×promo_flag), capturing saturation and synergy effects.

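# Convert date and derive a week key
df['date'] = pd.to_datetime(df['date'])
# Use the week's start date: unlike the ISO week number alone, it stays unique if the data spans several years
df['week'] = df['date'].dt.to_period('W').dt.start_time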

# Aggregate to weekly totals by category
weekly = (df
    .groupby(['category','week'])
    .agg({
        'daily_spend':'sum',
        'promo_flag':'max',        # any promo in week
        'price_index':'mean',
        'store_traffic':'mean'
    })
    .reset_index()
    .sort_values(['category','week'])
)

# Compute week-over-week growth
weekly['spend_prev'] = (weekly
    .groupby('category')['daily_spend']
    .shift(1)
)
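weekly['growth_pct'] = (weekly['daily_spend'] - weekly['spend_prev']) / weekly['spend_prev'] * 100
# Drop each category's first week (no previous value) and weeks whose previous spend was zero (infinite growth)
weekly = weekly.replace([np.inf, -np.inf], np.nan).dropna(subset=['growth_pct'])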
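
To make the growth calculation concrete, here is a minimal sketch on a hand-made table. The `toy` frame and its values are hypothetical and are not taken from the spending dataset; the point is only to show that `shift(1)` looks up the previous week within each category, so the first week of every category has no growth value.

```python
import pandas as pd

# Hypothetical weekly totals for two categories (illustration only)
toy = pd.DataFrame({
    "category": ["grocery", "grocery", "grocery", "apparel", "apparel"],
    "week": [1, 2, 3, 1, 2],
    "daily_spend": [100.0, 110.0, 99.0, 200.0, 150.0],
})

# Previous week's spend within each category; the first week of a category gets NaN
toy["spend_prev"] = toy.groupby("category")["daily_spend"].shift(1)

# Week-over-week growth in percent
toy["growth_pct"] = (toy["daily_spend"] - toy["spend_prev"]) / toy["spend_prev"] * 100

# Only rows that have a previous week remain
print(toy.dropna(subset=["growth_pct"]))
```

Grocery grows 10% and then falls 10%, while apparel falls 25%; the two first-week rows are dropped because they have nothing to compare against.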

Define Features & Target

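# Note: 'daily_spend' here is the full weekly total, which also enters the target;
# for a genuine early-week forecast, substitute a partial (early-week) total.
X = weekly[['daily_spend','promo_flag','price_index','store_traffic']]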
y = weekly['growth_pct']

Build Polynomial Regression Pipeline

  • StandardScaler: Standardises each input feature (zero mean, unit variance) before expansion, so the polynomial terms stay on comparable scales and Ridge’s ℓ² penalty is not dominated by large‑valued features.
  • Ridge: Applies ℓ² regularisation to shrink noisy high‑degree coefficients, controlling overfitting.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge(random_state=42))
])

Train/Test Split & Hyperparameter Search

GridSearchCV: Tunes polynomial degree (1–3) and α (10⁻³–10³) with 5‑fold CV to minimise RMSE on growth‑rate forecasts.

from sklearn.model_selection import train_test_split, GridSearchCV

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7)
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)

print("Best parameters:", gs.best_params_)
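
Beyond the single best setting, it can help to see how every degree and α combination scored. The sketch below is one way to do that, run on synthetic stand-in data so it is self-contained; the names `X_demo`, `y_demo`, `demo_pipe`, and `demo_gs` are assumptions for illustration, and the same `cv_results_` lookup would apply to the fitted `gs` object above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data with a known quadratic and interaction structure (illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 2))
y_demo = (1.5 * X_demo[:, 0] ** 2
          - X_demo[:, 0] * X_demo[:, 1]
          + rng.normal(scale=0.1, size=120))

demo_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(include_bias=False)),
    ("ridge", Ridge()),
])
demo_grid = {
    "poly__degree": [1, 2, 3],
    "ridge__alpha": np.logspace(-3, 3, 7),
}

demo_gs = GridSearchCV(demo_pipe, demo_grid, cv=5,
                       scoring="neg_root_mean_squared_error")
demo_gs.fit(X_demo, y_demo)

# One row per (degree, alpha) pair; scores are negated RMSE, so flip the sign
results = pd.DataFrame(demo_gs.cv_results_)
results["cv_rmse"] = -results["mean_test_score"]
cols = ["param_poly__degree", "param_ridge__alpha", "cv_rmse"]
print(results[cols].sort_values("cv_rmse").head())
```

Because the synthetic target is quadratic, the degree 1 candidates score far worse than the rest, which is the kind of gap worth checking for before trusting a higher-degree model on real data.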

Evaluate Model

y_pred = gs.predict(X_test)
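# Take the square root explicitly: the 'squared=False' argument was deprecated in scikit-learn 1.4 and later removed
rmse = np.sqrt(mean_squared_error(y_test, y_pred))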
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE: {rmse:.2f} percentage points of growth")
print(f"Test R²  : {r2:.3f}")

Inspect Key Polynomial Coefficients

Coefficient inspection: Identifies which nonlinear or interaction terms most influence predicted growth—informing strategic levers like optimal spend levels under promotions or pricing elasticity interactions.

poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=X.columns)

coefs = gs.best_estimator_.named_steps['ridge'].coef_
# Rank terms by absolute coefficient size (the sign, i.e. the direction of the effect, is dropped here)
coef_series = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False)

plt.figure(figsize=(8,5))
coef_series.head(10).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Spending Growth")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()
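
The plot ranks terms by magnitude only. If the direction of each effect matters, one possible variation is to order by absolute value while keeping the signed coefficients. The values in `coef_demo` below are hypothetical and are not results from this model.

```python
import pandas as pd

# Hypothetical signed coefficients (illustration only, not fitted values)
coef_demo = pd.Series({
    "daily_spend": -0.8,
    "promo_flag": 1.2,
    "daily_spend promo_flag": -0.3,
    "daily_spend^2": 0.05,
})

# Sort by magnitude, then look up the original signed values in that order
order = coef_demo.abs().sort_values(ascending=False).index
signed_top = coef_demo.reindex(order)
print(signed_top)
```

Keep in mind that these coefficients apply to standardised inputs, so they describe effects per standard deviation rather than per raw unit.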

Summary

This Polynomial Regression pipeline delivers:

1. Nonlinear forecasts of consumer spending growth that capture saturation and synergy effects, with accuracy judged by the held‑out test RMSE and R².

2. Robust complexity control through Ridge regularisation, avoiding spurious high‑order effects.

3. Clear interpretability: top polynomial features highlight actionable drivers—e.g. how spend levels and promotions interact—to guide pricing and inventory strategies.


ProjectGurukul Team

The ProjectGurukul Team delivers project-based tutorials on programming, machine learning, and web development. We simplify learning by providing hands-on projects to help you master real-world skills. We also provide free major and minor projects for engineering students.
