Consumer Spending Growth Prediction using Polynomial Regression in ML
Financial analysts and marketing teams want to forecast week-over-week consumer spending growth (%) for key product categories, such as groceries, apparel, and electronics, using only early-week indicators (daily spend totals, promotional flags, price indices, and store traffic). Empirical patterns show nonlinear effects: small promotions can trigger outsized lift at low base spend but saturate quickly at higher volumes, and holiday impacts interact with price indices. A plain linear model underfits these curvatures, while an unregularised high-degree polynomial overfits noise. Polynomial regression on engineered features, combined with Ridge regularisation, can capture smooth, interpretable nonlinear growth dynamics and produce forecasts that guide pricing and inventory decisions.
Libraries Required
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score
Dataset
Step-by-Step Code Implementation
Load Data & Libraries
import pandas as pd
import numpy as np
# Load synthetic spending data
df = pd.read_csv("data/spending_habits.csv")
# Preview columns
df.head()[['date','category','daily_spend','promo_flag','price_index','store_traffic']]
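The CSV itself is not shown here. For readers who want to run the steps end to end, the following is a minimal sketch that builds a stand-in frame with the same six columns. The category names come from the introduction; the date range, distributions, seed, and the name demo_df are purely illustrative assumptions and do not reproduce the original data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
dates = pd.date_range("2024-01-01", periods=84, freq="D")  # 12 full weeks (assumed range)
categories = ["groceries", "apparel", "electronics"]

rows = []
for cat in categories:
    for d in dates:
        promo = int(rng.random() < 0.2)  # roughly one promo day in five (assumption)
        base_spend = float(rng.gamma(shape=5.0, scale=200.0))
        rows.append({
            "date": d,
            "category": cat,
            "daily_spend": round(base_spend * (1 + 0.3 * promo), 2),
            "promo_flag": promo,
            "price_index": round(float(rng.normal(100, 3)), 2),
            "store_traffic": int(rng.poisson(500)),
        })

demo_df = pd.DataFrame(rows)
print(demo_df.head())
```

Writing demo_df to disk with demo_df.to_csv(..., index=False) at the path used above would let the remaining steps run unchanged, although any metrics obtained that way describe the stand-in data only.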
Feature Engineering
- PolynomialFeatures: Expands inputs with squares and interaction terms (e.g. daily_spend², daily_spend×promo_flag), capturing saturation and synergy effects.
- Data aggregation: We roll daily spend into weekly totals per category, derive promo presence, average price index, and foot traffic, then compute week‑over‑week growth as our target.
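# Convert date and derive a week key
df['date'] = pd.to_datetime(df['date'])
# Use the week's start date rather than the ISO week number alone,
# so the same week number in different years is not merged
df['week'] = df['date'].dt.to_period('W').dt.start_time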
# Aggregate to weekly totals by category
weekly = (
    df
    .groupby(['category', 'week'])
    .agg({
        'daily_spend': 'sum',
        'promo_flag': 'max',      # any promo in week
        'price_index': 'mean',
        'store_traffic': 'mean'
    })
    .reset_index()
    .sort_values(['category', 'week'])
)
# Compute week-over-week growth
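weekly['spend_prev'] = (
    weekly
    .groupby('category')['daily_spend']
    .shift(1)
)
weekly['growth_pct'] = (weekly['daily_spend'] - weekly['spend_prev']) / weekly['spend_prev'] * 100
# Drop each category's first week (no previous value) and any week that
# follows zero spend, where the growth rate would be infinite
weekly = weekly.replace([np.inf, -np.inf], np.nan).dropna(subset=['growth_pct'])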
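The week-over-week calculation can be checked by hand on a tiny frame. This sketch uses made-up weekly totals for two categories and a hypothetical frame name, toy; it only illustrates the grouped shift and the percentage formula.

```python
import pandas as pd

toy = pd.DataFrame({
    "category": ["apparel"] * 3 + ["groceries"] * 3,
    "week": [1, 2, 3, 1, 2, 3],
    "daily_spend": [100.0, 120.0, 90.0, 200.0, 200.0, 250.0],  # weekly totals (made up)
})

# Previous week's total within each category; the first week has no predecessor
toy["spend_prev"] = toy.groupby("category")["daily_spend"].shift(1)
toy["growth_pct"] = (toy["daily_spend"] - toy["spend_prev"]) / toy["spend_prev"] * 100

print(toy)
# apparel:   NaN, 20.0, -25.0
# groceries: NaN,  0.0,  25.0
```

Because the shift is applied per category, the first groceries week does not inherit the last apparel total as its previous value.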
Define Features & Target
X = weekly[['daily_spend','promo_flag','price_index','store_traffic']]
y = weekly['growth_pct']
Build Polynomial Regression Pipeline
- StandardScaler: Normalises each feature so Ridge’s ℓ² penalty treats them evenly, preventing high‑variance terms from dominating.
- Ridge: Applies ℓ² regularisation to shrink noisy high‑degree coefficients, controlling overfitting.
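The shrinkage effect described in these bullets can be seen by fitting the same cubic pipeline at several penalty strengths. The following is a sketch on synthetic data: the saturating response curve, the noise level, and the three α values are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=(60, 1))
# Saturating response plus noise (assumed shape, for illustration only)
y_sat = 10 * (1 - np.exp(-x[:, 0])) + rng.normal(0, 0.5, size=60)

norms = {}
for alpha in [0.001, 1.0, 1000.0]:
    model = Pipeline([
        ("scale", StandardScaler()),
        ("poly", PolynomialFeatures(degree=3, include_bias=False)),
        ("ridge", Ridge(alpha=alpha)),
    ])
    model.fit(x, y_sat)
    # Euclidean norm of the fitted polynomial coefficients
    norms[alpha] = float(np.linalg.norm(model.named_steps["ridge"].coef_))
    print(f"alpha={alpha:>8}: coefficient norm = {norms[alpha]:.3f}")
```

The coefficient norm falls as α grows, which is the mechanism that keeps high-degree terms from chasing noise; too large an α flattens the curve and underfits, which is why α is tuned rather than fixed.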
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge(random_state=42))
])
Train/Test Split & Hyperparameter Search
GridSearchCV: Tunes polynomial degree (1–3) and α (10⁻³–10³) with 5‑fold CV to minimise RMSE on growth‑rate forecasts.
from sklearn.model_selection import train_test_split, GridSearchCV
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
param_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7)
}
gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)
print("Best parameters:", gs.best_params_)
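Beyond the single best setting, the full cross-validation table shows how sensitive the error is to degree and α. This is a self-contained sketch on synthetic quadratic data with a reduced grid; the data, the grid values, and the names demo_pipe and demo_gs are hypothetical stand-ins for the objects above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(80, 2))
# Target with a squared term and an interaction (assumed, for illustration)
y_demo = X_demo[:, 0] ** 2 + 0.5 * X_demo[:, 0] * X_demo[:, 1] + rng.normal(0, 0.1, 80)

demo_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(include_bias=False)),
    ("ridge", Ridge()),
])
demo_gs = GridSearchCV(
    demo_pipe,
    {"poly__degree": [1, 2, 3], "ridge__alpha": [0.01, 1.0, 100.0]},
    cv=5,
    scoring="neg_root_mean_squared_error",
)
demo_gs.fit(X_demo, y_demo)

# The scorer is negated RMSE, so flip the sign to read it as an error
results = pd.DataFrame(demo_gs.cv_results_)
results["cv_rmse"] = -results["mean_test_score"]
table = results.pivot(index="param_poly__degree",
                      columns="param_ridge__alpha",
                      values="cv_rmse")
print(table.round(3))
```

A row of the table that is uniformly worse than the others (here, degree 1 on curved data) signals underfitting, while a best cell at the edge of the grid suggests widening the search range.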
Evaluate Model
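from sklearn.metrics import mean_squared_error, r2_score
y_pred = gs.predict(X_test)
# Take the square root explicitly; the squared=False argument is
# deprecated in newer scikit-learn releases
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"Test RMSE: {rmse:.2f}% growth")
print(f"Test R² : {r2:.3f}")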
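For readers who want to see what the two metrics measure, this sketch recomputes them from their definitions on four made-up growth rates and compares the results with scikit-learn. The arrays y_true and y_hat are illustrative values, not model output.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([2.0, -1.0, 4.0, 3.0])  # illustrative growth rates (%)
y_hat = np.array([1.0, 0.0, 4.0, 5.0])    # illustrative predictions (%)

# RMSE: square root of the mean squared residual, in the target's units
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))

# R²: 1 minus residual sum of squares over total sum of squares
ss_res = np.sum((y_true - y_hat) ** 2)          # 1 + 1 + 0 + 4 = 6
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # 0 + 9 + 4 + 1 = 14
r2_manual = 1 - ss_res / ss_tot

print(f"RMSE: {rmse_manual:.4f}")  # sqrt(1.5) ≈ 1.2247
print(f"R²  : {r2_manual:.4f}")    # 1 - 6/14 ≈ 0.5714
```

RMSE is expressed in percentage points of growth, so it can be compared directly with the size of the swings the business cares about; R² is unitless and compares the model against always predicting the mean.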
Inspect Key Polynomial Coefficients
Coefficient inspection: Identifies which nonlinear or interaction terms most influence predicted growth—informing strategic levers like optimal spend levels under promotions or pricing elasticity interactions.
poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=X.columns)
coefs = gs.best_estimator_.named_steps['ridge'].coef_
import matplotlib.pyplot as plt
coef_series = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False)
plt.figure(figsize=(8,5))
coef_series.head(10).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Spending Growth")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()
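Ranking by absolute value hides whether a term raises or lowers predicted growth. One possible variation, sketched here with made-up coefficient values and a hypothetical series name coef_demo, orders terms by magnitude while keeping their signs.

```python
import pandas as pd

# Made-up coefficients keyed by PolynomialFeatures-style names (illustrative only)
coef_demo = pd.Series({
    "daily_spend": -0.8,
    "promo_flag": 1.5,
    "daily_spend promo_flag": -2.1,
    "price_index^2": 0.3,
})

# Sort the index by absolute size, then look up the original signed values
order = coef_demo.abs().sort_values(ascending=False).index
ranked = coef_demo.reindex(order)
print(ranked)
```

In this toy ranking the interaction term comes first with a negative sign, the kind of pattern that would correspond to promotions lifting growth less at high spend levels. Because the pipeline standardises inputs before expansion, such coefficients are on the scaled features rather than in original units.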
Summary
This Polynomial Regression pipeline delivers:
1. Nonlinear forecasts of consumer spending growth that capture saturation and synergy effects, with accuracy judged by the held-out test RMSE and R².
2. Robust complexity control through Ridge regularisation, avoiding spurious high‑order effects.
3. Clear interpretability: top polynomial features highlight actionable drivers—e.g. how spend levels and promotions interact—to guide pricing and inventory strategies.