Sales Forecast Accuracy Prediction with Lasso Regression in ML
Demand-planning teams often discover after the fact that yesterday's sales forecasts missed badly on specific store–SKU combinations. By then, safety stocks are gone and expedites are expensive. This project develops a Lasso-regularised linear model that predicts the likely absolute percentage error (APE) of a time-series forecast before the selling day arrives.
If the model indicates a high error, planners can review, apply judgmental overrides, or upgrade the algorithm only where necessary.
Libraries Required
| Role | Library |
| --- | --- |
| Data handling | pandas, numpy |
| Visualisation | matplotlib, seaborn |
| ML pipeline | scikit-learn → ColumnTransformer, OneHotEncoder, StandardScaler, Pipeline, Lasso, train_test_split, GridSearchCV |
| Accuracy metrics | mean_absolute_percentage_error, r2_score (from sklearn.metrics) |
Dataset Link
Store Item Demand Forecasting Challenge
Step-by-Step Code Implementation
1. Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_percentage_error, r2_score
2. Download & load dataset
The dataset holds five years of daily sales for 500 store-item series (10 stores × 50 items), which is granular enough to extract volatility and calendar effects.
# One‑time terminal command (needs Kaggle API key):
# kaggle competitions download -c demand-forecasting-kernels-only -p data --unzip
df = pd.read_csv("data/train.csv", parse_dates=['date']) # 913 k rows, 10 stores × 50 items
3. Baseline forecast & label engineering
For every record, we create a 7‑day moving‑average forecast (simple yet realistic) and compute its APE; that error now becomes a supervised target the meta‑model tries to predict.
# Ensure chronological order
df = df.sort_values(['store', 'item', 'date'])
# 7‑day moving‑average forecast (shifted so it doesn't peek ahead)
df['ma_7'] = df.groupby(['store', 'item'])['sales'] \
.transform(lambda s: s.rolling(7).mean().shift(1))
# Remove first 7 days per series (NaN forecast)
df = df.dropna(subset=['ma_7'])
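# Absolute-percentage error the MA forecast makes on the target day.
# Zero-sales days are masked and dropped to guard against division by zero.
df['APE'] = (df['sales'] - df['ma_7']).abs() / df['sales'].replace(0, np.nan)
df = df.dropna(subset=['APE'])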
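To make the label construction concrete, here is a minimal sketch on a made-up eight-day series (the numbers are hypothetical, not taken from the Kaggle data). It shows that the shifted 7-day average leaves the first seven days without a forecast, and how the APE of the eighth day is derived.

```python
import pandas as pd

# Hypothetical toy series: eight days of sales for one store-item pair
toy = pd.DataFrame({
    "date": pd.date_range("2017-01-01", periods=8, freq="D"),
    "sales": [10, 12, 11, 13, 12, 14, 12, 20],
})

# Mean of the previous 7 days, shifted so day t never sees its own sales
toy["ma_7"] = toy["sales"].rolling(7).mean().shift(1)

# APE of that forecast; only day 8 has a full 7-day history behind it
toy["APE"] = (toy["sales"] - toy["ma_7"]).abs() / toy["sales"]

print(toy.tail(1))
```

On day 8 the forecast is the mean of the first seven values (12), actual sales are 20, and the APE is therefore 8 / 20 = 0.4.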
4. Feature creation (all known before the target day)
We use only signals available before the selling day: recent level (ma_7), recent volatility (vol_14), store ID, item ID, and calendar dummies. This avoids leakage from actual future sales.
# Calendar signals
df['dow'] = df['date'].dt.dayofweek # Monday=0
df['month'] = df['date'].dt.month
df['is_weekend'] = df['dow'].isin([5,6]).astype(int)
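# Recent volatility (std of last 14 days, shifted)
df['vol_14'] = df.groupby(['store', 'item'])['sales'] \
    .transform(lambda s: s.rolling(14).std().shift(1))
num_cols = ['ma_7', 'vol_14']
cat_cols = ['store', 'item', 'dow', 'month', 'is_weekend']
# Keep only rows with a full 14-day history and a finite target;
# NaN or infinite values would make the Lasso fit fail
df = df.dropna(subset=num_cols)
df = df[np.isfinite(df['APE'])]
X = df[num_cols + cat_cols]
y = df['APE']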
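One way to check the "no leakage" claim is to perturb the target day's sales and confirm that its features do not change. The sketch below does this on a synthetic series; `make_features` is a hypothetical helper that mirrors the two rolling features above for a single series.

```python
import numpy as np
import pandas as pd

def make_features(sales: pd.Series) -> pd.DataFrame:
    """Rebuild the two numeric meta-features for a single series."""
    return pd.DataFrame({
        "ma_7": sales.rolling(7).mean().shift(1),
        "vol_14": sales.rolling(14).std().shift(1),
    })

# Synthetic 30-day series (hypothetical values)
rng = np.random.default_rng(0)
sales = pd.Series(rng.integers(5, 50, size=30)).astype(float)
original = make_features(sales)

# Perturb only the last day's sales; that day's features must not move
perturbed_sales = sales.copy()
perturbed_sales.iloc[-1] += 1000
perturbed = make_features(perturbed_sales)

last_day_unchanged = bool(np.allclose(original.iloc[-1], perturbed.iloc[-1]))
print("Last-day features unchanged:", last_day_unchanged)
```

If the `shift(1)` were removed, the same check would fail, because the rolling window would then include the target day's own sales.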
5. Pre‑process & model pipeline
ColumnTransformer one‑hot‑encodes categorical variables and scales numerics; wrapping inside a Pipeline prevents data leakage through CV folds.
prep = ColumnTransformer([
    ('num', StandardScaler(), num_cols),
    # `sparse_output` replaces the `sparse` argument in scikit-learn >= 1.2
    ('cat', OneHotEncoder(drop='first', sparse_output=False), cat_cols)
])
pipe = Pipeline([
    ('prep', prep),
    ('model', Lasso(max_iter=20_000, random_state=42))
])
6. Train / validation split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, shuffle=True)
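A random shuffle mixes days from the same period into both sets, and neighbouring days share most of their rolling windows. As a hedged alternative, the sketch below splits by date so that every test row is later than every training row; `time_based_split` is a hypothetical helper, not part of scikit-learn.

```python
import pandas as pd

def time_based_split(frame: pd.DataFrame, date_col: str, test_fraction: float = 0.2):
    """Split so that every test row is dated after every training row."""
    unique_dates = frame[date_col].sort_values().unique()
    cutoff = unique_dates[int(len(unique_dates) * (1 - test_fraction))]
    train = frame[frame[date_col] < cutoff]
    test = frame[frame[date_col] >= cutoff]
    return train, test

# Hypothetical toy frame: two series, ten days each
toy = pd.DataFrame({
    "date": list(pd.date_range("2017-01-01", periods=10)) * 2,
    "store": [1] * 10 + [2] * 10,
    "APE": [0.1] * 20,
})

train, test = time_based_split(toy, "date")
print(len(train), len(test))
```

Applied to the real data, the same idea would mean splitting `df` on its `date` column before building `X` and `y`. Scores from a chronological hold-out are usually less flattering but closer to how the model would be used in production.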
7. Hyper‑parameter search (α shrinkage)
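Lasso's L1 penalty drives weak coefficients to exactly zero, and α controls how aggressively it does so. A log-spaced grid search picks the α with the best cross-validated score, so the final model retains only the strongest meta-features, which makes the insights easier for planners to digest.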
param_grid = {'model__alpha': np.logspace(-3, 0, 15)} # 0.001 → 1
cv_search = GridSearchCV(pipe, param_grid,
                         cv=3,
                         scoring='neg_mean_absolute_percentage_error',
                         n_jobs=-1)
cv_search.fit(X_train, y_train)
print("Best α:", cv_search.best_params_['model__alpha'])
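The link between α and sparsity can be illustrated on synthetic data. In this sketch (all data is generated, and the α values are picked only for illustration), just two of ten features drive the target, and the count of non-zero coefficients shrinks as α grows.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(500, 10))
# Only the first two features drive the synthetic target
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=500)

nonzero_counts = {}
for alpha in [0.001, 0.1, 1.0, 10.0]:
    model = Lasso(alpha=alpha, max_iter=20_000).fit(X_demo, y_demo)
    nonzero_counts[alpha] = int(np.sum(model.coef_ != 0))
    print(f"alpha={alpha:<6} non-zero coefficients: {nonzero_counts[alpha]}")
```

A very large α zeroes out every coefficient, which is why the grid in this project stops at 1.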
8. Hold‑out evaluation
We report MAPE, which here is the percentage error of the predicted APE relative to the realised APE, and R² (the share of variance in APE that the model explains). A low MAPE suggests that planners can trust the early-warning signal.
y_pred = cv_search.predict(X_test)
mape = mean_absolute_percentage_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Test MAPE: {mape:.2%} | R²: {r2:.3f}")
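One caveat when the target is itself an APE: days that the baseline forecast almost exactly right have a realised APE near zero, and MAPE divides by that value. The sketch below uses made-up numbers to show how a single such day can dominate the score, and reports mean absolute error (MAE) alongside as a more stable companion metric. Adding MAE is a suggestion here, not part of the original evaluation.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error

# Hypothetical realised APEs; the second day was forecast almost perfectly
ape_true = np.array([0.20, 0.001, 0.30, 0.25])
ape_pred = np.array([0.22, 0.10, 0.28, 0.24])

demo_mape = mean_absolute_percentage_error(ape_true, ape_pred)
demo_mae = mean_absolute_error(ape_true, ape_pred)

print(f"MAPE: {demo_mape:.2f}  MAE: {demo_mae:.4f}")
```

The predictions are off by at most 0.1 in APE terms, yet the near-zero target inflates the MAPE far beyond that, so reading both metrics together gives a fairer picture.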
9. Interpreting top drivers
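Non-zero coefficients indicate where the baseline forecast is likely to struggle, e.g., a particular item, weekends, or high-volatility series. Because the model is additive, each coefficient is a separate effect rather than an interaction such as one specific item on weekends. Those segments deserve either manual review or a more sophisticated forecasting algorithm.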
ohe = cv_search.best_estimator_.named_steps['prep'] \
.named_transformers_['cat']
feature_names = np.hstack([num_cols, ohe.get_feature_names_out(cat_cols)])
coef = cv_search.best_estimator_.named_steps['model'].coef_
importance = pd.Series(coef, index=feature_names).sort_values(key=abs, ascending=False)
plt.figure(figsize=(9,6))
importance.head(20).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title('Top Drivers of Forecast Error (Lasso coefficients)')
plt.xlabel('Coefficient (Δ APE)')
plt.show()
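Beyond the chart, it can help to list explicitly which features survived the L1 penalty and which were dropped. A minimal sketch, using hypothetical coefficient values in place of the fitted `coef_` array:

```python
import pandas as pd

# Hypothetical coefficients, named the way the encoder would name them
coef_demo = pd.Series({
    "ma_7": -0.012, "vol_14": 0.034, "store_2": 0.0,
    "item_42": 0.021, "dow_5": 0.0, "month_12": -0.008,
})

# Features kept by Lasso, ordered by absolute effect size
selected = coef_demo[coef_demo != 0].sort_values(key=abs, ascending=False)
# Features whose coefficients were shrunk to exactly zero
dropped = coef_demo.index[coef_demo == 0].tolist()

print(selected)
print("Dropped by Lasso:", dropped)
```

The same two lines would work on the `importance` series built above, since it is indexed by feature name.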
Summary
With under 150 lines of code, we converted raw transactional data into a forecast‑of‑forecast‑error tool:
- Predicts next‑day APE for each store–SKU using only pre‑day meta‑features.
- Surfaces key risk drivers via Lasso’s built‑in feature selection, guiding limited analyst time to the worst offenders.
- Pipeline architecture means monthly retraining is a single line (fit)—no messy re-engineering.
Adopting this lightweight, interpretable approach enables demand-planning teams to transition from reactive firefighting to proactive accuracy management—ultimately reducing stock-outs, rush shipments, and write-offs.
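As a closing sketch of how predicted errors might be routed to planners, the snippet below builds a review queue from made-up predictions. The 0.25 threshold and the column name `predicted_ape` are assumptions for illustration; in practice the cut-off would be tuned to how many items planners can review per day.

```python
import pandas as pd

# Hypothetical predicted APEs for five store-item-days
preds = pd.DataFrame({
    "store": [1, 1, 2, 2, 3],
    "item": [42, 7, 42, 13, 7],
    "predicted_ape": [0.45, 0.08, 0.31, 0.12, 0.27],
})

REVIEW_THRESHOLD = 0.25  # assumed cut-off; tune to planner capacity

# Keep only high-risk rows, worst first
review_queue = (
    preds[preds["predicted_ape"] > REVIEW_THRESHOLD]
    .sort_values("predicted_ape", ascending=False)
    .reset_index(drop=True)
)
print(review_queue)
```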