Retail Inventory Prediction using Quantile Regression in ML
FREE Online Courses: Click, Learn, Succeed, Start Now!
Retailers must not only forecast average demand but also understand inventory requirements under varying demand scenarios—from slow‐moving items (10th percentile demand) to best‐sellers (90th percentile demand). Holding excess stock ties up capital and incurs spoilage risk, while stockouts erode sales and customer loyalty.
In this project, we’ll predict the 10th, 50th, and 90th percentiles of monthly item sales using the “Predict Future Sales” dataset—leveraging quantile regression to model the lower, median, and upper tails of the demand distribution based on features like shop, item, category, historic sales, and price trends. The resulting quantile forecasts will enable inventory managers to set safety stock for lean months, plan baseline inventory, and provision for peak demand—optimizing service levels and working‑capital efficiency.
Libraries Required
import pandas as pd # Data loading & manipulation import numpy as np # Numerical operations import statsmodels.formula.api as smf # Quantile regression via formula API from sklearn.model_selection import train_test_split # Train/test split from sklearn.metrics import mean_pinball_loss # Proper loss for quantile forecasts
Dataset
Step-by-Step Code Implementation
Load & Inspect Data
We read daily sales and item metadata. We convert the date column to datetime, extract a month, and aggregate daily counts to monthly_sales and mean price per shop–item pair.
# Downloaded from Kaggle competition “Predict Future Sales” :contentReference[oaicite:0]{index=0}
sales = pd.read_csv("sales_train.csv")
items = pd.read_csv("items.csv")
shops = pd.read_csv("shops.csv")
item_cats = pd.read_csv("item_categories.csv")
# Aggregate daily sales to monthly sales per shop–item
sales['date'] = pd.to_datetime(sales['date'], format='%d.%m.%Y')
sales['month'] = sales['date'].dt.to_period('M')
monthly = (
sales.groupby(['month','shop_id','item_id'])
.agg(monthly_sales=('item_cnt_day','sum'),
avg_price=('item_price','mean'))
.reset_index()
)
print(monthly.head())
print(monthly.describe())
Feature Engineering
- We map each month to an integer month_num for ordering.
- We create a lag feature lag_1 (previous month’s sales) to capture temporal autocorrelation.
- We merge the item_category_id to encode demand heterogeneity across product groups.
# Add lag features and category info monthly['month_num'] = monthly['month'].dt.month + 12*(monthly['month'].dt.year - 2013) # Lagged sales (previous month) as predictor monthly['lag_1'] = monthly.groupby(['shop_id','item_id'])['monthly_sales'].shift(1).fillna(0) # Merge item category monthly = monthly.merge(items[['item_id','item_category_id']], on='item_id', how='left')
Train/Test Split
We use the first 30 months (~Jan 2013–Jun 2015) to train quantile models and reserve the last 3 months (~Jul–Sep 2015) for out‐of‐sample evaluation.
# Use months 1–30 for training, 31–33 as “test” (holdout) for evaluation train = monthly[monthly['month_num'] <= 30] test = monthly[monthly['month_num'] > 30]
Fit Quantile Regression Models
For each target percentile (10th, 50th, 90th):
- We specify a formula linking monthly_sales to the features.
- We fit a QuantReg model on the training set.
- We print the coefficient table, showing how predictors (lag, price, category, shop effects) influence different demand quantiles—e.g., price sensitivity may be greater in the upper tail.
quantiles = [0.10, 0.50, 0.90]
results = {}
features = ['lag_1','avg_price','item_category_id','shop_id']
formula = "monthly_sales ~ " + " + ".join(features)
for q in quantiles:
mod = smf.quantreg(formula, train)
res = mod.fit(q=q)
results[q] = res
print(f"\n--- {int(q*100)}th Percentile Coefficients ---")
print(res.summary().tables[1])
Evaluation with Pinball Loss
- We predict quantile‐specific monthly sales on the holdout set.
- We compute pinball loss for each quantile, a proper scoring rule that penalizes missed quantiles asymmetrically. Lower pinball loss indicates more precise quantile calibration and thus better inventory sizing across demand scenarios.
for q, res in results.items():
preds = res.predict(test[features])
loss = mean_pinball_loss(test['monthly_sales'], preds, alpha=q)
print(f"{int(q*100)}th quantile pinball loss: {loss:.2f}")
Summary
By applying quantile regression to the “Predict Future Sales” dataset, we obtain distribution‐aware demand forecasts:
- The 10th percentile model guides safety stock for slow‐moving items, minimizing overstock.
- The median (50th percentile) model predicts baseline inventory for typical demand.
- The 90th percentile model informs peak‐season provisioning, preventing stockouts when demand surges.
These quantile forecasts empower retail inventory planners with a nuanced, risk‐sensitive approach—balancing capital efficiency against service levels under demand uncertainty.