Retail Inventory Prediction using Quantile Regression in ML

FREE Online Courses: Click, Learn, Succeed, Start Now!

Retailers must not only forecast average demand but also understand inventory requirements under varying demand scenarios—from slow‐moving items (10th percentile demand) to best‐sellers (90th percentile demand). Holding excess stock ties up capital and incurs spoilage risk, while stockouts erode sales and customer loyalty.

In this project, we’ll predict the 10th, 50th, and 90th percentiles of monthly item sales using the “Predict Future Sales” dataset—leveraging quantile regression to model the lower, median, and upper tails of the demand distribution based on features like shop, item, category, historic sales, and price trends. The resulting quantile forecasts will enable inventory managers to set safety stock for lean months, plan baseline inventory, and provision for peak demand—optimizing service levels and working‑capital efficiency.

Libraries Required

import pandas as pd                               # Data loading & manipulation  
import numpy as np                                # Numerical operations  
import statsmodels.formula.api as smf             # Quantile regression via formula API  
from sklearn.model_selection import train_test_split  # Train/test split  
from sklearn.metrics import mean_pinball_loss        # Proper loss for quantile forecasts  

Dataset

Predict Future Sales

Step-by-Step Code Implementation

Load & Inspect Data

We read daily sales and item metadata. We convert the date column to datetime, extract a month, and aggregate daily counts to monthly_sales and mean price per shop–item pair.

# Downloaded from Kaggle competition “Predict Future Sales” :contentReference[oaicite:0]{index=0}
sales = pd.read_csv("sales_train.csv")
items = pd.read_csv("items.csv")
shops = pd.read_csv("shops.csv")
item_cats = pd.read_csv("item_categories.csv")

# Aggregate daily sales to monthly sales per shop–item
sales['date'] = pd.to_datetime(sales['date'], format='%d.%m.%Y')
sales['month'] = sales['date'].dt.to_period('M')
monthly = (
    sales.groupby(['month','shop_id','item_id'])
         .agg(monthly_sales=('item_cnt_day','sum'),
              avg_price=('item_price','mean'))
         .reset_index()
)
print(monthly.head())
print(monthly.describe())

Feature Engineering

  • We map each month to an integer month_num for ordering.
  • We create a lag feature lag_1 (previous month’s sales) to capture temporal autocorrelation.
  • We merge the item_category_id to encode demand heterogeneity across product groups.
# Add lag features and category info
monthly['month_num'] = monthly['month'].dt.month + 12*(monthly['month'].dt.year - 2013)

# Lagged sales (previous month) as predictor
monthly['lag_1'] = monthly.groupby(['shop_id','item_id'])['monthly_sales'].shift(1).fillna(0)

# Merge item category
monthly = monthly.merge(items[['item_id','item_category_id']], on='item_id', how='left')

Train/Test Split

We use the first 30 months (~Jan 2013–Jun 2015) to train quantile models and reserve the last 3 months (~Jul–Sep 2015) for out‐of‐sample evaluation.

# Use months 1–30 for training, 31–33 as “test” (holdout) for evaluation
train = monthly[monthly['month_num'] <= 30]
test  = monthly[monthly['month_num'] > 30]

Fit Quantile Regression Models

For each target percentile (10th, 50th, 90th):

  • We specify a formula linking monthly_sales to the features.
  • We fit a QuantReg model on the training set.
  • We print the coefficient table, showing how predictors (lag, price, category, shop effects) influence different demand quantiles—e.g., price sensitivity may be greater in the upper tail.
quantiles = [0.10, 0.50, 0.90]
results   = {}
features  = ['lag_1','avg_price','item_category_id','shop_id']

formula = "monthly_sales ~ " + " + ".join(features)
for q in quantiles:
    mod = smf.quantreg(formula, train)
    res = mod.fit(q=q)
    results[q] = res
    print(f"\n--- {int(q*100)}th Percentile Coefficients ---")
    print(res.summary().tables[1])

Evaluation with Pinball Loss

  • We predict quantile‐specific monthly sales on the holdout set.
  • We compute pinball loss for each quantile, a proper scoring rule that penalizes missed quantiles asymmetrically. Lower pinball loss indicates more precise quantile calibration and thus better inventory sizing across demand scenarios.
for q, res in results.items():
    preds = res.predict(test[features])
    loss  = mean_pinball_loss(test['monthly_sales'], preds, alpha=q)
    print(f"{int(q*100)}th quantile pinball loss: {loss:.2f}")

Summary

By applying quantile regression to the “Predict Future Sales” dataset, we obtain distribution‐aware demand forecasts:

  • The 10th percentile model guides safety stock for slow‐moving items, minimizing overstock.
  • The median (50th percentile) model predicts baseline inventory for typical demand.
  • The 90th percentile model informs peak‐season provisioning, preventing stockouts when demand surges.

These quantile forecasts empower retail inventory planners with a nuanced, risk‐sensitive approach—balancing capital efficiency against service levels under demand uncertainty.

Did we exceed your expectations?
If Yes, share your valuable feedback on Google | Facebook

ProjectGurukul Team

The ProjectGurukul Team delivers project-based tutorials on programming, machine learning, and web development. We simplify learning by providing hands-on projects to help you master real-world skills. We also provide free major and minor projects for enginering students.

Leave a Reply

Your email address will not be published. Required fields are marked *