Inventory Management Cost Prediction with Lasso Regression in ML

Carrying too much stock ties up cash and inflates warehouse bills, while carrying too little risks lost sales and expensive rush shipping. Accurately estimating the holding-and-handling cost for the next replenishment cycle, before inventory is ordered, helps planners find the right balance.

We will build a Lasso‑regularised linear model that:

  • Predicts the expected inventory‑management cost (USD) for each product‑week using signals that are known in advance (historical demand, product family, weight‑class, shelf‑life indicator, lead time, etc.).
  • Shrinks uninformative predictors to zero, surfacing the handful of drivers (e.g., bulky items with slow turns) that deserve special attention.

Because Lasso's ℓ₁ penalty produces a sparse, interpretable model, supply-chain managers can see which factors push costs up and act before purchase orders go out.
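To make the sparsity claim concrete, here is a minimal sketch on synthetic data (not the Bimbo dataset). The sample size, the true coefficients, and the α value are assumptions chosen purely for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_samples, n_features = 500, 10

# Synthetic design matrix: only the first two columns influence the target.
X_demo = rng.normal(size=(n_samples, n_features))
y_demo = 5.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=n_samples)

lasso_demo = Lasso(alpha=0.5).fit(X_demo, y_demo)

# Indices of the predictors that the l1 penalty left in the model.
kept = np.flatnonzero(lasso_demo.coef_)
print("Non-zero coefficients at columns:", kept)
print("Their values:", lasso_demo.coef_[kept].round(2))
```

The eight noise columns are shrunk exactly to zero, and the two informative coefficients come out somewhat smaller in magnitude than their true values, which is the price Lasso pays for sparsity.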

Libraries Required

  • Data handling: pandas, numpy
  • Visuals: matplotlib, seaborn
  • ML pipeline: scikit-learn (ColumnTransformer, OneHotEncoder, StandardScaler, Pipeline, Lasso, GridSearchCV)
  • Metrics: scikit-learn (mean_squared_error, r2_score)

Dataset Link

Grupo Bimbo Inventory Demand (Kaggle competition: grupo-bimbo-inventory-demand)

Step-by-Step Code Implementation

1. Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score

2. Download and load the dataset

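Grupo Bimbo provides nine weeks of sales and returns data plus SKU metadata, covering about 74 M transactions. We enrich it with SKU weight (Peso_kg, which is not a column of train.csv and must be joined in from product metadata) and engineer demand lags.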

# One‑time shell command (needs Kaggle API):
# kaggle competitions download -c grupo-bimbo-inventory-demand -p data --unzip

df = pd.read_csv("data/train.csv")   # ~74 M rows; load a subset if RAM-limited
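If the full file does not fit in memory, one option is to read it in chunks with compact dtypes and keep only a subset of products, so that every week for those products is preserved for the lag features later on. This is a sketch: `load_train_subset`, the dtype choices, and the chunk size are hypothetical, and the column names are simply the ones used elsewhere in this tutorial.

```python
import pandas as pd

# Columns used in this tutorial, with compact dtypes to reduce memory use.
TRAIN_DTYPES = {
    "Semana": "int8",
    "Agencia_ID": "int32",
    "Canal_ID": "int8",
    "Ruta_SAK": "int32",
    "Cliente_ID": "int64",
    "Producto_ID": "int32",
    "Demanda_uni_equil": "int32",
}

def load_train_subset(path, product_ids, chunksize=1_000_000):
    """Stream the CSV in chunks and keep only rows for the given product IDs."""
    keep = set(product_ids)
    chunks = pd.read_csv(path, usecols=list(TRAIN_DTYPES),
                         dtype=TRAIN_DTYPES, chunksize=chunksize)
    parts = [chunk[chunk["Producto_ID"].isin(keep)] for chunk in chunks]
    return pd.concat(parts, ignore_index=True)

# Example call (assumes the Kaggle download above has been run; IDs are placeholders):
# df = load_train_subset("data/train.csv", product_ids=[1212, 1216])
```

Filtering by product rather than taking the first N rows avoids ending up with rows from a single week, which would leave the lag columns empty.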

3. Cost label engineering

We transform unit demand into a holding cost per week by multiplying by weight and a notional $0.015 per kg-week rate; substitute your actual carrying-rate formula.

For illustration, define holding‑and‑handling cost as:
\[
\text{Cost} = (\text{inventory\_units}) \times (\text{unit\_weight\,kg}) \times 0.015\;(\$/\text{kg\,week})
\]

# Example proxies; adapt to your ERP fields.
# Assumes Peso_kg has already been merged into df.
df['holding_cost'] = df['Demanda_uni_equil'] * df['Peso_kg'] * 0.015
y = df['holding_cost']
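The label above needs a Peso_kg column, which train.csv does not contain. One possible way to obtain it, sketched here under assumptions, is to parse a weight out of the product name in the competition's product table. That table is assumed to have Producto_ID and NombreProducto columns with weights written like "640g" or "1kg". The helper name `parse_weight_kg`, the regular expression, and the sample rows are illustrative only.

```python
import re
import pandas as pd

def parse_weight_kg(names):
    """Return the first '<number>g' or '<number>kg' token as kilograms (NaN if absent)."""
    parts = names.str.extract(r"(\d+(?:\.\d+)?)\s*(kg|g)\b", flags=re.IGNORECASE)
    value = pd.to_numeric(parts[0], errors="coerce")
    factor = parts[1].str.lower().map({"kg": 1.0, "g": 0.001})
    return value * factor

# Made-up rows in the assumed format of the product table.
products = pd.DataFrame({
    "Producto_ID": [1, 2, 3],
    "NombreProducto": ["Pan Blanco 640g BIM 1", "Harina 1kg TR 2", "Sin Peso 3"],
})
products["Peso_kg"] = parse_weight_kg(products["NombreProducto"])

# Toy demand rows; in the tutorial these would come from train.csv.
demand = pd.DataFrame({"Producto_ID": [1, 2, 3], "Demanda_uni_equil": [10, 4, 7]})
demand = demand.merge(products[["Producto_ID", "Peso_kg"]], on="Producto_ID", how="left")
demand["holding_cost"] = demand["Demanda_uni_equil"] * demand["Peso_kg"] * 0.015
print(demand)
```

Products without a parseable weight end up with a missing cost, so those rows would need to be dropped or imputed before training.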

4. Feature matrix (X)

The feature matrix combines logistics nodes (agency, channel, route), customer, product ID, calendar week, and short-term demand history, all of which are known before the stock is ordered.

features = [
    'Semana',              # week number
    'Agencia_ID', 'Canal_ID', 'Ruta_SAK',   # logistics topology
    'Cliente_ID', 'Producto_ID',            # customer & SKU
    'Demanda_uni_equil_lag1',               # prior‑week demand (create below)
    'Demanda_uni_equil_lag4'
]

# Create simple demand lags per product‑client
df = df.sort_values(['Producto_ID', 'Cliente_ID', 'Semana'])
df['Demanda_uni_equil_lag1'] = df.groupby(['Producto_ID', 'Cliente_ID'])['Demanda_uni_equil'].shift(1)
df['Demanda_uni_equil_lag4'] = df.groupby(['Producto_ID', 'Cliente_ID'])['Demanda_uni_equil'].shift(4)

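# Drop rows where either lag is undefined (Lasso cannot handle NaN inputs)
df = df.dropna(subset=['Demanda_uni_equil_lag1', 'Demanda_uni_equil_lag4'])
X = df[features]
y = df['holding_cost']   # re-select the target so it lines up with the filtered rows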
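To see what the grouped shift does, the toy frame below (made-up values) builds a one-week lag for two product-client pairs. Note that shift(1) returns the previous row within each group, which is the previous week only when no weeks are missing for that pair; the extra `weeks_since_prev` column is an illustrative way to spot such gaps.

```python
import pandas as pd

toy = pd.DataFrame({
    "Producto_ID": [1, 1, 1, 2, 2],
    "Cliente_ID":  [9, 9, 9, 9, 9],
    "Semana":      [3, 4, 5, 3, 5],      # product 2 has no row for week 4
    "Demanda_uni_equil": [10, 12, 8, 5, 7],
})

toy = toy.sort_values(["Producto_ID", "Cliente_ID", "Semana"])
group = toy.groupby(["Producto_ID", "Cliente_ID"])
toy["lag1"] = group["Demanda_uni_equil"].shift(1)

# Weeks elapsed since the previous row of the same pair; a value above 1 flags a gap.
toy["weeks_since_prev"] = group["Semana"].diff()
print(toy)
```

The first row of each pair has no lag, and the last row of product 2 carries a "lag1" that is really two weeks old.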

5. Pre‑processing pipeline

Categorical IDs become one-hot vectors (dropping the first level to avoid the dummy-variable trap); numeric lags and week numbers are z-scaled so the Lasso penalty treats them on a common scale.

cat_cols = ['Agencia_ID', 'Canal_ID', 'Ruta_SAK', 'Cliente_ID', 'Producto_ID']
num_cols = ['Semana', 'Demanda_uni_equil_lag1', 'Demanda_uni_equil_lag4']

preprocess = ColumnTransformer([
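        # sparse_output replaces the old `sparse` flag (scikit-learn >= 1.2). Keep the output
        # sparse: Cliente_ID and Producto_ID have far too many levels for a dense matrix.
        ('cat', OneHotEncoder(handle_unknown='ignore', sparse_output=True, drop='first'), cat_cols),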
        ('num', StandardScaler(), num_cols)
    ])

6. Train/test split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

7. Build & tune Lasso pipeline

A log-spaced α search looks for the best trade-off between sparsity and error; three-fold CV keeps training time manageable on a dataset this large.

pipe = Pipeline([
        ('prep', preprocess),
        ('model', Lasso(max_iter=20_000, random_state=42))
    ])

param_grid = {'model__alpha': np.logspace(-3, 1, 30)}   # 0.001 → 10
search = GridSearchCV(pipe, param_grid, cv=3,
                      scoring='neg_root_mean_squared_error',
                      n_jobs=-1, verbose=1)
search.fit(X_train, y_train)

print("Optimal α:", search.best_params_['model__alpha'])
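The trade-off that the α grid explores can be sketched on synthetic data: as α grows, fewer coefficients survive. The data, the five-point grid, and the seed below are assumptions for illustration and are unrelated to the Bimbo results.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X_syn = rng.normal(size=(400, 10))
y_syn = 5.0 * X_syn[:, 0] - 3.0 * X_syn[:, 1] + rng.normal(scale=0.5, size=400)

alphas = np.logspace(-3, 1, 5)      # 0.001, 0.01, 0.1, 1, 10
nonzero_counts = []
for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=20_000).fit(X_syn, y_syn)
    nonzero_counts.append(int(np.count_nonzero(model.coef_)))

for alpha, count in zip(alphas, nonzero_counts):
    print(f"alpha={alpha:g}: {count} non-zero coefficients")
```

At the largest α every coefficient is driven to zero and the model predicts only the mean, which is why the cross-validated error, not sparsity alone, should pick the final value.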

8. Evaluate on the hold‑out set

RMSE expresses the typical dollar error per record, with large misses weighted more heavily than small ones; R² shows the share of variance explained.

y_pred = search.predict(X_test)
# np.sqrt works on every scikit-learn version; the squared=False flag is deprecated in recent releases
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2   = r2_score(y_test, y_pred)

# Per-record costs are fractions of a dollar, so print with more than zero decimals
print(f"Test RMSE: ${rmse:,.4f} | R²: {r2:.3f}")
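A short worked example, with made-up numbers, of why RMSE is more sensitive to large errors than the mean absolute error (MAE is brought in here only for comparison):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true_toy = np.array([10.0, 10.0, 10.0, 10.0])
y_pred_toy = np.array([11.0,  9.0, 11.0, 15.0])   # absolute errors: 1, 1, 1, 5

mae_toy = mean_absolute_error(y_true_toy, y_pred_toy)            # (1+1+1+5)/4 = 2
rmse_toy = np.sqrt(mean_squared_error(y_true_toy, y_pred_toy))   # sqrt((1+1+1+25)/4) = sqrt(7)

print(f"MAE:  {mae_toy:.3f}")
print(f"RMSE: {rmse_toy:.3f}")
```

A single five-dollar miss lifts RMSE to about 2.65 while MAE stays at 2.0, so a high RMSE can reflect a few badly predicted records rather than uniformly poor predictions.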

9. Interpret feature importance

Non-zero coefficients identify cost hotspots (e.g., a specific bulky SKU or a high-return customer route), while zeros mark factors the model found negligible at the chosen α, guiding planners toward high-leverage fixes.

ohe = search.best_estimator_.named_steps['prep'].named_transformers_['cat']
ohe_names = ohe.get_feature_names_out(cat_cols)
feature_names = np.hstack([ohe_names, num_cols])

coef = search.best_estimator_.named_steps['model'].coef_
imp  = (pd.Series(coef, index=feature_names)
          .sort_values(key=abs, ascending=False))

plt.figure(figsize=(9,6))
imp.head(20).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title('Top Drivers of Inventory Cost (Lasso Coefficients)')
plt.xlabel('Coefficient (Δ USD)')
plt.show()

Summary

In under 150 lines of Python, we built an interpretable, cross‑validated Lasso model that:

  • Predicts weekly inventory carrying cost for every product‑customer‑route combination.
  • Surfaces the costliest drivers, helping supply‑chain teams focus on SKUs, customers, or routes that inflate spend.
  • Refreshes quickly: because preprocessing and model live in one Pipeline, scoring a new week of data is a single predict() call and retraining is a single fit().
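One way the refresh step could look in practice, sketched with a toy pipeline: persist the fitted Pipeline with joblib, reload it later, and score new rows. The file name, the synthetic data, and the choice of joblib are assumptions rather than part of the tutorial's code.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X_old = rng.normal(size=(100, 3))
y_old = 2.0 * X_old[:, 0] + rng.normal(scale=0.1, size=100)

cost_model = Pipeline([("scale", StandardScaler()), ("model", Lasso(alpha=0.01))])
cost_model.fit(X_old, y_old)

# Persist the whole pipeline so scaling and model travel together.
model_path = os.path.join(tempfile.mkdtemp(), "cost_model.joblib")
joblib.dump(cost_model, model_path)

# Later: reload and score a new week's rows without refitting.
reloaded = joblib.load(model_path)
X_new_week = rng.normal(size=(5, 3))
new_week_pred = reloaded.predict(X_new_week)
print(new_week_pred.round(2))
```

Saving the Pipeline rather than the bare Lasso object ensures that new data passes through exactly the same preprocessing that was fitted on the training set.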

Deploying this lightweight tool transforms inventory budgeting from a reactive, “after‑the‑fact” exercise into a proactive, data-driven discipline.

ProjectGurukul Team
