Inventory Management Cost Prediction with Lasso Regression in ML

Carrying too much stock ties up cash and inflates warehouse bills, while carrying too little risks lost sales and overtime shipping. Accurately quantifying the holding‑and‑handling cost for the next replenishment cycle—before inventory is ordered—helps planners strike the sweet spot.

We will build a Lasso‑regularised linear model that:

Predicts the expected inventory-management cost (USD) for each product-customer-week using signals that are known in advance (historical demand, product and customer IDs, depot and route, calendar week).
Shrinks uninformative predictors to zero, surfacing the handful of drivers (e.g., bulky items with slow turns) that deserve special attention.

Because Lasso's ℓ1 penalty drives some coefficients exactly to zero, the fitted model is sparse and interpretable: supply-chain managers can see which factors push costs up and act before purchase orders go out.
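To see why an ℓ1 penalty yields exact zeros, consider the simplest case of a single standardized feature. Assuming the objective scikit-learn documents for Lasso, (1/(2n))·||y − Xw||² + α·||w||₁, the fitted coefficient is the unpenalized estimate pulled toward zero by α and clipped to zero once it falls inside [−α, α]. The sketch below illustrates that rule; `soft_threshold` and the three estimates are made up for illustration.

```python
def soft_threshold(rho: float, alpha: float) -> float:
    """Shrink rho toward zero by alpha; return exactly 0.0 inside [-alpha, alpha]."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

# Hypothetical unpenalized estimates for three standardized predictors
estimates = {"bulky_slow_mover": 2.0, "noise_feature": 0.3, "fast_mover": -1.2}
alpha = 0.5

shrunk = {name: soft_threshold(rho, alpha) for name, rho in estimates.items()}
for name, value in shrunk.items():
    print(f"{name}: {value:+.2f}")
```

The weak predictor is removed outright, while the strong ones survive with smaller magnitudes. Raising α widens the zero band and removes more predictors.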

Libraries Required

Role	Library
Data handling	pandas, numpy
Visuals	matplotlib, seaborn
ML pipeline	scikit‑learn → ColumnTransformer, OneHotEncoder, StandardScaler, Pipeline, Lasso, GridSearchCV
Metrics	mean_squared_error, r2_score

Dataset Link

Grupo Bimbo Inventory Demand

Step-by-Step Code Implementation

1. Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score

2. Download and load the dataset

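The Grupo Bimbo competition data covers nine weeks of sales and returns, with about 74 M rows in train.csv plus separate product, client, and depot lookup tables. SKU weight (Peso_kg) is not a column in train.csv; it has to be joined in separately, for example by parsing the weights embedded in the product names of producto_tabla.csv. The demand lags are engineered in step 4.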

# One‑time shell command (needs Kaggle API):
# kaggle competitions download -c grupo-bimbo-inventory-demand -p data --unzip

df = pd.read_csv("data/train.csv")   # ~74 M rows; sample if RAM-limited
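If the full file does not fit in memory, one option is to read only the columns the model uses and give them compact integer dtypes. The sketch below assumes the column names of the competition's train.csv; the dtype choices are conservative guesses and `load_train` is a hypothetical helper, so check both against your copy of the data.

```python
import pandas as pd

# Columns used later in the tutorial; dtypes are assumptions, widen them if values overflow
TRAIN_DTYPES = {
    "Semana": "int8",
    "Agencia_ID": "int32",
    "Canal_ID": "int8",
    "Ruta_SAK": "int32",
    "Cliente_ID": "int64",
    "Producto_ID": "int32",
    "Demanda_uni_equil": "int32",
}

def load_train(path, nrows=None):
    """Read only the modelling columns, optionally limited to the first nrows rows."""
    return pd.read_csv(
        path,
        usecols=list(TRAIN_DTYPES),
        dtype=TRAIN_DTYPES,
        nrows=nrows,
    )

# Usage, with the path from the tutorial:
# df = load_train("data/train.csv", nrows=5_000_000)
```

Note that `nrows` takes the first rows of the file, which may not be a representative sample; random sampling after a chunked read is an alternative.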

3. Cost label engineering

We transform unit demand into a holding cost per week by multiplying by weight and a notional $0.015 per kg-week rate; substitute your actual carrying-rate formula.

For illustration, define holding‑and‑handling cost as:
\[ \text{Cost} = \text{inventory\_units} \times \text{unit\_weight (kg)} \times 0.015\ \text{USD per kg-week} \]
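# Example proxies: adapt to your ERP fields.
# Peso_kg is assumed to have been merged into df already (it is not in train.csv);
# adjusted demand stands in for inventory units.
df['holding_cost'] = df['Demanda_uni_equil'] * df['Peso_kg'] * 0.015   # USD per row
y = df['holding_cost']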
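The Peso_kg column has to come from somewhere. One possible source, sketched below, is the weight embedded in each product name. This assumes names carry a token such as "680g" or "1.5Kg"; the example names are invented, and `parse_weight_kg` and `holding_cost` are hypothetical helpers rather than part of any library.

```python
import re

_WEIGHT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(kg|g)\b", re.IGNORECASE)
RATE_USD_PER_KG_WEEK = 0.015  # notional carrying rate from the text

def parse_weight_kg(product_name):
    """Return the first weight found in the name, in kg, or None if there is none."""
    match = _WEIGHT_RE.search(product_name)
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2).lower()
    return value if unit == "kg" else value / 1000.0

def holding_cost(units, weight_kg, rate=RATE_USD_PER_KG_WEEK):
    """Cost = units x unit weight (kg) x rate (USD per kg-week)."""
    return units * weight_kg * rate

# Invented product name in the assumed "<name> <weight> <brand> <id>" style
weight = parse_weight_kg("Pan Integral 680g BIM 1234")
cost = holding_cost(units=10, weight_kg=weight)
print(f"weight = {weight} kg, weekly holding cost = ${cost:.4f}")
```

On a product table, the parser could be applied with `Series.map(parse_weight_kg)` and the result merged onto the transactions by Producto_ID. Rows whose names carry no weight would need a fallback rule.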

4. Feature matrix (X)

The features are the logistics nodes (agency, channel, route), customer, product ID, calendar week, and short-term demand history. All are known before the stock is ordered.

features = [
    'Semana',              # week number
    'Agencia_ID', 'Canal_ID', 'Ruta_SAK',   # logistics topology
    'Cliente_ID', 'Producto_ID',            # customer & SKU
    'Demanda_uni_equil_lag1',               # prior‑week demand (create below)
    'Demanda_uni_equil_lag4'
]

# Create simple demand lags per product‑client
df = df.sort_values(['Producto_ID', 'Cliente_ID', 'Semana'])
df['Demanda_uni_equil_lag1'] = df.groupby(['Producto_ID', 'Cliente_ID'])['Demanda_uni_equil'].shift(1)
df['Demanda_uni_equil_lag4'] = df.groupby(['Producto_ID', 'Cliente_ID'])['Demanda_uni_equil'].shift(4)

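# Drop rows where either lag is undefined (Lasso cannot handle NaN)
df = df.dropna(subset=['Demanda_uni_equil_lag1', 'Demanda_uni_equil_lag4'])
X = df[features]
y = df['holding_cost']   # re-select the target so it stays aligned with X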
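One caveat on the lags: `shift(k)` returns the k-th previous record of each product-client pair, which is "k weeks ago" only when the pair has a row for every week. The sketch below compares it with a week-aligned lag built by a self-merge on toy data. `add_week_aligned_lag` is a hypothetical helper that assumes one row per product, client, and week, so aggregate first if your data has several.

```python
import pandas as pd

KEYS = ["Producto_ID", "Cliente_ID", "Semana"]

def add_week_aligned_lag(df, k, value_col="Demanda_uni_equil"):
    """Attach the value observed exactly k weeks earlier (NaN if that week is missing)."""
    past = df[KEYS + [value_col]].copy()
    past["Semana"] = past["Semana"] + k   # week t's demand becomes the lag for week t + k
    past = past.rename(columns={value_col: f"{value_col}_lag{k}"})
    return df.merge(past, on=KEYS, how="left")

toy = pd.DataFrame({
    "Producto_ID": [1, 1, 1, 2, 2],
    "Cliente_ID": [9, 9, 9, 9, 9],
    "Semana": [3, 4, 6, 3, 4],            # week 5 is missing for product 1
    "Demanda_uni_equil": [10, 12, 7, 5, 6],
})

# Record-based lag, as in the tutorial
toy["shift_lag1"] = (
    toy.groupby(["Producto_ID", "Cliente_ID"])["Demanda_uni_equil"].shift(1)
)
# Week-based lag
toy = add_week_aligned_lag(toy, k=1)
print(toy)
```

For the week-6 row, the record-based lag picks up the week-4 demand, while the week-aligned lag is NaN because week 5 was never observed. Which behavior is appropriate depends on whether a missing week means zero demand or missing data.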

5. Pre‑processing pipeline

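Categorical IDs become one-hot vectors (dropping the first level to avoid the dummy-variable trap); numeric lags and week numbers are z-scaled so the Lasso penalty treats them on a comparable scale. The encoder output is left sparse, because Cliente_ID alone has far too many distinct values for a dense matrix to fit in memory.

cat_cols = ['Agencia_ID', 'Canal_ID', 'Ruta_SAK', 'Cliente_ID', 'Producto_ID']
num_cols = ['Semana', 'Demanda_uni_equil_lag1', 'Demanda_uni_equil_lag4']

preprocess = ColumnTransformer([
        # Sparse output is the default; the old `sparse` argument was renamed
        # `sparse_output` in scikit-learn 1.2, so it is simply omitted here.
        ('cat', OneHotEncoder(handle_unknown='ignore', drop='first'), cat_cols),
        ('num', StandardScaler(), num_cols)
    ])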

6. Train/test split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
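A random split mixes weeks, so the test rows come from the same weeks as the training rows. Since the goal is to predict cost before stock is ordered, holding out the most recent week or weeks may mirror deployment more closely. The sketch below shows that alternative on toy data; `split_by_week` is a hypothetical helper, not something the tutorial's pipeline requires.

```python
import pandas as pd

def split_by_week(df, week_col="Semana", n_test_weeks=1):
    """Hold out the last n_test_weeks distinct weeks as the test set."""
    weeks = sorted(df[week_col].unique())
    test_weeks = weeks[-n_test_weeks:]
    is_test = df[week_col].isin(test_weeks)
    return df[~is_test], df[is_test]

toy = pd.DataFrame({
    "Semana": [7, 7, 8, 8, 9, 9],
    "holding_cost": [0.10, 0.20, 0.15, 0.30, 0.12, 0.25],
})

train_df, test_df = split_by_week(toy)
print("train weeks:", sorted(train_df["Semana"].unique()))
print("test weeks: ", sorted(test_df["Semana"].unique()))
```

With this split, X_train and y_train would be taken from the earlier weeks and X_test and y_test from the held-out week.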

7. Build & tune Lasso pipeline

A log-spaced α grid searches for the balance between sparsity and error; three-fold CV (90 fits for the 30 α values, plus the final refit) keeps training time manageable on a dataset this large.

pipe = Pipeline([
        ('prep', preprocess),
        ('model', Lasso(max_iter=20_000, random_state=42))
    ])

param_grid = {'model__alpha': np.logspace(-3, 1, 30)}   # 0.001 → 10
search = GridSearchCV(pipe, param_grid, cv=3,
                      scoring='neg_root_mean_squared_error',
                      n_jobs=-1, verbose=1)
search.fit(X_train, y_train)

print("Optimal α:", search.best_params_['model__alpha'])
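The trade-off that the α grid explores can be seen on synthetic data. In the sketch below only two of ten features actually drive the target, and the number of non-zero coefficients is counted at a few α values. The data, seed, and α values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first two columns matter; the other eight are pure noise
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

nonzero_counts = {}
for alpha in [0.001, 0.1, 1.0, 10.0]:
    model = Lasso(alpha=alpha, max_iter=20_000).fit(X, y)
    nonzero_counts[alpha] = int(np.count_nonzero(model.coef_))
    print(f"alpha={alpha:<6} non-zero coefficients: {nonzero_counts[alpha]}")
```

A very small α keeps noise features in the model, while a very large α removes everything, including the true drivers. Cross-validation picks a value between those extremes by held-out error.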

8. Evaluate on the hold‑out set

RMSE gives the typical dollar error per record (it weights large misses more heavily than a plain average would); R² shows the share of variance explained.

y_pred = search.predict(X_test)
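rmse = np.sqrt(mean_squared_error(y_test, y_pred))   # avoids the deprecated squared=False argument
r2   = r2_score(y_test, y_pred)

# Per-row costs are small, so print fractions of a dollar rather than whole dollars
print(f"Test RMSE: ${rmse:,.4f} | R²: {r2:.3f}")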
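For readers who want to check the two metrics by hand, here is a minimal sketch of both formulas on four made-up values. `rmse_and_r2` is an illustrative helper; in the tutorial itself the scikit-learn functions do this work.

```python
import math

def rmse_and_r2(y_true, y_pred):
    """Return (RMSE, R^2) computed directly from their definitions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)              # total sum of squares
    return math.sqrt(ss_res / n), 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 2.0, 2.5, 4.0]

toy_rmse, toy_r2 = rmse_and_r2(y_true, y_hat)
print(f"RMSE = {toy_rmse:.4f}, R² = {toy_r2:.3f}")
```

An R² near zero would mean the model does no better than always predicting the mean cost, which is a useful baseline to compare against.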

9. Interpret feature importance

Non-zero coefficients identify cost hotspots (e.g., specific bulky SKU, high-return customer route), while zeros mark negligible factors—guiding planners toward high-leverage fixes.

ohe = search.best_estimator_.named_steps['prep'].named_transformers_['cat']
ohe_names = ohe.get_feature_names_out(cat_cols)
feature_names = np.hstack([ohe_names, num_cols])

coef = search.best_estimator_.named_steps['model'].coef_
imp  = (pd.Series(coef, index=feature_names)
          .sort_values(key=abs, ascending=False))

plt.figure(figsize=(9,6))
imp.head(20).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title('Top Drivers of Inventory Cost (Lasso Coefficients)')
plt.xlabel('Coefficient (Δ USD)')
plt.show()
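Alongside the plot, it can help to summarize how sparse the model is. The sketch below splits a coefficient Series into active and zeroed features; the names and values are invented, though they follow the naming pattern OneHotEncoder uses. When reading real values, keep in mind that one-hot coefficients are relative to the dropped baseline level and that coefficients on scaled numeric columns are per one standard deviation.

```python
import pandas as pd

# Invented coefficients, named like the columns the preprocessing step produces
coef = pd.Series({
    "Producto_ID_1240": 0.08,
    "Producto_ID_1242": 0.0,
    "Ruta_SAK_3301": -0.03,
    "Ruta_SAK_3302": 0.0,
    "Demanda_uni_equil_lag1": 0.05,
    "Semana": 0.0,
})

active = coef[coef != 0].sort_values(key=abs, ascending=False)   # kept by Lasso
dropped = coef[coef == 0].index.tolist()                         # shrunk to zero

print(f"{len(active)} of {len(coef)} coefficients are non-zero")
print(active)
print("Zeroed features:", dropped)
```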

Summary

In under 150 lines of Python, we built an interpretable, cross‑validated Lasso model that:

Predicts weekly inventory carrying cost for every product‑customer‑route combination.
Surfaces the costliest drivers, helping supply‑chain teams focus on SKUs, customers, or routes that inflate spend.
Refreshes quickly: because everything is encapsulated in a Pipeline, scoring a new week of data is a one-line predict() and retraining is a one-line fit().

Deploying this lightweight tool transforms inventory budgeting from a reactive, “after‑the‑fact” exercise into a proactive, data-driven discipline.
