Inventory Management Cost Prediction with Lasso Regression in ML
Carrying too much stock ties up cash and inflates warehouse bills, while carrying too little risks lost sales and overtime shipping. Accurately quantifying the holding‑and‑handling cost for the next replenishment cycle—before inventory is ordered—helps planners strike the sweet spot.
We will build a Lasso‑regularised linear model that:
- Predicts the expected inventory‑management cost (USD) for each product‑week using signals that are known in advance (historical demand, product family, weight‑class, shelf‑life indicator, lead time, etc.).
- Shrinks uninformative predictors to zero, surfacing the handful of drivers (e.g., bulky items with slow turns) that deserve special attention.
Because Lasso’s ℓ¹ penalty produces a sparse, interpretable model, supply‑chain managers can see why costs rise and act before purchase orders go out.
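To make the sparsity claim concrete, here is a minimal sketch on synthetic data (not the Bimbo set). The feature count, noise level, and `alpha=0.2` are illustrative assumptions. scikit-learn's `Lasso` minimises (1/(2n))·‖y − Xw‖² + α·‖w‖₁, and the ℓ¹ term is what pushes weak coefficients to exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_samples, n_features = 500, 10
X_demo = rng.normal(size=(n_samples, n_features))
# Only the first two columns actually drive the target; the other eight are noise
y_demo = (3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1]
          + rng.normal(scale=0.5, size=n_samples))

lasso_demo = Lasso(alpha=0.2).fit(X_demo, y_demo)
kept = np.flatnonzero(lasso_demo.coef_)  # indices of coefficients that survived
print("Non-zero coefficients at columns:", kept)
print("Their values:", lasso_demo.coef_[kept].round(2))
```

The surviving coefficients are also shrunk toward zero, so they should be read as a ranking of drivers rather than as unbiased effect sizes.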
Libraries Required
| Role | Library |
| --- | --- |
| Data handling | pandas, numpy |
| Visuals | matplotlib, seaborn |
| ML pipeline | scikit‑learn → ColumnTransformer, OneHotEncoder, StandardScaler, Pipeline, Lasso, GridSearchCV |
| Metrics | mean_squared_error, r2_score |
Dataset Link
Step-by-Step Code Implementation
1. Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score
2. Download and load the dataset
Grupo Bimbo provides nine weeks of sales, returns, and SKU metadata for 74 M transactions. We enrich it with SKU weight (Peso_kg) and engineer demand lags.
# One‑time shell command (needs Kaggle API):
# kaggle competitions download -c grupo-bimbo-inventory-demand -p data --unzip
df = pd.read_csv("data/train.csv") # ~74 M rows; sample if RAM-limited
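For a file this large, one way to stay within memory is to read only the needed columns with compact dtypes and cap the row count. The sketch below assumes the column names of the competition's `train.csv`; the helper name `load_train`, the dtype choices, and the made-up rows in the in-memory stand-in are illustrative assumptions, not part of the original workflow.

```python
import io
import pandas as pd

# Assumed column names from the competition's train.csv; adjust if yours differ
USECOLS = ['Semana', 'Agencia_ID', 'Canal_ID', 'Ruta_SAK',
           'Cliente_ID', 'Producto_ID', 'Demanda_uni_equil']
# Compact dtypes (an assumption; widen them if your IDs overflow)
DTYPES = {'Semana': 'int8', 'Agencia_ID': 'int32', 'Canal_ID': 'int8',
          'Ruta_SAK': 'int32', 'Cliente_ID': 'int64',
          'Producto_ID': 'int32', 'Demanda_uni_equil': 'int32'}

def load_train(source, nrows=None):
    """Hypothetical helper: read only the needed columns with compact dtypes."""
    return pd.read_csv(source, usecols=USECOLS, dtype=DTYPES, nrows=nrows)

# Tiny in-memory stand-in (made-up rows) so the sketch runs without the real file
fake_csv = io.StringIO(
    "Semana,Agencia_ID,Canal_ID,Ruta_SAK,Cliente_ID,Producto_ID,"
    "Venta_uni_hoy,Venta_hoy,Dev_uni_proxima,Dev_proxima,Demanda_uni_equil\n"
    "3,1110,7,3301,15766,1212,3,25.14,0,0.0,3\n"
    "3,1110,7,3301,15766,1216,4,33.52,0,0.0,4\n"
    "4,1110,7,3301,15766,1212,5,41.90,0,0.0,5\n"
)
sample = load_train(fake_csv, nrows=2)
print(sample.dtypes)
```

With the real file this would be called as `load_train("data/train.csv", nrows=5_000_000)`. Note that `nrows` takes the first rows of the file, which may not cover every week; sampling after a chunked read is an alternative when all weeks are needed.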
3. Cost label engineering
We transform unit demand into a holding cost per week by multiplying by weight and a notional $0.015 per kg-week rate; substitute your actual carrying-rate formula.
For illustration, define holding‑and‑handling cost as:
$$\text{Cost} = (\text{inventory\_units}) \times (\text{unit\_weight in kg}) \times 0.015\ (\text{USD per kg-week})$$
# Example proxies — adapt to your ERP fields
df['holding_cost'] = df['Demanda_uni_equil'] * df['Peso_kg'] * 0.015
y = df['holding_cost']
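The snippet above assumes a `Peso_kg` column already exists, which `train.csv` does not provide on its own. One possible way to build it, assuming the competition's product table (`producto_tabla.csv`, with `Producto_ID` and `NombreProducto`) embeds the pack weight in the product name, is to parse it with a regular expression. The helper name, the regex, and the example rows below are hypothetical.

```python
import re
import pandas as pd

# Assumption: weights appear in the product name as e.g. "750g" or "1kg"
WEIGHT_RE = re.compile(r'(\d+(?:\.\d+)?)\s*(kg|g)\b', flags=re.IGNORECASE)

def parse_weight_kg(name):
    """Return the first weight found in a product name, in kg (NaN if none)."""
    match = WEIGHT_RE.search(name)
    if match is None:
        return float('nan')
    value, unit = float(match.group(1)), match.group(2).lower()
    return value if unit == 'kg' else value / 1000.0

# Made-up rows standing in for the product table and the sales table
products = pd.DataFrame({
    'Producto_ID': [1, 2, 3],
    'NombreProducto': ['Pan Blanco 750g BIM 1', 'Harina 1kg TR 2',
                       'NO IDENTIFICADO 0'],
})
products['Peso_kg'] = products['NombreProducto'].map(parse_weight_kg)

sales = pd.DataFrame({'Producto_ID': [1, 2, 3],
                      'Demanda_uni_equil': [40, 10, 5]})
sales = sales.merge(products[['Producto_ID', 'Peso_kg']],
                    on='Producto_ID', how='left')
# Same formula as in the text: units x kg x 0.015 USD per kg-week
sales['holding_cost'] = sales['Demanda_uni_equil'] * sales['Peso_kg'] * 0.015
print(sales)
```

As a worked instance of the formula, 40 units of a 0.75 kg item cost 40 × 0.75 × 0.015 = 0.45 USD to hold for one week. Products whose weight cannot be parsed end up with a NaN cost and need to be dropped or imputed before modelling.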
4. Feature matrix (X)
Logistics nodes (agency, route), customer, product ID, calendar week, and short‑term demand history. All are known before the stock is ordered.
features = [
    'Semana',                               # week number
    'Agencia_ID', 'Canal_ID', 'Ruta_SAK',   # logistics topology
    'Cliente_ID', 'Producto_ID',            # customer & SKU
    'Demanda_uni_equil_lag1',               # prior-week demand (created below)
    'Demanda_uni_equil_lag4'                # demand four weeks back (created below)
]
# Create simple demand lags per product‑client
df = df.sort_values(['Producto_ID', 'Cliente_ID', 'Semana'])
df['Demanda_uni_equil_lag1'] = df.groupby(['Producto_ID', 'Cliente_ID'])['Demanda_uni_equil'].shift(1)
df['Demanda_uni_equil_lag4'] = df.groupby(['Producto_ID', 'Cliente_ID'])['Demanda_uni_equil'].shift(4)
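# Drop rows where either lag is undefined (remaining NaNs would break the Lasso fit)
df = df.dropna(subset=['Demanda_uni_equil_lag1', 'Demanda_uni_equil_lag4'])
X = df[features]
y = df['holding_cost'] # re-select so y lines up with the filtered rows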
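A toy frame makes the lag logic easier to check by eye. The values below are made up; the short column names `lag1` and `lag4` are used only for this illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    'Producto_ID': [1, 1, 1, 1, 1, 2, 2],
    'Cliente_ID':  [10, 10, 10, 10, 10, 10, 10],
    'Semana':      [3, 4, 5, 6, 7, 3, 4],
    'Demanda_uni_equil': [5, 6, 7, 8, 9, 1, 2],
})
toy = toy.sort_values(['Producto_ID', 'Cliente_ID', 'Semana'])
grouped = toy.groupby(['Producto_ID', 'Cliente_ID'])['Demanda_uni_equil']
toy['lag1'] = grouped.shift(1)  # previous row within the same product-client pair
toy['lag4'] = grouped.shift(4)  # four rows back within the same pair

# Rows with an undefined lag cannot be fed to Lasso, so drop them
usable = toy.dropna(subset=['lag1', 'lag4'])
print(toy)
print(usable)
```

Only the week-7 row of product 1 has both lags defined, so requiring a four-week lag discards the first four observations of every pair. Also note that `shift` counts rows rather than calendar weeks: if a pair skips a week, `lag1` comes from an earlier week than the immediately preceding one.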
5. Pre‑processing pipeline
Categorical IDs become one‑hot vectors (dropping the first level to avoid dummy‑variable trap); numeric lags and week numbers are z‑scaled so the Lasso penalty treats them fairly.
cat_cols = ['Agencia_ID', 'Canal_ID', 'Ruta_SAK', 'Cliente_ID', 'Producto_ID']
num_cols = ['Semana', 'Demanda_uni_equil_lag1', 'Demanda_uni_equil_lag4']
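preprocess = ColumnTransformer([
    # Keep the encoder's default sparse output: a dense matrix over the
    # high-cardinality ID columns would not fit in memory. The old
    # `sparse=False` argument has also been removed from recent scikit-learn.
    ('cat', OneHotEncoder(handle_unknown='ignore', drop='first'), cat_cols),
    ('num', StandardScaler(), num_cols)
])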
6. Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
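The random split above mixes all weeks between train and test. Because the features are demand lags, one possible alternative (not the approach used in this walkthrough) is to hold out the most recent week or weeks, which mimics forecasting a cycle that has not happened yet. The helper name `split_by_week` and the toy values are hypothetical.

```python
import pandas as pd

def split_by_week(frame, week_col='Semana', n_test_weeks=1):
    """Hypothetical helper: hold out the last n_test_weeks as the test set."""
    weeks = sorted(frame[week_col].unique())
    test_weeks = weeks[-n_test_weeks:]
    is_test = frame[week_col].isin(test_weeks)
    return frame[~is_test], frame[is_test]

toy_weeks = pd.DataFrame({
    'Semana': [7, 7, 8, 8, 9, 9],
    'holding_cost': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})
train_part, test_part = split_by_week(toy_weeks)
print(train_part['Semana'].unique(), test_part['Semana'].unique())
```

Every training week precedes every test week, so the hold-out score reflects performance on genuinely unseen future data.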
7. Build & tune Lasso pipeline
A log‑spaced α search seeks the sweet‑spot between sparsity and error; three‑fold CV speeds training on this extensive data.
pipe = Pipeline([
    ('prep', preprocess),
    ('model', Lasso(max_iter=20_000, random_state=42))
])
param_grid = {'model__alpha': np.logspace(-3, 1, 30)} # 0.001 → 10
search = GridSearchCV(pipe, param_grid, cv=3,
                      scoring='neg_root_mean_squared_error',
                      n_jobs=-1, verbose=1)
search.fit(X_train, y_train)
print("Optimal α:", search.best_params_['model__alpha'])
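Beyond the single best α, it can help to look at how the cross-validated error moves across the whole grid. The sketch below does this on synthetic data from `make_regression` (an assumption made so the example runs quickly; the grid is shortened to 10 values). The same `cv_results_` lookup applies to a fitted search over the real pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_syn, y_syn = make_regression(n_samples=300, n_features=20, n_informative=5,
                               noise=10.0, random_state=0)
syn_pipe = Pipeline([('scale', StandardScaler()),
                     ('model', Lasso(max_iter=20_000))])
alphas = np.logspace(-3, 1, 10)
syn_search = GridSearchCV(syn_pipe, {'model__alpha': alphas}, cv=3,
                          scoring='neg_root_mean_squared_error')
syn_search.fit(X_syn, y_syn)

# One row per alpha: mean CV RMSE (the scorer is negated, so flip the sign back)
curve = pd.DataFrame({
    'alpha': [p['model__alpha'] for p in syn_search.cv_results_['params']],
    'cv_rmse': -syn_search.cv_results_['mean_test_score'],
}).sort_values('alpha')
print(curve.round(3))
print("Best alpha:", syn_search.best_params_['model__alpha'])
```

If the best α sits at either end of the grid, the search range is probably too narrow and is worth widening before trusting the result.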
8. Evaluate on the hold‑out set
RMSE gives the typical dollar error per record, weighting large misses more heavily than a plain average would; R² is the share of variance in cost that the model explains.
y_pred = search.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred)) # version-safe; the squared=False argument is deprecated in recent scikit-learn
r2 = r2_score(y_test, y_pred)
print(f"Test RMSE: ${rmse:,.4f} | R²: {r2:.3f}") # per-record costs can be well below $1, so keep decimals
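Both metrics are simple enough to compute by hand, which is a useful sanity check. The four values below are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 2.0, 2.5, 4.0])

residuals = y_true - y_hat
rmse_manual = np.sqrt(np.mean(residuals ** 2))   # root of the mean squared error
ss_res = np.sum(residuals ** 2)                  # variation the model leaves unexplained
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
r2_manual = 1.0 - ss_res / ss_tot

print(f"RMSE: {rmse_manual:.4f} | R²: {r2_manual:.3f}")
# Cross-check against scikit-learn
print(np.sqrt(mean_squared_error(y_true, y_hat)), r2_score(y_true, y_hat))
```

Here the squared errors sum to 0.5 over four records, so RMSE = √(0.5 / 4) ≈ 0.354, and the total variation is 5, so R² = 1 − 0.5 / 5 = 0.9.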
9. Interpret feature importance
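Non-zero coefficients identify cost hotspots (e.g., a specific bulky SKU or a high-return customer route), while zeros mark factors the model treats as negligible, guiding planners toward high-leverage fixes. Because the numeric inputs are z-scaled, their coefficients read as the change in USD per one standard deviation; one-hot coefficients are relative to the dropped first level.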
ohe = search.best_estimator_.named_steps['prep'].named_transformers_['cat']
ohe_names = ohe.get_feature_names_out(cat_cols)
feature_names = np.hstack([ohe_names, num_cols])
coef = search.best_estimator_.named_steps['model'].coef_
imp = (pd.Series(coef, index=feature_names)
         .sort_values(key=abs, ascending=False))
plt.figure(figsize=(9,6))
imp.head(20).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title('Top Drivers of Inventory Cost (Lasso Coefficients)')
plt.xlabel('Coefficient (Δ USD)')
plt.show()
Summary
In under 150 lines of Python, we built an interpretable, cross‑validated Lasso model that:
- Predicts weekly inventory carrying cost for every product‑customer‑route combination.
- Surfaces the costliest drivers, helping supply‑chain teams focus on SKUs, customers, or routes that inflate spend.
- Refreshes quickly—thanks to the encapsulated Pipeline, pushing a new week of data through the model is a one‑line fit().
Deploying this lightweight tool transforms inventory budgeting from a reactive, “after‑the‑fact” exercise into a proactive, data-driven discipline.