Manufacturing Cost Prediction using Quantile Regression in ML
Manufacturers need to budget for a broad distribution of per‑batch production costs, from lean, low‑cost runs (around the 25th percentile) to complex, high‑overhead batches (around the 75th percentile). Planning based solely on average costs risks under‑provisioning for complex runs or over‑allocating capital to routine production.
In this manufacturing cost prediction ML project, we’ll predict the 25th, 50th, and 75th percentiles of batch manufacturing cost (USD) based on Units Produced, using historical cost data.
By fitting a separate quantile regression model for each percentile, we’ll see how cost scales with batch size in lower-, median-, and upper-cost scenarios, so finance and operations teams can set conservative reserves, budget for typical runs, and plan for high‑cost cases.
Libraries Required
import pandas as pd                              # Data loading & manipulation
import numpy as np                               # Numerical operations
import statsmodels.formula.api as smf            # Quantile regression via formula API
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_pinball_loss    # Quantile loss metric
Dataset
Step-by-Step Code Implementation
Load & Inspect Data
We import the CSV, which contains Units produced per batch and the corresponding Manufacturing Cost (USD). The .info() confirms types; .describe() shows cost range and skew (e.g., min, median, max).
# Load the Manufacturing Cost dataset
df = pd.read_csv("manufacturing-cost.csv")
# Inspect structure and cost distribution
print(df.head())
df.info()  # prints its own summary and returns None, so no print() is needed
print(df['Manufacturing Cost'].describe())
Preprocessing & Renaming
- We rename Number of Units to Units and Manufacturing Cost to Cost for succinct formulas.
- We drop any incomplete records to ensure modeling integrity.
# Rename columns for clarity
df = df.rename(columns={
    'Number of Units': 'Units',
    'Manufacturing Cost': 'Cost'
})
# Drop any rows with missing values
df = df.dropna(subset=['Units', 'Cost'])
Train/Test Split
We hold out 20% of the data at random for evaluation, so we can check how well our quantile models generalise to unseen batches.
# Reserve 20% for out‑of‑sample evaluation
train, test = train_test_split(
    df[['Units', 'Cost']],
    test_size=0.2,
    random_state=42
)
Fit Quantile Regression Models
- For each target quantile (25th, 50th, 75th percentiles), we specify the formula Cost ~ Units.
- We fit a QuantReg model on the training set at that quantile.
- We print the coefficient table—showing how the intercept and per‑unit cost coefficient vary across cost levels (e.g., higher marginal cost at the upper tail).
quantiles = [0.25, 0.50, 0.75]
models = {}
formula = "Cost ~ Units"
for q in quantiles:
    mod = smf.quantreg(formula, train)
    res = mod.fit(q=q)
    models[q] = res
    print(f"\n--- {int(q*100)}th Percentile Coefficients ---")
    print(res.summary().tables[1])  # coefficient estimates only
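To compare the fitted lines side by side, the per-quantile coefficients can be collected into one table. The sketch below is a minimal illustration on synthetic data, not the tutorial's dataset: the data-generating numbers and the names `demo`, `fits`, `coef_table`, and `grid` are assumptions made up for the example. It also checks whether the separately fitted lines cross anywhere on a grid of batch sizes, which can happen because each quantile is estimated independently.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the cost data (an assumption, not the real dataset):
# the noise grows with batch size, so upper quantiles have steeper slopes.
rng = np.random.default_rng(0)
units = rng.uniform(100, 1000, size=500)
cost = 500 + 2.0 * units + 0.5 * units * rng.standard_normal(500)
demo = pd.DataFrame({"Units": units, "Cost": cost})

quantiles = [0.25, 0.50, 0.75]
fits = {q: smf.quantreg("Cost ~ Units", demo).fit(q=q) for q in quantiles}

# One row per quantile; columns are Intercept and Units (the per-unit slope)
coef_table = pd.DataFrame({q: fit.params for q, fit in fits.items()}).T
coef_table.index.name = "quantile"
print(coef_table.round(3))

# Predict every quantile on a grid of batch sizes and look for crossings,
# i.e. places where a higher quantile is predicted below a lower one.
grid = pd.DataFrame({"Units": np.linspace(100, 1000, 50)})
preds = np.column_stack([np.asarray(fits[q].predict(grid)) for q in quantiles])
crossing = bool((np.diff(preds, axis=1) < 0).any())
print("Quantile crossing on grid:", crossing)
```

In the tutorial's own loop, the same table could be built from the `models` dictionary in place of `fits`.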
Evaluation with Pinball Loss
- We predict quantile‑specific costs on the test set.
- We compute the pinball loss for each model, an asymmetric loss that, for target quantile q, weights under-predictions by q and over-predictions by 1 − q. Lower loss indicates a more accurate quantile forecast.
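The asymmetry of the pinball loss is easiest to see when it is written out by hand. The function below is a minimal NumPy sketch of the standard definition; the name `pinball_loss` and the sample numbers are hypothetical and only serve the illustration.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for a target quantile q in (0, 1)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-prediction (diff > 0) costs q per unit of error;
    # over-prediction (diff < 0) costs (1 - q) per unit of error.
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

# Hypothetical costs: one over-prediction by 2, one under-prediction by 3
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 17.0, 30.0]
for q in (0.25, 0.75):
    print(q, round(pinball_loss(actual, predicted, q), 4))
```

With the same predictions, the loss is larger at q = 0.75 than at q = 0.25 here because the under-prediction is the bigger error and the upper quantile penalises under-prediction more heavily.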
for q, res in models.items():
    preds = res.predict(test[['Units']])
    loss = mean_pinball_loss(test['Cost'], preds, alpha=q)
    print(f"{int(q*100)}th percentile pinball loss: {loss:.2f}")
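Pinball loss values are hard to interpret on their own, so a simple complementary check is empirical coverage: the share of actual costs that fall at or below the predicted quantile, which should land near q for a well-calibrated model. The helper below is a hedged sketch; `empirical_coverage` and the sample values are hypothetical and not part of the original walkthrough.

```python
import numpy as np

def empirical_coverage(y_true, y_pred):
    """Share of observed values at or below the predicted quantile."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(y_true <= y_pred))

# Hypothetical values for illustration only
actual = [10.0, 20.0, 30.0, 40.0]
pred_q50 = [15.0, 15.0, 35.0, 35.0]
coverage = empirical_coverage(actual, pred_q50)
print(f"Coverage of the median forecast: {coverage:.2f}")
```

Inside the evaluation loop, this could be called as `empirical_coverage(test['Cost'], preds)` and compared against `q`. On a small test set the coverage will be noisy, so modest deviations from q are expected.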
Summary
By applying quantile regression to manufacturing cost data, we obtain distribution‑aware cost forecasts:
- The 25th‑percentile model sets conservative budget floors for lean production runs.
- The median (50th‑percentile) model predicts typical costs for routine planning.
- The 75th‑percentile model anticipates high‑cost scenarios, guiding contingency reserves for complex, high‑overhead batches.
These quantile insights equip operations and finance teams to allocate capital more precisely—balancing risk and efficiency across the full spectrum of production scenarios.