Manufacturing Output Prediction using Quantile Regression in ML
In high‑mix, low‑volume manufacturing environments, batch output (units produced per run) can vary drastically due to machine performance, material variability, shift patterns, and energy fluctuations. Planners and operations managers need to prepare not only for the average output but also for the range—anticipating low‐yield runs (10th percentile), typical performance (50th percentile), and over‐performance peaks (90th percentile).
In this manufacturing output prediction ML project, we will predict the 10th, 50th, and 90th percentiles of batch Units Produced using process features such as Production Time Hours, Material Usage (kg), Energy Consumption (kWh), Machine ID, and Shift by fitting separate quantile regression models. These distribution‑aware forecasts will enable more robust scheduling, capacity planning, and variance reduction strategies.
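As a reference for what each of these models optimizes, the standard linear quantile regression objective can be written as below. The symbols are notation introduced here for illustration, not names from the dataset: y_i is Units Produced for batch i, x_i is its feature vector, and τ is the quantile level (0.10, 0.50, or 0.90).

```latex
\hat{\beta}_{\tau} = \arg\min_{\beta} \sum_{i=1}^{n} \rho_{\tau}\left(y_i - x_i^{\top}\beta\right),
\qquad
\rho_{\tau}(u) = u\left(\tau - \mathbf{1}\{u < 0\}\right)
```

With τ = 0.50 the loss is symmetric and the fit is a median regression. With τ = 0.10, over-predictions are weighted 0.9 and under-predictions 0.1, which pulls the fitted line toward the lower tail of the output distribution.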
Libraries Required
import pandas as pd                                   # Data loading & manipulation
import numpy as np                                    # Numerical operations
import statsmodels.formula.api as smf                 # Quantile regression via formula API
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_pinball_loss         # Proper loss for quantile forecasts
Dataset
Step-by-Step Code Implementation
Load & Inspect Data
We load the Kaggle “Manufacturing Production Data”, which contains per-run metrics such as production time, material usage, energy consumption, batch output, and categorical fields Machine_ID and Shift. Initial .info() and .describe() confirm data completeness and reveal output variability.
# Load the Manufacturing Production Data
# Source: Kaggle – Manufacturing Production Data
df = pd.read_csv("manufacturing-production-data.csv")
# Inspect structure and key distributions
print(df.head())
print(df.info())
print(df['Units_Produced'].describe())
Preprocessing & Feature Engineering
- Rows missing any core metric are dropped to ensure modeling integrity.
- Categorical fields (Machine_ID, Shift) are transformed via one‑hot encoding (dropping the first level) to capture machine‑ and shift‑specific effects.
- We assemble a predictor list: three continuous process variables plus the dummy columns; the response is renamed to Output for clarity.
# Drop rows with missing core variables
df = df.dropna(subset=[
    'Production_Time_Hours', 'Material_Usage_kg',
    'Energy_Consumption_kWh', 'Machine_ID', 'Shift', 'Units_Produced'
])

# One-hot encode categorical features: Machine_ID and Shift
df_enc = pd.get_dummies(
    df,
    columns=['Machine_ID', 'Shift'],
    drop_first=True
)

# Define predictors and response
features = [
    'Production_Time_Hours',
    'Material_Usage_kg',
    'Energy_Consumption_kWh'
] + [c for c in df_enc.columns
     if c.startswith('Machine_ID_') or c.startswith('Shift_')]

data = df_enc[features + ['Units_Produced']].rename(
    columns={'Units_Produced': 'Output'}
)
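To make the effect of drop_first=True concrete, here is a minimal sketch on a toy frame. The rows and the category labels ("A", "B", "C", "Day", "Night") are hypothetical and only stand in for whatever values the real Machine_ID and Shift columns contain.

```python
import pandas as pd

# Hypothetical toy rows; the real dataset's category labels may differ.
toy = pd.DataFrame({
    "Machine_ID": ["A", "B", "C", "A"],
    "Shift": ["Day", "Night", "Day", "Night"],
    "Units_Produced": [120, 95, 130, 110],
})

# drop_first=True removes the first (sorted) level of each field, here "A" and "Day".
# Those dropped levels become the reference category absorbed by the intercept.
toy_enc = pd.get_dummies(toy, columns=["Machine_ID", "Shift"], drop_first=True)

# Same selection rule as the feature list above: keep only the dummy columns.
dummy_cols = [c for c in toy_enc.columns
              if c.startswith("Machine_ID_") or c.startswith("Shift_")]
print(dummy_cols)
```

Each remaining dummy coefficient is then read relative to the dropped level, for example the output difference between machine B and machine A with the other features held fixed.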
Train/Test Split
An 80/20 random split reserves 20% of batch records for out‑of‑sample evaluation. This checks how well quantile models generalize to new production runs.
# Reserve 20% of batches for out-of-sample evaluation
train, test = train_test_split(
    data, test_size=0.2, random_state=42
)
Fit Quantile Regression Models
For each quantile (10th, 50th, 90th):
- We build a formula string, e.g., “Output ~ Production_Time_Hours + Material_Usage_kg + … + Machine_ID_B + Shift_Night”
- We fit a QuantReg model on the training set at that percentile.
- We print the coefficient table (.tables[1]), revealing how each predictor’s marginal effect varies across low‑, median‑, and high‑output scenarios—e.g., energy consumption might drive up the upper‐tail output more strongly than the median.
quantiles = [0.10, 0.50, 0.90]
models = {}
formula = "Output ~ " + " + ".join(features)

for q in quantiles:
    mod = smf.quantreg(formula, train)
    res = mod.fit(q=q)
    models[q] = res
    print(f"\n--- {int(q*100)}th Percentile Coefficients ---")
    print(res.summary().tables[1])  # show coefficient table only
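The idea that a predictor's effect can differ across quantiles is easier to see on data where the answer is known. The sketch below uses synthetic data, not the Kaggle file: the noise spread is made to grow with production time, so the fitted slope should rise from the 10th to the 90th percentile. The generating equation and all constants are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic batches (assumed for illustration): spread of output widens with hours.
rng = np.random.default_rng(0)
n = 2000
hours = rng.uniform(1, 10, n)
output = 50 + 20 * hours + rng.normal(0, 1, n) * (2 + 3 * hours)
synthetic = pd.DataFrame({"Production_Time_Hours": hours, "Output": output})

# Fit one quantile regression per level and keep the slope on production time.
slopes = {}
for q in (0.10, 0.50, 0.90):
    res = smf.quantreg("Output ~ Production_Time_Hours", synthetic).fit(q=q)
    slopes[q] = res.params["Production_Time_Hours"]

print(pd.Series(slopes, name="slope_on_hours"))
```

On real production data the pattern may be weaker or reversed; comparing the same coefficient across the three printed tables is the way to check.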
Evaluation with Pinball Loss
- We predict quantile‑specific outputs on the test set.
- We compute the pinball loss for each quantile. It is a proper scoring rule for quantile forecasts that penalizes under‑ and over‑predictions asymmetrically according to the target percentile. A lower pinball loss at a given quantile indicates a more accurate forecast of that quantile; because each quantile uses a different penalty weighting, losses should be compared between models at the same quantile rather than across quantiles.
for q, res in models.items():
    preds = res.predict(test[features])
    loss = mean_pinball_loss(test['Output'], preds, alpha=q)
    print(f"{int(q*100)}th percentile pinball loss: {loss:.2f}")
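To show the asymmetry directly, the pinball loss can be written out by hand. This is a minimal sketch with a helper named pinball_loss (a name introduced here, not part of scikit-learn) and made-up numbers: one batch that actually produced 100 units, forecast 10 units too low and 10 units too high at the 90th percentile.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss at quantile level q (illustrative helper)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # diff > 0 is an under-prediction and costs q per unit;
    # diff < 0 is an over-prediction and costs (1 - q) per unit.
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

actual = [100.0]                                # made-up batch output
under = pinball_loss(actual, [90.0], q=0.90)    # forecast 10 units too low
over = pinball_loss(actual, [110.0], q=0.90)    # forecast 10 units too high
print(f"under-prediction: {under:.2f}, over-prediction: {over:.2f}")
```

At the 90th percentile the under-prediction is penalized about nine times as heavily as the over-prediction of the same size, which is what pushes the fitted 90th-percentile line above most observed outputs.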
Summary
By applying quantile regression to manufacturing‐production data, we obtain distribution‑aware output forecasts that support:
- Conservative Planning (10th Percentile): Preparing for low‐yield runs by understanding the drivers of worst‐case output scenarios.
- Baseline Forecasting (50th Percentile): Modeling typical batch performance to drive daily scheduling and capacity alignment.
- Peak Performance Insights (90th Percentile): Capturing conditions that lead to exceptionally high output—informing best‐practice replication and target‐setting.
These quantile‐specific models equip operations managers with nuanced, risk‑sensitive forecasts—optimizing resource allocation, improving scheduling robustness, and reducing variability in manufacturing throughput.