Manufacturing Cost Prediction using Quantile Regression in ML


Per-batch production costs vary widely, from inexpensive low-volume pilot runs near the 25th percentile of the cost distribution to large-scale, high-overhead batches near the 75th percentile. Planning on average cost alone risks under-budgeting for complex runs or over-allocating capital to routine production.

In this manufacturing cost prediction ML project, we’ll predict the 25th, 50th, and 75th percentiles of batch manufacturing cost (USD) based on Units Produced, using historical cost data.

By fitting a separate quantile regression model for each percentile, we'll see how the per-unit cost slope differs across lower-, median-, and upper-cost scenarios. Finance and operations teams can then set budget floors, budget for typical runs, and hold contingency reserves for high-cost cases.

Libraries Required

import pandas as pd                                # Data loading & manipulation  
import numpy as np                                 # Numerical operations  
import statsmodels.formula.api as smf              # Quantile regression via formula API  
from sklearn.model_selection import train_test_split  
from sklearn.metrics import mean_pinball_loss      # Quantile loss metric

Dataset

Manufacturing Cost

Step-by-Step Code Implementation

Load & Inspect Data

We load the CSV, which contains the Number of Units produced per batch and the corresponding Manufacturing Cost (USD). Calling .info() confirms the column types, and .describe() summarises the cost distribution (min, quartiles, max), which hints at its range and skew.

# Load the Manufacturing Cost dataset
df = pd.read_csv("manufacturing-cost.csv")

# Inspect structure and cost distribution
print(df.head())
df.info()                                          # prints its report directly and returns None
print(df['Manufacturing Cost'].describe())

Preprocessing & Renaming

We rename Number of Units to Units and Manufacturing Cost to Cost. The formula API needs column names without spaces (or names wrapped in Q()), and short names keep the formula readable.
We drop any incomplete records to ensure modeling integrity.

# Rename columns for clarity
df = df.rename(columns={
    'Number of Units': 'Units',
    'Manufacturing Cost': 'Cost'
})

# Drop any rows with missing values
df = df.dropna(subset=['Units', 'Cost'])

Train/Test Split

We hold out a random 20% of the data for evaluation, so we can check how well the quantile models generalise to batches they were not fitted on.

# Reserve 20% for out‑of‑sample evaluation
train, test = train_test_split(
    df[['Units', 'Cost']],
    test_size=0.2,
    random_state=42
)

Fit Quantile Regression Models

For each target quantile (25th, 50th, 75th percentiles), we specify the formula Cost ~ Units.
We fit a QuantReg model on the training set at that quantile.
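We print the coefficient table, which shows how the intercept and the per-unit cost coefficient (the Units slope) vary across quantiles. A larger slope at the 75th percentile than at the 25th, for example, would mean marginal cost rises faster in expensive batches.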

quantiles = [0.25, 0.50, 0.75]
models    = {}
formula   = "Cost ~ Units"

for q in quantiles:
    mod = smf.quantreg(formula, train)
    res = mod.fit(q=q)
    models[q] = res
    print(f"\n--- {int(q*100)}th Percentile Coefficients ---")
    print(res.summary().tables[1])   # coefficient estimates only
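
To see what fitting "at a quantile" means, it can help to drop the regressor and look at the intercept-only case. The sketch below uses plain Python and made-up numbers (not the dataset). It searches for the constant that minimises the average pinball loss, which is the objective quantile regression minimises, and the minimiser lands on the sample quantile. The helper names avg_pinball and best_constant are illustrative and are not part of statsmodels.

```python
def avg_pinball(values, c, q):
    """Average pinball loss of predicting the constant c for every value."""
    total = 0.0
    for y in values:
        diff = y - c
        # Under-prediction (y >= c) costs q per unit of error;
        # over-prediction (y < c) costs (1 - q) per unit of error.
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(values)


def best_constant(values, q):
    """Return the observed value that minimises the average pinball loss.

    The loss is piecewise linear and convex in c, so a minimum is always
    attained at one of the observed values.
    """
    return min(values, key=lambda c: avg_pinball(values, c, q))


costs = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up batch costs, illustration only

q25 = best_constant(costs, 0.25)
q50 = best_constant(costs, 0.50)
q75 = best_constant(costs, 0.75)
print(q25, q50, q75)
```

With a regressor such as Units in the formula, the same loss is minimised over an intercept and a slope instead of a single constant.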

Evaluation with Pinball Loss

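We predict quantile-specific costs on the test set.
We compute the pinball loss for each model. At quantile q, this asymmetric loss weights each unit of under-prediction by q and each unit of over-prediction by 1 - q, so the 75th-percentile model is penalised three times as heavily for predicting too low as for predicting too high. Lower loss indicates a better fit for that quantile; compare losses only between models that target the same quantile.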

for q, res in models.items():
    preds = res.predict(test[['Units']])
    loss  = mean_pinball_loss(test['Cost'], preds, alpha=q)
    print(f"{int(q*100)}th percentile pinball loss: {loss:.2f}")
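
For readers who want the arithmetic spelled out, here is a minimal plain-Python sketch of the same loss, together with an empirical coverage check. The coverage check is an addition to the tutorial's workflow: for a well-calibrated model at quantile q, roughly a fraction q of held-out actual costs should fall at or below the prediction. The numbers are made up for illustration, and mean_pinball and coverage are hypothetical helper names.

```python
def mean_pinball(y_true, y_pred, q):
    """Mean pinball loss at quantile q, written out explicitly."""
    losses = []
    for y, p in zip(y_true, y_pred):
        if y >= p:
            losses.append(q * (y - p))        # under-prediction, weight q
        else:
            losses.append((1 - q) * (p - y))  # over-prediction, weight 1 - q
    return sum(losses) / len(losses)


def coverage(y_true, y_pred):
    """Share of actual values at or below the predicted quantile."""
    return sum(y <= p for y, p in zip(y_true, y_pred)) / len(y_true)


# Made-up actual costs and 75th-percentile predictions, illustration only.
actual = [100.0, 120.0, 90.0, 150.0]
pred_q75 = [110.0, 110.0, 110.0, 110.0]

loss_q75 = mean_pinball(actual, pred_q75, 0.75)
cover_q75 = coverage(actual, pred_q75)
print(f"pinball loss: {loss_q75:.2f}, coverage: {cover_q75:.2f}")
```

In this toy case only half of the actual costs fall at or below the prediction, well short of the 0.75 target, which would suggest the predictions sit too low for a 75th-percentile model.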

Summary

By applying quantile regression to manufacturing cost data, we obtain distribution‑aware cost forecasts:

The 25th‑percentile model sets conservative budget floors for lean production runs.
The median (50th‑percentile) model predicts typical costs for routine planning.
The 75th‑percentile model anticipates high‑cost scenarios, guiding contingency reserves for complex, high‑overhead batches.
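
As a rough sketch of how the three models might be combined for one planned batch, the snippet below turns an (intercept, slope) pair per quantile into a cost band. The coefficients are hypothetical placeholders, not results from the dataset; in practice they would come from res.params['Intercept'] and res.params['Units'] of each fitted model. Because the models are fitted separately, their lines can cross for some batch sizes, so the sketch also checks that the predictions are ordered.

```python
# Hypothetical (intercept, slope) pairs standing in for fitted coefficients.
# Replace them with the values from each fitted model's res.params.
coefs = {
    0.25: (20.0, 4.0),
    0.50: (30.0, 5.0),
    0.75: (45.0, 6.5),
}


def cost_band(units, coefs):
    """Predicted cost at each quantile for a planned batch size."""
    return {q: intercept + slope * units
            for q, (intercept, slope) in coefs.items()}


def is_monotone(band):
    """True if predictions do not decrease as the quantile increases."""
    ordered = [band[q] for q in sorted(band)]
    return all(a <= b for a, b in zip(ordered, ordered[1:]))


band = cost_band(10, coefs)  # planned batch of 10 units (made-up size)
for q in sorted(band):
    print(f"{int(q * 100)}th percentile budget: {band[q]:.2f} USD")

if not is_monotone(band):
    print("Warning: quantile predictions cross at this batch size.")
```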

These quantile insights equip operations and finance teams to allocate capital more precisely—balancing risk and efficiency across the full spectrum of production scenarios.
