Factory Output Cost Prediction using Bayesian Regression in ML


Before finalising production schedules, manufacturing directors need to predict a factory's production cost from planned output volumes, machine run-times, raw-material usage, and labour hours. Costs often vary nonlinearly: per-unit cost falls as volume rises (economies of scale) but can climb again when overtime or excess material waste sets in. Uncertainty in parameters such as material yield rates further complicates planning. Bayesian regression gives not only a point estimate of cost but also credible intervals that reflect this parameter uncertainty, which supports more robust budgeting and risk-aware decision-making. The walkthrough below starts from the simplest version of the problem: a linear model of total cost on units produced.
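The economies-of-scale effect follows from a textbook split of total cost into a fixed part and a variable part, assumed here purely for illustration: with total cost F + v·q for q units, the per-unit cost is F/q + v, which falls as q grows. The function names and cost figures below are hypothetical.

```python
def total_cost(units: float, fixed: float, variable_per_unit: float) -> float:
    """Total cost under a simple fixed + variable split (illustrative assumption)."""
    return fixed + variable_per_unit * units


def unit_cost(units: float, fixed: float, variable_per_unit: float) -> float:
    """Per-unit cost: the fixed cost is spread over more units as volume grows."""
    return total_cost(units, fixed, variable_per_unit) / units


# Hypothetical figures: 50,000 USD fixed cost, 12 USD variable cost per unit
for q in (1_000, 5_000, 20_000):
    print(q, round(unit_cost(q, fixed=50_000, variable_per_unit=12), 2))
# prints: 1000 62.0 / 5000 22.0 / 20000 14.5 (one pair per line)
```

A straight-line model of total cost, like the one fitted below, therefore already implies falling per-unit cost whenever the line extrapolated back to zero units has a positive intercept; effects such as overtime that push costs up at high volumes would need nonlinear terms.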

Libraries Required

import pandas as pd                               # data handling
import numpy as np                                # numerical operations

import matplotlib.pyplot as plt                   # plotting
import seaborn as sns                             # visualization

import pymc3 as pm                                # Bayesian modeling (PyMC3, since superseded by the `pymc` package)
import arviz as az                                # posterior analysis

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error

Dataset

Manufacturing Cost
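If the dataset file is not at hand, the steps below can be tried on a synthetic stand-in with the two columns the loading step expects, UnitsProduced and TotalCost. The helper name make_synthetic_cost_csv and the cost structure (fixed cost plus per-unit cost plus Gaussian noise, with the figures in the docstring) are hypothetical and say nothing about the real dataset.

```python
import csv
import os
import random


def make_synthetic_cost_csv(path: str, n_rows: int = 200, seed: int = 42) -> None:
    """Write a synthetic CSV with UnitsProduced and TotalCost columns.

    Hypothetical cost structure: 50,000 USD fixed cost, 12 USD variable cost
    per unit, plus Gaussian noise with a 5,000 USD standard deviation.
    """
    rng = random.Random(seed)                      # seeded for reproducibility
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["UnitsProduced", "TotalCost"])
        for _ in range(n_rows):
            units = rng.randint(500, 20_000)       # planned output volume
            cost = 50_000 + 12 * units + rng.gauss(0, 5_000)
            writer.writerow([units, round(cost, 2)])
```

Calling make_synthetic_cost_csv("data/manufacturing_cost.csv") writes the file to the path read in the loading step.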

Step-by-Step Code Implementation

Import Libraries & Load Data

import pandas as pd

# Load dataset: columns 'UnitsProduced' and 'TotalCost'
df = pd.read_csv("data/manufacturing_cost.csv")
df.head()

Preprocessing & Train/Test Split

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Features and target
X = df[["UnitsProduced"]].values
y = df["TotalCost"].values

# Train/test split (80/20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize feature for stable MCMC
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s  = scaler.transform(X_test)
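StandardScaler centres each feature on its training mean and divides by its training standard deviation (the population form, ddof=0); inverse_transform reverses this, which the plotting step relies on. The NumPy sketch below reproduces that arithmetic with made-up volumes. The helper names are illustrative, and unlike StandardScaler the sketch does not guard against a zero standard deviation.

```python
import numpy as np


def fit_standardizer(x_train: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Column means and population standard deviations (ddof=0)."""
    return x_train.mean(axis=0), x_train.std(axis=0)


def standardize(x: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Map raw values to z-scores using training statistics."""
    return (x - mean) / std


def unstandardize(z: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Map z-scores back to the original scale."""
    return z * std + mean


# Hypothetical training volumes (one feature column)
x_train = np.array([[1_000.0], [2_000.0], [3_000.0], [4_000.0]])
mean, std = fit_standardizer(x_train)
z = standardize(x_train, mean, std)
print(z.ravel())
```

Fitting the statistics on the training split only, as the code above does, keeps test-set information out of the model.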

Define & Fit Bayesian Regression Model

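  • Likelihood: Observed costs are modelled as Normal around μ = α + β·X, where X is the standardised UnitsProduced, so the mean cost is a linear function of volume.
  • MCMC Sampling: The NUTS sampler draws 2,000 posterior samples per chain after 1,000 tuning steps. Raising target_accept from its default of 0.8 to 0.9 makes the sampler take smaller steps, which reduces divergences at some cost in speed.
  • Priors (α, β, σ): Normal(0, 10) priors on the intercept and slope and a HalfNormal(10) prior on the noise scale. Because TotalCost is not standardised, these priors are only weakly informative if costs are on the order of tens; for costs in the thousands of dollars they would be strongly informative and could distort the fit, so either widen them or standardise the target as well.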
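Written out, the model in these bullets is as follows, where $z_i$ is the standardised UnitsProduced of observation $i$, $y_i$ its TotalCost, and $\mathcal{N}(m, s)$ denotes a normal distribution with mean $m$ and standard deviation $s$ (PyMC's `sigma` convention):

$$
\begin{aligned}
y_i &\sim \mathcal{N}(\mu_i,\ \sigma), \qquad \mu_i = \alpha + \beta z_i, \\
\alpha &\sim \mathcal{N}(0,\ 10), \qquad \beta \sim \mathcal{N}(0,\ 10), \qquad \sigma \sim \mathrm{HalfNormal}(10).
\end{aligned}
$$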
import pymc3 as pm

with pm.Model() as model:
    # Priors for intercept and slope
    α = pm.Normal("α", mu=0, sigma=10)
    β = pm.Normal("β", mu=0, sigma=10)
    σ = pm.HalfNormal("σ", sigma=10)
    
    # Expected cost
    μ = α + β * X_train_s.flatten()
    
    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)
    
    # Sample posterior
    trace = pm.sample(
        draws=2000, tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
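The regularising effect of the priors can be checked without MCMC in a simplified setting. If the noise scale σ is treated as known (an assumption; the PyMC model above learns it), independent $\mathcal{N}(0, \tau)$ priors on α and β are conjugate to the Gaussian likelihood, and the posterior of (α, β) is Gaussian with precision $\Lambda = X^\top X/\sigma^2 + I/\tau^2$ and mean $\Lambda^{-1} X^\top y/\sigma^2$, where $X$ has a column of ones and a column of standardised volumes. The helper below and its synthetic data are illustrative only, a sanity check rather than a replacement for the sampled model.

```python
import numpy as np


def conjugate_posterior(z, y, sigma, prior_sd=10.0):
    """Posterior mean and covariance of (alpha, beta) for y = alpha + beta*z + noise.

    Assumes a known noise scale sigma and independent N(0, prior_sd**2) priors
    on alpha and beta (the PyMC model instead also infers sigma).
    """
    X = np.column_stack([np.ones_like(z), z])        # design matrix [1, z]
    prior_precision = np.eye(2) / prior_sd**2
    post_precision = X.T @ X / sigma**2 + prior_precision
    post_cov = np.linalg.inv(post_precision)
    post_mean = post_cov @ (X.T @ y) / sigma**2      # prior mean is zero
    return post_mean, post_cov


rng = np.random.default_rng(0)
z = rng.standard_normal(500)                           # synthetic standardised volumes
y = 3.0 + 2.0 * z + rng.normal(0.0, 1.0, size=z.size) # synthetic costs
mean, cov = conjugate_posterior(z, y, sigma=1.0)
print(mean, np.sqrt(np.diag(cov)))                     # estimates and posterior SDs
```

Shrinking prior_sd pulls both estimates toward zero, which is the regularisation the Summary refers to; with plenty of data and a wide prior, the likelihood dominates and the prior barely matters.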

Posterior Analysis & Prediction

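  • Posterior Predictive: Sampling Y_obs from the posterior predictive distribution simulates costs at the training volumes, which helps check whether the model reproduces the observed spread; the uncertainty band in the plot is computed from the posterior draws of α and β.
  • Prediction & MAE: Point predictions use the posterior means of α and β, and the mean absolute error (MAE) on the hold-out set measures their average error in USD.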
import arviz as az

# Trace summary
az.summary(trace, round_to=2)

# Posterior predictive sampling
with model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["α","β","σ","Y_obs"])

# Compute mean prediction on test set
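# Posterior means of intercept and slope, taken directly from the posterior
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean().item()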
# Predict on standardized test volumes
y_pred = α_post + β_post * X_test_s.flatten()

# Compute MAE
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f}")
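Because the model is linear in α and β, averaging the per-draw predictions α_s + β_s·z over posterior draws gives exactly the same point prediction as plugging in the posterior means, and MAE is simply the mean absolute residual. A small NumPy check, with hypothetical draws and test values:

```python
import numpy as np

# Hypothetical posterior draws and standardised test volumes
alpha_draws = np.array([99.0, 100.0, 101.0])
beta_draws = np.array([19.0, 20.0, 21.0])
z_test = np.array([-1.0, 0.0, 1.0])
y_test = np.array([85.0, 100.0, 110.0])

# Option 1: plug in the posterior means
y_plugin = alpha_draws.mean() + beta_draws.mean() * z_test
# Option 2: predict with every draw, then average over draws
y_per_draw = alpha_draws[:, None] + beta_draws[:, None] * z_test[None, :]
y_avg = y_per_draw.mean(axis=0)

# Mean absolute error: average of |observed - predicted|
mae = np.mean(np.abs(y_test - y_plugin))
print(y_plugin, y_avg, mae)
```

The equality of the two options holds only because predictions are linear in the parameters; for a nonlinear model, the average of per-draw predictions and the prediction at the posterior mean generally differ.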

Visualise Predictions with Credible Intervals

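We plot the observed test data, the posterior mean line, and the 94% highest density interval (HDI) of the expected cost α + β·X at each volume, to display both the prediction and its uncertainty. The band reflects uncertainty in the regression line only: it excludes the observation noise σ, so individual test points can fall outside it.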

# Generate grid of standardized volumes
X_grid_s = np.linspace(X_test_s.min(), X_test_s.max(), 100)
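# Regression line for every posterior draw: shape (chains, draws, grid points)
α_draws = trace.posterior["α"].values[..., None]
β_draws = trace.posterior["β"].values[..., None]
pred_samples = α_draws + β_draws * X_grid_s

# Posterior mean line and 94% HDI at each grid point
pred_mean = pred_samples.mean(axis=(0, 1))
hpd_bounds = az.hdi(pred_samples, hdi_prob=0.94)  # shape (100, 2): lower, upper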

import matplotlib.pyplot as plt

# Transform back to original scale
X_grid = scaler.inverse_transform(X_grid_s.reshape(-1,1)).flatten()

plt.figure(figsize=(8,5))
plt.scatter(X_test, y_test, color="k", alpha=0.5, label="Test data")
plt.plot(X_grid, pred_mean, color="blue", label="Posterior mean")
plt.fill_between(
    X_grid,
    hpd_bounds[:,0],
    hpd_bounds[:,1],
    color="blue", alpha=0.3,
    label="94% Credible interval"
)
plt.xlabel("Units Produced")
plt.ylabel("Total Cost (USD)")
plt.title("Factory Output Cost Prediction with Bayesian Regression")
plt.legend()
plt.show()
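az.hdi returns the narrowest interval that contains the requested share of the samples, which for a skewed posterior differs from the equal-tailed percentile interval. The sketch below is a simplified NumPy re-implementation of that idea for a single 1-D sample, written for illustration rather than taken from ArviZ; hdi_1d is a hypothetical name.

```python
import numpy as np


def hdi_1d(samples, prob=0.94):
    """Narrowest interval spanning floor(prob * n) index steps of the sorted samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.floor(prob * n))        # index offset covered by each candidate interval
    widths = x[k:] - x[: n - k]        # width of every candidate interval
    i = int(np.argmin(widths))         # start of the narrowest one
    return x[i], x[i + k]


# Right-skewed example: the HDI hugs the dense region near zero
rng = np.random.default_rng(1)
draws = rng.exponential(scale=1.0, size=10_000)
print(hdi_1d(draws, 0.94))
print(np.percentile(draws, [3, 97]))   # equal-tailed 94% interval for comparison
```

For an Exponential(1) distribution the exact 94% HDI is [0, 2.81], while the equal-tailed interval is about [0.03, 3.51], so the HDI is noticeably narrower; for a symmetric, unimodal posterior such as the regression line here, the two nearly coincide.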

Summary

This Bayesian Regression approach:

1. Captures parameter uncertainty, providing credible intervals around cost forecasts, not just point estimates.

2. Handles limited data gracefully, thanks to priors that regularise the slope and intercept.

3. Delivers actionable insights: managers can see not only the expected cost given the planned output but also the range of plausible costs, informing risk‑aware budgeting and contingency planning.
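To turn the fitted model into a budget range for one planned output level, costs can be simulated with the observation noise included: each posterior draw contributes α + β·z + σ·ε with ε standard normal. The sketch below assumes 1-D arrays of matching posterior draws (in practice these could be flattened from trace.posterior["α"], ["β"] and ["σ"], and z obtained with scaler.transform); the helper name, draws, and planned volume are hypothetical, and it reports an equal-tailed interval rather than an HDI.

```python
import numpy as np


def predictive_cost_interval(alpha, beta, sigma, z_planned, prob=0.94, seed=0):
    """Expected cost and equal-tailed posterior predictive interval at one volume.

    alpha, beta, sigma: 1-D arrays of matching posterior draws.
    z_planned: planned output volume on the standardised scale.
    """
    rng = np.random.default_rng(seed)
    mean_cost = alpha + beta * z_planned                   # regression line per draw
    simulated = mean_cost + sigma * rng.standard_normal(alpha.shape)  # add noise
    tail = 100 * (1 - prob) / 2
    lo, hi = np.percentile(simulated, [tail, 100 - tail])
    return mean_cost.mean(), (lo, hi)


# Hypothetical posterior draws standing in for a fitted model
rng = np.random.default_rng(42)
alpha = rng.normal(100_000, 500, size=4_000)
beta = rng.normal(20_000, 300, size=4_000)
sigma = np.abs(rng.normal(5_000, 200, size=4_000))
expected, (lo, hi) = predictive_cost_interval(alpha, beta, sigma, z_planned=0.5)
print(f"expected {expected:,.0f} USD, 94% range {lo:,.0f} to {hi:,.0f} USD")
```

With these hypothetical draws the noise term dominates, so the predictive range is several times wider than the credible band for the mean cost alone; that wider range is the one relevant to contingency planning for a single production run.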


ProjectGurukul Team

The ProjectGurukul Team delivers project-based tutorials on programming, machine learning, and web development. We simplify learning by providing hands-on projects to help you master real-world skills. We also provide free major and minor projects for engineering students.
