Factory Output Cost Prediction using Bayesian Regression in ML
Manufacturing directors need to predict per‑unit production cost for a factory from planned output volumes, machine run‑times, raw‑material usage, and labour hours before finalising production schedules. Costs often vary nonlinearly: per‑unit cost decreases with higher volumes (economies of scale) but can increase when overtime or excess material waste arises. Uncertainty in parameters (e.g., material yield rates) further complicates planning. Bayesian regression yields not only a point estimate of cost but also credible intervals that reflect parameter uncertainty, which supports more robust budgeting and risk‑aware decision‑making. The walkthrough below uses a single predictor, units produced, and a linear model of total cost.
Libraries Required
import pandas as pd                 # data handling
import numpy as np                  # numerical operations
import matplotlib.pyplot as plt     # plotting
import seaborn as sns               # visualization
import pymc3 as pm                  # Bayesian modeling
import arviz as az                  # posterior analysis
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error
Dataset
Step-by-Step Code Implementation
Import Libraries & Load Data
import pandas as pd
# Load dataset: columns 'UnitsProduced' and 'TotalCost'
df = pd.read_csv("data/manufacturing_cost.csv")
df.head()
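The dataset itself is not shown here. For readers who want to run the steps end to end without the original file, the following sketch generates a synthetic stand-in with the same two columns. The fixed cost, variable cost, noise level, and volume range are invented for illustration and are not taken from any real factory; the function names are hypothetical as well.

```python
import csv
import io
import random

def make_synthetic_costs(n_rows=200, seed=0):
    """Return rows of (UnitsProduced, TotalCost) from a made-up linear cost model."""
    rng = random.Random(seed)
    fixed_cost = 5000.0      # hypothetical fixed cost per period
    variable_cost = 12.0     # hypothetical cost per unit
    noise_sd = 400.0         # hypothetical noise in total cost
    rows = []
    for _ in range(n_rows):
        units = rng.randint(100, 2000)
        cost = fixed_cost + variable_cost * units + rng.gauss(0.0, noise_sd)
        rows.append((units, round(cost, 2)))
    return rows

def rows_to_csv_text(rows):
    """Serialise the rows as CSV text with the column names used in this tutorial."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["UnitsProduced", "TotalCost"])
    writer.writerows(rows)
    return buffer.getvalue()

rows = make_synthetic_costs()
csv_text = rows_to_csv_text(rows)
```

Writing `csv_text` to data/manufacturing_cost.csv would let the `pd.read_csv` call above load it unchanged.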
Preprocessing & Train/Test Split
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Features and target
X = df[["UnitsProduced"]].values
y = df["TotalCost"].values
# Train/test split (80/20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Standardize feature for stable MCMC
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
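Because the model is fitted on standardised volumes, its slope reads as "cost per one standard deviation of volume" rather than cost per unit. The sketch below shows how an intercept and slope on the standardised scale could be converted back to original units. The numeric values are hypothetical; in this tutorial the mean and scale would come from `scaler.mean_[0]` and `scaler.scale_[0]`, and the function name is made up for illustration.

```python
def to_original_scale(alpha_s, beta_s, x_mean, x_scale):
    """Convert intercept/slope fitted on z = (x - x_mean) / x_scale to raw-x units."""
    beta = beta_s / x_scale
    alpha = alpha_s - beta_s * x_mean / x_scale
    return alpha, beta

# Hypothetical values: x_mean and x_scale stand in for scaler.mean_[0] and
# scaler.scale_[0]; alpha_s and beta_s stand in for posterior means.
alpha_s, beta_s = 17000.0, 6000.0
x_mean, x_scale = 1000.0, 500.0

alpha, beta = to_original_scale(alpha_s, beta_s, x_mean, x_scale)

# Both parameterisations give the same prediction for any volume.
units = 1500.0
pred_standardised = alpha_s + beta_s * (units - x_mean) / x_scale
pred_original = alpha + beta * units
```

On the original scale the intercept can be read as a fixed cost and the slope as a variable cost per unit, which is usually easier to communicate to planners.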
Define & Fit Bayesian Regression Model
- Likelihood: Observed costs are modelled as Normal around μ = α + β·X, a linear function of the standardised production volume X.
- MCMC Sampling: We draw 2000 posterior samples per chain (after 1000 tuning steps) with target_accept=0.9 for stable inference.
- Priors (α, β, σ): We place Normal(0, 10) priors on the intercept and slope and a HalfNormal(10) prior on the noise scale. These are weakly informative only if TotalCost is on a similar scale; because the target is not standardised here, costs in the thousands would call for wider priors or a rescaled target.
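In symbols, the model described in the bullets above can be written as follows. The notation $\tilde{x}_i$ for the standardised volume is introduced here for convenience, and the second argument of each distribution is a standard deviation (or scale), matching the `sigma` arguments in the code.

```latex
\begin{aligned}
\alpha &\sim \mathcal{N}(0,\ 10), \qquad
\beta \sim \mathcal{N}(0,\ 10), \qquad
\sigma \sim \text{HalfNormal}(10), \\
\mu_i &= \alpha + \beta\,\tilde{x}_i, \qquad
\tilde{x}_i = \frac{x_i - \bar{x}}{s_x}, \\
y_i &\sim \mathcal{N}(\mu_i,\ \sigma).
\end{aligned}
```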
import pymc3 as pm
with pm.Model() as model:
    # Priors for intercept and slope
    α = pm.Normal("α", mu=0, sigma=10)
    β = pm.Normal("β", mu=0, sigma=10)
    σ = pm.HalfNormal("σ", sigma=10)
    # Expected cost
    μ = α + β * X_train_s.flatten()
    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)
    # Sample posterior
    trace = pm.sample(
        draws=2000, tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
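One way to check whether the priors are on a sensible scale for the data is a prior predictive simulation: draw parameters from the priors alone and look at the costs they imply. The sketch below does this with plain NumPy rather than PyMC3, and the observed-cost summary `y_typical` is a made-up placeholder, not a value from the dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 5000

# Draw from the same priors as the model: Normal(0, 10), Normal(0, 10), HalfNormal(10)
alpha = rng.normal(0.0, 10.0, size=n_draws)
beta = rng.normal(0.0, 10.0, size=n_draws)
sigma = np.abs(rng.normal(0.0, 10.0, size=n_draws))  # |Normal(0, s)| is HalfNormal(s)

# Simulate a cost at a standardised volume of +1 (one standard deviation above the mean)
x_s = 1.0
y_prior = rng.normal(alpha + beta * x_s, sigma)

# Central 94% range of costs implied by the priors alone
prior_low, prior_high = np.percentile(y_prior, [3, 97])

# Hypothetical typical observed cost; replace with e.g. np.median(y_train)
y_typical = 20000.0
prior_covers_data = prior_low <= y_typical <= prior_high
print(f"Prior 94% range: [{prior_low:.1f}, {prior_high:.1f}], covers data: {prior_covers_data}")
```

Replacing `y_typical` with a summary of `y_train` turns this into a quick check that can be run before sampling.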
Posterior Analysis & Prediction
- Posterior Predictive: Sampling from the posterior predictive distribution yields cost forecasts and uncertainty bands.
- Prediction & MAE: We use posterior means of α and β for point predictions and compute mean absolute error on the hold‑out set.
import arviz as az
# Trace summary
az.summary(trace, round_to=2)
# Posterior predictive sampling
with model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["α", "β", "σ", "Y_obs"])
# Compute mean prediction on test set
α_post = ppc["α"].mean()
β_post = ppc["β"].mean()
# Predict on standardized test volumes
y_pred = α_post + β_post * X_test_s.flatten()
# Compute MAE
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f}")
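Point predictions from posterior means discard the uncertainty the model has estimated. A sketch of how per-observation predictive intervals could be built from posterior draws is shown below. It uses NumPy only, and the arrays `alpha_draws`, `beta_draws`, and `sigma_draws` are simulated stand-ins for flattened posterior samples with made-up values, not fitted results. The intervals are central percentile intervals, which can differ slightly from an HDI.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 4000

# Stand-ins for posterior draws (hypothetical values, not fitted results)
alpha_draws = rng.normal(17000.0, 50.0, size=n_draws)
beta_draws = rng.normal(6000.0, 50.0, size=n_draws)
sigma_draws = np.abs(rng.normal(400.0, 20.0, size=n_draws))

# Hypothetical standardised test volumes
x_test_s = np.array([-1.0, 0.0, 1.5])

# Mean cost per draw and test point: shape (n_draws, n_test)
mu = alpha_draws[:, None] + beta_draws[:, None] * x_test_s[None, :]

# Add observation noise to obtain posterior predictive draws
y_rep = rng.normal(mu, sigma_draws[:, None])

# Central 94% intervals: for the mean cost, and for a new observation
mean_low, mean_high = np.percentile(mu, [3, 97], axis=0)
pred_low, pred_high = np.percentile(y_rep, [3, 97], axis=0)
```

The interval for a new observation is wider than the interval for the mean cost because it also includes the noise term σ; the former is usually the relevant one when budgeting for a single production run.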
Visualise Predictions with Credible Intervals
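We plot the observed test data, the posterior mean line, and the 94% highest-density interval (HDI) of that line. The band reflects uncertainty in the expected cost (α and β); it does not include the observation noise σ, so individual data points can fall outside it.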
import numpy as np
# Generate grid of standardized volumes
X_grid_s = np.linspace(X_test_s.min(), X_test_s.max(), 100)
# Draw posterior samples for predictions: shape (n_samples, 100)
pred_samples = (
    ppc["α"][:, None]
    + ppc["β"][:, None] * X_grid_s[None, :]
)
# Compute mean and 94% credible interval
pred_mean = pred_samples.mean(axis=0)
hpd_bounds = az.hdi(pred_samples, hdi_prob=0.94)
import matplotlib.pyplot as plt
# Transform back to original scale
X_grid = scaler.inverse_transform(X_grid_s.reshape(-1,1)).flatten()
plt.figure(figsize=(8,5))
plt.scatter(X_test, y_test, color="k", alpha=0.5, label="Test data")
plt.plot(X_grid, pred_mean, color="blue", label="Posterior mean")
plt.fill_between(
    X_grid,
    hpd_bounds[:, 0],
    hpd_bounds[:, 1],
    color="blue", alpha=0.3,
    label="94% Credible interval"
)
plt.xlabel("Units Produced")
plt.ylabel("Total Cost (USD)")
plt.title("Factory Output Cost Prediction with Bayesian Regression")
plt.legend()
plt.show()
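The introduction frames the problem in terms of per-unit cost, while the model predicts total cost. Dividing draws of the expected total cost by the volume gives the corresponding per-unit cost together with its uncertainty. The sketch below uses simulated stand-in draws on the original scale (a hypothetical fixed cost and variable cost per unit), so the numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n_draws = 4000

# Stand-ins for posterior draws on the original scale (hypothetical values)
fixed_cost_draws = rng.normal(5000.0, 100.0, size=n_draws)
unit_cost_draws = rng.normal(12.0, 0.1, size=n_draws)

# Planned output volumes to evaluate
units = np.array([250.0, 500.0, 1000.0, 2000.0])

# Total and per-unit expected cost per draw: shape (n_draws, n_volumes)
total = fixed_cost_draws[:, None] + unit_cost_draws[:, None] * units[None, :]
per_unit = total / units[None, :]

per_unit_mean = per_unit.mean(axis=0)
per_unit_low, per_unit_high = np.percentile(per_unit, [3, 97], axis=0)

for u, m, lo, hi in zip(units, per_unit_mean, per_unit_low, per_unit_high):
    print(f"{u:6.0f} units: {m:6.2f} per unit (94% interval {lo:.2f} to {hi:.2f})")
```

With a positive fixed cost, the per-unit cost falls as volume grows, which is the economies-of-scale effect described in the introduction. A linear total-cost model cannot represent the rising costs from overtime or waste, so those would need additional terms.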
Summary
This Bayesian Regression approach:
1. Captures parameter uncertainty, providing credible intervals around cost forecasts, not just point estimates.
2. Handles limited data gracefully, thanks to priors that regularise the slope and intercept.
3. Delivers actionable insights: managers can see not only the expected cost given the planned output but also the range of plausible costs, informing risk‑aware budgeting and contingency planning.