Machine Downtime Cost Prediction using Bayesian Regression in ML


Manufacturing operations incur substantial costs whenever equipment is offline: lost production, emergency repairs, and labour inefficiencies. Before scheduling maintenance, plant managers need to forecast each machine's downtime cost (downtime hours valued at a fixed hourly loss rate) from early indicators such as operating temperature, vibration level, run-time since last maintenance, and load factor. Downtime costs can respond nonlinearly to these drivers (e.g., high vibration often signals imminent failure, with steep repair surcharges), and they are uncertain because of unplanned breakdowns and part lead times. A single point estimate hides this uncertainty, risking either overstaffing or costly delays. Applying Bayesian Regression produces:

1. A point forecast of the downtime cost.

2. A credible interval that quantifies forecast uncertainty, enabling risk-aware maintenance scheduling and budget planning.
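Both outputs come from the same set of posterior predictive draws. The sketch below is illustrative only: the draws are simulated stand-ins with a hypothetical mean and spread, and the interval is equal-tailed (3rd to 97th percentile) rather than the Highest Density Interval used later in the pipeline.

```python
import numpy as np

# Hypothetical posterior predictive draws of downtime cost (USD) for one machine.
# In the real pipeline these would come from the fitted model; here they are simulated.
rng = np.random.default_rng(0)
cost_draws = rng.normal(loc=6000.0, scale=1500.0, size=4000)

# 1. Point forecast: the mean of the predictive draws
point_forecast = cost_draws.mean()

# 2. 94% equal-tailed credible interval: 3rd and 97th percentiles of the draws
lower, upper = np.percentile(cost_draws, [3, 97])

print(f"Forecast: ${point_forecast:,.0f} (94% interval ${lower:,.0f} to ${upper:,.0f})")
```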

Libraries Required

import pandas as pd                              # data loading & manipulation  
import numpy as np                               # numerical operations  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

import pymc3 as pm                               # Bayesian modeling  
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error  

Dataset

Optimisation of Machine Downtime

Step-by-Step Code Implementation

Data Loading & Cost Computation

We multiply the recorded Downtime_Hours by a fixed loss rate of $2,000 per hour to obtain each record's downtime cost.

import pandas as pd

# Load sensor & downtime data
df = pd.read_csv("data/machine_downtime.csv")

# Assume 'Downtime_Hours' and a constant cost_per_hour
cost_per_hour = 2000.0  # USD lost per hour of downtime

# Compute downtime cost
df['downtime_cost'] = df['Downtime_Hours'] * cost_per_hour

# Preview relevant columns
df[['Operating_Temperature','Vibration_Level',
    'Hours_Since_Maintenance','Load_Factor',
    'Downtime_Hours','downtime_cost']].head()

Preprocessing & Train/Test Split

Z-scoring the sensor and runtime features puts every slope on a comparable scale, so a single prior width suits all of them, and it stabilises the sampler.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Features and target
features = ['Operating_Temperature','Vibration_Level',
            'Hours_Since_Maintenance','Load_Factor']
X = df[features].values
y = df['downtime_cost'].values

# Random 80/20 split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize features for stable MCMC
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s  = scaler.transform(X_test)
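For reference, StandardScaler's z-score is just (x − column mean) / column standard deviation, using the population standard deviation (ddof=0). The sketch below checks this on a small hypothetical feature matrix; in the pipeline, the mean and standard deviation come from the training rows only, and the test rows reuse them so no test information leaks into preprocessing.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy features: two columns standing in for temperature and vibration
X_demo = np.array([[70.0, 0.2],
                   [75.0, 0.5],
                   [80.0, 0.9],
                   [85.0, 1.4]])

# Manual z-score: subtract the column mean, divide by the population std (ddof=0)
col_mean = X_demo.mean(axis=0)
col_std = X_demo.std(axis=0)          # NumPy's default ddof=0 matches StandardScaler
Z_manual = (X_demo - col_mean) / col_std

Z_sklearn = StandardScaler().fit_transform(X_demo)

print(np.allclose(Z_manual, Z_sklearn))   # True
```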

Define & Fit Bayesian Regression Model

Model priors (the second argument is the standard deviation):

  • α ∼ Normal(0, 1e4) is wide enough to accommodate large-scale costs.
  • β ∼ Normal(0, 1e3) reflects moderate uncertainty per standardised unit.
  • σ ∼ HalfNormal(1e4) constrains the residual scale to be positive, with a spread on the order of the costs.

Bayesian model: Observed downtime_cost ∼ Normal(α + β·X_standardized, σ).
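A quick way to see what these priors imply before fitting is a prior predictive simulation: draw α, β and σ from the priors and generate costs from the likelihood. The sketch below does this with plain NumPy; the standardized feature matrix is a hypothetical random stand-in for X_train_s. One thing it makes visible is that a Normal likelihood places substantial mass on negative costs.

```python
import numpy as np

# Prior predictive simulation for downtime_cost ~ Normal(α + X_std·β, σ)
rng = np.random.default_rng(1)
n_rows, n_features, n_sims = 50, 4, 1000

# Hypothetical standardized features standing in for X_train_s
X_std = rng.standard_normal((n_rows, n_features))

alpha = rng.normal(0.0, 1e4, size=n_sims)                  # α ~ Normal(0, 1e4)
beta = rng.normal(0.0, 1e3, size=(n_sims, n_features))     # β ~ Normal(0, 1e3)
sigma = np.abs(rng.normal(0.0, 1e4, size=n_sims))          # σ ~ HalfNormal(1e4)

mu = alpha[:, None] + beta @ X_std.T                        # (n_sims, n_rows)
y_prior = rng.normal(mu, sigma[:, None])                    # simulated costs (USD)

# Range of costs the priors consider plausible before seeing any data
print(np.percentile(y_prior, [5, 50, 95]).round(0))
print(f"Share of negative simulated costs: {(y_prior < 0).mean():.2f}")
```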

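MCMC sampling: We draw 2,000 posterior samples per chain after 1,000 tuning (burn-in) steps, with target_accept=0.9 to reduce divergent transitions. Convergence should still be checked with the r_hat and effective-sample-size columns reported by az.summary.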

import pymc3 as pm

with pm.Model() as downtime_model:
    # Priors
    α = pm.Normal("α", mu=0, sigma=1e4)                            # intercept
    β = pm.Normal("β", mu=0, sigma=1e3, shape=X_train_s.shape[1])  # slopes
    σ = pm.HalfNormal("σ", sigma=1e4)                              # noise scale

    # Linear predictor
    μ = α + pm.math.dot(X_train_s, β)

    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)

    # MCMC sampling
    trace = pm.sample(
        draws=2000,       # posterior draws
        tune=1000,        # burn‐in
        target_accept=0.9,
        return_inferencedata=True
    )
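The r_hat diagnostic compares variation between chains with variation within them; values near 1 suggest the chains agree. ArviZ reports a rank-normalized split R-hat, so its numbers differ slightly from the classic Gelman-Rubin version sketched below, which uses simulated chains (one deliberately offset) purely for illustration.

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Classic (non-split) Gelman-Rubin R-hat for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    B = n * chain_means.var(ddof=1)            # between-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled estimate of posterior variance
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(2)
mixed = rng.normal(0.0, 1.0, size=(4, 2000))                 # four chains that agree
stuck = mixed + np.array([0.0, 0.0, 0.0, 3.0])[:, None]      # one chain stuck elsewhere

print(round(gelman_rubin_rhat(mixed), 3))   # close to 1.0
print(round(gelman_rubin_rhat(stuck), 3))   # well above 1.01
```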

Posterior Analysis & Point Predictions

  • Posterior predictive: Sampling Y_obs from the fitted model yields full predictive distributions (here, for the training rows), from which posterior-mean forecasts and 94% Highest Density Intervals (HDIs) can be computed.
  • Evaluation: Mean Absolute Error (MAE) on the held-out test set, using point predictions from the posterior-mean α and β, quantifies the average point-forecast error in USD.

import arviz as az
from sklearn.metrics import mean_absolute_error

# Summarize posterior distributions
az.summary(trace, round_to=2)

# Posterior predictive sampling
with downtime_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

# Posterior mean estimates
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values

# Compute point predictions on test set
y_pred = α_post + X_test_s.dot(β_post)

# Evaluate MAE
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f}")
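A 94% HDI is the narrowest interval that contains 94% of the draws. For a skewed cost distribution it sits lower and is shorter than the equal-tailed interval. The sketch below finds it by sliding a fixed-count window over the sorted draws (a unimodal-case approach); the log-normal draws are hypothetical stand-ins for predictive cost samples.

```python
import numpy as np

def hdi_1d(samples, prob=0.94):
    """Shortest interval containing `prob` of the samples (assumes a unimodal distribution)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    k = int(np.floor(prob * n))          # index span of each candidate window
    widths = x[k:] - x[: n - k]          # width of every candidate window
    i = int(np.argmin(widths))           # the narrowest window is the HDI
    return x[i], x[i + k]

rng = np.random.default_rng(3)
skewed = rng.lognormal(mean=8.5, sigma=0.5, size=5000)   # right-skewed "costs" (USD)

lo_hdi, hi_hdi = hdi_1d(skewed)
lo_et, hi_et = np.percentile(skewed, [3, 97])
print(f"HDI:          {lo_hdi:,.0f} to {hi_hdi:,.0f}")
print(f"Equal-tailed: {lo_et:,.0f} to {hi_et:,.0f}")
```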

Visualise Predictions & Credible Intervals

By varying the vibration level while holding the other features at their training medians, we plot the expected downtime cost curve and its 94% posterior predictive band, showing how vibration, a common wear signal, drives cost and how uncertain those forecasts are.

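import numpy as np
import matplotlib.pyplot as plt
import arviz as az

# Sweep vibration level (column 1); hold the other features at their training medians
vib_grid = np.linspace(X_train_s[:, 1].min(), X_train_s[:, 1].max(), 100)
grid = np.repeat(np.median(X_train_s, axis=0)[None, :], 100, axis=0)
grid[:, 1] = vib_grid

# downtime_model has X_train_s built into μ, so pm.sample_posterior_predictive
# would return draws for the training rows, not for this grid.
# Compute grid predictions directly from the posterior draws instead.
post    = trace.posterior
alpha_s = post["α"].values.reshape(-1)                   # (n_samples,)
beta_s  = post["β"].values.reshape(-1, grid.shape[1])    # (n_samples, n_features)
sigma_s = post["σ"].values.reshape(-1)                   # (n_samples,)

rng     = np.random.default_rng(42)
mu_grid = alpha_s[:, None] + beta_s @ grid.T             # expected cost per draw and grid point
preds   = rng.normal(mu_grid, sigma_s[:, None])          # add observation noise

mean_pred = preds.mean(axis=0)
hpd       = az.hdi(preds[None, :, :], hdi_prob=0.94)     # (chain, draw, grid) -> (100, 2)

# Back-transform the vibration level to original units
vib_orig = scaler.inverse_transform(grid)[:, 1]

plt.figure(figsize=(8, 5))
plt.plot(vib_orig, mean_pred, label="Posterior mean")
plt.fill_between(vib_orig, hpd[:, 0], hpd[:, 1], alpha=0.3,
                 label="94% credible interval")
plt.scatter(X_test[:, 1], y_test, color="k", alpha=0.5, label="Test data")
plt.xlabel("Vibration Level")
plt.ylabel("Downtime Cost (USD)")
plt.title("Bayesian Regression: Cost vs. Vibration")
plt.legend()
plt.tight_layout()
plt.show()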
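The fitted slopes are in USD per one standard deviation of each feature. Since z = (x − mean) / scale, dividing a standardized slope by the scaler's scale_ gives USD per raw unit, and the intercept absorbs the centring term. The sketch below checks that both parameterisations give identical predictions; the raw features and coefficient values are hypothetical, and the same division can be applied to every posterior draw of β.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
# Hypothetical raw features (two columns, arbitrary units)
X_raw = np.column_stack([rng.normal(75, 5, 200), rng.normal(1.0, 0.3, 200)])
scaler_demo = StandardScaler().fit(X_raw)

# Hypothetical coefficients on the standardized scale
alpha_std = 5000.0
beta_std = np.array([300.0, 1200.0])      # USD per 1 SD of each feature

# Convert to original units: USD per raw unit of each feature
beta_orig = beta_std / scaler_demo.scale_
alpha_orig = alpha_std - np.sum(beta_std * scaler_demo.mean_ / scaler_demo.scale_)

# Both parameterisations should give identical predictions
pred_std = alpha_std + scaler_demo.transform(X_raw) @ beta_std
pred_orig = alpha_orig + X_raw @ beta_orig
print(np.allclose(pred_std, pred_orig))   # True
```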

Summary

This Bayesian Regression pipeline for Machine Downtime Cost Prediction delivers:

1. Point estimates of downtime cost from sensor and maintenance data, with accuracy measured by test-set MAE.

2. Credible intervals quantifying forecast uncertainty, which are critical for risk-aware maintenance planning.

3. Actionable insights: operations teams can schedule preventive maintenance, allocate spare parts budgets, and negotiate service contracts with clarity on both expected costs and their uncertainty bounds.
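For budgeting across a fleet, one option is to sum the predictive draws across machines within each draw and read percentiles off the total. With the fitted model, draws share α and β, so summing draw-by-draw keeps that correlation, whereas adding per-machine percentiles would not. The sketch below uses hypothetical, independently simulated draws for 12 machines, and the 95th-percentile budget level is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical posterior predictive draws: 4000 draws x 12 machines (USD)
n_draws, n_machines = 4000, 12
machine_means = rng.uniform(3000, 9000, size=n_machines)
cost_draws = rng.normal(machine_means, 1500.0, size=(n_draws, n_machines))

# Sum across machines within each draw -> distribution of total fleet cost
fleet_total = cost_draws.sum(axis=1)

expected_total = fleet_total.mean()
budget_p95 = np.percentile(fleet_total, 95)   # total is at or below this in 95% of draws

print(f"Expected fleet downtime cost: ${expected_total:,.0f}")
print(f"95th-percentile budget:       ${budget_p95:,.0f}")
```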


ProjectGurukul Team

The ProjectGurukul Team delivers project-based tutorials on programming, machine learning, and web development. We simplify learning by providing hands-on projects to help you master real-world skills. We also provide free major and minor projects for engineering students.
