Factory Downtime Prediction using Quantile Regression in ML

Unplanned equipment downtime in manufacturing can vary dramatically—from brief stoppages to multi‑hour outages—and operators need to plan for this uncertainty. Instead of forecasting only the average downtime, we’ll predict multiple quantiles (for example, the 25th, 50th, and 75th percentiles) of downtime duration (in minutes) for production incidents.

By fitting separate quantile regression models, we’ll uncover how key factors—such as machine type, operator experience, shift timing, and past performance—drive lower-, median-, and upper‑tail downtime differently.

These insights let maintenance planners allocate resources for typical scenarios while also preparing for the longer, upper-quartile downtime events.

Libraries Required

import pandas as pd                      # Data loading & manipulation  
import numpy as np                       # Numerical operations  
import statsmodels.formula.api as smf    # Quantile regression via formula API  
from sklearn.model_selection import train_test_split  # Train/test split  
from sklearn.metrics import mean_pinball_loss        # Quantile-specific loss

Dataset

Manufacturing Efficiency in Downtime Operations

Step-by-Step Code Implementation

Load & Inspect Data

We load a dataset detailing machine downtime events—including machine type, operator experience, shift, previous downtime, and actual downtime in minutes—then inspect its row/column structure and summary statistics for Downtime_minutes to understand its distribution (mean, median, tails).

# Load the downtime dataset  
df = pd.read_csv("predict_manufacturing_downtime_performance_dataset.csv")

# Inspect structure and summary statistics
print(df.head())
df.info()   # info() prints directly and returns None, so no print() wrapper
print(df['Downtime_minutes'].describe())
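
To make the difference between the mean and the quantiles concrete, here is a minimal sketch on a made-up toy series (the values are illustrative and do not come from the dataset). A single long outage pulls the mean well above the 75th percentile, which is the kind of skew that motivates modelling quantiles instead of the average.

```python
import pandas as pd

# Hypothetical downtime durations in minutes (illustrative values only)
toy = pd.Series([5, 10, 15, 20, 240], name="Downtime_minutes")

# Empirical quartiles using pandas' default linear interpolation
quartiles = toy.quantile([0.25, 0.50, 0.75])
mean_downtime = toy.mean()

print(quartiles)
print(f"mean = {mean_downtime:.1f}")
```

On the real data, the same two calls on df['Downtime_minutes'] give a quick read on how far the mean sits from the median.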

Preprocessing & Feature Engineering

Categorical variables (Machine_Type, Shift) are one‑hot encoded, dropping the first category to avoid multicollinearity.
We ensure no missing or invalid downtime entries remain.
We define our predictor list by excluding the target (Downtime_minutes) and any identifiers.
We rename the response column to Downtime for brevity.

# Assume columns include: 
# 'Machine_Type', 'Operator_Experience_years', 'Shift', 'Prev_Downtime_minutes', 'Downtime_minutes'

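# One-hot encode categorical variables
# dtype=int yields 0/1 columns; recent pandas versions default to booleans,
# which the formula interface would otherwise label as categorical terms
df_enc = pd.get_dummies(df, 
                        columns=['Machine_Type','Shift'], 
                        drop_first=True,
                        dtype=int)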

# Drop rows with missing values in the numeric columns, then remove
# negative downtime, which is treated as invalid (zero downtime is kept)
df_enc = df_enc.dropna(subset=['Operator_Experience_years',
                               'Prev_Downtime_minutes','Downtime_minutes'])
df_enc = df_enc[df_enc['Downtime_minutes'] >= 0]

# Define predictors and response
features = [c for c in df_enc.columns 
            if c not in ['Downtime_minutes','Machine_ID']]
# rename() returns a new frame, which avoids pandas' SettingWithCopyWarning
data = df_enc[features + ['Downtime_minutes']].rename(
    columns={'Downtime_minutes': 'Downtime'})
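
To see what drop_first=True does, the sketch below encodes a small made-up frame. The category labels (Drill, Laser Cutter, Press, Day, Night) are hypothetical and not taken from the dataset. It also shows one possible way to keep a formula string valid when a label contains a space: wrapping each column name in Q("..."), the quoting helper provided by patsy, assuming statsmodels is using patsy to parse formulas.

```python
import pandas as pd

# Hypothetical toy data; labels and values are made up for illustration
toy = pd.DataFrame({
    "Machine_Type": ["Drill", "Laser Cutter", "Press", "Laser Cutter"],
    "Shift": ["Day", "Night", "Day", "Night"],
    "Downtime_minutes": [12, 30, 45, 20],
})

# The alphabetically first level of each variable ("Drill", "Day") is dropped
# and becomes the baseline that the remaining dummy columns are compared to
enc = pd.get_dummies(toy, columns=["Machine_Type", "Shift"],
                     drop_first=True, dtype=int)
print(enc.columns.tolist())

# Quote every predictor so names with spaces remain valid formula terms
features = [c for c in enc.columns if c != "Downtime_minutes"]
formula = "Downtime_minutes ~ " + " + ".join(f'Q("{c}")' for c in features)
print(formula)
```

If the real column names are already valid Python identifiers, the plain " + ".join(features) form is enough and gives shorter labels in the coefficient table.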

Train/Test Split

We randomly hold out 20% of records to evaluate how well our quantile forecasts generalize to unseen downtime events.

# Reserve 20% of data for evaluation
train, test = train_test_split(data, test_size=0.2, random_state=42)

Fit Quantile Regression Models

For each quantile (25th, 50th, 75th):

We build a formula string linking Downtime to all predictors.
We fit a QuantReg model at that percentile on the training set.
We print only the coefficient table (from .summary().tables[1]), which shows how each predictor’s marginal effect varies across the lower, median, and upper downtime distribution.

quantiles = [0.25, 0.50, 0.75]
results   = {}
formula   = "Downtime ~ " + " + ".join(features)

for q in quantiles:
    model = smf.quantreg(formula, train)
    res   = model.fit(q=q)
    results[q] = res
    print(f"\n--- {int(q*100)}th Percentile Coefficients ---")
    print(res.summary().tables[1])   # coefficient table only
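
Three separate coefficient tables are hard to compare by eye. One option is to collect the fitted parameters into a single side-by-side table. The sketch below does this on synthetic data so that it runs on its own. The data-generating process (noise that grows with Prev_Downtime_minutes) is an assumption chosen so the slopes differ across quantiles. It is not a property of the real dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (assumption): downtime spread widens as previous downtime grows
rng = np.random.default_rng(0)
n = 2000
prev = rng.uniform(0, 60, n)
noise = rng.exponential(1.0, n)   # right-skewed, non-negative noise
toy = pd.DataFrame({
    "Prev_Downtime_minutes": prev,
    "Downtime": 30 + 0.5 * prev + (5 + 0.3 * prev) * noise,
})

# Fit one model per quantile and gather the coefficients column by column
quantiles = [0.25, 0.50, 0.75]
model = smf.quantreg("Downtime ~ Prev_Downtime_minutes", toy)
coef_table = pd.DataFrame(
    {f"q={q:.2f}": model.fit(q=q).params for q in quantiles}
)
print(coef_table.round(3))
```

With the results dictionary from the loop above, the same table can be built as pd.DataFrame({q: res.params for q, res in results.items()}). A coefficient that grows from the 25th to the 75th percentile column indicates a predictor that matters more for long outages than for short ones.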

Evaluation with Pinball Loss

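We predict quantile-specific downtime values on the test set.
We compute pinball loss, the loss function that quantile regression itself minimizes, for each quantile. For a target quantile q, each minute of under-prediction is penalized at rate q and each minute of over-prediction at rate 1 - q, so the 75th-percentile model is punished more for forecasting too low than too high. Lower pinball loss indicates more accurate quantile forecasts, but values are only comparable at the same quantile.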

for q, res in results.items():
    preds = res.predict(test[features])
    loss  = mean_pinball_loss(test['Downtime'], preds, alpha=q)
    print(f"{int(q*100)}th percentile pinball loss: {loss:.2f}")
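
To show what the metric measures, here is a hand-written NumPy version of pinball loss. The function name pinball is our own illustrative helper, not a library API. It is meant to mirror what mean_pinball_loss computes for a single output, and the scikit-learn function remains the better choice in practice.

```python
import numpy as np

def pinball(y_true, y_pred, q):
    """Mean pinball loss at quantile q (illustrative helper, not a library API)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-prediction (diff > 0) costs q per unit of error;
    # over-prediction (diff < 0) costs (1 - q) per unit of error
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

# At q = 0.75, forecasting 10 minutes too low costs three times as much
# as forecasting 10 minutes too high
under = pinball([20], [10], 0.75)   # actual 20, predicted 10
over = pinball([10], [20], 0.75)    # actual 10, predicted 20
print(under, over)                  # 7.5 2.5
```

At q = 0.5 both penalties are equal, and the loss reduces to half of the mean absolute error.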

Summary

By modelling multiple quantiles of machine downtime rather than just the mean, maintenance planners gain distribution‑aware forecasts:

The 25th‑percentile model highlights conditions under which downtime is unusually short, informing efficient scheduling of routine tasks.
The median (50th‑percentile) model predicts typical downtime durations for day‑to‑day planning.
The 75th‑percentile model focuses on heavier downtime events, enabling reserve capacity and proactive spare‑parts logistics.

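These quantile estimates help operations teams allocate maintenance crews and spare parts for both typical and longer-than-usual downtime. Note that the 75th percentile describes the upper quartile rather than a worst case; planning for truly extreme outages would call for fitting higher quantiles, such as the 90th or 95th, in the same way.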
