Factory Maintenance Cost Prediction with Ridge Regression in ML
Planned and corrective maintenance can swallow 15‑40 % of a factory’s operating budget. If the maintenance manager can forecast each production line’s monthly maintenance spend with reasonable accuracy, she can:
- order spare parts just‑in‑time,
- defer low‑risk work when cash flow is tight, and
- justify capital upgrades for machines that have become too costly to keep in service.
Using detailed shop‑floor telemetry and maintenance logs, we will build a Ridge‑regression model that predicts a line’s monthly maintenance cost in USD from routinely stored process and usage variables:
- operating hours, production count, and machine age
- average bearing temperature and vibration (RMS mm/s)
- number of minor stops and breakdowns
- shift pattern (two‑shift vs three‑shift)
- calendar month (captures seasonal effects on lubrication and cooling)
Ridge regression keeps the model linear, so each coefficient reads directly as a dollar effect, while its L2 penalty stabilises the weights when correlated variables (temperature, vibration, and age) move together.
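To see why the L2 penalty helps with correlated inputs, here is a minimal sketch on synthetic data (not the factory dataset). The two predictors, the true weights 3.0 and 2.0, and the penalty value alpha=10 are all assumptions chosen for illustration. It uses the closed-form ridge solution, where alpha=0 reduces to ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two nearly collinear predictors (think temperature and vibration rising together).
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + 2.0 * x2 + rng.normal(scale=1.0, size=n)

# Centre the data so the intercept stays outside the penalty.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

def ridge_coefs(Xc, yc, alpha):
    """Closed-form ridge solution (X'X + alpha*I)^-1 X'y; alpha=0 gives OLS."""
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)

ols = ridge_coefs(Xc, yc, alpha=0.0)
ridge = ridge_coefs(Xc, yc, alpha=10.0)

print("OLS  :", ols.round(2))
print("Ridge:", ridge.round(2))
```

With inputs this correlated, the data pin down the sum of the two weights far better than their individual values. The penalty pulls the ridge weights toward each other and keeps their overall size no larger than the least-squares fit.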
Libraries Required
- pandas # load / reshape CSV
- numpy # numeric helpers
- matplotlib.pyplot # quick diagnostic plots (optional)
- scikit‑learn # preprocessing, RidgeCV, metrics
- joblib # save the fitted model
Dataset Link
Predictive Maintenance Dataset
Step-by-Step Implementation
1. Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score, mean_absolute_error
import joblib
2. Load the dataset
df = pd.read_csv("Predictive_maintenance_Dataset.csv") # adjust if renamed
print(df.head())
3. Basic cleaning
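Drop any row that is missing a value in a column the model needs. Hours, units, temperature and vibration are brought to unit variance later, inside the pipeline, so Ridge’s penalty treats them evenly.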
required = ['maint_cost_usd','oper_hours','prod_units',
            'avg_temp_c','vibration_rms','machine_age_yrs',
            'minor_stops','breakdowns','shift_pattern','month']
df = df.dropna(subset=required).copy()
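Before dropping rows, it can help to confirm that the expected columns exist and to see how many rows are lost. The sketch below does this on a made-up frame. The helper name `drop_incomplete_rows` and the toy values are hypothetical, not part of the dataset or of pandas.

```python
import numpy as np
import pandas as pd

def drop_incomplete_rows(frame, required_cols):
    """Hypothetical helper: validate column names, then drop rows with gaps."""
    missing_cols = [c for c in required_cols if c not in frame.columns]
    if missing_cols:
        raise KeyError(f"Columns not found in the data: {missing_cols}")
    cleaned = frame.dropna(subset=required_cols).copy()
    n_dropped = len(frame) - len(cleaned)
    return cleaned, n_dropped

# Toy data (invented values) with gaps in both required and optional columns.
toy = pd.DataFrame({
    "maint_cost_usd": [12000.0, np.nan, 9500.0, 15000.0],
    "oper_hours": [410.0, 395.0, np.nan, 450.0],
    "notes": ["ok", None, "ok", None],  # not required, so gaps here are kept
})

cleaned, n_dropped = drop_incomplete_rows(toy, ["maint_cost_usd", "oper_hours"])
print(f"Dropped {n_dropped} of {len(toy)} rows")
```

If a large share of rows disappears at this step, imputing the gaps may be a better choice than dropping them.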
4. Feature lists
Separate the numeric columns from the categorical ones. The pipeline's one‑hot encoder converts shift pattern (two‑ vs three‑shift) and calendar month into binary columns; dropping the first level avoids perfect collinearity.
num_cols = ['oper_hours','prod_units','avg_temp_c','vibration_rms',
            'machine_age_yrs','minor_stops','breakdowns']
cat_cols = ['shift_pattern','month']
target = 'maint_cost_usd'
X = df[num_cols + cat_cols]
y = df[target]
5. Pre‑processing + Ridge pipeline
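The pipeline one‑hot encodes the categorical columns and standardises the numeric ones. RidgeCV then searches a grid of α values via five‑fold cross‑validation and keeps the value that minimises validation error, so no manual tuning is needed.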
preprocess = ColumnTransformer([
    ('cat', OneHotEncoder(drop='first', handle_unknown='ignore'), cat_cols),
    ('num', StandardScaler(), num_cols)
])
ridge = RidgeCV(alphas=[0.1, 1, 10, 50, 100], cv=5)
model = Pipeline([
    ('prep', preprocess),
    ('ridge', ridge)
])
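The α search can be tried in isolation on synthetic data. In the sketch below, the log-spaced grid, the true coefficients, and the noise level are all assumptions for illustration; the tutorial itself uses the five-value grid shown above.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(42)

# Synthetic regression problem (assumed for illustration).
X = rng.normal(size=(120, 5))
true_coefs = np.array([4.0, 0.0, -2.0, 0.0, 1.0])
y = X @ true_coefs + rng.normal(scale=2.0, size=120)

# A log-spaced grid covers several orders of magnitude evenly.
alphas = np.logspace(-2, 3, 11)
search = RidgeCV(alphas=alphas, cv=5).fit(X, y)

print("Selected alpha:", search.alpha_)
if search.alpha_ in (alphas[0], alphas[-1]):
    print("Selected value sits on the grid edge; consider widening the grid.")
```

If the selected value lands on the smallest or largest entry, the best α may lie outside the grid, so extending the grid in that direction is a reasonable next step.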
6. Train‑test split & fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, shuffle=True)
model.fit(X_train, y_train)
7. Evaluation
pred = model.predict(X_test)
print(f"Selected α (L2) : {model.named_steps['ridge'].alpha_}")
print(f"R² (test set) : {r2_score(y_test, pred):.3f}")
print(f"MAE (test set) : ${mean_absolute_error(y_test, pred):,.0f}")
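For readers who want to see what the two metrics measure, this sketch computes them by hand on four invented cost values. It also computes the MAE of a constant prediction at the mean, which is a useful yardstick for judging whether a reported MAE is good.

```python
import numpy as np

# Invented actual and predicted monthly costs in USD.
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 310.0, 380.0])

# MAE: average absolute miss, in the same units as the target.
mae = np.mean(np.abs(y_true - y_pred))

# R²: share of the target's variance explained by the predictions.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

# Yardstick: MAE of always predicting the mean cost.
baseline_mae = np.mean(np.abs(y_true - y_true.mean()))

print(f"MAE = {mae:.1f}, R² = {r2:.3f}, baseline MAE = {baseline_mae:.1f}")
# prints: MAE = 12.5, R² = 0.986, baseline MAE = 100.0
```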
8. Inspecting cost drivers
Coefficients remain dollar figures. For example, a +$3,400 coefficient on vibration_rms (per σ) would mean that a one‑standard‑deviation rise in vibration adds about $3,400 to the predicted monthly cost; a −$1,800 coefficient on month_11 would indicate typical savings in November versus January.
ohe = model.named_steps['prep'].named_transformers_['cat']
ohe_names = ohe.get_feature_names_out(cat_cols)
feature_names = np.concatenate([ohe_names, num_cols])
coefs = pd.Series(model.named_steps['ridge'].coef_,
                  index=feature_names).sort_values()
print("\nLargest cost reducers:")
print(coefs.head(6))
print("\nLargest cost drivers:")
print(coefs.tail(6))
Numeric coefficients are measured in USD for a one‑standard‑deviation increase. Dummy‑variable coefficients show the dollar shift relative to the reference level, which is the first category in sorted order (e.g., January for month, or whichever shift‑pattern label sorts first).
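A per-σ coefficient can be converted to dollars per physical unit by dividing it by the standard deviation that the scaler learned. The sketch below shows this on synthetic data. The two features, their distributions, and the true slopes ($2,000 per mm/s and $10 per hour) are assumptions made up for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 500

# Synthetic features with assumed true slopes: $2,000 per mm/s, $10 per hour.
vibration = rng.normal(3.0, 0.5, size=n)
hours = rng.normal(400.0, 50.0, size=n)
X = np.column_stack([vibration, hours])
cost = 2000.0 * vibration + 10.0 * hours + rng.normal(scale=50.0, size=n)

toy_model = Pipeline([("scale", StandardScaler()), ("ridge", Ridge(alpha=1.0))])
toy_model.fit(X, cost)

per_sigma = toy_model.named_steps["ridge"].coef_  # USD per one std. deviation
sigma = toy_model.named_steps["scale"].scale_     # std. deviation of each feature
per_unit = per_sigma / sigma                      # USD per physical unit

print("USD per sigma:", per_sigma.round(0))
print("USD per unit :", per_unit.round(2))
```

In the tutorial's pipeline, the fitted scaler should be reachable as `model.named_steps['prep'].named_transformers_['num']`. The numeric coefficients are the last `len(num_cols)` entries of `coef_`, because the categorical block comes first in the ColumnTransformer.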
9. Persist the pipeline
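Operating hours, production units, and breakdown count move together. Ridge shrinks unstable weights, improving generalisation while retaining a straightforward linear narrative for finance teams. Saving the whole pipeline with joblib keeps the encoder, scaler, and selected α together with the coefficients, so the model can be reloaded and applied to new data without refitting.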
joblib.dump(model, "ridge_factory_maint_cost.pkl")
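Reloading works the same way in reverse. This sketch saves and reloads a small stand-in pipeline and checks that the predictions match. The toy model, the temporary file name, and the data are assumptions for illustration, not the factory model.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)

# Stand-in pipeline fitted on synthetic data (not the factory model).
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=50)
toy_model = Pipeline([("scale", StandardScaler()), ("ridge", Ridge(alpha=1.0))])
toy_model.fit(X, y)

# Round-trip through a temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_ridge.pkl")
    joblib.dump(toy_model, path)
    reloaded = joblib.load(path)

# The reloaded pipeline applies the same scaling and weights to new rows.
new_rows = rng.normal(size=(2, 3))
same = np.allclose(toy_model.predict(new_rows), reloaded.predict(new_rows))
print("Predictions match after reload:", same)
```

Pickled scikit-learn models are generally expected to be loaded with the same library version that saved them, so recording the version alongside the file is a sensible precaution.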
Summary
This Ridge‑regression workflow turns everyday SCADA and maintenance logs into an explainable factory‑maintenance cost predictor:
- Financial foresight: managers can spot looming budget overruns weeks before invoices land.
- Actionable levers: every coefficient maps directly to a cost driver such as operating hours, machine age, vibration, or seasonality.
- Stable baseline: any future tree‑ensemble or deep‑learning model must beat this Ridge model’s mean‑absolute error and still provide a cost story the plant controller can trust.