Factory Maintenance Cost Prediction with Ridge Regression in ML

Planned and corrective maintenance can swallow 15‑40 % of a factory’s operating budget. If the maintenance manager can forecast each production line’s monthly maintenance spend with reasonable accuracy, she can:

  • order spare parts just‑in‑time,
  • defer low‑risk work when cash flow is tight, and
  • justify capital upgrades for machines that have become too costly to keep in service.

Using detailed shop‑floor telemetry and maintenance logs, we will build a Ridge‑regression model that predicts a line’s monthly maintenance cost in USD from routinely stored process and usage variables:

  • operating hours, production count, and machine age
  • average bearing temperature and vibration (RMS mm/s)
  • number of minor stops and breakdowns
  • shift pattern (two‑shift vs three‑shift)
  • calendar month (captures seasonal effects on lubrication and cooling)

Ridge regression keeps the model linear, so each coefficient can be read directly in dollars, while its L2 penalty stabilises the weights when correlated variables (temperature, vibration, and age) move together.
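To make the stabilising effect concrete, here is a minimal sketch on synthetic data. The two "sensor" columns, the cost signal and the penalty strength alpha=10 are assumptions chosen for illustration, not values from the dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)

# Synthetic, standardised stand-ins for two sensors that move together.
temp = rng.normal(size=200)
vib = temp + rng.normal(scale=0.05, size=200)   # almost a copy of temp
X = np.column_stack([temp, vib])

# Made-up cost signal: both sensors contribute equally, plus noise.
y = 1000 * temp + 1000 * vib + rng.normal(scale=500, size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

print("OLS   coefficients:", ols.coef_.round(0))
print("Ridge coefficients:", ridge.coef_.round(0))
```

With near-duplicate columns, ordinary least squares tends to split the effect unevenly between them, while the L2 penalty pulls the two weights towards similar, smaller values.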

Libraries Required

  • pandas # load / reshape CSV
  • numpy # numeric helpers
  • matplotlib.pyplot # quick diagnostic plots (optional)
  • scikit‑learn # preprocessing, RidgeCV, metrics
  • joblib # save the fitted model

Dataset Link

Predictive Maintenance Dataset

Step-by-Step Implementation

1. Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score, mean_absolute_error
import joblib

2. Load the dataset

df = pd.read_csv("Predictive_maintenance_Dataset.csv")   # adjust if renamed
print(df.head())

3. Basic cleaning

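Drop every row that is missing a value in any column the model needs. Scaling hours, units, temperature and vibration to unit variance, so that Ridge’s penalty treats them evenly, is handled later inside the pipeline (step 5).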

required = ['maint_cost_usd','oper_hours','prod_units',
            'avg_temp_c','vibration_rms','machine_age_yrs',
            'minor_stops','breakdowns','shift_pattern','month']
df = df.dropna(subset=required).copy()
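Because dropna(subset=...) raises a KeyError when a listed column is absent, it can help to check the column names first. The sketch below uses a hypothetical helper, missing_columns, and a toy frame with made-up values in place of the real CSV.

```python
import pandas as pd

def missing_columns(frame, required):
    """Return the required column names that are absent from the frame."""
    return [col for col in required if col not in frame.columns]

# Toy frame standing in for the real CSV (hypothetical values).
toy = pd.DataFrame({
    "maint_cost_usd": [12000.0, None, 9500.0],
    "oper_hours": [410, 395, None],
})

print(missing_columns(toy, ["maint_cost_usd", "oper_hours", "breakdowns"]))
print(toy.isna().sum())      # missing values per column
print(len(toy.dropna()))     # rows that would survive dropna
```

Running the same checks on df before step 3 shows how many rows the cleaning step will discard and whether any expected column needs renaming.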

4. Feature lists

Separate the numeric features from the categorical ones. The pipeline in step 5 converts shift pattern (two‑ vs three‑shift) and calendar month into binary columns; dropping the first level avoids perfect collinearity.

num_cols = ['oper_hours','prod_units','avg_temp_c','vibration_rms',
            'machine_age_yrs','minor_stops','breakdowns']
cat_cols = ['shift_pattern','month']
target   = 'maint_cost_usd'

X = df[num_cols + cat_cols]
y = df[target]

5. Pre‑processing + Ridge pipeline

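The preprocessing step one‑hot encodes the categorical columns and standardises the numeric ones. RidgeCV then searches a grid of α values via five‑fold cross‑validation and keeps the model that minimises validation error, so no manual tuning is needed.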

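preprocess = ColumnTransformer([
        # one-hot encode categoricals; drop='first' combined with
        # handle_unknown='ignore' needs scikit-learn >= 1.0
        ('cat', OneHotEncoder(drop='first', handle_unknown='ignore'), cat_cols),
        # standardise numeric columns to zero mean and unit variance
        ('num', StandardScaler(),                                  num_cols)
])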

ridge = RidgeCV(alphas=[0.1, 1, 10, 50, 100], cv=5)

model = Pipeline([
        ('prep',  preprocess),
        ('ridge', ridge)
])

6. Train‑test split & fitting

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, shuffle=True)

model.fit(X_train, y_train)
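The α grid in step 5 is short. If the selected value lands on the smallest or largest entry, the best α may lie outside the grid. A minimal sketch of a wider, log‑spaced grid with an edge check follows; the data is synthetic and the names X_demo, y_demo and search are illustrative.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(1)

# Synthetic stand-in for a preprocessed training matrix.
X_demo = rng.normal(size=(120, 5))
y_demo = X_demo @ np.array([3.0, -2.0, 0.0, 1.0, 0.5]) + rng.normal(size=120)

alphas = np.logspace(-2, 3, 11)      # 0.01 ... 1000, evenly spaced in log10
search = RidgeCV(alphas=alphas, cv=5).fit(X_demo, y_demo)

# An edge hit suggests extending the grid in that direction and refitting.
at_edge = search.alpha_ in (alphas[0], alphas[-1])
print(f"chosen alpha = {search.alpha_:.3g}, at grid edge: {at_edge}")
```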

7. Evaluation

pred = model.predict(X_test)

print(f"Selected α (L2) : {model.named_steps['ridge'].alpha_}")
print(f"R² (test set)   : {r2_score(y_test, pred):.3f}")
print(f"MAE (test set)  : ${mean_absolute_error(y_test, pred):,.0f}")
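For readers who want to see what the two metrics measure, here is a small worked example that computes them by hand. The four cost values and predictions are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up monthly costs (USD) and predictions, for illustration only.
y_true = np.array([1000.0, 2000.0, 3000.0, 4000.0])
y_hat = np.array([1100.0, 1900.0, 3200.0, 3800.0])

mae = np.mean(np.abs(y_true - y_hat))             # average absolute miss
ss_res = np.sum((y_true - y_hat) ** 2)            # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation
r2 = 1 - ss_res / ss_tot

print(f"MAE = ${mae:,.0f}, R² = {r2:.2f}")        # MAE = $150, R² = 0.98
```

MAE is expressed in the same unit as the target, so it reads as the typical dollar miss per line per month; R² is the share of cost variation the model explains.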

8. Inspecting cost drivers

Coefficients remain dollar figures. As a hypothetical example, a +$3,400 coefficient on vibration_rms (per σ) would mean that a one‑standard‑deviation rise in vibration adds about $3,400 to the predicted monthly cost; a −$1,800 coefficient on month_11 would indicate typical savings in November versus January.

ohe = model.named_steps['prep'].named_transformers_['cat']
ohe_names = ohe.get_feature_names_out(cat_cols)
feature_names = np.concatenate([ohe_names, num_cols])

coefs = pd.Series(model.named_steps['ridge'].coef_,
                  index=feature_names).sort_values()

print("\nLargest cost reducers:")
print(coefs.head(6))
print("\nLargest cost drivers:")
print(coefs.tail(6))

Numeric coefficients are measured in USD for a one‑standard‑deviation increase. Dummy‑variable coefficients show the dollar shift relative to the reference level (e.g., two‑shift pattern or January).
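A per‑σ coefficient can be converted to dollars per original unit by dividing it by the standard deviation the scaler learned. The sketch below shows this on synthetic data where the true slopes are known; the data, the small alpha and the step names "scale" and "ridge" are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)

# Synthetic data: cost rises $50 per operating hour and $200 per year of age.
hours = rng.uniform(100, 700, size=200)
age = rng.uniform(1, 20, size=200)
X_demo = np.column_stack([hours, age])
y_demo = 50 * hours + 200 * age + rng.normal(scale=1.0, size=200)

demo = Pipeline([
    ("scale", StandardScaler()),
    ("ridge", Ridge(alpha=0.01)),
]).fit(X_demo, y_demo)

per_sigma = demo.named_steps["ridge"].coef_    # USD per standard deviation
sigma = demo.named_steps["scale"].scale_       # standard deviation per feature
per_unit = per_sigma / sigma                   # USD per hour, USD per year

print(per_unit.round(1))
```

In the tutorial's pipeline the fitted scaler sits inside the ColumnTransformer, at model.named_steps['prep'].named_transformers_['num'], and the division applies only to the numeric coefficients.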

9. Persist the pipeline

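Save the whole fitted pipeline, so the scaler, the encoder and the Ridge weights stay together and new data can be scored without refitting. The model is worth keeping as a baseline: operating hours, production units, and breakdown count move together, and Ridge shrinks the unstable weights this causes, improving generalisation while retaining a straightforward linear narrative for finance teams.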

joblib.dump(model, "ridge_factory_maint_cost.pkl")
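A quick round trip confirms that a saved pipeline predicts exactly as the in‑memory one does. The sketch below uses a small synthetic pipeline and a temporary file; the names fitted, restored and demo_model.pkl are illustrative.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=50)

fitted = Pipeline([("scale", StandardScaler()), ("ridge", Ridge(alpha=1.0))])
fitted.fit(X_demo, y_demo)

# Write to a temporary folder so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_model.pkl")
    joblib.dump(fitted, path)
    restored = joblib.load(path)

same = np.allclose(fitted.predict(X_demo), restored.predict(X_demo))
print("Reloaded pipeline reproduces predictions:", same)
```

For the tutorial's model, joblib.load("ridge_factory_maint_cost.pkl") followed by predict on a DataFrame with the same feature columns should work the same way, ideally under the same scikit‑learn version that was used for fitting.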

Summary

This Ridge‑regression workflow turns everyday SCADA and maintenance logs into an explainable factory‑maintenance cost predictor:

  • Financial foresight: managers can spot looming budget overruns weeks before invoices land.
  • Actionable levers: every coefficient maps directly to a cost driver such as operating hours, machine age, vibration, or seasonality.
  • Stable baseline: any future tree‑ensemble or deep‑learning model must beat this Ridge model’s mean‑absolute error and still provide a cost story the plant controller can trust.

ProjectGurukul Team
