Facility Maintenance Cost Trend Prediction with Polynomial Regression in ML

Facility managers and corporate real‑estate teams need to forecast annual maintenance costs (USD) for a portfolio of industrial and commercial buildings, based on early indicators—such as facility age, square footage, number of critical systems (HVAC, elevators), past preventive‐maintenance spend, and regional labour-rate indices—before budgeting cycles close.

Maintenance costs grow nonlinearly with facility age (ageing systems need disproportionately more upkeep), interact with facility size (larger footprints amplify unit costs), and are tempered by past preventive investments. A simple linear model cannot represent this curvature or these interactions; an unrestricted high‑degree polynomial overfits noise. Applying Polynomial Regression to a small set of engineered features, with Ridge (ℓ²) regularisation to control complexity, captures smooth cost‑trend dynamics while keeping the forecasts interpretable enough for proactive budgeting.

Libraries Required

import pandas as pd                            # data loading & handling  
import numpy as np                             # numerical operations  

import matplotlib.pyplot as plt                # plotting  
import seaborn as sns                          # enhanced visualization  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import StandardScaler, PolynomialFeatures  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score

Dataset

Predictive Maintenance Dataset

Step-by-Step Code Implementation

Load Data & Inspect

import pandas as pd

# Load and preview
df = pd.read_csv("data/predictive_maintenance.csv")
df = df.rename(columns={
    'age': 'Facility_Age',
    'usage': 'Annual_Usage_Hours',
    'cost': 'Last_Maint_Cost_USD'
})
df.head()[[
    'Facility_Age','Square_Footage','Num_Critical_Systems',
    'Annual_Usage_Hours','Last_Maint_Cost_USD','Labor_Rate_Index'
]]
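
If the CSV is not at hand, the rest of the walkthrough can still be exercised on a stand-in table. The sketch below is an assumption, not the real dataset: the helper name make_synthetic_facilities, the value ranges, and the row count are all hypothetical, and only the column names match the ones selected above.

```python
import numpy as np
import pandas as pd

def make_synthetic_facilities(n_rows=200, seed=0):
    """Build a hypothetical stand-in table with the column names used in this walkthrough."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "Facility_Age": rng.integers(1, 60, size=n_rows),              # years (made-up range)
        "Square_Footage": rng.integers(5_000, 500_000, size=n_rows),   # sq ft (made-up range)
        "Num_Critical_Systems": rng.integers(1, 12, size=n_rows),
        "Annual_Usage_Hours": rng.integers(1_000, 8_760, size=n_rows),
        "Last_Maint_Cost_USD": rng.uniform(10_000, 500_000, size=n_rows).round(2),
        "Labor_Rate_Index": rng.uniform(0.8, 1.4, size=n_rows).round(3),
    })

demo_df = make_synthetic_facilities()
print(demo_df.shape)               # rows, columns
print(demo_df.isna().sum().sum())  # total missing values
```

Because the values are random, any scores obtained on this table say nothing about real facilities; it only confirms that the code paths run.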

Feature Engineering & Target

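Feature normalisation: StandardScaler centres each input to zero mean and scales it to unit variance, so the ℓ² penalty treats all features uniformly regardless of their original units. The scaler is applied later, inside the pipeline, so it is fitted on training data only.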

# Target: next year's maintenance cost
# The dataset has no future-cost column, so we simulate one as a placeholder
# for supervised learning; scores on this synthetic target are illustrative only
df['Next_Year_Cost_USD'] = df['Last_Maint_Cost_USD'] * (
    1 + 0.02 * df['Facility_Age'] / 10  # cost grows ~2% per decade of age
    + 0.0005 * df['Square_Footage']
)

# Define features and target
X = df[[
    'Facility_Age', 
    'Square_Footage', 
    'Num_Critical_Systems', 
    'Annual_Usage_Hours', 
    'Last_Maint_Cost_USD', 
    'Labor_Rate_Index'
]]
y = df['Next_Year_Cost_USD']
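
The effect of the scaling step described above can be checked on a toy array. This is a minimal sketch with made-up numbers (age in years next to area in square feet), not values from the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two made-up columns on very different scales: age (years) and area (sq ft)
X_demo = np.array([
    [5.0, 20_000.0],
    [20.0, 150_000.0],
    [35.0, 80_000.0],
    [50.0, 400_000.0],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_demo)

col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)  # population std (ddof=0), which is what StandardScaler uses
print(col_means.round(6))
print(col_stds.round(6))
```

After scaling, both columns have mean 0 and standard deviation 1, so a one-unit change means "one standard deviation" for every feature and the Ridge penalty does not favour features that happen to be measured in small units.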

Build a Polynomial Regression Pipeline

Polynomial expansion: PolynomialFeatures adds squares and interaction terms (e.g., Facility_Age², Last_Maint_Cost_USD×Annual_Usage_Hours) to capture nonlinear ageing and usage effects on cost.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

pipe = Pipeline([
    ('scale', StandardScaler()),  
    ('poly', PolynomialFeatures(include_bias=False)),  
    ('ridge', Ridge(random_state=42))  
])

Train/Test Split & Hyperparameter Search

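Hyperparameter tuning: GridSearchCV searches polynomial degree (1–3) and α (seven log‑spaced values from 10⁻³ to 10³) with 5‑fold cross‑validation on the training split, selecting the combination with the lowest mean validation RMSE. The test split is not touched until the final evaluation.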
Ridge regression: applies an ℓ² penalty (alpha) to shrink noisy high‑order coefficients, mitigating overfitting in the expanded feature space.

from sklearn.model_selection import train_test_split, GridSearchCV
import numpy as np

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7)
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)

print("Best polynomial degree:", gs.best_params_['poly__degree'])
print("Best Ridge α          :", gs.best_params_['ridge__alpha'])
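
The shrinkage effect of α can be seen directly by refitting Ridge over the same α grid and watching the size of the coefficient vector. The data below is synthetic and made up for illustration (five random features, only the first one matters); it is not the maintenance dataset.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data: the target depends on the first feature only, plus noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 5))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(scale=0.5, size=40)

alphas = np.logspace(-3, 3, 7)  # same grid as in the search above
norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_demo, y_demo)
    norms.append(float(np.linalg.norm(model.coef_)))
    print(f"alpha={alpha:g}  ||coef||_2={norms[-1]:.4f}")
```

The norm falls as α grows: small α leaves the fit close to ordinary least squares, large α pulls every coefficient toward zero. With 3 degrees and 7 α values the grid has 21 candidates, so 5‑fold CV runs 105 fits plus one final refit of the best pipeline on the full training split.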

Evaluate Model

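import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_pred = gs.predict(X_test)
# RMSE as the square root of MSE; the `squared=False` argument was
# deprecated in scikit-learn 1.4 and later removed
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2   = r2_score(y_test, y_pred)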

print(f"Test RMSE: ${rmse:,.0f}")
print(f"Test R²  : {r2:.3f}")
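
For readers who want to see what the two metrics measure, the sketch below computes them from their definitions on four made-up cost values and checks the result against scikit-learn. The numbers are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted costs (USD)
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_hat = np.array([110.0, 140.0, 210.0, 240.0])

residuals = y_true - y_hat

# RMSE: square root of the mean squared residual, in the target's units
rmse_manual = np.sqrt(np.mean(residuals ** 2))

# R²: 1 minus (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(f"RMSE: {rmse_manual:.1f}")  # 10.0
print(f"R²  : {r2_manual:.3f}")    # 0.968

# Cross-check against scikit-learn
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_hat))
r2_sklearn = r2_score(y_true, y_hat)
```

RMSE reads as a typical error in dollars, while R² is the share of variance in cost that the model explains relative to always predicting the mean.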

Inspect Key Polynomial Coefficients

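Interpretability: because the inputs are standardised before expansion, coefficient magnitudes are roughly comparable across terms. Inspecting the largest absolute coefficients shows which nonlinear or interaction terms (e.g., square footage × labour index) contribute most to predicted maintenance cost, which can guide strategic investments in preventive upkeep.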

# Retrieve expanded feature names
poly   = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=X.columns)
coefs     = gs.best_estimator_.named_steps['ridge'].coef_

import pandas as pd
imp = pd.Series(coefs, index=feat_names) \
         .abs().sort_values(ascending=False).head(10)

import matplotlib.pyplot as plt
plt.figure(figsize=(8,5))
imp.plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Maintenance Cost")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()

Summary

By integrating polynomial feature engineering with Ridge regularisation in a concise pipeline, this workflow:

1. Captures nonlinear cost growth due to facility ageing, scale, and usage intensity.

2. Balances model complexity via α‑tuning, avoiding overfitting to outlier facilities.

3. Provides clear, actionable insights: key polynomial features highlight where preventive maintenance or capital reinvestment will most reduce future cost spikes, enabling data‑driven budgeting and asset‑management decisions.
