Maintenance Fee Prediction using Linear Regression in ML


Facility‑management teams must forecast the expected cost of upcoming maintenance work orders so they can set realistic budgets, negotiate vendor contracts, and prioritise preventive jobs. Using historical work‑order records from twelve North American university campuses, we will build a linear‑regression baseline that predicts a work order’s total maintenance fee (USD) from information known at the time the ticket is raised: labour hours, job type (planned vs unplanned), system, subsystem, building age, and current local weather.

A transparent linear model exposes first-order cost drivers—labour time, HVAC versus electrical systems, and emergency requests—and supplies a factual benchmark before experimenting with regularised or tree-based regressors.

Libraries Required

  • pandas # tabular wrangling
  • numpy # numerical helpers
  • matplotlib.pyplot # quick EDA plots
  • scikit‑learn # preprocessing, model, metrics
  • joblib # persist the trained pipeline

Dataset Link

Facility Management Unified Classification Database (FMUCD)

Step-by-Step Code Implementation

Why linear regression? For small-scope maintenance jobs, the cost scales nearly linearly with labour hours, plus additive surcharges for emergency or specialised systems. A straight-line model quantifies each contributor in clear dollar terms.
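The additive structure described above can be sketched on synthetic data. The hourly rate, surcharge, and base fee below are made-up numbers chosen only for illustration, not figures from FMUCD; the point is that a linear fit recovers each contributor as a dollar amount.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical cost structure (illustration only):
# $40 base fee + $85 per labour hour + $150 surcharge for unplanned jobs.
hours = rng.uniform(0.5, 8.0, size=500)
unplanned = rng.integers(0, 2, size=500)          # 0 = planned, 1 = unplanned
cost = 40 + 85 * hours + 150 * unplanned + rng.normal(0, 10, size=500)

# Fit a plain linear model on the two raw (unscaled) features.
X_demo = np.column_stack([hours, unplanned])
demo = LinearRegression().fit(X_demo, cost)

print(f"$ per labour hour    : {demo.coef_[0]:.1f}")
print(f"unplanned surcharge  : {demo.coef_[1]:.1f}")
print(f"base fee (intercept) : {demo.intercept_:.1f}")
```

Because the features are left unscaled here, each coefficient reads directly in dollars per unit; the pipeline later in this tutorial standardises the numeric columns, which changes the units of those coefficients.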

1. Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error
import joblib

2. Load & glimpse the data

(Download the CSV pack, then point the path below to the work_order_cost.csv file.)

df = pd.read_csv("work_order_cost.csv")
print(df.head())

Typical columns

column            description (example)
wo_cost_usd       label: total fee of the job (USD)
labour_hours      technician hours booked
request_type      Planned / Unplanned
system            HVAC, Electrical, Plumbing …
subsystem         Chiller, Lighting, Pumps …
building_age      years since construction
outside_temp_c    outside temperature in °C at start time
humidity_pct      relative humidity (% RH)
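Before going further, it may help to confirm that the loaded file really contains these columns. The helper below is a hypothetical convenience function (missing_columns is not part of pandas), shown on a small made-up frame rather than the real CSV.

```python
import pandas as pd

# Column names as listed in this tutorial; adjust if your CSV differs.
EXPECTED_COLS = [
    "wo_cost_usd", "labour_hours", "request_type", "system",
    "subsystem", "building_age", "outside_temp_c", "humidity_pct",
]

def missing_columns(frame, expected=EXPECTED_COLS):
    """Return the expected column names that are absent from `frame`."""
    return [c for c in expected if c not in frame.columns]

# Toy frame standing in for the real data (values are made up).
toy = pd.DataFrame({
    "wo_cost_usd": [420.0],
    "labour_hours": [3.5],
    "request_type": ["Unplanned"],
    "system": ["HVAC"],
})

absent = missing_columns(toy)
print("Missing columns:", absent)
```

On the real data you would call missing_columns(df) right after pd.read_csv and stop if the returned list is not empty.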

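3. Minimal cleaning

Drop every row that is missing the label or any predictor, because LinearRegression cannot fit on NaN values. The numeric columns stay in their raw units at this stage; the StandardScaler in step 5 puts hours, age, and weather on a common unit-variance scale, so their coefficients read as $ per 1 σ change, which is handy for management reports.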

core = ['wo_cost_usd', 'labour_hours', 'request_type',
        'system', 'subsystem', 'building_age',
        'outside_temp_c', 'humidity_pct']

df = df.dropna(subset=core).copy()
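Dropping rows silently can hide a data-quality problem, so it is worth checking how much data the dropna call removes. The sketch below uses a small made-up frame (not the real work-order file) to show the per-column missing counts and the share of rows kept.

```python
import numpy as np
import pandas as pd

# Toy frame with deliberately missing cells (values are made up).
toy = pd.DataFrame({
    "wo_cost_usd": [120.0, np.nan, 310.0, 95.0],
    "labour_hours": [1.5, 2.0, np.nan, 1.0],
    "request_type": ["Planned", "Unplanned", "Unplanned", None],
})

# Count missing cells per column before dropping anything.
missing_per_col = toy.isna().sum()

# Same call pattern as in the tutorial: drop rows missing any listed column.
cleaned = toy.dropna(subset=list(toy.columns)).copy()

print(missing_per_col)
print(f"rows kept: {len(cleaned)} of {len(toy)}")
```

If a large share of rows disappears on the real data, imputing the affected column or removing it from the feature list may be a better choice than dropping rows.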

4. Define predictors & label

num_cols = ['labour_hours', 'building_age',
            'outside_temp_c', 'humidity_pct']

cat_cols = ['request_type', 'system', 'subsystem']

target   = 'wo_cost_usd'

X = df[num_cols + cat_cols]
y = df[target]

5. Pre‑processing & model pipeline

One-hot encoding treats categorical flags as pure categories, avoiding a false numeric order between, for example, HVAC and Electrical jobs.

preproc = ColumnTransformer([
        ('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols),
        ('num', StandardScaler(),                      num_cols)
])

linreg = LinearRegression()

pipe = Pipeline(steps=[
        ('prep',  preproc),
        ('model', linreg)
])

6. Train‑test split & training

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, shuffle=True)

pipe.fit(X_train, y_train)

7. Evaluation

Performance metrics – R² captures the share of cost variance explained; MAE tells planners the typical absolute dollar error—e.g., “±$180 per work order.”

y_pred = pipe.predict(X_test)
print(f"R²  : {r2_score(y_test, y_pred):.3f}")
print(f"MAE : ${mean_absolute_error(y_test, y_pred):,.0f}")
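For readers who want to see what the two metrics compute, the sketch below reproduces them by hand with NumPy on four made-up work orders and checks the result against scikit-learn. The numbers are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up actual and predicted fees (USD) for four work orders.
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_hat = np.array([110.0, 190.0, 320.0, 380.0])

# MAE: average absolute dollar error.
mae_manual = np.mean(np.abs(y_true - y_hat))

# R²: 1 minus (residual sum of squares / total sum of squares).
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"MAE manual {mae_manual:.2f} | sklearn {mean_absolute_error(y_true, y_hat):.2f}")
print(f"R²  manual {r2_manual:.3f} | sklearn {r2_score(y_true, y_hat):.3f}")
```

Here the MAE is $15 and R² is 0.98: the predictions miss by $15 on average, and the residual error is 2% of the variance around the mean fee.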

8. Inspect cost drivers

The coefficient table highlights levers: if Unplanned jobs carry the heaviest positive weight, focusing on preventive maintenance could deliver immediate savings.

# recover encoded feature names from the fitted one-hot encoder
ohe_feats = (pipe.named_steps['prep']
                 .named_transformers_['cat']
                 .get_feature_names_out(cat_cols))

# order must match the ColumnTransformer: categorical block first, then numeric
all_feats = list(ohe_feats) + num_cols
coef      = (pd.Series(pipe.named_steps['model'].coef_, index=all_feats)
             .sort_values())

print("\nCost‑reducing factors (negative coefficients):")
print(coef.head(8))      # most negative

print("\nCost‑increasing factors (positive coefficients):")
print(coef.tail(8))      # most positive
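Because the numeric columns pass through StandardScaler, their coefficients are in dollars per one standard deviation, not dollars per hour or per year. Dividing a coefficient by the scaler's scale_ value converts it back to raw units. The sketch below shows this on synthetic data with a made-up rate of $90 per hour; the step names "scale" and "model" are local to this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic jobs: $50 base fee + $90 per labour hour (made-up numbers).
hours = rng.uniform(0.5, 8.0, size=300).reshape(-1, 1)
cost = 50 + 90 * hours.ravel() + rng.normal(0, 5, size=300)

toy_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LinearRegression()),
]).fit(hours, cost)

per_sigma = toy_pipe.named_steps["model"].coef_[0]   # $ per 1 σ of hours
sigma = toy_pipe.named_steps["scale"].scale_[0]      # σ of hours, in hours
per_hour = per_sigma / sigma                         # $ per labour hour

print(f"$ per 1 σ of labour hours : {per_sigma:.1f}")
print(f"σ of labour hours         : {sigma:.2f} h")
print(f"$ per labour hour         : {per_hour:.1f}")
```

In the tutorial's pipeline, the equivalent scaler would be reached through pipe.named_steps['prep'].named_transformers_['num'], with scale_ entries in the same order as num_cols.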

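9. Persist the trained pipeline

Persisting with joblib packages the preprocessing steps and the fitted coefficients into a single file, so a CMMS (computerised maintenance‑management system) can later load the .pkl, feed it new work‑order tickets, and return a cost estimate in milliseconds. Load the file with the same scikit‑learn version that was used for training.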

joblib.dump(pipe, "maintenance_fee_linreg.pkl")
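The consuming side can be sketched as a round trip: save a pipeline, load it back, and score one new ticket. The example below trains a tiny stand-in pipeline on made-up rows so that it runs on its own; the file name toy_fee_model.pkl and the ticket values are hypothetical.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up training rows: cost = 100 * hours, plus 50 for unplanned jobs.
train = pd.DataFrame({
    "labour_hours": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "request_type": ["Planned", "Unplanned"] * 3,
    "wo_cost_usd": [100.0, 250.0, 300.0, 450.0, 500.0, 650.0],
})

toy_pipe = Pipeline([
    ("prep", ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["request_type"]),
        ("num", StandardScaler(), ["labour_hours"]),
    ])),
    ("model", LinearRegression()),
]).fit(train[["labour_hours", "request_type"]], train["wo_cost_usd"])

# Round trip through a temporary file.
path = os.path.join(tempfile.mkdtemp(), "toy_fee_model.pkl")
joblib.dump(toy_pipe, path)
reloaded = joblib.load(path)

# A new ticket must use the same column names as the training frame.
ticket = pd.DataFrame([{"labour_hours": 2.5, "request_type": "Unplanned"}])
estimate = reloaded.predict(ticket)[0]
print(f"Estimated fee: ${estimate:,.0f}")
```

With the real model, the ticket frame would carry all seven predictor columns from step 4, and the path would point at maintenance_fee_linreg.pkl.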

Summary

In fewer than 90 lines of Python code, we transformed raw work-order logs into an explainable maintenance-fee predictor. The baseline model:

  • Provides instant cost estimates for budgeting and approval workflows.
  • Reveals transparent marginal effects: the extra dollar impact of an emergency flag or an HVAC task, and the cost change per one‑standard‑deviation increase in technician hours.

Keep this interpretable baseline as your benchmark; when you introduce regularised regressors or ensemble models, you’ll know exactly how much added accuracy justifies the extra complexity.


ProjectGurukul Team
