Building Maintenance Cost Prediction using Bayesian Regression in ML


Facilities managers and real-estate operators need to forecast annual maintenance costs for commercial buildings before budgeting and contract negotiations, using early-stage attributes such as building age, floor area, number of stories, occupancy level, and the regional labour-rate index. Maintenance cost per square foot exhibits nonlinear scale effects (bulk-service discounts on large sites) and complexity surcharges (older buildings require specialised trades), and it carries uncertainty from labour-rate inflation and unplanned breakdowns. Bayesian Regression yields both a point estimate of annual maintenance cost and a credible interval that communicates the uncertainty around it, which supports more reliable budgeting and risk-aware contract planning.

Libraries Required

import pandas as pd                              # data loading & manipulation  
import numpy as np                               # numerical operations  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

import pymc3 as pm                               # Bayesian modeling  
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error 

Dataset

Construction Estimation Data 

Step-by-Step Code Implementation

Data Loading & Preprocessing

  • We rename Material_Cost to Maint_Cost to serve as our maintenance cost proxy.
  • Five features—floor area, story count, complexity, regional price, and labour cost—are standardised for uniform scaling.
import pandas as pd

# Load the simulated cost data
df = pd.read_csv("data/construction-estimation-data.csv")

# For maintenance‐cost proxy, rename Material_Cost → Maint_Cost
df = df.rename(columns={"Material_Cost":"Maint_Cost"})

# Select relevant features and drop any missing rows
features = [
    'Project_Size_sqft',    # analog: floor area
    'Num_Stories',          # height/complexity
    'Complexity_Index',     # custom complexity score
    'Region_Price_Index',   # local labor/material rates
    'Labor_Cost'            # base labor component
]
df = df[features + ['Maint_Cost']].dropna()

# Define X and y
X = df[features].values
y = df['Maint_Cost'].values

# Split data (80% train / 20% test)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize predictors for stable MCMC
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s  = scaler.transform(X_test)
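
To make the scaling step concrete, the sketch below reproduces with plain NumPy what StandardScaler does here: the mean and standard deviation are learned from the training rows only and then reused for the test rows, so nothing from the held-out data leaks into preprocessing. The toy arrays are made up for illustration and are not taken from the dataset.

```python
import numpy as np

# Hypothetical toy data: 2 features (area in sqft, story count)
X_train_toy = np.array([[1000.0, 2.0], [3000.0, 4.0], [5000.0, 6.0]])
X_test_toy = np.array([[3000.0, 10.0]])

# Learn the scaling parameters from the training rows only
mu = X_train_toy.mean(axis=0)
sd = X_train_toy.std(axis=0)   # population std (ddof=0), as StandardScaler uses

# Apply the same parameters to both splits
X_train_toy_s = (X_train_toy - mu) / sd
X_test_toy_s = (X_test_toy - mu) / sd

print(X_train_toy_s.mean(axis=0))  # close to zero for every feature
print(X_test_toy_s)                # test rows need not have mean 0 or std 1
```

A test row can land well outside the training range after scaling (the 10-story row above does), which is expected and not a sign of a bug.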

Define & Fit Bayesian Regression Model

Model Specification

  • Weakly informative Normal priors on intercept and coefficients reflect broad initial uncertainty.
  • A HalfNormal prior on σ enforces positive residual spread.
  • Observations follow Y_obs ∼ Normal(α + X·β, σ).
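
The three bullets above fully determine the (unnormalised) log posterior that the sampler explores. As a rough illustration, it can be written out by hand in NumPy. The function names and the synthetic data below are assumptions made for this sketch; the prior scales (1e4, 1e3, 1e4) are the ones used in the PyMC3 model in this tutorial.

```python
import numpy as np

def log_normal_pdf(x, mu, sigma):
    """Log density of Normal(mu, sigma), evaluated elementwise at x."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def log_posterior(alpha, beta, sigma, X, y):
    """Unnormalised log posterior of the linear model described above."""
    if sigma <= 0:
        return -np.inf  # HalfNormal puts no mass at or below zero
    log_prior = (
        log_normal_pdf(alpha, 0.0, 1e4)                    # intercept prior
        + log_normal_pdf(beta, 0.0, 1e3).sum()             # coefficient priors
        + np.log(2.0) + log_normal_pdf(sigma, 0.0, 1e4)    # HalfNormal(1e4)
    )
    log_lik = log_normal_pdf(y, alpha + X @ beta, sigma).sum()
    return log_prior + log_lik

# Hypothetical synthetic data, only to exercise the function
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 5))
beta_true = np.array([300.0, 100.0, -50.0, 20.0, 80.0])
y_demo = 5000.0 + X_demo @ beta_true + rng.normal(scale=200.0, size=50)

lp_true = log_posterior(5000.0, beta_true, 200.0, X_demo, y_demo)
lp_wrong = log_posterior(0.0, np.zeros(5), 200.0, X_demo, y_demo)
print(lp_true > lp_wrong)
```

PyMC3 builds an equivalent expression internally and uses its gradient for NUTS sampling; the hand-written version is only meant to show what the priors and likelihood contribute.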

Sampling

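  • We draw 2,000 posterior samples per chain after 1,000 tuning (burn‑in) steps, with target_accept=0.9 so the sampler takes smaller steps and is less prone to divergences. Convergence should still be checked afterwards.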
import pymc3 as pm

with pm.Model() as maintenance_model:
    # Priors
    α = pm.Normal("α", mu=0, sigma=1e4)                           # intercept
    β = pm.Normal("β", mu=0, sigma=1e3, shape=X_train_s.shape[1]) # coeffs
    σ = pm.HalfNormal("σ", sigma=1e4)                             # noise

    # Linear predictor
    μ = α + pm.math.dot(X_train_s, β)

    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)

    # Sample posterior
    trace = pm.sample(
        draws=2000,
        tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
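
A higher target_accept does not by itself guarantee convergence, so it is worth checking that the chains agree with each other. The sketch below implements the classic Gelman-Rubin R-hat statistic in NumPy on made-up chains; the function name and the toy chains are assumptions for illustration. ArviZ reports a more robust rank-normalised split version of this diagnostic in the r_hat column of az.summary.

```python
import numpy as np

def gelman_rubin(chains):
    """Classic R-hat for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()       # W: mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)    # B: variance of chain means
    var_hat = (n - 1) / n * within + between / n     # pooled variance estimate
    return np.sqrt(var_hat / within)

rng = np.random.default_rng(1)
mixed = rng.normal(size=(4, 2000))                       # chains that agree
stuck = mixed + np.array([[0.0], [0.0], [3.0], [3.0]])   # two chains offset

print(gelman_rubin(mixed))   # close to 1
print(gelman_rubin(stuck))   # clearly above 1
```

For a scalar parameter of the fitted model, the same function could be applied to an array such as trace.posterior["σ"].values, which has shape (chain, draw). Values close to 1 suggest the chains have mixed.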

Posterior Analysis & Point Predictions

Using posterior means of α and β yields point estimates; MAE quantifies accuracy on held‑out data.

import arviz as az
from sklearn.metrics import mean_absolute_error

# Summarize the posterior distributions (print so the table also shows in a script)
print(az.summary(trace, round_to=2))

# Posterior predictive sampling
with maintenance_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

# Extract posterior means
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values

# Compute point predictions on the test set
y_pred = α_post + X_test_s.dot(β_post)

# Evaluate with MAE
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:,.2f} (in Maint_Cost units)")
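
Because the model is linear in α and β, plugging the posterior means into the linear predictor gives the same point prediction as averaging the predictions over all posterior draws. The sketch below checks this on made-up arrays; the draws and test data are hypothetical stand-ins, not output from the fitted model.

```python
import numpy as np

rng = np.random.default_rng(2)
coef_true = np.array([300.0, 100.0, -50.0, 20.0, 80.0])

# Hypothetical stand-ins for flattened posterior draws (S draws, 5 coefficients)
alpha_draws = rng.normal(5000.0, 50.0, size=4000)
beta_draws = rng.normal(coef_true, 10.0, size=(4000, 5))

# Hypothetical standardised test features and observed costs
X_demo = rng.normal(size=(20, 5))
y_demo = 5000.0 + X_demo @ coef_true

# Plug-in prediction from posterior means (the approach used above)
pred_plugin = alpha_draws.mean() + X_demo @ beta_draws.mean(axis=0)

# Average of per-draw predictions: (S, n_test) reduced over the draw axis
pred_avg = (alpha_draws[:, None] + beta_draws @ X_demo.T).mean(axis=0)

mae = np.abs(y_demo - pred_plugin).mean()
print(np.allclose(pred_plugin, pred_avg), round(mae, 2))
```

This shortcut holds only for the mean of a linear predictor. Quantities such as intervals still need the full set of draws.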

Visualise Predictions & Credible Intervals

Sweeping building area while holding other features constant, we plot the posterior mean cost and its 94% Highest Posterior Density interval—conveying both expected cost scaling and forecast uncertainty.

import numpy as np

import matplotlib.pyplot as plt


# Vary Project_Size_sqft; hold others at median
size_grid = np.linspace(X_train_s[:,0].min(), X_train_s[:,0].max(), 100)
grid = np.tile(np.median(X_train_s, axis=0), (100,1))
grid[:,0] = size_grid

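# Predict on the grid itself: sample_posterior_predictive would reuse the
# training X, so combine the posterior draws with the grid manually
post = trace.posterior
α_s = post["α"].values.reshape(-1)                  # (S,)
β_s = post["β"].values.reshape(-1, grid.shape[1])   # (S, n_features)
σ_s = post["σ"].values.reshape(-1)                  # (S,)

rng = np.random.default_rng(42)
preds     = rng.normal(α_s[:, None] + β_s @ grid.T, σ_s[:, None])   # (S, 100)
mean_pred = preds.mean(axis=0)
hpd       = np.array([az.hdi(col, hdi_prob=0.94) for col in preds.T])  # (100, 2)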

# Back-transform Project_Size_sqft (column 0 of the grid) to original units
size_orig = scaler.inverse_transform(grid)[:,0]

plt.figure(figsize=(8,5))
plt.plot(size_orig, mean_pred, label="Posterior mean")
plt.fill_between(size_orig, hpd[:,0], hpd[:,1], alpha=0.3,
                 label="94% credible interval")
plt.scatter(
    scaler.inverse_transform(X_test_s)[:,0], y_test,
    color="k", alpha=0.5, label="Test data"
)
plt.xlabel("Building Area (sqft)")
plt.ylabel("Annual Maintenance Cost (USD)")
plt.title("Bayesian Regression: Cost vs. Building Area")
plt.legend()
plt.tight_layout()
plt.show()
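
The shaded band in the plot is a 94% highest-density interval, i.e. the narrowest interval that contains 94% of the draws. The sketch below shows one way to compute it for one-dimensional samples and compares it with an equal-tailed percentile interval. The function name and the skewed toy sample are assumptions for illustration; in the tutorial itself az.hdi does this work.

```python
import numpy as np

def hdi_1d(samples, prob=0.94):
    """Narrowest interval containing `prob` of the 1-D samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.floor(prob * n))      # index offset spanned by each candidate
    widths = x[k:] - x[: n - k]      # width of every candidate interval
    i = int(np.argmin(widths))       # start of the narrowest one
    return x[i], x[i + k]

rng = np.random.default_rng(3)
skewed = rng.exponential(scale=1000.0, size=20000)  # hypothetical skewed costs

lo_hdi, hi_hdi = hdi_1d(skewed)
lo_eq, hi_eq = np.percentile(skewed, [3, 97])       # equal-tailed alternative
print(hi_hdi - lo_hdi < hi_eq - lo_eq)
```

For symmetric distributions the two intervals nearly coincide. For skewed ones the highest-density interval shifts toward the region where most of the probability mass sits.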

Summary

This Bayesian Regression framework for building maintenance‐cost prediction provides:

1. Point estimates of annual maintenance cost from early building characteristics.

2. Credible intervals quantifying uncertainty from market‐rate and complexity variability.

3. Actionable insights: facilities managers can budget with confidence bounds, negotiate maintenance contracts effectively, and plan capital reserves under uncertainty.
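
As a rough illustration of the third point, predictive draws for a single building can be turned into a budget figure plus a contingency reserve. The draws below are simulated stand-ins rather than model output, and the choice of the 90th percentile is an assumption for this sketch, not a recommendation from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical posterior predictive draws of annual cost (USD) for one building
cost_draws = rng.normal(loc=120_000.0, scale=15_000.0, size=4000)

expected_cost = cost_draws.mean()           # central budget figure
p90_cost = np.percentile(cost_draws, 90)    # exceeded by about 10% of the draws
reserve = p90_cost - expected_cost          # contingency on top of the budget

print(f"Budget: ${expected_cost:,.0f}, reserve to P90: ${reserve:,.0f}")
```

With the fitted model, the same calculation could be applied to one column of the predictive draws computed for a specific building's features.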

ProjectGurukul Team

The ProjectGurukul Team delivers project-based tutorials on programming, machine learning, and web development. We simplify learning by providing hands-on projects to help you master real-world skills. We also provide free major and minor projects for engineering students.
