Construction Material Cost Prediction using Bayesian Regression in ML


Project managers and cost engineers need to predict the total material cost for a new building project—before procurement—using early‑stage indicators such as project floor area, number of stories, structural complexity index, regional price index, and estimated labour cost. Material costs scale nonlinearly with size and complexity (e.g., bulk-order discounts, premium finishes) and are subject to market price volatility. A single point estimate risks budget overruns or overly conservative bids. By applying Bayesian Regression, we obtain both:

1. A point estimate of expected material cost.

2. A credible interval quantifying forecast uncertainty—enabling risk‐aware budgeting and procurement strategy.
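To make the two outputs concrete, here is a minimal sketch of how both can be read off a set of posterior predictive draws for a single project. The draws are hypothetical numbers generated for illustration, and the interval is a simple equal-tailed one rather than the highest-density interval used later in this tutorial.

```python
import random
import statistics

def summarize_draws(draws, prob=0.94):
    """Return (mean, lower, upper) for an equal-tailed interval covering `prob`."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - prob) / 2.0
    lower = ordered[int(tail * n)]
    upper = ordered[min(n - 1, int((1.0 - tail) * n))]
    return statistics.fmean(ordered), lower, upper

random.seed(0)
# Hypothetical predictive draws of one project's material cost, in USD
draws = [random.gauss(500_000, 40_000) for _ in range(4000)]

mean, lo, hi = summarize_draws(draws)
print(f"Expected cost: ${mean:,.0f}")
print(f"94% interval:  ${lo:,.0f} to ${hi:,.0f}")
```

The mean is the figure you would put in a budget line; the interval ends are the figures you would use when sizing a contingency reserve.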

Libraries Required

import pandas as pd                              # data loading & handling  
import numpy as np                               # numerical operations  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

import pymc3 as pm                               # Bayesian modeling  
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error  

Dataset

Construction Estimation Data

Step-by-Step Code Implementation

Import Libraries & Load Data

import pandas as pd

# Load dataset
df = pd.read_csv("data/construction-estimation-data.csv")

# Preview key columns
df[['Project_Size_sqft','Num_Stories','Complexity_Index',
    'Region_Price_Index','Labor_Cost','Material_Cost']].head()

Preprocessing & Train/Test Split

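We z-score all predictors so that the same prior on β is reasonable for every feature, regardless of its original units. The scaler is fit on the training split only, so no information from the test set leaks into preprocessing.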

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Define predictors and target
feature_cols = [
    'Project_Size_sqft',
    'Num_Stories',
    'Complexity_Index',
    'Region_Price_Index',
    'Labor_Cost'
]
X = df[feature_cols].values
y = df['Material_Cost'].values  # USD

# Split data (80% train / 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize numeric features for stable MCMC
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s  = scaler.transform(X_test)
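To see what the standardisation step does to a single column, the sketch below reproduces it with the standard library only. The floor areas are hypothetical, and the helper names are ours, not part of scikit-learn. Like StandardScaler, it divides by the population standard deviation and reuses the training statistics for the test values.

```python
import statistics

def fit_zscore(values):
    """Return the mean and population standard deviation of a training column."""
    return statistics.fmean(values), statistics.pstdev(values)

def apply_zscore(values, mean, std):
    """Standardise values with statistics learned from the training column."""
    return [(v - mean) / std for v in values]

# Hypothetical floor areas in square feet
train_sqft = [1200.0, 2500.0, 4000.0, 8000.0, 15000.0]
test_sqft = [3000.0, 20000.0]

mean, std = fit_zscore(train_sqft)
train_z = apply_zscore(train_sqft, mean, std)
test_z = apply_zscore(test_sqft, mean, std)

print([round(z, 2) for z in train_z])  # centred on 0 with unit spread
print([round(z, 2) for z in test_z])   # may fall outside the training range
```

A test project larger than anything in the training data gets a z-score above the training maximum, which is a useful reminder that predictions there are extrapolations.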

Define & Fit Bayesian Regression Model

Priors:

  • α ∼ Normal(0, 1e5) allows broad intercept shifts for large‐scale USD costs.
  • β ∼ Normal(0, 1e4) encodes moderate uncertainty on each predictor’s effect.
  • σ ∼ HalfNormal(1e5) permits large residual variability reflecting market volatility.
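One way to get a feel for what these priors imply is a prior predictive simulation: draw parameters from the priors, then draw a cost from a Normal likelihood around the linear predictor. The sketch below does this with the standard library for one hypothetical standardised project (the values in x_std are made up). Because every prior is centred on zero, roughly half of the simulated costs come out negative, which shows how weakly informative these priors are before any data is seen.

```python
import random

def prior_predictive_draw(x_std, rng):
    """Simulate one material cost using only the priors, before seeing data."""
    alpha = rng.gauss(0.0, 1e5)                    # alpha ~ Normal(0, 1e5)
    betas = [rng.gauss(0.0, 1e4) for _ in x_std]   # beta_j ~ Normal(0, 1e4)
    sigma = abs(rng.gauss(0.0, 1e5))               # |Normal(0, s)| is HalfNormal(s)
    mu = alpha + sum(b * x for b, x in zip(betas, x_std))
    return rng.gauss(mu, sigma)

rng = random.Random(0)
x_std = [1.0, 0.5, -0.2, 0.0, 0.8]  # hypothetical standardised feature values

sims = [prior_predictive_draw(x_std, rng) for _ in range(5000)]
share_negative = sum(s < 0 for s in sims) / len(sims)
print(f"Share of simulated costs below zero: {share_negative:.2f}")
```

If negative costs under the prior bother you, one option is to centre the intercept prior near the mean training cost; that is a modelling choice beyond what this tutorial does.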

Model: The linear predictor μ = α + β·X_standardized links project attributes to material cost, with observed costs modelled as Normal(μ, σ).
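Written out in full, the model described above reads as follows. Here x_ij denotes the standardised value of feature j for project i (our notation), and the second argument of each Normal is a variance, so the standard deviations 1e5 and 1e4 appear squared.

```latex
\begin{aligned}
\alpha  &\sim \mathcal{N}\!\left(0,\ (10^{5})^{2}\right) \\
\beta_j &\sim \mathcal{N}\!\left(0,\ (10^{4})^{2}\right), \quad j = 1, \dots, 5 \\
\sigma  &\sim \operatorname{HalfNormal}\!\left(10^{5}\right) \\
\mu_i   &= \alpha + \sum_{j=1}^{5} \beta_j \, x_{ij} \\
y_i     &\sim \mathcal{N}\!\left(\mu_i,\ \sigma^{2}\right)
\end{aligned}
```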

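Inference: We draw 2,000 posterior samples per chain after 1,000 tuning steps. Setting target_accept=0.9 makes the sampler take smaller steps, which reduces the risk of divergences at the price of slower sampling. Posterior predictive sampling yields full predictive distributions for new projects.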

import pymc3 as pm

with pm.Model() as model:
    # Priors
    α = pm.Normal("α", mu=0, sigma=1e5)
    β = pm.Normal("β", mu=0, sigma=1e4, shape=X_train_s.shape[1])
    σ = pm.HalfNormal("σ", sigma=1e5)

    # Linear predictor
    μ = α + pm.math.dot(X_train_s, β)

    # Likelihood
    MaterialCost = pm.Normal("MaterialCost", mu=μ, sigma=σ, observed=y_train)

    # MCMC sampling
    trace = pm.sample(
        draws=2000,
        tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
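To illustrate what the draws and tune arguments mean, here is a toy sampler written with the standard library. It is a random-walk Metropolis sampler for an intercept-only model with known noise level, not the NUTS sampler that PyMC3 uses for continuous models, and the cost figures are hypothetical. The point is only that the first tune iterations are thrown away while the chain moves from its starting value into the region of high posterior probability.

```python
import math
import random

def log_posterior(alpha, y, sigma, prior_sd):
    """Log posterior (up to a constant) for an intercept-only Normal model."""
    log_prior = -0.5 * (alpha / prior_sd) ** 2
    log_lik = -0.5 * sum(((v - alpha) / sigma) ** 2 for v in y)
    return log_prior + log_lik

def metropolis(y, sigma, prior_sd, draws, tune, step, seed=0):
    """Random-walk Metropolis; the first `tune` iterations are discarded."""
    rng = random.Random(seed)
    alpha = 0.0  # deliberately poor starting value
    current = log_posterior(alpha, y, sigma, prior_sd)
    kept = []
    for i in range(tune + draws):
        proposal = alpha + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, y, sigma, prior_sd)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            alpha, current = proposal, candidate
        if i >= tune:
            kept.append(alpha)
    return kept

# Hypothetical material costs in thousands of USD
y = [480.0, 510.0, 495.0, 530.0, 505.0, 470.0, 520.0, 490.0]
samples = metropolis(y, sigma=20.0, prior_sd=1e5, draws=2000, tune=1000, step=10.0)

post_mean = sum(samples) / len(samples)
print(f"Posterior mean of the intercept: {post_mean:.1f}")
```

With such a wide prior the posterior mean should sit close to the sample mean of y, which is 500 here.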

Posterior Analysis & Point Predictions

Posterior means of α and β provide point forecasts; MAE on held‑out test data quantifies average error.

import arviz as az
from sklearn.metrics import mean_absolute_error

# Summarize posterior
az.summary(trace, round_to=2)

# Posterior predictive sampling for the training observations
with model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["MaterialCost"])

# Extract posterior means
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values

# Point predictions on test set
y_pred = α_post + X_test_s.dot(β_post)

# Compute MAE
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f}")
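MAE only scores the point forecasts. A natural companion check, sketched below with the standard library, is interval coverage: the share of held-out costs that fall inside their predictive intervals. All numbers here are hypothetical, and the helper names are ours; in practice the lower and upper bounds would come from predictive draws for each test project.

```python
def mean_abs_error(y_true, y_pred):
    """Average absolute difference between actual and predicted costs."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def interval_coverage(y_true, lower, upper):
    """Share of actual costs that fall inside their predictive interval."""
    inside = sum(lo <= t <= hi for t, lo, hi in zip(y_true, lower, upper))
    return inside / len(y_true)

# Hypothetical test-set values in USD
y_true = [410_000, 655_000, 1_200_000, 300_000]
y_hat = [430_000, 640_000, 1_100_000, 310_000]
lower = [350_000, 560_000, 1_000_000, 240_000]
upper = [510_000, 720_000, 1_190_000, 380_000]

mae_demo = mean_abs_error(y_true, y_hat)
coverage = interval_coverage(y_true, lower, upper)
print(f"MAE: ${mae_demo:,.0f}")
print(f"Interval coverage: {coverage:.0%}")
```

For well-calibrated 94% intervals, coverage on a large test set should land near 94%; a much lower figure suggests the model is overconfident.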

Visualise Predictions with Credible Intervals

By sweeping one feature (project size) and holding others fixed, we plot both the posterior mean cost curve and its 94% Highest Posterior Density interval—illuminating both expected trend and our uncertainty.

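import numpy as np
import matplotlib.pyplot as plt
import arviz as az

# Vary Project_Size_sqft; hold others at median
size_grid = np.linspace(X_train_s[:,0].min(), X_train_s[:,0].max(), 100)
grid = np.tile(np.median(X_train_s, axis=0), (100,1))
grid[:,0] = size_grid

# The model was built on a fixed NumPy array, so X cannot be swapped with
# pm.set_data. Instead, build predictive draws on the grid from the posterior.
post = trace.posterior.stack(sample=("chain", "draw"))
rng = np.random.default_rng(42)
α_all = post["α"].values
idx = rng.choice(α_all.size, size=1000, replace=False)   # 1,000 posterior draws
α_s = α_all[idx]                       # shape (1000,)
β_s = post["β"].values[:, idx].T       # shape (1000, n_features)
σ_s = post["σ"].values[idx]            # shape (1000,)

mu_grid = α_s[:, None] + β_s @ grid.T           # shape (1000, 100)
preds   = rng.normal(mu_grid, σ_s[:, None])     # posterior predictive draws

mean_pred = preds.mean(axis=0)
# Add a leading chain axis so ArviZ reads the array as (chain, draw, grid point)
hpd       = az.hdi(preds[None, :, :], hdi_prob=0.94)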

# Convert project size back to original scale
size_orig = scaler.inverse_transform(grid)[:,0]

plt.figure(figsize=(8,5))
plt.plot(size_orig, mean_pred, label="Posterior mean")
plt.fill_between(size_orig, hpd[:,0], hpd[:,1], alpha=0.3,
                 label="94% credible interval")
plt.scatter(
    X_test[:,0], y_test,   # unscaled test features are already in sqft
    color="k", alpha=0.5, label="Test data"
)
plt.xlabel("Project Size (sqft)")
plt.ylabel("Material Cost (USD)")
plt.title("Bayesian Regression: Material Cost vs. Project Size")
plt.legend()
plt.tight_layout()
plt.show()
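The highest-density interval used in this step is the narrowest interval that contains the requested share of the draws. The sketch below is a simplified standard-library version written for illustration, not ArviZ's implementation, and it compares the result with an equal-tailed interval on hypothetical right-skewed cost draws (a base cost plus an exponentially distributed overrun).

```python
import random

def hdi(samples, prob=0.94):
    """Narrowest interval spanning roughly `prob` of the sorted samples."""
    ordered = sorted(samples)
    n = len(ordered)
    span = int(prob * n)  # index offset between the two interval ends
    start = min(range(n - span), key=lambda i: ordered[i + span] - ordered[i])
    return ordered[start], ordered[start + span]

def equal_tailed(samples, prob=0.94):
    """Interval that cuts the same share of draws from each tail."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - prob) / 2.0
    return ordered[int(tail * n)], ordered[int((1.0 - tail) * n)]

rng = random.Random(1)
# Hypothetical skewed draws: 400k USD base cost plus an overrun averaging 50k
draws = [400_000 + rng.expovariate(1 / 50_000) for _ in range(5000)]

hdi_lo, hdi_hi = hdi(draws)
et_lo, et_hi = equal_tailed(draws)
print(f"HDI:          ${hdi_lo:,.0f} to ${hdi_hi:,.0f}")
print(f"Equal-tailed: ${et_lo:,.0f} to ${et_hi:,.0f}")
```

For symmetric predictive distributions the two intervals nearly coincide; for skewed ones the HDI shifts toward the bulk of the draws and is shorter.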

Summary

This Bayesian Regression pipeline for construction material cost forecasting delivers:

1. Point estimates of material cost from early project specifications, with accuracy measured by MAE on held-out projects.

2. Credible intervals that quantify uncertainty from complexity and market variation—critical for risk‐aware budgeting.

3. Actionable insights: cost engineers can use both expected cost and uncertainty bounds to negotiate supplier contracts, set contingency reserves, and optimise procurement timing under uncertainty.


ProjectGurukul Team

The ProjectGurukul Team delivers project-based tutorials on programming, machine learning, and web development. We simplify learning by providing hands-on projects to help you master real-world skills. We also provide free major and minor projects for engineering students.
