Urban Planning Cost Prediction using Bayesian Regression in ML


City planners and municipal finance teams must estimate the total capital cost of urban development projects before committing to design or issuing bonds. They have to do this from early project metrics such as planned land area (sqft), number of infrastructure elements (roads, water lines), complexity index (mixed-use vs. single-purpose), regional price index, and baseline labor cost. Costs scale nonlinearly with size and complexity (bulk-procurement discounts vs. specialty-work surcharges), and they are uncertain because of material price volatility and permitting delays, so a single point estimate understates the risk. Bayesian Regression gives us both:

1. A point forecast of total project cost.

2. A credible interval quantifying uncertainty—enabling risk‑aware budgeting, contingency planning, and stakeholder communication.

Libraries Required

import pandas as pd                              # data I/O  
import numpy as np                               # numerics  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

import pymc3 as pm                               # Bayesian modeling (PyMC3 3.x API)
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error  
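This tutorial uses the legacy PyMC3 3.x API (for example `import pymc3`). Newer releases are published as a separate package, `pymc`, and some function signatures differ, so it is worth checking what is installed before running anything. The helper below is a hypothetical convenience built only on the standard-library `importlib.metadata`. It is a sketch, not part of the original workflow.

```python
# Report which of the tutorial's libraries are installed, and at what version.
# Assumption: the tutorial targets PyMC3 3.x; PyMC 4+ ships as the package "pymc".
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version string, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    required = ["pandas", "numpy", "matplotlib", "seaborn",
                "pymc3", "pymc", "arviz", "scikit-learn"]
    for pkg, ver in installed_versions(required).items():
        print(f"{pkg:>12}: {ver or 'not installed'}")
```

If `pymc3` shows as not installed but `pymc` is present, the model code in this tutorial will need adapting rather than just a changed import.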

Dataset

Construction Estimation Data

Step-by-Step Code Implementation

Data Loading & Target Definition

We combine Material_Cost and Labor_Cost into total_cost to proxy full urban‐planning expenditure.

import pandas as pd

# Load the dataset
df = pd.read_csv("data/construction-estimation-data.csv")

# Define features and create total_cost = Material_Cost + Labor_Cost
features = [
    "Project_Size_sqft",
    "Num_Stories",
    "Complexity_Index",
    "Region_Price_Index"
]
df = df[features + ["Material_Cost", "Labor_Cost"]].dropna()
df["total_cost"] = df["Material_Cost"] + df["Labor_Cost"]

# Prepare X and y
X = df[features].values
y = df["total_cost"].values  # USD
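The loading step depends on exact column names, and `dropna()` silently discards any row with a missing value. A small validating helper makes both behaviours explicit. The function `build_dataset` and the toy rows below are hypothetical illustrations. Only the column names come from the tutorial.

```python
import pandas as pd

FEATURES = ["Project_Size_sqft", "Num_Stories", "Complexity_Index", "Region_Price_Index"]
COST_COLS = ["Material_Cost", "Labor_Cost"]


def build_dataset(raw):
    """Check expected columns, drop incomplete rows, and add total_cost (USD)."""
    missing = [c for c in FEATURES + COST_COLS if c not in raw.columns]
    if missing:
        raise KeyError(f"CSV is missing expected columns: {missing}")
    out = raw[FEATURES + COST_COLS].dropna().copy()   # explicit copy before adding a column
    out["total_cost"] = out["Material_Cost"] + out["Labor_Cost"]
    return out


# Hypothetical toy rows; the third has a missing size and is dropped
toy = pd.DataFrame({
    "Project_Size_sqft": [1200, 3400, None],
    "Num_Stories": [1, 3, 2],
    "Complexity_Index": [0.4, 0.9, 0.5],
    "Region_Price_Index": [1.0, 1.2, 0.9],
    "Material_Cost": [50_000, 210_000, 90_000],
    "Labor_Cost": [30_000, 120_000, 60_000],
})
clean = build_dataset(toy)
print(clean[["Project_Size_sqft", "total_cost"]])
```

On the real CSV, `df = build_dataset(pd.read_csv(...))` would replace the two filtering lines above. Comparing `len(df)` before and after shows how many rows were lost to missing values.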

Train/Test Split & Standardisation

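We z-score all features using training-set statistics. As a result, the slope prior Normal(0, 10 000) has the same meaning for every feature (the effect of a one-standard-deviation change), and the sampler works on a better-conditioned posterior.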

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 80/20 split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize all features for MCMC stability
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s  = scaler.transform(X_test)
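StandardScaler computes z = (x − mean_train) / sd_train, where sd_train is the population standard deviation (ddof=0). The sketch below reproduces `scaler.transform` by hand on hypothetical random features, which makes the no-leakage point concrete. Test rows are scaled with training statistics, so only the training columns end up exactly centred at 0 with unit spread.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins for X_train / X_test (4 features, as in the tutorial)
rng = np.random.default_rng(0)
loc, scale = [2500, 3, 0.5, 1.1], [800, 1, 0.2, 0.1]
X_tr = rng.normal(loc, scale, size=(80, 4))
X_te = rng.normal(loc, scale, size=(20, 4))

scaler = StandardScaler().fit(X_tr)       # statistics come from training rows only

# The same z-score by hand: training mean and population std (ddof=0)
mu_tr = X_tr.mean(axis=0)
sd_tr = X_tr.std(axis=0)                  # numpy default ddof=0, as in StandardScaler
manual_te = (X_te - mu_tr) / sd_tr

print(np.abs(scaler.transform(X_te) - manual_te).max())   # ~0 up to rounding
print(scaler.transform(X_tr).mean(axis=0).round(6))        # training columns centred at 0
```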

Define & Fit Bayesian Regression Model

Model Priors:

  • α ∼ Normal(ȳ_train, 1e5): intercept centred on the mean training cost. With standardised features, α is the expected cost of an average project, which is on the order of 10⁶ USD rather than 0.
  • β ∼ Normal(0, 1e4): moderate uncertainty per standardised feature.
  • σ ∼ HalfNormal(1e5): residual noise scale.

Likelihood: Observed total_cost ∼ Normal(α + β·X_std, σ).

Inference: We draw 2,000 posterior samples per chain after 1,000 tuning steps. Setting target_accept=0.9 makes the sampler take smaller steps, which reduces divergent transitions.

import pymc3 as pm

with pm.Model() as planning_cost_model:
    # Priors
    α = pm.Normal("α", mu=y_train.mean(), sigma=1e5)               # intercept, centred on mean cost
    β = pm.Normal("β", mu=0, sigma=1e4, shape=X_train_s.shape[1])  # slopes
    σ = pm.HalfNormal("σ", sigma=1e5)                              # noise scale

    # Linear predictor
    μ = α + pm.math.dot(X_train_s, β)

    # Likelihood: observed total_cost
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)

    # Sample posterior
    trace = pm.sample(
        draws=2000,
        tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
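You can sanity-check the sampler with a closed-form answer if σ is treated as known, that is, fixed at an assumed value instead of being given the HalfNormal prior. This check swaps in the standard conjugate Gaussian result. With A as the standardised design matrix plus a leading column of ones, m₀ as the prior means, and Λ₀ as the diagonal prior precision, the posterior is Gaussian with Σ_post = (Λ₀ + AᵀA/σ²)⁻¹ and m_post = Σ_post(Λ₀m₀ + Aᵀy/σ²). The function name `conjugate_posterior` and the synthetic data below are hypothetical. With the tutorial's data you would pass X_train_s, y_train, the model's prior means and standard deviations, and a σ near its posterior mean, then compare the result with the MCMC summary.

```python
import numpy as np


def conjugate_posterior(X_s, y, prior_mean, prior_sd, sigma):
    """Gaussian posterior over [alpha, beta_1..beta_p] with sigma held fixed.

    X_s: (n, p) standardised features; prior_mean, prior_sd: length p + 1,
    intercept first. Returns (posterior mean, posterior covariance).
    """
    A = np.column_stack([np.ones(len(X_s)), X_s])             # design with intercept
    prior_prec = np.diag(1.0 / np.asarray(prior_sd, dtype=float) ** 2)
    post_prec = prior_prec + A.T @ A / sigma**2
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (prior_prec @ np.asarray(prior_mean, dtype=float)
                            + A.T @ y / sigma**2)
    return post_mean, post_cov


# Hypothetical synthetic data (not the tutorial's dataset)
rng = np.random.default_rng(1)
n, p, sigma = 200, 4, 1e5
X_s = rng.standard_normal((n, p))
true_coef = np.array([1.5e6, 4e5, 1e5, 2e5, 5e4])             # alpha, then betas (USD)
y = true_coef[0] + X_s @ true_coef[1:] + rng.normal(0, sigma, n)

# Assumed priors for the demo: intercept centred on mean cost, slopes with sd 1e4
m_post, C_post = conjugate_posterior(
    X_s, y,
    prior_mean=[y.mean(), 0, 0, 0, 0],
    prior_sd=[1e5, 1e4, 1e4, 1e4, 1e4],
    sigma=sigma,
)
print(np.round(m_post), np.round(np.sqrt(np.diag(C_post))))
```

With very wide priors the posterior mean reduces to ordinary least squares. With a slope prior sd of 1e4, coefficients whose data-implied size is much larger than 1e4 USD per standard deviation are visibly shrunk toward zero. This gives you a quick way to check whether that prior is as moderate as intended.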

Posterior Analysis & Point Predictions

  • Posterior predictive: pm.sample_posterior_predictive simulates total_cost for the training inputs, which is useful for posterior predictive checks. The posterior means of α and β give point forecasts for new projects.
  • Evaluation: the Mean Absolute Error (MAE) of those point forecasts on held-out test data measures average forecast error in USD.

import arviz as az
from sklearn.metrics import mean_absolute_error

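# Summarize posterior: mean, sd, 94% HDI, ESS and r_hat for each parameter
print(az.summary(trace, round_to=2))

# Posterior predictive draws of total_cost for the training rows (for PPCs)
with planning_cost_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])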

# Posterior means of parameters
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values

# Point forecasts on test set
y_pred = α_post + X_test_s.dot(β_post)

# Evaluate MAE
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:,.2f}")
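MAE scores the point forecast, but it does not tell you whether the uncertainty bands are calibrated. One check is to compute a 94% interval for each test project from posterior predictive draws and count how many observed costs fall inside it. For a well-calibrated model, that share should be close to 94%. The sketch below uses only numpy and runs on synthetic stand-in draws. The function names are hypothetical, and `hdi_1d` uses the common sorted-window (shortest interval) construction, which assumes a unimodal distribution.

```python
import numpy as np

# With the tutorial's trace (assumed PyMC3 InferenceData), the inputs would be:
#   post  = trace.posterior.stack(sample=("chain", "draw"))
#   alpha = post["α"].values; beta = post["β"].values.T; sigma = post["σ"].values


def predictive_draws(alpha, beta, sigma, X_s, seed=0):
    """Simulate cost draws alpha + X_s @ beta + Normal(0, sigma); returns (S, n)."""
    rng = np.random.default_rng(seed)
    mu = alpha[:, None] + beta @ X_s.T       # (S, n) expected cost per draw
    return rng.normal(mu, sigma[:, None])    # add observation noise


def hdi_1d(samples, prob=0.94):
    """Shortest interval holding `prob` of the sorted samples (unimodal case)."""
    x = np.sort(samples)
    k = int(np.floor(prob * len(x)))         # interval spans k sorted gaps
    widths = x[k:] - x[: len(x) - k]
    i = int(np.argmin(widths))
    return x[i], x[i + k]


def coverage(y, intervals):
    """Share of observations that fall inside their (lo, hi) interval."""
    lo, hi = intervals[:, 0], intervals[:, 1]
    return float(np.mean((y >= lo) & (y <= hi)))


# Hypothetical demo: synthetic "posterior draws" and test data, not tutorial results
rng = np.random.default_rng(2)
S, n, p = 4000, 300, 4
true_beta = np.array([4e5, 1e5, 2e5, 5e4])
X_demo = rng.standard_normal((n, p))
y_demo = 1.5e6 + X_demo @ true_beta + rng.normal(0, 1e5, n)
alpha = rng.normal(1.5e6, 5e3, S)
beta = rng.normal(true_beta, 5e3, (S, p))
sigma = np.abs(rng.normal(1e5, 2e3, S))

draws = predictive_draws(alpha, beta, sigma, X_demo)
intervals = np.array([hdi_1d(draws[:, j]) for j in range(n)])   # (n, 2)
print(f"Empirical 94% HDI coverage: {coverage(y_demo, intervals):.1%}")
```

Coverage well below 94% on the real test set would suggest that σ or the parameter uncertainty is underestimated. Coverage near 100% with very wide bands would suggest the opposite.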

Visualise Predictions & Credible Intervals

We sweep project size across its training range while holding the other features at their training medians. We then plot the posterior mean cost curve and a 94% predictive band (which includes the residual noise σ). The plot shows how scale drives cost and how uncertain that relationship is.

import numpy as np
import matplotlib.pyplot as plt

# Sweep Project_Size_sqft; hold other features at median
grid_size = np.linspace(X_train_s[:,0].min(), X_train_s[:,0].max(), 100)
grid = np.tile(np.median(X_train_s, axis=0), (100,1))
grid[:,0] = grid_size

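# pm.sample_posterior_predictive would only re-simulate the training rows, because
# the model's mean is built from the fixed array X_train_s. Instead, push the
# grid through every posterior draw: y = α + grid·β + Normal(0, σ) noise.
post = trace.posterior.stack(sample=("chain", "draw"))
α_s  = post["α"].values                        # (S,)
β_s  = post["β"].values                        # (n_features, S)
σ_s  = post["σ"].values                        # (S,)

rng   = np.random.default_rng(42)
mu    = α_s[:, None] + (grid @ β_s).T          # (S, 100) expected cost per draw
preds = rng.normal(mu, σ_s[:, None])           # (S, 100) predictive draws

mean_pred = preds.mean(axis=0)
hpd       = az.hdi(np.expand_dims(preds, 0), hdi_prob=0.94)   # (chain, draw, point) in, (100, 2) out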

# Back-transform the standardised size axis to sqft
size_orig = scaler.inverse_transform(grid)[:, 0]

plt.figure(figsize=(8,5))
plt.plot(size_orig, mean_pred, label="Posterior mean")
plt.fill_between(size_orig, hpd[:,0], hpd[:,1], alpha=0.3,
                 label="94% predictive HDI")
plt.scatter(
    X_test[:,0], y_test,   # raw test-set sizes (same as inverse-transforming X_test_s)
    color="k", alpha=0.5, label="Test data"
)
plt.xlabel("Project Size (sqft)")
plt.ylabel("Total Cost (USD)")
plt.title("Bayesian Regression: Cost vs. Project Size")
plt.legend()
plt.tight_layout()
plt.show()

Summary

This Bayesian Regression workflow for Urban Planning Cost Prediction provides:

  • Point estimates of total project cost from early design metrics.
  • Credible intervals quantifying uncertainty from market and complexity variability.
  • Actionable insights: planners can set contingencies from the upper end of the predictive interval, justify budgets, and communicate risk bounds to stakeholders.
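One simple way to turn a predictive distribution into a contingency line item is to budget at a high quantile of the predictive draws and treat the gap to the point estimate as the reserve. The P90 level, the function name `contingency`, and the normal draws below are assumptions for illustration. In this workflow, the draws for a single project would come from the posterior predictive, for example one column of `preds`.

```python
import numpy as np


def contingency(cost_draws, budget_quantile=0.9):
    """Point estimate, budget at a chosen quantile, and their difference.

    cost_draws: 1-D array of posterior predictive draws of total cost (USD).
    """
    point = float(np.mean(cost_draws))
    budget = float(np.quantile(cost_draws, budget_quantile))
    return point, budget, budget - point


# Hypothetical predictive draws for one project (not tutorial output)
rng = np.random.default_rng(3)
draws = rng.normal(2.0e6, 2.5e5, 8000)
point, budget, reserve = contingency(draws, budget_quantile=0.9)
print(f"Point estimate ${point:,.0f}; P90 budget ${budget:,.0f}; contingency ${reserve:,.0f}")
```

A higher `budget_quantile` buys more protection against overruns at the cost of tying up more capital. The right level is a policy choice, not something the model decides.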


ProjectGurukul Team

The ProjectGurukul Team delivers project-based tutorials on programming, machine learning, and web development. We simplify learning by providing hands-on projects to help you master real-world skills. We also provide free major and minor projects for engineering students.
