Urban Planning Cost Prediction using Bayesian Regression in ML

City planners and municipal finance teams must estimate the total capital cost of an urban development project before committing to design or issuing bonds. At that stage only early project metrics are available: planned land area (sq ft), number of infrastructure elements (roads, water lines), a complexity index (mixed-use vs. single-purpose), a regional price index, and baseline labor cost. Costs scale nonlinearly with size and complexity (bulk procurement discounts vs. specialty work surcharges) and are subject to uncertainty from material price volatility and permitting delays, so a single point estimate understates the risk. By applying Bayesian Regression, we obtain both:

1. A point forecast of total project cost.

2. A credible interval quantifying uncertainty—enabling risk‑aware budgeting, contingency planning, and stakeholder communication.

Libraries Required

import pandas as pd                              # data I/O  
import numpy as np                               # numerics  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

import pymc3 as pm                               # Bayesian modeling  
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error

Dataset

Construction Estimation Data

Step-by-Step Code Implementation

Data Loading & Target Definition

We combine Material_Cost and Labor_Cost into total_cost to proxy full urban‐planning expenditure.

import pandas as pd

# Load the dataset
df = pd.read_csv("data/construction-estimation-data.csv")

# Define features and create total_cost = Material_Cost + Labor_Cost
features = [
    "Project_Size_sqft",
    "Num_Stories",
    "Complexity_Index",
    "Region_Price_Index"
]
df = df[features + ["Material_Cost", "Labor_Cost"]].dropna()
df["total_cost"] = df["Material_Cost"] + df["Labor_Cost"]

# Prepare X and y
X = df[features].values
y = df["total_cost"].values  # USD

Train/Test Split & Standardisation

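We z-score all features so that the shared slope prior, Normal(0, 1e4), sits on a comparable scale for every coefficient and MCMC converges more reliably. The scaler is fit on the training split only, so no test-set statistics leak into the model.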

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 80/20 split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize all features for MCMC stability
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s  = scaler.transform(X_test)
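
As a rough illustration of what the scaler does, the sketch below reproduces the z-scoring by hand with NumPy on a small made-up feature matrix (the numbers are placeholders, not values from the dataset). The mean and standard deviation are taken from the training rows only and then reused on the test rows.

```python
import numpy as np

# Hypothetical feature rows: [Project_Size_sqft, Num_Stories]
X_train_demo = np.array([[1000.0, 1.0],
                         [2000.0, 2.0],
                         [3000.0, 3.0],
                         [4000.0, 6.0]])
X_test_demo = np.array([[2500.0, 3.0]])

# Statistics come from the training rows only
mu = X_train_demo.mean(axis=0)
sd = X_train_demo.std(axis=0)   # population std (ddof=0), as StandardScaler uses

X_train_z = (X_train_demo - mu) / sd
X_test_z = (X_test_demo - mu) / sd   # test rows reuse the training statistics

print("train mean per column:", X_train_z.mean(axis=0))
print("train std per column: ", X_train_z.std(axis=0))
print("test row, standardised:", X_test_z)
```

The test row here happens to equal the training mean in both columns, so it maps to zero; any other row would be expressed in training standard deviations from that mean.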

Define & Fit Bayesian Regression Model

Model Priors:

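α ∼ Normal(0, 1e5): intercept prior. With standardised features, α is the expected cost of an average project; if typical costs are near 10⁶ USD, this prior places that value about ten standard deviations from its centre, so centring it on the training-set mean cost (or widening it) may be a safer choice.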
β ∼ Normal(0, 1e4): moderate uncertainty per standardised feature.
σ ∼ HalfNormal(1e5): residual noise scale.

Likelihood: Observed total_cost ∼ Normal(α + β·X_std, σ).

Inference: We draw 2,000 posterior samples per chain after 1,000 tuning steps, with target_accept=0.9 (above the default of 0.8) so the NUTS sampler takes smaller, more stable steps.

import pymc3 as pm

with pm.Model() as planning_cost_model:
    # Priors
    α = pm.Normal("α", mu=0, sigma=1e5)                            # intercept
    β = pm.Normal("β", mu=0, sigma=1e4, shape=X_train_s.shape[1])  # slopes
    σ = pm.HalfNormal("σ", sigma=1e5)                              # noise scale

    # Linear predictor
    μ = α + pm.math.dot(X_train_s, β)

    # Likelihood: observed total_cost
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)

    # Sample posterior
    trace = pm.sample(
        draws=2000,
        tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
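
For intuition about what the sampler approximates, the sketch below computes the posterior over the intercept and slopes in closed form for the simpler case where the noise scale σ is treated as known. That is an assumption made only for illustration: the PyMC3 model above also infers σ, so its posterior will not match exactly. The data are synthetic, and `posterior_known_sigma` is a hypothetical helper name.

```python
import numpy as np

def posterior_known_sigma(X, y, sigma, prior_sd):
    """Gaussian posterior for weights w in y ~ Normal(X @ w, sigma),
    with independent priors w_j ~ Normal(0, prior_sd[j])."""
    prior_prec = np.diag(1.0 / np.asarray(prior_sd) ** 2)
    post_cov = np.linalg.inv(X.T @ X / sigma**2 + prior_prec)
    post_mean = post_cov @ (X.T @ y) / sigma**2
    return post_mean, post_cov

rng = np.random.default_rng(0)
n = 200
X_std = rng.normal(size=(n, 4))                  # stand-in for standardised features
true_w = np.array([5e5, 8e4, 2e4, 3e4, 1e4])     # made-up intercept and slopes
design = np.column_stack([np.ones(n), X_std])    # intercept column + features
sigma = 5e4                                      # noise scale, assumed known here
y_demo = design @ true_w + rng.normal(scale=sigma, size=n)

# Prior scales as quoted above: 1e5 for the intercept, 1e4 per slope
prior_sd = np.array([1e5, 1e4, 1e4, 1e4, 1e4])
w_mean, w_cov = posterior_known_sigma(design, y_demo, sigma, prior_sd)
w_ols = np.linalg.lstsq(design, y_demo, rcond=None)[0]

print("posterior mean:", np.round(w_mean))
print("posterior sd:  ", np.round(np.sqrt(np.diag(w_cov))))
print("least squares: ", np.round(w_ols))
```

Because the slope priors are fairly tight relative to the made-up slopes, the posterior means should sit somewhat closer to zero than the least-squares fit. That pull is the shrinkage the priors impose, and it is worth keeping in mind when choosing the prior scales.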

Posterior Analysis & Point Predictions

Posterior Predictive: Sampling Y_obs yields full predictive distributions; we extract the posterior mean and 94% Highest Posterior Density intervals to quantify uncertainty.
Evaluation: Mean Absolute Error (MAE) on held‑out test data quantifies average forecast error in USD.

import arviz as az
from sklearn.metrics import mean_absolute_error

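# Summarize posterior (mean, sd, 94% HDI, r_hat, effective sample size)
print(az.summary(trace, round_to=2))

# Posterior predictive sampling for the training rows the model was fit on
with planning_cost_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])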

# Posterior means of parameters
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values

# Point forecasts on test set
y_pred = α_post + X_test_s.dot(β_post)

# Evaluate MAE
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:,.2f}")
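
MAE only scores the point forecast. One possible complementary check, sketched below with synthetic numbers rather than the model's output, is to measure how often a 94% predictive interval contains the held-out cost. `interval_coverage` is a hypothetical helper, and it uses equal-tailed percentiles for simplicity instead of the highest-density interval.

```python
import numpy as np

def interval_coverage(pred_draws, y_true, prob=0.94):
    """Fraction of y_true values inside the central `prob` interval
    of pred_draws, which has shape (n_draws, n_obs)."""
    tail = (1.0 - prob) / 2.0 * 100.0
    lower = np.percentile(pred_draws, tail, axis=0)
    upper = np.percentile(pred_draws, 100.0 - tail, axis=0)
    return np.mean((y_true >= lower) & (y_true <= upper))

rng = np.random.default_rng(1)
n_obs, n_draws = 500, 2000
mu_demo = rng.uniform(2e5, 2e6, size=n_obs)    # made-up expected costs (USD)
sigma_demo = 5e4                               # made-up noise scale (USD)

# "Observed" costs and predictive draws share the same generating process,
# so the interval should be close to well calibrated
y_demo = rng.normal(mu_demo, sigma_demo)
draws = rng.normal(mu_demo, sigma_demo, size=(n_draws, n_obs))

coverage = interval_coverage(draws, y_demo)
print(f"Empirical coverage of the 94% interval: {coverage:.1%}")
```

On real data, coverage well below the nominal level would suggest the intervals are too narrow for contingency planning, while coverage far above it would suggest they are wider than needed.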

Visualise Predictions & Credible Intervals

By sweeping project size while holding other features fixed, we plot both the expected cost curve and its credible band, illuminating how scale drives cost and how uncertain that relationship is.

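import numpy as np
import matplotlib.pyplot as plt

# Sweep Project_Size_sqft; hold other features at their training median
grid_size = np.linspace(X_train_s[:,0].min(), X_train_s[:,0].max(), 100)
grid = np.tile(np.median(X_train_s, axis=0), (100,1))
grid[:,0] = grid_size

# X_train_s entered the model as a fixed array, so pm.sample_posterior_predictive
# would only regenerate the training rows, not the grid. Instead, push the grid
# through the posterior draws directly.
post    = trace.posterior.stack(sample=("chain", "draw"))
α_draws = post["α"].values                     # (n_samples,)
β_draws = post["β"].values                     # (n_features, n_samples)
σ_draws = post["σ"].values                     # (n_samples,)

rng   = np.random.default_rng(42)
mu    = α_draws[:, None] + (grid @ β_draws).T  # (n_samples, 100)
preds = rng.normal(mu, σ_draws[:, None])       # predictive draws incl. noise

mean_pred = preds.mean(axis=0)
# Leading axis marks a single chain, so ArviZ reads (chain, draw, grid point)
hpd       = az.hdi(preds[np.newaxis], hdi_prob=0.94)   # (100, 2)

# Back-transform size to square feet
size_orig = scaler.inverse_transform(grid)[:,0]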

plt.figure(figsize=(8,5))
plt.plot(size_orig, mean_pred, label="Posterior mean")
plt.fill_between(size_orig, hpd[:,0], hpd[:,1], alpha=0.3,
                 label="94% credible interval")
plt.scatter(
    scaler.inverse_transform(X_test_s)[:,0], y_test,
    color="k", alpha=0.5, label="Test data"
)
plt.xlabel("Project Size (sqft)")
plt.ylabel("Total Cost (USD)")
plt.title("Bayesian Regression: Cost vs. Project Size")
plt.legend()
plt.tight_layout()
plt.show()
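
To make the interval in the plot less of a black box, here is a minimal NumPy sketch of how a highest-density interval can be computed from one-dimensional samples: sort the draws and pick the narrowest window that contains the requested share of them. `hdi_1d` is a hypothetical helper written for this illustration and is not ArviZ's own code; the skewed sample is synthetic.

```python
import numpy as np

def hdi_1d(samples, prob=0.94):
    """Narrowest interval containing roughly `prob` of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.floor(prob * n))       # index span of each candidate window
    widths = x[k:] - x[: n - k]       # width of every window with that span
    start = int(np.argmin(widths))    # the narrowest window wins
    return x[start], x[start + k]

rng = np.random.default_rng(2)
skewed = rng.exponential(scale=1.0, size=10_000)   # synthetic right-skewed sample

hdi_lo, hdi_hi = hdi_1d(skewed, prob=0.94)
eq_lo, eq_hi = np.percentile(skewed, [3, 97])      # equal-tailed alternative

print(f"HDI:          [{hdi_lo:.3f}, {hdi_hi:.3f}]")
print(f"Equal-tailed: [{eq_lo:.3f}, {eq_hi:.3f}]")
```

For a symmetric predictive distribution such as the Normal likelihood used here, the two intervals nearly coincide. The difference matters mainly for skewed quantities, where the highest-density interval is the shorter of the two.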

Summary

This Bayesian Regression workflow for Urban Planning Cost Prediction provides:

Point estimates of total project cost from early design metrics.
Credible intervals quantifying uncertainty from market and complexity variability.
Actionable insights: planners can allocate contingencies, justify budgets, and communicate risk bounds to stakeholders—optimising both financial stewardship and project delivery.
