Urban Development Cost Prediction with Ridge & Lasso Mixed Regression in ML


City agencies, private developers, and lenders all struggle to pin down how much a new mixed‑use building, transit‑oriented block, or civic square will actually cost before breaking ground. Early estimates rely on dozens of interacting factors: gross floor area, structural system, lot slope, zoning class, material choice, labour costs, and schedule. Many of these variables are highly collinear (e.g., floor‑area ratio and total floor space), which makes ordinary least‑squares coefficient estimates unstable. A mixed regression model, Elastic Net, blends Ridge's squared ℓ2 penalty with Lasso's ℓ1 penalty. It can tame multicollinearity and zero out weak predictors, giving a lean, transparent cost estimator whose coefficients planners can inspect.
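The instability claim is easy to check on made-up data. The sketch below is not part of the project code: it simulates two nearly identical predictors (hypothetical stand-ins for a pair such as floor-area ratio and total floor space), refits each model on many fresh samples, and compares how much the fitted coefficients jump around. The sample sizes, noise levels, and penalty settings are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression


def coef_spread(make_model, n_repeats=200, n_rows=60, seed=0):
    """Std. dev. of each fitted coefficient across repeated synthetic samples."""
    rng = np.random.default_rng(seed)
    coefs = []
    for _ in range(n_repeats):
        x1 = rng.normal(size=n_rows)
        x2 = x1 + rng.normal(scale=0.01, size=n_rows)   # almost a copy of x1
        y = 3.0 * x1 + rng.normal(scale=1.0, size=n_rows)
        X = np.column_stack([x1, x2])
        coefs.append(make_model().fit(X, y).coef_)
    return np.std(coefs, axis=0)


ols_spread = coef_spread(LinearRegression)
enet_spread = coef_spread(lambda: ElasticNet(alpha=0.1, l1_ratio=0.5))

print("OLS coefficient spread:        ", ols_spread)
print("Elastic Net coefficient spread:", enet_spread)
```

On data like this, the least-squares coefficients swing widely from sample to sample, while the penalised ones stay close together.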

Libraries Required

Purpose | Library
Data manipulation | pandas, numpy
Visualisation | matplotlib, seaborn
Modelling workflow | scikit‑learn (ColumnTransformer, OneHotEncoder, StandardScaler, ElasticNet, GridSearchCV, Pipeline, train_test_split)
Metrics | scikit‑learn (mean_squared_error, r2_score)

Dataset Link

Construction Estimation Data (Kaggle dataset: sasakitetsuya/construction-estimation-data)

Step-by-Step Code Implementation

1. Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score

2. Download & load dataset

The dataset contains 1,000 synthetic but realistically structured urban‑construction projects. Each row includes scope metrics (gross floor area, floors, façade area), categorical descriptors (project type, structural system), timing variables, and the simulated Total_Cost_USD.

# One‑time shell command (needs a Kaggle API key):
# kaggle datasets download -d sasakitetsuya/construction-estimation-data -p data --unzip

df = pd.read_csv("data/construction_estimation_data.csv")   # 1,000 simulated projects

3. Initial inspection & quick EDA

print(df.head())
sns.histplot(df['Total_Cost_USD']/1e6, kde=True)
plt.xlabel('Total Cost (million USD)'); plt.title('Cost distribution'); plt.show()
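Beyond head() and a histogram, a per-column summary of types and missing values is a cheap early check. The helper below is a hypothetical addition, not part of the original workflow, and the toy rows only stand in for the real CSV.

```python
import pandas as pd


def quick_profile(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, missing-value count and number of distinct values."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isna().sum(),
        "n_unique": frame.nunique(),
    })


# Made-up rows for illustration; with the real data, call quick_profile(df)
toy = pd.DataFrame({
    "Project_Type": ["Mixed-Use", "Civic", None],
    "GFA_m2": [12000.0, 3500.0, 8000.0],
})
profile = quick_profile(toy)
print(profile)
```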

4. Target & feature definition

Dividing total cost by gross floor area produces a unit cost measure (Cost_per_m2) that generalises from micro cottages to skyscrapers.

# Normalise by gross floor area to reduce heteroscedasticity
df['Cost_per_m2'] = df['Total_Cost_USD'] / df['GFA_m2']

y = df['Cost_per_m2']          # target: USD per square‑metre

X = df.drop(columns=['ID', 'Total_Cost_USD', 'Cost_per_m2'])
cat_cols = ['Project_Type', 'Structure_System', 'Zone_Class', 'City']
num_cols = [c for c in X.columns if c not in cat_cols]
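Since collinearity is the motivation for Elastic Net, it can help to list which numeric columns are strongly correlated before modelling. This is a minimal sketch: high_corr_pairs is a hypothetical helper, the 0.9 threshold is an arbitrary assumption, and the toy columns (including Lot_Slope_pct) are invented for the demo.

```python
import numpy as np
import pandas as pd


def high_corr_pairs(frame: pd.DataFrame, threshold: float = 0.9) -> pd.Series:
    """Column pairs whose absolute Pearson correlation exceeds `threshold`."""
    corr = frame.corr().abs()
    # Keep only the upper triangle so each pair is reported once
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    return pairs[pairs > threshold].sort_values(ascending=False)


# Made-up numeric columns: floor count tracks floor area almost exactly
rng = np.random.default_rng(0)
gfa = rng.uniform(1_000, 50_000, size=200)
toy = pd.DataFrame({
    "GFA_m2": gfa,
    "Floors": gfa / 1_500 + rng.normal(size=200),
    "Lot_Slope_pct": rng.uniform(0, 15, size=200),
})
pairs = high_corr_pairs(toy)
print(pairs)
```

With the real data, the equivalent call would be high_corr_pairs(X[num_cols]), assuming every column in num_cols is numeric.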

5. Pre‑processing pipeline

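ColumnTransformer one‑hot encodes the categorical columns and z‑scales the numeric ones. Because it sits inside the Pipeline that GridSearchCV refits on every fold (step 7), the encoder and scaler are fitted on the training portion of each fold only, which prevents data leakage.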

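preprocess = ColumnTransformer([
    # `sparse_output` replaced the `sparse` argument in scikit-learn 1.2;
    # on older releases, pass sparse=False instead
    ('cat', OneHotEncoder(drop='first', sparse_output=False), cat_cols),
    ('num', StandardScaler(), num_cols)
])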

6. Train/test split

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=df['Project_Type'])
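The stratify argument keeps the mix of project types the same in both splits. A quick sketch with invented labels (60 % / 30 % / 10 %) shows the effect; the category names and counts are assumptions made for the demo.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up project-type labels with a 60/30/10 mix
labels = pd.Series(["Residential"] * 60 + ["Commercial"] * 30 + ["Civic"] * 10)
features = pd.DataFrame({"row_id": np.arange(len(labels))})

f_train, f_test, l_train, l_test = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels)

test_counts = l_test.value_counts().to_dict()
train_counts = l_train.value_counts().to_dict()
print("test: ", test_counts)
print("train:", train_counts)
```

Without stratification, a rare project type could end up almost entirely in one split, which would make the hold-out score less representative.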

7. Build & tune Elastic Net

  • alpha (α) scales the overall shrinkage;
  • l1_ratio tilts the penalty toward Ridge (0 = pure ℓ2) or Lasso (1 = pure ℓ1).

An 18 × 9 grid (162 candidate settings, each fitted on 5 folds) lets cross‑validation find a balance between sparsity and stability.

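For reference, the way the two hyperparameters enter the loss can be written out. The expression below follows the parameterisation given in scikit-learn's ElasticNet documentation, with n the number of training rows, w the coefficient vector, and ρ standing for l1_ratio (the symbol ρ is a notational choice made here).

```latex
\min_{w}\;
  \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
  \;+\; \alpha\,\rho\,\lVert w \rVert_1
  \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```

Setting ρ = 1 leaves only the ℓ1 term (Lasso), while ρ = 0 leaves only the squared ℓ2 term (Ridge). The grid in this step stays between 0.1 and 0.9, so both terms are always active.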
pipe = Pipeline([
    ('prep', preprocess),
    ('model', ElasticNet(max_iter=15_000, random_state=42))
])

param_grid = {
    'model__alpha'   : np.logspace(-3, 1, 18),   # penalty strength 0.001 → 10
    'model__l1_ratio': np.linspace(0.1, 0.9, 9)  # Ridge‑heavy → Lasso‑heavy
}

search = GridSearchCV(pipe, param_grid,
                      cv=5,
                      scoring='neg_root_mean_squared_error',
                      n_jobs=-1,
                      verbose=1)
search.fit(X_train, y_train)

print("Best α:", search.best_params_['model__alpha'])
print("Best l1_ratio:", search.best_params_['model__l1_ratio'])
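Besides the single best setting, the whole grid can be inspected through cv_results_. The sketch below uses synthetic data from make_regression and a deliberately smaller grid so that it runs on its own. The parameter keys lack the model__ prefix here because no Pipeline is involved; with the tutorial's search object the columns would be param_model__alpha and param_model__l1_ratio.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; not the construction dataset
X_demo, y_demo = make_regression(n_samples=200, n_features=10,
                                 noise=10.0, random_state=0)

demo_grid = {"alpha": np.logspace(-3, 1, 5), "l1_ratio": [0.1, 0.5, 0.9]}
demo_search = GridSearchCV(ElasticNet(max_iter=15_000), demo_grid, cv=5,
                           scoring="neg_root_mean_squared_error")
demo_search.fit(X_demo, y_demo)

results = pd.DataFrame(demo_search.cv_results_)
results["alpha"] = results["param_alpha"].astype(float)
results["l1_ratio"] = results["param_l1_ratio"].astype(float)
results["rmse"] = -results["mean_test_score"]   # flip sign back to a plain RMSE

# Rows: alpha, columns: l1_ratio, cells: mean cross-validated RMSE
rmse_table = results.pivot(index="alpha", columns="l1_ratio", values="rmse")
print(rmse_table.round(2))
```

A table like this shows whether the optimum sits on the edge of the grid, in which case the alpha range may need widening.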

8. Evaluate model

RMSE tells estimators the typical per‑square‑metre error; R² reports the share of variance explained on unseen projects.

y_pred = search.predict(X_test)
# np.sqrt works on every scikit-learn version (squared=False was deprecated in 1.4)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2   = r2_score(y_test, y_pred)
print(f"Hold‑out RMSE: ${rmse:,.0f} per m² | R²: {r2:.3f}")
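An RMSE figure is easier to judge next to a naive reference. One common choice, added here as a suggestion rather than as part of the original workflow, is a DummyRegressor that always predicts the training mean. The sketch uses synthetic data and arbitrary Elastic Net settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; not the construction dataset
X_demo, y_demo = make_regression(n_samples=300, n_features=8,
                                 noise=15.0, random_state=1)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo,
                                      test_size=0.2, random_state=42)


def holdout_rmse(model):
    """Fit on the training split and return RMSE on the held-out split."""
    pred = model.fit(Xtr, ytr).predict(Xte)
    return float(np.sqrt(mean_squared_error(yte, pred)))


baseline_rmse = holdout_rmse(DummyRegressor(strategy="mean"))
enet_rmse = holdout_rmse(ElasticNet(alpha=0.1, l1_ratio=0.5))

print(f"Mean-only baseline RMSE: {baseline_rmse:.1f}")
print(f"Elastic Net RMSE:        {enet_rmse:.1f}")
```

In the tutorial's setting, the same comparison would fit the dummy model on X_train and y_train and score it on the same hold-out rows as search.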

9. Interpret coefficients

The coefficient chart surfaces the main cost levers. As a purely illustrative reading, a Steel‑Frame system might add $120/m² and a High‑Rise Mixed‑Use label might tack on $95/m², while a negative coefficient on storey count would point to economies of scale. Dummies shrunk exactly to zero flag factors that don’t move the needle.
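What "zeroed" means can be seen on synthetic data where the true drivers are known. In this sketch, only the first three of ten features influence the target; the coefficient values, noise level, and penalty settings are all assumptions chosen for the demo.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n_projects, n_features = 300, 10

X_demo = rng.normal(size=(n_projects, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [40.0, -25.0, 15.0]          # only three real cost drivers
y_demo = X_demo @ true_coef + rng.normal(scale=5.0, size=n_projects)

# A Lasso-heavy mix encourages exact zeros for uninformative features
model = ElasticNet(alpha=1.0, l1_ratio=0.9).fit(X_demo, y_demo)

kept = np.flatnonzero(model.coef_)           # indices with non-zero weight
dropped = np.flatnonzero(model.coef_ == 0)   # indices shrunk exactly to zero
print("kept features:   ", kept)
print("dropped features:", dropped)
```

The real drivers keep non-zero (slightly shrunken) weights, while pure-noise features tend to be dropped. The same reading applies to the dummy columns in the cost model.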

# Recover feature names after one‑hot encoding
ohe = search.best_estimator_.named_steps['prep'].named_transformers_['cat']
ohe_names = ohe.get_feature_names_out(cat_cols)
feature_names = np.hstack([ohe_names, num_cols])

# Reverse scaling for numeric coefficients
# (.copy() so the fitted model's own coef_ array is not modified in place)
scale = search.best_estimator_.named_steps['prep'].named_transformers_['num'].scale_
coef  = search.best_estimator_.named_steps['model'].coef_.copy()
coef[-len(num_cols):] = coef[-len(num_cols):] / scale

imp = (pd.Series(coef, index=feature_names)
         .sort_values(key=abs, ascending=False))

plt.figure(figsize=(9,5))
imp.head(15).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title('Top Drivers of Cost per m² (Elastic Net)')
plt.xlabel('Δ USD per m²'); plt.tight_layout(); plt.show()
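The "reverse scaling" step divides each numeric coefficient by the scaler's standard deviation, which converts "USD per standard deviation" into "USD per original unit". The sketch below checks this on made-up, noise-free data. It swaps in plain LinearRegression instead of Elastic Net so the recovered slope is exact, and the 12 USD per floor figure is invented.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
floors = rng.integers(1, 40, size=200).astype(float).reshape(-1, 1)
cost_per_m2 = 1500.0 + 12.0 * floors.ravel()   # invented: +12 USD/m² per floor

demo = make_pipeline(StandardScaler(), LinearRegression()).fit(floors, cost_per_m2)

coef_scaled = demo.named_steps["linearregression"].coef_[0]  # per std. dev.
scale = demo.named_steps["standardscaler"].scale_[0]         # std. dev. of floors
per_floor = coef_scaled / scale                              # per original unit

print(f"coefficient on scaled feature: {coef_scaled:.2f}")
print(f"effect per additional floor:   {per_floor:.2f}")
```

With Elastic Net the recovered value would be shrunk somewhat by the penalty, but the unit conversion works the same way.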

Summary

With fewer than 100 lines of Python, we have built an Elastic Net‑based “mixed regression” pipeline that:

  • Forecasts unit costs for urban development projects early in the design phase.
  • Balances sparsity and robustness, taming multicollinearity while dropping noise.
  • Explains itself through readable coefficients, giving planners defensible numbers for lender decks and value‑engineering meetings.

Refresh the model quarterly with new bid data: replace the CSV and re‑run search.fit. Urban‑development budgeting just became a lot less guess‑and‑check, and a lot more data‑driven.


ProjectGurukul Team

