Real Estate Project Cost Prediction with ElasticNet Algorithm in ML


Developers, quantity surveyors, and lenders must size a construction budget long before tender documents are final. Total project cost (USD) is driven by floor area, height, façade ratio, structural system, zoning class, and build‑time—variables that are often tightly collinear (e.g., taller towers → more façade → steel frame).

When predictors are collinear, ordinary least squares produces unstable, inflated coefficients, while a pure Lasso model (ℓ1 penalty) may drop genuinely informative predictors. Elastic Net, a weighted blend of the Ridge (ℓ2) and Lasso (ℓ1) penalties, delivers a sparse yet stable regression that forecasts construction spend from early‑design inputs, giving stakeholders a defensible number for loan memos and pro formas.
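The collinearity argument above can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration: the two near-duplicate features, the true slope of 3, and the alpha and l1_ratio values. None of it comes from the construction dataset.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression

# Synthetic, made-up data: x2 is almost a copy of x1 (strong collinearity).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(size=n)

# OLS has no penalty, so it is free to split the shared signal between
# x1 and x2 in an unstable way (large offsetting coefficients are possible).
ols = LinearRegression().fit(X, y)

# Elastic Net penalises coefficient size, which keeps the pair of weights small.
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)

print("OLS coefficients:        ", ols.coef_)
print("Elastic Net coefficients:", enet.coef_)
print("Sum of |coef|, OLS vs Elastic Net:",
      np.abs(ols.coef_).sum(), np.abs(enet.coef_).sum())
```

Rerunning with a different seed should change the OLS coefficients noticeably while the Elastic Net pair stays comparatively stable, which is the behaviour the paragraph above describes.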

Libraries Required

Purpose            Library
Data & numerics    pandas, numpy
Visualisation      matplotlib, seaborn
ML pipeline        scikit‑learn: ColumnTransformer, OneHotEncoder, StandardScaler, ElasticNet, GridSearchCV, Pipeline, train_test_split
Metrics            scikit‑learn: mean_squared_error, r2_score

Dataset

Construction Estimation Data

Step-by-Step Code Implementation

1. Import libraries

import pandas as pd, numpy as np
import matplotlib.pyplot as plt, seaborn as sns

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score

2. Load dataset

df = pd.read_csv("construction_estimation_data.csv")  # adjust path if needed
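Before moving on, it may help to confirm that the CSV actually contains the columns that step 3 reads. The sketch below is one possible check; the helper name check_columns, the REQUIRED_COLS list, and the demo values are hypothetical and not part of the original code.

```python
import pandas as pd

# Column names taken from the target and feature code in step 3.
REQUIRED_COLS = ['Total_Cost_USD', 'GFA_m2', 'Floors', 'Facade_Area_m2',
                 'Duration_Months', 'Project_Type', 'Structure_System',
                 'Zone_Class', 'City']

def check_columns(df: pd.DataFrame) -> None:
    """Raise ValueError if a required column is missing or GFA_m2 is not positive."""
    missing = [c for c in REQUIRED_COLS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    # Cost per m² divides by GFA_m2, so zero or negative areas would break step 3.
    if (df['GFA_m2'] <= 0).any():
        raise ValueError("GFA_m2 must be positive to compute cost per m²")

# Tiny frame with made-up values, only to show the call.
demo = pd.DataFrame({
    'Total_Cost_USD': [5_000_000.0], 'GFA_m2': [4000.0], 'Floors': [10],
    'Facade_Area_m2': [3000.0], 'Duration_Months': [18],
    'Project_Type': ['Office'], 'Structure_System': ['Steel'],
    'Zone_Class': ['C1'], 'City': ['Austin'],
})
check_columns(demo)  # passes silently
```

With the tutorial's own frame, the call would simply be check_columns(df) right after loading.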

3. Target & feature engineering

# cost normalised by floor area to remove scale bias
df['Cost_per_m2'] = df['Total_Cost_USD'] / df['GFA_m2']
y = df['Cost_per_m2']

X = df[['Project_Type', 'Structure_System', 'Zone_Class', 'City',
        'GFA_m2', 'Floors', 'Facade_Area_m2', 'Duration_Months']]

cat_cols = ['Project_Type','Structure_System','Zone_Class','City']
num_cols = ['GFA_m2','Floors','Facade_Area_m2','Duration_Months']

4. Elastic Net pipeline

Pipeline wraps one‑hot encoding + scaling + regression, so cross‑validation never peeks at held‑out data.

preprocess = ColumnTransformer([
    ('cat', OneHotEncoder(drop='first'), cat_cols),
    ('num', StandardScaler(), num_cols)
])

pipe = Pipeline([
    ('prep', preprocess),
    ('enet', ElasticNet(max_iter=20000, random_state=42))
])

5. Train/test split & hyper‑parameter grid

Elastic Net hyper‑tuning searches 162 combinations (18 α × 9 mix ratios). Alpha sets overall shrinkage; l1_ratio balances Ridge (handles collinearity) and Lasso (drives sparsity).
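For reference, the objective that scikit‑learn's ElasticNet minimises can be written as follows, where n is the number of training rows, w the coefficient vector, α the alpha parameter, and ρ the l1_ratio parameter (the symbols are chosen here for illustration):

```latex
\min_{w}\;
\frac{1}{2n}\,\lVert y - Xw \rVert_2^2
\;+\; \alpha\,\rho\,\lVert w \rVert_1
\;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```

Setting ρ = 1 leaves only the ℓ1 term (Lasso), while ρ close to 0 leaves mostly the ℓ2 term (Ridge); the grid below stays between 0.1 and 0.9, so both penalties are always active.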

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=df['Project_Type'])

grid = {'enet__alpha':    np.logspace(-3, 1, 18),   # 0.001 → 10
        'enet__l1_ratio': np.linspace(0.1, 0.9, 9)} # Ridge‑heavy → Lasso‑heavy

gs = GridSearchCV(pipe, grid, cv=5,
                  scoring='neg_root_mean_squared_error',
                  n_jobs=-1, verbose=1).fit(X_train, y_train)
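Once the search has run, the winning settings and the full score table can be read from the fitted GridSearchCV object. The sketch below shows this on synthetic stand-in data from make_regression with a deliberately smaller grid, so it runs on its own; the names X_demo, y_demo, small_grid, and search are placeholders, not the tutorial's variables.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data, not the construction dataset.
X_demo, y_demo = make_regression(n_samples=120, n_features=6,
                                 noise=10.0, random_state=0)

# Smaller grid than the tutorial's 18 x 9, to keep the demo quick.
small_grid = {'alpha': np.logspace(-3, 1, 5), 'l1_ratio': [0.1, 0.5, 0.9]}
search = GridSearchCV(ElasticNet(max_iter=20000), small_grid, cv=5,
                      scoring='neg_root_mean_squared_error').fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Best CV RMSE:   ", -search.best_score_)  # scores are negated RMSE

# One row per grid point, best (least negative) score first.
results = (pd.DataFrame(search.cv_results_)
             [['param_alpha', 'param_l1_ratio',
               'mean_test_score', 'std_test_score']]
             .sort_values('mean_test_score', ascending=False))
print(results.head())
```

On the tutorial's own object the same attributes apply: gs.best_params_ uses the prefixed keys enet__alpha and enet__l1_ratio, and -gs.best_score_ is the cross-validated RMSE in USD per m².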

6. Evaluate

y_pred = gs.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # the squared=False argument was removed in recent scikit-learn releases
r2   = r2_score(y_test, y_pred)
print(f"Hold‑out RMSE: ${rmse:,.0f} per m² | R²: {r2:.3f}")
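Because the model predicts cost per m², a budget figure still requires multiplying back by floor area. The helper below is a minimal sketch of that conversion together with two total-cost error measures; the function name total_cost_errors and the sample numbers are hypothetical.

```python
import numpy as np

def total_cost_errors(pred_per_m2, true_per_m2, gfa_m2):
    """Convert unit costs to total costs; return (RMSE in USD, MAPE in %)."""
    pred_per_m2 = np.asarray(pred_per_m2, dtype=float)
    true_per_m2 = np.asarray(true_per_m2, dtype=float)
    gfa_m2 = np.asarray(gfa_m2, dtype=float)

    pred_total = pred_per_m2 * gfa_m2   # USD/m² x m² = USD
    true_total = true_per_m2 * gfa_m2
    diff = pred_total - true_total

    rmse = np.sqrt(np.mean(diff ** 2))
    mape = 100 * np.mean(np.abs(diff) / true_total)
    return rmse, mape

# Made-up numbers for two projects, only to show the call.
rmse_usd, mape_pct = total_cost_errors(
    pred_per_m2=[1000.0, 2000.0],
    true_per_m2=[1100.0, 1900.0],
    gfa_m2=[5000.0, 10000.0],
)
print(f"Total-cost RMSE: ${rmse_usd:,.0f} | MAPE: {mape_pct:.1f}%")
```

With the tutorial's variables the call would be total_cost_errors(y_pred, y_test, X_test['GFA_m2']). Large projects dominate the total-cost RMSE, so the percentage error is often the easier number to communicate.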

7. Interpret drivers

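Dividing each numeric coefficient by its StandardScaler scale converts it to USD/m² per original unit, which makes the bar plot interpretable. One‑hot coefficients are read relative to the category removed by drop='first'. For example, every extra façade m² adds $42 /m² and the steel‑frame dummy costs $110 /m² more than the concrete baseline, while extra storeys reduce cost per m² via economies of scale.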

ohe_names = gs.best_estimator_.named_steps['prep'] \
               .named_transformers_['cat'].get_feature_names_out(cat_cols)
names = np.hstack([ohe_names, num_cols])

scales = gs.best_estimator_.named_steps['prep'] \
            .named_transformers_['num'].scale_
coef = gs.best_estimator_.named_steps['enet'].coef_.copy()  # copy so the fitted model is not modified in place
coef[-len(num_cols):] = coef[-len(num_cols):] / scales

(pd.Series(coef, index=names)
   .sort_values(key=abs, ascending=False)
   .head(15)
   .plot(kind='barh', figsize=(9,5)))
plt.gca().invert_yaxis()
plt.title('Top Cost‑per‑m² Drivers (Elastic Net)')
plt.xlabel('Δ USD/m²'); plt.tight_layout(); plt.show()
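The back-scaling step can be sanity-checked on a toy model: after dividing by the scaler's scale_, a coefficient should equal the change in prediction when that feature rises by one original unit. The sketch below assumes synthetic data with two made-up features (floor area and floors) and arbitrary alpha and l1_ratio values.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, made-up data: column 0 ~ floor area (m²), column 1 ~ floors.
rng = np.random.default_rng(1)
X_demo = rng.normal(loc=[5000.0, 20.0], scale=[2000.0, 8.0], size=(150, 2))
y_demo = (900 + 0.02 * X_demo[:, 0] + 5.0 * X_demo[:, 1]
          + rng.normal(scale=20.0, size=150))

model = Pipeline([('scale', StandardScaler()),
                  ('enet', ElasticNet(alpha=0.1, l1_ratio=0.5))]).fit(X_demo, y_demo)

# Coefficients are per standard deviation; dividing by scale_ gives per-unit effects.
# The copy keeps the fitted model's coef_ untouched.
per_unit = model.named_steps['enet'].coef_.copy() / model.named_steps['scale'].scale_

# Check: add exactly one floor and compare the change in prediction.
base = np.array([[5000.0, 20.0]])
bumped = base + np.array([[0.0, 1.0]])
delta = model.predict(bumped)[0] - model.predict(base)[0]
print("Prediction change for +1 floor:", delta)
print("Back-scaled coefficient:       ", per_unit[1])
```

The two printed numbers should agree up to floating-point error, since the pipeline's prediction is linear in the original features.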

Summary

This end‑to‑end Elastic Net workflow produces a fast, interpretable cost model that:

  • Predicts unit construction cost (USD per m²) from early‑design inputs, with hold‑out RMSE and R² reported in step 6.
  • Handles multicollinearity without sacrificing sparsity.
  • Ranks scope and design choices by dollar impact—empowering smarter value‑engineering before bids arrive.

Swap in fresh tender data, rebuild X_train and y_train, and rerun gs.fit(X_train, y_train) to keep the model current as markets shift.


ProjectGurukul Team
