Property Value Growth Prediction using Polynomial Regression in ML


Real-estate analysts and investors want to forecast property-value growth (%) for residential homes from current market indicators (recent sale price, square footage, lot size, number of bedrooms and bathrooms, year built, and neighbourhood socio-economic scores) before making purchase or development decisions. The relationship between these predictors and price appreciation is nonlinear: additional square footage shows diminishing returns, for example, and lot size interacts with neighbourhood factors. A simple linear model underfits these curves, while an unconstrained high-degree polynomial overfits noise. Polynomial Regression with Ridge regularisation sits between the two: it captures smooth, nonlinear dependencies while keeping the coefficients small enough to generalise and to inspect.

Libraries Required

import pandas as pd                        # data manipulation  
import numpy as np                         # numerical operations  

import matplotlib.pyplot as plt            # plotting  
import seaborn as sns                      # enhanced visualization  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import StandardScaler, PolynomialFeatures  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score

Dataset

House Prices – Advanced Regression Techniques (Kaggle competition dataset; this walkthrough reads its train.csv file)

Step-by-Step Code Implementation

Load Libraries & Data

import pandas as pd

# Load training data
df = pd.read_csv("data/train.csv")

# Preview key columns
df.head()[['SalePrice','GrLivArea','LotArea','YearBuilt','OverallQual','Neighborhood']]

Feature Engineering & Target Definition

Target engineering: we define PriceGrowthPct as the percentage by which SalePrice exceeds a ReplacementCost proxy (OverallQual × 10,000). This is a rough stand-in for appreciation relative to a perceived baseline, not an observed price change over time.
Polynomial features: at degree 2, PolynomialFeatures expands our five inputs into their squares and pairwise interactions (e.g., GrLivArea² and GrLivArea×OverallQual), capturing nonlinear effects such as diminishing returns and quality synergy. Degree 3 adds cubic and three-way terms.

# Compute property-value growth as percentage above replacement cost proxy:
# here we define growth = (SalePrice - OverallQual*10000) / (OverallQual*10000)
df['ReplacementCost'] = df['OverallQual'] * 10000
df['PriceGrowthPct'] = (df['SalePrice'] - df['ReplacementCost']) / df['ReplacementCost'] * 100

# Select predictors known at buy-in
features = ['GrLivArea','LotArea','YearBuilt','OverallQual','OverallCond']
X = df[features]
y = df['PriceGrowthPct']
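As a quick sanity check on the target definition, here is the same formula applied to one hypothetical house. The helper name price_growth_pct and the input values are assumptions made for illustration; only the formula itself comes from the code above.

```python
def price_growth_pct(sale_price: float, overall_qual: int,
                     cost_per_quality_point: float = 10_000) -> float:
    """Percentage by which the sale price exceeds the replacement-cost proxy."""
    replacement_cost = overall_qual * cost_per_quality_point
    return (sale_price - replacement_cost) / replacement_cost * 100


# Hypothetical house: sold for 210,000 with OverallQual = 7
# ReplacementCost = 7 * 10,000 = 70,000
# Growth = (210,000 - 70,000) / 70,000 * 100
example = price_growth_pct(sale_price=210_000, overall_qual=7)
print(f"{example:.1f}%")  # 200.0%
```

Note that OverallQual appears both in the target's denominator and in the predictor list, so part of what the model learns is the proxy formula itself.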

Exploratory Data Analysis

import seaborn as sns
import matplotlib.pyplot as plt

# Nonlinear trend: living area vs growth
sns.scatterplot(x='GrLivArea', y='PriceGrowthPct', data=df, alpha=0.5)
plt.title("Living Area vs Price Growth (%)")
plt.xlabel("Above‑ground Living Area (sq ft)")
plt.ylabel("Price Growth (%)")
plt.show()

Build Polynomial Regression Pipeline

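StandardScaler standardises the five raw inputs before expansion, which keeps the polynomial terms on comparable magnitudes so that Ridge's ℓ² penalty is not dominated by large-valued columns such as LotArea. The expanded terms themselves are not re-standardised in this pipeline.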
Ridge regression applies ℓ² regularisation to shrink noisy high‑order coefficients, controlling overfitting in the expanded feature space.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

pipe = Pipeline([
    ('scale', StandardScaler()),  
    ('poly', PolynomialFeatures(include_bias=False)),  
    ('ridge', Ridge(random_state=42))  
])
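The shrinkage effect of the ℓ² penalty can be seen on a small synthetic problem. The sketch below is not part of the housing workflow: the data, variable names, and alpha values are assumptions chosen for illustration. It fits the same three-step pipeline at several regularisation strengths and prints the norm of the coefficient vector.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data: a quadratic signal in the first feature plus noise
rng = np.random.default_rng(0)
X_demo = rng.uniform(-2, 2, size=(200, 2))
y_demo = 3 * X_demo[:, 0] - 1.5 * X_demo[:, 0] ** 2 + rng.normal(scale=0.5, size=200)

norms = {}
for alpha in [0.001, 1.0, 1000.0]:
    model = Pipeline([
        ('scale', StandardScaler()),
        ('poly', PolynomialFeatures(degree=3, include_bias=False)),
        ('ridge', Ridge(alpha=alpha)),
    ])
    model.fit(X_demo, y_demo)
    # Euclidean norm of all polynomial coefficients
    norms[alpha] = float(np.linalg.norm(model.named_steps['ridge'].coef_))
    print(f"alpha={alpha:>8}: ||coef|| = {norms[alpha]:.3f}")
```

Larger alpha values pull the coefficient vector towards zero, which is the mechanism that keeps high-order terms from chasing noise.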

Train/Test Split & Hyperparameter Search

GridSearchCV tunes polynomial degree (1–3) and regularisation strength α (10⁻³ to 10³) via 5‑fold CV, selecting the model that minimises RMSE on held‑out folds.

from sklearn.model_selection import train_test_split, GridSearchCV
import numpy as np

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7)
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)

print("Best parameters:", gs.best_params_)
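Beyond best_params_, it can help to look at the whole grid. The following sketch runs a reduced search on synthetic data (the toy_ names, the data, and the smaller grid are assumptions for illustration) and turns cv_results_ into a table of cross-validated RMSE per degree and alpha.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic quadratic relationship with noise
rng = np.random.default_rng(42)
X_toy = rng.uniform(-3, 3, size=(150, 1))
y_toy = 0.5 * X_toy[:, 0] ** 2 + X_toy[:, 0] + rng.normal(scale=0.3, size=150)

toy_pipe = Pipeline([
    ('scale', StandardScaler()),
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge()),
])
toy_grid = {'poly__degree': [1, 2, 3], 'ridge__alpha': [0.01, 1.0]}

toy_search = GridSearchCV(toy_pipe, toy_grid, cv=5,
                          scoring='neg_root_mean_squared_error')
toy_search.fit(X_toy, y_toy)

# Scores are negated RMSE, so flip the sign to read them as errors
results = pd.DataFrame(toy_search.cv_results_)
results['cv_rmse'] = -results['mean_test_score']
table = results[['param_poly__degree', 'param_ridge__alpha', 'cv_rmse']]
table = table.sort_values('cv_rmse')
print(table.to_string(index=False))
```

Reading the table row by row shows how much the error changes between neighbouring settings, which a single best_params_ printout hides.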

Evaluate Model

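import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_pred = gs.predict(X_test)
# Take the square root explicitly: the squared=False argument is deprecated
# and no longer accepted by newer scikit-learn releases
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE: {rmse:.2f} % growth")
print(f"Test R²  : {r2:.3f}")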
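For readers who want to see what the two metrics measure, here is a hand computation on four made-up values (the arrays are illustrative, not model output). RMSE is the square root of the mean squared error, and R² is one minus the ratio of residual to total sum of squares.

```python
import numpy as np

# Invented actual and predicted growth percentages
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_hat = np.array([12.0, 18.0, 33.0, 37.0])

residuals = y_true - y_hat                        # [-2, 2, -3, 3]
ss_res = np.sum(residuals ** 2)                   # 4 + 4 + 9 + 9 = 26
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # 225 + 25 + 25 + 225 = 500

rmse_manual = np.sqrt(ss_res / len(y_true))       # sqrt(6.5)
r2_manual = 1 - ss_res / ss_tot                   # 1 - 26/500

print(f"RMSE: {rmse_manual:.3f}")  # RMSE: 2.550
print(f"R²  : {r2_manual:.3f}")    # R²  : 0.948
```

RMSE is in the same units as the target (percentage points of growth here), while R² is unitless and compares the model against always predicting the mean.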

Inspect Key Polynomial Coefficients

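Coefficient inspection ranks the polynomial terms by coefficient magnitude, indicating which features (e.g., large living area in high-quality homes) the fitted model weights most heavily. Because the expanded terms are not on identical scales and Ridge spreads weight across correlated terms, treat the ranking as a rough guide to the model rather than a causal statement about the market.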

# Retrieve polynomial feature names
poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=features)

# Retrieve Ridge coefficients
coefs = gs.best_estimator_.named_steps['ridge'].coef_

import pandas as pd
coef_series = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False)

# Plot top 10
plt.figure(figsize=(8,5))
coef_series.head(10).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Price Growth")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()
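The plot above uses absolute values, so it hides whether a term pushes predicted growth up or down. One possible variation, sketched here with invented coefficient values (they are placeholders, not results from the fitted model), ranks by magnitude while keeping the sign.

```python
import pandas as pd

# Placeholder coefficients for illustration only
signed_coefs = pd.Series({
    'GrLivArea': 12.4,
    'OverallQual': -30.1,
    'GrLivArea OverallQual': 8.7,
    'YearBuilt^2': -2.2,
})

# Order by absolute size, then look the signed values back up in that order
order = signed_coefs.abs().sort_values(ascending=False).index
ranked = signed_coefs.reindex(order)
print(ranked)
```

In the real workflow, signed_coefs would be replaced by pd.Series(coefs, index=feat_names) from the code above.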

Summary

By integrating polynomial feature engineering with Ridge regularisation in a concise pipeline, we achieve:

1. Nonlinear forecasts of property-value growth, with accuracy measured by RMSE and R² on a held-out test set.

2. Controlled model complexity, avoiding overfitting while capturing critical curvature and interaction effects.

3. Interpretable insights: the top polynomial features highlight which combinations of size, quality, and age most influence appreciation, supporting data‑driven investment strategies.
