Property Value Prediction using Quantile Regression in ML


Traditional home-valuation models estimate the mean sale price. Lenders, appraisers, and investors, however, need to anticipate the range of plausible prices, covering both conservative (lower-quantile) and optimistic (upper-quantile) outcomes.

In this property value prediction ML project, we'll predict the 25th, 50th, and 75th percentiles of residential sale prices in King County, WA, from features such as square footage, bedrooms, bathrooms, year built, and location. By fitting a separate linear quantile regression model at each percentile, we'll see how each predictor's influence shifts between the lower tail, the median, and the upper tail of the price distribution, giving stakeholders distribution-aware valuations for risk management and opportunity identification.
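Why does fitting "at a percentile" work? Quantile regression minimizes the pinball (check) loss instead of squared error. The minimal sketch below uses simulated, hypothetical prices (not the King County data) and a hypothetical helper `pinball`: among constant forecasts, the one with the lowest average pinball loss at level q is the sample q-th quantile. Quantile regression applies the same loss but lets the fitted value depend on the features.

```python
import numpy as np

def pinball(y, pred, q):
    """Average pinball (check) loss of `pred` at quantile level q, over the last axis."""
    diff = y - pred
    # Under-prediction (diff > 0) costs q per dollar; over-prediction costs (1 - q).
    return np.mean(np.maximum(q * diff, (q - 1) * diff), axis=-1)

rng = np.random.default_rng(0)
# Hypothetical right-skewed "sale prices"; not the King County data.
prices = rng.lognormal(mean=13.0, sigma=0.5, size=1001)
candidates = np.sort(prices)  # try every observed price as a constant forecast

best = {}
for q in (0.25, 0.50, 0.75):
    # Rows = candidate constants, columns = observations.
    losses = pinball(prices[None, :], candidates[:, None], q)
    best[q] = candidates[np.argmin(losses)]
    print(f"q={q:.2f}: loss-minimizing constant={best[q]:,.0f}, "
          f"np.quantile={np.quantile(prices, q):,.0f}")
```

The asymmetric weights are the whole trick: at q = 0.25, over-predicting costs three times as much as under-predicting, so the optimum settles where about a quarter of the prices lie below it.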

Libraries Required

import pandas as pd  
import numpy as np  
import statsmodels.formula.api as smf     # Quantile regression via formula API  
from sklearn.model_selection import train_test_split  
from sklearn.metrics import mean_pinball_loss  # Proper loss for quantile forecasts  

Dataset

House Sales in King County, USA

Step-by-Step Code Implementation

Load & Inspect Data

We load the 21,613 King County home sales, which include sale prices and property attributes, then inspect the schema and the price distribution (.describe()) to understand its range and dispersion.

# Load the King County house sales dataset
# Source: Kaggle
df = pd.read_csv("kc_house_data.csv")

# Inspect top rows and summary
print(df.head())
df.info()                  # info() prints its report directly and returns None
print(df['price'].describe())
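The 25%, 50%, and 75% rows of `.describe()` are the unconditional quartiles of price, the baseline that the quantile models later condition on features. A minimal sketch on a hypothetical stand-in Series (with the real data you would use `df['price']`) showing that these rows match `Series.quantile`, plus the interquartile range as a spread measure:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Hypothetical stand-in for df['price']; swap in the real column after loading the CSV.
price = pd.Series(rng.lognormal(mean=13.0, sigma=0.5, size=5000), name="price")

summary = price.describe()
quartiles = price.quantile([0.25, 0.50, 0.75])

# describe() reports the same linear-interpolation quartiles as quantile().
print(summary[["25%", "50%", "75%"]])
print(quartiles)

iqr = quartiles[0.75] - quartiles[0.25]
# A mean well above the median signals right skew, which is common for sale prices.
print(f"IQR: {iqr:,.0f}  mean - median: {summary['mean'] - summary['50%']:,.0f}")
```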

Preprocessing & Feature Engineering

  • We rename “price” to “Price” for clarity.
  • We select eleven predictors (living area, room counts, floors, waterfront and view indicators, condition and grade ratings, year built, and latitude/longitude) and drop any rows with missing values in these columns.
# Rename target for ease
df.rename(columns={'price':'Price'}, inplace=True)

# Select key predictors
# sqft_living, bedrooms, bathrooms, floors, waterfront, view, condition, grade, yr_built, lat, long
features = [
    'sqft_living','bedrooms','bathrooms','floors',
    'waterfront','view','condition','grade',
    'yr_built','lat','long'
]

# Drop any missing values (none expected)
df = df.dropna(subset=features + ['Price'])
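The comment above expects no missing values; rather than rely on that, you can count them per column before dropping. A minimal sketch with a hypothetical `missing_report` helper and a small toy frame (made-up values, not rows from the dataset):

```python
import numpy as np
import pandas as pd

def missing_report(frame, cols):
    """Return per-column NaN counts for `cols`, keeping only columns that have gaps."""
    counts = frame[cols].isna().sum()
    return counts[counts > 0]

# Toy frame standing in for the King County data (hypothetical values).
toy = pd.DataFrame({
    "sqft_living": [1200.0, 2500.0, np.nan],
    "bedrooms": [3, 4, 2],
    "Price": [300000.0, 520000.0, 450000.0],
})

cols = ["sqft_living", "bedrooms", "Price"]
report = missing_report(toy, cols)
print(report)  # only sqft_living appears, with one missing value
clean = toy.dropna(subset=cols)
print(f"Dropped {len(toy) - len(clean)} of {len(toy)} rows")
```

On the real data, `missing_report(df, features + ['Price'])` returning an empty Series confirms the expectation before `dropna` runs.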

Train/Test Split

We randomly hold out 20% of records for evaluation, creating train and test sets to assess out‑of‑sample quantile forecasts.

# Reserve 20% of data for out‑of‑sample evaluation
train, test = train_test_split(df[features + ['Price']],
                               test_size=0.2,
                               random_state=42)
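With a fractional `test_size`, scikit-learn rounds the test count up and gives the remainder to the training set. A quick check of the resulting sizes and of reproducibility, using a hypothetical frame with the same 21,613 rows as the dataset:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame with as many rows as the King County dataset.
frame = pd.DataFrame({"Price": np.arange(21613, dtype=float)})
train, test = train_test_split(frame, test_size=0.2, random_state=42)

print(len(train), len(test))  # test = ceil(0.2 * 21613) = 4323 rows; train gets the rest

# The split is shuffled but reproducible: the same random_state selects the same rows.
train2, test2 = train_test_split(frame, test_size=0.2, random_state=42)
print(test.index.equals(test2.index))
```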

Fit Quantile Regression Models

For each target quantile (25th, 50th, 75th):

  • We specify a formula linking Price to all predictors.
  • We fit a QuantReg model at that percentile on the training set.
  • We print only the coefficient table. Comparing the tables across quantiles shows how each feature's marginal effect differs between the lower, median, and upper parts of the price distribution (e.g., an extra bathroom may add more value in premium homes than in entry-level ones).
quantiles = [0.25, 0.50, 0.75]
results   = {}
formula   = "Price ~ " + " + ".join(features)

for q in quantiles:
    model = smf.quantreg(formula, train)
    res   = model.fit(q=q)
    results[q] = res
    print(f"\n--- {int(q*100)}th Percentile Coefficients ---")
    print(res.summary().tables[1])   # coefficient table only
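Reading three separate tables makes the cross-quantile comparison tedious. A minimal sketch that collects `res.params` from each fit into one DataFrame, shown on simulated data with hypothetical `x`/`y` columns where the noise grows with `x`, so the slope should rise from the 25th to the 75th percentile:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(1, 10, size=n)
# Heteroscedastic noise: the spread grows with x, so upper quantiles get steeper slopes.
y = 2 + 3 * x + x * rng.normal(size=n)
data = pd.DataFrame({"x": x, "y": y})

model = smf.quantreg("y ~ x", data)  # build the model once, fit it at several quantiles
fits = {q: model.fit(q=q) for q in (0.25, 0.50, 0.75)}

# Rows are terms (Intercept, x); columns are quantile levels.
coef_table = pd.DataFrame({q: res.params for q, res in fits.items()})
print(coef_table.round(2))
# For this simulation, the true slope at quantile q is 3 + z_q (about 2.33, 3.00, 3.67).
```

With the tutorial's fitted models, `pd.DataFrame({q: res.params for q, res in results.items()})` would give the same kind of feature-by-quantile table for the eleven predictors.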

Evaluation with Pinball Loss

  • We predict quantile‑specific prices on the held‑out test set.
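  • We compute the pinball loss for each quantile forecast. This loss is tailored to quantile estimates: it weights under-predictions by q and over-predictions by (1 - q), then averages the penalties. Lower pinball loss means a better quantile forecast; because the loss is in dollars and depends on q, compare it between models at the same quantile rather than across quantiles.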
for q, res in results.items():
    preds = res.predict(test[features])
    loss  = mean_pinball_loss(test['Price'], preds, alpha=q)
    print(f"{int(q*100)}th quantile pinball loss: {loss:.2f}")

Summary

By modelling the 25th, 50th, and 75th percentiles of property prices rather than only the mean, we gain distribution‑aware valuations:

  • The 25th‑percentile model highlights features driving lower‑end home values, informing conservative appraisals and entry‑level market assessments.
  • The median (50th‑percentile) model captures typical feature impacts for the bulk of the market.
  • The 75th‑percentile model focuses on premium segments, showing which upgrades most boost high‑end values.

These tailored quantile estimates empower real‑estate professionals to price properties under varying market conditions—while managing risk for conservative lending and identifying high‑value opportunities in the upper tail.


ProjectGurukul Team

The ProjectGurukul Team delivers project-based tutorials on programming, machine learning, and web development. We simplify learning by providing hands-on projects to help you master real-world skills. We also provide free major and minor projects for engineering students.
