Property Maintenance Fee Prediction using Quantile Regression in ML
Homeowners and real-estate managers face widely varying monthly maintenance fees, from modest condo dues at the low end (around the 10th percentile) to high-amenity building charges at the top (around the 90th percentile). Budgeting on the average fee hides these extremes and can lead to under- or over-provisioning of funds.
In the property maintenance fee prediction project, we’ll predict the 10th, 50th, and 90th percentiles of monthly maintenance fees (in local currency) for residential properties based on attributes such as living area, number of bedrooms, age of property, proximity to transit, crime rate, and neighbourhood socioeconomic indicators.
By fitting separate quantile regression models, we’ll equip property managers and homeowners’ associations with distribution-aware forecasts that support planning for lean-fee scenarios, typical dues, and high-cost extremes.
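Here is a minimal sketch that makes the point about averages concrete. It uses ten hypothetical monthly fees, invented for illustration and not drawn from the dataset, one of which comes from a high-amenity building. It compares the mean with the 10th, 50th, and 90th percentiles, computed with NumPy's default linear interpolation.

```python
import numpy as np

# Hypothetical monthly fees for ten buildings (illustrative values only)
fees = np.array([150, 180, 200, 210, 220, 240, 260, 300, 650, 1200], dtype=float)

mean_fee = fees.mean()
# np.percentile interpolates linearly between order statistics by default
p10, p50, p90 = np.percentile(fees, [10, 50, 90])

# Fraction of buildings paying strictly less than the mean
share_below_mean = (fees < mean_fee).mean()

print(f"mean={mean_fee:.0f}  p10={p10:.0f}  p50={p50:.0f}  p90={p90:.0f}")
print(f"{share_below_mean:.0%} of buildings pay less than the mean")
```

With these numbers the mean (361) sits well above the median (230), and 8 of the 10 buildings pay less than it. A mean-based budget would therefore over-provision the typical building while still falling far short of the most expensive one.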
Libraries Required
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf  # Quantile regression via formula API
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_pinball_loss  # Proper loss for quantile forecasts
Dataset
Dataset for House Price Analysis
Step-by-Step Code Implementation
Load & Inspect Data
We load a multi‑feature housing dataset that includes Maintenance_Fees (monthly condo/HOA dues) alongside price, area, bedrooms, age, transit proximity, crime rate, and a socioeconomic index.
An initial .info() and .describe() confirm the column types and non-null counts and summarize the fee distribution (count, mean, min, quartiles including the median, and max).
# Load the house-price analysis dataset, which includes maintenance fees
df = pd.read_csv("dataset-for-house-price-analysis.csv")
print(df.head())
df.info()  # info() prints its summary directly and returns None
print(df['Maintenance_Fees'].describe())
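Beyond the default quartiles, pandas' describe() accepts a percentiles argument, so the three target percentiles can be inspected directly as an unconditional baseline. Counting missing values first also shows how many rows the dropna in the next step will discard. Below is a sketch on a small hypothetical frame: the column names mirror the dataset, but the values are invented.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame standing in for the real CSV
df_demo = pd.DataFrame({
    "Maintenance_Fees": [150.0, 210.0, np.nan, 260.0, 300.0, 650.0],
    "Area": [55.0, 70.0, 80.0, np.nan, 95.0, 180.0],
})

# Count missing values per column before deciding what to drop
missing = df_demo.isna().sum()
print(missing)

# Unconditional 10th/50th/90th percentiles of the fee (NaNs are skipped)
fee_summary = df_demo["Maintenance_Fees"].describe(percentiles=[0.1, 0.5, 0.9])
print(fee_summary)
```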
Preprocessing & Feature Engineering
- We drop any records missing the target or key predictors to ensure clean modelling.
- We rename Maintenance_Fees to Fee and Area to LivingArea for succinct, readable formulas.
- We select seven predictors that plausibly drive maintenance costs: property price (a proxy for building quality), living area, bedroom count, property age (older buildings often have higher fees), distance to transit in km (properties close to transit tend to sit in denser areas, where buildings may offer more HOA services), local crime rate (security costs), and a socioeconomic index capturing neighbourhood wealth.
# Keep rows with non‐missing target and core predictors
df = df.dropna(subset=[
'Maintenance_Fees','Price','Area','Bedrooms',
'Age','Proximity_to_Transit_km','CrimeRate','SocioeconomicIndex'
])
# Rename for convenience
df = df.rename(columns={
'Area':'LivingArea',
'Maintenance_Fees':'Fee'
})
# Define predictors and response
features = [
'Price','LivingArea','Bedrooms','Age',
'Proximity_to_Transit_km','CrimeRate','SocioeconomicIndex'
]
data = df[features + ['Fee']]
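The preprocessing steps above can be wrapped in a single function and checked on a toy frame. This is a minimal sketch: prepare_fee_data is a hypothetical helper name, and the two-row frame is invented. It shows that an incomplete row is dropped, both columns are renamed, and unused columns are discarded.

```python
import numpy as np
import pandas as pd

FEATURES = [
    "Price", "LivingArea", "Bedrooms", "Age",
    "Proximity_to_Transit_km", "CrimeRate", "SocioeconomicIndex",
]

def prepare_fee_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop incomplete rows, rename, keep predictors + target."""
    required = [
        "Maintenance_Fees", "Price", "Area", "Bedrooms",
        "Age", "Proximity_to_Transit_km", "CrimeRate", "SocioeconomicIndex",
    ]
    clean = raw.dropna(subset=required)
    clean = clean.rename(columns={"Area": "LivingArea", "Maintenance_Fees": "Fee"})
    return clean[FEATURES + ["Fee"]]

# Hypothetical two-row frame; the second row is missing its crime rate
raw_demo = pd.DataFrame({
    "Maintenance_Fees": [220.0, 300.0],
    "Price": [250_000, 410_000],
    "Area": [70.0, 95.0],
    "Bedrooms": [2, 3],
    "Age": [12, 30],
    "Proximity_to_Transit_km": [0.4, 2.1],
    "CrimeRate": [3.2, np.nan],
    "SocioeconomicIndex": [0.61, 0.48],
    "Notes": ["", "corner unit"],  # extra column that should be discarded
})

prepared = prepare_fee_data(raw_demo)
print(prepared)
```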
Train/Test Split
We randomly hold out 20% of the data to evaluate model generalization on unseen properties.
train, test = train_test_split(data, test_size=0.2, random_state=42)
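Fixing random_state makes the 80/20 split reproducible, so the pinball losses reported later refer to the same held-out properties on every run. The quick check below runs on a hypothetical 100-row frame where only the row count matters.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 100-row frame; only the row count matters here
toy = pd.DataFrame({"Fee": range(100)})

train_a, test_a = train_test_split(toy, test_size=0.2, random_state=42)
train_b, test_b = train_test_split(toy, test_size=0.2, random_state=42)

# 20% of 100 rows go to the test set
print(len(train_a), len(test_a))
# The same seed reproduces the same row assignment
print(test_a.index.equals(test_b.index))
```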
Fit Quantile Regression Models
For each target quantile (10th, 50th, and 90th percentiles), we:
- Build the formula string “Fee ~ Price + LivingArea + Bedrooms + Age + Proximity_to_Transit_km + CrimeRate + SocioeconomicIndex”.
- Fit a QuantReg model on the training set at that quantile via the statsmodels formula API (smf.quantreg(...).fit(q=q)).
- Print the coefficient table (.tables[1] of the summary), which shows how each predictor’s marginal effect on monthly fees shifts across low-, median-, and high-fee scenarios (e.g., a larger living area may add modest fees at the 10th percentile but much larger dues at the 90th).
quantiles = [0.10, 0.50, 0.90]
models = {}
formula = "Fee ~ " + " + ".join(features)
for q in quantiles:
    mod = smf.quantreg(formula, train)
    res = mod.fit(q=q)
    models[q] = res
    print(f"\n--- {int(q*100)}th Percentile Coefficients ---")
    print(res.summary().tables[1])
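To see why coefficients can differ across quantiles, here is a sketch on synthetic data. The data are an assumption, not the article's dataset: fee noise grows with living area, so the upper quantiles of Fee rise faster with area than the lower ones. By construction the true q-quantile slope is 2.0 + 0.5·z_q, where z_q is the standard normal quantile, giving roughly 1.36, 2.00, and 2.64 at q = 0.1, 0.5, and 0.9.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic, heteroscedastic data (assumption for illustration only)
rng = np.random.default_rng(0)
n = 2000
area = rng.uniform(40, 200, size=n)              # living area in square metres
noise = rng.standard_normal(n) * 0.5 * area      # spread grows with area
fee = 50 + 2.0 * area + noise

synthetic = pd.DataFrame({"LivingArea": area, "Fee": fee})

# Fit one quantile regression per target quantile and keep the area slope
slopes = {}
for q in [0.10, 0.50, 0.90]:
    res = smf.quantreg("Fee ~ LivingArea", synthetic).fit(q=q)
    slopes[q] = res.params["LivingArea"]
    print(f"q={q:.2f}  LivingArea slope ~ {slopes[q]:.2f}")
```

An ordinary least-squares fit would return a single slope near 2.0 and hide the fact that large units carry a much wider range of dues.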
Evaluation with Pinball Loss
- We predict quantile‑specific fees on the test set.
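- We compute the pinball loss for each quantile. This loss function is tailored to quantile forecasts and penalizes under- and over-predictions asymmetrically according to the target percentile: at the 90th percentile, under-predicting by a given amount costs nine times as much as over-predicting by the same amount. Lower pinball loss indicates more accurate forecasts for that quantile. Losses for different quantiles are on different scales and are not directly comparable with one another.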
for q, res in models.items():
    preds = res.predict(test[features])
    loss = mean_pinball_loss(test['Fee'], preds, alpha=q)
    print(f"{int(q*100)}th percentile pinball loss: {loss:.2f}")
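For intuition, the pinball loss for quantile α is α·(y − ŷ) when the forecast is too low and (1 − α)·(ŷ − y) when it is too high, averaged over observations. The sketch below is a from-scratch NumPy version; pinball_loss is a hypothetical name, intended to follow the same definition as scikit-learn's mean_pinball_loss. It also checks that, among constant forecasts, the loss is minimised at the empirical α-quantile, which is why minimising it targets a percentile rather than the mean. The fee values are invented.

```python
import numpy as np

def pinball_loss(y_true, y_pred, alpha):
    """Mean pinball (quantile) loss for target quantile `alpha` in (0, 1)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-prediction (diff > 0) is weighted by alpha, over-prediction by (1 - alpha)
    return float(np.mean(np.maximum(alpha * diff, (alpha - 1) * diff)))

# At alpha = 0.9, missing low by 20 costs nine times more than missing high by 20
under = pinball_loss([300.0], [280.0], alpha=0.9)  # 0.9 * 20
over = pinball_loss([300.0], [320.0], alpha=0.9)   # 0.1 * 20
print(under, over)

# Hypothetical fees (11 values); search constant forecasts on a 1-unit grid
fees = np.array([150, 180, 190, 200, 210, 220, 240, 260, 300, 650, 1200], dtype=float)
candidates = np.linspace(100, 1300, 1201)
losses = [pinball_loss(fees, np.full_like(fees, c), alpha=0.9) for c in candidates]
best_constant = candidates[int(np.argmin(losses))]
print(best_constant, np.percentile(fees, 90))
```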
Summary
Quantile regression enables distribution‑aware maintenance fee forecasting:
- 10th-percentile predictions mark the lower end of expected dues, which is useful for lean-budget scenarios (e.g., minimal-service buildings).
- Median (50th‑percentile) forecasts inform typical HOA dues and guide standard reserve fund planning.
- 90th-percentile estimates flag the upper range typical of high-amenity, premium buildings, helping ensure sufficient capital for generous service levels.
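Because the three quantile models are fitted independently, nothing guarantees p10 ≤ p50 ≤ p90 for every property, and crossed predictions would make a confusing budget band. One simple post-processing option is to sort each property's predictions, assuming they are stacked into an array with one column per quantile in increasing order. Empirical coverage, the share of actual fees at or below each predicted quantile, is a quick sanity check; on a real test set it should land near 0.1, 0.5, and 0.9. The helper names and the four-property numbers below are hypothetical, and four rows are far too few for a meaningful coverage estimate.

```python
import numpy as np

def fix_crossing(pred_matrix):
    """Hypothetical helper: sort each row so that p10 <= p50 <= p90."""
    return np.sort(np.asarray(pred_matrix, dtype=float), axis=1)

def empirical_coverage(y_true, pred_matrix):
    """Hypothetical helper: share of observed fees at or below each quantile column."""
    y = np.asarray(y_true, dtype=float)[:, None]
    return (y <= np.asarray(pred_matrix, dtype=float)).mean(axis=0)

# Hypothetical predictions (columns: p10, p50, p90); row 2 has crossed quantiles
preds = np.array([
    [180.0, 230.0, 400.0],
    [260.0, 240.0, 520.0],  # p10 above p50: a crossing
    [150.0, 190.0, 310.0],
    [300.0, 420.0, 900.0],
])
actual = np.array([210.0, 250.0, 320.0, 880.0])

bands = fix_crossing(preds)
print(bands[1])  # [240. 260. 520.]
print(empirical_coverage(actual, bands))
```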
By modelling multiple quantiles, property managers and board treasurers gain robust insights into fee variability—optimizing reserve fund allocations, anticipating cash flow needs, and mitigating financial risk across the full spectrum of maintenance cost scenarios.