Property Maintenance Fee Prediction using Quantile Regression in ML


Homeowners and real‑estate managers face widely varying monthly maintenance fees, from modest condo dues (around the 10th percentile) to high‑amenity building charges (around the 90th percentile). Budgeting on the average fee hides these extremes and can lead to under‑ or over‑provisioning of funds.

In this property maintenance fee prediction project, we’ll predict the 10th, 50th, and 90th percentiles of monthly maintenance fees (in local currency) for residential properties from attributes such as property price, living area, number of bedrooms, property age, proximity to transit, crime rate, and a neighbourhood socioeconomic index.

By fitting a separate quantile regression model for each percentile, we’ll give property managers and homeowners’ associations distribution‑aware forecasts for planning around lean‑fee scenarios, typical dues, and high‑cost extremes.
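For reference, the standard linear quantile regression estimator (the model statsmodels’ QuantReg fits) chooses a coefficient vector for each quantile τ by minimizing the check, or pinball, loss over the training rows. Here x_i holds property i’s predictors plus an intercept and y_i is its monthly fee:

```latex
\rho_\tau(u) = u\,\bigl(\tau - \mathbb{1}\{u < 0\}\bigr),
\qquad
\hat{\beta}_\tau = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\!\left(y_i - x_i^\top \beta\right),
\qquad
\hat{Q}_\tau(\mathrm{Fee} \mid x) = x^\top \hat{\beta}_\tau,
\quad \tau \in \{0.10,\ 0.50,\ 0.90\}
```

At τ = 0.5 the loss is half the absolute error, so the median model is least‑absolute‑deviation regression; at τ = 0.9, under‑predictions are weighted nine times more heavily than over‑predictions.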

Libraries Required

import pandas as pd  
import numpy as np  
import statsmodels.formula.api as smf    # Quantile regression via formula API  
from sklearn.model_selection import train_test_split  
from sklearn.metrics import mean_pinball_loss  # Proper loss for quantile forecasts

Dataset

Dataset for House Price Analysis

Step-by-Step Code Implementation

Load & Inspect Data

We load a multi‑feature housing dataset that includes Maintenance_Fees (monthly condo/HOA dues) alongside price, area, bedrooms, age, transit proximity, crime rate, and a socioeconomic index.

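An initial .info() call confirms column data types and non‑null counts, and .describe() on the fee column summarizes its distribution (count, mean, standard deviation, min, quartiles including the median, and max).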

# Load the house-price analysis dataset, which includes maintenance fees
df = pd.read_csv("dataset-for-house-price-analysis.csv")

print(df.head())
df.info()  # info() prints its summary itself and returns None, so print() is not needed
print(df['Maintenance_Fees'].describe())
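To see why the median and upper percentiles matter, here is a minimal sketch on a hypothetical toy fee series (not the tutorial’s dataset) that compares .describe() with Series.quantile() at the three target percentiles:

```python
import pandas as pd

# Hypothetical toy fees (illustration only): four ordinary buildings and one high-amenity outlier
fees = pd.Series([100, 120, 150, 200, 400], name="Maintenance_Fees")

summary = fees.describe()                  # count, mean, std, min, 25%, 50%, 75%, max
pcts = fees.quantile([0.10, 0.50, 0.90])   # linear interpolation between sorted values by default

print(summary[["mean", "50%"]])
print(pcts)
```

The single high‑amenity fee pulls the mean (194) well above the median (150), which is the masking effect the introduction describes; the 10th and 90th percentiles (108 and 320 here) show the spread directly.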

Preprocessing & Feature Engineering

We drop any records missing the target or key predictors to ensure clean modelling.
We rename Maintenance_Fees to Fee for succinct formulas.
We select seven predictors that plausibly drive maintenance costs: property price (a proxy for building quality), living area, bedroom count, property age (older buildings often have higher fees), proximity to transit (denser, transit‑served areas may come with more HOA services), local crime rate (security costs), and a socioeconomic index capturing neighbourhood wealth.

# Keep rows with non‐missing target and core predictors
df = df.dropna(subset=[
    'Maintenance_Fees','Price','Area','Bedrooms',
    'Age','Proximity_to_Transit_km','CrimeRate','SocioeconomicIndex'
])

# Rename for convenience
df = df.rename(columns={
    'Area':'LivingArea',
    'Maintenance_Fees':'Fee'
})

# Define predictors and response
features = [
    'Price','LivingArea','Bedrooms','Age',
    'Proximity_to_Transit_km','CrimeRate','SocioeconomicIndex'
]
data = df[features + ['Fee']]
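A quick sanity check on the cleaning step can be worthwhile. The sketch below uses a hypothetical helper, drop_incomplete, on a toy frame (not the real dataset) to show that dropna(subset=...) removes only rows missing a required column, while NaNs in other columns are left alone:

```python
import numpy as np
import pandas as pd

def drop_incomplete(frame: pd.DataFrame, required: list) -> pd.DataFrame:
    """Hypothetical helper: drop rows missing any required column and report the count."""
    cleaned = frame.dropna(subset=required)
    print(f"Dropped {len(frame) - len(cleaned)} of {len(frame)} rows")
    return cleaned

# Toy frame: row 1 lacks a fee, row 2 lacks an area
toy = pd.DataFrame({
    "Maintenance_Fees": [250.0, np.nan, 310.0, 180.0],
    "Area": [900.0, 1100.0, np.nan, 750.0],
    "Notes": [None, "corner unit", None, None],  # not in the subset, so its NaNs are kept
})

clean = drop_incomplete(toy, ["Maintenance_Fees", "Area"])   # prints "Dropped 2 of 4 rows"
clean = clean.rename(columns={"Area": "LivingArea", "Maintenance_Fees": "Fee"})
print(clean)
```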

Train/Test Split

We randomly hold out 20% of the data to evaluate model generalization on unseen properties.

train, test = train_test_split(data, test_size=0.2, random_state=42)
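The split sizes follow a fixed rule: for a fractional test_size, scikit‑learn rounds the test count up and assigns the remaining rows to training, and a fixed random_state reproduces the same partition. A minimal sketch on a hypothetical 101‑row frame:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 101-row frame, used only to show how split sizes are computed
demo = pd.DataFrame({"Fee": np.arange(101, dtype=float)})

train_demo, test_demo = train_test_split(demo, test_size=0.2, random_state=42)
# Test rows = ceil(0.2 * 101) = 21; training gets the other 80
print(len(train_demo), len(test_demo))

# Re-running with the same random_state selects exactly the same test rows
again_train, again_test = train_test_split(demo, test_size=0.2, random_state=42)
print(test_demo.index.equals(again_test.index))
```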

Fit Quantile Regression Models

For each target quantile (the 10th, 50th, and 90th percentiles), we:

Construct the formula string “Fee ~ Price + LivingArea + Bedrooms + Age + Proximity_to_Transit_km + CrimeRate + SocioeconomicIndex”.
Fit a QuantReg model at that percentile on the training set via statsmodels.
Print the coefficient table (.tables[1]), which shows how each predictor’s marginal effect on monthly fees shifts across low‑, median‑, and high‑fee scenarios (e.g., extra living area might add modest fees at the 10th percentile but much larger dues at the 90th).

quantiles = [0.10, 0.50, 0.90]
models    = {}
formula   = "Fee ~ " + " + ".join(features)

for q in quantiles:
    mod = smf.quantreg(formula, train)
    res = mod.fit(q=q)
    models[q] = res
    print(f"\n--- {int(q*100)}th Percentile Coefficients ---")
    print(res.summary().tables[1])
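To illustrate how a coefficient can shift across quantiles, the sketch below applies the same smf.quantreg call to simulated data (an assumption, not the tutorial’s dataset) in which the spread of fees grows with living area. In that setup the true LivingArea slope at quantile τ is 0.5 + 0.2·z_τ, where z_τ is the standard normal quantile, i.e. roughly 0.24, 0.50, and 0.76:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
area = rng.uniform(500, 2500, n)                  # simulated living area
noise = rng.standard_normal(n) * 0.2 * area       # fee spread grows with area (heteroscedastic)
sim = pd.DataFrame({"LivingArea": area, "Fee": 100 + 0.5 * area + noise})

slopes = {}
for q in [0.10, 0.50, 0.90]:
    res_q = smf.quantreg("Fee ~ LivingArea", sim).fit(q=q)
    slopes[q] = res_q.params["LivingArea"]        # marginal fee per extra unit of area at quantile q
    print(f"{round(q * 100)}th percentile slope: {slopes[q]:.3f}")
```

If the noise had a constant spread instead, the three slopes would be nearly equal and the quantile lines would differ mainly in their intercepts.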

Evaluation with Pinball Loss

We predict quantile‑specific fees on the test set.
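We compute the pinball loss for each quantile. This loss, tailored to quantile forecasts, penalizes under‑ and over‑predictions asymmetrically according to the target percentile: at the 90th percentile, under‑predicting costs 0.9 per unit of error while over‑predicting costs only 0.1. Lower pinball loss indicates more accurate quantile forecasts; because the weights differ by quantile, compare losses across models at the same quantile rather than across quantiles.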

for q, res in models.items():
    preds = res.predict(test[features])
    loss  = mean_pinball_loss(test['Fee'], preds, alpha=q)
    print(f"{int(q*100)}th percentile pinball loss: {loss:.2f}")

Summary

Quantile regression enables distribution‑aware maintenance fee forecasting:

10th‑percentile predictions describe lean‑fee scenarios (e.g., minimal‑service buildings), giving a lower bound for budgeting.
Median (50th‑percentile) forecasts inform typical HOA dues and guide standard reserve fund planning.
90th‑percentile estimates capture the high‑cost end (e.g., high‑amenity, premium buildings), helping ensure sufficient capital for generous service levels.

By modelling multiple quantiles, property managers and board treasurers see how widely fees can vary, not just their typical level, which helps them size reserve fund allocations, anticipate cash flow needs, and manage financial risk across the full range of maintenance cost scenarios.
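One way to check the 10th and 90th percentile forecasts together is interval coverage: roughly 80% of held‑out fees should fall between them. The sketch below assumes simulated data with a single predictor for brevity (not the tutorial’s dataset). Because each quantile is fitted separately, it also counts rows where the two predictions cross:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 3000
area = rng.uniform(500, 2500, n)  # simulated living area
sim = pd.DataFrame({"LivingArea": area,
                    "Fee": 100 + 0.5 * area + rng.standard_normal(n) * 0.2 * area})
train_sim, test_sim = train_test_split(sim, test_size=0.2, random_state=42)

# Fit the lean-fee and high-cost quantiles separately, as in the tutorial
low = smf.quantreg("Fee ~ LivingArea", train_sim).fit(q=0.10)
high = smf.quantreg("Fee ~ LivingArea", train_sim).fit(q=0.90)
lo_pred = np.asarray(low.predict(test_sim))
hi_pred = np.asarray(high.predict(test_sim))

y = test_sim["Fee"].to_numpy()
coverage = np.mean((y >= lo_pred) & (y <= hi_pred))  # ideally close to 0.80
crossings = np.mean(lo_pred > hi_pred)               # share of rows where the band is inverted
print(f"Share of test fees inside the 10th-to-90th band: {coverage:.2%}")
print(f"Share of rows where the quantile predictions cross: {crossings:.2%}")
```

Coverage well below 80% suggests the band is too narrow for budgeting; any crossings would mean the separately fitted models need reconciling before their forecasts are used together.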
