Property Tax Cost Prediction using Bayesian Regression in ML

FREE Online Courses: Knowledge Awaits – Click for Free Access!

Municipal budget officers and real estate analysts need to predict the local property tax rate—before assessing bills—using parcel characteristics such as crime rate, zoning, lot size, building age, room count, and school-to-pupil ratio. Property‐tax rates exhibit both systematic trends (e.g., higher in dense urban zones) and local variability (e.g., funding needs, voter‐approved levies). By applying Bayesian Regression, we not only generate a point estimate for each town’s tax rate (per $10,000 of assessed value) but also credible intervals that capture our uncertainty, enabling more robust revenue forecasting and policy planning.

Libraries Required

import pandas as pd                            # data I/O  
import numpy as np                             # numerical operations  

import matplotlib.pyplot as plt                # plotting  
import seaborn as sns                          # visualization enhancements  

import pymc3 as pm                             # Bayesian modeling  
import arviz as az                             # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error  

Dataset

Boston Housing dataset

Step-by-Step Code Implementation

Import Libraries & Load Data

import pandas as pd

# Load it; columns include CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, LSTAT, MEDV
df = pd.read_csv("data/boston-housing-dataset/housing.csv")
df.head()

Preprocessing & Train/Test Split

  • Feature selection: We exclude TAX (the target) and MEDV (sale price) and retain the 12 socioeconomic and housing‐stock predictors.
  • StandardScaler: Zero‐means and unit‐scales each predictor so the β priors act uniformly across dimensions.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Define features (all except TAX and MEDV) and target (TAX)
feature_cols = ['CRIM','ZN','INDUS','CHAS','NOX','RM','AGE','DIS','RAD','PTRATIO','B','LSTAT']
X = df[feature_cols].values
y = df['TAX'].values  # property‐tax rate per $10k

# Chronological split isn't applicable; random split suffices
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize predictors for stable MCMC
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s  = scaler.transform(X_test)

Define & Fit Bayesian Regression Model

Priors:

  • α is Normal(20, 10), reflecting median Boston tax rates of roughly 300–700 scaled by later intercept adjustment.
  • βᵢ are Normal(0, 5), allowing for moderate positive or negative effects of each standardised predictor.
  • σ is HalfNormal(5), encoding residual variability in tax rates.

Model: The linear predictor μ = α + Σ βᵢ·Xᵢ is linked to observed TAX via a Normal likelihood.

MCMC: We draw 2,000 samples (after 1,000 tuning steps) with target_accept=0.9 to ensure stable exploration of the posterior.

import pymc3 as pm

with pm.Model() as tax_model:
    # Priors: intercept and coefficients
    α = pm.Normal("α", mu=20, sigma=10)                # Typical tax rates ≈300–700; offset by scaling later
    β = pm.Normal("β", mu=0, sigma=5, shape=X_train_s.shape[1])
    σ = pm.HalfNormal("σ", sigma=5)

    # Linear predictor in standardized space
    μ = α + pm.math.dot(X_train_s, β)

    # Likelihood: observed tax rates
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)

    # MCMC sampling
    trace = pm.sample(
        draws=2000, tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )

Posterior Analysis & Point Predictions

Posterior predictive: Sampling Y_obs yields predictive distributions at new inputs, from which we compute both point estimates (posterior mean) and a 94% credible interval.

Evaluation: We use MAE to quantify the average deviation between point predictions and actual tax rates on held-out data.

import arviz as az

# Summarize posterior distributions
az.summary(trace, round_to=2)

# Posterior predictive sampling
with tax_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

# Compute posterior means of α and β
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values

# Point‐estimate predictions on test set
y_pred = α_post + X_test_s.dot(β_post)

# Evaluate mean absolute error
mae = mean_absolute_error(y_test, y_pred) 
print(f"Test MAE: {mae:.2f} (rate per $10k)")

Visualise Predictions & Credible Intervals

By varying a single feature (e.g., INDUS) while holding others constant, we illustrate how predicted tax rates change nonlinearly, with uncertainty bands.

# Example: vary INDUS (industrial land proportion), hold others at median
indus_grid = np.linspace(X_train_s[:,2].min(), X_train_s[:,2].max(), 100)
# Build grid of standardized inputs
grid = np.tile(np.median(X_train_s, axis=0), (100,1))
grid[:,2] = indus_grid

with tax_model:
    pm.set_data({"X": grid})
    ppc_grid = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

preds = ppc_grid["Y_obs"]
mean_pred = preds.mean(axis=0)
hpd = az.hdi(preds, hdi_prob=0.94)

# Convert INDUS back to original scale
indus_orig = scaler.inverse_transform(
    np.column_stack([grid[:,i] if i!=2 else indus_grid for i in range(grid.shape[1])])
)[:,2]

plt.figure(figsize=(8,5))
plt.plot(indus_orig, mean_pred, label="Posterior mean")
plt.fill_between(indus_orig, hpd[:,0], hpd[:,1], alpha=0.3, label="94% CI")
plt.scatter(
    scaler.inverse_transform(X_test_s)[:,2], y_test,
    color="k", alpha=0.5, label="Test data"
)
plt.xlabel("Proportion of Non‐Retail Business Acres (INDUS)")
plt.ylabel("Property‐Tax Rate per $10k")
plt.title("Bayesian Regression: TAX vs. INDUS")
plt.legend()
plt.show()

Summary

This Bayesian Regression workflow for property‐tax forecasting provides:

1. Point estimates of local tax rates (per $10 000 assessed value) from census and housing characteristics.

2. Credible intervals that quantify uncertainty from socioeconomic variability and model residuals.

3. Actionable insights: city planners and finance departments can use both expected tax‐rate forecasts and their uncertainty bounds to sharpen revenue projections, guide ballot‐measure timing, and inform equitable budgeting decisions.

Did we exceed your expectations?
If Yes, share your valuable feedback on Google | Facebook

ProjectGurukul Team

The ProjectGurukul Team delivers project-based tutorials on programming, machine learning, and web development. We simplify learning by providing hands-on projects to help you master real-world skills. We also provide free major and minor projects for enginering students.

Leave a Reply

Your email address will not be published. Required fields are marked *