Home Energy Consumption Prediction with Polynomial Regression in ML
Utilities and homeowners need to forecast a home's annual heating load (kWh/m²) from building design parameters (relative compactness, surface area, wall and roof areas, overall height, glazing area and distribution, and orientation) before committing to construction or retrofit decisions. These relationships are nonlinear: heat loss, for example, depends on the product of surface area and temperature difference, so the inputs interact rather than add. A plain linear model therefore tends to underfit, while an unregularised high-degree polynomial tends to fit noise. Applying Polynomial Regression with ℓ₂ regularisation (Ridge) to standardised features captures smooth nonlinear dependencies while keeping the model inspectable, giving energy-consumption forecasts that can guide efficient building design.
Libraries Required
import pandas as pd                 # data loading & handling
import numpy as np                  # numerical operations
import matplotlib.pyplot as plt     # plotting
import seaborn as sns               # enhanced visualization
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score
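Dataset
The code in this walkthrough reads ENB2012_data.xlsx, the UCI Energy Efficiency dataset, which provides eight building-design inputs (X1–X8) and the heating load target (Y1).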
Step-by-Step Code Implementation
Load Libraries & Data
import pandas as pd
# Load the Excel workbook (adjust the path as needed; reading .xlsx requires openpyxl)
df = pd.read_excel("data/ENB2012_data.xlsx")
# Preview the eight inputs and the heating-load target
df[['X1','X2','X3','X4','X5','X6','X7','X8','Y1']].head()
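The column codes are terse, so a readable alias can help during exploration. The sketch below assumes the X1–X8, Y1, Y2 meanings given in the UCI documentation; the snake_case names and the single placeholder row are hypothetical and are not used elsewhere in this article, which keeps the original codes.

```python
import pandas as pd

# Assumed meaning of each column code, following the UCI dataset description.
# The snake_case aliases are illustrative choices, not part of the dataset.
COLUMN_NAMES = {
    "X1": "relative_compactness",
    "X2": "surface_area",
    "X3": "wall_area",
    "X4": "roof_area",
    "X5": "overall_height",
    "X6": "orientation",
    "X7": "glazing_area",
    "X8": "glazing_area_distribution",
    "Y1": "heating_load",
    "Y2": "cooling_load",
}

# One placeholder row so the snippet runs without the real file
# (values are made up for illustration only).
demo = pd.DataFrame(
    [[0.9, 500.0, 300.0, 100.0, 7.0, 2, 0.1, 1, 20.0, 25.0]],
    columns=list(COLUMN_NAMES),
)

# rename() returns a copy, so the original codes stay available in `demo`.
readable = demo.rename(columns=COLUMN_NAMES)
print(readable.columns.tolist())
```

With the real data, the same `rename` call could be applied to a copy of `df` for plotting, while the modelling code continues to use X1–X8.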
Exploratory Data Analysis
import seaborn as sns
import matplotlib.pyplot as plt
# Scatter: relative compactness vs heating load
sns.scatterplot(x='X1', y='Y1', data=df, alpha=0.5)
plt.title("Relative Compactness vs Heating Load")
plt.xlabel("Relative Compactness")
plt.ylabel("Heating Load (kWh/m²)")
plt.show()
Define Features & Target
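At degree 2, PolynomialFeatures augments the eight raw inputs with their squares and pairwise products (e.g., X1², X1 × X2); degree 3 adds cubic terms and three-way products. These extra columns let a linear model capture nonlinear heat-loss effects and interactions (e.g., compactness with glazing).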
# Features: X1–X8 per UCI documentation
feature_cols = ['X1','X2','X3','X4','X5','X6','X7','X8']
X = df[feature_cols]
y = df['Y1']  # heating load
Build a Polynomial Regression Pipeline
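- StandardScaler standardises the eight raw inputs to zero mean and unit variance before the polynomial expansion, so the generated terms are built from comparably scaled values and Ridge's ℓ₂ penalty is not dominated by inputs with large numeric ranges (such as surface area).
- Ridge regression applies ℓ₂ regularisation to shrink noisy high-order coefficients, mitigating overfitting from the expanded feature space.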
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge(random_state=42))
])
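The shrinkage effect of the ℓ₂ penalty can be seen on a small synthetic problem. The sketch below is not run on the ENB2012 data; `X_demo`, `y_demo`, and `coef_norm` are hypothetical names, and the three alpha values are arbitrary picks from the range searched later.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data: the target depends only on the square of the first input.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1, 1, size=(60, 2))
y_demo = 1.5 * X_demo[:, 0] ** 2 + rng.normal(scale=0.1, size=60)


def coef_norm(alpha):
    """Fit the scale -> poly -> ridge pipeline and return the coefficient norm."""
    model = Pipeline([
        ("scale", StandardScaler()),
        ("poly", PolynomialFeatures(degree=3, include_bias=False)),
        ("ridge", Ridge(alpha=alpha)),
    ])
    model.fit(X_demo, y_demo)
    return float(np.linalg.norm(model.named_steps["ridge"].coef_))


# A larger alpha pulls the whole coefficient vector toward zero.
norms = {alpha: coef_norm(alpha) for alpha in (0.001, 1.0, 1000.0)}
for alpha, norm in norms.items():
    print(f"alpha={alpha:<8} ||coef||_2 = {norm:.3f}")
```

The printed norms decrease as alpha grows, which is the trade-off the grid search below tunes: too little shrinkage leaves noisy high-order terms, too much flattens real curvature.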
Train/Test Split & Hyperparameter Search
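GridSearchCV explores polynomial degrees (1–3) and seven log-spaced regularisation strengths α (10⁻³ to 10³) via 5-fold cross-validation on the training set, selecting the combination with the lowest mean RMSE across the validation folds. The test set is held back until the final evaluation.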
from sklearn.model_selection import train_test_split, GridSearchCV
import numpy as np
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
param_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7)
}
gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)
print("Best parameters:", gs.best_params_)
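Two details of the search are easy to miss: how many models are actually fitted, and the fact that the `neg_root_mean_squared_error` scorer reports RMSE with its sign flipped so that higher is better. The sketch below illustrates both on a tiny synthetic problem (not the ENB2012 data); `article_grid`, `toy_pipe`, and `toy_search` are hypothetical names, and the toy grid is deliberately smaller than the article's.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same grid as in the article: 3 degrees x 7 alphas.
article_grid = {
    "poly__degree": [1, 2, 3],
    "ridge__alpha": np.logspace(-3, 3, 7),
}
n_candidates = len(ParameterGrid(article_grid))
n_fits = n_candidates * 5  # one fit per candidate per fold (cv=5)
print(n_candidates, n_fits)  # 21 105

# Synthetic quadratic target to demonstrate the scorer's sign convention.
rng = np.random.default_rng(0)
X_toy = rng.uniform(-1, 1, size=(80, 2))
y_toy = X_toy[:, 0] ** 2 + rng.normal(scale=0.1, size=80)

toy_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(include_bias=False)),
    ("ridge", Ridge()),
])
toy_search = GridSearchCV(
    toy_pipe,
    {"poly__degree": [1, 2], "ridge__alpha": [0.1, 1.0]},
    cv=5,
    scoring="neg_root_mean_squared_error",
)
toy_search.fit(X_toy, y_toy)

# best_score_ is negative; negate it to read it as an RMSE.
cv_rmse = -toy_search.best_score_
print(toy_search.best_params_, round(cv_rmse, 3))
```

Applied to the article's search, the same negation (`-gs.best_score_`) would give the mean cross-validated RMSE of the selected model, which can then be compared with the test RMSE.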
Evaluate Model
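from sklearn.metrics import mean_squared_error, r2_score
import numpy as np
# Predict on the untouched test split using the refitted best pipeline
y_pred = gs.predict(X_test)
# Take the square root explicitly; the squared=False argument is deprecated
# and has been removed in recent scikit-learn releases
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"Test RMSE: {rmse:.2f} kWh/m²")
print(f"Test R² : {r2:.3f}")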
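For readers who want to see what the two metrics compute, here is a small worked example on three made-up values (not model output). The names `y_true`, `y_hat`, `rmse_manual`, and `r2_manual` are illustrative.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Three made-up targets and predictions, chosen for easy arithmetic.
y_true = np.array([10.0, 20.0, 30.0])
y_hat = np.array([12.0, 18.0, 33.0])

residuals = y_true - y_hat                       # [-2, 2, -3]
ss_res = np.sum(residuals ** 2)                  # 4 + 4 + 9 = 17
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # 100 + 0 + 100 = 200

# RMSE: square root of the mean squared residual, in the target's units.
rmse_manual = np.sqrt(ss_res / len(y_true))      # sqrt(17 / 3), about 2.38

# R²: share of the target's variance explained by the predictions.
r2_manual = 1 - ss_res / ss_tot                  # 1 - 17 / 200 = 0.915

print(f"RMSE: {rmse_manual:.3f}")
print(f"R²  : {r2_manual:.3f}")

# Cross-check against scikit-learn.
assert np.isclose(rmse_manual, np.sqrt(mean_squared_error(y_true, y_hat)))
assert np.isclose(r2_manual, r2_score(y_true, y_hat))
```

RMSE is expressed in the same units as the heating load, which makes it easier to judge against design tolerances than R² alone.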
Inspect Key Polynomial Coefficients
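Coefficient inspection shows which polynomial terms, such as X1² or X4 × X7, carry the largest weights in the fitted model, suggesting which design parameters (e.g., glazing area or orientation) are worth examining first. Because the inputs are standardised before expansion, the coefficients refer to standardised rather than raw units, and the plot below ranks them by magnitude only, so it does not show whether a term raises or lowers the predicted load.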
# Retrieve polynomial feature names
poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=feature_cols)
# Retrieve Ridge coefficients
coefs = gs.best_estimator_.named_steps['ridge'].coef_
# Rank terms by absolute coefficient size (the sign is discarded here)
coef_series = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False)
# Plot top 10
plt.figure(figsize=(8,5))
coef_series.head(10).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Influencing Heating Load")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()
Summary
By integrating polynomial feature engineering with Ridge regularisation in a unified pipeline, this workflow delivers:
1. Nonlinear forecasts of building heating load, with accuracy quantified by test RMSE and R².
2. Controlled model complexity, preventing overfitting while capturing critical curvature and interaction effects.
3. Interpretable insights: the most influential polynomial features indicate which design parameters (e.g., compactness, glazing, orientation) architects and engineers should prioritise for energy-efficient buildings.