Home Energy Consumption Prediction with Polynomial Regression in ML

Utilities and homeowners need to forecast a home’s annual heating load (kWh/m²) from building design parameters (relative compactness, surface area, wall and roof areas, overall height, glazing area and distribution, and orientation) before committing to construction or retrofit decisions. These relationships are inherently nonlinear (e.g., heat loss scales with surface area and temperature differences), so a plain linear model tends to underfit, while a naïve high‑degree polynomial tends to overfit noise. By applying Polynomial Regression to standardised features with ℓ₂ regularisation (Ridge), we can capture smooth, nonlinear dependencies and deliver accurate, interpretable energy‑consumption forecasts to guide efficient building designs.

Libraries Required

import pandas as pd                         # data loading & handling  
import numpy as np                          # numerical operations  

import matplotlib.pyplot as plt             # plotting  
import seaborn as sns                       # enhanced visualization  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import StandardScaler, PolynomialFeatures  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score

Dataset

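Energy Efficiency Data Set (UCI Machine Learning Repository, file ENB2012_data.xlsx): 768 simulated building configurations with eight design features (X1–X8) and two targets, heating load (Y1) and cooling load (Y2). This tutorial models Y1 only.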

Step-by-Step Code Implementation

Load Libraries & Data

import pandas as pd

# Load the Excel workbook (adjust the path as needed; reading .xlsx requires the openpyxl package)
df = pd.read_excel("data/ENB2012_data.xlsx")

# Preview key columns
df.head()[['X1','X2','X3','X4','X5','X6','X7','X8','Y1']]
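The column codes X1–X8, Y1 and Y2 are terse, so it can help to keep a lookup of what each one means. The sketch below follows the attribute list in the UCI documentation; the snake_case labels, the `COLUMN_NAMES` dictionary and the `with_readable_names` helper are hypothetical names chosen for this illustration, and the small demo frame holds made-up values so the snippet runs without the dataset file.

```python
import pandas as pd

# Column meanings as listed in the UCI Energy Efficiency documentation.
# The snake_case labels on the right are hypothetical names chosen here.
COLUMN_NAMES = {
    "X1": "relative_compactness",
    "X2": "surface_area",
    "X3": "wall_area",
    "X4": "roof_area",
    "X5": "overall_height",
    "X6": "orientation",
    "X7": "glazing_area",
    "X8": "glazing_area_distribution",
    "Y1": "heating_load",
    "Y2": "cooling_load",
}

def with_readable_names(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a renamed copy; columns missing from the mapping are left unchanged."""
    return frame.rename(columns=COLUMN_NAMES)

# Tiny stand-in frame with made-up values (not real dataset rows)
demo = pd.DataFrame({"X1": [0.98, 0.90], "X7": [0.0, 0.1], "Y1": [15.5, 20.8]})
readable = with_readable_names(demo)
print(readable.columns.tolist())
```

The rest of this tutorial keeps the original X1–X8 codes, so a renamed copy like this is best used for plots and reports rather than as a replacement for `df`.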

Exploratory Data Analysis

import seaborn as sns
import matplotlib.pyplot as plt

# Scatter: relative compactness vs heating load
sns.scatterplot(x='X1', y='Y1', data=df, alpha=0.5)
plt.title("Relative Compactness vs Heating Load")
plt.xlabel("Relative Compactness")
plt.ylabel("Heating Load (kWh/m²)")
plt.show()

Define Features & Target

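At degree 2, PolynomialFeatures augments the eight raw inputs with their squares and pairwise products (e.g., X1², X1 × X2), capturing nonlinear heat‑loss effects and interactions (e.g., compactness with glazing). At degree 3 it also adds cubic and three‑way product terms. The expansion itself happens inside the pipeline built in the next step; here we only select the raw inputs and the target.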

# Features: X1–X8 per UCI documentation
feature_cols = ['X1','X2','X3','X4','X5','X6','X7','X8']
X = df[feature_cols]
y = df['Y1']  # heating load

Build a Polynomial Regression Pipeline

  • StandardScaler standardises the eight raw inputs (zero mean, unit variance) before they are expanded, so the polynomial terms are built from comparably scaled variables and Ridge’s ℓ₂ penalty is not dominated by inputs measured in large units.
  • Ridge regression applies ℓ₂ regularisation to shrink noisy high‑order coefficients, mitigating overfitting from the expanded feature space.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

pipe = Pipeline([
    ('scale', StandardScaler()),  
    ('poly', PolynomialFeatures(include_bias=False)),  
    ('ridge', Ridge(random_state=42))  
])
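The shrinkage effect of the ℓ₂ penalty can be seen directly by fitting the same kind of pipeline at several values of α and comparing the size of the coefficient vector. This is a minimal sketch on synthetic data (the quadratic signal, the sample size and the `coef_norm` helper are assumptions for illustration, not part of the tutorial's dataset).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (assumption): a quadratic signal plus a little noise
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1, 1, size=(60, 3))
y_demo = 2.0 * X_demo[:, 0] ** 2 - X_demo[:, 1] + rng.normal(scale=0.1, size=60)

def coef_norm(alpha: float) -> float:
    """Fit scale -> cubic expansion -> Ridge and return the L2 norm of the coefficients."""
    model = Pipeline([
        ('scale', StandardScaler()),
        ('poly', PolynomialFeatures(degree=3, include_bias=False)),
        ('ridge', Ridge(alpha=alpha)),
    ])
    model.fit(X_demo, y_demo)
    return float(np.linalg.norm(model.named_steps['ridge'].coef_))

norms = {alpha: coef_norm(alpha) for alpha in (1e-3, 1.0, 1e3)}
for alpha, norm in norms.items():
    print(f"alpha={alpha:g}: coefficient norm = {norm:.3f}")
```

The norm falls as α grows: a small α leaves the polynomial terms almost unconstrained, while a large α pulls every coefficient towards zero. The grid search in the next step picks the trade-off by cross-validation.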

Train/Test Split & Hyperparameter Search

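GridSearchCV explores polynomial degrees (1–3) and seven log‑spaced regularisation strengths α (10⁻³ to 10³) via 5‑fold cross‑validation on the training set: 21 combinations, 105 fits in total. It selects the combination with the lowest mean validation RMSE and refits it on the full training set, leaving the 20% test split untouched for the final evaluation.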

from sklearn.model_selection import train_test_split, GridSearchCV
import numpy as np

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7)
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)

print("Best parameters:", gs.best_params_)
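Beyond the single best setting, the full cross-validation table shows how sensitive the error is to degree and α. The sketch below runs a smaller grid on synthetic data (the data, the `demo_` names and the three α values are assumptions so the snippet runs on its own) and turns `cv_results_` into a sorted table. The same last few lines could be applied to the fitted `gs` object above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (assumption): quadratic and interaction effects plus noise
rng = np.random.default_rng(1)
X_demo = rng.uniform(-1, 1, size=(120, 2))
y_demo = (1.5 * X_demo[:, 0] ** 2 + X_demo[:, 0] * X_demo[:, 1]
          + rng.normal(scale=0.05, size=120))

demo_pipe = Pipeline([
    ('scale', StandardScaler()),
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge()),
])
demo_grid = {'poly__degree': [1, 2, 3], 'ridge__alpha': [0.01, 1.0, 100.0]}

demo_gs = GridSearchCV(demo_pipe, demo_grid, cv=5,
                       scoring='neg_root_mean_squared_error')
demo_gs.fit(X_demo, y_demo)

# Scores are negated RMSE values, so flip the sign to get RMSE back
results = pd.DataFrame(demo_gs.cv_results_)
results['cv_rmse'] = -results['mean_test_score']
table = results[['param_poly__degree', 'param_ridge__alpha',
                 'cv_rmse', 'std_test_score']].sort_values('cv_rmse')
print(table.to_string(index=False))
```

If several rows have nearly the same RMSE, preferring the lower degree or the larger α is a reasonable way to choose the simpler model.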

Evaluate Model

y_pred = gs.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # squared=False was removed in recent scikit-learn releases
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE: {rmse:.2f} kWh/m²")
print(f"Test R²  : {r2:.3f}")
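For readers who want to see what the two metrics measure, here is a small worked example on three made-up values (not dataset rows). It computes RMSE and R² by hand and checks them against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up targets and predictions for illustration
y_true = np.array([10.0, 20.0, 30.0])
y_hat = np.array([12.0, 18.0, 33.0])

residuals = y_true - y_hat                       # [-2, 2, -3]
mse_manual = np.mean(residuals ** 2)             # (4 + 4 + 9) / 3
rmse_manual = np.sqrt(mse_manual)                # typical error, in target units

ss_res = np.sum(residuals ** 2)                  # unexplained variation: 17
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation: 200
r2_manual = 1.0 - ss_res / ss_tot                # share of variation explained

rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_hat))
r2_sklearn = r2_score(y_true, y_hat)

print(f"RMSE: manual={rmse_manual:.4f}, sklearn={rmse_sklearn:.4f}")
print(f"R²  : manual={r2_manual:.4f}, sklearn={r2_sklearn:.4f}")
```

RMSE is expressed in the units of the target (kWh/m² in this tutorial), while R² is unitless, so the two are best reported together.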

Inspect Key Polynomial Coefficients

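Coefficient inspection highlights which polynomial terms (such as X1² or X4 × X7) carry the largest weights in the fitted model, offering a starting point for building‑design adjustments (e.g., reducing glazing area or adjusting orientation). Because the code below ranks terms by absolute value, the plot shows the strength of each term but not its direction, and correlated terms can share weight between them, so treat the ranking as indicative rather than causal.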

# Retrieve polynomial feature names
poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=feature_cols)

# Retrieve Ridge coefficients
coefs = gs.best_estimator_.named_steps['ridge'].coef_

import pandas as pd
coef_series = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False)

# Plot top 10
plt.figure(figsize=(8,5))
coef_series.head(10).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Influencing Heating Load")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()

Summary

By integrating polynomial feature engineering with Ridge regularisation in a unified pipeline, this workflow delivers:

1. Nonlinear forecasts of building heating load, with accuracy quantified on a held‑out test set by RMSE and R².

2. Controlled model complexity, preventing overfitting while capturing critical curvature and interaction effects.

3. Interpretable insights: the most influential polynomial features show architects and engineers which design parameters (e.g., compactness, glazing, orientation) to prioritise for energy‑efficient buildings.

ProjectGurukul Team

The ProjectGurukul Team delivers project-based tutorials on programming, machine learning, and web development. We simplify learning by providing hands-on projects to help you master real-world skills. We also provide free major and minor projects for engineering students.