Energy Usage Prediction using Linear Regression in ML


Commercial buildings account for a large slice of global electricity demand. Facility managers need tomorrow’s energy‑use estimate—not yesterday’s bill—to schedule chillers, negotiate power purchases, and spot waste.

In this hands‑on mini‑project, we build a linear‑regression baseline that predicts a building’s hourly meter reading (kWh) from weather, calendar information, and static building characteristics. While advanced models often improve accuracy, a transparent linear fit surfaces the first‑order drivers of consumption and provides a benchmark for future work.

Libraries Required

pandas # data wrangling
numpy # numerical helpers
matplotlib.pyplot # sanity‑check visuals
seaborn # quick EDA plots (optional)
scikit‑learn # split, pipeline, model, metrics
joblib # save the trained pipeline

Dataset Link

ASHRAE – Great Energy Predictor III (Kaggle competition)

Step by Step Code Implementation

Import libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns                         # optional
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error
import joblib

Load the data

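Merging tables – Meter readings, weather, and building metadata live in separate CSVs. Joining the meter readings to the building metadata on building_id, and then to the weather table on site_id and timestamp, gives one tidy frame with a row per meter reading.

Download train.csv, weather_train.csv, and building_metadata.csv from the dataset page and place them in your working directory.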

# base tables
meter   = pd.read_csv("train.csv")              # meter readings
weather = pd.read_csv("weather_train.csv")      # hourly weather
bmeta   = pd.read_csv("building_metadata.csv")  # static building info

# merge: meter → building metadata → weather
df = meter.merge(bmeta, on="building_id", how="left")\
          .merge(weather, on=["site_id", "timestamp"], how="left")

# quick look
print(df.head())
print(df.shape)
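
Left joins can silently duplicate rows or leave weather fields empty. The sketch below runs the same two merges on tiny made-up frames (the values and the weather_match column name are assumptions for illustration) and uses the validate and indicator options of pandas merge to check both.

```python
import pandas as pd

# Tiny stand-ins for the three CSVs (values are made up for illustration).
meter = pd.DataFrame({
    "building_id": [0, 0, 1],
    "meter": [0, 0, 0],
    "timestamp": ["2016-01-01 00:00:00", "2016-01-01 01:00:00",
                  "2016-01-01 00:00:00"],
    "meter_reading": [10.0, 12.0, 250.0],
})
bmeta = pd.DataFrame({
    "site_id": [0, 1],
    "building_id": [0, 1],
    "primary_use": ["Office", "Education"],
    "square_feet": [5000, 80000],
})
weather = pd.DataFrame({
    "site_id": [0, 0],
    "timestamp": ["2016-01-01 00:00:00", "2016-01-01 01:00:00"],
    "air_temperature": [3.5, 3.1],
})

# validate= raises MergeError if the right-hand keys are not unique,
# which would otherwise silently duplicate meter rows.
df = meter.merge(bmeta, on="building_id", how="left", validate="many_to_one")
df = df.merge(weather, on=["site_id", "timestamp"], how="left",
              validate="many_to_one", indicator="weather_match")

# Rows tagged "left_only" found no weather record for their site and hour.
n_unmatched = int((df["weather_match"] == "left_only").sum())
print(df.shape, "rows without weather:", n_unmatched)
```

On the real tables, a large unmatched count would suggest checking timestamp formats before going further.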

Basic cleaning

Why a log target? Energy readings span orders of magnitude between a small office and a large hospital. log1p compresses the extremes (and is defined for zero readings), so the residuals of the linear fit are more even across building sizes.

# convert timestamp
df['timestamp'] = pd.to_datetime(df['timestamp'])

# drop rows missing any weather field used as a predictor;
# LinearRegression cannot fit on NaN values
df = df.dropna(subset=['air_temperature', 'dew_temperature',
                       'cloud_coverage', 'precip_depth_1_hr'])

# optional: log‑transform meter readings to stabilise variance
df['log_meter_reading'] = np.log1p(df['meter_reading'])
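
To see what the transform does, here is a minimal sketch on three made-up readings (the values are assumptions, not taken from the dataset). It also shows that np.expm1 undoes np.log1p, which matters when predictions need to be reported in kWh.

```python
import numpy as np

# Illustrative hourly readings in kWh: idle meter, small office, large hospital.
readings = np.array([0.0, 15.0, 4200.0])

log_readings = np.log1p(readings)    # log(1 + x), defined at x = 0
recovered = np.expm1(log_readings)   # inverse of log1p

print(log_readings.round(3))
print(recovered)
```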

Feature engineering

Calendar variables capture daily and seasonal rhythms that dominate demand—think weekday occupancy peaks or summer cooling loads.

# calendar cues
df['hour']      = df['timestamp'].dt.hour
df['dayofweek'] = df['timestamp'].dt.dayofweek   # 0‑Mon … 6‑Sun
df['month']     = df['timestamp'].dt.month

# predictor lists
num_cols = ['square_feet', 'air_temperature', 'dew_temperature',
            'cloud_coverage', 'precip_depth_1_hr',
            'hour', 'dayofweek', 'month']

cat_cols = ['primary_use', 'meter', 'site_id']

target   = 'log_meter_reading'
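
A quick sanity check of the calendar accessors on a few hand-picked timestamps (chosen here for illustration) confirms the Monday = 0 convention before the columns go into the model.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime([
    "2016-01-01 00:00:00",   # a Friday
    "2016-01-03 13:00:00",   # a Sunday
    "2016-07-04 18:00:00",   # a Monday
]))

cal = pd.DataFrame({
    "hour": ts.dt.hour,
    "dayofweek": ts.dt.dayofweek,   # Monday = 0, Sunday = 6
    "month": ts.dt.month,
})
print(cal)
```

Note that a linear model reads these integers as a straight-line trend (hour 23 is "more" than hour 0), so the fit captures only a rough version of the daily cycle.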

Pre‑processing & model pipeline

One‑hot encoding turns categorical flags (primary_use, meter type, site) into neutral binary columns so the model can learn a unique offset for each without imposing a false ordering.
Pipeline design ensures preprocessing and coefficient fitting travel together—vital for repeatable inference and for exporting the model into production jobs.

ohe = OneHotEncoder(handle_unknown='ignore')

preproc = ColumnTransformer([
        ('cat', ohe, cat_cols)
    ], remainder='passthrough')   # numeric columns pass through

linreg = LinearRegression(n_jobs=-1)

pipe = Pipeline(steps=[
        ('prep', preproc),
        ('model', linreg)
])

Train/test split and training

X = df[num_cols + cat_cols]
y = df[target]

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, shuffle=True)

pipe.fit(X_train, y_train)
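
The split above shuffles rows, so test hours are interleaved with training hours. If the goal is to judge how well the model forecasts hours it has not seen yet, one option (not what the code above does) is a chronological hold-out. A minimal sketch on a synthetic hourly frame, with made-up values:

```python
import numpy as np
import pandas as pd

# Synthetic hourly frame standing in for the merged df (values are made up).
rng = np.random.default_rng(0)
hours = pd.to_timedelta(np.arange(100), unit="h")
toy = pd.DataFrame({
    "timestamp": pd.Timestamp("2016-01-01") + hours,
    "log_meter_reading": rng.normal(3.0, 1.0, size=100),
})

# Use the timestamp at the 80 % position as the cutoff, then split on time
# so that rows sharing a timestamp always land on the same side.
toy = toy.sort_values("timestamp").reset_index(drop=True)
cutoff = toy["timestamp"].iloc[int(len(toy) * 0.8)]

train = toy[toy["timestamp"] < cutoff]
test = toy[toy["timestamp"] >= cutoff]

print(len(train), len(test), cutoff)
```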

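Evaluation

Performance metrics – R² is the share of variance in the target that the model explains; MAE is the average absolute error, here in log1p units because the target was log‑transformed. Converted back to kWh, MAE gives planners a tangible error band to allow for when sizing electricity contracts.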

y_pred = pipe.predict(X_test)
print(f"R²  : {r2_score(y_test, y_pred):.3f}")
print(f"MAE : {mean_absolute_error(y_test, y_pred):.3f} (log‑scale)")

(If you skipped the log transform, report MAE in kWh instead.)
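
If you kept the log transform, you can still report MAE in kWh by inverting it with np.expm1 first. The arrays below are made-up stand-ins for y_test and pipe.predict(X_test).

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Stand-ins for y_test and pipe.predict(X_test), both on the log1p scale.
y_test_log = np.log1p(np.array([10.0, 100.0, 1000.0]))
y_pred_log = np.log1p(np.array([12.0, 90.0, 1100.0]))

# Undo the log1p transform before scoring.
y_test_kwh = np.expm1(y_test_log)
y_pred_kwh = np.expm1(y_pred_log)

mae_kwh = mean_absolute_error(y_test_kwh, y_pred_kwh)
print(f"MAE : {mae_kwh:.1f} kWh")
```

On the kWh scale, errors on the largest buildings dominate the average, so it is worth reading both figures side by side.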

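Inspect top coefficients

Positive coefficients mark features associated with higher consumption (e.g., larger square_feet, or high air_temperature where cooling dominates); negative coefficients mark features associated with lower consumption. Because the numeric features are not standardised, coefficient magnitudes depend on each feature's units, so compare sizes with care.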

# recover the feature names produced by OneHotEncoder
ohe_feats = pipe.named_steps['prep']\
                .named_transformers_['cat']\
                .get_feature_names_out(cat_cols)

feature_names = list(ohe_feats) + num_cols
coef_series   = pd.Series(pipe.named_steps['model'].coef_,
                          index=feature_names).sort_values()

print("Largest positive drivers:")
print(coef_series.tail(10))
print("\nLargest negative drivers:")
print(coef_series.head(10))

Persist the trained pipeline

Persisting with joblib freezes both the encoder and the regression weights, guarding against column‑order mix‑ups when scoring tomorrow’s sensor feed.

joblib.dump(pipe, "building_energy_linreg.pkl")
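
Reloading works the same way in reverse. The round trip below uses a toy model and a temporary file (both assumptions for illustration) to show that a restored object predicts exactly as the original did.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy model standing in for the trained pipeline.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_linreg.pkl")
    joblib.dump(model, path)       # write the fitted object to disk
    restored = joblib.load(path)   # read it back

same = np.allclose(model.predict(X), restored.predict(X))
print("predictions match after reload:", same)
```

For the real pipeline, load building_energy_linreg.pkl with joblib.load and pass it a DataFrame with the same predictor columns used in training. It is safest to reload with the same scikit‑learn version that saved the file.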

Summary

Starting from raw meter, weather, and building tables, this exercise walks through a data‑to‑insight pipeline for hourly building energy usage prediction. With minimal cleaning, calendar and weather features, and a linear model, we can already flag the biggest consumption drivers and produce a baseline forecast that operations teams can sanity‑check. Keep this interpretable baseline as your yardstick; when you graduate to regularised regressors or gradient‑boosted trees, you’ll know exactly how much real value the added complexity contributes.
