Gas Consumption Trend Prediction with Polynomial Regression in ML

FREE Online Courses: Transform Your Career – Enroll for Free!

Utility planners and energy‐market analysts need to forecast monthly natural gas consumption (MMcf) across sectors—residential, commercial, industrial, and electric power—using only exogenous drivers known one to two months in advance: past consumption, heating‐degree days (HDD), cooling‐degree days (CDD), average retail price, and calendar effects. Historical trends reveal nonlinear demand responses: for example, residential usage rises sharply with HDD up to a point before plateauing, while price elasticity interacts with weather extremes. A simple linear model underfits these curvatures, and a high‐degree polynomial without regularisation overfits noise. By applying Polynomial Regression with Ridge regularisation on engineered features, we capture smooth consumption‐weather‐price dynamics and deliver reliable, interpretable forecasts for capacity planning and rate‑case support.

Libraries Required

import pandas as pd                             # data handling  
import numpy as np                              # numerical operations  

import matplotlib.pyplot as plt                 # plotting  
import seaborn as sns                           # visualization  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import StandardScaler, PolynomialFeatures  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score

Dataset

Natural Gas Usage

Step-by-Step Code Implementation

Load Data & Libraries

import pandas as pd

# Adjust path as needed
df = pd.read_csv("natural_gas_usage.csv", parse_dates=["Date"])  

# Preview key columns
df.head()[["Date","Residential","Commercial","Industrial","Electric Power","Price"]]

Feature Engineering & EDA

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Rename for convenience
df = df.rename(columns={"Electric Power":"Power"})

# Compute calendar features
df["Month"] = df["Date"].dt.month
df["Year"]  = df["Date"].dt.year

# Compute HDD and CDD proxies:
# assuming we have avg temperature column; if not, synthetic:
# Here, we simulate HDD and CDD from monthly avg temp if available.
# For illustration, assume df["Temp"] exists:
# df["HDD"] = np.maximum(65 - df["Temp"], 0) * 30
# df["CDD"] = np.maximum(df["Temp"] - 65, 0) * 30

# Plot residential usage vs Month to see seasonality
sns.lineplot(x="Month", y="Residential", data=df)
plt.title("Average Monthly Residential Gas Usage")
plt.xlabel("Month")
plt.ylabel("Consumption (MMcf)")
plt.show()

Define Features & Target

Lag feature (Lag1) captures persistence: gas demand tends to follow last month’s level with adjustments.
Seasonality is modelled using month dummies to capture recurring peaks and troughs.
Price enters to capture elasticity: higher prices dampen consumption nonlinearly.

# Target: total consumption across end uses
df["Total"] = df[["Residential","Commercial","Industrial","Power"]].sum(axis=1)

# Features: lagged consumption, month dummies, price
df["Lag1"] = df["Total"].shift(1)
df = df.dropna(subset=["Lag1"])

# One‑hot encode Month
month_dummies = pd.get_dummies(df["Month"], prefix="M", drop_first=True)
X = pd.concat([df[["Lag1","Price"]], month_dummies], axis=1)
y = df["Total"]

Build Polynomial Regression Pipeline

StandardScaler z‑scores all inputs so the Ridge penalty weights features uniformly.
PolynomialFeatures generates squares and cross‑terms (e.g., Lag1², Lag1×Price, Price², Lag1×M_12), capturing curvature and interactions.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(include_bias=False)),
    ("ridge", Ridge(random_state=42))
])

Train/Test Split & Hyperparameter Search

GridSearchCV tunes:

polynomial degree (1–3) to balance bias and flexibility,
regularisation strength α (10⁻³ to 10³) to control complexity,
using 5‑fold CV optimising RMSE.

from sklearn.model_selection import train_test_split, GridSearchCV

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False  # time‑series split
)

param_grid = {
    "poly__degree": [1, 2, 3],
    "ridge__alpha": np.logspace(-3, 3, 7)
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)

print("Best parameters:", gs.best_params_)

Evaluate Model

y_pred = gs.predict(X_test)
rmse = mean_squared_error(y_test, y_pred, squared=False)
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE: {rmse:.1f} MMcf")
print(f"Test R²  : {r2:.3f}")

Inspect Key Polynomial Coefficients

Coefficient inspection ranks the most influential polynomial features—e.g., Lag1², Lag1×Price, or seasonal cross‑terms—enabling interpretable insights into demand drivers.

# Retrieve feature names after expansion
poly = gs.best_estimator_.named_steps["poly"]
feat_names = poly.get_feature_names_out(input_features=X.columns)
coefs = gs.best_estimator_.named_steps["ridge"].coef_

import pandas as pd
imp = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False).head(10)

plt.figure(figsize=(8,5))
imp.plot(kind="barh")
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Gas Demand")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()

Summary

This Polynomial Regression approach with Ridge regularisation:

Accurately forecasts monthly natural gas demand (low RMSE, high R²) by capturing nonlinear weather, price, and seasonality effects.
Balances model complexity via α‑tuning, safeguarding against overfitting in the presence of interaction terms.
Yields interpretable drivers such as lagged consumption squared or lag×price interactions, offering tangible levers for operational planning and rate‑case justification.

We work very hard to provide you quality material
Could you take 15 seconds and share your happy experience on Google | Facebook

Gas Consumption Trend Prediction with Polynomial Regression in ML

Libraries Required

Dataset

Step-by-Step Code Implementation

Load Data & Libraries

Feature Engineering & EDA

Define Features & Target

Build Polynomial Regression Pipeline

Train/Test Split & Hyperparameter Search

Evaluate Model

Inspect Key Polynomial Coefficients