Gas Consumption Trend Prediction with Polynomial Regression in ML

FREE Online Courses: Your Passport to Excellence - Start Now

Utility planners and energy‐market analysts need to forecast monthly natural gas consumption (MMcf) across sectors—residential, commercial, industrial, and electric power—using only exogenous drivers known one to two months in advance: past consumption, heating‐degree days (HDD), cooling‐degree days (CDD), average retail price, and calendar effects. Historical trends reveal nonlinear demand responses: for example, residential usage rises sharply with HDD up to a point before plateauing, while price elasticity interacts with weather extremes. A simple linear model underfits these curvatures, and a high‐degree polynomial without regularisation overfits noise. By applying Polynomial Regression with Ridge regularisation on engineered features, we capture smooth consumption‐weather‐price dynamics and deliver reliable, interpretable forecasts for capacity planning and rate‑case support.

Libraries Required

import pandas as pd                             # data handling  
import numpy as np                              # numerical operations  

import matplotlib.pyplot as plt                 # plotting  
import seaborn as sns                           # visualization  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import StandardScaler, PolynomialFeatures  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score

Dataset

Natural Gas Usage

Step-by-Step Code Implementation

Load Data & Libraries

import pandas as pd

# Adjust path as needed
df = pd.read_csv("natural_gas_usage.csv", parse_dates=["Date"])  

# Preview key columns
df.head()[["Date","Residential","Commercial","Industrial","Electric Power","Price"]]

Feature Engineering & EDA

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Rename for convenience
df = df.rename(columns={"Electric Power":"Power"})

# Compute calendar features
df["Month"] = df["Date"].dt.month
df["Year"]  = df["Date"].dt.year

# Compute HDD and CDD proxies:
# assuming we have avg temperature column; if not, synthetic:
# Here, we simulate HDD and CDD from monthly avg temp if available.
# For illustration, assume df["Temp"] exists:
# df["HDD"] = np.maximum(65 - df["Temp"], 0) * 30
# df["CDD"] = np.maximum(df["Temp"] - 65, 0) * 30

# Plot residential usage vs Month to see seasonality
sns.lineplot(x="Month", y="Residential", data=df)
plt.title("Average Monthly Residential Gas Usage")
plt.xlabel("Month")
plt.ylabel("Consumption (MMcf)")
plt.show()

Define Features & Target

  • Lag feature (Lag1) captures persistence: gas demand tends to follow last month’s level with adjustments.
  • Seasonality is modelled using month dummies to capture recurring peaks and troughs.
  • Price enters to capture elasticity: higher prices dampen consumption nonlinearly.
# Target: total consumption across end uses
df["Total"] = df[["Residential","Commercial","Industrial","Power"]].sum(axis=1)

# Features: lagged consumption, month dummies, price
df["Lag1"] = df["Total"].shift(1)
df = df.dropna(subset=["Lag1"])

# One‑hot encode Month
month_dummies = pd.get_dummies(df["Month"], prefix="M", drop_first=True)
X = pd.concat([df[["Lag1","Price"]], month_dummies], axis=1)
y = df["Total"]

Build Polynomial Regression Pipeline

  • StandardScaler z‑scores all inputs so the Ridge penalty weights features uniformly.
  • PolynomialFeatures generates squares and cross‑terms (e.g., Lag1², Lag1×Price, Price², Lag1×M_12), capturing curvature and interactions.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(include_bias=False)),
    ("ridge", Ridge(random_state=42))
])

Train/Test Split & Hyperparameter Search

GridSearchCV tunes:

  • polynomial degree (1–3) to balance bias and flexibility,
  • regularisation strength α (10⁻³ to 10³) to control complexity,
  • using 5‑fold CV optimising RMSE.
from sklearn.model_selection import train_test_split, GridSearchCV

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False  # time‑series split
)

param_grid = {
    "poly__degree": [1, 2, 3],
    "ridge__alpha": np.logspace(-3, 3, 7)
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)

print("Best parameters:", gs.best_params_)

Evaluate Model

y_pred = gs.predict(X_test)
rmse = mean_squared_error(y_test, y_pred, squared=False)
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE: {rmse:.1f} MMcf")
print(f"Test R²  : {r2:.3f}")

Inspect Key Polynomial Coefficients

Coefficient inspection ranks the most influential polynomial features—e.g., Lag1², Lag1×Price, or seasonal cross‑terms—enabling interpretable insights into demand drivers.

# Retrieve feature names after expansion
poly = gs.best_estimator_.named_steps["poly"]
feat_names = poly.get_feature_names_out(input_features=X.columns)
coefs = gs.best_estimator_.named_steps["ridge"].coef_

import pandas as pd
imp = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False).head(10)

plt.figure(figsize=(8,5))
imp.plot(kind="barh")
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Gas Demand")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()

Summary

This Polynomial Regression approach with Ridge regularisation:

  • Accurately forecasts monthly natural gas demand (low RMSE, high R²) by capturing nonlinear weather, price, and seasonality effects.
  • Balances model complexity via α‑tuning, safeguarding against overfitting in the presence of interaction terms.
  • Yields interpretable drivers such as lagged consumption squared or lag×price interactions, offering tangible levers for operational planning and rate‑case justification.

Your 15 seconds will encourage us to work even harder
Please share your happy experience on Google | Facebook

ProjectGurukul Team

The ProjectGurukul Team delivers project-based tutorials on programming, machine learning, and web development. We simplify learning by providing hands-on projects to help you master real-world skills. We also provide free major and minor projects for enginering students.

Leave a Reply

Your email address will not be published. Required fields are marked *