Hydro Power Cost Prediction with Ridge & Lasso Mixed Regression in ML


Hydro plants appear “fuel‑free,” yet their actual production cost varies hour‑to‑hour with head height, water inflow, turbine efficiency, gate position, and start‑up wear. Dispatchers who can forecast this marginal cost ($ / MWh) one step ahead can schedule turbines more profitably and bid smarter in day‑ahead markets.

However, raw SCADA features are highly collinear: head ≈ reservoir level, flow ≈ gate opening, etc. A pure Ridge model keeps every term, including noisy ones; a pure Lasso tends to keep one feature from a correlated group somewhat arbitrarily and may over‑shrink the rest. Elastic Net blends both penalties, yielding a sparse yet stable regression.
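The contrast between the three penalties can be seen on a small synthetic example. This is only a sketch: the `head`, `level`, and `noise` columns are made-up stand-ins rather than real SCADA data, and the `alpha` values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins: two nearly identical signals plus one irrelevant column.
head = rng.normal(size=n)
level = head + 0.01 * rng.normal(size=n)   # almost a copy of head
noise = rng.normal(size=n)                 # unrelated to the target
X = np.column_stack([head, level, noise])
y = 3.0 * head + 0.1 * rng.normal(size=n)

models = {
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "enet": ElasticNet(alpha=0.1, l1_ratio=0.5),
}
# Fit each model and collect its coefficients for [head, level, noise].
coefs = {name: model.fit(X, y).coef_ for name, model in models.items()}
for name, c in coefs.items():
    print(name, np.round(c, 3))
```

On this toy data Ridge leaves a small non-zero weight on the noise column, while both ℓ¹-penalised models set it to exactly zero. How the weight is divided between `head` and `level` is the part that differs between Lasso and Elastic Net, and it is worth inspecting in the printed output rather than assuming.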

Libraries Required

Purpose | Library
Data & time handling | pandas, numpy, datetime
Visualisation | matplotlib, seaborn
Modelling pipeline | scikit‑learn: ColumnTransformer, StandardScaler, ElasticNet, GridSearchCV, Pipeline, train_test_split
Evaluation | scikit‑learn: mean_squared_error, r2_score

Dataset Link

Hydropower Plant Dataset

Step-by-Step Code Implementation

1. Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score

2. Download and load the dataset

Dataset: hourly SCADA + cost estimates for a run‑of‑river hydro station: head height, turbine flow, gate opening, efficiency, MW output, and historical variable cost per MWh.

# one‑time (needs Kaggle API token):
# kaggle datasets download -d hemantk/hydropower-plant-dataset -p data --unzip

df = pd.read_csv("data/hydropower_generation.csv")   # adjust filename if different

3. Quick EDA & target engineering

Cost one hour ahead supports look‑ahead bidding; a simple target shift creates supervised labels without leakage.

print(df.head())
# Assume dataset columns: DateTime, Head_m, Flow_m3s, Gate_Open_pct,
# Turbine_Eff_pct, Power_MW, Variable_Cost_USD_MWh
df['DateTime'] = pd.to_datetime(df['DateTime'])

# We’ll predict Variable_Cost_USD_MWh one hour ahead
df = df.sort_values('DateTime')
df['Cost_t+1'] = df['Variable_Cost_USD_MWh'].shift(-1)
df = df.dropna(subset=['Cost_t+1'])
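To make the target shift concrete, here is a minimal sketch on a hypothetical four-hour series (the values are invented). It assumes the rows are sorted and evenly spaced one hour apart; if the real data has gaps, `shift(-1)` pairs a row with the next available row, not necessarily the next hour.

```python
import pandas as pd

# Hypothetical four-hour toy series; values are illustrative only.
toy = pd.DataFrame({
    "DateTime": pd.date_range("2024-01-01", periods=4, freq=pd.Timedelta(hours=1)),
    "Variable_Cost_USD_MWh": [10.0, 12.0, 11.0, 13.0],
})

# Each row's label is the cost observed in the following row (next hour).
toy["Cost_t+1"] = toy["Variable_Cost_USD_MWh"].shift(-1)

# The final row has no "next hour", so its label is NaN and the row is dropped.
labelled = toy.dropna(subset=["Cost_t+1"])
print(labelled)
```

Each remaining row pairs the features known at hour t with the cost at hour t+1, which is why the last observation cannot be used for training.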

4. Feature matrix & target

Why Elastic Net? Head_m and Flow_m3s are correlated; Gate_Open_pct and Flow_m3s likewise. Elastic Net’s ℓ² term stabilises coefficients across correlated inputs, while its ℓ¹ term drives tiny effects to exactly zero, yielding a concise, robust model.

num_cols = ['Head_m', 'Flow_m3s', 'Gate_Open_pct', 'Turbine_Eff_pct', 'Power_MW']
df['Hour']  = df['DateTime'].dt.hour
df['Month'] = df['DateTime'].dt.month
num_cols += ['Hour', 'Month']

X = df[num_cols]
y = df['Cost_t+1']            # $ / MWh one‑hour‑ahead cost

5. Build an Elastic Net pipeline

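Numeric features are z‑scaled before modelling. Wrapping the scaler and the model in one Pipeline means the scaler is refit on each training fold during cross‑validation, so validation rows never influence the scaling statistics.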

preprocess = ColumnTransformer([
        ('num', StandardScaler(), num_cols)
    ])

pipe = Pipeline([
        ('prep', preprocess),
        ('enet', ElasticNet(max_iter=15000, random_state=42))
    ])

param_grid = {
    'enet__alpha': np.logspace(-3, 1, 20),      # 0.001 → 10
    'enet__l1_ratio': np.linspace(0.1, 0.9, 9)  # Ridge‑heavy → Lasso‑heavy
}
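For reference, the two grid parameters map onto the objective that scikit-learn documents for ElasticNet, written here with ρ standing for `l1_ratio` and n for the number of training rows. The symbols w, X, and y are generic notation, not names from the code above.

```latex
\min_{w}\;
  \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
  \;+\; \alpha\,\rho\,\lVert w \rVert_1
  \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```

So α sets the overall penalty strength, and ρ moves the balance from Ridge-like (ρ near 0) to Lasso-like (ρ near 1).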

6. Train/test split & hyper‑parameter search

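Twenty α values × nine l1‑ratios yield 180 candidate models. Because the rows are time‑ordered, the search uses five expanding‑window splits (TimeSeriesSplit), so each candidate is validated only on hours that follow its training data; the combination with the lowest mean validation RMSE is selected.

from sklearn.model_selection import TimeSeriesSplit

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, shuffle=False)   # keep time order; last 20% is the hold‑out

search = GridSearchCV(pipe, param_grid,
                      cv=TimeSeriesSplit(n_splits=5),   # validate only on later hours
                      scoring='neg_root_mean_squared_error',
                      n_jobs=-1, verbose=1)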
search.fit(X_train, y_train)

print("Best α:", search.best_params_['enet__alpha'])
print("Best l1_ratio:", search.best_params_['enet__l1_ratio'])
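To see why the choice of splitter matters for time-ordered rows, the sketch below compares plain 5-fold K-fold (which is what an integer `cv=5` resolves to for a regressor) with `TimeSeriesSplit` on twelve illustrative rows. The helper `leaks_future` is a hypothetical name introduced only for this example.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

X_demo = np.arange(12).reshape(-1, 1)   # 12 time-ordered rows (illustrative)

def leaks_future(splitter):
    """Return True if any fold trains on rows that come after its validation rows."""
    return any(train.max() > val.min() for train, val in splitter.split(X_demo))

kfold_leaks = leaks_future(KFold(n_splits=5))           # unshuffled K-fold
tss_leaks = leaks_future(TimeSeriesSplit(n_splits=5))   # expanding-window splits

print("KFold trains on the future:", kfold_leaks)
print("TimeSeriesSplit trains on the future:", tss_leaks)
```

With unshuffled K-fold, the early folds are validated on hours that precede most of the training rows, which can make cross-validated error look better than true forward-looking error.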

7. Evaluate on the hold‑out set

y_pred = search.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))   # avoids the deprecated squared=False argument
r2   = r2_score(y_test, y_pred)

print(f"Hold‑out RMSE: ${rmse:,.2f} /MWh | R²: {r2:.3f}")
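A hold-out RMSE is easier to judge next to a naive reference. One common choice, not part of the original workflow, is a persistence forecast that predicts next hour's cost as equal to the current hour's cost. The sketch below uses an invented four-value series; `persistence_rmse` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def persistence_rmse(cost: pd.Series) -> float:
    """RMSE of forecasting each hour's cost as the previous hour's cost."""
    # diff() gives cost[t] - cost[t-1], i.e. the persistence forecast error.
    errors = cost.diff().dropna()
    return float(np.sqrt((errors ** 2).mean()))

toy_cost = pd.Series([10.0, 12.0, 11.0, 13.0])   # illustrative values only
baseline = persistence_rmse(toy_cost)
print(f"Persistence RMSE: {baseline:.3f}")
```

Applied to the hold-out period's `Variable_Cost_USD_MWh` column, this gives a number the Elastic Net RMSE should beat for the model to be adding value.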

8. Interpret coefficients

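The bar chart ranks features by coefficient magnitude, and the signs can be read physically: a negative coefficient on head would mean higher head lowers cost, a negative coefficient on efficiency would mean poorer efficiency raises cost, and a non‑zero Month coefficient may reflect seasonal changes in water rents. Each bar is in $/MWh per original unit of its feature, so compare magnitudes across features with different units with care.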

scales = search.best_estimator_.named_steps['prep'].named_transformers_['num'].scale_
coef   = search.best_estimator_.named_steps['enet'].coef_ / scales

imp = pd.Series(coef, index=num_cols).sort_values(key=abs, ascending=False)
plt.figure(figsize=(8,5))
imp.plot(kind='barh'); plt.gca().invert_yaxis()
plt.title('Elastic Net Coefficients – Cost Drivers')
plt.xlabel('Δ Cost ($/MWh) per unit change'); plt.show()
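Dividing by the scaler's `scale_` converts coefficients from "per standard deviation" to "per original unit". The sketch below checks that on synthetic data with known slopes. It swaps in an unpenalised `LinearRegression` so the recovery is exact; with Elastic Net the same division applies, but the recovered values are shrunk by the penalty.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two synthetic features on very different scales (std 5 and std 20).
X = rng.normal(loc=50.0, scale=[5.0, 20.0], size=(200, 2))
y = 2.0 * X[:, 0] - 0.5 * X[:, 1] + 7.0   # known slopes, no noise

pipe = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

scaled_coef = pipe[-1].coef_             # effect per one standard deviation
raw_coef = scaled_coef / pipe[0].scale_  # effect per one original unit

print("per-std coefficients: ", np.round(scaled_coef, 3))
print("per-unit coefficients:", np.round(raw_coef, 3))
```

The per-unit values match the slopes used to build `y`, while the per-standard-deviation values are the ones that are comparable across features.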

Summary

This notebook demonstrates how an Elastic Net mixed regression model can:

  • Forecast variable generation cost for hydro power one hour in advance, giving dispatchers a cost signal for scheduling and bidding decisions.
  • Handle multicollinearity among hydro‑physics inputs while automatically pruning noise.
  • Provide interpretable coefficients that pinpoint which operational levers (head, flow, efficiency) drive costs the most.

The entire pipeline retrains with a single .fit() when new SCADA data arrives—keeping the model fresh and decision‑ready.


ProjectGurukul Team

