Hydro Power Cost Prediction with Ridge & Lasso Mixed Regression in ML
Hydro plants appear “fuel‑free,” yet their actual production cost varies hour‑to‑hour with head height, water inflow, turbine efficiency, gate position, and start‑up wear. Dispatchers who can forecast this marginal cost ($ / MWh) one step ahead can schedule turbines more profitably and bid smarter in day‑ahead markets.
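However, raw SCADA features are highly collinear: head ≈ reservoir level, flow ≈ gate opening, etc. A pure Ridge model shrinks coefficients but keeps every term, noisy or not; a pure Lasso tends to keep one feature from each correlated group and zero out the rest, and which one survives can change from sample to sample. Elastic Net blends both penalties, yielding a sparse yet stable regression.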
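For reference, the blended penalty can be written out explicitly. The form below is the objective documented for scikit-learn's `ElasticNet`, where $n$ is the number of samples and $\alpha$ the overall penalty strength; the symbols $w$ (coefficient vector) and $\rho$ (the `l1_ratio` mixing parameter) are notation chosen here for illustration.

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
  \;+\; \alpha\,\rho\,\lVert w \rVert_1
  \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```

With $\rho = 1$ only the $\ell_1$ (Lasso) penalty remains; with $\rho = 0$ only the $\ell_2$ (Ridge-style) penalty remains. Values in between trade sparsity against coefficient stability.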
Libraries Required
| Purpose | Library |
| --- | --- |
| Data & time handling | pandas, numpy, datetime |
| Visualisation | matplotlib, seaborn |
| Modelling pipeline | scikit‑learn → ColumnTransformer, StandardScaler, ElasticNet, GridSearchCV, Pipeline, train_test_split |
| Evaluation | mean_squared_error, r2_score |
Dataset Link
Step-by-Step Code Implementation
1. Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score
2. Download and load the dataset
Dataset: hourly SCADA + cost estimates for a run‑of‑river hydro station: head height, turbine flow, gate opening, efficiency, MW output, and historical variable cost per MWh.
# one‑time (needs Kaggle API token):
# kaggle datasets download -d hemantk/hydropower-plant-dataset -p data --unzip
df = pd.read_csv("data/hydropower_generation.csv") # adjust filename if different
3. Quick EDA & target engineering
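Forecasting cost one hour ahead supports look‑ahead bidding. Shifting the cost column up by one row pairs each hour's features with the next hour's cost, so the label is never one of its own predictors. This assumes consecutive rows are exactly one hour apart.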
print(df.head())
# Assume dataset columns: DateTime, Head_m, Flow_m3s, Gate_Open_pct,
# Turbine_Eff_pct, Power_MW, Variable_Cost_USD_MWh
df['DateTime'] = pd.to_datetime(df['DateTime'])
# We’ll predict Variable_Cost_USD_MWh one hour ahead
df = df.sort_values('DateTime')
df['Cost_t+1'] = df['Variable_Cost_USD_MWh'].shift(-1)
df = df.dropna(subset=['Cost_t+1'])
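The plain `shift(-1)` above treats the next row as the next hour, which only holds if the log has no gaps. The sketch below is one possible way to guard against that, assuming the column names listed earlier; the helper name `add_next_hour_target` and the toy values are hypothetical and not part of the original notebook.

```python
import pandas as pd

def add_next_hour_target(frame, time_col="DateTime",
                         cost_col="Variable_Cost_USD_MWh",
                         target_col="Cost_t+1"):
    """Attach the next row's cost as the target, keeping only rows
    whose successor is exactly one hour later."""
    out = frame.sort_values(time_col).reset_index(drop=True).copy()
    out[target_col] = out[cost_col].shift(-1)
    # Time difference to the following row; NaT for the last row.
    step = out[time_col].shift(-1) - out[time_col]
    keep = step == pd.Timedelta(hours=1)  # NaT compares as False
    return out.loc[keep].reset_index(drop=True)

# Toy data with a missing 03:00 reading (values are made up).
toy = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 01:00",
        "2024-01-01 02:00", "2024-01-01 04:00",
    ]),
    "Variable_Cost_USD_MWh": [10.0, 11.0, 12.0, 14.0],
})
labelled = add_next_hour_target(toy)
print(labelled)
```

Rows that precede a gap are dropped rather than labelled with a cost from two or more hours later, at the price of losing a few training samples.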
4. Feature matrix & target
Why Elastic Net? Head_m and Flow_m3s are correlated, as are Gate_Open_pct and Flow_m3s. Elastic Net's ℓ₂ (Ridge) term stabilises the coefficients of correlated features, while its ℓ₁ (Lasso) term drives negligible effects to exactly zero, yielding a concise, robust model.
num_cols = ['Head_m', 'Flow_m3s', 'Gate_Open_pct', 'Turbine_Eff_pct', 'Power_MW']
df['Hour'] = df['DateTime'].dt.hour
df['Month'] = df['DateTime'].dt.month
num_cols += ['Hour', 'Month']
X = df[num_cols]
y = df['Cost_t+1'] # $ / MWh one‑hour‑ahead cost
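The collinearity that motivates Elastic Net can be checked before any model is fitted. The following is a minimal sketch on synthetic data that borrows the column names assumed above; the helper `high_corr_pairs` and the 0.8 threshold are illustrative choices, not part of the original notebook. On the real data, `demo` would be replaced by `df[num_cols]`.

```python
import numpy as np
import pandas as pd

def high_corr_pairs(frame, threshold=0.8):
    """Return (col_a, col_b, r) for feature pairs with |Pearson r| >= threshold,
    strongest pair first."""
    corr = frame.corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if abs(r) >= threshold:
                pairs.append((a, b, float(r)))
    return sorted(pairs, key=lambda p: -abs(p[2]))

# Synthetic stand-in: flow is built to follow gate opening, head is independent.
rng = np.random.default_rng(0)
gate = rng.uniform(20, 100, size=500)
demo = pd.DataFrame({
    "Gate_Open_pct": gate,
    "Flow_m3s": 0.5 * gate + rng.normal(0, 1.0, size=500),
    "Head_m": rng.normal(30, 2, size=500),
})
pairs = high_corr_pairs(demo)
for a, b, r in pairs:
    print(f"{a} vs {b}: r = {r:.3f}")
```

Any pair this reports is a candidate for unstable coefficients under ordinary least squares, which is the situation the ℓ₂ part of the penalty is meant to handle.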
5. Build an Elastic Net pipeline
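Numeric features are z‑scaled before modelling. Wrapping the scaler and the model in one Pipeline means the scaling statistics are re‑estimated on the training portion of every cross‑validation fold, so validation rows never influence preprocessing.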
preprocess = ColumnTransformer([
('num', StandardScaler(), num_cols)
])
pipe = Pipeline([
('prep', preprocess),
('enet', ElasticNet(max_iter=15000, random_state=42))
])
param_grid = {
'enet__alpha': np.logspace(-3, 1, 20), # 0.001 → 10
'enet__l1_ratio': np.linspace(0.1, 0.9, 9) # Ridge‑heavy → Lasso‑heavy
}
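The cost of the search follows directly from the grid. As a quick sanity check, scikit-learn's `ParameterGrid` can enumerate the combinations before anything is fitted; the variable names `n_candidates` and `n_splits` below are illustrative.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same grid as above, repeated so the snippet runs on its own.
param_grid = {
    "enet__alpha": np.logspace(-3, 1, 20),
    "enet__l1_ratio": np.linspace(0.1, 0.9, 9),
}

n_candidates = len(ParameterGrid(param_grid))
n_splits = 5  # number of cross-validation splits used in the search
print(n_candidates, "candidates,", n_candidates * n_splits, "fits plus one final refit")
```

Each candidate is fitted once per split, so widening either axis of the grid multiplies the run time accordingly.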
6. Train/test split & hyper‑parameter search
Twenty α values × nine l1‑ratios yield 180 candidate models; five‑fold CV picks the one with the lowest RMSE.
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, shuffle=False) # keep time order
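from sklearn.model_selection import TimeSeriesSplit
# Forward-chaining splits: every validation block comes after its training rows
search = GridSearchCV(pipe, param_grid,
                      cv=TimeSeriesSplit(n_splits=5),
                      scoring='neg_root_mean_squared_error',
                      n_jobs=-1, verbose=1)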
search.fit(X_train, y_train)
print("Best α:", search.best_params_['enet__alpha'])
print("Best l1_ratio:", search.best_params_['enet__l1_ratio'])
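Because the rows are time-ordered, it matters how the cross-validation folds are cut. An integer such as `cv=5` gives contiguous, unshuffled K-fold splits for a regressor, so some models are validated on rows that precede part of their training data. `TimeSeriesSplit` is one alternative that always validates on later rows. The sketch below only prints the index ranges it produces; `n_rows = 24` is an arbitrary illustrative size.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_rows = 24  # hypothetical number of time-ordered training rows
tscv = TimeSeriesSplit(n_splits=5)
folds = list(tscv.split(np.arange(n_rows)))

for k, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"fold {k}: train 0..{train_idx[-1]}, "
          f"validate {val_idx[0]}..{val_idx[-1]}")
```

The training window grows with each fold, so early folds see less data; that is the usual trade-off for keeping the evaluation strictly forward-looking.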
7. Evaluate on the hold‑out set
y_pred = search.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred)) # portable across scikit-learn versions (squared=False is deprecated)
r2 = r2_score(y_test, y_pred)
print(f"Hold‑out RMSE: ${rmse:,.2f} /MWh | R²: {r2:.3f}")
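A hold-out RMSE is easier to judge next to a naive reference. One common choice for one-step-ahead targets is the persistence forecast, which predicts that next hour's cost equals this hour's cost. The helper `persistence_rmse` below is a hypothetical addition, and the numbers are made up purely to show the call.

```python
import numpy as np

def persistence_rmse(current_cost, next_cost):
    """RMSE of the naive forecast 'next hour's cost equals this hour's cost'."""
    current_cost = np.asarray(current_cost, dtype=float)
    next_cost = np.asarray(next_cost, dtype=float)
    return float(np.sqrt(np.mean((next_cost - current_cost) ** 2)))

# Made-up values: errors are 2, -1, 2, 0.
cost_now = [10.0, 12.0, 11.0, 13.0]
cost_next = [12.0, 11.0, 13.0, 13.0]
baseline = persistence_rmse(cost_now, cost_next)
print(f"Persistence RMSE: {baseline:.2f}")
```

With the variables from this walkthrough, the equivalent call would be `persistence_rmse(df.loc[X_test.index, 'Variable_Cost_USD_MWh'], y_test)`. If the Elastic Net does not beat that figure, the features add little beyond the current cost.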
8. Interpret coefficients
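The bar chart can reveal physical drivers of cost, for example high head lowering cost, poor efficiency raising it, or off‑season months where water rents change. The coefficients are divided by each feature's standard deviation to express them per original unit; because those units differ (metres, m³/s, percent), bar lengths are not directly comparable across features.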
scales = search.best_estimator_.named_steps['prep'].named_transformers_['num'].scale_
coef = search.best_estimator_.named_steps['enet'].coef_ / scales
imp = pd.Series(coef, index=num_cols).sort_values(key=abs, ascending=False)
plt.figure(figsize=(8,5))
imp.plot(kind='barh'); plt.gca().invert_yaxis()
plt.title('Elastic Net Coefficients – Cost Drivers')
plt.xlabel('Δ Cost ($/MWh) per unit change'); plt.show()
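Dividing the coefficients by the scaler's `scale_` converts them from "per standard deviation" to "per original unit". The sketch below checks that conversion on synthetic data by rebuilding the predictions from raw-unit coefficients and a matching intercept. It uses a bare `StandardScaler` step instead of the `ColumnTransformer` above, and all data and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales, with a known linear signal.
rng = np.random.default_rng(1)
X_demo = rng.normal(loc=[50.0, 5.0], scale=[10.0, 0.5], size=(200, 2))
y_demo = 3.0 - 0.2 * X_demo[:, 0] + 4.0 * X_demo[:, 1] + rng.normal(0, 0.1, 200)

model = Pipeline([
    ("scale", StandardScaler()),
    ("enet", ElasticNet(alpha=0.01, l1_ratio=0.5, max_iter=15000)),
]).fit(X_demo, y_demo)

scaler = model.named_steps["scale"]
enet = model.named_steps["enet"]

# Undo the z-scaling: slope per original unit, and the intercept that goes with it.
raw_coef = enet.coef_ / scaler.scale_
raw_intercept = enet.intercept_ - np.sum(enet.coef_ * scaler.mean_ / scaler.scale_)

# Predictions rebuilt from raw-unit terms should match the pipeline's own.
manual_pred = raw_intercept + X_demo @ raw_coef
print(np.allclose(manual_pred, model.predict(X_demo)))
```

Plotting `enet.coef_` without the division would instead show the effect of a one-standard-deviation change, which is the fairer scale for comparing features against each other.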
Summary
This notebook demonstrates how an Elastic Net mixed regression model can:
- Forecast variable generation cost for hydro power one hour in advance, supporting dispatch and bidding decisions.
- Handle multicollinearity among hydro‑physics inputs while automatically pruning noise.
- Provide interpretable coefficients that pinpoint which operational levers (head, flow, efficiency) drive costs the most.
The entire pipeline retrains with a single .fit() when new SCADA data arrives—keeping the model fresh and decision‑ready.