Tidal Power Output Prediction with Ridge & Lasso Mixed Regression in ML

Grid operators require reliable short‑term (≈ 1 h ahead) forecasts of tidal‑stream turbine output to avoid imbalance penalties and to schedule storage or complementary generation.

Yet the SCADA features (instantaneous tide speed, brine head, turbine RPM, blade pitch, yaw angle) are highly collinear. Ordinary least squares becomes unstable under such collinearity, while a pure Lasso model can discard useful but correlated sensors. Elastic Net combines Ridge's ℓ2 penalty, which stabilises the coefficients, with Lasso's ℓ1 penalty, which induces sparsity. The result is a transparent linear model that can keep correlated physical drivers while shrinking uninformative inputs to zero.
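To make the collinearity argument concrete, here is a minimal sketch on synthetic data (not the tutorial's dataset). The two "sensors" x1 and x2, the noise levels, and the alpha values are assumptions chosen purely for illustration. It fits ordinary least squares, Lasso, and Elastic Net to two nearly identical features so the coefficient patterns can be compared.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression

rng = np.random.default_rng(0)
n = 500

# Two almost identical synthetic "sensors" (hypothetical, for illustration only)
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
X = np.column_stack([x1, x2])

# The true signal uses both sensors equally
y = x1 + x2 + 0.1 * rng.normal(size=n)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

print("OLS         coefficients:", ols.coef_)
print("Lasso       coefficients:", lasso.coef_)
print("Elastic Net coefficients:", enet.coef_)
```

In a setup like this, the ℓ2 part of the Elastic Net penalty pulls the two coefficients towards each other, so the weight is shared between the correlated inputs rather than assigned to one of them arbitrarily.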

Our goal: predict the next‑hour plant power (MW) from current SCADA + tidal‑state signals with an Elastic Net pipeline.

Libraries Required

Role	Library
Data handling	pandas, numpy
Visualisation	matplotlib, seaborn
ML workflow	scikit‑learn → ColumnTransformer, StandardScaler, OneHotEncoder, ElasticNet, GridSearchCV, Pipeline, train_test_split
Metrics	mean_squared_error, r2_score

Dataset Link

Provincial Electricity Generation

Step-by-Step Code Implementation

1. Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score

2. Download & load dataset

Hourly provincial tidal generation, paired with tide-range and sea-temperature proxies, stands in for plant-level SCADA data in this demo. We shift generation one step back within each province to create a leak-free "1-hour ahead" label.

# one‑time (needs Kaggle API key):
# kaggle datasets download -d jacobsharples/provincial-energy-production-canada -p data --unzip

df = pd.read_csv("data/provincial_generation.csv", parse_dates=['Date'])
# keep only tidal rows and a few meteorology proxies for demo
tidal = df[df['Type'] == 'Tide']
tidal = tidal[['Date','Province','Generation_MWh','Tide_Range_m','Sea_Temp_C']]
tidal = tidal.dropna().sort_values(['Province','Date'])

3. Supervised target & features

# 1‑hour ahead → here: 1‑step ahead in hourly data
tidal['Target_MWh'] = tidal.groupby('Province')['Generation_MWh'].shift(-1)
tidal = tidal.dropna(subset=['Target_MWh'])

# temporal predictors (kept numeric and scaled via num_cols below, not one-hot encoded)
tidal['Hour']  = tidal['Date'].dt.hour
tidal['Month'] = tidal['Date'].dt.month

y = tidal['Target_MWh']
X = tidal[['Province','Generation_MWh','Tide_Range_m','Sea_Temp_C','Hour','Month']]

cat_cols = ['Province']
num_cols = ['Generation_MWh','Tide_Range_m','Sea_Temp_C','Hour','Month']
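The grouped shift that builds the target can be checked on a toy frame. This is a sketch with made-up province codes and values, not rows from the real dataset. It shows that shift(-1) inside groupby never pulls the next province's first reading into the previous province's last row, and that the last row of each province gets NaN and is then dropped.

```python
import pandas as pd

# Toy data: province codes and values are invented for illustration
toy = pd.DataFrame({
    "Province": ["NS", "NS", "NS", "BC", "BC"],
    "Date": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 02:00",
        "2024-01-01 00:00", "2024-01-01 01:00",
    ]),
    "Generation_MWh": [10.0, 12.0, 11.0, 5.0, 6.0],
}).sort_values(["Province", "Date"])

# Next-step generation within each province becomes the label
toy["Target_MWh"] = toy.groupby("Province")["Generation_MWh"].shift(-1)
print(toy)

# The final row of every province has no "next hour" and is removed
supervised = toy.dropna(subset=["Target_MWh"])
print(len(supervised), "usable rows")
```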

4. Elastic Net pipeline

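Scaling and one-hot encoding are handled inside the pipeline by a ColumnTransformer. The scaler and encoder are therefore fitted only on the training portion of each CV fold, which prevents data leakage and guarantees identical preprocessing at inference time.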

preprocess = ColumnTransformer([
    ('cat', OneHotEncoder(drop='first'), cat_cols),
    ('num', StandardScaler(), num_cols)
])

pipe = Pipeline([
    ('prep', preprocess),
    ('enet', ElasticNet(max_iter=20000, random_state=42))
])

5. Train/test split & CV tuning

alpha sets the overall shrinkage strength; l1_ratio moves the penalty between Ridge-like (near 0, handles collinearity) and Lasso-like (near 1, performs feature selection). A grid of 18 × 9 = 162 hyperparameter combinations is cross-validated to minimise RMSE.
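For reference, the objective that scikit-learn's ElasticNet minimises can be written as follows, where n is the number of training rows, w the coefficient vector, α is alpha, and ρ is l1_ratio (the symbols α and ρ are notation chosen here, not names from the code):

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
\;+\; \alpha\,\rho\,\lVert w \rVert_1
\;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```

Setting ρ = 1 leaves only the ℓ1 term (Lasso), while ρ = 0 leaves only the ℓ2 term (a Ridge-type penalty).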

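# Rows are sorted by Province, then Date, so a positional split would hold out
# whole provinces rather than the most recent period. Split on a date cutoff instead.
cutoff = tidal['Date'].quantile(0.8)          # last ~20 % of the timeline is the test set
train_mask = tidal['Date'] <= cutoff
X_train, X_test = X.loc[train_mask], X.loc[~train_mask]
y_train, y_test = y.loc[train_mask], y.loc[~train_mask]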

param_grid = {
    'enet__alpha'   : np.logspace(-3, 1, 18),   # 0.001 → 10
    'enet__l1_ratio': np.linspace(0.1, 0.9, 9)  # Ridge‑heavy → Lasso‑heavy
}

search = GridSearchCV(pipe, param_grid,
                      cv=5, n_jobs=-1,
                      scoring='neg_root_mean_squared_error', verbose=1)
search.fit(X_train, y_train)

print("Best alpha:", search.best_params_['enet__alpha'])
print("Best l1_ratio:", search.best_params_['enet__l1_ratio'])
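One caveat: cv=5 makes GridSearchCV use ordinary un-shuffled K-fold, so some folds validate on rows that precede part of their training rows. A forward-chaining alternative is scikit-learn's TimeSeriesSplit, sketched below on a stand-in index array. The sketch assumes the training rows are ordered by date, which would require re-sorting this tutorial's data since it is ordered by province first.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Stand-in for date-ordered training rows (assumption: sorted by time)
rows = np.arange(24)

tscv = TimeSeriesSplit(n_splits=5)
folds = list(tscv.split(rows))

for train_idx, val_idx in folds:
    # Every validation block lies strictly after its training block
    print(f"train rows 0..{train_idx.max():2d} -> validate rows {val_idx.min():2d}..{val_idx.max():2d}")
```

Such a splitter can be passed directly as cv=tscv to GridSearchCV in place of the integer.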

6. Evaluation

y_pred = search.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))   # squared=False is deprecated in recent scikit-learn
r2   = r2_score(y_test, y_pred)
print(f"Hold‑out RMSE: {rmse:.1f} MWh | R²: {r2:.3f}")
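A hold-out RMSE is easier to judge against a naive reference. A common one for short-horizon forecasting is persistence: predict that the next hour equals the current hour. The helper below, persistence_rmse, is a hypothetical name introduced here and is not part of the original tutorial; the numbers in the example are invented.

```python
import numpy as np

def persistence_rmse(current, target):
    """RMSE of the naive forecast 'next value = current value'."""
    current = np.asarray(current, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((target - current) ** 2)))

# Invented example: current-hour generation vs. actual next-hour generation
baseline = persistence_rmse([10.0, 12.0, 11.0], [12.0, 11.0, 13.0])
print(f"Persistence RMSE: {baseline:.3f} MWh")
```

On the tutorial's data the equivalent call would be persistence_rmse(X_test['Generation_MWh'], y_test). If the Elastic Net RMSE is not clearly lower than this value, the model adds little beyond the latest reading.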

7. Feature importance

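The coefficient bars can be read against the physics. A positive coefficient on tide range would mean that a larger range raises next-hour generation; a small negative coefficient on sea temperature would be consistent with warmer, slightly less dense water reducing output. Province dummies shrunk to zero indicate that, after controlling for the other features, those provinces behave like the baseline province dropped by the encoder.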

# Retrieve names
ohe = search.best_estimator_.named_steps['prep'].named_transformers_['cat']
ohe_names = ohe.get_feature_names_out(cat_cols)
feature_names = np.hstack([ohe_names, num_cols])

# Rescale numeric coeffs back
scale = search.best_estimator_.named_steps['prep'].named_transformers_['num'].scale_
coef  = search.best_estimator_.named_steps['enet'].coef_.copy()   # copy: the in-place rescale below must not alter the fitted model
coef[-len(num_cols):] = coef[-len(num_cols):] / scale

imp = pd.Series(coef, index=feature_names).sort_values(key=abs, ascending=False)
plt.figure(figsize=(8,5))
imp.head(12).plot(kind='barh'); plt.gca().invert_yaxis()
plt.title('Elastic Net Coefficients – 1 h Ahead Tidal MW Drivers')
plt.xlabel('Δ MWh'); plt.tight_layout(); plt.show()
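The division by the scaler's scale_ deserves a quick check. The sketch below uses a single synthetic feature (its mean, spread, and the alpha value are arbitrary assumptions). It verifies that a standardised coefficient divided by the feature's standard deviation equals the change in prediction for a one-unit change in the raw feature.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# One synthetic feature in "raw" units (values are arbitrary)
X = rng.normal(loc=3.0, scale=2.5, size=(200, 1))
y = 4.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

model = Pipeline([
    ("scale", StandardScaler()),
    ("enet", ElasticNet(alpha=0.01, l1_ratio=0.5)),
]).fit(X, y)

# Copy before rescaling so the fitted estimator keeps its own coefficients
std_coef = model.named_steps["enet"].coef_.copy()
raw_coef = std_coef / model.named_steps["scale"].scale_

# Prediction change for +1 raw unit should match the rescaled coefficient
x0 = np.array([[3.0]])
delta = model.predict(x0 + 1.0)[0] - model.predict(x0)[0]
print(f"rescaled coef: {raw_coef[0]:.4f} | prediction change per unit: {delta:.4f}")
```

Note that the one-hot coefficients need no rescaling because the dummies are not standardised.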

Summary

This mixed‑regression (Elastic Net) pipeline delivers:

1-hour-ahead tidal power forecasts from a single, transparent linear formula.
Balanced sparsity and robustness: it keeps correlated but critical predictors while shrinking noise to zero.
Straightforward retraining: append new SCADA rows, call .fit(), and the entire pipeline refreshes.

Deploying such a model helps grid schedulers cut balancing costs and gives plant operators physics‑driven insight into what really pushes tidal megawatts up or down.
