Tidal Power Output Prediction with Ridge & Lasso Mixed Regression in ML
Grid operators require reliable short‑term (≈ 1 h ahead) forecasts of tidal‑stream turbine output to avoid imbalance penalties and to schedule storage or complementary generation.
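Yet SCADA features (instantaneous tide speed, hydraulic head, turbine RPM, blade pitch, yaw angle) are highly collinear. Ordinary least squares becomes unstable on such inputs, while a pure Lasso model tends to keep one sensor from a correlated group and discard the others, even when they carry useful signal. Elastic Net combines Ridge's ℓ² penalty, which stabilises the coefficients of correlated inputs, with Lasso's ℓ¹ penalty, which sets weak coefficients exactly to zero. The result is a transparent model that keeps the essential physics while zeroing out noisy predictors.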
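To see why the ℓ² term matters, here is a small synthetic sketch (not the tutorial's dataset): two hypothetical sensors measure the same tide speed with tiny independent noise. Ordinary least squares pins down only the sum of their weights, so the split between them is driven by noise, whereas Elastic Net's ℓ² term pulls the two coefficients toward a near-equal share (slightly shrunk toward zero). The alpha and l1_ratio values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression

rng = np.random.default_rng(0)
n = 200

# Hypothetical setup: one true driver observed by two almost identical sensors
tide_speed = rng.normal(size=n)
sensor_a = tide_speed + rng.normal(scale=0.01, size=n)
sensor_b = tide_speed + rng.normal(scale=0.01, size=n)
X = np.column_stack([sensor_a, sensor_b])
y = 3.0 * tide_speed + rng.normal(scale=0.5, size=n)

ols = LinearRegression().fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

# OLS: the sum of the two weights is well determined, their split is not
print("OLS coefficients:        ", ols.coef_.round(2))
# Elastic Net: the weight is shared almost evenly between the twin sensors
print("Elastic Net coefficients:", enet.coef_.round(2))
```

Rerunning with other seeds typically changes the OLS split noticeably while the Elastic Net pair stays close together, which is the stability the ℓ² term buys on collinear SCADA channels.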
Our goal: predict the next‑hour plant power (MW) from current SCADA + tidal‑state signals with an Elastic Net pipeline.
Libraries Required
| Role | Library |
|---|---|
| Data handling | pandas, numpy |
| Visualisation | matplotlib, seaborn |
| ML workflow | scikit-learn → ColumnTransformer, StandardScaler, OneHotEncoder, ElasticNet, GridSearchCV, Pipeline, train_test_split |
| Metrics | scikit-learn → mean_squared_error, r2_score |
Dataset Link
Provincial Electricity Generation (Kaggle: jacobsharples/provincial-energy-production-canada)
Step-by-Step Code Implementation
1. Import Libraries
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score
```
2. Download & load dataset
We keep hourly provincial tidal generation paired with tide-range and sea-temperature proxies; in step 3, generation is shifted one step per province to create a leak-free "1-hour-ahead" label.
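```python
# One-time download (needs a Kaggle API key):
# kaggle datasets download -d jacobsharples/provincial-energy-production-canada -p data --unzip
df = pd.read_csv("data/provincial_generation.csv", parse_dates=['Date'])

# Keep only tidal rows and a few meteorology proxies for the demo
tidal = df.loc[df['Type'] == 'Tide',
               ['Date', 'Province', 'Generation_MWh', 'Tide_Range_m', 'Sea_Temp_C']].copy()

# Sort by Date first so the later shuffle=False split is genuinely chronological;
# rows within each province stay in time order, so groupby().shift() still works
tidal = tidal.dropna().sort_values(['Date', 'Province']).reset_index(drop=True)
```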
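The groupby shift(-1) in step 3 assumes each province's rows are consecutive hours. If dropna() or gaps in the source remove an hour, the "next row" can lie two or more hours ahead. A minimal sketch of one safeguard, assuming at most one row per province and hour: reindex each province onto a complete hourly range before shifting, then drop every row with a missing feature or target. to_hourly_grid is a hypothetical helper and the demo rows are synthetic.

```python
import pandas as pd

def to_hourly_grid(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: reindex every province onto a complete hourly range.

    Missing hours become NaN rows instead of vanishing, so a later
    groupby('Province').shift(-1) always looks exactly one hour ahead.
    Assumes at most one row per (Province, Date).
    """
    pieces = []
    for province, grp in frame.groupby('Province'):
        grp = grp.set_index('Date').sort_index()
        hourly = pd.date_range(grp.index.min(), grp.index.max(),
                               freq=pd.Timedelta(hours=1))
        grp = grp.reindex(hourly)
        grp['Province'] = province  # fill the label on inserted rows
        pieces.append(grp.rename_axis('Date').reset_index())
    return pd.concat(pieces, ignore_index=True)

# Synthetic demo: the 02:00 reading is missing
demo = pd.DataFrame({
    'Date': pd.to_datetime(['2024-01-01 00:00', '2024-01-01 01:00',
                            '2024-01-01 03:00']),
    'Province': ['NS', 'NS', 'NS'],
    'Generation_MWh': [10.0, 12.0, 9.0],
})
grid = to_hourly_grid(demo)
grid['Target_MWh'] = grid.groupby('Province')['Generation_MWh'].shift(-1)
clean = grid.dropna()  # drops the gap row and rows whose next hour is missing
print(clean)
```

Without the grid, the 01:00 row would receive the 03:00 value as its "1-hour-ahead" target; with it, that row is dropped instead of being mislabelled.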
3. Supervised target & features
```python
# 1-hour ahead → here: next row within each province of hourly data
tidal['Target_MWh'] = tidal.groupby('Province')['Generation_MWh'].shift(-1)
tidal = tidal.dropna(subset=['Target_MWh'])

# Calendar predictors (treated as numeric and scaled below, not one-hot encoded)
tidal['Hour'] = tidal['Date'].dt.hour
tidal['Month'] = tidal['Date'].dt.month

y = tidal['Target_MWh']
X = tidal[['Province', 'Generation_MWh', 'Tide_Range_m', 'Sea_Temp_C', 'Hour', 'Month']]

cat_cols = ['Province']
num_cols = ['Generation_MWh', 'Tide_Range_m', 'Sea_Temp_C', 'Hour', 'Month']
```
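Because Hour and Month sit in num_cols, StandardScaler treats them as linear quantities: hour 23 ends up far from hour 0, and the model can fit only a single slope across the day. If categorical treatment is preferred, a minimal sketch of the alternative column split follows; the demo frame is synthetic and its column names simply mirror this tutorial's.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Alternative split: calendar fields go through OneHotEncoder, not StandardScaler
cat_cols = ['Province', 'Hour', 'Month']
num_cols = ['Generation_MWh', 'Tide_Range_m', 'Sea_Temp_C']

preprocess = ColumnTransformer([
    ('cat', OneHotEncoder(drop='first'), cat_cols),
    ('num', StandardScaler(), num_cols),
])

# Synthetic demo rows
demo = pd.DataFrame({
    'Province': ['NS', 'NB', 'NS', 'NB'],
    'Hour': [0, 1, 2, 3],
    'Month': [1, 1, 2, 2],
    'Generation_MWh': [10.0, 12.0, 9.0, 11.0],
    'Tide_Range_m': [3.1, 4.0, 2.5, 3.6],
    'Sea_Temp_C': [4.0, 4.2, 3.9, 4.1],
})
Xt = preprocess.fit_transform(demo)
# 1 Province + 3 Hour + 1 Month dummies (first level dropped) + 3 numeric columns
print(Xt.shape)
```

Step 7 derives its feature names from cat_cols and num_cols, so it keeps working as long as those two lists are the updated ones.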
4. Elastic Net pipeline
Scaling and one-hot encoding live inside the pipeline's ColumnTransformer, so during cross-validation they are re-fitted on each training fold only. This prevents data leakage and guarantees identical preprocessing at inference time.
```python
# One-hot encode the province (first level dropped as baseline), standardise numerics
preprocess = ColumnTransformer([
    ('cat', OneHotEncoder(drop='first'), cat_cols),
    ('num', StandardScaler(), num_cols)
])

pipe = Pipeline([
    ('prep', preprocess),
    ('enet', ElasticNet(max_iter=20000, random_state=42))
])
```
5. Train/test split & CV tuning
alpha sets the overall shrinkage strength; l1_ratio slides between Ridge-like behaviour (handles collinearity) and Lasso-like behaviour (feature selection). All 18 × 9 = 162 hyperparameter combinations are cross-validated to minimise RMSE.
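For reference, scikit-learn's ElasticNet documentation gives the objective it minimises as follows, with n samples, weight vector w, and r standing for l1_ratio (the intercept is not penalised):

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
\;+\; \alpha\, r\, \lVert w \rVert_1
\;+\; \frac{\alpha\,(1 - r)}{2}\, \lVert w \rVert_2^2
```

Setting r = 1 recovers the Lasso and r = 0 a pure Ridge penalty, so the grid's 0.1 to 0.9 range stays strictly between the two, while alpha scales both penalties together.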
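```python
from sklearn.model_selection import TimeSeriesSplit

# Hold out the most recent 20 % of rows (requires X/y ordered by Date)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False)  # preserve chronology

param_grid = {
    'enet__alpha'   : np.logspace(-3, 1, 18),   # 0.001 → 10
    'enet__l1_ratio': np.linspace(0.1, 0.9, 9)  # Ridge-heavy → Lasso-heavy
}

# TimeSeriesSplit keeps every validation fold after its training window
search = GridSearchCV(pipe, param_grid,
                      cv=TimeSeriesSplit(n_splits=5), n_jobs=-1,
                      scoring='neg_root_mean_squared_error', verbose=1)
search.fit(X_train, y_train)

print("Best alpha:", search.best_params_['enet__alpha'])
print("Best l1_ratio:", search.best_params_['enet__l1_ratio'])
print("CV RMSE:", -search.best_score_)
```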
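For chronologically ordered rows, scikit-learn's TimeSeriesSplit keeps each validation fold strictly after its training window, whereas an unshuffled KFold (what an integer cv gives for a regressor) lets early folds train on later hours. The sketch below prints the fold boundaries on 12 synthetic, time-ordered rows; fold_bounds is a hypothetical helper.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

def fold_bounds(splitter, X):
    """Return (train_start, train_end, val_start, val_end) row indices per fold."""
    return [(int(tr.min()), int(tr.max()), int(va.min()), int(va.max()))
            for tr, va in splitter.split(X)]

X_demo = np.arange(12).reshape(-1, 1)  # 12 consecutive hours; row index = time

for name, splitter in [("KFold", KFold(n_splits=3)),
                       ("TimeSeriesSplit", TimeSeriesSplit(n_splits=3))]:
    print(name, fold_bounds(splitter, X_demo))
# KFold           → [(4, 11, 0, 3), (0, 11, 4, 7), (0, 7, 8, 11)]
# TimeSeriesSplit → [(0, 2, 3, 5), (0, 5, 6, 8), (0, 8, 9, 11)]
```

In the KFold output the first fold validates on hours 0 to 3 after training on hours 4 to 11, i.e. it "forecasts" the past; TimeSeriesSplit never does.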
6. Evaluation
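```python
y_pred = search.predict(X_test)
# RMSE as the square root of MSE (the squared=False flag is deprecated since scikit-learn 1.4)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"Hold-out RMSE: {rmse:.1f} MWh | R²: {r2:.3f}")
```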
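Because the current hour's Generation_MWh is itself a predictor, a high R² may largely reflect persistence (the next hour resembling this one). A quick sanity check, sketched here as an assumption rather than part of the original workflow, compares the hold-out RMSE with the naive forecast "next hour = this hour". persistence_rmse and skill_score are hypothetical helpers; in this tutorial they would be called as persistence_rmse(X_test['Generation_MWh'], y_test) and skill_score(rmse, baseline).

```python
import numpy as np

def persistence_rmse(current, next_value) -> float:
    """RMSE of the naive forecast that next hour's output equals this hour's."""
    current = np.asarray(current, dtype=float)
    next_value = np.asarray(next_value, dtype=float)
    return float(np.sqrt(np.mean((next_value - current) ** 2)))

def skill_score(model_rmse: float, baseline_rmse: float) -> float:
    """1 - model/baseline: positive when the model beats the baseline."""
    return 1.0 - model_rmse / baseline_rmse

# Synthetic check: errors of 2, -3, 2, -1 MWh give RMSE = sqrt(4.5)
base = persistence_rmse([10, 12, 9, 11], [12, 9, 11, 10])
print(round(base, 2))          # 2.12
print(skill_score(1.5, 3.0))   # 0.5
```

A skill score near zero would mean the Elastic Net adds little beyond carrying the current output forward.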
7. Feature importance
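Coefficient bars offer actionable physics: in this model, a higher tide range raises next-hour output, while warmer surface water marginally reduces it (warmer seawater is slightly less dense, and turbine power scales with water density). Province dummies shrunk to exactly zero indicate that, once the other predictors are controlled for, those provinces behave like the baseline (dropped) province.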
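The density argument rests on the standard kinetic-power relation for a hydrokinetic turbine, with ρ_w the seawater density, A the rotor swept area, C_p the power coefficient and v the flow speed:

```latex
P = \tfrac{1}{2}\,\rho_w\,A\,C_p\,v^{3}
\qquad\Rightarrow\qquad
\frac{\Delta P}{P} \approx \frac{\Delta \rho_w}{\rho_w}
\quad\text{(at fixed } A,\ C_p,\ v\text{)}
```

Seawater density changes by only a fraction of a percent over typical temperature ranges, so a small sea-temperature coefficient is the physically plausible outcome, while the cubic dependence on v is why inputs tied to flow speed, such as tide range, can dominate.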
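```python
best = search.best_estimator_
prep = best.named_steps['prep']

# Feature names in ColumnTransformer output order: one-hot columns, then numerics
ohe = prep.named_transformers_['cat']
ohe_names = ohe.get_feature_names_out(cat_cols)
feature_names = np.hstack([ohe_names, num_cols])

# Copy so the fitted model's coef_ is not modified in place
coef = best.named_steps['enet'].coef_.copy()

# Rank on the fitted scale: numeric coefficients are per 1 SD of the input,
# dummy coefficients are the shift relative to the baseline province
imp = pd.Series(coef, index=feature_names).sort_values(key=abs, ascending=False)

# Per-original-unit effects for numeric inputs (e.g. MWh per metre of tide range)
scale = prep.named_transformers_['num'].scale_
per_unit = pd.Series(coef[-len(num_cols):] / scale, index=num_cols)
print(per_unit)

plt.figure(figsize=(8, 5))
imp.head(12).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title('Elastic Net Coefficients – 1 h Ahead Tidal MW Drivers')
plt.xlabel('Δ MWh (numeric: per 1 SD; dummies: vs. baseline province)')
plt.tight_layout()
plt.show()
```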
Summary
This mixed‑regression (Elastic Net) pipeline delivers:
- 1-hour-ahead tidal power forecasts from a single, transparent linear formula.
- Balanced sparsity and robustness: it keeps correlated but important predictors while discarding noisy ones.
- Straightforward retraining: append new SCADA rows, call .fit(), and the entire pipeline (encoding, scaling, model) is refit.
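A minimal sketch of that refresh, assuming you keep the hyperparameters chosen by the grid search rather than re-tuning: sklearn.base.clone copies the pipeline's settings without its fitted state, so scaler statistics and coefficients (and, in the full pipeline, one-hot categories) are re-learned on old plus new rows. refit_with_new_rows is a hypothetical helper and the data are synthetic; in the tutorial, search.best_estimator_ would play the role of tuned.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

def refit_with_new_rows(tuned_pipeline, X_old, y_old, X_new, y_new):
    """Hypothetical helper: same hyperparameters, fitted from scratch on all rows."""
    model = clone(tuned_pipeline)  # copies settings, drops fitted state
    X_all = pd.concat([X_old, X_new], ignore_index=True)
    y_all = pd.concat([y_old, y_new], ignore_index=True)
    return model.fit(X_all, y_all)

# Synthetic demo data standing in for historical and freshly arrived SCADA rows
rng = np.random.default_rng(1)
X_old = pd.DataFrame({'Tide_Range_m': rng.uniform(1, 5, 50)})
y_old = 2.0 * X_old['Tide_Range_m'] + rng.normal(0, 0.1, 50)
X_new = pd.DataFrame({'Tide_Range_m': rng.uniform(1, 5, 10)})
y_new = 2.0 * X_new['Tide_Range_m'] + rng.normal(0, 0.1, 10)

tuned = Pipeline([('scale', StandardScaler()),
                  ('enet', ElasticNet(alpha=0.01, l1_ratio=0.5))])
tuned.fit(X_old, y_old)

refreshed = refit_with_new_rows(tuned, X_old, y_old, X_new, y_new)
print(refreshed.named_steps['enet'].alpha,
      refreshed.named_steps['scale'].n_samples_seen_)
```

Cloning rather than refitting tuned in place leaves the currently deployed model untouched until the refreshed one has been checked.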
Deploying such a model helps grid schedulers cut balancing costs and gives plant operators physics‑driven insight into what really pushes tidal megawatts up or down.