Wind Farm Output Prediction with Ridge & Lasso Mixed Regression in ML
Wind‐farm operators need to forecast the power output of individual turbines (or an entire farm) up to the next 24 hours—before commitment to the grid—using early‐hour inputs such as wind speed, wind direction, ambient temperature, and turbine operating parameters. Power curves exhibit nonlinear behaviour (e.g., cut‐in/cut‐out thresholds, rated‐power plateau), and different covariates may dominate under various conditions (e.g., temperature effects at high speeds). A pure Lasso model may over‐shrink significant nonlinear effects, while a pure Ridge model may fail to zero out irrelevant sensor noise. By applying ElasticNet (mixed ℓ₁+ℓ₂) regression, we gain both sparsity (variable selection) and stability (coefficient shrinkage), yielding robust, interpretable forecasts of wind‐farm output.
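To make the nonlinear shape of a power curve concrete, here is a minimal sketch of an idealised curve with a cut‑in threshold, a cubic ramp, a rated‑power plateau, and a cut‑out threshold. The function name and every numeric threshold are hypothetical placeholders chosen for illustration; they are not taken from the dataset or from any particular turbine.

```python
import numpy as np

def idealized_power_curve(wind_speed, cut_in=3.0, rated_speed=12.0,
                          cut_out=25.0, rated_power=2000.0):
    """Idealised turbine power (kW) versus wind speed (m/s).

    All default thresholds are hypothetical, for illustration only.
    """
    v = np.asarray(wind_speed, dtype=float)
    # Cubic ramp between cut-in and rated speed
    ramp = rated_power * (v**3 - cut_in**3) / (rated_speed**3 - cut_in**3)
    # Plateau at rated power once the rated speed is reached
    power = np.where(v < rated_speed, ramp, rated_power)
    # No output below cut-in or at/above cut-out
    power = np.where((v < cut_in) | (v >= cut_out), 0.0, power)
    return power

speeds = np.array([2.0, 3.0, 8.0, 12.0, 20.0, 26.0])
curve = idealized_power_curve(speeds)
```

A linear model fed only raw wind speed cannot reproduce the flat regions of such a curve, which is why the choice of input features matters as much as the choice of penalty.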
Step-by-Step Code Implementation
1. Import Libraries
import pandas as pd  # data I/O
import numpy as np  # numerics
import matplotlib.pyplot as plt  # plotting
import seaborn as sns  # visualization
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score
2. Load Data & Preprocessing
import pandas as pd
# Load SCADA data
df = pd.read_csv("data/wind-turbine-scada-dataset/Wind_Turbine_01.csv")
# Select relevant features and target
features = ["Wind Speed (m/s)", "Wind Direction (°)", "Ambient Temperature (°C)"]
target = "Target Power (kW)"
# Drop rows with missing values before building X and y,
# so the cleaned rows are the ones actually used for modelling
df = df.dropna(subset=features + [target])
X = df[features]
y = df[target]
3. Train/Test Split & Scaling
- We use a chronological split (first 80% as training) to avoid leakage from future records.
- StandardScaler: centres each predictor to zero mean and scales it to unit variance so the ElasticNet penalty treats all predictors uniformly. The scaler is fitted on the training rows only.
from sklearn.preprocessing import StandardScaler
# Chronological split to avoid look‑ahead (80% train, 20% test)
split_idx = int(len(X) * 0.8)
X_train, X_test = X.iloc[:split_idx], X.iloc[split_idx:]
y_train, y_test = y.iloc[:split_idx], y.iloc[split_idx:]
# Standardize features, fitting the scaler on the training rows only
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
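A positional split is only chronological if the rows are already in time order. The sketch below sorts by a timestamp column before splitting. The column name `Timestamp`, the helper `chronological_split`, and the demo values are all assumptions made for illustration, since the actual column names in the SCADA file may differ.

```python
import pandas as pd

def chronological_split(frame, time_col, train_frac=0.8):
    """Sort by time, then cut at train_frac (hypothetical helper)."""
    ordered = frame.sort_values(time_col).reset_index(drop=True)
    split_idx = int(len(ordered) * train_frac)
    return ordered.iloc[:split_idx], ordered.iloc[split_idx:]

# Tiny made-up frame with rows deliberately out of order
demo = pd.DataFrame({
    "Timestamp": pd.to_datetime(
        ["2021-01-03", "2021-01-01", "2021-01-05", "2021-01-02", "2021-01-04"]
    ),
    "Wind Speed (m/s)": [7.1, 5.2, 9.8, 6.0, 8.4],
})
train_demo, test_demo = chronological_split(demo, "Timestamp")
```

After sorting, every training timestamp precedes every test timestamp, which is the property the split relies on.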
4. Build ElasticNet Pipeline & Hyperparameter Search
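- ElasticNet:
  - α (overall penalty) sets the total regularisation strength: larger values shrink coefficients more, trading variance for bias.
  - l1_ratio (0→1) blends the two penalties: 0 is pure Ridge (ℓ₂, coefficient shrinkage) and 1 is pure Lasso (ℓ₁, sparsity).
- GridSearchCV: tunes 10 log‑spaced α values in [0.01, 10] and 6 evenly spaced l1_ratio values in [0, 1] with 5‑fold CV, selecting the combination with the lowest RMSE on held‑out folds (scikit‑learn maximises the negated RMSE).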
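For reference, the objective that scikit‑learn's ElasticNet minimises can be written as follows, where $n$ is the number of training rows, $w$ the coefficient vector, and $\rho$ stands for `l1_ratio` (the symbol $\rho$ is notation introduced here, not used elsewhere in this article):

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
\;+\; \alpha\,\rho\,\lVert w \rVert_1
\;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```

Setting $\rho = 1$ leaves only the ℓ₁ term (Lasso), and $\rho = 0$ leaves only the ℓ₂ term (Ridge).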
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV
import numpy as np
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("enet", ElasticNet(max_iter=5000, random_state=42))
])
param_grid = {
    "enet__alpha": np.logspace(-2, 1, 10),   # penalty strength
    "enet__l1_ratio": np.linspace(0, 1, 6)   # mix between L1 and L2
}
gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
    n_jobs=-1, verbose=1
)
# Fit on the training rows only, so the test set stays unseen during tuning
gs.fit(X_train, y_train)
print("Best params:", gs.best_params_)
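With `cv=5`, scikit‑learn uses plain K‑fold splitting, so some validation folds precede part of their training data in time. One possible alternative, sketched here on synthetic data, is to pass `TimeSeriesSplit` so each validation fold always follows its training rows. The arrays `X_demo` and `y_demo`, the reduced grid, and all `demo_*` names are made up for illustration and assume rows are in time order.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for time-ordered training data (not real SCADA data)
rng = np.random.default_rng(0)
n = 300
X_demo = rng.normal(size=(n, 3))
y_demo = 50.0 * X_demo[:, 0] + rng.normal(scale=5.0, size=n)

demo_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("enet", ElasticNet(max_iter=5000)),
])
demo_grid = {
    "enet__alpha": np.logspace(-2, 1, 4),
    "enet__l1_ratio": [0.2, 0.5, 0.8],
}

# Each split trains on an initial segment and validates on the rows after it
tscv = TimeSeriesSplit(n_splits=5)
demo_gs = GridSearchCV(
    demo_pipe, demo_grid, cv=tscv,
    scoring="neg_root_mean_squared_error",
)
demo_gs.fit(X_demo, y_demo)

# Check that no validation row precedes a training row in any fold
folds_are_ordered = all(
    train_idx.max() < val_idx.min()
    for train_idx, val_idx in tscv.split(X_demo)
)
```

The trade‑off is that early folds train on fewer rows, so their scores tend to be noisier than with ordinary K‑fold.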
5. Evaluate on Test Set
RMSE measures the typical prediction error in kW, penalising large misses more heavily; R² is the share of variance in test‑set power that the model explains.
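import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
y_pred = gs.predict(X_test)
# The `squared` argument is deprecated in newer scikit-learn releases,
# so take the square root of the MSE explicitly
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"Test RMSE: {rmse:.2f} kW")
print(f"Test R² : {r2:.3f}")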
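As a cross‑check on what the two metrics compute, they can also be written out directly in NumPy. This is a small sketch with made‑up numbers; the helper name `rmse_and_r2` is hypothetical.

```python
import numpy as np

def rmse_and_r2(y_true, y_hat):
    """Return (RMSE, R²) computed from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    residuals = y_true - y_hat
    rmse = np.sqrt(np.mean(residuals**2))
    ss_res = np.sum(residuals**2)                    # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean())**2)     # total variation
    return rmse, 1.0 - ss_res / ss_tot

# Made-up power values in kW, each prediction off by 10 kW
demo_rmse, demo_r2 = rmse_and_r2([100.0, 200.0, 300.0],
                                 [110.0, 190.0, 310.0])
```

Here every prediction misses by 10 kW, so the RMSE is 10 kW, while R² stays close to 1 because the misses are small relative to the spread of the true values.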
6. Inspect Model Coefficients
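Coefficient inspection: because the pipeline standardises every predictor, coefficient magnitudes are directly comparable. Coefficients at or near zero mark less‑informative features, while larger magnitudes highlight the main drivers of turbine output.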
import pandas as pd
import matplotlib.pyplot as plt
coef = gs.best_estimator_.named_steps["enet"].coef_
imp = pd.Series(coef, index=features).sort_values()
plt.figure(figsize=(6,4))
imp.plot(kind="barh")
plt.title("ElasticNet Coefficients")
plt.xlabel("Coefficient Value")
plt.tight_layout()
plt.show()
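To see the sparsity effect in isolation, the following sketch fits ElasticNet to synthetic data in which only one of three standardised features carries signal. The feature names, the penalty settings, and the data are all invented for illustration and say nothing about the actual SCADA results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

# Synthetic data: one informative column, two pure-noise columns
rng = np.random.default_rng(42)
n = 1000
names = ["informative", "noise_a", "noise_b"]
X_syn = rng.normal(size=(n, 3))
y_syn = 10.0 * X_syn[:, 0] + rng.normal(scale=1.0, size=n)

X_syn_s = StandardScaler().fit_transform(X_syn)
model = ElasticNet(alpha=1.0, l1_ratio=0.9, max_iter=5000).fit(X_syn_s, y_syn)

coef_syn = pd.Series(model.coef_, index=names)
# Features whose coefficients the L1 term has driven exactly to zero
zeroed = coef_syn[coef_syn == 0.0].index.tolist()
```

The informative coefficient is shrunk below its true value of 10, which illustrates the bias that regularisation introduces in exchange for dropping the noise columns.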
Summary
By applying ElasticNet—a mixed ℓ₁+ℓ₂ regression approach—to wind‐turbine SCADA data, operators gain:
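- Stable forecasts of short‐term power output, with accuracy checked by RMSE and R² on a held‐out, chronologically later test set.
- Automatic feature selection, with the ℓ₁ term able to zero out uninformative inputs. Because the model is linear in its inputs, nonlinear power‐curve effects are captured only if suitably transformed features are supplied.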
- Interpretability, with coefficient magnitudes revealing which environmental factors (e.g., wind speed vs. temperature) most influence output—guiding maintenance and operational strategies.