Wind Farm Output Prediction with Ridge & Lasso Mixed Regression in ML

Wind‐farm operators need to forecast the power output of individual turbines (or an entire farm) for up to the next 24 hours, before committing that output to the grid, using early‐hour inputs such as wind speed, wind direction, ambient temperature, and turbine operating parameters. Power curves exhibit nonlinear behaviour (e.g., cut‐in/cut‐out thresholds, rated‐power plateau), and different covariates may dominate under different conditions (e.g., temperature effects at high speeds). A pure Lasso model may over‐shrink genuine effects and tends to keep only one of several correlated predictors, while a pure Ridge model never sets a coefficient exactly to zero, so irrelevant sensor noise stays in the model. ElasticNet (mixed ℓ₁+ℓ₂) regression combines both penalties: the ℓ₁ term gives sparsity (variable selection) and the ℓ₂ term gives stability (coefficient shrinkage), yielding robust, interpretable forecasts of wind‐farm output.
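To make the nonlinearity concrete, here is a minimal sketch of an idealised power curve. The thresholds (3 m/s cut‐in, 12 m/s rated speed, 25 m/s cut‐out, 2000 kW rated power) and the cubic ramp are illustrative assumptions, not values taken from the SCADA dataset.

```python
import numpy as np

def idealized_power_curve(wind_speed, cut_in=3.0, rated_speed=12.0,
                          cut_out=25.0, rated_power=2000.0):
    """Piecewise power curve in kW; every threshold here is an assumed value."""
    v = np.asarray(wind_speed, dtype=float)
    # Cubic ramp between cut-in and rated speed, scaled to reach rated_power
    ramp = rated_power * (v**3 - cut_in**3) / (rated_speed**3 - cut_in**3)
    # Plateau at rated power once the rated speed is reached
    power = np.where(v < rated_speed, ramp, rated_power)
    # No output below cut-in or at/above cut-out
    power = np.where((v < cut_in) | (v >= cut_out), 0.0, power)
    return power

speeds = np.array([2.0, 3.0, 8.0, 12.0, 20.0, 26.0])
print(idealized_power_curve(speeds))
```

A single straight line in wind speed cannot follow the flat regions and the ramp at once, which is why a linear model such as ElasticNet only approximates a curve of this shape.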

Dataset Link

Wind Turbine SCADA

Step-by-Step Code Implementation

1. Import Libraries

import pandas as pd                              # data I/O  
import numpy as np                               # numerics  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import StandardScaler  
from sklearn.linear_model import ElasticNet  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score

2. Load Data & Preprocessing

import pandas as pd

# Load SCADA data
df = pd.read_csv("data/wind-turbine-scada-dataset/Wind_Turbine_01.csv")

# Select relevant features and target
features = ["Wind Speed (m/s)", "Wind Direction (°)", "Ambient Temperature (°C)"]
target = "Target Power (kW)"

# Drop rows with missing values BEFORE building X and y,
# so the cleaned rows are the ones the model actually sees
df = df.dropna(subset=features + [target]).reset_index(drop=True)

X = df[features]
y = df[target]

3. Train/Test Split & Scaling

We use a chronological split (first 80% as training) to avoid leakage from future records.
StandardScaler: Zero‑means and unit‑scales all predictors so the ElasticNet penalty treats them uniformly.

from sklearn.preprocessing import StandardScaler

# Chronological split to avoid look‑ahead (80% train, 20% test);
# this assumes the rows are already sorted by timestamp
split_idx = int(len(df) * 0.8)
X_train, X_test = X.iloc[:split_idx], X.iloc[split_idx:]
y_train, y_test = y.iloc[:split_idx], y.iloc[split_idx:]

# Standardize features using statistics from the training rows only.
# These arrays are for inspection; the pipeline in the next step fits its own scaler.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s  = scaler.transform(X_test)

4. Build ElasticNet Pipeline & Hyperparameter Search

ElasticNet:
- α (overall penalty) controls total regularisation strength, balancing bias‐variance.
- l1_ratio (0→1) blends Lasso (ℓ₁) for sparsity and Ridge (ℓ₂) for coefficient shrinkage.
GridSearchCV: tunes α ∈ [0.01, 10] and l1_ratio ∈ [0,1] over 5‑fold CV, optimising RMSE on held‑out folds.
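For reference, the objective that scikit-learn documents for ElasticNet can be written as follows, with n the number of training rows, w the coefficient vector, α the `alpha` parameter and ρ the `l1_ratio` parameter:

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
\;+\; \alpha\,\rho\,\lVert w \rVert_1
\;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```

Setting ρ = 1 leaves only the ℓ₁ term (Lasso), while ρ = 0 leaves only the ℓ₂ term (a Ridge‐type penalty).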

from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV
import numpy as np

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("enet", ElasticNet(max_iter=5000, random_state=42))
])

param_grid = {
    "enet__alpha": np.logspace(-2, 1, 10),    # penalty strength
    "enet__l1_ratio": np.linspace(0, 1, 6)    # mix between L1 and L2
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)   # fit on the training window only, so the test set stays unseen
print("Best params:", gs.best_params_)
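Because the records are time‑ordered, plain 5‑fold CV can validate on rows that precede part of their training fold. One possible alternative, not part of the pipeline above, is scikit‑learn's `TimeSeriesSplit`, which keeps every validation fold after its training fold. The sketch below uses synthetic data and a reduced grid (both assumptions) so that it runs on its own.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for time-ordered training data (not SCADA values)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = 50.0 * X_demo[:, 0] + rng.normal(scale=5.0, size=300)

demo_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("enet", ElasticNet(max_iter=5000)),
])

tscv = TimeSeriesSplit(n_splits=5)

# Sanity check: each validation fold lies strictly after its training fold
for train_idx, val_idx in tscv.split(X_demo):
    assert train_idx.max() < val_idx.min()

demo_gs = GridSearchCV(
    demo_pipe,
    {"enet__alpha": [0.01, 0.1, 1.0], "enet__l1_ratio": [0.2, 0.5, 0.8]},
    cv=tscv,                                  # forward-chaining folds
    scoring="neg_root_mean_squared_error",
)
demo_gs.fit(X_demo, y_demo)
print("Best params:", demo_gs.best_params_)
```

In the tutorial's own search, the same idea would amount to passing a `TimeSeriesSplit` object as `cv` instead of the integer 5.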

5. Evaluate on Test Set

RMSE quantifies average prediction error in kW; R² indicates variance explained.

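import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_pred = gs.predict(X_test)
# Take the square root of the MSE explicitly; the squared=False argument
# is deprecated in recent scikit-learn releases
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2   = r2_score(y_test, y_pred)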

print(f"Test RMSE: {rmse:.2f} kW")
print(f"Test R²  : {r2:.3f}")
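As a small worked example of the two metrics, the sketch below computes RMSE and R² by hand on four made‑up power readings (the numbers are illustrative, not from the dataset) and checks them against scikit‑learn.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted power values in kW
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_hat = np.array([110.0, 190.0, 310.0, 390.0])

# RMSE: square root of the mean squared error
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))

# R²: 1 minus residual sum of squares over total sum of squares
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(f"RMSE: {rmse_manual:.2f} kW")   # RMSE: 10.00 kW
print(f"R²  : {r2_manual:.3f}")        # R²  : 0.992

# The hand-computed values agree with scikit-learn's functions
assert np.isclose(rmse_manual, np.sqrt(mean_squared_error(y_true, y_hat)))
assert np.isclose(r2_manual, r2_score(y_true, y_hat))
```

Every prediction is off by exactly 10 kW, so the RMSE is 10 kW; the residual sum of squares (400) is small next to the total (50 000), which gives R² = 0.992.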

6. Inspect Model Coefficients

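Coefficient Inspection: because the pipeline standardises the predictors, the coefficients are on a common scale and their magnitudes can be compared directly. Coefficients at or near zero indicate less‑informative features, while larger magnitudes highlight key drivers of turbine output.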

import pandas as pd
import matplotlib.pyplot as plt

coef = gs.best_estimator_.named_steps["enet"].coef_
imp = pd.Series(coef, index=features).sort_values()

plt.figure(figsize=(6,4))
imp.plot(kind="barh")
plt.title("ElasticNet Coefficients")
plt.xlabel("Coefficient Value")
plt.tight_layout()
plt.show()
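The plotted coefficients are per standard deviation of each feature. If a reading in physical units is wanted (kW per m/s, kW per degree), one way is to divide by the scaler's `scale_` values. The sketch below shows this on synthetic data with fixed hyperparameters, both of which are assumptions made so the example runs on its own.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (not SCADA values)
rng = np.random.default_rng(1)
X_demo = np.column_stack([
    rng.uniform(0, 25, 500),    # stand-in for wind speed in m/s
    rng.uniform(0, 360, 500),   # stand-in for wind direction in degrees
])
y_demo = 80.0 * X_demo[:, 0] + rng.normal(scale=10.0, size=500)

model = Pipeline([
    ("scale", StandardScaler()),
    ("enet", ElasticNet(alpha=0.1, l1_ratio=0.5)),
])
model.fit(X_demo, y_demo)

enet = model.named_steps["enet"]
scaler_step = model.named_steps["scale"]

# Coefficients per standard deviation -> coefficients per original unit
coef_std = enet.coef_
coef_orig = coef_std / scaler_step.scale_
# Matching intercept for inputs in original units
intercept_orig = enet.intercept_ - np.sum(coef_std * scaler_step.mean_ / scaler_step.scale_)

print("Per-unit coefficients:", coef_orig)
```

The rescaled coefficients and intercept reproduce the pipeline's predictions when applied to unscaled inputs, so they describe the same model in different units. The regularisation still shrinks them, so the wind‑speed coefficient lands somewhat below the true slope of 80 used to generate the data.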

Summary

By applying ElasticNet, a mixed ℓ₁+ℓ₂ regression approach, to wind‐turbine SCADA data, operators gain:

- Stable forecasts of short‐term power output, with accuracy measured by RMSE and R² on a held‐out, later‐in‐time test set.
- Automatic feature selection: the ℓ₁ term can zero out noisy sensors while the strongest predictors are retained.
- Interpretability, with coefficient magnitudes revealing which environmental factors (e.g., wind speed vs. temperature) most influence output, which can guide maintenance and operational strategies.
