Wind Energy Quantile Prediction using Quantile Regression in ML


Grid operators and renewable energy planners need more than an average forecast of wind farm output. They also need the range of plausible generation levels, from low-output periods (10th percentile) to peak production events (90th percentile).

In this project, we will predict the 10th, 50th, and 90th percentiles of hourly wind power output (MW) across four German transmission system operators (TSOs) using historical SCADA data (wind speed, wind direction, air temperature, and prior-hour output). By fitting a separate quantile regression model for each percentile, we'll see how each meteorological and operational factor drives low-, median-, and high-output scenarios, which supports robust grid balancing and more resilient integration of wind energy.
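Each of the three models is a linear quantile regression. As a reference sketch of the standard formulation (not something specific to this dataset), the coefficients for quantile level τ are chosen to minimize the check (pinball) loss over the training rows:

```latex
\rho_\tau(u) = u\,\bigl(\tau - \mathbb{1}\{u < 0\}\bigr), \qquad
\hat{\beta}_\tau = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\!\left(y_i - x_i^{\top}\beta\right)
```

Here y_i is the hourly output, x_i holds an intercept plus the predictors, and τ takes the values 0.10, 0.50, and 0.90. At τ = 0.5 the check loss equals half the absolute error, so the median model is a least-absolute-deviations fit.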

Libraries Required

import pandas as pd  
import numpy as np  
import statsmodels.formula.api as smf           # Quantile regression via formula API  
from sklearn.model_selection import train_test_split  
from sklearn.metrics import mean_pinball_loss    # Proper loss for quantile forecasts  
import matplotlib.pyplot as plt                  # Visualization of residuals

Dataset

Wind Power Generation Data

Step-by-Step Code Implementation

Load & Inspect Data

We ingest SCADA-style records from four major German TSOs (Kaggle): hourly power output (Power, MW), wind speed (WindSpeed, m/s), wind direction (WindDirection, degrees), and air temperature (AirTemp, °C). Initial .info() and .describe() calls verify data types, value ranges, and non-null counts.

# Load the “Wind Power Generation” SCADA dataset for four German TSOs
df = pd.read_csv("wind-power-generation.csv")

# Quick inspection
print(df.head())
df.info()                        # info() prints its report directly and returns None
print(df[['Power']].describe())
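If the CSV is not at hand, the rest of the pipeline can still be exercised on a stand-in frame. The sketch below is entirely hypothetical: it generates synthetic hourly rows with the same column names used in this tutorial, plus an assumed Time column, using a crude cubic power curve capped at an assumed 500 MW. It is not real SCADA data and its coefficients will not match the real dataset.

```python
import numpy as np
import pandas as pd

def make_synthetic_scada(n_hours: int = 2000, seed: int = 0) -> pd.DataFrame:
    """Hypothetical hourly SCADA-like frame (NOT real data).
    Output grows with the cube of wind speed and is capped at an assumed capacity."""
    rng = np.random.default_rng(seed)
    time = pd.date_range("2020-01-01", periods=n_hours, freq="h")   # assumed Time column
    wind_speed = rng.weibull(2.0, n_hours) * 8.0                     # m/s
    wind_dir = rng.uniform(0, 360, n_hours)                          # degrees
    air_temp = (10 + 8 * np.sin(2 * np.pi * np.arange(n_hours) / (24 * 365))
                + rng.normal(0, 2, n_hours))                         # °C
    capacity = 500.0                                                 # MW, hypothetical
    power = np.minimum(capacity, 0.3 * wind_speed ** 3) + rng.normal(0, 20, n_hours)
    power = np.clip(power, 0, capacity)
    df = pd.DataFrame({"Time": time, "Power": power, "WindSpeed": wind_speed,
                       "WindDirection": wind_dir, "AirTemp": air_temp})
    # Blank out a few wind speeds so the missing-data step has something to drop
    df.loc[rng.choice(n_hours, size=20, replace=False), "WindSpeed"] = np.nan
    return df

df = make_synthetic_scada()
df.info()
print(df.isna().sum())
```

Replacing the read_csv line with df = make_synthetic_scada() lets every later step run end to end.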

Preprocessing & Feature Engineering

We drop rows with a missing value in any core variable.
We add a lagged output feature (Power_lag1) to capture persistence: output in one hour is strongly tied to output in the previous hour.
We store the predictor names in the features list and rename the target column to Output so the regression formulas read clearly.

# Drop rows with missing core variables
df = df.dropna(subset=['Power','WindSpeed','WindDirection','AirTemp'])

# Create a lag feature: prior hour’s power output
# (assumes rows are sorted chronologically; the first row has no prior hour, so it is dropped)
df['Power_lag1'] = df['Power'].shift(1)
df = df.dropna(subset=['Power_lag1'])

# Define predictors and target
features = ['WindSpeed','WindDirection','AirTemp','Power_lag1']
df_model = df[features + ['Power']].copy()
df_model.rename(columns={'Power':'Output'}, inplace=True)
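A plain shift(1) treats the previous row as the previous hour. After dropna, or if the raw file has gaps, that row may be several hours earlier. A minimal sketch of a stricter lag, assuming a hypothetical timestamp column named Time (the original code does not name one): the lag is kept only where the previous row is exactly one hour earlier, and rows without a valid lag are dropped.

```python
import pandas as pd

def add_strict_lag(df: pd.DataFrame, time_col: str = "Time", col: str = "Power") -> pd.DataFrame:
    """Add `<col>_lag1` only where the previous row is exactly one hour earlier.
    `time_col` is a hypothetical timestamp column; rows without a valid lag are dropped."""
    out = df.sort_values(time_col).copy()
    prev_is_one_hour = out[time_col].diff() == pd.Timedelta(hours=1)
    out[f"{col}_lag1"] = out[col].shift(1).where(prev_is_one_hour)
    return out.dropna(subset=[f"{col}_lag1"])

# Toy example: 03:00 is missing, so the 04:00 row has no valid one-hour lag
toy = pd.DataFrame({
    "Time": pd.to_datetime(["2020-01-01 00:00", "2020-01-01 01:00",
                            "2020-01-01 02:00", "2020-01-01 04:00"]),
    "Power": [10.0, 12.0, 15.0, 30.0],
})
print(add_strict_lag(toy))
```

Only the 01:00 and 02:00 rows survive, with lags of 10.0 and 12.0 MW.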

Train/Test Split

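We hold out a random 20% of the rows for evaluation to check that the quantile models generalize to unseen weather and operating conditions. Because neighbouring hours are strongly correlated, and Power_lag1 carries the previous hour's output, a random split can make test scores look better than a true forecast would; a chronological split (train on earlier hours, test on later ones) is the stricter test.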

# Reserve 20% of the data for out‑of‑sample evaluation
train, test = train_test_split(df_model, test_size=0.2, random_state=42)
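A chronological split needs only one change: train_test_split with shuffle=False keeps the row order, so the last 20% of rows become the test set. The sketch below assumes rows are already sorted by time and uses a small hypothetical frame (with an assumed Time column) standing in for df_model, purely to show that every training hour precedes every test hour. For rolling evaluation, scikit-learn's TimeSeriesSplit follows the same idea.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical hourly frame standing in for df_model (not real SCADA data)
n = 100
frame = pd.DataFrame({
    "Time": pd.date_range("2020-01-01", periods=n, freq="h"),
    "Output": np.arange(n, dtype=float),
})

# shuffle=False keeps row order: first 80% of hours train, last 20% test
train_chrono, test_chrono = train_test_split(frame, test_size=0.2, shuffle=False)

print(len(train_chrono), len(test_chrono))
print(train_chrono["Time"].max() < test_chrono["Time"].min())
```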

Fit Quantile Regression Models

For each quantile (10th, 50th, 90th percentiles):

We build a statsmodels formula (“Output ~ WindSpeed + WindDirection + AirTemp + Power_lag1”).
We fit a QuantReg model on the training set at that percentile.
We print only the coefficient table, which shows how each predictor's effect on generation shifts across low, median, and high output levels (for example, wind speed may have a stronger marginal effect at the 90th percentile).

quantiles = [0.10, 0.50, 0.90]
results   = {}
formula   = "Output ~ " + " + ".join(features)

for q in quantiles:
    mod = smf.quantreg(formula, train)
    res = mod.fit(q=q)
    results[q] = res
    print(f"\n--- {int(q*100)}th Percentile Coefficients ---")
    print(res.summary().tables[1])   # coefficients only
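Reading three separate summary tables makes the shift across quantiles hard to see. With the fitted models in results, pd.DataFrame({q: res.params for q, res in results.items()}) puts the coefficients side by side. The sketch below demonstrates the same idea on hypothetical synthetic data whose spread grows with wind speed, so the WindSpeed slope should increase from the 10th to the 90th percentile (true slopes of roughly 1.36, 2.0, and 2.64 by construction).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical heteroscedastic data: noise scale grows with wind speed
rng = np.random.default_rng(1)
n = 2000
ws = rng.uniform(2, 15, n)
toy = pd.DataFrame({"WindSpeed": ws,
                    "Output": 1 + 2 * ws + 0.5 * ws * rng.standard_normal(n)})

quantiles = [0.10, 0.50, 0.90]
model = smf.quantreg("Output ~ WindSpeed", toy)   # one model object, refit per quantile
coef_table = pd.DataFrame({q: model.fit(q=q).params for q in quantiles})
print(coef_table.round(2))
```

Building the model once and calling fit(q=...) per quantile avoids re-parsing the formula on every iteration.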

Evaluation with Pinball Loss

We produce quantile-specific output forecasts on the test set.
We compute the pinball loss for each quantile. It is a proper scoring rule for quantile forecasts that penalizes under- and over-prediction asymmetrically, with weights set by the quantile level. Lower loss indicates a better forecast for that quantile (it rewards both calibration and sharpness); losses at different quantiles should not be compared with each other directly.

for q, res in results.items():
    preds = res.predict(test[features])
    loss  = mean_pinball_loss(test['Output'], preds, alpha=q)
    print(f"{int(q*100)}th quantile pinball loss: {loss:.2f}")

Residual Diagnostics (Example for Median)

# Plot residuals for the 50th‑percentile model
median_res = results[0.50]
preds_med  = median_res.predict(test[features])
resid_med  = test['Output'] - preds_med

plt.scatter(preds_med, resid_med, alpha=0.3)
plt.axhline(0, linestyle='--')
plt.xlabel("Predicted Median Output (MW)")
plt.ylabel("Residuals")
plt.title("Residuals vs. Predicted (50th Percentile)")
plt.show()
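Because the three quantile models are fitted independently, nothing forces the 10th-percentile prediction to stay below the median, or the median below the 90th percentile; crossings tend to appear at extreme predictor values. A minimal diagnostic sketch: count out-of-order rows, then apply a simple post-hoc fix that sorts the three predictions row by row (often called rearrangement). The function names and the toy predictions are hypothetical.

```python
import numpy as np

def count_crossings(pred_lo, pred_mid, pred_hi):
    """Number of rows where the fitted quantiles are out of order."""
    lo, mid, hi = (np.asarray(p, dtype=float) for p in (pred_lo, pred_mid, pred_hi))
    return int(np.sum((lo > mid) | (mid > hi)))

def rearrange(pred_lo, pred_mid, pred_hi):
    """Sort the three predictions within each row so that lo <= mid <= hi."""
    stacked = np.column_stack([pred_lo, pred_mid, pred_hi]).astype(float)
    stacked.sort(axis=1)
    return stacked[:, 0], stacked[:, 1], stacked[:, 2]

# Hypothetical predictions (MW): the second row crosses (10th > 50th)
lo = np.array([5.0, 40.0, 10.0])
mid = np.array([20.0, 30.0, 25.0])
hi = np.array([60.0, 70.0, 50.0])

print(count_crossings(lo, mid, hi))      # 1
lo2, mid2, hi2 = rearrange(lo, mid, hi)
print(count_crossings(lo2, mid2, hi2))   # 0
```

With the tutorial's models, the inputs would be results[q].predict(test[features]) for q = 0.10, 0.50, and 0.90.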

Summary

By employing quantile regression on wind SCADA data, we obtain distribution‑aware forecasts of wind farm output:

The 10th‑percentile model prepares operators for low‐wind scenarios, guiding reserve capacity planning.
The median (50th‑percentile) model provides typical output expectations for routine balancing.
The 90th‑percentile model anticipates peak generation events, informing grid‐injection strategies and curtailment decisions.

These tailored quantile forecasts empower grid operators and renewable planners with robust tools to manage variability, optimize storage dispatch, and improve the reliability of integrating wind energy into the power system.
