Biomass Energy Output Prediction with Ridge & Lasso Mixed Regression in ML

Grid operators that run biomass plants alongside other renewables still need an hour-ahead forecast of how much power (in MW) their biomass units will inject. Accurate short-term predictions improve dispatch plans, slash balancing costs, and cut unnecessary start-ups. Yet classic linear models struggle:

Multicollinearity – temperature, dew‑point, humidity, and hour‑of‑day all co‑vary with demand and with each other.
Over-fitting vs. under-fitting – pure Ridge keeps every noisy signal, while pure Lasso may over-shrink and drop genuinely helpful sensors.

An Elastic Net (mixed Ridge + Lasso) regression balances these extremes, selecting only the stable weather‑time features that truly drive biomass output while damping collinear noise.
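
To make the contrast concrete, here is a minimal sketch on synthetic data (not the biomass dataset): two nearly identical columns carry the signal and three columns are pure noise. The data, the penalty strengths, and the variable names are illustrative assumptions, chosen only to show how the three penalties treat collinear and irrelevant inputs.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

# scikit-learn's ElasticNet minimises
#   1/(2n) * ||y - Xw||^2 + alpha * l1_ratio * ||w||_1
#                         + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2
rng = np.random.default_rng(0)
n = 500
base = rng.normal(size=n)
X = np.column_stack([
    base,                               # informative signal
    base + 0.05 * rng.normal(size=n),   # near-duplicate of the first column
    rng.normal(size=(n, 3)),            # three pure-noise columns
])
y = 3.0 * base + 0.5 * rng.normal(size=n)

models = {
    'ridge': Ridge(alpha=1.0),
    'lasso': Lasso(alpha=0.1),
    'enet': ElasticNet(alpha=0.1, l1_ratio=0.5),
}

zero_counts = {}
for name, model in models.items():
    model.fit(X, y)
    zero_counts[name] = int(np.sum(model.coef_ == 0))   # coefficients shrunk exactly to zero
    print(name, np.round(model.coef_, 3), "zeros:", zero_counts[name])
```

Ridge shrinks coefficients but leaves all of them non-zero, whereas the L1 part of the Lasso and Elastic Net penalties can set some of them exactly to zero.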

Libraries Required

Data handling	pandas, numpy
Visuals	matplotlib, seaborn
ML workflow	scikit‑learn → ColumnTransformer, OneHotEncoder, StandardScaler, ElasticNet, GridSearchCV, Pipeline, train_test_split
Evaluation	mean_squared_error, r2_score

Dataset Link

Energy Consumption, Generation, Prices & Weather

Step-by-Step Code Implementation

1. Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score

2. Download and load the dataset

Four years of Spanish hourly electricity data: generation split by fuel (biomass, wind, solar…), prices, demand, plus matching hourly weather for five major Spanish cities (sourced, per the dataset page, from the OpenWeather API).

# one‑time only (needs Kaggle token):
# kaggle datasets download -d nicholasjhana/energy-consumption-generation-prices-and-weather -p data --unzip

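df = pd.read_csv("data/energy_dataset.csv")
# timestamps carry mixed +01:00/+02:00 offsets, so normalise them to UTC explicitly
df['time'] = pd.to_datetime(df['time'], utc=True)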
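
The column selection in step 3 expects one frame holding 'biomass_generation' together with weather columns such as 'temperature' and 'dewpoint'. The Kaggle download appears to ship weather separately (weather_features.csv, one row per city per hour, temperatures in Kelvin, no dew-point column) and to name the biomass column 'generation biomass'. A minimal sketch of one way to bridge that gap follows. The helper name merge_energy_weather, the raw column names ('dt_iso', 'temp', 'humidity', 'wind_deg', 'clouds_all'), and the Magnus dew-point approximation are all assumptions to check against your copy of the files.

```python
import numpy as np
import pandas as pd

def merge_energy_weather(energy: pd.DataFrame, weather: pd.DataFrame) -> pd.DataFrame:
    """Average city-level weather per hour and join it onto the energy frame (hypothetical helper)."""
    energy = energy.rename(columns={'generation biomass': 'biomass_generation'}).copy()
    energy['time'] = pd.to_datetime(energy['time'], utc=True)

    wx = weather.copy()
    wx['time'] = pd.to_datetime(wx['dt_iso'], utc=True)
    cols = ['temp', 'humidity', 'wind_speed', 'wind_deg', 'clouds_all']
    # plain mean across cities; note that averaging wind direction in degrees is a simplification
    wx = wx.groupby('time', as_index=False)[cols].mean()
    wx = wx.rename(columns={'temp': 'temperature',
                            'wind_deg': 'wind_direction',
                            'clouds_all': 'cloud_cover'})
    wx['temperature'] = wx['temperature'] - 273.15        # Kelvin -> °C (assumed unit)

    # Magnus approximation of dew point (°C) from temperature and relative humidity
    rh = wx['humidity'].clip(lower=1, upper=100) / 100.0
    gamma = np.log(rh) + 17.62 * wx['temperature'] / (243.12 + wx['temperature'])
    wx['dewpoint'] = 243.12 * gamma / (17.62 - gamma)

    return energy.merge(wx.drop(columns='humidity'), on='time', how='inner')

# toy frames that mimic the assumed raw layout
energy_toy = pd.DataFrame({
    'time': ['2015-01-01 00:00:00+01:00', '2015-01-01 01:00:00+01:00'],
    'generation biomass': [447.0, 449.0],
})
weather_toy = pd.DataFrame({
    'dt_iso': ['2015-01-01 00:00:00+01:00'] * 2 + ['2015-01-01 01:00:00+01:00'] * 2,
    'city_name': ['Madrid', 'Bilbao', 'Madrid', 'Bilbao'],
    'temp': [283.15, 293.15, 284.15, 292.15],
    'humidity': [100, 100, 80, 60],
    'wind_speed': [1.0, 3.0, 2.0, 4.0],
    'wind_deg': [90, 110, 100, 120],
    'clouds_all': [0, 40, 20, 20],
})
merged = merge_energy_weather(energy_toy, weather_toy)
print(merged)
```

With the real files, the call would look like df = merge_energy_weather(df, pd.read_csv("data/weather_features.csv")), after which the column names used in the following steps should line up.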

3. Feature & target engineering

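Predicting biomass MW one hour into the future helps dispatchers adapt set points in real time. A simple shift(-1) pairs each hour's features with the next hour's output, so no future information enters the features. This assumes the rows are consecutive hours; any gaps left by dropna() would pair an hour with a later, non-adjacent one.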

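# focus on Spanish data in MW for simplicity
# Assumes df is a single merged frame: the biomass column renamed to 'biomass_generation' (MW;
# the raw energy file appears to call it 'generation biomass') and the weather columns
# ('temperature', 'dewpoint', etc., in °C) averaged across cities for each hour.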

keep_cols = ['time', 'biomass_generation', 'temperature', 'dewpoint',
             'wind_speed', 'wind_direction', 'cloud_cover']
data = df[keep_cols].dropna()

# target: one‑hour‑ahead biomass output
data = data.sort_values('time')
data['target_MW'] = data['biomass_generation'].shift(-1)
data = data.dropna(subset=['target_MW'])

# temporal signals
data['hour']   = data['time'].dt.hour
data['month']  = data['time'].dt.month
data['dow']    = data['time'].dt.dayofweek
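
Because shift(-1) simply takes the next row, a missing hour would silently turn a one-hour-ahead label into a several-hours-ahead one. The sketch below shows one way to keep only rows whose successor is exactly one hour later. The helper name make_hour_ahead_target and the toy timestamps are illustrative assumptions, not part of the original workflow.

```python
import pandas as pd

def make_hour_ahead_target(frame: pd.DataFrame, value_col: str,
                           time_col: str = 'time') -> pd.DataFrame:
    """Attach next-row values as 'target_MW', keeping rows whose next row is exactly 1 h later."""
    out = frame.sort_values(time_col).reset_index(drop=True)
    out['target_MW'] = out[value_col].shift(-1)
    step = out[time_col].shift(-1) - out[time_col]       # gap to the following row
    # the last row has no successor (NaT), so the comparison drops it as well
    return out[step == pd.Timedelta(hours=1)].reset_index(drop=True)

toy = pd.DataFrame({
    'time': pd.to_datetime(['2018-01-01 00:00', '2018-01-01 01:00',
                            '2018-01-01 02:00', '2018-01-01 05:00']),  # 3 h gap before last row
    'biomass_generation': [400.0, 410.0, 405.0, 390.0],
})
labelled = make_hour_ahead_target(toy, 'biomass_generation')
print(labelled)
```

In the toy frame only the first two rows survive: the 02:00 row is dropped because its successor is three hours away, and the final row is dropped because it has no successor.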

4. Define features & target

The features are current biomass output (an auto-regressive term), surface weather, and categorical time parts (hour, month, day-of-week) that capture diurnal and seasonal cycles.

y = data['target_MW']
X = data.drop(columns=['time', 'target_MW'])

cat_cols = ['hour', 'month', 'dow']          # treat time parts as categorical dummies
num_cols = ['biomass_generation', 'temperature', 'dewpoint',
            'wind_speed', 'wind_direction', 'cloud_cover']

5. Elastic Net pipeline

Wrapping one-hot encoding and scaling inside the Pipeline means both are re-fit on each training fold only, which prevents leakage across CV folds.

preprocess = ColumnTransformer([
        ('cat', OneHotEncoder(drop='first'), cat_cols),
        ('num', StandardScaler(), num_cols)
])

pipe = Pipeline([
        ('prep', preprocess),
        ('enet', ElasticNet(max_iter=20000, random_state=42))
])
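
As a small check of the leakage point, the sketch below fits a two-step pipeline on the first 150 rows of a toy array and confirms that the scaler's stored mean comes from those rows alone, not from the full array. The toy data and the names toy_pipe, X_toy and y_toy are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X_toy = rng.normal(loc=[10.0, 200.0], scale=[2.0, 50.0], size=(200, 2))
y_toy = 0.5 * X_toy[:, 0] + 0.01 * X_toy[:, 1] + rng.normal(scale=0.1, size=200)

train, valid = slice(0, 150), slice(150, 200)

toy_pipe = Pipeline([
    ('scale', StandardScaler()),
    ('enet', ElasticNet(alpha=0.01)),
])
toy_pipe.fit(X_toy[train], y_toy[train])       # scaler statistics come from the training rows only

fitted_mean = toy_pipe.named_steps['scale'].mean_
print("scaler mean      :", fitted_mean)
print("training-row mean:", X_toy[train].mean(axis=0))
print("all-row mean     :", X_toy.mean(axis=0))
valid_pred = toy_pipe.predict(X_toy[valid])    # validation rows are scaled with training statistics
```

GridSearchCV repeats this fit for every fold, which is why the preprocessing has to live inside the Pipeline and not be applied to the full table beforehand.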

6. Train/test split & hyper‑parameter grid‑search

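α (tuned across 0.001→10) sets the overall penalty strength;
l1_ratio (0.1→0.9) slides from Ridge stability to Lasso sparsity.
Five-fold CV picks the combination with the lowest RMSE, while the Ridge share of the penalty keeps coefficients on collinear weather inputs stable. Note that cv=5 uses contiguous, unshuffled folds here, so some folds validate on hours that precede part of their training data.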

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, shuffle=False)   # preserve chronology

param_grid = {
    'enet__alpha': np.logspace(-3, 1, 18),      # 0.001 → 10
    'enet__l1_ratio': np.linspace(0.1, 0.9, 9)  # blend: Ridge‑heavy → Lasso‑heavy
}

search = GridSearchCV(pipe, param_grid,
                      cv=5,
                      scoring='neg_root_mean_squared_error',
                      n_jobs=-1, verbose=1)
search.fit(X_train, y_train)

print("Best α:", search.best_params_['enet__alpha'])
print("Best l1_ratio:", search.best_params_['enet__l1_ratio'])
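
For hourly data, a chronological splitter is one alternative to plain k-fold: every validation block then comes strictly after the rows used for training. The sketch below prints the folds that scikit-learn's TimeSeriesSplit produces on twelve toy row indices. The toy size and the three splits are illustrative choices.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

toy_rows = np.arange(12)                    # stand-in for 12 consecutive hourly rows
splitter = TimeSeriesSplit(n_splits=3)

folds = []
for train_idx, valid_idx in splitter.split(toy_rows):
    folds.append((train_idx.tolist(), valid_idx.tolist()))
    print("train", train_idx.tolist(), "-> validate", valid_idx.tolist())
```

Passing cv=TimeSeriesSplit(n_splits=5) to GridSearchCV would apply the same idea to the search. One caveat: the earliest training folds cover only part of a year, so months that first appear in validation would probably require OneHotEncoder(handle_unknown='ignore') to avoid errors.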

7. Evaluate on the hold‑out set

y_pred = search.predict(X_test)
# take the root explicitly: the squared=False flag is deprecated in recent scikit-learn releases
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2   = r2_score(y_test, y_pred)

print(f"Hold‑out RMSE: {rmse:.2f} MW | R²: {r2:.3f}")
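
An RMSE in MW is easier to judge next to a naive reference. A common one for hour-ahead problems is persistence, which predicts that the next hour equals the current hour. The helper below is a hypothetical sketch, and the toy numbers are made up for illustration.

```python
import numpy as np

def persistence_rmse(current_mw, next_hour_mw) -> float:
    """RMSE of the naive forecast 'next hour equals the current hour'."""
    current_mw = np.asarray(current_mw, dtype=float)
    next_hour_mw = np.asarray(next_hour_mw, dtype=float)
    return float(np.sqrt(np.mean((next_hour_mw - current_mw) ** 2)))

toy_current = [400.0, 410.0, 405.0]     # output now
toy_next = [410.0, 405.0, 405.0]        # output one hour later
baseline = persistence_rmse(toy_current, toy_next)
print(f"Persistence RMSE on toy data: {baseline:.2f} MW")
```

With the tutorial's variables the call would be persistence_rmse(X_test['biomass_generation'], y_test). The Elastic Net adds value only to the extent that its hold-out RMSE falls below this figure.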

8. Interpret coefficients

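The coefficient chart shows which signals shift output the most. For example, wind speed may raise or depress biomass generation, and certain hours (possibly overnight maintenance windows) may show lower production. Coefficients shrunk exactly to zero mark drivers the model treats as negligible. Numeric coefficients are expressed in MW per original unit, while the time dummies are in MW relative to the dropped baseline category.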

# recover full feature names
ohe_names = search.best_estimator_.named_steps['prep'] \
              .named_transformers_['cat'].get_feature_names_out(cat_cols)
feature_names = np.hstack([ohe_names, num_cols])

# rescale numeric coefficients back to original units
scales = search.best_estimator_.named_steps['prep'] \
           .named_transformers_['num'].scale_
# copy first: editing coef_ in place would silently change the fitted model's predictions
coeffs = search.best_estimator_.named_steps['enet'].coef_.copy()
coeffs[-len(num_cols):] = coeffs[-len(num_cols):] / scales

imp = pd.Series(coeffs, index=feature_names).sort_values(key=abs, ascending=False)

plt.figure(figsize=(9,5))
imp.head(15).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title('Elastic Net Coefficients – Biomass MW Drivers')
plt.xlabel('Δ MW one hour ahead')
plt.tight_layout()
plt.show()
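
The division by scale_ works because a coefficient learned on a standardised feature is in MW per standard deviation, so dividing by that standard deviation gives MW per original unit. The sketch below checks this on a toy temperature series. It uses ordinary least squares in place of Elastic Net so that the identity holds exactly, and the data and names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
temp_c = rng.normal(15.0, 8.0, size=300)                     # hypothetical temperature, °C
mw = 400.0 + 1.5 * temp_c + rng.normal(0.0, 1.0, size=300)   # hypothetical output, MW

scaler = StandardScaler().fit(temp_c.reshape(-1, 1))
z = scaler.transform(temp_c.reshape(-1, 1))

coef_per_std = LinearRegression().fit(z, mw).coef_[0]        # MW per standard deviation
coef_per_degree = coef_per_std / scaler.scale_[0]            # MW per °C after rescaling
coef_raw = LinearRegression().fit(temp_c.reshape(-1, 1), mw).coef_[0]

print(f"rescaled: {coef_per_degree:.4f} MW/°C | fitted on raw data: {coef_raw:.4f} MW/°C")
```

With a penalised model the two routes need not agree exactly, since the penalty acts on the standardised coefficients, but the unit conversion itself is the same.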

Summary

This mixed-regression (Elastic Net) workflow turns raw generation and weather CSVs into:

A one-hour-ahead biomass forecast in MW, with accuracy measured by hold-out RMSE and R².
A concise, interpretable list of key drivers that balances sparsity and stability despite collinear inputs.
A one-command retrain path (search.fit) so operators can refresh the model daily as new data streams in.

Deploying the resulting model can help scheduling teams reduce balancing penalties, optimise co-firing strategy, and plan maintenance around low-output windows, making biomass generation more predictable and more profitable.
