Biomass Energy Output Prediction with Ridge & Lasso Mixed Regression in ML
Grid operators in countries that co‑fire biomass with other renewables still need an hour‑ahead forecast of how much power (in MW) their biomass units will inject. Accurate short‑term predictions improve dispatch plans, slash balancing costs, and cut unnecessary start‑ups. Yet classic linear models struggle:
- Multicollinearity – temperature, dew‑point, humidity, and hour‑of‑day all co‑vary with demand and with each other.
- Over-fitting vs. under-fitting – pure Ridge keeps every noisy signal, while pure Lasso may over-shrink and drop genuinely helpful sensors.
An Elastic Net (mixed Ridge + Lasso) regression balances these extremes, selecting only the stable weather‑time features that truly drive biomass output while damping collinear noise.
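The trade-off described above can be seen on a small synthetic example. The sketch below is illustrative only: the three features are made-up stand-ins (two near-duplicate columns and one unrelated noise column), and the penalty strengths are arbitrary choices rather than tuned values.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)                    # e.g. a standardised temperature reading
x2 = x1 + rng.normal(scale=0.01, size=n)   # near-duplicate of x1 (e.g. dew point)
x3 = rng.normal(size=n)                    # feature unrelated to the target
X = np.column_stack([x1, x2, x3])
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)

# Same data, three penalties
ridge_coef = Ridge(alpha=1.0).fit(X, y).coef_
lasso_coef = Lasso(alpha=0.5).fit(X, y).coef_
enet_coef = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y).coef_

for name, coef in [("Ridge", ridge_coef), ("Lasso", lasso_coef), ("ElasticNet", enet_coef)]:
    print(f"{name:<10}", np.round(coef, 3))
```

Ridge leaves a weight on every column, including the noise feature, whereas the L1 part of Lasso and Elastic Net can set weights exactly to zero. Elastic Net additionally tends to share weight between the two collinear columns instead of arbitrarily keeping one.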
Libraries Required
| Purpose | Libraries |
|---|---|
| Data handling | pandas, numpy |
| Visuals | matplotlib, seaborn |
| ML workflow | scikit-learn → ColumnTransformer, OneHotEncoder, StandardScaler, ElasticNet, GridSearchCV, Pipeline, train_test_split |
| Evaluation | mean_squared_error, r2_score |
Dataset Link
Energy Consumption, Generation, Prices & Weather
Step-by-Step Code Implementation
1. Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score
2. Download and load the dataset
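Four years of Spanish hourly electricity data: generation split by fuel (biomass, wind, solar…), prices, demand, plus matching hourly weather for Spain's five largest cities (sourced from the OpenWeather API, according to the dataset description).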
# one‑time only (needs Kaggle token):
# kaggle datasets download -d nicholasjhana/energy-consumption-generation-prices-and-weather -p data --unzip
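df = pd.read_csv("data/energy_dataset.csv")
# timestamps with mixed UTC offsets (winter/summer time) stay as plain objects under parse_dates,
# so parse them as UTC first and then convert to local Spanish time
df['time'] = pd.to_datetime(df['time'], utc=True).dt.tz_convert('Europe/Madrid')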
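Weather readings that arrive per city need to be collapsed to one row per hour before they can sit next to the national generation figures. A minimal sketch of that averaging step follows, using tiny in-memory frames; the column names (`city`, `temperature`, `wind_speed`, `biomass_generation`) are assumptions chosen to match this tutorial, not necessarily the names in the downloaded files.

```python
import pandas as pd

# Toy stand-in for a per-city weather table (column names are assumptions)
weather = pd.DataFrame({
    "time": pd.to_datetime(["2015-01-01 00:00", "2015-01-01 00:00",
                            "2015-01-01 01:00", "2015-01-01 01:00"], utc=True),
    "city": ["Madrid", "Valencia", "Madrid", "Valencia"],
    "temperature": [2.0, 8.0, 1.0, 7.0],
    "wind_speed": [3.0, 5.0, 2.0, 6.0],
})

# Toy stand-in for the national generation table
energy = pd.DataFrame({
    "time": pd.to_datetime(["2015-01-01 00:00", "2015-01-01 01:00"], utc=True),
    "biomass_generation": [447.0, 449.0],
})

# Average all cities within each hour, then attach the result to the energy rows
hourly_weather = (weather.drop(columns="city")
                         .groupby("time", as_index=False)
                         .mean())
merged = energy.merge(hourly_weather, on="time", how="inner")
print(merged)
```

An inner join keeps only hours present in both tables, which avoids silently introducing missing weather values.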
3. Feature & target engineering
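Predicting biomass MW one hour into the future helps dispatchers adapt set points in real time. A simple shift(-1) on the time-sorted series pairs each row with the next row's output, so no future information enters the features. This assumes consecutive rows are exactly one hour apart, which stops being true wherever dropna() has removed an hour.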
# focus on Spanish data in MW for simplicity
# NOTE: the names below assume energy and weather data were already merged and renamed;
# the raw Kaggle files may use different names (for example 'generation biomass'), so adjust if your copy differs.
# Weather columns: 'temperature', 'dewpoint', etc., assumed to be hourly averages in °C.
keep_cols = ['time', 'biomass_generation', 'temperature', 'dewpoint',
             'wind_speed', 'wind_direction', 'cloud_cover']
data = df[keep_cols].dropna()
# target: one‑hour‑ahead biomass output
data = data.sort_values('time')
data['target_MW'] = data['biomass_generation'].shift(-1)
data = data.dropna(subset=['target_MW'])
# temporal signals
data['hour'] = data['time'].dt.hour
data['month'] = data['time'].dt.month
data['dow'] = data['time'].dt.dayofweek
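Because the label is built by shifting rows rather than by looking up timestamps, a missing hour silently turns a "one hour ahead" label into a "two hours ahead" one. The sketch below shows one possible check on a toy frame; the data values are invented, and the `gap` column is a helper introduced only for this illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "time": pd.to_datetime(["2015-01-01 00:00", "2015-01-01 01:00",
                            "2015-01-01 03:00", "2015-01-01 04:00"]),  # 02:00 is missing
    "biomass_generation": [440.0, 445.0, 450.0, 455.0],
}).sort_values("time")

# Label: the next row's output
toy["target_MW"] = toy["biomass_generation"].shift(-1)

# Time distance between each row and the row that supplied its label
toy["gap"] = toy["time"].shift(-1) - toy["time"]

# Keep only rows whose label really is one hour ahead
clean = toy[toy["gap"] == pd.Timedelta(hours=1)].drop(columns="gap")
print(clean)
```

The row at 01:00 is dropped because its label would come from 03:00, and the final row is dropped because it has no successor.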
4. Define features & target
Current biomass output (auto-regression), surface weather, and categorical time parts (hour, month, day-of-week) capture diurnal/seasonal cycles.
y = data['target_MW']
X = data.drop(columns=['time', 'target_MW'])
cat_cols = ['hour', 'month', 'dow'] # treat time parts as categorical dummies
num_cols = ['biomass_generation', 'temperature', 'dewpoint',
            'wind_speed', 'wind_direction', 'cloud_cover']
5. Elastic Net pipeline
One-hot + scaling wrapped inside the Pipeline eliminates leakage across CV folds.
preprocess = ColumnTransformer([
    ('cat', OneHotEncoder(drop='first'), cat_cols),
    ('num', StandardScaler(), num_cols)
])
pipe = Pipeline([
    ('prep', preprocess),
    ('enet', ElasticNet(max_iter=20000, random_state=42))
])
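To make the leakage point concrete, the sketch below fits a pipeline of the same shape on synthetic data and checks that the scaler's statistics come from the training rows only. The frame, its two columns, and the fixed `alpha`/`l1_ratio` values are all illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "hour": np.tile(np.arange(24), 10),                # 240 rows = 10 synthetic days
    "temperature": rng.normal(15.0, 5.0, size=240),
})
demo_y = 400.0 + 2.0 * demo["temperature"] + rng.normal(0.0, 1.0, size=240)

demo_pipe = Pipeline([
    ("prep", ColumnTransformer([
        ("cat", OneHotEncoder(drop="first"), ["hour"]),
        ("num", StandardScaler(), ["temperature"]),
    ])),
    ("enet", ElasticNet(alpha=0.01, l1_ratio=0.5, max_iter=20000)),
])

# Chronological split: first 8 days for training, last 2 held back
train, test = demo.iloc[:192], demo.iloc[192:]
demo_pipe.fit(train, demo_y.iloc[:192])

scaler = demo_pipe.named_steps["prep"].named_transformers_["num"]
print("scaler mean:", scaler.mean_[0])
print("train mean :", train["temperature"].mean())
print("n features :", demo_pipe.named_steps["enet"].coef_.shape[0])  # 23 hour dummies + 1 numeric
```

Because the scaler lives inside the pipeline, each cross-validation fold refits it on that fold's training rows, so held-out rows never influence the mean or standard deviation used for scaling.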
6. Train/test split & hyper‑parameter grid‑search
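- α (searched over 0.001→10) sets the overall penalty strength;
- l1_ratio (0.1→0.9) slides from Ridge stability to Lasso sparsity;
- Five-fold CV picks the combination with the lowest validation RMSE; the penalty itself is what tames the collinear meteorology. Note that cv=5 uses plain, unshuffled K-fold splits, so some folds validate on data older than their training data.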
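For reference, these two knobs enter the objective that scikit-learn's ElasticNet minimises. It is written here with ρ standing for l1_ratio (a shorthand introduced for this note), n for the number of training rows, and w for the coefficient vector:

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
\;+\; \alpha\,\rho\,\lVert w \rVert_1
\;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```

Setting ρ = 1 leaves only the Lasso (L1) term, while pushing ρ toward 0 leaves mostly the Ridge (L2) term.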
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False)  # preserve chronology
param_grid = {
    'enet__alpha': np.logspace(-3, 1, 18),      # 0.001 → 10
    'enet__l1_ratio': np.linspace(0.1, 0.9, 9)  # blend: Ridge-heavy → Lasso-heavy
}
search = GridSearchCV(pipe, param_grid,
                      cv=5,
                      scoring='neg_root_mean_squared_error',
                      n_jobs=-1, verbose=1)
search.fit(X_train, y_train)
print("Best α:", search.best_params_['enet__alpha'])
print("Best l1_ratio:", search.best_params_['enet__l1_ratio'])
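Since the rows are time-ordered, one alternative worth considering is a forward-chaining splitter, so that every validation fold lies after its training data. The sketch below swaps in scikit-learn's `TimeSeriesSplit` on synthetic data; it is a possible variation, not the configuration used in this tutorial, and the small parameter grid is arbitrary.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(2)
X_demo = rng.normal(size=(300, 4))   # rows assumed to be in time order
y_demo = X_demo @ np.array([1.5, 0.0, -2.0, 0.0]) + rng.normal(scale=0.1, size=300)

tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(tscv.split(X_demo)):
    # every validation row comes after every training row
    print(fold, train_idx.max(), "<", val_idx.min())

demo_search = GridSearchCV(
    ElasticNet(max_iter=20000),
    {"alpha": [0.01, 0.1, 1.0], "l1_ratio": [0.2, 0.8]},
    cv=tscv,
    scoring="neg_root_mean_squared_error",
)
demo_search.fit(X_demo, y_demo)
print(demo_search.best_params_)
```

With the tutorial's objects, the equivalent change would be passing `cv=TimeSeriesSplit(n_splits=5)` to the existing `GridSearchCV` call.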
7. Evaluate on the hold‑out set
y_pred = search.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # avoids squared=False, which recent scikit-learn releases deprecated and then removed
r2 = r2_score(y_test, y_pred)
print(f"Hold‑out RMSE: {rmse:.2f} MW | R²: {r2:.3f}")
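An RMSE figure is easier to judge against a naive reference. For an hour-ahead target, a common yardstick is the persistence forecast, which simply predicts that the next hour equals the current one. The sketch below computes it on a synthetic series; the random-walk data is invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic hourly output: slow random walk starting near 450 MW (illustrative only)
current_mw = 450.0 + np.cumsum(rng.normal(scale=2.0, size=200))

next_mw = current_mw[1:]        # what actually happened one hour later
persistence = current_mw[:-1]   # naive forecast: next hour equals this hour

baseline_rmse = np.sqrt(np.mean((next_mw - persistence) ** 2))
print(f"Persistence RMSE: {baseline_rmse:.2f} MW")
```

With the tutorial's variables, the same baseline would compare `X_test['biomass_generation']` against `y_test`. A model that cannot beat it is adding little beyond the auto-regressive feature.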
8. Interpret coefficients
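The coefficient chart shows which signals shift the forecast the most. A non-zero wind-speed coefficient, for example, means wind conditions carry information about next-hour biomass output (the sign gives the direction of the association, not its cause), while negative coefficients on certain hours would point to recurring low-output periods such as overnight maintenance. Zeroed coefficients mark drivers the model found negligible.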
# recover full feature names (one-hot columns come first, numeric columns last)
prep = search.best_estimator_.named_steps['prep']
ohe_names = prep.named_transformers_['cat'].get_feature_names_out(cat_cols)
feature_names = np.hstack([ohe_names, num_cols])
# rescale numeric coefficients back to original units
scales = prep.named_transformers_['num'].scale_
# copy so the fitted model's coef_ is not modified in place
coeffs = search.best_estimator_.named_steps['enet'].coef_.copy()
coeffs[-len(num_cols):] = coeffs[-len(num_cols):] / scales
imp = pd.Series(coeffs, index=feature_names).sort_values(key=abs, ascending=False)
plt.figure(figsize=(9,5))
imp.head(15).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title('Elastic Net Coefficients – Biomass MW Drivers')
plt.xlabel('Δ MW one hour ahead')
plt.tight_layout()
plt.show()
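The division by the scaler's `scale_` is what converts a coefficient from "MW per standard deviation" to "MW per original unit". The sketch below checks this on synthetic data with a known slope. It uses plain `LinearRegression` instead of Elastic Net so the recovered slope is exact; with a penalty the same conversion applies, but the value is shrunk.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
temp_c = rng.normal(15.0, 5.0, size=(200, 1))   # synthetic temperature in °C
mw = 400.0 + 2.0 * temp_c[:, 0]                 # built with exactly 2 MW per °C

scaler = StandardScaler().fit(temp_c)
model = LinearRegression().fit(scaler.transform(temp_c), mw)

coef_per_std = model.coef_[0]                     # MW per one standard deviation
coef_per_degc = coef_per_std / scaler.scale_[0]   # MW per °C
print(round(coef_per_degc, 6))
```

One-hot columns are not scaled in this tutorial's pipeline, so their coefficients already read as MW shifts relative to the dropped reference category.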
Summary
This mixed-regression (Elastic Net) workflow turns raw generation and weather CSVs into:
- A one-hour-ahead biomass MW forecast, with accuracy reported as hold-out RMSE and R².
- A concise, interpretable list of key drivers, balancing sparsity and stability without collinear freak‑outs.
- A one‑command retrain path (search.fit) so operators can refresh the model daily as new data streams in.
Deploying the resulting model helps scheduling teams slash balancing penalties, optimise co‑firing strategy, and plan maintenance around low‑output windows—unlocking more predictable, profitable biomass generation.