Geothermal Energy Output Prediction with Ridge & Lasso Mixed Regression in ML


Independent power producers that operate geothermal wells must tell the grid operator, day‑ahead, how many megawatts their field will supply. Output varies with brine temperature, mass flow rate, steam pressure, and even ambient weather conditions. If forecasts run high, plants pay imbalance penalties; if they run low, they forfeit revenue. Because many sensor channels are collinear (e.g., wellhead T ↔ steam P), ordinary least‑squares coefficients become unstable, pure Ridge keeps every noisy feature, and pure Lasso may drop genuine signals.

A mixed (Elastic Net) regression, which blends the ℓ₂ penalty of Ridge with the ℓ₁ penalty of Lasso, delivers a sparse yet stable model that predicts one‑hour‑ahead power while automatically trimming redundant inputs.
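To make the collinearity argument concrete, here is a minimal sketch on synthetic data (not the geothermal file). The variable names temp, pressure and noise_feat are illustrative stand-ins, and the penalty strengths are arbitrary assumptions chosen only for the comparison.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet

rng = np.random.default_rng(0)
n = 200
temp = rng.normal(size=n)                          # stand-in for wellhead temperature
pressure = temp + rng.normal(scale=0.01, size=n)   # nearly a copy of temp (collinear)
noise_feat = rng.normal(size=n)                    # unrelated to the target
X = np.column_stack([temp, pressure, noise_feat])
y = 3.0 * temp + rng.normal(scale=0.5, size=n)     # only temp truly drives y

models = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),
}
# Fit each model on the same data and collect its coefficient vector
coefs = {name: model.fit(X, y).coef_ for name, model in models.items()}
for name, c in coefs.items():
    print(f"{name:<10} {np.round(c, 2)}")
```

On data like this, the OLS weights on the two near-duplicate columns typically swing to large offsetting values, while the penalised models keep their combined weight close to the true effect.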

Libraries Required

Purpose	Python package
Data wrangling	pandas, numpy
Visualisation	matplotlib, seaborn
ML pipeline	scikit‑learn → ColumnTransformer, StandardScaler, ElasticNet, GridSearchCV, Pipeline, train_test_split
Metrics	mean_squared_error, r2_score

Dataset Link

Geothermal Power

Step-by-Step Code Implementation

1. Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score

2. Download and load the dataset

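The Kaggle file logs 15‑minute SCADA readings from a geothermal plant; shifting power_MW up by four rows (shift(-4)) pairs each row with the reading taken one hour later, so the features at time t never include the value being predicted. This assumes the timestamps are evenly spaced with no gaps.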

# one‑time shell command (needs Kaggle API token):
# kaggle datasets download -d mathurinache/geothermal-power -p data --unzip

df = pd.read_csv("data/geothermal_power.csv", parse_dates=['timestamp'])
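The one‑hour‑ahead shift described above can be checked on a toy frame before trusting it on real data. This is a sketch with made-up values; the column names mirror the ones assumed in this tutorial, and the spacing check is an extra safeguard, not part of the original workflow.

```python
import pandas as pd

# Eight consecutive 15-minute readings with easily recognisable values
toy = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=8, freq="15min"),
    "power_MW": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0],
})

# shift(-4) pulls the value from four rows later onto the current row
toy["target_MW"] = toy["power_MW"].shift(-4)

# The row-based shift only equals "one hour ahead" if every step is 15 minutes
is_regular = toy["timestamp"].diff().dropna().eq(pd.Timedelta(minutes=15)).all()

print(toy)
print("Evenly spaced:", is_regular)
```

The last four rows have no reading one hour later, so their target is NaN; that is why the tutorial drops rows with a missing target_MW.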

3. Feature & target preparation

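We combine physical sensors (wellhead T, flow, pressure), ambient temperature (which affects condenser efficiency), and calendar features (hour, month). The hour and month enter the model as plain integers, so the linear model treats hour 23 and hour 0 as far apart.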

# Example column names: adjust if different
cols = ['timestamp', 'wellhead_temp_C', 'brine_flow_kg_s', 'steam_pressure_bar',
        'ambient_temp_C', 'power_MW']
df = df[cols].dropna().sort_values('timestamp')

# one‑hour‑ahead power target
df['target_MW'] = df['power_MW'].shift(-4)          # 4 × 15‑min = 1 h
df = df.dropna(subset=['target_MW'])

# calendar signals (hour of day, month of year)
df['hour']   = df['timestamp'].dt.hour
df['month']  = df['timestamp'].dt.month
num_cols = ['wellhead_temp_C', 'brine_flow_kg_s', 'steam_pressure_bar',
            'ambient_temp_C', 'power_MW', 'hour', 'month']

X = df[num_cols]
y = df['target_MW']
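If the wrap‑around of hour and month matters, one common alternative to raw integers is a sine/cosine encoding. The helper below is a hypothetical sketch (add_cyclical is not part of pandas or scikit‑learn) showing the idea on a tiny frame.

```python
import numpy as np
import pandas as pd

def add_cyclical(frame, col, period):
    """Return a copy of frame with sin/cos encodings of a periodic integer column."""
    out = frame.copy()
    angle = 2 * np.pi * out[col] / period
    out[f"{col}_sin"] = np.sin(angle)
    out[f"{col}_cos"] = np.cos(angle)
    return out

demo = pd.DataFrame({"hour": [0, 6, 12, 23]})
demo = add_cyclical(demo, "hour", 24)   # 24 hours per cycle
print(demo.round(3))
```

With this encoding, hour 23 sits next to hour 0 on the unit circle. To use it in the tutorial, the new *_sin and *_cos columns would replace hour and month in num_cols.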

4. Build an Elastic Net pipeline

Pipeline: StandardScaler lives inside the Pipeline, so within each CV fold the scaler is fit on that fold's training rows only, which keeps validation‑row statistics from leaking into the scaling.

preprocess = ColumnTransformer([
        ('num', StandardScaler(), num_cols)
    ])

pipe = Pipeline([
        ('prep', preprocess),
        ('enet', ElasticNet(max_iter=20000, random_state=42))
])
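A quick way to convince yourself that a pipeline's scaler only sees the rows passed to fit is to inspect its learned mean. The sketch below uses synthetic data and hypothetical step names ("scale", "enet"), not the tutorial's dataset.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X_demo = rng.normal(loc=[150.0, 40.0], scale=[5.0, 2.0], size=(100, 2))
y_demo = 0.2 * X_demo[:, 0] + 0.5 * X_demo[:, 1] + rng.normal(scale=0.1, size=100)

# Chronological-style split: first 80 rows train, last 20 held out
X_tr, X_te = X_demo[:80], X_demo[80:]
y_tr = y_demo[:80]

demo_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("enet", ElasticNet(alpha=0.01, l1_ratio=0.5, max_iter=20000)),
])
demo_pipe.fit(X_tr, y_tr)

fitted_mean = demo_pipe.named_steps["scale"].mean_
uses_train_only = np.allclose(fitted_mean, X_tr.mean(axis=0))   # scaler saw X_tr only
same_as_full = np.allclose(fitted_mean, X_demo.mean(axis=0))    # would signal leakage
held_out_pred = demo_pipe.predict(X_te)  # X_te is scaled with the training statistics
```

Scaling the whole table before splitting would instead bake the held‑out rows' mean and variance into the training features.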

5. Train/test split & hyper‑parameter search

alpha scales the total penalty;
l1_ratio slides between Ridge (robust to collinearity) and Lasso (feature selection).
A 5‑fold grid‑search over 162 candidates (18 α × 9 ratios) finds the sweet spot.
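For reference, the objective that scikit‑learn's ElasticNet minimises can be written in terms of these two knobs. Here w is the coefficient vector, n the number of training rows, and ρ is shorthand (introduced only for this formula) for l1_ratio:

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
  \;+\; \alpha\,\rho\,\lVert w \rVert_1
  \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2 ,
\qquad \rho = \texttt{l1\_ratio}
```

Setting ρ = 1 leaves only the ℓ₁ term (Lasso), and ρ = 0 leaves only the squared ℓ₂ term (Ridge).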

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, shuffle=False)    # keep chronological order

param_grid = {
    'enet__alpha'   : np.logspace(-3, 1, 18),  # 0.001 → 10
    'enet__l1_ratio': np.linspace(0.1, 0.9, 9) # 0.1 ≈ Ridge‑heavy, 0.9 ≈ Lasso‑heavy
}

grid = GridSearchCV(pipe, param_grid,
                    cv=5,
                    scoring='neg_root_mean_squared_error',
                    n_jobs=-1, verbose=1)
grid.fit(X_train, y_train)

print("Best α:",       grid.best_params_['enet__alpha'])
print("Best l1_ratio:", grid.best_params_['enet__l1_ratio'])
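With cv=5, scikit‑learn uses plain K‑fold, so some folds are validated on rows that precede their training rows. For time‑ordered data, a forward‑chaining splitter such as TimeSeriesSplit is a common alternative. The sketch below swaps it in on synthetic data with a deliberately small grid; it is an optional variation, not the search used above.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X_ts = rng.normal(size=(240, 3))
y_ts = X_ts @ np.array([1.5, 0.0, -0.7]) + rng.normal(scale=0.2, size=240)

# Each split trains on an initial stretch and validates on the rows right after it
tscv = TimeSeriesSplit(n_splits=5)
folds = list(tscv.split(X_ts))

ts_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("enet", ElasticNet(max_iter=20000)),
])
small_grid = {"enet__alpha": [0.01, 0.1, 1.0], "enet__l1_ratio": [0.2, 0.8]}

ts_search = GridSearchCV(ts_pipe, small_grid, cv=tscv,
                         scoring="neg_root_mean_squared_error")
ts_search.fit(X_ts, y_ts)
print(ts_search.best_params_)
```

In the tutorial's search, the same change would amount to passing cv=TimeSeriesSplit(n_splits=5) instead of cv=5.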

6. Hold‑out evaluation

RMSE (in MW) tells operators the typical one‑hour forecast error; R² shows explanatory strength.

y_pred = grid.predict(X_test)
# np.sqrt works across scikit-learn versions; the squared=False argument was removed in recent releases
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2   = r2_score(y_test, y_pred)
print(f"RMSE (1 h ahead): {rmse:.2f} MW | R²: {r2:.3f}")
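An RMSE figure is easier to judge next to a naive reference. One common choice for short horizons is persistence: predict that power in one hour equals power now. The helpers below are a hypothetical sketch (rmse and skill_vs_persistence are not library functions) with made-up numbers.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def skill_vs_persistence(y_true, y_model, y_now):
    """1 - RMSE_model / RMSE_persistence; positive means the model beats 'no change'."""
    return 1.0 - rmse(y_true, y_model) / rmse(y_true, y_now)

# Toy numbers: actual next-hour MW, model forecast, and current MW (persistence forecast)
actual = [10.0, 12.0]
model_forecast = [10.0, 11.0]
current_power = [8.0, 10.0]
skill = skill_vs_persistence(actual, model_forecast, current_power)
print(f"Skill vs persistence: {skill:.3f}")
```

In the tutorial's variables, the persistence forecast for the hold‑out set would be X_test['power_MW'], compared against y_test.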

7. Interpret coefficients

The coefficient chart shows how much next‑hour output changes per unit change in each input. It may show, for instance, that a 10 °C rise in wellhead temperature boosts next‑hour output more than the same rise in ambient T, and that lower steam pressure goes with lower production; coefficients shrunk to exactly zero mark inputs the model treats as redundant.

# reverse‑scale numeric coeffs for interpretability
scales = grid.best_estimator_.named_steps['prep'].named_transformers_['num'].scale_
coefs  = grid.best_estimator_.named_steps['enet'].coef_ / scales

imp = pd.Series(coefs, index=num_cols).sort_values(key=abs, ascending=False)

plt.figure(figsize=(8,5))
imp.head(12).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title('Elastic Net Coefficients – Drivers of Geothermal MW (1 h ahead)')
plt.xlabel('Δ MW per unit change')
plt.tight_layout()
plt.show()
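Dividing by the scaler's scale_ is what turns standardised coefficients into "MW per raw unit". The sketch below checks that claim on a noise‑free toy target with known slopes; the data, step names and penalty settings are assumptions made for the demonstration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X_raw = np.column_stack([
    rng.normal(loc=100.0, scale=10.0, size=300),   # e.g. a temperature in °C
    rng.normal(loc=5.0, scale=0.5, size=300),      # e.g. a pressure in bar
])
y_raw = 2.0 * X_raw[:, 0] + 0.5 * X_raw[:, 1] + 3.0   # known slopes: 2.0 and 0.5

toy_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("enet", ElasticNet(alpha=1e-4, l1_ratio=0.5, max_iter=50000, tol=1e-8)),
]).fit(X_raw, y_raw)

scaler = toy_pipe.named_steps["scale"]
enet = toy_pipe.named_steps["enet"]

# Undo the standardisation: slope per raw unit, plus the matching intercept
raw_coef = enet.coef_ / scaler.scale_
raw_intercept = enet.intercept_ - np.sum(enet.coef_ * scaler.mean_ / scaler.scale_)

# Predictions rebuilt from raw-unit coefficients should match the pipeline's own
manual_pred = X_raw @ raw_coef + raw_intercept
print(np.round(raw_coef, 3), round(float(raw_intercept), 3))
```

Because raw‑unit coefficients depend on each sensor's units, ranking them by absolute size compares "per °C" with "per bar"; the standardised coef_ values are the fairer basis for ranking importance.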

Summary

With fewer than 140 lines of code, we created a mixed‑regression (Elastic Net) model that:

Predicts geothermal power one hour ahead, with accuracy reported as hold‑out RMSE in MW.
Balances sparsity and stability, trimming redundant sensors while guarding against collinear blow‑ups.
Offers clear physical insight, ranking temperature, flow, and pressure by MW impact—helpful for both operators and maintenance teams.

Because preprocessing, tuning, and inference sit within a single Pipeline, updating the model with tomorrow’s SCADA file is a single .fit() call, keeping forecasts accurate as wells age and seasons shift.
