Geothermal Energy Output Prediction with Ridge & Lasso Mixed Regression in ML

Independent power producers that operate geothermal wells must tell the grid operator, day-ahead, how many megawatts their field will supply. Output varies with brine temperature, mass flow rate, steam pressure, and ambient weather. If a forecast is too high, the plant pays imbalance penalties; if it is too low, it forfeits revenue. Because many sensor channels are collinear (e.g., wellhead temperature and steam pressure), ordinary least squares produces unstable, high-variance coefficients, pure Ridge keeps every noisy feature, and pure Lasso may arbitrarily drop one of several correlated signals that all carry real information.

A mixed (Elastic Net) regression, which blends the squared ℓ₂ penalty of Ridge with the ℓ₁ penalty of Lasso, gives a sparse yet stable model. In this tutorial it predicts power one hour ahead while automatically trimming redundant inputs.
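For reference, the objective that scikit-learn's ElasticNet minimises can be written as below. The symbols are notation chosen here for illustration: n is the number of training rows, w the coefficient vector, α the alpha parameter, and ρ the l1_ratio parameter; the intercept is omitted for brevity.

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
\;+\; \alpha\,\rho\,\lVert w \rVert_1
\;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```

Setting ρ = 1 leaves only the ℓ₁ term (Lasso), while ρ = 0 leaves only the squared ℓ₂ term (Ridge).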

Libraries Required

Purpose | Python package
Data wrangling | pandas, numpy
Visualisation | matplotlib, seaborn
ML pipeline | scikit-learn (ColumnTransformer, StandardScaler, ElasticNet, GridSearchCV, Pipeline, train_test_split)
Metrics | scikit-learn (mean_squared_error, r2_score)

Dataset Link

Geothermal Power

Step-by-Step Code Implementation

1. Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score

2. Download and load the dataset

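The Kaggle file logs 15-minute SCADA readings from a geothermal plant. Shifting power_MW by -4 rows pairs each row's features with the power measured one hour later, so the target itself never appears among that row's inputs. This assumes the rows are evenly spaced, with no missing timestamps.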

# one‑time shell command (needs Kaggle API token):
# kaggle datasets download -d mathurinache/geothermal-power -p data --unzip

df = pd.read_csv("data/geothermal_power.csv", parse_dates=['timestamp'])
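Because the one-hour target relies on a fixed row offset, it can help to confirm that consecutive rows really are 15 minutes apart. The following is a minimal sketch, assuming a timestamp column like the one parsed above; cadence_report is a hypothetical helper name, not a pandas function, and the toy timestamps are made up.

```python
import pandas as pd

def cadence_report(timestamps: pd.Series, expected: str = "15min") -> dict:
    """Summarise how often consecutive rows are exactly `expected` apart."""
    ts = pd.to_datetime(timestamps).sort_values()
    gaps = ts.diff().dropna()                 # time between neighbouring rows
    on_grid = gaps == pd.Timedelta(expected)
    return {
        "n_rows": int(len(ts)),
        "share_on_grid": float(on_grid.mean()) if len(gaps) else 1.0,
        "largest_gap": gaps.max() if len(gaps) else pd.Timedelta(0),
    }

# Toy example with one missing 15-minute sample (the 00:30 reading)
demo = pd.Series(pd.to_datetime([
    "2024-01-01 00:00", "2024-01-01 00:15",
    "2024-01-01 00:45", "2024-01-01 01:00",
]))
report = cadence_report(demo)
print(report)
```

On the real data this would be called as cadence_report(df['timestamp']). If the share of on-grid intervals is well below 1, a row-based shift no longer corresponds to exactly one hour, and resampling to a regular grid first may be worth considering.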

3. Feature & target preparation

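We combine physical sensors (wellhead temperature, flow, pressure), ambient temperature (which affects condenser efficiency), the current power reading, and calendar features (hour, month). Note that hour and month enter as plain integers here, so a linear model treats hour 23 and hour 0 as far apart even though they are adjacent on the clock.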

# Example column names: adjust if different
cols = ['timestamp', 'wellhead_temp_C', 'brine_flow_kg_s', 'steam_pressure_bar',
        'ambient_temp_C', 'power_MW']
df = df[cols].dropna().sort_values('timestamp')

# one‑hour‑ahead power target
df['target_MW'] = df['power_MW'].shift(-4)          # 4 × 15‑min = 1 h
df = df.dropna(subset=['target_MW'])

# calendar features (hour of day, month of year)
df['hour']   = df['timestamp'].dt.hour
df['month']  = df['timestamp'].dt.month
num_cols = ['wellhead_temp_C', 'brine_flow_kg_s', 'steam_pressure_bar',
            'ambient_temp_C', 'power_MW', 'hour', 'month']

X = df[num_cols]
y = df['target_MW']
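Since raw hour and month integers hide the wrap-around at midnight and at year end, one common alternative is a sine/cosine encoding. The sketch below is not part of the original pipeline; add_cyclical is a hypothetical helper and the demo frame is made up.

```python
import numpy as np
import pandas as pd

def add_cyclical(df: pd.DataFrame, col: str, period: int, offset: int = 0) -> pd.DataFrame:
    """Return a copy of df with sin/cos encodings of an integer cyclic column."""
    out = df.copy()
    angle = 2 * np.pi * (out[col] - offset) / period
    out[f"{col}_sin"] = np.sin(angle)
    out[f"{col}_cos"] = np.cos(angle)
    return out

demo = pd.DataFrame({"hour": [0, 6, 12, 23], "month": [1, 4, 7, 12]})
demo = add_cyclical(demo, "hour", period=24)
demo = add_cyclical(demo, "month", period=12, offset=1)  # months run 1..12
print(demo.round(3))
```

With this encoding, hour 23 lands next to hour 0 on the unit circle. To use it in the tutorial, the new *_sin and *_cos columns would replace 'hour' and 'month' in num_cols.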

4. Build an Elastic Net pipeline

StandardScaler lives inside the Pipeline, so each CV fold fits the scaler on its training portion only. This keeps validation-fold statistics out of the scaling step.

preprocess = ColumnTransformer([
        ('num', StandardScaler(), num_cols)
    ])

pipe = Pipeline([
        ('prep', preprocess),
        ('enet', ElasticNet(max_iter=20000, random_state=42))
])
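The effect of keeping the scaler inside the Pipeline can be checked on synthetic data. In this sketch (all names and numbers are illustrative assumptions, not the tutorial's dataset), the later rows drift upward, and the fitted scaler's mean matches the training rows only.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
X_demo[150:] += 5.0   # later rows drift upward, e.g. a seasonal change
y_demo = X_demo @ np.array([1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

demo_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("enet", ElasticNet(alpha=0.01, l1_ratio=0.5)),
])
demo_pipe.fit(X_demo[:150], y_demo[:150])   # fit on the "past" rows only

train_mean = X_demo[:150].mean(axis=0)
full_mean = X_demo.mean(axis=0)
fitted_mean = demo_pipe.named_steps["scale"].mean_

print("scaler mean :", fitted_mean.round(3))
print("train mean  :", train_mean.round(3))
print("full mean   :", full_mean.round(3))
```

Scaling the whole dataset before splitting would instead bake the drifted rows into the mean, which is the leak the Pipeline avoids.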

5. Train/test split & hyper‑parameter search

  • alpha scales the total penalty.
  • l1_ratio slides between Ridge (values near 0, robust to collinearity) and Lasso (values near 1, feature selection).
  • A 5‑fold grid search over 162 candidates (18 α × 9 ratios, i.e. 810 fits) finds the sweet spot.

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, shuffle=False)    # keep chronological order

param_grid = {
    'enet__alpha'   : np.logspace(-3, 1, 18),  # 0.001 → 10
    'enet__l1_ratio': np.linspace(0.1, 0.9, 9) # 0.1 ≈ Ridge‑heavy, 0.9 ≈ Lasso‑heavy
}

grid = GridSearchCV(pipe, param_grid,
                    cv=5,
                    scoring='neg_root_mean_squared_error',
                    n_jobs=-1, verbose=1)
grid.fit(X_train, y_train)

print("Best α:",       grid.best_params_['enet__alpha'])
print("Best l1_ratio:", grid.best_params_['enet__l1_ratio'])
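One caveat: an integer cv=5 in GridSearchCV uses consecutive, unshuffled folds for a regressor, so some folds validate on rows that precede part of their training data. For time-ordered data, TimeSeriesSplit is a common alternative. The sketch below uses synthetic data and a reduced grid; it is an illustration under those assumptions, not the configuration used above.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(300, 4))
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=300)

tscv = TimeSeriesSplit(n_splits=5)
# Every validation fold starts after its training fold ends
folds_ordered = all(train.max() < val.min() for train, val in tscv.split(X_demo))

demo_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("enet", ElasticNet(max_iter=20000)),
])
demo_grid = GridSearchCV(
    demo_pipe,
    {"enet__alpha": [0.01, 0.1, 1.0], "enet__l1_ratio": [0.2, 0.8]},
    cv=tscv,
    scoring="neg_root_mean_squared_error",
)
demo_grid.fit(X_demo, y_demo)
print(folds_ordered, demo_grid.best_params_)
```

In the tutorial's code this would amount to passing cv=TimeSeriesSplit(n_splits=5) to GridSearchCV. TimeSeriesSplit also accepts a gap argument; gap=4 would keep the one-hour target window of the last training rows from overlapping the first validation rows.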

6. Hold‑out evaluation

RMSE (in MW) tells operators the typical one‑hour forecast error; R² shows explanatory strength.

y_pred = grid.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # avoids squared=False, which newer scikit-learn releases no longer accept
r2   = r2_score(y_test, y_pred)
print(f"RMSE (1 h ahead): {rmse:.2f} MW | R²: {r2:.3f}")
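An RMSE figure is easier to judge against a naive baseline. A common one for short horizons is persistence: assume power one hour from now equals power now. The sketch below uses made-up toy numbers; rmse and skill_vs_persistence are hypothetical helper names.

```python
import numpy as np

def rmse(y_true, y_hat) -> float:
    """Root-mean-squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_hat) ** 2)))

def skill_vs_persistence(y_true, y_model, y_now) -> float:
    """1 - RMSE_model / RMSE_persistence; positive means the model beats persistence."""
    return 1.0 - rmse(y_true, y_model) / rmse(y_true, y_now)

# Toy values in MW (illustrative only)
y_true = [50.0, 52.0, 51.0, 49.0]    # power one hour later
y_now = [49.0, 50.0, 52.0, 51.0]     # power at forecast time (persistence forecast)
y_model = [50.5, 51.5, 51.0, 49.5]   # model forecast

skill = skill_vs_persistence(y_true, y_model, y_now)
print(f"skill vs persistence: {skill:.3f}")
```

With the tutorial's variables, the persistence forecast would be X_test['power_MW'], compared against y_test and y_pred. A skill near zero would suggest the model adds little beyond the current reading.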

7. Interpret coefficients

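The coefficient chart ranks inputs by the change in next‑hour MW per unit change in each feature. For instance, it lets you check whether a 10 °C rise in wellhead temperature moves next‑hour output more than the same rise in ambient temperature, and the sign of each bar shows whether a higher reading raises or lowers predicted production. Coefficients shrunk exactly to zero mark inputs the model treats as negligible or redundant.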

# reverse‑scale numeric coeffs for interpretability
scales = grid.best_estimator_.named_steps['prep'].named_transformers_['num'].scale_
coefs  = grid.best_estimator_.named_steps['enet'].coef_ / scales

imp = pd.Series(coefs, index=num_cols).sort_values(key=abs, ascending=False)

plt.figure(figsize=(8,5))
imp.head(12).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title('Elastic Net Coefficients – Drivers of Geothermal MW (1 h ahead)')
plt.xlabel('Δ MW per unit change')
plt.tight_layout()
plt.show()
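Per-unit coefficients carry different units (MW per °C, MW per bar, and so on), so their magnitudes are not directly comparable; the coefficients on the standardised scale are. The sketch below, on synthetic data with made-up column names, shows one way to list which features an Elastic Net kept and which it set exactly to zero.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 400
X_demo = pd.DataFrame({
    "driver": rng.normal(size=n),    # truly influences the target
    "noise_a": rng.normal(size=n),   # unrelated to the target
    "noise_b": rng.normal(size=n),   # unrelated to the target
})
y_demo = 3.0 * X_demo["driver"] + rng.normal(scale=0.1, size=n)

Xs = StandardScaler().fit_transform(X_demo)
model = ElasticNet(alpha=0.5, l1_ratio=0.9).fit(Xs, y_demo)

# Coefficients on the standardised scale: MW per one standard deviation
std_coefs = pd.Series(model.coef_, index=X_demo.columns)
kept = std_coefs[std_coefs != 0].index.tolist()
dropped = std_coefs[std_coefs == 0].index.tolist()
print("kept:", kept, "| dropped:", dropped)
```

In the tutorial, the standardised coefficients are grid.best_estimator_.named_steps['enet'].coef_ before the division by scales.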

Summary

With fewer than 140 lines of code, we created a mixed‑regression (Elastic Net) model that:

  • Predicts geothermal power one hour ahead and reports the error as RMSE (in MW) on a chronological hold‑out set.
  • Balances sparsity and stability, trimming redundant sensors while guarding against collinear blow‑ups.
  • Offers physical insight by ranking temperature, flow, and pressure by MW impact, which helps both operators and maintenance teams.

Because preprocessing, tuning, and inference sit within a single Pipeline, updating the model with tomorrow’s SCADA file is a single .fit() call, keeping forecasts accurate as the field ages and the seasons shift.
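One way to sketch such a refresh is with sklearn.base.clone, which copies a pipeline's hyper-parameters but none of its learned weights, so the copy can be refit on new data without another grid search. The data and the make_batch helper below are synthetic and hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)

def make_batch(n: int, slope: float):
    """Synthetic batch whose first feature drives the target with the given slope."""
    X = rng.normal(size=(n, 2))
    y = slope * X[:, 0] + rng.normal(scale=0.05, size=n)
    return X, y

X_old, y_old = make_batch(200, slope=1.0)   # stands in for past SCADA data
X_new, y_new = make_batch(200, slope=2.0)   # stands in for a newer file

tuned = Pipeline([
    ("scale", StandardScaler()),
    ("enet", ElasticNet(alpha=0.01, l1_ratio=0.5)),
]).fit(X_old, y_old)

# Same hyper-parameters, fresh weights fitted on the new batch
refreshed = clone(tuned).fit(X_new, y_new)

old_coef = tuned.named_steps["enet"].coef_[0]
new_coef = refreshed.named_steps["enet"].coef_[0]
print(f"old coef: {old_coef:.2f} | refreshed coef: {new_coef:.2f}")
```

With the tutorial's objects this would read clone(grid.best_estimator_).fit(X_new, y_new), where X_new and y_new are prepared the same way as the original features and target. If the plant's behaviour changes substantially, rerunning the grid search may still be the safer choice.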

ProjectGurukul Team
