Solar Panel Output Prediction with Ridge & Lasso Mixed Regression in ML


Utility-scale PV farms must commit day-ahead energy schedules to the grid operator. Over-prediction triggers costly imbalance penalties, while under-prediction forfeits market revenue. Pure Ridge regression copes well with collinear weather inputs (irradiance, ambient and module temperature), but it keeps every noisy feature; pure Lasso yields a sparse model, yet among strongly correlated inputs it tends to keep one almost arbitrarily and zero out the rest, which makes the selected feature set unstable.

An Elastic Net, which mixes the ℓ₂ (Ridge) and ℓ₁ (Lasso) penalties, offers a practical trade-off between the two and typically yields a parsimonious yet stable model. We will:

  • Forecast plant-level DC power (kW) one hour ahead at 15-minute resolution, using weather sensor data (irradiance, ambient temperature, module temperature) and time signals.
  • Identify the handful of predictors that truly drive output while keeping the model robust to correlated sunlight metrics.
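
For reference, the mixed penalty described above corresponds to the objective that scikit-learn's ElasticNet minimises. Here n is the number of samples, w the coefficient vector, α the overall penalty strength (the alpha parameter) and ρ the mixing weight (the l1_ratio parameter); the intercept is left out for brevity.

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
\;+\; \alpha\,\rho\,\lVert w \rVert_1
\;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```

Setting ρ = 1 leaves only the ℓ₁ term (Lasso), while ρ = 0 leaves only the ℓ₂ term (a Ridge-style penalty).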

Libraries Required

Role | Library
Data & dates | pandas, numpy, datetime
Visuals | matplotlib, seaborn
ML workflow | scikit-learn: ColumnTransformer, StandardScaler, OneHotEncoder, ElasticNet, GridSearchCV, Pipeline, train_test_split
Metrics | mean_squared_error, r2_score

Dataset Link

Solar Power Generation Data

Step-by-Step Code Implementation

1. Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import timedelta

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score

2. Download & load dataset

Dataset: two Indian PV plants with 34 days of 15-minute inverter generation data and weather sensor readings (irradiation, module temperature, ambient temperature). We use Plant 1 for brevity.

# One‑time terminal command (needs Kaggle API key):
# kaggle datasets download -d anikannal/solar-power-generation-data -p data --unzip

# Plant 1 generation & weather (only one plant used for demo)
gen  = pd.read_csv("data/Plant_1_Generation_Data.csv")
wthr = pd.read_csv("data/Plant_1_Weather_Sensor_Data.csv")

3. Merge & resample to 15‑min resolution

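# Plant 1 generation timestamps are day-first (e.g. 15-05-2020 00:00);
# the weather file uses ISO-style timestamps
gen['DATE_TIME']  = pd.to_datetime(gen['DATE_TIME'], dayfirst=True)
wthr['DATE_TIME'] = pd.to_datetime(wthr['DATE_TIME'])

# Aggregate inverters to plant-level DC power.
# The dataset description lists DC_POWER in kW, so no W → kW conversion is applied.
dc_kw = gen.groupby('DATE_TIME')['DC_POWER'].sum()

# Keep only numeric sensor channels (SOURCE_KEY is a string, PLANT_ID an identifier)
sensor_cols = ['AMBIENT_TEMPERATURE', 'MODULE_TEMPERATURE', 'IRRADIATION']
wthr = wthr.set_index('DATE_TIME')[sensor_cols].resample('15min').mean()

data = pd.concat([dc_kw, wthr], axis=1).dropna()
data = data.rename(columns={'DC_POWER': 'Total_DC_kW'})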

4. Create “one‑hour‑ahead” supervised pairs

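Supervised framing: for every 15-minute record we predict plant DC power one hour later, so shifting the series by four steps creates the target. This assumes the index has no missing timestamps; any gap makes a four-row shift span more than 60 minutes.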

# Shift target 4 × 15 min steps = 60 min
data['Target_kW'] = data['Total_DC_kW'].shift(-4)
data = data.dropna(subset=['Target_kW'])
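
Because the row-based shift above relies on an unbroken 15-minute index, a time-based lookup is one way to guard against gaps. The sketch below uses a small synthetic series (the `demo` frame and its values are illustrative, not taken from the dataset) and compares both approaches when one record is missing.

```python
import numpy as np
import pandas as pd

# Synthetic 15-minute series with one missing timestamp (illustrative data only)
idx = pd.date_range("2020-05-15 06:00", periods=12, freq="15min")
demo = pd.DataFrame({"Total_DC_kW": np.arange(12, dtype=float)}, index=idx)
demo = demo.drop(idx[5])                      # simulate a lost record at 07:15

# Row-based shift: after the gap, some rows are paired with a value 75 min ahead
demo["Target_row"] = demo["Total_DC_kW"].shift(-4)

# Time-based lookup: move every timestamp back 60 min, then align on the index;
# rows whose +60 min partner is missing become NaN instead of being mis-paired
future = demo["Total_DC_kW"].copy()
future.index = future.index - pd.Timedelta(minutes=60)
demo["Target_time"] = future.reindex(demo.index)

print(demo)
```

Rows where the two target columns disagree are exactly the ones affected by the gap; dropping rows with a NaN time-based target keeps every training pair at a true 60-minute horizon.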

5. Feature engineering

Irradiance dominates, but module temperature, ambient temperature, and time-of-day signals help the model during shading and low-sun-angle periods. The Plant 1 weather file has no wind-speed channel, so wind is not used as a feature here.

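data['Hour']  = data.index.hour
data['Month'] = data.index.month   # nearly constant over a 34-day window
# The Plant 1 weather CSV has no wind-speed column, so only these channels are used
num_cols = ['IRRADIATION', 'AMBIENT_TEMPERATURE', 'MODULE_TEMPERATURE',
            'Hour', 'Month']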
X = data[num_cols]
y = data['Target_kW']
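
A raw integer hour treats 23 and 0 as far apart, which a linear model cannot undo. One common alternative, shown here as a sketch rather than as part of the original pipeline, is a sine/cosine encoding of the time of day. The column names `Hour_sin` and `Hour_cos` and the `feat` frame are illustrative.

```python
import numpy as np
import pandas as pd

# One synthetic day at 15-minute resolution (illustrative index only)
idx = pd.date_range("2020-05-15", periods=96, freq="15min")
feat = pd.DataFrame(index=idx)

# Fractional hour of day in [0, 24)
hour_frac = (feat.index.hour + feat.index.minute / 60.0).to_numpy()

# Map onto the unit circle so 23:45 and 00:00 end up close together
feat["Hour_sin"] = np.sin(2 * np.pi * hour_frac / 24.0)
feat["Hour_cos"] = np.cos(2 * np.pi * hour_frac / 24.0)

print(feat.iloc[[0, 24, 48, 95]])
```

If adopted, the two columns would replace `Hour` in the feature list; whether that improves the forecast on this dataset would need to be checked on the hold-out set.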

6. Pre‑processing & Elastic Net pipeline

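Pipeline: fitting the scaler and the model inside a single Pipeline means the scaler is refit on each training fold, so validation-fold statistics never leak into the scaling. Keeping the folds in time order is a separate concern that depends on the CV splitter.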

preprocess = ColumnTransformer([
        ('num', StandardScaler(), num_cols)
])

pipe = Pipeline([
    ('prep', preprocess),
    ('enet', ElasticNet(max_iter=20000, random_state=42))
])
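
The pipeline guards against scaler leakage, but whether a fold trains on rows that come after its validation block depends on the splitter. The sketch below, on a stand-in array of 20 time-ordered rows (the `X_demo` and `results` names are illustrative), counts such folds for scikit-learn's KFold and TimeSeriesSplit.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

n_samples = 20                                   # stand-in for time-ordered rows
X_demo = np.arange(n_samples).reshape(-1, 1)

results = {}
for name, splitter in [("KFold", KFold(n_splits=4)),
                       ("TimeSeriesSplit", TimeSeriesSplit(n_splits=4))]:
    # A fold "looks ahead" if any training row comes after the first validation row
    looks_ahead = [bool(train.max() > val.min())
                   for train, val in splitter.split(X_demo)]
    results[name] = sum(looks_ahead)
    print(f"{name}: {results[name]} of 4 folds train on rows later than the validation block")
```

Passing a TimeSeriesSplit instance as the `cv` argument of GridSearchCV is one way to keep every validation block strictly after its training rows.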

7. Train/test split & hyper‑parameter search

  • alpha controls overall penalty strength (higher → more shrinkage).
  • l1_ratio tilts the penalty toward Lasso (1.0) or Ridge (0.0).
  • Five-fold cross-validation selects the pair with the lowest validation RMSE.

from sklearn.model_selection import TimeSeriesSplit

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, shuffle=False)   # last 20 % held out, temporal order kept

param_grid = {
    'enet__alpha':     np.logspace(-2, 1, 15),   # 0.01 → 10
    'enet__l1_ratio':  np.linspace(0.1, 0.9, 9)  # 0.1≈Ridge-heavy, 0.9≈Lasso-heavy
}

# Forward-chaining folds: each validation block follows its training rows in time
search = GridSearchCV(pipe, param_grid,
                      cv=TimeSeriesSplit(n_splits=5),
                      scoring='neg_root_mean_squared_error',
                      n_jobs=-1, verbose=1)
search.fit(X_train, y_train)

print("Best α:", search.best_params_['enet__alpha'])
print("Best l1_ratio:", search.best_params_['enet__l1_ratio'])

8. Evaluate on the hold‑out set

y_pred = search.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))   # the squared= argument is gone from recent scikit-learn releases
r2   = r2_score(y_test, y_pred)

print(f"Hold‑out RMSE: {rmse:.1f} kW | R²: {r2:.3f}")
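
A hold-out RMSE is easier to judge next to a naive reference. A common one for short-horizon forecasting is persistence, which predicts that power one hour ahead equals current power. The sketch below computes persistence RMSE and mean bias on a synthetic half-sine daily profile; the series `s` and its 1000 kW peak are made up for illustration and are not results from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic daily power curve over four days, zero at night (illustration only)
idx = pd.date_range("2020-05-15", periods=4 * 96, freq="15min")
hours = (idx.hour + idx.minute / 60.0).to_numpy()
power = np.clip(np.sin(np.pi * (hours - 6.0) / 12.0), 0.0, None) * 1000.0   # kW
s = pd.Series(power, index=idx, name="Total_DC_kW")

target = s.shift(-4).dropna()            # power one hour later
persistence = s.loc[target.index]        # naive forecast: current power

rmse_persist = float(np.sqrt(np.mean((target - persistence) ** 2)))
bias_persist = float(np.mean(persistence - target))   # > 0 means over-prediction

print(f"Persistence RMSE: {rmse_persist:.1f} kW | mean bias: {bias_persist:.2f} kW")
```

On the real data the same two lines would use `data['Total_DC_kW']` and `data['Target_kW']` restricted to the test rows. Since over- and under-prediction carry different costs, reporting the bias alongside RMSE shows which way the errors lean.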

9. Inspect coefficients

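The non-zero coefficients show which drivers the model retained. Irradiance is expected to carry the largest weight, and a negative module-temperature coefficient would be consistent with the efficiency drop under heat. The Ridge component keeps the model from discarding correlated sensor channels altogether. Dividing by the scaler's scale_ converts each coefficient to kW per original unit of its feature, so magnitudes are not directly comparable across features measured in different units.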

scales = search.best_estimator_.named_steps['prep'] \
                    .named_transformers_['num'].scale_
coefs  = search.best_estimator_.named_steps['enet'].coef_ / scales

imp = pd.Series(coefs, index=num_cols).sort_values(key=abs, ascending=False)
plt.figure(figsize=(7, 4))
imp.plot(kind='barh')
plt.gca().invert_yaxis()   # largest |coefficient| at the top
plt.title('Elastic Net Coefficients – Solar Output Drivers')
plt.xlabel('Δ kW per unit of feature (1 hour ahead)')
plt.show()
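
To make the unit conversion in this step concrete, the sketch below fits an Elastic Net on two synthetic features with very different scales and prints both coefficient views. The data, the true effects (800 and -2), and the names `irr`, `temp`, `coef_std`, and `coef_raw` are all invented for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
irr = rng.uniform(0.0, 1.0, n)               # irradiance-like scale (synthetic)
temp = rng.uniform(20.0, 60.0, n)            # temperature-like scale (synthetic)
y_demo = 800.0 * irr - 2.0 * temp + rng.normal(0.0, 5.0, n)

X_demo = np.column_stack([irr, temp])
scaler = StandardScaler().fit(X_demo)
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(scaler.transform(X_demo), y_demo)

coef_std = model.coef_                    # kW per one standard deviation of each feature
coef_raw = model.coef_ / scaler.scale_    # kW per original unit of each feature

print("standardised:", np.round(coef_std, 2))
print("original units:", np.round(coef_raw, 2))
```

The standardised coefficients are the ones suited to ranking drivers against each other; the original-unit coefficients answer "how many kW per unit change" for a single feature.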

Summary

With well under 100 lines of Python, we produced an Elastic Net model that:

  • Forecasts plant DC power one hour ahead, with hold-out RMSE and R² as the yardsticks, giving operators time to adjust inverter set-points or battery dispatch.
  • Balances Ridge's stability and Lasso's sparsity, yielding an interpretable set of weather drivers rather than a black-box ensemble.
  • Streamlines retraining: drop in tomorrow's SCADA CSV, call .fit(), and the scaler and model are refit together.

The result: more accurate, transparent solar‑output predictions that translate directly into better grid bids and reduced imbalance charges.


ProjectGurukul Team
