Solar Panel Output Prediction with Ridge & Lasso Mixed Regression in ML
Utility-scale PV farms must commit day-ahead energy schedules to the grid operator. Over-prediction triggers costly imbalance penalties, while under-prediction squanders market revenue. Pure Ridge regression copes well with collinear weather inputs (irradiance, temperature, wind), but it keeps every noisy feature. Pure Lasso produces sparsity, yet when several sunlight-related inputs are strongly correlated (as they are under changing cloud cover) it tends to keep one of them somewhat arbitrarily and shrink the rest to zero.
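An Elastic Net, which mixes the ℓ2 (Ridge) and ℓ1 (Lasso) penalties, sits between the two: the ℓ1 term zeroes out uninformative features, while the ℓ2 term keeps coefficients stable across correlated inputs. The result is a parsimonious yet stable model. We will: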
- Forecast plant-level DC power (kW) at 15-minute resolution, one hour ahead, using weather sensor data (irradiance, ambient temperature, module temperature) and time signals.
- Identify the handful of predictors that truly drive output while keeping the model robust to correlated sunlight metrics.
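To make the Ridge / Lasso / Elastic Net contrast concrete, here is a minimal sketch on synthetic data (not the PV dataset). The three near-duplicate "sunlight" columns, the five pure-noise columns, and the penalty strengths are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)                                    # shared driver (think: sunlight)
correlated = signal[:, None] + 0.05 * rng.normal(size=(n, 3))  # 3 near-duplicate sensors
noise_cols = rng.normal(size=(n, 5))                           # 5 irrelevant features
X_syn = np.hstack([correlated, noise_cols])
y_syn = 3.0 * signal + 0.5 * rng.normal(size=n)

models = {
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

counts = {}
for name, model in models.items():
    model.fit(X_syn, y_syn)
    counts[name] = int(np.sum(model.coef_ != 0))   # how many features survive
    print(f"{name:12s} non-zero: {counts[name]}  coefs: {np.round(model.coef_, 2)}")
```

Ridge leaves every coefficient non-zero, whereas the ℓ1 term allows Lasso and Elastic Net to set some exactly to zero. How the weight is shared among the three correlated columns depends on the penalties chosen.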
Libraries Required
| Role | Library |
| --- | --- |
| Data & dates | pandas, numpy, datetime |
| Visuals | matplotlib, seaborn |
| ML workflow | scikit-learn → ColumnTransformer, StandardScaler, OneHotEncoder, ElasticNet, GridSearchCV, Pipeline, train_test_split |
| Metrics | mean_squared_error, r2_score |
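Dataset Link
Kaggle dataset anikannal/solar-power-generation-data (the same identifier used in the download command below).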
Step-by-Step Code Implementation
1. Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import timedelta
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score
2. Download & load dataset
Dataset: two Indian PV plants with 34 days of 15-min SCADA and meteorology data (irradiance, module and ambient temperature). We use Plant 1 for brevity.
# One‑time terminal command (needs Kaggle API key):
# kaggle datasets download -d anikannal/solar-power-generation-data -p data --unzip
# Plant 1 generation & weather (only one plant used for demo)
gen = pd.read_csv("data/Plant_1_Generation_Data.csv")
wthr = pd.read_csv("data/Plant_1_Weather_Sensor_Data.csv")
3. Merge & resample to 15‑min resolution
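# The Plant 1 generation file stores dates day-first (e.g. 15-05-2020 00:00);
# dayfirst=True guards against day/month swaps when parsing.
gen['DATE_TIME'] = pd.to_datetime(gen['DATE_TIME'], dayfirst=True)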
wthr['DATE_TIME'] = pd.to_datetime(wthr['DATE_TIME'])
# Aggregate inverter strings to plant-level DC power.
# The dataset description lists DC_POWER in kW already, so no unit conversion is applied.
dc_kw = gen.groupby('DATE_TIME')['DC_POWER'].sum()
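# Average only the numeric sensor channels; the file also holds string ID columns,
# which .mean() cannot aggregate in recent pandas versions.
sensor_cols = ['AMBIENT_TEMPERATURE', 'MODULE_TEMPERATURE', 'IRRADIATION']
wthr = wthr.set_index('DATE_TIME')[sensor_cols].resample('15min').mean()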
data = pd.concat([dc_kw, wthr], axis=1).dropna()
data = data.rename(columns={'DC_POWER': 'Total_DC_kW'})
4. Create “one‑hour‑ahead” supervised pairs
Supervised framing — for every 15-minute record, we predict plant DC power one hour later; a four‑step shift creates the target.
# Shift target 4 × 15 min steps = 60 min
data['Target_kW'] = data['Total_DC_kW'].shift(-4)
data = data.dropna(subset=['Target_kW'])
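A toy example may help show what the four-step shift does. The timestamps and power values below are made up for illustration. The sketch also checks that the index is gap-free, because `shift()` counts rows rather than minutes, so a missing timestamp would silently change the forecast horizon.

```python
import pandas as pd

# Eight consecutive 15-minute readings (made-up values)
idx = pd.date_range("2020-05-15 10:00", periods=8, freq="15min")
toy = pd.DataFrame({"Total_DC_kW": [10, 20, 30, 40, 50, 60, 70, 80]}, index=idx)

# shift(-4) pulls the value from 4 rows later onto the current row
toy["Target_kW"] = toy["Total_DC_kW"].shift(-4)
toy = toy.dropna(subset=["Target_kW"])   # the last 4 rows have no future value

# Confirm every step in the index is exactly 15 minutes
steps = toy.index.to_series().diff().dropna()
step_ok = bool((steps == pd.Timedelta(minutes=15)).all())

print(toy)
print("Gap-free index:", step_ok)
```

The row stamped 10:00 now carries the 11:00 reading as its target. On real data with dropped rows, the same check would flag places where four rows no longer span 60 minutes.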
5. Feature engineering
Irradiance dominates, but module temperature, ambient temperature, and time-of-day signals help the model during shading and low-sun-angle periods.
data['Hour'] = data.index.hour
data['Month'] = data.index.month
# The Plant 1 weather file has no wind-speed channel, so only these sensors are used
num_cols = ['IRRADIATION', 'AMBIENT_TEMPERATURE', 'MODULE_TEMPERATURE',
            'Hour', 'Month']
X = data[num_cols]
y = data['Target_kW']
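A linear model treats a raw `Hour` column as a straight-line effect, which does not match the rise-and-fall shape of a solar day. One common alternative, sketched here as an optional variation rather than part of the tutorial's pipeline, is to encode the time of day as a sine/cosine pair. The helper name `add_cyclical_hour` and the column names `Hour_sin` / `Hour_cos` are hypothetical.

```python
import numpy as np
import pandas as pd

def add_cyclical_hour(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with sin/cos encodings of the time of day (hypothetical helper)."""
    out = df.copy()
    # Fractional hour of day, e.g. 06:30 -> 6.5
    hour_frac = out.index.hour.to_numpy() + out.index.minute.to_numpy() / 60.0
    angle = 2 * np.pi * hour_frac / 24.0
    out["Hour_sin"] = np.sin(angle)
    out["Hour_cos"] = np.cos(angle)
    return out

# One day of 15-minute timestamps as a stand-in for the real index
idx = pd.date_range("2020-05-15 00:00", periods=96, freq="15min")
demo = add_cyclical_hour(pd.DataFrame(index=idx))
print(demo.iloc[::16])   # every 4 hours
```

With this encoding 23:45 and 00:00 sit next to each other in feature space, and the cosine term can capture a midday peak. Whether it improves the forecast on this dataset would need to be checked on the hold-out set.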
6. Pre‑processing & Elastic Net pipeline
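Pipeline: because scaling and the model are fitted together inside a single Pipeline, the scaler is re-fit on each training fold only, so validation rows never influence the scaling statistics during cross-validation.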
preprocess = ColumnTransformer([
('num', StandardScaler(), num_cols)
])
pipe = Pipeline([
('prep', preprocess),
('enet', ElasticNet(max_iter=20000, random_state=42))
])
7. Train/test split & hyper‑parameter search
- alpha controls overall penalty strength (higher → more shrinkage).
- l1_ratio tilts the penalty toward Lasso (1.0) or Ridge (0.0).
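- Five-fold CV chooses the pair that minimises RMSE. Note that with cv=5 scikit-learn uses unshuffled K-fold for regressors, so some folds validate on rows that precede part of their training data.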
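For reference, the objective that scikit-learn's `ElasticNet` minimises can be written as follows, with $\alpha$ standing for `alpha`, $\rho$ for `l1_ratio`, $n$ for the number of training rows, $w$ for the coefficients and $b$ for the intercept:

```latex
\min_{w,\,b}\;
\frac{1}{2n}\,\lVert y - Xw - b \rVert_2^2
\;+\; \alpha\,\rho\,\lVert w \rVert_1
\;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```

Setting $\rho = 1$ leaves only the ℓ1 (Lasso) term, and $\rho = 0$ leaves only the ℓ2 (Ridge) term. Raising $\alpha$ strengthens both at once.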
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, shuffle=False) # keep temporal order
param_grid = {
'enet__alpha': np.logspace(-2, 1, 15), # 0.01 → 10
'enet__l1_ratio': np.linspace(0.1, 0.9, 9) # 0.1≈Ridge‑heavy, 0.9≈Lasso‑heavy
}
search = GridSearchCV(pipe, param_grid,
cv=5,
scoring='neg_root_mean_squared_error',
n_jobs=-1, verbose=1)
search.fit(X_train, y_train)
print("Best α:", search.best_params_['enet__alpha'])
print("Best l1_ratio:", search.best_params_['enet__l1_ratio'])
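Because the rows are time-ordered, an expanding-window splitter is a common alternative to plain K-fold. The sketch below uses scikit-learn's `TimeSeriesSplit` on a small stand-in array to show that every validation block comes strictly after its training block. Swapping it into the search is shown only as a comment, as a suggestion rather than the tutorial's original setup.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
X_demo = np.arange(24).reshape(-1, 1)   # stand-in for 24 time-ordered rows

folds = []
for train_idx, val_idx in tscv.split(X_demo):
    folds.append((train_idx, val_idx))
    print(f"train rows 0..{train_idx[-1]:2d}  ->  validate rows {val_idx[0]}..{val_idx[-1]}")

# To use it in the grid search, pass the splitter instead of an integer:
# search = GridSearchCV(pipe, param_grid, cv=tscv,
#                       scoring='neg_root_mean_squared_error', n_jobs=-1)
```

Each fold trains on the past and validates on the block that follows, which mirrors how the model would be used for forecasting.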
8. Evaluate on the hold‑out set
y_pred = search.predict(X_test)
# The squared=False argument was deprecated and later removed in scikit-learn; take the root explicitly
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"Hold‑out RMSE: {rmse:.1f} kW | R²: {r2:.3f}")
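An RMSE figure is easier to judge next to a naive baseline. A common choice for short-horizon forecasts is persistence: assume power one hour from now equals power now. The sketch below computes that baseline on made-up numbers. The helper names `rmse_of` and `persistence_rmse` are hypothetical and not part of the tutorial.

```python
import numpy as np

def rmse_of(y_true, y_hat) -> float:
    """Root mean squared error between two equally long sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_hat) ** 2)))

def persistence_rmse(current_kw, target_kw) -> float:
    """RMSE of predicting that power in one hour equals power now."""
    return rmse_of(target_kw, current_kw)

# Made-up readings: power now vs. power one hour later
current = np.array([100.0, 120.0, 150.0, 130.0])
target = np.array([110.0, 150.0, 140.0, 100.0])

baseline = persistence_rmse(current, target)
print(f"Persistence RMSE: {baseline:.2f} kW")
```

With the tutorial's variables, the comparable call would be roughly `persistence_rmse(data.loc[X_test.index, 'Total_DC_kW'], y_test)`. If the Elastic Net's hold-out RMSE is not clearly below that number, the model adds little over the naive forecast.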
9. Inspect coefficients
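Non-zero coefficients identify the drivers the model kept. Dividing each coefficient by the scaler's scale_ converts it back to kW per original unit of the feature, so magnitudes across features with different units (irradiance versus °C, for example) should be compared with care. Irradiance is expected to top the list, and a negative sign on module temperature reflects the efficiency drop under heat. The ridge component prevents the model from discarding correlated sensor channels altogether.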
scales = search.best_estimator_.named_steps['prep'] \
.named_transformers_['num'].scale_
coefs = search.best_estimator_.named_steps['enet'].coef_ / scales
imp = pd.Series(coefs, index=num_cols).sort_values(key=abs, ascending=False)
plt.figure(figsize=(7,4))
imp.plot(kind='barh'); plt.gca().invert_yaxis()
plt.title('Elastic Net Coefficients – Solar Output Drivers')
plt.xlabel('Δ kW per unit change in feature (1 hour ahead)'); plt.show()
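The division by `scale_` above can be sanity-checked on synthetic data. The sketch below uses ordinary least squares (an assumption made so that the equivalence is exact) and shows that coefficients fitted on standardised inputs, divided by the scaler's `scale_`, match coefficients fitted directly on the raw inputs. All numbers are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Two features on very different scales (think: irradiance and temperature)
X_raw = rng.normal(loc=[0.5, 30.0], scale=[0.3, 8.0], size=(200, 2))
y_demo = 900.0 * X_raw[:, 0] - 2.0 * X_raw[:, 1] + rng.normal(scale=5.0, size=200)

scaler = StandardScaler().fit(X_raw)
coef_scaled = LinearRegression().fit(scaler.transform(X_raw), y_demo).coef_

coef_per_unit = coef_scaled / scaler.scale_                 # back to original units
coef_direct = LinearRegression().fit(X_raw, y_demo).coef_   # fitted on raw inputs

print("per-unit from scaled fit:", np.round(coef_per_unit, 3))
print("direct fit on raw data:  ", np.round(coef_direct, 3))
```

With a penalised model such as Elastic Net, the rescaled values are still "kW per original unit", but they will not generally equal a raw-input fit, because the penalty acts on the standardised coefficients.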
Summary
With a short Python script, we produced an Elastic Net model that:
- Forecasts plant DC power one hour ahead and reports hold-out RMSE and R², giving operators time to adjust inverter set-points or battery dispatch.
- Balances Ridge’s stability and Lasso’s sparsity, yielding an interpretable set of weather drivers rather than a black-box ensemble.
- Streamlines retraining: drop in tomorrow’s SCADA CSV, call .fit(), and the scaler and model are re-fit together.
The result: more accurate, transparent solar‑output predictions that translate directly into better grid bids and reduced imbalance charges.