Solar Output Efficiency Prediction with Polynomial Regression in ML
Solar-farm operators and designers need to estimate panel-array efficiency (%)—the ratio of DC power output to incident irradiance—based on environmental and system parameters measured at deployment time. Historical SCADA and weather data show that efficiency depends nonlinearly on module temperature, irradiance, ambient temperature, wind speed, and panel tilt angle. A simple linear model fails to capture key curvatures (e.g. thermal derating at high temperatures), while a naïve high‑degree polynomial overfits measurement noise. By applying Polynomial Regression to engineered features with Ridge regularisation, we can model smooth, nonlinear effects and deliver reliable, interpretable efficiency forecasts for design optimisation and real‑time performance monitoring.
Dataset
Forecasting Solar Energy Efficiency
Step-by-Step Code Implementation
1. Libraries Required
import pandas as pd                 # data loading & handling
import numpy as np                  # numerical operations
import matplotlib.pyplot as plt     # plotting
import seaborn as sns               # enhanced visualization
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score
2. Load Data & Libraries
import pandas as pd
import numpy as np
# Adjust filename/path as needed
df = pd.read_csv("data/SolarPrediction.csv")
# Preview relevant columns
df.head()[[
'T','SOLAR_RADIATION','AMB_TEMP','MODULE_TEMP','WIND_SPEED','TIME'
]]
3. Target Engineering & Exploratory Analysis
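Target engineering: Efficiency = MODULE_TEMP / SOLAR_RADIATION is used as a proxy for instantaneous module efficiency. Zero‑irradiance samples are filtered out first so that the ratio is always defined. This proxy is not the DC‑power‑to‑irradiance ratio described in the introduction, and because it is computed directly from two of the model inputs, the scores reported later measure how well the polynomial approximates this ratio.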
import seaborn as sns
import matplotlib.pyplot as plt
# Drop zero-irradiance rows to avoid division by zero,
# then compute the efficiency proxy MODULE_TEMP / SOLAR_RADIATION
df = df[df['SOLAR_RADIATION'] > 0].copy()
df['Efficiency'] = df['MODULE_TEMP'] / df['SOLAR_RADIATION']
# Visualize nonlinear trend: module temperature vs efficiency
sns.scatterplot(x='MODULE_TEMP', y='Efficiency', data=df, alpha=0.3)
plt.title("Module Temperature vs Efficiency")
plt.xlabel("Module Temperature (°C)")
plt.ylabel("Efficiency (ratio)")
plt.show()
4. Define Features & Target
PolynomialFeatures: at degree 2, expands the six inputs into the original terms plus their squares and pairwise interactions (e.g. SOLAR_RADIATION², MODULE_TEMP×WIND_SPEED), capturing nonlinear derating and cooling effects. Degree 3 adds cubic and three‑way interaction terms.
# Features available at design or per-sample time
X = df[['T', 'SOLAR_RADIATION', 'AMB_TEMP', 'MODULE_TEMP', 'WIND_SPEED', 'TIME']]
y = df['Efficiency']
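To make the size of this expansion concrete, the sketch below enumerates the monomials that a degree-d expansion without a bias column would contain for the six inputs. It uses only the standard library rather than scikit-learn, the helper name `polynomial_terms` is hypothetical, and the term strings are for illustration only (they are not scikit-learn's own feature names).

```python
from itertools import combinations_with_replacement
from math import comb

# Column names taken from the feature list above
FEATURES = ['T', 'SOLAR_RADIATION', 'AMB_TEMP', 'MODULE_TEMP', 'WIND_SPEED', 'TIME']

def polynomial_terms(names, degree):
    """List every monomial of total degree 1..degree (no bias term)."""
    terms = []
    for d in range(1, degree + 1):
        # Each multiset of d feature names is one monomial, e.g. ('T', 'T') -> T * T
        for combo in combinations_with_replacement(names, d):
            terms.append(' * '.join(combo))
    return terms

# Number of generated columns for each degree in the search grid
term_counts = {d: len(polynomial_terms(FEATURES, d)) for d in (1, 2, 3)}
print(term_counts)

# Cross-check against the closed form C(n + d, d) - 1
n = len(FEATURES)
assert all(term_counts[d] == comb(n + d, d) - 1 for d in term_counts)
```

With six inputs this gives 6, 27 and 83 columns for degrees 1, 2 and 3, which is why the regularisation strength is tuned together with the degree.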
5. Build Polynomial Regression Pipeline
StandardScaler: standardises each input to zero mean and unit variance before the polynomial expansion, so that Ridge’s ℓ² penalty is not dominated by features with large numeric ranges (e.g. irradiance versus wind speed).
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
pipe = Pipeline([
('scale', StandardScaler()),
('poly', PolynomialFeatures(include_bias=False)),
('ridge', Ridge(random_state=42))
])
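Why scaling matters for the penalty can be seen in a one-feature case. For a single centred feature, ridge regression keeps a fraction Sxx / (Sxx + α) of the least-squares slope, where Sxx is the feature's sum of squared deviations. The sketch below is a standard-library illustration of that formula, not the pipeline itself; the readings and the helper names `kept_fraction` and `standardise` are made up for the example.

```python
def kept_fraction(x, alpha):
    """Fraction of the least-squares slope that 1-D ridge keeps.

    For one centred feature, ridge gives w = Sxy / (Sxx + alpha),
    which is the OLS slope Sxy / Sxx times Sxx / (Sxx + alpha).
    """
    mean = sum(x) / len(x)
    sxx = sum((v - mean) ** 2 for v in x)
    return sxx / (sxx + alpha)

def standardise(x):
    """Zero mean, unit (population) variance, as StandardScaler does."""
    mean = sum(x) / len(x)
    std = (sum((v - mean) ** 2 for v in x) / len(x)) ** 0.5
    return [(v - mean) / std for v in x]

# Made-up readings, for illustration only
irradiance = [200.0, 400.0, 600.0, 800.0, 1000.0]   # large numeric range
wind_speed = [1.0, 2.0, 3.0, 4.0, 5.0]              # small numeric range

alpha = 10.0
raw_kept = (kept_fraction(irradiance, alpha), kept_fraction(wind_speed, alpha))
scaled_kept = (kept_fraction(standardise(irradiance), alpha),
               kept_fraction(standardise(wind_speed), alpha))

print("raw    :", [round(v, 4) for v in raw_kept])
print("scaled :", [round(v, 4) for v in scaled_kept])
```

On the raw values the irradiance slope is left almost untouched while the wind-speed slope is halved. After standardising, both are shrunk by the same factor, so the penalty no longer depends on the units a feature happens to be recorded in.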
6. Train/Test Split & Hyperparameter Search
GridSearchCV: tunes the polynomial degree (1–3) and regularisation strength α (10⁻³–10³) via 5‑fold cross‑validation on the training set, optimising for the lowest RMSE averaged over the validation folds. The test set is kept aside for the final evaluation.
from sklearn.model_selection import train_test_split, GridSearchCV
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
param_grid = {
'poly__degree': [1, 2, 3],
'ridge__alpha': np.logspace(-3, 3, 7)
}
gs = GridSearchCV(
pipe, param_grid,
cv=5,
scoring='neg_root_mean_squared_error',
n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)
print("Best parameters:", gs.best_params_)
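The cost of this search is easy to work out before running it. The sketch below enumerates the same grid with the standard library and counts the model fits, assuming GridSearchCV's default behaviour of refitting the best candidate once on the full training set. The variable names are illustrative.

```python
from itertools import product

degrees = [1, 2, 3]
# The same seven powers of ten that np.logspace(-3, 3, 7) produces
alphas = [10.0 ** k for k in range(-3, 4)]
cv_folds = 5

# Every (degree, alpha) pair is one candidate model
grid = list(product(degrees, alphas))

n_candidates = len(grid)                # 3 degrees x 7 alphas
n_cv_fits = n_candidates * cv_folds     # each candidate is fitted once per fold
n_total_fits = n_cv_fits + 1            # plus the final refit of the best candidate

print(f"{n_candidates} candidates, {n_cv_fits} cross-validation fits, "
      f"{n_total_fits} fits in total")
```

Adding a fourth degree or a finer α grid multiplies this count, and higher degrees also make each individual fit more expensive because the number of polynomial columns grows quickly.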
7. Evaluate Model
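from sklearn.metrics import mean_squared_error, r2_score
y_pred = gs.predict(X_test)
# The squared=False argument is deprecated and removed in newer
# scikit-learn releases, so take the square root of the MSE explicitly
rmse = np.sqrt(mean_squared_error(y_test, y_pred))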
r2 = r2_score(y_test, y_pred)
print(f"Test RMSE: {rmse:.4f} (efficiency ratio)")
print(f"Test R² : {r2:.3f}")
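For readers who want to see what the two reported metrics compute, here is a minimal standard-library sketch of both formulas. The function names and the four demo values are made up for illustration and are unrelated to the solar dataset.

```python
from math import sqrt

def rmse_manual(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared_errors) / len(squared_errors))

def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot: share of target variance explained."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up values, for illustration only
y_true_demo = [1.0, 2.0, 3.0, 4.0]
y_pred_demo = [1.1, 1.9, 3.2, 3.8]

demo_rmse = rmse_manual(y_true_demo, y_pred_demo)
demo_r2 = r2_manual(y_true_demo, y_pred_demo)
print(f"RMSE: {demo_rmse:.4f}, R^2: {demo_r2:.3f}")

# A model that always predicts the mean scores R^2 = 0
baseline_r2 = r2_manual(y_true_demo, [2.5] * 4)
```

RMSE is expressed in the units of the target (here the efficiency ratio), while R² is unitless and compares the model against a constant prediction of the mean.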
8. Inspect Key Polynomial Coefficients
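Coefficient inspection: ranks the polynomial terms by absolute coefficient size, showing engineers which parameters (e.g. squared irradiance or temperature interactions) most affect predicted efficiency and can therefore inform cooling or tilt‐control strategies. Because the inputs are standardised before expansion, the magnitudes are roughly comparable across terms.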
# Retrieve polynomial feature names
poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=X.columns)
# Retrieve Ridge coefficients
coefs = gs.best_estimator_.named_steps['ridge'].coef_
# Rank terms by absolute magnitude (the sign is discarded here)
coef_series = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False)
# Plot top 10 drivers
plt.figure(figsize=(8,5))
coef_series.head(10).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Influencing Efficiency")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()
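The ranking above uses absolute values, so it shows which terms matter but not in which direction. A possible variation, sketched below with the standard library, sorts by magnitude while keeping the sign. The coefficient values are hypothetical placeholders rather than results from the fitted model, and `rank_by_magnitude` is an illustrative helper name.

```python
# Hypothetical coefficient values, for illustration only;
# in practice they would come from zip(feat_names, coefs)
toy_coefs = {
    'SOLAR_RADIATION': -0.42,
    'MODULE_TEMP': 0.35,
    'SOLAR_RADIATION^2': 0.18,
    'MODULE_TEMP WIND_SPEED': -0.07,
    'AMB_TEMP': 0.02,
}

def rank_by_magnitude(coefs, top_n=3):
    """Return (name, signed coefficient) pairs ordered by |coefficient|."""
    ranked = sorted(coefs.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_n]

top_terms = rank_by_magnitude(toy_coefs)
for name, value in top_terms:
    # A positive coefficient raises the prediction as the term grows,
    # holding the other terms fixed; a negative one lowers it
    direction = 'raises' if value > 0 else 'lowers'
    print(f"{name:>24}: {value:+.2f} ({direction} the prediction)")
```

Keeping the sign helps distinguish a term that pushes the prediction up from one that pulls it down, which the magnitude-only bar chart cannot show.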
Summary
By integrating polynomial feature engineering with Ridge regularisation into a concise pipeline, this workflow:
- Models nonlinear efficiency effects, with accuracy reported as test‑set RMSE and R² in step 7.
- Controls model complexity, avoiding overfitting while capturing key curvatures.
- Provides interpretable insights: the top polynomial features, such as irradiance squared or module‑temperature interactions, point to actionable levers for optimising panel design and operational controls.