Solar Energy Cost Prediction using Bayesian Regression in ML
Solar farm operators and utility planners need to forecast the total monthly cost of solar energy production—before the billing cycle closes—using early‐month indicators such as daily solar irradiance, panel temperature, inverter efficiency, and cumulative capacity factor. Production cost per kWh exhibits nonlinear dependencies on temperature (efficiency losses) and irradiance (diminishing returns during peak sun), as well as on weather variability. By applying Bayesian Regression, we obtain both a point estimate of total cost and a credible interval quantifying our uncertainty—enabling more reliable budgeting, tariff setting, and risk‐aware operational planning.
Libraries Required
import pandas as pd                 # data loading & handling
import numpy as np                  # numerical operations
import matplotlib.pyplot as plt     # plotting
import seaborn as sns               # visualization
import pymc3 as pm                  # Bayesian modeling
import arviz as az                  # posterior analysis
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error
Dataset
Solar Power Generation & Energy Consumption
Step-by-Step Code Implementation
Import Libraries & Load Data
Data loading: We read the daily generation records and rename the columns to short names. A later step aggregates these daily rows into monthly energy totals and monthly means for irradiance and temperature.
import pandas as pd
# Load generation data
df = pd.read_csv("data/SolarPowerGenerationData.csv", parse_dates=["Date"])
df = df.rename(columns={
    "Plant_ID": "plant",
    "Solar_Irradiance": "irradiance",   # W/m²
    "Ambient_Temperature": "temp",      # °C
    "Generation": "energy_kwh"          # kWh produced that day
})
df.head()
Compute Daily Cost & Preprocessing
Target: monthly_cost = monthly_kwh × tariff ($0.12/kWh).
Features:
- irradiance (mean W/m²) captures plant input energy,
- temp (mean °C) captures efficiency losses at high cell temperature,
- monthly_kwh (sum) captures scale.
# Assume fixed tariff
tariff = 0.12 # USD per kWh
df["daily_cost"] = df["energy_kwh"] * tariff
# Aggregate to monthly level
df["month"] = df["Date"].dt.to_period("M").dt.to_timestamp()
monthly = df.groupby("month").agg({
    "irradiance": "mean",
    "temp": "mean",
    "energy_kwh": "sum",
    "daily_cost": "sum"
}).rename(columns={"energy_kwh": "monthly_kwh", "daily_cost": "monthly_cost"}).reset_index()
# Features & target
X = monthly[["irradiance","temp","monthly_kwh"]].values
y = monthly["monthly_cost"].values
Train/Test Split & Standardisation
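We split chronologically (first 80% of months for training, last 20% for testing) and then standardise the features to zero mean and unit variance using training statistics only. With all predictors on a comparable scale, the Normal priors on the coefficients, which act like an ℓ² penalty, apply uniformly across features.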
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Chronological split: first 80% months train, last 20% test
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
# Standardize predictors for stable MCMC
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
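To make the standardisation step concrete, here is a minimal NumPy-only sketch of what fitting the scaler on the training months and reusing it on the test months amounts to. The feature values are synthetic stand-ins (an assumption, not the article's dataset), and the variable names `mu` and `sd` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monthly features: irradiance (W/m²), temp (°C), monthly_kwh
X = np.column_stack([
    rng.normal(500, 80, size=24),
    rng.normal(25, 5, size=24),
    rng.normal(30_000, 4_000, size=24),
])

# Chronological 80/20 split, as in the article
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]

# Scaling statistics come from the training months only
mu = X_train.mean(axis=0)
sd = X_train.std(axis=0)  # population std (ddof=0), which StandardScaler also uses

X_train_s = (X_train - mu) / sd
X_test_s = (X_test - mu) / sd  # reuse training statistics; no refit on test data

print("train mean:", X_train_s.mean(axis=0).round(6))
print("train std: ", X_train_s.std(axis=0).round(6))
print("test mean: ", X_test_s.mean(axis=0).round(2))
```

The training columns come out with zero mean and unit variance by construction; the test columns generally do not, which is expected because no information from the held-out months is allowed to leak into the scaling.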
Define & Fit Bayesian Regression Model
Priors:
- α ∼ Normal(0, 1e4) is a broad prior on baseline cost,
- β ∼ Normal(0, 500) for each standardised feature coefficient, reflecting moderate uncertainty,
- σ ∼ HalfNormal(1000) encodes residual variability.
Model: Linear predictor μ = α + β·X_standardized; observed monthly_cost ∼ Normal(μ, σ).
MCMC: 2,000 draws (plus 1,000 tuning) with target_accept=0.9 for stable inference.
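As a reference point for the linear predictor above, an ordinary least-squares fit on a standardised design shows roughly where the posterior means of α and β should land when the priors are weak. The sketch below uses synthetic data (an assumption) and the article's fixed tariff. Because the target is defined as monthly_kwh × tariff, least squares recovers α as the mean cost and puts essentially all the weight on the monthly_kwh coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 36
tariff = 0.12  # USD per kWh, the fixed tariff assumed in this article

# Hypothetical monthly features (synthetic stand-ins for the real data)
irradiance = rng.normal(500, 80, size=n)
temp = rng.normal(25, 5, size=n)
monthly_kwh = rng.normal(30_000, 4_000, size=n)
X = np.column_stack([irradiance, temp, monthly_kwh])
y = monthly_kwh * tariff

# Standardise the features, then prepend an intercept column
mu, sd = X.mean(axis=0), X.std(axis=0)
Xs = (X - mu) / sd
A = np.column_stack([np.ones(n), Xs])

# Least-squares estimate of [alpha, beta_irradiance, beta_temp, beta_kwh]
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1:]

print("alpha_hat:", alpha_hat, "| mean cost:", y.mean())
print("beta_hat: ", beta_hat, "| tariff * std(kwh):", tariff * sd[2])
```

If the posterior summary for the real data differs sharply from this pattern, that is a hint to re-check the preprocessing rather than the sampler.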
import pymc3 as pm
with pm.Model() as model:
    # Priors
    α = pm.Normal("α", mu=0, sigma=1e4)
    β = pm.Normal("β", mu=0, sigma=500, shape=X_train_s.shape[1])
    σ = pm.HalfNormal("σ", sigma=1000)
    # Linear predictor
    μ = α + pm.math.dot(X_train_s, β)
    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)
    # Sample posterior
    trace = pm.sample(
        draws=2000, tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
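Before relying on the priors listed above, it can help to simulate from them and see what range of monthly costs they imply. The following is a rough NumPy sketch of such a prior predictive check, not part of the PyMC3 model itself. The feature vector `x_new` is a hypothetical standardised month, and the interval is equal-tailed rather than an HPD interval.

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 5_000

# Draws from the priors as stated in the text
alpha = rng.normal(0, 1e4, size=n_draws)
beta = rng.normal(0, 500, size=(n_draws, 3))
sigma = np.abs(rng.normal(0, 1000, size=n_draws))  # HalfNormal(1000) = |Normal(0, 1000)|

# Hypothetical standardised month: one standard deviation above average in every feature
x_new = np.array([1.0, 1.0, 1.0])

# Prior predictive draws: linear predictor plus observation noise
mu = alpha + beta @ x_new
y_prior = rng.normal(mu, sigma)

lo, hi = np.percentile(y_prior, [3, 97])
print(f"94% prior predictive interval: [{lo:,.0f}, {hi:,.0f}] USD")
```

The interval is centred on zero and includes negative costs, which reflects the zero-mean prior on α. Whether that is acceptable depends on how much data is available to pull the posterior away from the prior.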
Posterior Analysis & Point Predictions
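- Sampling Y_obs from the posterior predictive yields a full distribution of plausible monthly costs, from which we can compute point forecasts and 94% Highest Posterior Density (HPD) intervals. As written, the model draws these for the training months; the test-set point forecasts below use the posterior means of α and β.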
- Mean Absolute Error (MAE) quantifies the average deviation of point predictions from actual monthly costs on held‐out data.
import arviz as az
# Summarize posterior
az.summary(trace, round_to=2)
# Posterior predictive sampling
with model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])
# Extract posterior means
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values
# Point predictions
y_pred = α_post + X_test_s.dot(β_post)
# Evaluate MAE
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f}")
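The point forecasts above ignore the spread of the posterior. One way to get intervals for held-out months is to push every posterior draw through the linear predictor and add observation noise. The sketch below does this in plain NumPy; the helper `predictive_interval` is hypothetical, the "posterior draws" are synthetic stand-ins, and it reports an equal-tailed interval rather than an HPD interval.

```python
import numpy as np

def predictive_interval(alpha, beta, sigma, X_new, prob=0.94, seed=0):
    """Equal-tailed predictive interval from posterior draws.

    alpha: (S,), beta: (S, p), sigma: (S,), X_new: (n, p) standardised features.
    Returns the predictive mean, lower bound and upper bound, each of shape (n,).
    """
    rng = np.random.default_rng(seed)
    mu = alpha[:, None] + beta @ X_new.T       # (S, n) linear predictor per draw
    y_rep = rng.normal(mu, sigma[:, None])     # add observation noise per draw
    tail = (1 - prob) / 2 * 100
    lower, upper = np.percentile(y_rep, [tail, 100 - tail], axis=0)
    return y_rep.mean(axis=0), lower, upper

# Synthetic stand-ins for flattened posterior draws (assumed values, not real output)
rng = np.random.default_rng(1)
S = 4_000
alpha = rng.normal(3600, 20, size=S)
beta = rng.normal([0.0, 0.0, 480.0], 10, size=(S, 3))
sigma = np.abs(rng.normal(50, 5, size=S))

# Two hypothetical held-out months and their observed costs
X_new = np.array([[0.2, -0.5, 1.0], [-1.0, 0.3, -0.8]])
y_true = np.array([4100.0, 3200.0])

mean, lower, upper = predictive_interval(alpha, beta, sigma, X_new)
coverage = np.mean((y_true >= lower) & (y_true <= upper))
mae = np.mean(np.abs(y_true - mean))
print(f"MAE: {mae:.2f} USD, interval coverage: {coverage:.0%}")
```

With the InferenceData returned by pm.sample, the draws could be flattened along the chain and draw dimensions, for example with `trace.posterior["β"].values.reshape(-1, 3)`, and passed in together with X_test_s.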
Visualise Predictions & Credible Intervals
Varying monthly_kwh while holding other features fixed, we plot the posterior mean cost curve and its 94% credible band—showing both expected cost scaling and uncertainty.
import numpy as np
import matplotlib.pyplot as plt
# Vary monthly_kwh; fix irradiance & temp at median
kwh_grid = np.linspace(X_train_s[:,2].min(), X_train_s[:,2].max(), 100)
grid = np.tile(np.median(X_train_s, axis=0), (100,1))
grid[:,2] = kwh_grid
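# Posterior draws, flattened across chains and draws
post = trace.posterior
α_s = post["α"].values.reshape(-1)
β_s = post["β"].values.reshape(-1, grid.shape[1])
σ_s = post["σ"].values.reshape(-1)
# Posterior predictive on the grid: linear predictor plus observation noise
rng = np.random.default_rng(0)
preds = rng.normal(α_s[:, None] + β_s @ grid.T, σ_s[:, None])
mean_pred = preds.mean(axis=0)
hpd = az.hdi(preds, hdi_prob=0.94)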
# Convert kWh back to original scale
kwh_orig = scaler.inverse_transform(
    np.column_stack([grid[:,0], grid[:,1], grid[:,2]])
)[:,2]
plt.figure(figsize=(8,5))
plt.plot(kwh_orig, mean_pred, label="Posterior mean")
plt.fill_between(kwh_orig, hpd[:,0], hpd[:,1], alpha=0.3,
                 label="94% Credible interval")
plt.scatter(
    scaler.inverse_transform(X_test_s)[:,2], y_test,
    color="k", alpha=0.5, label="Test data"
)
plt.xlabel("Monthly Solar Energy (kWh)")
plt.ylabel("Monthly Cost (USD)")
plt.title("Bayesian Regression: Cost vs. Production")
plt.legend()
plt.tight_layout()
plt.show()
Summary
This Bayesian Regression workflow for solar energy cost forecasting provides:
- Point estimates of monthly production cost from early indicators (irradiance, temperature, energy output).
- Credible intervals quantifying uncertainty from weather and operational variability.
- Actionable insights: solar operators and planners gain both expected cost and its uncertainty bounds, enabling more reliable budgeting, tariff negotiations, and risk‐aware operational decisions.