Solar Energy Cost Prediction using Bayesian Regression in ML


Solar farm operators and utility planners need to forecast the total monthly cost of solar energy production before the billing cycle closes, using early-month indicators such as daily solar irradiance, panel temperature, inverter efficiency, and cumulative capacity factor. Production cost per kWh depends nonlinearly on temperature (efficiency losses at high cell temperature) and on irradiance (diminishing returns during peak sun), and it also varies with the weather. Bayesian Regression gives both a point estimate of total cost and a credible interval that quantifies our uncertainty, which supports more reliable budgeting, tariff setting, and risk-aware operational planning.

Libraries Required

import pandas as pd                              # data loading & handling  
import numpy as np                               # numerical operations  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

import pymc3 as pm                               # Bayesian modeling  
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error

Dataset

Solar Power Generation & Energy Consumption

Step-by-Step Code Implementation

Import Libraries & Load Data

Load the daily generation records and give the columns short names. These daily rows are later aggregated into monthly totals (energy, cost) and monthly means (irradiance, temperature).

import pandas as pd

# Load generation data
df = pd.read_csv("data/SolarPowerGenerationData.csv", parse_dates=["Date"])
df = df.rename(columns={
    "Plant_ID":"plant",
    "Solar_Irradiance":"irradiance",   # W/m²
    "Ambient_Temperature":"temp",      # °C
    "Generation":"energy_kwh"          # kWh produced that day
})
df.head()
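If the CSV is not at hand, a hypothetical helper such as make_synthetic_daily below can stand in for it: it returns a daily table with the same renamed columns (Date, plant, irradiance, temp, energy_kwh), so the remaining steps can be run end to end by assigning its result to df. The seasonal pattern and the generation formula are invented for illustration only; anything fitted on this table says nothing about real plants.

```python
import numpy as np
import pandas as pd

def make_synthetic_daily(n_days: int = 730, seed: int = 0) -> pd.DataFrame:
    """Hypothetical stand-in for the renamed daily table (synthetic values only)."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range("2020-01-01", periods=n_days, freq="D")
    # Simple yearly cycle based on day of year
    season = np.sin(2 * np.pi * dates.dayofyear.to_numpy() / 365.25)
    irradiance = 500 + 200 * season + rng.normal(0, 50, n_days)   # W/m² (invented)
    temp = 20 + 10 * season + rng.normal(0, 3, n_days)             # °C (invented)
    # Toy generation model: proportional to irradiance, small loss above 25 °C
    energy_kwh = np.clip(irradiance * 10 * (1 - 0.004 * (temp - 25)), 0, None)
    return pd.DataFrame({
        "Date": dates,
        "plant": "P1",
        "irradiance": irradiance,
        "temp": temp,
        "energy_kwh": energy_kwh,
    })

df_synth = make_synthetic_daily()
print(df_synth.head())
```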

Compute Daily Cost & Preprocessing

Target: monthly_cost = monthly_kwh × tariff ($0.12/kWh).

Features:

irradiance (mean W/m²) captures plant input energy,
temp (mean °C) captures efficiency losses at high cell temperature,
monthly_kwh (sum) captures scale.

# Assume fixed tariff
tariff = 0.12  # USD per kWh
df["daily_cost"] = df["energy_kwh"] * tariff

# Aggregate to monthly level
df["month"] = df["Date"].dt.to_period("M").dt.to_timestamp()
monthly = df.groupby("month").agg({
    "irradiance":"mean",
    "temp":"mean",
    "energy_kwh":"sum",
    "daily_cost":"sum"
}).rename(columns={"energy_kwh":"monthly_kwh","daily_cost":"monthly_cost"}).reset_index()

# Features & target
X = monthly[["irradiance","temp","monthly_kwh"]].values
y = monthly["monthly_cost"].values
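To make the aggregation concrete, here is the same groupby logic on a four-row hand-made table (all values hypothetical) that spans two calendar months. It also exposes a property of this setup worth keeping in mind: with a flat tariff, monthly_cost is exactly 0.12 × monthly_kwh, so using monthly_kwh as a predictor makes the regression close to deterministic.

```python
import pandas as pd

# Tiny hand-made daily table (hypothetical values) spanning two months
toy_daily = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-30", "2021-01-31", "2021-02-01", "2021-02-02"]),
    "irradiance": [400.0, 600.0, 500.0, 700.0],
    "temp": [10.0, 12.0, 14.0, 16.0],
    "energy_kwh": [100.0, 300.0, 200.0, 400.0],
})
tariff = 0.12
toy_daily["daily_cost"] = toy_daily["energy_kwh"] * tariff

# Same month key as the main code: the first day of each calendar month
toy_daily["month"] = toy_daily["Date"].dt.to_period("M").dt.to_timestamp()
toy_monthly = toy_daily.groupby("month").agg({
    "irradiance": "mean", "temp": "mean",
    "energy_kwh": "sum", "daily_cost": "sum",
}).rename(columns={"energy_kwh": "monthly_kwh",
                   "daily_cost": "monthly_cost"}).reset_index()

print(toy_monthly)
# With a flat tariff the target is an exact multiple of one feature
print((toy_monthly["monthly_cost"] / toy_monthly["monthly_kwh"]).round(6).tolist())
```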

Train/Test Split & Standardisation

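Standardisation gives every feature zero mean and unit variance, computed from the training months only. The shared Normal(0, 500) prior on the coefficients (the Bayesian counterpart of an ℓ² penalty) then acts on a comparable scale for each feature, and the sampler also works on a better-conditioned problem.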

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Chronological split: first 80% of months for training, last 20% for testing (monthly is sorted by month)
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Standardize predictors for stable MCMC
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s  = scaler.transform(X_test)
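The reason for fitting the scaler on the training months only is that the test months must be transformed with statistics that were available at training time. A small check on a hypothetical 10-month matrix (names like X_demo are illustrative and do not replace the variables above):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical 10-month feature matrix: irradiance (W/m²), temp (°C), monthly_kwh
rng = np.random.default_rng(1)
X_demo = np.column_stack([
    rng.normal(500, 80, 10),
    rng.normal(20, 5, 10),
    rng.normal(150_000, 20_000, 10),
])

# Chronological split: rows are assumed to already be in month order
n_train = int(len(X_demo) * 0.8)                 # 8 training months, 2 test months
Xd_train, Xd_test = X_demo[:n_train], X_demo[n_train:]

# Fit on training rows only; the test rows reuse the training mean and scale
demo_scaler = StandardScaler().fit(Xd_train)
Xd_train_s = demo_scaler.transform(Xd_train)
Xd_test_s = demo_scaler.transform(Xd_test)

print(Xd_train_s.mean(axis=0).round(6))   # ~0 for every column
print(Xd_train_s.std(axis=0).round(6))    # ~1 for every column (population std, ddof=0)
manual = (Xd_test - Xd_train.mean(axis=0)) / Xd_train.std(axis=0)
print(np.allclose(Xd_test_s, manual))     # test rows use the training statistics
```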

Define & Fit Bayesian Regression Model

Priors (the second argument of each distribution is its standard deviation, PyMC's sigma):

α ∼ Normal(0, 1e4) is a broad prior on baseline cost,
β ∼ Normal(0, 500) for each standardised feature coefficient, reflecting moderate uncertainty,
σ ∼ HalfNormal(1000) encodes residual variability.

Model: Linear predictor μ = α + β·X_standardized; observed monthly_cost ∼ Normal(μ, σ).

MCMC: 2,000 draws per chain after 1,000 tuning steps, with target_accept=0.9 (the PyMC3 default is 0.8) so the NUTS sampler takes smaller steps and produces fewer divergences.
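One way to see what "broad" means for these priors is a prior predictive simulation: draw parameters from the priors and push them through the likelihood. The NumPy sketch below does this for a month whose standardised features all sit at zero (an assumption chosen for simplicity; PyMC also offers pm.sample_prior_predictive for the real model). In that case the implied spread of monthly cost is set almost entirely by the α prior, about 10,000 USD per standard deviation.

```python
import numpy as np

# Prior predictive simulation for the model described above:
#   α ~ Normal(0, 1e4), β_j ~ Normal(0, 500), σ ~ HalfNormal(1000)
#   y ~ Normal(α + x·β, σ)
rng = np.random.default_rng(0)
n_draws = 200_000
x = np.zeros(3)            # a "typical" month: every standardised feature at its mean

alpha = rng.normal(0.0, 1e4, n_draws)
beta = rng.normal(0.0, 500.0, size=(n_draws, 3))
sigma = np.abs(rng.normal(0.0, 1000.0, n_draws))   # HalfNormal(1000) via |Normal(0, 1000)|
y_prior = rng.normal(alpha + beta @ x, sigma)

# Theory: Var(y) = 1e4**2 + 0 + E[σ²] = 1e8 + 1e6, so the std is about 10050
print(round(y_prior.std()))
```

Whether that width is reasonable depends on the typical size of monthly_cost in the data, which this sketch does not assume.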

import pymc3 as pm

with pm.Model() as model:
    # Priors
    α = pm.Normal("α", mu=0, sigma=1e4)
    β = pm.Normal("β", mu=0, sigma=500, shape=X_train_s.shape[1])
    σ = pm.HalfNormal("σ", sigma=1000)

    # Linear predictor
    μ = α + pm.math.dot(X_train_s, β)

    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)

    # Sample posterior
    trace = pm.sample(
        draws=2000, tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
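For readers who want to see what pm.sample is targeting, the function below writes the log prior plus log likelihood of exactly the model specified above in plain NumPy, i.e. the log posterior up to its normalising constant. NUTS (PyMC's default sampler for continuous parameters) explores this surface using its gradient; this sketch only evaluates it and is not part of the PyMC workflow. The toy data (X_toy, y_toy, true_beta) are hypothetical.

```python
import numpy as np

LOG_2PI = np.log(2 * np.pi)

def normal_logpdf(x, mu, sd):
    """Elementwise log density of Normal(mu, sd)."""
    return -0.5 * LOG_2PI - np.log(sd) - 0.5 * ((x - mu) / sd) ** 2

def log_posterior(alpha, beta, sigma, X, y):
    """Log prior + log likelihood (the log posterior up to an additive constant).

    Priors: alpha ~ N(0, 1e4), beta_j ~ N(0, 500), sigma ~ HalfNormal(1000).
    Likelihood: y_i ~ N(alpha + X_i · beta, sigma).
    """
    if sigma <= 0:
        return -np.inf                                       # outside HalfNormal support
    lp = normal_logpdf(alpha, 0.0, 1e4)
    lp += normal_logpdf(beta, 0.0, 500.0).sum()
    lp += np.log(2.0) + normal_logpdf(sigma, 0.0, 1000.0)   # HalfNormal density for sigma > 0
    mu = alpha + X @ beta
    lp += normal_logpdf(y, mu, sigma).sum()
    return lp

# Hypothetical standardised data generated from known parameters
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(60, 3))
true_beta = np.array([300.0, -150.0, 2000.0])
y_toy = 9000.0 + X_toy @ true_beta + rng.normal(0, 200.0, 60)

# The generating parameters score higher than a shrunken alternative
print(log_posterior(9000.0, true_beta, 200.0, X_toy, y_toy) >
      log_posterior(9000.0, true_beta * 0.5, 200.0, X_toy, y_toy))
```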

Posterior Analysis & Point Predictions

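Posterior predictive sampling of Y_obs (here at the training inputs) yields replicated monthly costs that can be compared with the observed ones as a check of fit. The posterior draws of α, β and σ also give point forecasts for new months and 94% highest-density intervals (HDI, the interval ArviZ reports by default).
Mean Absolute Error (MAE) measures the average absolute deviation of the point predictions from the actual monthly costs in the held-out months.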

import arviz as az

# Summarize posterior
az.summary(trace, round_to=2)

# Posterior predictive draws at the training inputs (a check of model fit)
with model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

# Extract posterior means
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values

# Point predictions
y_pred = α_post + X_test_s.dot(β_post)

# Evaluate MAE
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f}")
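The 94% interval ArviZ reports is a highest-density interval: the narrowest range that holds 94% of the posterior draws. The sketch below is a simplified reimplementation of that idea for a one-dimensional sample (sort the draws, slide a fixed-count window, keep the narrowest; it assumes a unimodal posterior), alongside the MAE formula, so both quantities in this step can be checked by hand. az.hdi remains the reference; hdi_from_draws and mean_abs_error are hypothetical helper names.

```python
import numpy as np

def hdi_from_draws(draws, hdi_prob=0.94):
    """Shortest interval containing `hdi_prob` of the draws (assumes a unimodal posterior)."""
    x = np.sort(np.asarray(draws, dtype=float))
    n = len(x)
    k = int(np.floor(hdi_prob * n))     # index offset spanned by each candidate interval
    widths = x[k:] - x[: n - k]         # width of every window covering k + 1 sorted draws
    i = int(np.argmin(widths))          # the narrowest window is the HDI
    return x[i], x[i + k]

def mean_abs_error(y_true, y_pred):
    """Mean absolute error: the average of |y_true - y_pred|."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

rng = np.random.default_rng(0)
print(hdi_from_draws(rng.normal(size=200_000)))        # roughly (-1.88, 1.88)
print(hdi_from_draws(rng.exponential(size=200_000)))   # roughly (0.0, 2.81): starts at the mode
print(mean_abs_error([100.0, 200.0, 300.0], [110.0, 190.0, 330.0]))   # ≈ 16.67
```

For a skewed posterior the HDI and an equal-tailed interval differ: in the exponential example the HDI begins at the mode, whereas the 3rd to 97th percentile interval would not include values that close to zero.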

Visualise Predictions & Credible Intervals

By varying monthly_kwh while holding irradiance and temperature at their training medians, we plot the posterior mean cost curve and a 94% posterior predictive credible band, showing both how expected cost scales with production and how uncertain the prediction is.

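import numpy as np
import matplotlib.pyplot as plt
import arviz as az

# Vary monthly_kwh (standardised column 2); fix irradiance & temp at their training medians
kwh_grid = np.linspace(X_train_s[:,2].min(), X_train_s[:,2].max(), 100)
grid = np.tile(np.median(X_train_s, axis=0), (100,1))
grid[:,2] = kwh_grid

# The model was built on plain NumPy arrays (no pm.Data container named "X"),
# so pm.set_data cannot swap in the grid. Draw from the posterior predictive
# directly instead: y_rep = α + grid·β + ε, ε ~ Normal(0, σ), one draw per posterior sample.
post   = trace.posterior
α_s    = post["α"].values.reshape(-1)                  # (n_samples,)
β_s    = post["β"].values.reshape(-1, grid.shape[1])   # (n_samples, 3)
σ_s    = post["σ"].values.reshape(-1)                  # (n_samples,)
μ_grid = α_s[:, None] + β_s @ grid.T                   # (n_samples, 100)
rng    = np.random.default_rng(42)
preds  = rng.normal(μ_grid, σ_s[:, None])              # posterior predictive draws

mean_pred = μ_grid.mean(axis=0)                        # posterior mean cost curve
# Add a leading chain axis so ArviZ reads the array as (chain, draw, grid point)
hdi = az.hdi(preds[np.newaxis, ...], hdi_prob=0.94)    # shape (100, 2)

# Convert the kWh grid back to its original scale
kwh_orig = scaler.inverse_transform(grid)[:,2]

plt.figure(figsize=(8,5))
plt.plot(kwh_orig, mean_pred, label="Posterior mean")
plt.fill_between(kwh_orig, hdi[:,0], hdi[:,1], alpha=0.3,
                 label="94% Credible interval")
plt.scatter(X_test[:,2], y_test, color="k", alpha=0.5, label="Test data")
plt.xlabel("Monthly Solar Energy (kWh)")
plt.ylabel("Monthly Cost (USD)")
plt.title("Bayesian Regression: Cost vs. Production")
plt.legend()
plt.tight_layout()
plt.show()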

Summary

This Bayesian Regression workflow for solar energy cost forecasting provides:

Point estimates of monthly production cost from early indicators (irradiance, temperature, energy output).
Credible intervals quantifying uncertainty from weather and operational variability.
Actionable insights: solar operators and planners gain both expected cost and its uncertainty bounds, enabling more reliable budgeting, tariff negotiations, and risk‐aware operational decisions.

