Biomass Energy Cost Prediction using Bayesian Regression in ML
Biomass-to-energy plant managers must forecast the delivered energy cost ($/MJ or $/kWh) before contracting feedstock, using early feedstock quality indicators such as moisture content (%), ash content (%), volatile matter (%), fixed carbon (%), and higher heating value (HHV, MJ/kg). Delivered energy cost is nonlinear in these predictors (for example, high moisture sharply lowers net calorific output and so raises cost) and is subject to uncertainty from supply chain variability and price fluctuations. A single point estimate hides this risk, which can lead to uneconomic bidding or supply shortages. By applying Bayesian Regression, we obtain:
1. A point estimate of delivered energy cost.
2. A credible interval quantifying the uncertainty around that estimate, which enables risk-aware feedstock contracting and budgeting.
Libraries Required
import pandas as pd # data loading & manipulation
import numpy as np # numerical operations
import matplotlib.pyplot as plt # plotting
import seaborn as sns # visualization
import pymc3 as pm # Bayesian modeling
import arviz as az # posterior analysis
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error
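Dataset
The code below expects a CSV file (data/biomass-data/biomass.csv) with the columns SampleID, Moisture, Ash, VolatileMatter, FixedCarbon, and HHV.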
Step-by-Step Code Implementation
Data Loading & Synthetic Cost Computation
Synthetic target (cost_per_MJ): We convert a fixed feedstock price ($100/tonne) and the moisture-adjusted heating value (dry-basis HHV scaled by the dry-matter fraction) into a delivered energy cost per MJ.
import pandas as pd
# Load biomass feedstock data
df = pd.read_csv("data/biomass-data/biomass.csv")
# Preview columns
# df.columns => ['SampleID','Moisture','Ash','VolatileMatter','FixedCarbon','HHV']
# Assume a base feedstock price of $100 per tonne of delivered (as-received) biomass
# Delivered energy cost ($/MJ) = (feedstock_price_per_tonne / 1e3) / (HHV * (1 - moisture/100))
feedstock_price = 100.0 # USD per tonne
# Moisture-adjusted heating value: MJ per kg of as-received biomass (HHV taken as dry-basis)
df['dry_HHV'] = df['HHV'] * (1 - df['Moisture'] / 100)
# Compute cost per MJ
# price per kg = price_per_tonne / 1e3
df['cost_per_MJ'] = (feedstock_price / 1e3) / df['dry_HHV']
# Features & target
features = ['Moisture','Ash','VolatileMatter','FixedCarbon','HHV']
X = df[features].values
y = df['cost_per_MJ'].values # USD per MJ
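As a quick sanity check on the formula above, the sketch below evaluates the same cost calculation for a single hypothetical sample. The function name `cost_per_mj` and the example values (18 MJ/kg HHV at 10% and 50% moisture) are illustrative assumptions, not values taken from the dataset.

```python
def cost_per_mj(price_per_tonne, hhv_mj_per_kg, moisture_pct):
    """Delivered energy cost in USD/MJ for one feedstock sample.

    Mirrors the column arithmetic above: price per kg divided by the
    moisture-adjusted heating value.
    """
    price_per_kg = price_per_tonne / 1e3
    net_hhv = hhv_mj_per_kg * (1 - moisture_pct / 100)
    return price_per_kg / net_hhv

# Hypothetical sample: 18 MJ/kg HHV at two moisture levels
dry_cost = cost_per_mj(100.0, 18.0, 10.0)
wet_cost = cost_per_mj(100.0, 18.0, 50.0)
print(f"10% moisture: {dry_cost:.4f} USD/MJ")
print(f"50% moisture: {wet_cost:.4f} USD/MJ")
```

With these assumed inputs the cost sits between roughly 0.006 and 0.011 USD/MJ, and raising moisture from 10% to 50% multiplies it by 1.8. That order of magnitude is worth keeping in mind when choosing prior scales later.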
Preprocessing & Train/Test Split
StandardScaler: Z-scoring each feedstock feature puts all predictors on a common scale, so the same prior width is reasonable for every β and MCMC sampling is more stable.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Random 80% train / 20% test split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Standardize features for stable MCMC
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
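Coefficients fitted on standardized features describe the change in cost per standard deviation of a feature, not per physical unit. The sketch below shows one way to map such coefficients back to the original units. It uses plain NumPy with synthetic data, and the slope and intercept values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical raw features: moisture (%) and HHV (MJ/kg)
X_raw = np.column_stack([
    rng.uniform(5, 50, size=200),
    rng.uniform(15, 21, size=200),
])

# Z-score with the sample statistics (StandardScaler also uses the population std, ddof=0)
mean_ = X_raw.mean(axis=0)
scale_ = X_raw.std(axis=0)
X_std = (X_raw - mean_) / scale_

# Suppose a fitted model gave these values on the standardized scale (made-up numbers)
beta_std = np.array([0.002, -0.001])
intercept_std = 0.008

# Equivalent slopes and intercept on the original feature scale
beta_raw = beta_std / scale_
intercept_raw = intercept_std - np.sum(beta_std * mean_ / scale_)

# Both parameterizations give identical predictions
pred_std = intercept_std + X_std @ beta_std
pred_raw = intercept_raw + X_raw @ beta_raw
print(np.allclose(pred_std, pred_raw))
```

The same rescaling can be applied to every posterior draw of β, which yields a posterior for the effect per percentage point of moisture or per MJ/kg of HHV.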
Define & Fit Bayesian Regression Model
Priors:
- α centred on the empirical mean cost;
- β ∼ Normal(0, 0.5) allows moderate sensitivity per standardised feature;
- σ small HalfNormal to reflect relatively low noise in cost/MJ.
Model: cost_per_MJ ∼ Normal(α + β·X_std, σ).
Sampling: 2,000 posterior draws (plus 1,000 tuning) with target_accept=0.9 for robust convergence.
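To see what these priors imply before fitting, one can simulate from them. The sketch below is a NumPy-only prior predictive check and does not use PyMC3. It assumes five standardized features and a hypothetical mean cost of 0.006 USD/MJ, the order of magnitude implied by a $100/tonne price and an HHV near 18 MJ/kg.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_features = 5000, 5
mean_cost = 0.006  # hypothetical stand-in for y_train.mean(), USD/MJ

# Draw parameters from the priors described above
alpha = rng.normal(mean_cost, 1.0, size=n_draws)
beta = rng.normal(0.0, 0.5, size=(n_draws, n_features))
sigma = np.abs(rng.normal(0.0, 0.1, size=n_draws))  # HalfNormal(0.1)

# One hypothetical standardized feedstock sample per draw
x = rng.normal(size=(n_draws, n_features))
y_prior = rng.normal(alpha + np.sum(beta * x, axis=1), sigma)

print(f"prior predictive sd: {y_prior.std():.2f} USD/MJ")
print(f"share of negative costs: {(y_prior < 0).mean():.2f}")
```

Under these assumptions the simulated costs have a standard deviation above 1 USD/MJ and about half of them are negative, which is far wider than plausible costs near 0.006 USD/MJ. With enough data the likelihood will dominate such weak priors, but tighter scales, or rescaling the target, may be worth considering.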
import pymc3 as pm
with pm.Model() as biomass_cost_model:
    # Priors
    α = pm.Normal("α", mu=y_train.mean(), sigma=1.0) # intercept around average cost
    β = pm.Normal("β", mu=0, sigma=0.5, shape=X_train_s.shape[1]) # slopes for each feature
    σ = pm.HalfNormal("σ", sigma=0.1) # noise scale
    # Linear predictor for cost_per_MJ
    μ = α + pm.math.dot(X_train_s, β)
    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)
    # MCMC sampling
    trace = pm.sample(
        draws=2000,
        tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
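Convergence is usually judged by comparing chains with each other. ArviZ reports a rank-normalized diagnostic in the `r_hat` column of `az.summary`. The sketch below implements only the classic Gelman-Rubin formula in NumPy, on synthetic chains, to illustrate what that column measures. The function name `gelman_rubin` is hypothetical.

```python
import numpy as np

def gelman_rubin(chains):
    """Classic potential scale reduction factor for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    n_draws = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()            # W: mean within-chain variance
    between = n_draws * chains.mean(axis=1).var(ddof=1)   # B: scaled variance of chain means
    var_hat = (n_draws - 1) / n_draws * within + between / n_draws
    return np.sqrt(var_hat / within)

rng = np.random.default_rng(2)
mixed = rng.normal(size=(4, 2000))        # four chains sampling the same distribution
stuck = mixed + np.arange(4)[:, None]     # chains offset from each other

print(f"well-mixed chains: {gelman_rubin(mixed):.3f}")
print(f"offset chains:     {gelman_rubin(stuck):.3f}")
```

Values close to 1 indicate that the chains agree. Values clearly above 1, as for the offset chains here, suggest the sampler has not converged and that more tuning or a reparameterization may be needed.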
Posterior Analysis & Point Predictions
- Posterior predictive: Generates full predictive distributions, from which we extract the posterior mean forecast and 94% Highest Posterior Density interval.
- Evaluation: MAE quantifies average error in USD/MJ on held‑out biomass samples.
import arviz as az
from sklearn.metrics import mean_absolute_error
# Summarize posterior distributions
az.summary(trace, round_to=2)
# Posterior predictive sampling (for the training observations the model was fit on)
with biomass_cost_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])
# Posterior means
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values
# Point predictions on test set
y_pred = α_post + X_test_s.dot(β_post)
# Evaluate MAE in USD/MJ
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.4f} per MJ")
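The point predictions above use only the posterior means. A sketch of how per-sample predictive intervals and their empirical coverage could be computed from posterior draws follows. The arrays `alpha_draws`, `beta_draws`, and `sigma_draws` are synthetic stand-ins for the flattened draws in `trace.posterior`, `X_demo` and `y_demo` stand in for the standardized test set, and the interval is an equal-tailed percentile interval rather than an HDI.

```python
import numpy as np

rng = np.random.default_rng(3)
n_draws, n_test, n_features = 4000, 50, 5

# Synthetic stand-ins for flattened posterior draws (assumed values, not fitted results)
alpha_draws = rng.normal(0.006, 1e-4, size=n_draws)
beta_draws = rng.normal(0.0, 1e-4, size=(n_draws, n_features))
sigma_draws = np.abs(rng.normal(5e-4, 5e-5, size=n_draws))

# Synthetic standardized test features and targets
X_demo = rng.normal(size=(n_test, n_features))
y_demo = 0.006 + rng.normal(0.0, 5e-4, size=n_test)

# Posterior predictive draws: linear predictor plus observation noise
mu = alpha_draws[:, None] + beta_draws @ X_demo.T   # shape (n_draws, n_test)
y_rep = rng.normal(mu, sigma_draws[:, None])

# 94% equal-tailed interval per test sample and its empirical coverage
lower, upper = np.percentile(y_rep, [3, 97], axis=0)
coverage = np.mean((y_demo >= lower) & (y_demo <= upper))
print(f"94% interval coverage on test set: {coverage:.2f}")
```

Coverage well below the nominal 94% on real held-out data would suggest the intervals are too narrow, for instance because a linear model cannot follow the nonlinear cost relationship.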
Visualise Predictions & Credible Intervals
Sweeping HHV (a key quality metric) while holding other feedstock properties constant reveals both the expected cost dependence and its uncertainty band.
import numpy as np
import matplotlib.pyplot as plt
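# Sweep standardized HHV (column 4) while holding other features at their median
hhv_grid = np.linspace(X_train_s[:,4].min(), X_train_s[:,4].max(), 100)
grid = np.median(X_train_s, axis=0)[None,:].repeat(100, axis=0)
grid[:,4] = hhv_grid
# The model was built on X_train_s, so re-sampling Y_obs would only reproduce
# training-set predictions. Evaluate the grid from the posterior draws instead.
post = trace.posterior
α_draws = post["α"].values.reshape(-1) # (n_samples,)
β_draws = post["β"].values.reshape(-1, grid.shape[1]) # (n_samples, n_features)
σ_draws = post["σ"].values.reshape(-1) # (n_samples,)
mu_grid = α_draws[:,None] + β_draws @ grid.T # (n_samples, 100)
# Add observation noise to obtain posterior predictive draws on the grid
rng = np.random.default_rng(42)
preds = rng.normal(mu_grid, σ_draws[:,None])
mean_pred = preds.mean(axis=0)
hpd = az.hdi(preds, hdi_prob=0.94) # shape (100, 2): lower and upper bounds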
# Back‑transform HHV to original scale
hhv_orig = scaler.inverse_transform(grid)[:,4]
plt.figure(figsize=(8,5))
plt.plot(hhv_orig, mean_pred, label="Posterior mean")
plt.fill_between(hhv_orig, hpd[:,0], hpd[:,1], alpha=0.3,
label="94% credible interval")
plt.scatter(
scaler.inverse_transform(X_test_s)[:,4],
y_test, color="k", alpha=0.5, label="Test data"
)
plt.xlabel("Higher Heating Value (MJ/kg)")
plt.ylabel("Delivered Energy Cost (USD/MJ)")
plt.title("Bayesian Regression: Cost vs. HHV")
plt.legend()
plt.tight_layout()
plt.show()
Summary
This Bayesian Regression workflow for Biomass Energy Cost Prediction provides:
1. Point forecasts of delivered energy cost per MJ from early feedstock quality metrics.
2. Credible intervals capturing parameter uncertainty and residual noise, conditional on the fixed feedstock price assumed here.
3. Actionable insights: procurement teams can negotiate feedstock contracts with explicit cost‐risk bounds and optimise feedstock selection to minimise energy‐production cost under uncertainty.
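One way the cost-risk bounds could be used in procurement is to turn predictive draws into the probability that a candidate feedstock exceeds a budget ceiling. The sketch below uses made-up draws for two hypothetical suppliers and an assumed ceiling of 0.0070 USD/MJ.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical posterior predictive draws of cost (USD/MJ) for two candidate feedstocks
draws = {
    "supplier_A": rng.normal(0.0060, 0.0004, size=4000),
    "supplier_B": rng.normal(0.0056, 0.0012, size=4000),
}
budget_ceiling = 0.0070  # assumed maximum acceptable cost, USD/MJ

for name, cost in draws.items():
    p_over = np.mean(cost > budget_ceiling)
    print(f"{name}: mean={cost.mean():.4f}, P(cost > ceiling)={p_over:.3f}")
```

In this made-up case supplier_B has the lower expected cost but the higher chance of breaching the ceiling, a trade-off that a point estimate alone would not reveal.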