Biomass Energy Cost Prediction using Bayesian Regression in ML


Biomass-to-energy plant managers must forecast the delivered energy cost ($/MJ or $/kWh) before contracting feedstock, using early feedstock quality indicators such as moisture content (%), ash content (%), volatile matter (%), fixed carbon (%), and higher heating value (HHV, MJ/kg). Delivered energy cost is nonlinear in these predictors (for example, high moisture sharply lowers net calorific output and so raises cost) and is subject to uncertainty from supply chain variability and price fluctuations. A single point estimate hides this risk, potentially leading to uneconomic bidding or supply shortages. By applying Bayesian Regression, we obtain:

1. A point estimate of delivered energy cost.

2. A credible interval quantifying our uncertainty, which enables risk-aware feedstock contracting and budgeting.

Libraries Required

import pandas as pd                              # data loading & manipulation  
import numpy as np                               # numerical operations  
import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

import pymc3 as pm                               # Bayesian modeling  
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error

Dataset

Biomass Data

Step-by-Step Code Implementation

Data Loading & Synthetic Cost Computation

Synthetic target (cost_per_MJ): We combine an assumed feedstock price ($100/tonne) with the moisture-adjusted heating value to obtain a delivered energy cost per MJ.

import pandas as pd

# Load biomass feedstock data
df = pd.read_csv("data/biomass-data/biomass.csv")

# Preview columns
# df.columns  => ['SampleID','Moisture','Ash','VolatileMatter','FixedCarbon','HHV']

# Assume a base feedstock price of $100 per tonne
# Delivered energy cost ($/MJ) = (feedstock_price_per_tonne / 1e3) / (HHV * (1 - moisture/100))
feedstock_price = 100.0  # USD per tonne
# Moisture-adjusted heating value: MJ per kg of as-received (wet) biomass,
# assuming HHV is reported on a dry basis and Moisture is a wet-basis percentage
df['dry_HHV'] = df['HHV'] * (1 - df['Moisture'] / 100)

# Compute cost per MJ
# price per kg = price_per_tonne / 1e3
df['cost_per_MJ'] = (feedstock_price / 1e3) / df['dry_HHV']

# Features & target
features = ['Moisture','Ash','VolatileMatter','FixedCarbon','HHV']
X = df[features].values
y = df['cost_per_MJ'].values  # USD per MJ
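
As a quick check on the arithmetic above, the sketch below computes the cost for a single hypothetical sample and converts it to $/kWh using 1 kWh = 3.6 MJ. The function name `cost_per_mj` and the sample values (HHV of 18 MJ/kg, 30% moisture) are assumptions made for illustration; they are not taken from the dataset.

```python
def cost_per_mj(price_per_tonne, hhv_mj_per_kg, moisture_pct):
    """Delivered energy cost in USD/MJ, using the same formula as the tutorial."""
    price_per_kg = price_per_tonne / 1e3                       # USD per kg
    usable_hhv = hhv_mj_per_kg * (1 - moisture_pct / 100)      # MJ per kg as received
    return price_per_kg / usable_hhv

# Hypothetical sample: HHV = 18 MJ/kg, moisture = 30 %
cost_mj = cost_per_mj(100.0, 18.0, 30.0)
cost_kwh = cost_mj * 3.6                                       # 1 kWh = 3.6 MJ
print(f"{cost_mj:.5f} USD/MJ, {cost_kwh:.5f} USD/kWh")
```

For this sample the usable heating value is 12.6 MJ/kg, so the cost is 0.1 / 12.6, roughly 0.0079 USD/MJ. Costs of this order are worth keeping in mind when choosing prior scales later on.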

Preprocessing & Train/Test Split

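StandardScaler: Z-scoring each feedstock feature puts all predictors on a common scale, so the same Normal prior is reasonable for every β and MCMC sampling is more stable. The scaler is fitted on the training split only, so no information from the test set leaks into the model.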

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Random 80% train / 20% test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize features for stable MCMC
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s  = scaler.transform(X_test)
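
To make the scaling step concrete, here is a minimal NumPy sketch of what fitting on the training rows and transforming both splits amounts to. The feature matrix `X_demo` is randomly generated for illustration and only loosely mimics the five feedstock columns.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical feature matrix: 50 samples x 5 features on different scales
X_demo = rng.normal(loc=[20, 5, 70, 15, 18], scale=[8, 2, 6, 4, 1.5], size=(50, 5))
X_tr, X_te = X_demo[:40], X_demo[40:]

# Fit the scaling on the training rows only
mu = X_tr.mean(axis=0)
sd = X_tr.std(axis=0)            # population std (ddof=0), as StandardScaler uses

X_tr_s = (X_tr - mu) / sd
X_te_s = (X_te - mu) / sd        # test rows reuse the training mean and std

print(X_tr_s.mean(axis=0).round(6))   # approximately 0 for every feature
print(X_tr_s.std(axis=0).round(6))    # approximately 1 for every feature
```

The test rows will generally not have exactly zero mean or unit variance after scaling. That is expected, since they are measured against the training statistics.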

Define & Fit Bayesian Regression Model

Priors:

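α ∼ Normal(mean(y_train), 1.0), centred on the empirical mean cost;
β ∼ Normal(0, 0.5) for the slope of each standardised feature;
σ ∼ HalfNormal(0.1) for the observation noise in cost/MJ.

All three scales are wide relative to costs of order $0.01/MJ, so these act as weakly informative priors.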

Model: cost_per_MJ ∼ Normal(α + β·X_std, σ).

Sampling: 2,000 posterior draws per chain (plus 1,000 tuning steps) with target_accept=0.9 for robust convergence.
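
Before fitting, it can help to see what the priors above imply for the predicted cost. The following is a minimal prior predictive sketch in plain NumPy (no PyMC3 involved). The mean cost `y_mean = 0.008` and the single test point `x` are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_features = 5000, 5
y_mean = 0.008                    # hypothetical mean cost in USD/MJ

# Draw from the priors stated above
alpha = rng.normal(y_mean, 1.0, size=n_draws)
beta = rng.normal(0.0, 0.5, size=(n_draws, n_features))
sigma = np.abs(rng.normal(0.0, 0.1, size=n_draws))    # HalfNormal(0.1)

# One standardised sample, one standard deviation above the mean on every feature
x = np.ones(n_features)
y_prior = rng.normal(alpha + beta @ x, sigma)

lo, hi = np.percentile(y_prior, [3, 97])
neg_share = (y_prior < 0).mean()
print(f"94% prior predictive range: {lo:.2f} to {hi:.2f} USD/MJ")
print(f"share of negative costs: {neg_share:.2f}")
```

Under these assumptions the prior predictive range spans several USD/MJ and about half of the draws are negative, which shows how loose the priors are compared with realistic costs. With enough data the likelihood dominates, but tighter priors or a rescaled target (for example cents per MJ) may be worth considering.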

import pymc3 as pm

with pm.Model() as biomass_cost_model:
    # Priors
    α = pm.Normal("α", mu=y_train.mean(), sigma=1.0)                   # intercept around average cost
    β = pm.Normal("β", mu=0, sigma=0.5, shape=X_train_s.shape[1])      # slopes for each feature
    σ = pm.HalfNormal("σ", sigma=0.1)                                  # noise scale

    # Linear predictor for cost_per_MJ
    μ = α + pm.math.dot(X_train_s, β)

    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)

    # MCMC sampling
    trace = pm.sample(
        draws=2000,
        tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
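
For intuition about what the sampler approximates, here is a deliberately simplified sketch: with a Normal prior on the slopes and a known noise scale, the posterior of a linear regression has a closed form. This is not what `pm.sample` does (it runs MCMC and also infers α and σ). The data, the omitted intercept, and the fixed `sigma` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 3
X = rng.normal(size=(n, p))                    # hypothetical standardised features
true_beta = np.array([0.5, -0.3, 0.1])
sigma = 0.2                                    # noise scale, assumed known here
y = X @ true_beta + rng.normal(0, sigma, n)

# Prior: beta ~ Normal(0, tau^2 * I)
tau = 0.5
prior_prec = np.eye(p) / tau**2

# Conjugate update: posterior precision = prior precision + X'X / sigma^2
post_cov = np.linalg.inv(prior_prec + X.T @ X / sigma**2)
post_mean = post_cov @ (X.T @ y / sigma**2)
post_sd = np.sqrt(np.diag(post_cov))

print("posterior mean:", post_mean.round(3))
print("posterior sd:  ", post_sd.round(3))
```

The posterior standard deviations shrink well below the prior scale `tau` as data accumulate, which is the same behaviour the MCMC trace should show for β when the model fits well.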

Posterior Analysis & Point Predictions

Posterior predictive: Generates full predictive distributions at the training inputs, which can be summarised by a mean and a 94% Highest Density Interval (HDI).
Point predictions: For the held-out samples we plug the posterior means of α and β into the linear predictor.
Evaluation: MAE quantifies average error in USD/MJ on held-out biomass samples.

import arviz as az
from sklearn.metrics import mean_absolute_error

# Summarize posterior distributions
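# Costs are of order 0.01 USD/MJ, so keep four decimals; print() shows the table outside notebooks
print(az.summary(trace, round_to=4))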

# Posterior predictive sampling
with biomass_cost_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

# Posterior means
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values

# Point predictions on test set
y_pred = α_post + X_test_s.dot(β_post)

# Evaluate MAE in USD/MJ
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.4f} per MJ")
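
The point predictions above discard the uncertainty in α, β and σ. One way to recover an interval for each new sample is to push every posterior draw through the linear predictor and add observation noise. The sketch below does this with NumPy on stand-in draws. The helper `predictive_interval` and the synthetic draws are hypothetical, and it uses an equal-tailed percentile interval rather than the HDI mentioned above.

```python
import numpy as np

def predictive_interval(alpha_s, beta_s, sigma_s, X, prob=0.94, seed=0):
    """Equal-tailed predictive interval for each row of X.

    alpha_s: (S,), beta_s: (S, P), sigma_s: (S,), X: (N, P).
    """
    rng = np.random.default_rng(seed)
    mu = alpha_s[:, None] + beta_s @ X.T              # (S, N) linear predictor
    y_draws = rng.normal(mu, sigma_s[:, None])        # add observation noise
    tail = (1 - prob) / 2 * 100
    lower, upper = np.percentile(y_draws, [tail, 100 - tail], axis=0)
    return y_draws.mean(axis=0), lower, upper

# Stand-in posterior draws (illustrative only, not from a fitted model)
rng = np.random.default_rng(3)
S, P = 4000, 5
alpha_s = rng.normal(0.008, 1e-4, S)
beta_s = rng.normal(0.0, 1e-4, (S, P)) + np.array([2e-3, 0, 0, 0, -1e-3])
sigma_s = np.abs(rng.normal(5e-4, 5e-5, S))
X_new = rng.normal(size=(10, P))                      # standardised feature rows

mean, lower, upper = predictive_interval(alpha_s, beta_s, sigma_s, X_new)
print(np.column_stack([lower, mean, upper]).round(4))
```

With a fitted model, the draws could be taken from the trace, for example `trace.posterior["α"].values.ravel()` and `trace.posterior["β"].values.reshape(-1, n_features)`, and `X_new` replaced by `X_test_s`. The share of `y_test` values falling between `lower` and `upper` then gives a rough check on interval calibration.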

Visualise Predictions & Credible Intervals

Sweeping HHV (a key quality metric) while holding other feedstock properties constant reveals both the expected cost dependence and its uncertainty band.

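import numpy as np
import matplotlib.pyplot as plt
import arviz as az

# Sweep HHV while holding other features at their median
hhv_grid = np.linspace(X_train_s[:,4].min(), X_train_s[:,4].max(), 100)
grid = np.median(X_train_s, axis=0)[None,:].repeat(100, axis=0)
grid[:,4] = hhv_grid

# The model was built on X_train_s, so pm.sample_posterior_predictive would
# predict at the training rows, not at the grid. Evaluate the grid directly
# from the posterior draws instead.
n_features = X_train_s.shape[1]
α_s = trace.posterior["α"].values.ravel()                    # (n_samples,)
β_s = trace.posterior["β"].values.reshape(-1, n_features)    # (n_samples, n_features)
σ_s = trace.posterior["σ"].values.ravel()                    # (n_samples,)

rng = np.random.default_rng(42)
μ_grid = α_s[:, None] + β_s @ grid.T                         # (n_samples, 100)
preds  = rng.normal(μ_grid, σ_s[:, None])                    # add observation noise

mean_pred = preds.mean(axis=0)
hpd       = az.hdi(preds[None, :, :], hdi_prob=0.94)         # (100, 2): lower, upper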

# Back‑transform HHV to original scale
hhv_orig = scaler.inverse_transform(grid)[:,4]

plt.figure(figsize=(8,5))
plt.plot(hhv_orig, mean_pred, label="Posterior mean")
plt.fill_between(hhv_orig, hpd[:,0], hpd[:,1], alpha=0.3,
                 label="94% credible interval")
plt.scatter(
    scaler.inverse_transform(X_test_s)[:,4],
    y_test, color="k", alpha=0.5, label="Test data"
)
plt.xlabel("Higher Heating Value (MJ/kg)")
plt.ylabel("Delivered Energy Cost (USD/MJ)")
plt.title("Bayesian Regression: Cost vs. HHV")
plt.legend()
plt.tight_layout()
plt.show()

Summary

This Bayesian Regression workflow for Biomass Energy Cost Prediction provides:

1. Point forecasts of delivered energy cost per MJ from early feedstock quality metrics.

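2. Credible intervals capturing parameter uncertainty and residual noise in the cost model. The feedstock price is held fixed at $100/tonne here, so price risk would need to be modelled separately.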

3. Actionable insights: procurement teams can negotiate feedstock contracts with explicit cost‐risk bounds and optimise feedstock selection to minimise energy‐production cost under uncertainty.
