Biomass Energy Cost Prediction using Bayesian Regression in ML

Managers of biomass-to-energy plants must forecast the delivered energy cost ($/MJ or $/kWh) before contracting feedstock, using early feedstock quality indicators such as moisture content (%), ash content (%), volatile matter (%), fixed carbon (%), and higher heating value (HHV, MJ/kg). Delivered energy cost is nonlinear in these predictors (for example, high moisture sharply lowers net calorific output and so raises cost) and is uncertain because of supply-chain variability and price fluctuations. A single point estimate hides this risk and can lead to uneconomic bids or supply shortages. Applying Bayesian regression yields:

1. A point estimate of delivered energy cost.

2. A credible interval quantifying our uncertainty—enabling risk‐aware feedstock contracting and budgeting.

Libraries Required

import pandas as pd                              # data loading & manipulation  
import numpy as np                               # numerical operations  
import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

import pymc3 as pm                               # Bayesian modeling  
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error  

Dataset

Biomass Data

Step-by-Step Code Implementation

Data Loading & Synthetic Cost Computation

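Synthetic target (cost_per_MJ): We convert an assumed feedstock price ($100/tonne) and a moisture-adjusted heating value, HHV × (1 − Moisture/100), into a delivered energy cost per MJ. Because the target is computed from Moisture and HHV, which are also model features, the regression approximates a known nonlinear formula rather than learning from independently measured costs.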

import pandas as pd

# Load biomass feedstock data
df = pd.read_csv("data/biomass-data/biomass.csv")

# Preview columns
# df.columns  => ['SampleID','Moisture','Ash','VolatileMatter','FixedCarbon','HHV']

# Assume a base feedstock price of $100 per tonne
# Delivered energy cost ($/MJ) = (feedstock_price_per_tonne / 1e3) / (HHV * (1 - moisture/100))
feedstock_price = 100.0  # USD per tonne
# Moisture-adjusted heating value: MJ per kg of as-received (wet) biomass,
# assuming HHV is reported on a dry basis
df['dry_HHV'] = df['HHV'] * (1 - df['Moisture'] / 100)

# Compute cost per MJ
# price per kg = price_per_tonne / 1e3
df['cost_per_MJ'] = (feedstock_price / 1e3) / df['dry_HHV']

# Features & target
features = ['Moisture','Ash','VolatileMatter','FixedCarbon','HHV']
X = df[features].values
y = df['cost_per_MJ'].values  # USD per MJ
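
As a quick sanity check on the formula above, the arithmetic can be worked for a single sample. This is only a sketch: the helper name `cost_per_mj` and the sample values (HHV of 18 MJ/kg at 10% and 50% moisture) are illustrative assumptions, not rows from the dataset.

```python
def cost_per_mj(price_per_tonne, hhv_mj_per_kg, moisture_pct):
    """Delivered cost in USD/MJ for one sample (hypothetical helper)."""
    price_per_kg = price_per_tonne / 1e3                       # USD per kg
    effective_hhv = hhv_mj_per_kg * (1 - moisture_pct / 100)   # MJ per kg as received
    return price_per_kg / effective_hhv

# Same feedstock price and HHV, two assumed moisture levels
dry = cost_per_mj(100.0, 18.0, 10.0)
wet = cost_per_mj(100.0, 18.0, 50.0)
print(f"10% moisture: {dry:.5f} USD/MJ")
print(f"50% moisture: {wet:.5f} USD/MJ")
print(f"cost ratio wet/dry: {wet / dry:.2f}")
```

The cost scales with 1 / (1 − moisture/100), so it rises faster than linearly as moisture increases. This is the nonlinearity that a linear regression on the raw features can only approximate.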

Preprocessing & Train/Test Split

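StandardScaler: Z‑scoring each feedstock feature puts all predictors on a common scale, so the same prior width on every β is reasonable and MCMC sampling is more stable. The scaler is fitted on the training split only, so no test-set statistics leak into the model.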

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Random 80% train / 20% test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize features for stable MCMC
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s  = scaler.transform(X_test)
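
To make the scaling step concrete, the following NumPy-only sketch reproduces the z-score computation on synthetic stand-in data (the array `X_demo` and its two made-up features are assumptions for illustration). It uses the population standard deviation (ddof=0), which is what StandardScaler uses, and takes the statistics from the training rows only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for two features, e.g. moisture (%) and HHV (MJ/kg)
X_demo = rng.normal(loc=[30.0, 18.0], scale=[10.0, 1.5], size=(200, 2))
X_tr, X_te = X_demo[:160], X_demo[160:]

# Statistics come from the training rows only
mu = X_tr.mean(axis=0)
sd = X_tr.std(axis=0)          # ddof=0, matching StandardScaler

X_tr_s = (X_tr - mu) / sd
X_te_s = (X_te - mu) / sd      # test rows reuse the training statistics

print("train mean:", X_tr_s.mean(axis=0).round(3))
print("train std: ", X_tr_s.std(axis=0).round(3))
print("test mean: ", X_te_s.mean(axis=0).round(3))
```

The training columns end up with mean 0 and standard deviation 1 by construction, while the test columns are only approximately centred, which is expected when the test set is scaled with training statistics.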

Define & Fit Bayesian Regression Model

Priors:

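  • α ∼ Normal(mean of y_train, 1.0), centred on the empirical mean cost;
  • β ∼ Normal(0, 0.5), one slope per standardised feature;
  • σ ∼ HalfNormal(0.1) for the observation noise in cost/MJ.

At $100/tonne and typical biomass heating values (roughly 15 to 20 MJ/kg), costs are of order 0.01 USD/MJ, so these widths are broad relative to the target and act as weakly informative priors rather than tight ones.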

Model: cost_per_MJ ∼ Normal(α + β·X_std, σ).

Sampling: 2,000 posterior draws per chain (plus 1,000 tuning steps) with target_accept=0.9, which makes NUTS take smaller steps and reduces the risk of divergences.

import pymc3 as pm

with pm.Model() as biomass_cost_model:
    # Priors
    α = pm.Normal("α", mu=y_train.mean(), sigma=1.0)                   # intercept around average cost
    β = pm.Normal("β", mu=0, sigma=0.5, shape=X_train_s.shape[1])      # slopes for each feature
    σ = pm.HalfNormal("σ", sigma=0.1)                                  # noise scale

    # Linear predictor for cost_per_MJ
    μ = α + pm.math.dot(X_train_s, β)

    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)

    # MCMC sampling
    trace = pm.sample(
        draws=2000,
        tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
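
Before relying on the fit, it can help to see what the stated priors imply for the target. The NumPy sketch below simulates from Normal(mean, 1.0), Normal(0, 0.5) and HalfNormal(0.1) without PyMC3. The value `y_mean = 0.007` is an assumed typical cost in USD/MJ, and the standard-normal feature vectors stand in for z-scored data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_features = 5000, 5

y_mean = 0.007                                             # assumed typical cost, USD/MJ
alpha = rng.normal(y_mean, 1.0, size=n_draws)              # α ~ Normal(mean, 1.0)
beta = rng.normal(0.0, 0.5, size=(n_draws, n_features))    # β ~ Normal(0, 0.5)
sigma = np.abs(rng.normal(0.0, 0.1, size=n_draws))         # σ ~ HalfNormal(0.1)

# One standardised feature vector per prior draw
x = rng.normal(size=(n_draws, n_features))
y_sim = rng.normal(alpha + np.sum(beta * x, axis=1), sigma)

lo, hi = np.percentile(y_sim, [3, 97])
print(f"94% of prior-predicted costs lie in [{lo:.2f}, {hi:.2f}] USD/MJ")
print(f"share of negative simulated costs: {(y_sim < 0).mean():.2f}")
```

If the simulated range is far wider than any plausible cost, or includes many negative values, that suggests narrowing the prior scales or modelling the logarithm of cost. Either change is a modelling choice outside what this tutorial specifies.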

Posterior Analysis & Point Predictions

  • Posterior predictive: sample_posterior_predictive draws replicated costs for the training observations, which is useful for checking model fit.
  • Point predictions: test-set forecasts apply the posterior means of α and β to the standardised test features.
  • Evaluation: MAE quantifies average error in USD/MJ on held‑out biomass samples.

import arviz as az
from sklearn.metrics import mean_absolute_error

# Summarize posterior distributions (4 decimals, since costs are small numbers)
print(az.summary(trace, round_to=4))

# Posterior predictive sampling for the training observations (fit check)
with biomass_cost_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

# Posterior means
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values

# Point predictions on test set
y_pred = α_post + X_test_s.dot(β_post)

# Evaluate MAE in USD/MJ
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.4f} per MJ")
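
MAE scores only the point forecast; the interval can be checked too, by counting how often held-out costs fall inside it. The sketch below is an assumption-laden illustration: `predictive_interval` is a hypothetical helper, the posterior draws are synthetic stand-ins rather than output of the model above, and it uses an equal-tailed percentile interval instead of the highest-density interval.

```python
import numpy as np

def predictive_interval(alpha_draws, beta_draws, sigma_draws, X_new, prob=0.94, seed=0):
    """Equal-tailed predictive interval per row of X_new (hypothetical helper).

    alpha_draws: (S,), beta_draws: (S, P), sigma_draws: (S,), X_new: (N, P).
    """
    rng = np.random.default_rng(seed)
    mu = alpha_draws[:, None] + beta_draws @ X_new.T     # (S, N) linear predictor
    y_rep = rng.normal(mu, sigma_draws[:, None])         # predictive draws
    tail = 100 * (1 - prob) / 2
    lower, upper = np.percentile(y_rep, [tail, 100 - tail], axis=0)
    return lower, upper

# Synthetic stand-ins for posterior draws and a held-out set
rng = np.random.default_rng(2)
S, N, P = 4000, 300, 5
true_alpha, true_sigma = 0.007, 0.0005
true_beta = np.array([0.002, 0.0, 0.0, 0.0, -0.001])
alpha_draws = rng.normal(true_alpha, 1e-5, S)
beta_draws = rng.normal(true_beta, 1e-5, (S, P))
sigma_draws = np.abs(rng.normal(true_sigma, 1e-5, S))
X_new = rng.normal(size=(N, P))
y_new = rng.normal(true_alpha + X_new @ true_beta, true_sigma)

lower, upper = predictive_interval(alpha_draws, beta_draws, sigma_draws, X_new)
coverage = np.mean((y_new >= lower) & (y_new <= upper))
print(f"empirical coverage of the 94% interval: {coverage:.2f}")
```

With the fitted model, the draws could instead be taken from `trace.posterior`, for example `trace.posterior["β"].values.reshape(-1, n_features)`. Coverage well below 94% on the test set would indicate intervals that are too narrow for contracting decisions.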

Visualise Predictions & Credible Intervals

Sweeping HHV (a key quality metric) while holding other feedstock properties constant reveals both the expected cost dependence and its uncertainty band.

import numpy as np
import matplotlib.pyplot as plt

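# Sweep HHV (column 4) while holding other features at their training median
hhv_grid = np.linspace(X_train_s[:,4].min(), X_train_s[:,4].max(), 100)
grid = np.median(X_train_s, axis=0)[None,:].repeat(100, axis=0)
grid[:,4] = hhv_grid

# The model was built on X_train_s, so predictions for the grid are formed
# directly from the posterior draws of α, β and σ
post = trace.posterior.stack(sample=("chain", "draw"))
α_s = post["α"].values                       # shape (n_samples,)
β_s = post["β"].values                       # shape (n_features, n_samples)
σ_s = post["σ"].values                       # shape (n_samples,)

rng   = np.random.default_rng(42)
μ_s   = α_s[:, None] + (grid @ β_s).T         # shape (n_samples, 100)
preds = rng.normal(μ_s, σ_s[:, None])         # posterior predictive draws

mean_pred = preds.mean(axis=0)
# Add a leading chain axis so ArviZ reads the array as (chain, draw, point)
hpd       = az.hdi(preds[None, :, :], hdi_prob=0.94)   # shape (100, 2)

# Back‑transform HHV to original scale
hhv_orig = scaler.inverse_transform(grid)[:,4]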

plt.figure(figsize=(8,5))
plt.plot(hhv_orig, mean_pred, label="Posterior mean")
plt.fill_between(hhv_orig, hpd[:,0], hpd[:,1], alpha=0.3,
                 label="94% credible interval")
plt.scatter(
    scaler.inverse_transform(X_test_s)[:,4],
    y_test, color="k", alpha=0.5, label="Test data"
)
plt.xlabel("Higher Heating Value (MJ/kg)")
plt.ylabel("Delivered Energy Cost (USD/MJ)")
plt.title("Bayesian Regression: Cost vs. HHV")
plt.legend()
plt.tight_layout()
plt.show()
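
Because the model is fitted on z-scored features, its slopes are per standard deviation rather than per physical unit. The sketch below shows one way to map an intercept and slopes back to raw units. The helper `to_original_units` and all numbers are hypothetical; with the fitted scaler, the means and scales would come from `scaler.mean_` and `scaler.scale_`.

```python
import numpy as np

def to_original_units(alpha_std, beta_std, feat_mean, feat_scale):
    """Map intercept/slopes fitted on z-scored features back to raw units."""
    beta_raw = beta_std / feat_scale
    alpha_raw = alpha_std - np.sum(beta_std * feat_mean / feat_scale)
    return alpha_raw, beta_raw

# Hypothetical numbers for two features: moisture (%) and HHV (MJ/kg)
feat_mean = np.array([30.0, 18.0])
feat_scale = np.array([10.0, 1.5])
alpha_std, beta_std = 0.0080, np.array([0.0020, -0.0006])

alpha_raw, beta_raw = to_original_units(alpha_std, beta_std, feat_mean, feat_scale)

# Both parameterisations give the same prediction for a raw sample
x_raw = np.array([40.0, 19.0])
pred_std = alpha_std + beta_std @ ((x_raw - feat_mean) / feat_scale)
pred_raw = alpha_raw + beta_raw @ x_raw
print(f"slopes per raw unit: {beta_raw}")
print(f"prediction (standardised form): {pred_std:.5f}")
print(f"prediction (raw-unit form):     {pred_raw:.5f}")
```

Raw-unit slopes read directly as, for example, the change in USD/MJ per percentage point of moisture, which is often easier to discuss with procurement teams than standardised coefficients.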

Summary

This Bayesian Regression workflow for Biomass Energy Cost Prediction provides:

1. Point forecasts of delivered energy cost per MJ from early feedstock quality metrics.

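2. Credible intervals capturing parameter uncertainty and residual scatter around the linear fit. Because the feedstock price is fixed at $100/tonne here, price volatility is not modelled unless price is added as an input.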

3. Actionable insights: procurement teams can negotiate feedstock contracts with explicit cost‐risk bounds and optimise feedstock selection to minimise energy‐production cost under uncertainty.

ProjectGurukul Team

The ProjectGurukul Team delivers project-based tutorials on programming, machine learning, and web development. We simplify learning by providing hands-on projects to help you master real-world skills. We also provide free major and minor projects for engineering students.