Emission Cost Prediction using Bayesian Regression in ML

FREE Online Courses: Elevate Skills, Zero Cost. Enroll Now!

Environmental managers and fleet operators need to forecast the regulatory cost of greenhouse‑gas emissions—before budgeting for compliance—using vehicle or process features such as engine size, fuel type, vehicle weight, and annual mileage. Emission costs are computed as CO₂ emitted × carbon tax rate, so small changes in engine characteristics yield nonlinear cost effects (e.g. larger engines emit disproportionately more), and future costs carry uncertainty from fluctuating carbon prices. By applying Bayesian Regression, we not only predict a best estimate of emission cost per vehicle (or process) but also derive credible intervals that capture forecast uncertainty—enabling robust compliance budgeting and cap-and-trade strategy.

Libraries Required

import pandas as pd                              # data I/O  
import numpy as np                               # numerics  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

import pymc3 as pm                               # Bayesian modeling  
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error

Dataset

Vehicle Dataset

Step-by-Step Code Implementation

Data Loading & Cost Computation

Cost Computation: We convert vehicle CO₂ emissions (g/km) into tons per year using a standard mileage assumption, then multiply by the carbon tax rate to obtain the annual cost.

Data Prep:

Numeric features (Engine HP, Engine Cylinders) are standardised for stable sampling.
Categorical features (E.g., Fuel Type) are one‑hot encoded to capture discrete effects.

import pandas as pd

# Load raw vehicle data
df = pd.read_csv("data/Car_data.csv")

# Compute annual emission cost: CO2 emissions (g/km) × annual_km × tax_rate ($ per ton)
# Assume average 15,000 km/year and carbon tax of $50 per metric ton (1 ton = 1e6 g)
annual_km = 15000  
tax_rate  = 50.0      # USD per ton

# Convert g/km to grams per year, then to tons
df["CO2_per_year_tons"] = (df["CO2 Emission g/km"] * annual_km) / 1e6
df["emission_cost"]     = df["CO2_per_year_tons"] * tax_rate

# Select predictors and target
features = ["Engine HP", "Engine Cylinders", "Fuel Type", "Transmission", "Vehicle_Class"]
df = pd.get_dummies(df[features + ["emission_cost"]], drop_first=True)
X = df.drop(columns="emission_cost").values
y = df["emission_cost"].values

Train/Test Split & Scaling

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize numeric columns (first two: HP and Cylinders)
scaler = StandardScaler().fit(X_train[:, :2])
X_train_s = X_train.copy()
X_train_s[:, :2] = scaler.transform(X_train[:, :2])
X_test_s  = X_test.copy()
X_test_s[:, :2] = scaler.transform(X_test[:, :2])

Define & Fit Bayesian Regression Model

Bayesian Model:

α (Intercept) and β (Coefficients) have weakly informative Normal priors.
σ (Residual noise) has a Half‑Normal before enforcing positivity.
Observed emission_cost is modelled as Normal(μ, σ).

MCMC Sampling: We draw 2,000 posterior samples (plus 1,000 burn-in) with target_accept=0.9 to ensure good convergence.

import pymc3 as pm

with pm.Model() as emission_cost_model:
    # Priors
    α = pm.Normal("α", mu=0, sigma=100)                                # baseline cost
    β = pm.Normal("β", mu=0, sigma=20, shape=X_train_s.shape[1])      # coefficients
    σ = pm.HalfNormal("σ", sigma=50)                                   # noise scale

    # Linear predictor
    μ = α + pm.math.dot(X_train_s, β)

    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)

    # Sample posterior
    trace = pm.sample(
        draws=2000, tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )

Posterior Analysis & Predictions

Posterior Predictive: Sampling Y_obs yields a full distribution over emission‐cost forecasts, from which we extract the posterior mean and 94% credible intervals.
Evaluation: Mean Absolute Error (MAE) on held‑out data quantifies point‐forecast accuracy.

import arviz as az
from sklearn.metrics import mean_absolute_error

# Posterior summary
az.summary(trace, round_to=2)

# Posterior predictive sampling
with emission_cost_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

# Point predictions using posterior means
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values
y_pred = α_post + X_test_s.dot(β_post)

# Evaluate MAE
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f} per vehicle per year")

Visualise Credible Interval vs. Engine HP

Sweeping Engine HP while holding other features constant, we plot the posterior mean cost curve with its 94% credible band—illustrating both expected cost scaling and uncertainty.

import numpy as np
import matplotlib.pyplot as plt

# Sweep Engine HP from 50 to 400, hold others at median
hp_vals = np.linspace(50, 400, 100)
grid    = np.median(X_train_s, axis=0)[None,:].repeat(100, axis=0)
# back-transform hp_vals to standardized scale
hp_std   = (hp_vals - scaler.mean_[0]) / scaler.scale_[0]
grid[:,0] = hp_std

with emission_cost_model:
    ppc_grid = pm.sample_posterior_predictive(trace, var_names=["Y_obs"], samples=1000)

preds     = ppc_grid["Y_obs"]
mean_pred = preds.mean(axis=0)
hpd       = az.hdi(preds, hdi_prob=0.94)

# Plot
plt.figure(figsize=(8,5))
plt.plot(hp_vals, mean_pred, label="Posterior mean")
plt.fill_between(hp_vals, hpd[:,0], hpd[:,1], alpha=0.3, label="94% CI")
plt.xlabel("Engine HP")
plt.ylabel("Annual Emission Cost (USD)")
plt.title("Emission Cost vs. Engine HP")
plt.legend()
plt.tight_layout()
plt.show()

Summary

This Bayesian Regression pipeline for Emission Cost Prediction provides:

1. Point estimates of annual compliance cost per vehicle from early technical specs.

2. Credible intervals quantifying uncertainty from mileage assumptions and carbon‐price volatility.

3. Actionable insights: fleet managers can budget for carbon tax obligations using upper/lower bounds and evaluate technical trade-offs (e.g., downsizing engines) under uncertainty.

Your 15 seconds will encourage us to work even harder
Please share your happy experience on Google | Facebook

Emission Cost Prediction using Bayesian Regression in ML

Libraries Required

Dataset

Step-by-Step Code Implementation

Data Loading & Cost Computation

Train/Test Split & Scaling

Define & Fit Bayesian Regression Model