Consumer Purchase Cost Prediction using Bayesian Regression in ML


Retail analysts and pricing teams need to predict a consumer’s purchase amount before the transaction completes, using early inputs such as customer demographics (age, gender), historical spending, promotional status, and basket size. Purchase amounts can exhibit nonlinear effects (e.g., diminishing returns on discount depth, threshold effects of basket variety) and carry uncertainty due to variability in individual behaviour. A classic point-estimate regression hides this uncertainty, risking mis-targeted promotions or inventory misallocation. By applying Bayesian Regression in ML, we obtain both a point estimate of purchase cost and credible intervals that quantify our uncertainty, enabling risk-aware pricing and personalised offer strategies. The example below fits a linear Bayesian model on four such inputs: age, gender, annual income and number of past purchases.

Libraries Required

import pandas as pd                              # data loading & manipulation  
import numpy as np                               # numerical operations  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # enhanced visualization  

import pymc3 as pm                               # Bayesian modeling (PyMC3 3.x API; PyMC v4+ renamed the package to pymc)
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error

Dataset

Customer Purchase Data

Step-by-Step Code Implementation

Import Libraries & Load Data

import pandas as pd

# Load dataset
df = pd.read_csv("data/customer-purchase-data/CustomerPurchaseData.csv")

# Preview relevant columns
df[[
    'Age','Gender','Annual_Income','Num_Purchases','Purchase_Amount'
]].head()

Preprocessing & Train/Test Split

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Encode gender as 0/1 (any label other than 'Male'/'Female' becomes NaN and must be cleaned first)
df['Gender_Code'] = df['Gender'].map({'Male':0,'Female':1})

# Define features and target
X = df[['Age','Gender_Code','Annual_Income','Num_Purchases']].values
y = df['Purchase_Amount'].values  # in USD

# Split (80% train / 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize all features using training-set statistics only, so the sampler works on a common scale
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s  = scaler.transform(X_test)
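StandardScaler turns each column into a z-score using the mean and the population standard deviation (ddof=0) of the training split; the test split reuses those same statistics, so nothing leaks from it. A minimal NumPy sketch of that transformation, on a small made-up array whose columns merely mimic Age, Gender_Code, Annual_Income and Num_Purchases (the values are illustrative, not taken from the dataset):

```python
import numpy as np

def fit_standardizer(X_train):
    """Per-column mean and population std (ddof=0), the statistics StandardScaler stores."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)              # NumPy's default ddof=0
    std = np.where(std == 0, 1.0, std)     # constant columns are left unscaled, as in StandardScaler
    return mean, std

def transform(X, mean, std):
    return (X - mean) / std

def inverse_transform(Z, mean, std):
    return Z * std + mean

# Illustrative rows only: Age, Gender_Code, Annual_Income, Num_Purchases
X_train = np.array([[25, 0, 40000, 3],
                    [40, 1, 85000, 10],
                    [33, 1, 60000, 6],
                    [58, 0, 72000, 1]], dtype=float)
X_test = np.array([[45, 0, 50000, 8]], dtype=float)

mean, std = fit_standardizer(X_train)
X_train_s = transform(X_train, mean, std)
X_test_s = transform(X_test, mean, std)    # test rows use the training statistics

print(X_train_s.mean(axis=0).round(6))     # every column should be ~0
print(X_train_s.std(axis=0).round(6))      # every column should be ~1
```

Note that the binary Gender_Code column is standardized too; that is harmless for the regression, it only rescales the corresponding coefficient.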

Define & Fit Bayesian Regression Model

Priors:

α ∼ Normal(0, 100): broad intercept prior.
βᵢ ∼ Normal(0, 50): moderate prior uncertainty for each standardised predictor.
σ ∼ HalfNormal(50): residual noise scale.
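One way to see what these priors imply before looking at the data is a prior predictive simulation: draw α, β and σ from the priors and push them through the model. For a customer one standard deviation above average on all four features, the implied spread is about √(100² + 4·50² + 50²) = √22,500 = 150 USD (the last term is E[σ²] for HalfNormal(50)). A NumPy sketch under those priors; the input vector `x_new` is a hypothetical standardised customer:

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 20_000
n_features = 4

# Draws from the priors listed above
alpha = rng.normal(0.0, 100.0, size=n_draws)                # α ~ Normal(0, 100)
beta = rng.normal(0.0, 50.0, size=(n_draws, n_features))    # βᵢ ~ Normal(0, 50)
sigma = np.abs(rng.normal(0.0, 50.0, size=n_draws))         # σ ~ HalfNormal(50) as |Normal(0, 50)|

# Hypothetical standardised customer: +1 SD on every feature
x_new = np.ones(n_features)

mu = alpha + beta @ x_new            # linear predictor for every prior draw
y_prior = rng.normal(mu, sigma)      # prior predictive purchase amounts

lo, hi = np.percentile(y_prior, [3, 97])
print(f"94% prior predictive interval: {lo:.0f} to {hi:.0f} USD")
print(f"prior predictive SD: {y_prior.std():.1f} USD")
```

Whether a spread of this size is "broad" depends on the scale of Purchase_Amount in the actual dataset, so it is worth comparing against the observed range before sampling.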

Model:

The linear predictor μ = α + β·X_standardized links demographics and behaviour to the purchase amount.
Observations y_train ∼ Normal(μ, σ).
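Written out, the unnormalised log-posterior that the sampler explores is the sum of the three prior log-densities and the Normal log-likelihood of the observations. A NumPy-only sketch; the synthetic data and "true" parameter values below are made up for illustration and are not from the customer dataset:

```python
import numpy as np

def normal_logpdf(x, mu, sd):
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((x - mu) / sd) ** 2

def log_posterior(alpha, beta, sigma, X, y):
    """Unnormalised log p(α, β, σ | X, y) for the model described above."""
    if sigma <= 0:
        return -np.inf                                     # outside the HalfNormal support
    lp = normal_logpdf(alpha, 0.0, 100.0)                  # α ~ Normal(0, 100)
    lp += normal_logpdf(beta, 0.0, 50.0).sum()             # βᵢ ~ Normal(0, 50)
    lp += np.log(2.0) + normal_logpdf(sigma, 0.0, 50.0)    # σ ~ HalfNormal(50)
    mu = alpha + X @ beta                                  # linear predictor
    lp += normal_logpdf(y, mu, sigma).sum()                # y ~ Normal(μ, σ)
    return float(lp)

# Synthetic, already-standardised features (illustrative only)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
true_alpha, true_beta, true_sigma = 120.0, np.array([5.0, -3.0, 20.0, 15.0]), 10.0
y = true_alpha + X @ true_beta + rng.normal(0.0, true_sigma, size=200)

lp_true = log_posterior(true_alpha, true_beta, true_sigma, X, y)
lp_zero = log_posterior(0.0, np.zeros(4), true_sigma, X, y)   # intercept far from the data
print(lp_true, lp_zero)
```

NUTS needs this quantity (and its gradient) only up to an additive constant, which is why the normalising evidence term never has to be computed.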

Sampling:

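We run 1,000 tuning (warm-up) steps per chain, which are discarded, and then keep 2,000 posterior draws per chain. Setting target_accept=0.9 (above PyMC3's default of 0.8) makes NUTS take smaller steps, which reduces divergent transitions at the cost of slower sampling; convergence should still be checked, e.g. with the r_hat column of az.summary.
Posterior predictive sampling then yields a full predictive distribution for each observation, combining parameter uncertainty with the residual noise σ.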

import pymc3 as pm

with pm.Model() as purchase_model:
    # Data containers, so new inputs can be swapped in later with pm.set_data
    X_data = pm.Data("X", X_train_s)
    y_data = pm.Data("y", y_train)
    
    # Priors
    α = pm.Normal("α", mu=0, sigma=100)
    β = pm.Normal("β", mu=0, sigma=50, shape=X_train_s.shape[1])
    σ = pm.HalfNormal("σ", sigma=50)
    
    # Linear predictor
    μ = α + pm.math.dot(X_data, β)
    
    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_data)
    
    # Sample the posterior: 1,000 tuning steps, then 2,000 kept draws per chain
    trace = pm.sample(
        draws=2000, tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
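As a rough cross-check of the MCMC fit: if σ were known, the Normal prior on (α, β) would be conjugate to the Normal likelihood, and the posterior over the coefficients would be Normal in closed form, with precision Λ₀ + AᵀA/σ² and mean S·Aᵀy/σ² (A is the design matrix with an intercept column, Λ₀ the diagonal prior precision, S the posterior covariance). This is a swapped-in, simplified technique with σ fixed, not what PyMC3 does; a NumPy sketch on synthetic data with made-up coefficients:

```python
import numpy as np

def conjugate_posterior(X, y, sigma, prior_sd_alpha=100.0, prior_sd_beta=50.0):
    """Closed-form posterior of (α, β) for y ~ Normal(α + X @ β, σ) with σ held fixed."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])                      # intercept column + features
    prior_prec = np.diag([1 / prior_sd_alpha**2] + [1 / prior_sd_beta**2] * p)
    post_prec = prior_prec + A.T @ A / sigma**2
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (A.T @ y) / sigma**2               # prior mean is zero
    return post_mean, post_cov

# Synthetic standardised data (illustrative only)
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4))
true_coef = np.array([120.0, 5.0, -3.0, 20.0, 15.0])         # [α, β₁..β₄], made up
sigma = 10.0
y = true_coef[0] + X @ true_coef[1:] + rng.normal(0.0, sigma, size=500)

m, S = conjugate_posterior(X, y, sigma)
sd = np.sqrt(np.diag(S))
lower, upper = m - 1.881 * sd, m + 1.881 * sd                 # central 94% interval of a Normal

for name, mean_, lo_, hi_ in zip(["α", "β₁", "β₂", "β₃", "β₄"], m, lower, upper):
    print(f"{name}: {mean_:.2f}  94% CI [{lo_:.2f}, {hi_:.2f}]")
```

On the real data one could plug in the posterior mean of σ reported by az.summary and compare these means with those of α and β; close agreement is only expected when σ is well determined by the data.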

Posterior Analysis & Point Predictions

Plugging the posterior means of α and β into the linear predictor gives point forecasts for the held-out customers; the mean absolute error (MAE) on that set quantifies the average error in USD.

import arviz as az
from sklearn.metrics import mean_absolute_error

# Summarize posterior
az.summary(trace, round_to=2)

# Posterior predictive sampling
with purchase_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

# Posterior means
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values

# Point predictions on test set
y_pred = α_post + X_test_s.dot(β_post)

# Evaluate MAE
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f}")
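Because μ is linear in α and β, plugging posterior means into the linear predictor gives exactly the posterior mean of μ; computing μ for every draw and then averaging yields the same numbers. A NumPy sketch that checks this with random arrays standing in for the flattened trace.posterior draws (all values are hypothetical), and computes MAE by hand:

```python
import numpy as np

rng = np.random.default_rng(3)
n_draws, n_test, p = 4000, 50, 4
coef = np.array([5.0, -3.0, 20.0, 15.0])                      # made-up coefficients

# Hypothetical stand-ins for posterior draws of α and β
alpha_draws = rng.normal(120.0, 1.0, size=n_draws)
beta_draws = rng.normal(coef, 0.5, size=(n_draws, p))

X_test_s = rng.normal(size=(n_test, p))
y_test = 120.0 + X_test_s @ coef + rng.normal(0.0, 10.0, size=n_test)

# (1) Plug-in: posterior means into the linear predictor
y_pred_plugin = alpha_draws.mean() + X_test_s @ beta_draws.mean(axis=0)

# (2) Posterior mean of μ: evaluate μ for every draw, then average over draws
mu_draws = alpha_draws[:, None] + beta_draws @ X_test_s.T     # shape (n_draws, n_test)
y_pred_mu = mu_draws.mean(axis=0)

mae = np.mean(np.abs(y_test - y_pred_plugin))                 # same formula as mean_absolute_error
print(np.allclose(y_pred_plugin, y_pred_mu), f"MAE: {mae:.2f}")
```

The equality relies on linearity; with a nonlinear link (e.g. a log link for strictly positive spend) the plug-in and the averaged predictions would differ.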

Visualise Predictions & Credible Intervals

Sweeping one feature (number of purchases) while holding the others at their median, we plot the posterior predictive mean and the 94% highest-density interval (HDI) of the predicted purchase amount, illustrating both central tendency and uncertainty.

# Vary Num_Purchases over its training range; fix the other features at their median
import numpy as np
import matplotlib.pyplot as plt
import arviz as az

num_grid = np.linspace(X_train_s[:,3].min(), X_train_s[:,3].max(), 100)
grid = np.tile(np.median(X_train_s, axis=0), (100,1))
grid[:,3] = num_grid

with purchase_model:
    # Swap in the grid; "y" only needs a placeholder of matching length
    pm.set_data({"X": grid, "y": np.zeros(len(grid))})
    ppc_grid = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

preds     = ppc_grid["Y_obs"]                          # shape (n_samples, 100)
mean_pred = preds.mean(axis=0)
# A leading axis makes ArviZ read the array as (chain, draw, grid point)
hdi       = az.hdi(preds[np.newaxis, ...], hdi_prob=0.94)   # shape (100, 2)

# Convert the standardized grid back to the original Num_Purchases scale
num_orig = scaler.inverse_transform(grid)[:,3]

plt.figure(figsize=(8,5))
plt.plot(num_orig, mean_pred, label="Posterior predictive mean")
plt.fill_between(num_orig, hdi[:,0], hdi[:,1], alpha=0.3,
                 label="94% HDI")
plt.scatter(X_test[:,3], y_test,
            color="k", alpha=0.5, label="Test data")
plt.xlabel("Number of Purchases")
plt.ylabel("Purchase Amount (USD)")
plt.title("Bayesian Regression: Purchase Amount vs. Num_Purchases")
plt.legend()
plt.show()
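The band comes from az.hdi, which returns the highest-density interval: the narrowest interval containing the requested share of draws. For a unimodal sample this can be found by sliding a fixed-count window over the sorted draws and keeping the narrowest one. A NumPy sketch of that sorted-window idea (the helper names `hdi_1d` and `hdi_columns` are hypothetical), compared with the equal-tailed interval from percentiles on synthetic samples:

```python
import numpy as np

def hdi_1d(samples, prob=0.94):
    """Narrowest interval containing `prob` of the draws (assumes a unimodal distribution)."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    k = int(np.floor(prob * n))            # index span of each candidate window
    widths = x[k:] - x[: n - k]            # width of every window [x[i], x[i + k]]
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k])

def hdi_columns(draws, prob=0.94):
    """Apply hdi_1d to each column of an (n_draws, n_points) array, giving (n_points, 2)."""
    return np.array([hdi_1d(draws[:, j], prob) for j in range(draws.shape[1])])

rng = np.random.default_rng(4)
sym = rng.normal(100.0, 15.0, size=20_000)       # symmetric: HDI close to equal-tailed
skew = rng.exponential(50.0, size=20_000)        # right-skewed: HDI shifts toward 0

for name, s in [("normal", sym), ("exponential", skew)]:
    h = hdi_1d(s)
    et = np.percentile(s, [3, 97])
    print(f"{name}: HDI ({h[0]:.1f}, {h[1]:.1f}), equal-tailed ({et[0]:.1f}, {et[1]:.1f})")
```

For symmetric predictive distributions, as with the Normal likelihood here, the two intervals nearly coincide; the distinction matters mainly for skewed quantities such as spend under a log-scale model.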

Summary

This Bayesian Regression framework for consumer purchase-amount forecasting provides:

1. Point estimates of expected spend from early customer indicators.

2. Credible intervals that quantify uncertainty from both the model parameters and behavioural variability (the residual noise σ).

3. Actionable insights: marketing and sales teams can use both expected purchase amounts and uncertainty bounds to tailor promotions, adjust loyalty rewards, and optimise inventory under uncertainty.
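To turn predictive draws into a targeting rule, one option is to estimate, per customer, the posterior predictive probability that spend exceeds a threshold, and act only when that probability is high. A NumPy sketch; the helper `prob_exceeds`, the 150 USD threshold, the 0.8 cut-off and the simulated draws are all hypothetical:

```python
import numpy as np

def prob_exceeds(pred_draws, threshold):
    """Share of posterior predictive draws above `threshold`, per customer (column)."""
    return (pred_draws > threshold).mean(axis=0)

rng = np.random.default_rng(5)
# Hypothetical predictive draws for 3 customers, shape (n_draws, n_customers)
pred_draws = np.column_stack([
    rng.normal(200.0, 20.0, 8000),   # high expected spend, low uncertainty
    rng.normal(160.0, 60.0, 8000),   # mean above threshold, but very uncertain
    rng.normal(100.0, 15.0, 8000),   # low expected spend
])

p_high = prob_exceeds(pred_draws, threshold=150.0)
target = p_high >= 0.8               # hypothetical risk cut-off for a premium offer
print(p_high.round(2), target)
```

The second customer has a mean above the threshold yet is not targeted: a point-estimate model would treat them like the first customer, whereas the predictive distribution shows the bet is close to a coin flip.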
