Customer Engagement Cost Prediction using Bayesian Regression in ML
Marketing and customer-success teams need to forecast the variable cost of engaging an individual customer before launching a new campaign, using early engagement metrics such as page views, clicks, add-to-carts, time on site, and support tickets raised. Per-user engagement cost grows nonlinearly (for example, each additional support ticket consumes agent time at a higher marginal cost, while bulk email sends enjoy volume discounts) and carries uncertainty from varying channel rates and user behavior. A single point estimate ignores this uncertainty and risks budget overruns. Bayesian regression gives us both:
1. A point forecast of per‑user engagement cost.
2. A credible interval that quantifies our uncertainty—enabling risk‑aware budget planning and campaign sizing.
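To make these two outputs concrete, here is a minimal sketch of how a point forecast and an interval fall out of a set of posterior predictive draws. The draws are simulated from a gamma distribution purely for illustration (they are not output from the model in this article), and the equal-tailed percentile interval is a simple stand-in for the highest-density interval used later.

```python
import numpy as np

# Hypothetical stand-in for posterior predictive draws of one user's cost (USD).
rng = np.random.default_rng(0)
cost_draws = rng.gamma(shape=9.0, scale=1.0, size=10_000)

# 1. Point forecast: the mean of the draws.
point_forecast = cost_draws.mean()

# 2. 94% equal-tailed interval: the 3rd and 97th percentiles of the draws.
lower, upper = np.percentile(cost_draws, [3, 97])

print(f"Point forecast: ${point_forecast:.2f}")
print(f"94% interval:   ${lower:.2f} to ${upper:.2f}")
```

A budget planner could size a campaign on the point forecast and hold a reserve based on the upper bound.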
Libraries Required
import pandas as pd                    # data loading & manipulation
import numpy as np                     # numerical operations
import matplotlib.pyplot as plt        # plotting
import seaborn as sns                  # visualization
import pymc3 as pm                     # Bayesian modeling
import arviz as az                     # posterior analysis
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error
Dataset
E‑commerce Customer Engagement
Step-by-Step Code Implementation
Data Loading & Synthetic Cost Computation
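We synthesize the target by assigning a per-unit cost to each engagement metric (page view, click, add-to-cart, and so on) and summing them to form engagement_cost. Because this synthetic target is an exact linear combination of the features, the regression can recover it almost perfectly and the fitted residual scale σ will be close to zero; real cost data would be noisier and, as described in the introduction, nonlinear.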
import pandas as pd
# Load the engagement data
df = pd.read_csv("data/e-commerce-customer-engagement.csv")
# Assume the dataset has columns:
# 'page_views', 'clicks', 'add_to_carts', 'purchases', 'session_time_min', 'support_tickets'
# Drop any rows with missing critical metrics
df = df[['page_views','clicks','add_to_carts','purchases','session_time_min','support_tickets']].dropna()
# Synthesize per-user engagement cost (USD):
# - $0.02 per page view
# - $0.10 per click
# - $0.50 per add to cart
# - $2.00 per purchase handled
# - $0.01 per minute of session time
# - $5.00 per support ticket
df['engagement_cost'] = (
    df['page_views'] * 0.02
    + df['clicks'] * 0.10
    + df['add_to_carts'] * 0.50
    + df['purchases'] * 2.00
    + df['session_time_min'] * 0.01
    + df['support_tickets'] * 5.00
)
# Features matrix X and target y
X = df[['page_views','clicks','add_to_carts','purchases','session_time_min','support_tickets']].values
y = df['engagement_cost'].values
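As a quick arithmetic check of the cost formula, the sketch below applies the same per-unit rates to a single user. The user's metric values are invented for illustration, and the `RATES` dictionary and `engagement_cost` helper are hypothetical names, not part of the dataset or the code above.

```python
# Per-unit rates (USD) taken from the cost formula above.
RATES = {
    "page_views": 0.02,
    "clicks": 0.10,
    "add_to_carts": 0.50,
    "purchases": 2.00,
    "session_time_min": 0.01,
    "support_tickets": 5.00,
}


def engagement_cost(metrics):
    """Sum each metric multiplied by its per-unit rate."""
    return sum(RATES[name] * value for name, value in metrics.items())


# Hypothetical user, invented for illustration.
user = {
    "page_views": 40,        # 40 * 0.02 = 0.80
    "clicks": 10,            # 10 * 0.10 = 1.00
    "add_to_carts": 2,       #  2 * 0.50 = 1.00
    "purchases": 1,          #  1 * 2.00 = 2.00
    "session_time_min": 15,  # 15 * 0.01 = 0.15
    "support_tickets": 1,    #  1 * 5.00 = 5.00
}

cost = engagement_cost(user)
print(f"Engagement cost: ${cost:.2f}")  # Engagement cost: $9.95
```

Note how a single support ticket accounts for about half of this user's cost, which is why that feature tends to dominate the fitted coefficients.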
Preprocessing & Train/Test Split
- We use an 80/20 train/test split.
- We z‑score each feature so that the Normal(0, 1) priors on β apply uniformly and the sampler converges reliably.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Split randomly: 80% train, 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Standardize features for stable MCMC sampling
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
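The standardization step can be written out by hand to show what it does. The sketch below is a NumPy-only illustration on a made-up feature matrix (the column scales are assumptions); it mirrors the fit-on-train, transform-both pattern above and uses the population standard deviation, as `StandardScaler` does.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical feature matrix: 3 columns on very different scales.
X = rng.normal(loc=[50.0, 8.0, 1.0], scale=[20.0, 3.0, 0.5], size=(200, 3))
X_train, X_test = X[:160], X[160:]

# Fit: learn the mean and standard deviation from the training rows only.
mu = X_train.mean(axis=0)
sd = X_train.std(axis=0)  # population std (ddof=0)

# Transform: apply the same training statistics to both splits.
X_train_s = (X_train - mu) / sd
X_test_s = (X_test - mu) / sd

print("Train means:", X_train_s.mean(axis=0).round(6))
print("Train stds: ", X_train_s.std(axis=0).round(6))
print("Test means: ", X_test_s.mean(axis=0).round(2))
```

The training columns end up with mean 0 and standard deviation 1 by construction; the test columns are only approximately centered, which is expected because their statistics were never used.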
Define & Fit Bayesian Regression Model
- α ∼ Normal(mean = empirical mean cost, σ = 2×empirical std)
- β ∼ Normal(0, 1) for each standardized predictor.
- σ ∼ HalfNormal(empirical std) for residual variation.
Likelihood: engagement_cost ∼ Normal(α + β·X_std, σ).
Inference: We draw 2,000 posterior samples per chain after 1,000 tuning steps, using target_accept=0.9 for stable convergence.
import pymc3 as pm
with pm.Model() as engagement_cost_model:
    # Priors
    α = pm.Normal("α", mu=y_train.mean(), sigma=y_train.std() * 2)
    β = pm.Normal("β", mu=0, sigma=1, shape=X_train_s.shape[1])
    σ = pm.HalfNormal("σ", sigma=y_train.std())
    # Linear predictor: expected engagement cost
    μ = α + pm.math.dot(X_train_s, β)
    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)
    # Sample posterior
    trace = pm.sample(
        draws=2000,
        tune=1000,
        target_accept=0.9,
        return_inferencedata=True,
    )
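For intuition about what the sampler is estimating, a linear regression with Normal priors has a closed-form posterior when the noise scale σ is treated as known. The sketch below uses that simplification on simulated data; it is not what PyMC3 does above, where σ is inferred by MCMC along with the coefficients. The true coefficients, sample size, and prior scales are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 500, 0.5                   # sigma is assumed known in this sketch
true_w = np.array([3.0, 1.5, -0.7])   # [intercept, beta_1, beta_2]

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
y = X @ true_w + rng.normal(0.0, sigma, size=n)

# Independent Normal priors: wide on the intercept, Normal(0, 1) on the slopes.
prior_mean = np.zeros(3)
prior_prec = np.diag(1.0 / np.array([10.0, 1.0, 1.0]) ** 2)

# Conjugate update: precisions add, and the mean is precision-weighted.
post_prec = prior_prec + X.T @ X / sigma**2
post_cov = np.linalg.inv(post_prec)
post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / sigma**2)
post_sd = np.sqrt(np.diag(post_cov))

for name, m, s in zip(["alpha", "beta_1", "beta_2"], post_mean, post_sd):
    print(f"{name}: {m:.3f} +/- {s:.3f}")
```

With 500 observations the data dominate the priors, so the posterior means land close to the simulated coefficients and the posterior standard deviations shrink roughly as sigma divided by the square root of n.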
Posterior Analysis & Point Predictions
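Posterior Predictive: Sampling Y_obs yields full predictive distributions for the training rows, from which summaries such as 94% Highest Posterior Density intervals can be read. The test-set point forecast plugs the posterior means of α and β into the linear predictor.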
Evaluation: Mean Absolute Error on the held‑out test set quantifies the average error per user in USD.
import arviz as az
from sklearn.metrics import mean_absolute_error
# Summarize posterior distributions (print so the table shows in a script)
print(az.summary(trace, round_to=2))
# Posterior predictive sampling
with engagement_cost_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])
# Extract posterior means
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values
# Point predictions on test set
y_pred = α_post + X_test_s.dot(β_post)
# Evaluate mean absolute error
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f} per user")
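Plugging posterior-mean coefficients into the linear predictor, as done above, gives the same point forecast as averaging the prediction over every posterior draw, because the model is linear in α and β. The sketch below checks this with simulated draws, which are hypothetical stand-ins for the contents of `trace.posterior`, and computes the MAE by hand.

```python
import numpy as np

rng = np.random.default_rng(7)
n_draws, n_test, n_feat = 4000, 50, 6

# Hypothetical posterior draws and standardized test features.
alpha_draws = rng.normal(10.0, 0.1, size=n_draws)
beta_draws = rng.normal(1.0, 0.05, size=(n_draws, n_feat))
X_test_s = rng.standard_normal((n_test, n_feat))
y_test = 10.0 + X_test_s.sum(axis=1) + rng.normal(0.0, 0.5, size=n_test)

# Option A: plug in the posterior means.
y_pred_plugin = alpha_draws.mean() + X_test_s @ beta_draws.mean(axis=0)

# Option B: predict once per draw, then average over draws.
mu_draws = alpha_draws[:, None] + beta_draws @ X_test_s.T  # (n_draws, n_test)
y_pred_avg = mu_draws.mean(axis=0)

# MAE is the mean of the absolute residuals.
mae = np.mean(np.abs(y_test - y_pred_plugin))

print("Max difference between A and B:", np.abs(y_pred_plugin - y_pred_avg).max())
print(f"MAE: ${mae:.2f} per user")
```

Option B costs more memory but keeps the per-draw predictions, which is what an interval around each test prediction would need.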
Visualise Predictions & Credible Intervals
By sweeping one key metric (clicks) and holding the others at their training medians, we plot the expected cost curve with its 94% interval band. The plot shows both the marginal cost sensitivity and the uncertainty around our estimate.
import numpy as np
import matplotlib.pyplot as plt
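# Sweep 'clicks' (column 1) while holding other metrics at their median
grid = np.median(X_train_s, axis=0)[None, :].repeat(100, axis=0)
clicks_vals = np.linspace(X_train_s[:, 1].min(), X_train_s[:, 1].max(), 100)
grid[:, 1] = clicks_vals
# The model was built on X_train_s, so re-sampling Y_obs would return
# predictions for the training rows, not for the grid. Instead, push the
# posterior draws through the linear predictor on the grid directly.
post = trace.posterior
α_draws = post["α"].values.reshape(-1)
β_draws = post["β"].values.reshape(-1, grid.shape[1])
σ_draws = post["σ"].values.reshape(-1)
μ_grid = α_draws[:, None] + β_draws @ grid.T  # shape: (n_draws, 100)
# Add observation noise to obtain posterior predictive draws
rng = np.random.default_rng(42)
preds = rng.normal(μ_grid, σ_draws[:, None])
mean_pred = preds.mean(axis=0)
# Leading axis marks a single "chain" so ArviZ reads (chain, draw, point)
hpd = az.hdi(preds[None, :, :], hdi_prob=0.94)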
# Back‑transform 'clicks' to original scale
clicks_orig = scaler.inverse_transform(grid)[:,1]
plt.figure(figsize=(8,5))
plt.plot(clicks_orig, mean_pred, label="Posterior mean")
plt.fill_between(clicks_orig, hpd[:,0], hpd[:,1], alpha=0.3,
                 label="94% credible interval")
plt.scatter(
    scaler.inverse_transform(X_test_s)[:,1],
    y_test, color="k", alpha=0.5, label="Test data"
)
plt.xlabel("Number of Clicks")
plt.ylabel("Engagement Cost (USD)")
plt.title("Bayesian Regression: Cost vs. Clicks")
plt.legend()
plt.tight_layout()
plt.show()
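The 94% band comes from a highest-density interval: the narrowest interval that contains 94% of the draws. The sketch below is a simple NumPy version of that idea for unimodal samples, compared with an equal-tailed percentile interval on skewed simulated data. The function name `hdi_from_samples` is hypothetical, and this is an illustration of the concept rather than ArviZ's implementation.

```python
import numpy as np


def hdi_from_samples(samples, prob=0.94):
    """Narrowest interval containing `prob` of the samples (unimodal case)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(prob * n))            # number of points the interval must cover
    widths = x[k - 1:] - x[: n - k + 1]   # width of every window of k sorted points
    i = int(np.argmin(widths))            # start index of the narrowest window
    return x[i], x[i + k - 1]


# Skewed example data: exponential draws, simulated for illustration.
rng = np.random.default_rng(3)
skewed = rng.exponential(scale=1.0, size=20_000)

hdi_lo, hdi_hi = hdi_from_samples(skewed)
eti_lo, eti_hi = np.percentile(skewed, [3, 97])

print(f"HDI:          {hdi_lo:.2f} to {hdi_hi:.2f}")
print(f"Equal-tailed: {eti_lo:.2f} to {eti_hi:.2f}")
```

For a symmetric distribution the two intervals nearly coincide; for skewed cost distributions the HDI is shorter because it shifts toward the region where the draws are densest.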
Summary
This Bayesian Regression workflow for Customer Engagement Cost Prediction provides:
- Point estimates of per‑user engagement cost from early engagement metrics.
- Credible intervals that quantify uncertainty from behavioural variability and channel–rate fluctuations.
- Actionable insights: marketing teams can allocate budgets and set campaign targets with explicit cost‑risk bounds—optimising both spend efficiency and customer experience.