Customer Engagement Cost Prediction using Bayesian Regression in ML


Marketing and customer-success teams need to forecast the variable cost of engaging an individual customer before launching a new campaign, using early engagement metrics such as page views, clicks, add-to-carts, time on site, and support tickets raised. In practice, per-user engagement cost grows nonlinearly (e.g., each additional support ticket consumes more agent time at a higher marginal cost, while bulk email sends enjoy volume discounts) and carries uncertainty from varying channel rates and user behavior. A single point estimate ignores this uncertainty, risking budget overruns. By applying Bayesian Regression, we derive both:

1. A point forecast of per‑user engagement cost.

2. A credible interval that quantifies our uncertainty—enabling risk‑aware budget planning and campaign sizing.
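To make the difference between these two outputs concrete, here is a minimal sketch that summarizes a set of predictive cost draws for one user. The draws are simulated from an assumed Normal(3.0, 1.0) USD distribution purely for illustration; they are not output from the model in this article, and the interval shown is a simple central (equal-tailed) one.

```python
import random
import statistics

# Stand-in for posterior predictive draws of one user's cost in USD.
# Assumed Normal(3.0, 1.0) for illustration only; not output of the model here.
rng = random.Random(42)
draws = sorted(rng.gauss(3.0, 1.0) for _ in range(4000))

# Output 1: point forecast (mean of the predictive draws)
point_forecast = statistics.fmean(draws)

# Output 2: central 94% credible interval (3rd to 97th percentile of the draws)
n = len(draws)
lo, hi = draws[int(0.03 * n)], draws[int(0.97 * n) - 1]

print(f"Point forecast: ${point_forecast:.2f}")
print(f"94% interval:   ${lo:.2f} to ${hi:.2f}")
```

Budgeting at `hi` instead of `point_forecast` sizes the per-user allowance so that, if the predictive distribution is right, only about 3% of such users would exceed it.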

Libraries Required

import pandas as pd                              # data loading & manipulation  
import numpy as np                               # numerical operations  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

import pymc3 as pm                               # Bayesian modeling  
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error  

Dataset

E‑commerce Customer Engagement

Step-by-Step Code Implementation

Data Loading & Synthetic Cost Computation

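We synthesize the target by assigning an illustrative per-unit cost to each engagement metric (page view, click, add-to-cart, etc.) and summing them to form engagement_cost. This synthetic cost is linear in the metrics, a simplification of the nonlinear costs described above, which is why a linear Bayesian regression suits it. We then multiply each user's cost by a small random factor (an assumed spread of roughly 10%) to stand in for channel-rate and behavioral variability; without any noise the target would be an exact linear function of the features, leaving the residual scale σ nothing to estimate and the sampler struggling as σ collapses toward zero.

import numpy as np
import pandas as pd

# Load the engagement data
df = pd.read_csv("data/e-commerce-customer-engagement.csv")

# Assume the dataset has these columns; keep them and drop rows with missing metrics
features = ['page_views', 'clicks', 'add_to_carts', 'purchases',
            'session_time_min', 'support_tickets']
df = df[features].dropna()

# Synthesize per-user engagement cost (USD):
#   - $0.02 per page view
#   - $0.10 per click
#   - $0.50 per add to cart
#   - $2.00 per purchase handled
#   - $0.01 per minute of session time
#   - $5.00 per support ticket
df['engagement_cost'] = (
      df['page_views'] * 0.02
    + df['clicks'] * 0.10
    + df['add_to_carts'] * 0.50
    + df['purchases'] * 2.00
    + df['session_time_min'] * 0.01
    + df['support_tickets'] * 5.00
)

# Assumed noise: multiplicative lognormal factor (about 10% spread) so the cost
# is not an exact linear function of the features; keeps costs non-negative
rng = np.random.default_rng(42)
df['engagement_cost'] *= rng.lognormal(mean=0.0, sigma=0.1, size=len(df))

# Features matrix X and target y
X = df[features].values
y = df['engagement_cost'].values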
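As a quick worked check of the per-unit cost formula, the sketch below applies the same rates to one hypothetical user. The metric values are invented for illustration, and the helper `engagement_cost` is a hypothetical stand-alone version of the pandas expression, computing only the deterministic sum of rates.

```python
# Per-unit costs (USD), the same rates as in the synthesis step
UNIT_COSTS = {
    "page_views": 0.02,
    "clicks": 0.10,
    "add_to_carts": 0.50,
    "purchases": 2.00,
    "session_time_min": 0.01,
    "support_tickets": 5.00,
}

def engagement_cost(metrics):
    """Sum of (count x per-unit cost) over all metrics: the linear cost model."""
    return sum(UNIT_COSTS[name] * metrics[name] for name in UNIT_COSTS)

# Hypothetical user: 40 page views, 12 clicks, 3 add-to-carts,
# 1 purchase, 25 minutes on site, 1 support ticket
user = {"page_views": 40, "clicks": 12, "add_to_carts": 3,
        "purchases": 1, "session_time_min": 25, "support_tickets": 1}
print(f"${engagement_cost(user):.2f}")  # prints: $10.75
```

The single support ticket contributes $5.00 of the $10.75, which shows how strongly the assumed rates weight support load relative to browsing activity.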

Preprocessing & Train/Test Split

  • We use an 80/20 train/test split.
  • We z‑score each feature so that the Normal(0, 1) priors on β apply uniformly and the sampler converges reliably.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Split randomly: 80% train, 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize features for stable MCMC sampling
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s  = scaler.transform(X_test)
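Because the features are z-scored, each β is the change in cost per one standard deviation of its metric, not per unit. For a linear cost, β_j is roughly rate_j × s_j, where s_j is the training standard deviation that StandardScaler stores in `scale_`, so the per-unit rate is β_j / s_j. The sketch below checks this on a made-up single-feature example in pure Python, using an ordinary least-squares slope as a stand-in for the posterior mean (an assumption: with weak priors and enough data the two should be close, but the article does not show this).

```python
import statistics

# Made-up clicks per user and their exact linear cost at $0.10 per click
clicks = [2, 5, 9, 14, 20, 31]
cost = [0.10 * c for c in clicks]

# z-score the feature as StandardScaler does (population std, ddof=0)
m, s = statistics.fmean(clicks), statistics.pstdev(clicks)
z = [(c - m) / s for c in clicks]

# OLS slope on the standardized feature: sum(z * (y - y_bar)) / sum(z * z)
y_bar = statistics.fmean(cost)
beta_std = sum(zi * (yi - y_bar) for zi, yi in zip(z, cost)) / sum(zi * zi for zi in z)

# Map back to a per-unit rate by dividing by the feature's scale
rate = beta_std / s
print(f"beta (per 1 sd) = {beta_std:.3f}, rate (per click) = {rate:.3f}")
```

With the fitted model, the analogous per-unit rates would be `β_post / scaler.scale_`, using the posterior-mean coefficients computed in the posterior-analysis step.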

Define & Fit Bayesian Regression Model

  • α ∼ Normal(mean = empirical mean cost, σ = 2×empirical std)
  • β ∼ Normal(0, 1) for each standardized predictor.
  • σ ∼ HalfNormal(empirical std) for residual variation.

Likelihood: engagement_cost ∼ Normal(α + β·X_std, σ).
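Written out in full, with $\bar{y}$ and $s_y$ the mean and standard deviation of the training costs, $\mathbf{x}_i$ the standardized feature vector of user $i$, and the second argument of each Normal being a standard deviation (as in PyMC), the model is:

$$
\begin{aligned}
\alpha &\sim \mathcal{N}(\bar{y},\ 2 s_y) \\
\beta_j &\sim \mathcal{N}(0,\ 1), \qquad j = 1, \dots, 6 \\
\sigma &\sim \operatorname{HalfNormal}(s_y) \\
y_i &\sim \mathcal{N}\big(\alpha + \boldsymbol{\beta}^\top \mathbf{x}_i,\ \sigma\big)
\end{aligned}
$$

The posterior over $(\alpha, \boldsymbol{\beta}, \sigma)$ is proportional to the product of these priors and the likelihood terms for all training users; MCMC draws samples from it.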

Inference: We draw 2,000 posterior samples per chain after 1,000 tuning steps, using target_accept=0.9 (which makes NUTS take smaller steps) for more reliable convergence.

import pymc3 as pm

with pm.Model() as engagement_cost_model:
    # Priors
    α = pm.Normal("α", mu=y_train.mean(), sigma=y_train.std()*2)  
    β = pm.Normal("β", mu=0, sigma=1, shape=X_train_s.shape[1])  
    σ = pm.HalfNormal("σ", sigma=y_train.std())

    # Linear predictor: expected engagement cost
    μ = α + pm.math.dot(X_train_s, β)

    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)

    # Sample posterior
    trace = pm.sample(
        draws=2000,
        tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )

Posterior Analysis & Point Predictions

Posterior Predictive: Sampling Y_obs yields full predictive distributions; from these, we extract the posterior mean forecast and 94% Highest Posterior Density intervals.
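For intuition about what a 94% HPD (HDI) interval is: among all intervals that contain 94% of the draws, it is the narrowest one. A minimal pure-Python sketch of the usual sorted-window search follows; `hdi_1d` is a hypothetical helper for illustration, and ArviZ's `az.hdi` is the tool to use in practice.

```python
import math
import random

def hdi_1d(samples, prob=0.94):
    """Narrowest interval containing `prob` of the samples (sorted-window search)."""
    xs = sorted(samples)
    n = len(xs)
    k = math.floor(prob * n)  # each candidate window spans k gaps, i.e. k + 1 samples
    if not 1 <= k < n:
        raise ValueError("prob must leave at least one candidate window")
    # Compare every window [xs[i], xs[i + k]] and keep the narrowest
    widths = [xs[i + k] - xs[i] for i in range(n - k)]
    i_best = min(range(n - k), key=widths.__getitem__)
    return xs[i_best], xs[i_best + k]

# Usage on skewed (lognormal) stand-in draws: the HDI shifts toward the mode
rng = random.Random(0)
draws = [rng.lognormvariate(1.0, 0.5) for _ in range(5000)]
lo, hi = hdi_1d(draws, prob=0.94)
print(f"94% HDI: {lo:.2f} to {hi:.2f}")
```

For right-skewed cost distributions the HDI is narrower than the equal-tailed interval with the same coverage, because it trades the long upper tail for the dense region near the mode.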

Evaluation: Mean Absolute Error on the held‑out test set quantifies the average error per user in USD.
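MAE is the average absolute gap between actual and predicted cost, in USD. Since the point of the Bayesian model is the interval, it can also help (an addition beyond the original MAE-only evaluation) to check empirical coverage: the share of held-out users whose actual cost falls inside their 94% interval. Both are sketched below in pure Python on made-up numbers; `mae_usd` and `interval_coverage` are hypothetical helpers.

```python
def mae_usd(y_true, y_pred):
    """Average |actual - predicted|, in the target's units (USD here)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def interval_coverage(y_true, lower, upper):
    """Fraction of actual values that fall inside their [lower, upper] interval."""
    hits = sum(lo <= t <= hi for t, lo, hi in zip(y_true, lower, upper))
    return hits / len(y_true)

# Made-up actual costs, point forecasts and 94% intervals for five users (USD)
y_true = [4.0, 10.5, 2.2, 7.8, 15.0]
y_pred = [4.5, 9.5, 2.0, 8.0, 12.0]
lower = [3.0, 8.0, 1.0, 6.5, 10.0]
upper = [6.0, 11.0, 3.0, 9.5, 14.0]

print(f"MAE: ${mae_usd(y_true, y_pred):.2f} per user")
print(f"Coverage: {interval_coverage(y_true, lower, upper):.0%}")
```

On a real test set, coverage well below 94% would suggest the intervals are too narrow (for example, σ underestimated); five users are far too few to judge that.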

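import numpy as np
import arviz as az
from sklearn.metrics import mean_absolute_error

# Summarize posterior distributions (wrapped in print so it also shows in scripts)
print(az.summary(trace, round_to=2))

# Flatten posterior draws across chains: shapes (S,), (S, n_features), (S,)
post = trace.posterior
α_draws = post["α"].values.reshape(-1)
β_draws = post["β"].values.reshape(-1, X_test_s.shape[1])
σ_draws = post["σ"].values.reshape(-1)

# Point predictions on test set from the posterior means
# (equal to the posterior mean of α + X·β, since the predictor is linear)
y_pred = α_draws.mean() + X_test_s.dot(β_draws.mean(axis=0))

# Posterior predictive on the test set. The model graph is built on X_train_s,
# so pm.sample_posterior_predictive would predict training users, not test users;
# instead, combine 1,000 random posterior draws with X_test_s directly.
rng = np.random.default_rng(0)
idx = rng.choice(α_draws.size, size=min(1000, α_draws.size), replace=False)
μ_test = α_draws[idx, None] + β_draws[idx] @ X_test_s.T    # expected cost, (1000, n_test)
y_test_draws = rng.normal(μ_test, σ_draws[idx, None])      # predictive draws, (1000, n_test)

# 94% HDI per test user (leading axis added so ArviZ reads the draws as 1 chain)
test_hdi = az.hdi(y_test_draws[np.newaxis], hdi_prob=0.94)  # (n_test, 2)

# Evaluate mean absolute error and empirical interval coverage
mae = mean_absolute_error(y_test, y_pred)
coverage = np.mean((y_test >= test_hdi[:, 0]) & (y_test <= test_hdi[:, 1]))
print(f"Test MAE: ${mae:.2f} per user")
print(f"Share of test users inside their 94% HDI: {coverage:.1%}")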

Visualise Predictions & Credible Intervals

By sweeping one key metric (clicks) and holding others fixed, we plot the expected cost curve with its credible band—illustrating both the marginal cost sensitivity and the uncertainty around our estimate.
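The slope of this curve is the marginal cost per click. On the standardized scale the clicks coefficient is measured per standard deviation of clicks, so the per-click cost is β_clicks / s_clicks, where s_clicks would be `scaler.scale_[1]`; because the model is linear, this slope is the same everywhere along the curve. Applying the conversion draw by draw gives a credible interval for the marginal cost itself. The sketch uses simulated coefficient draws and an assumed scale of 8 clicks purely for illustration; in the workflow these would come from `trace.posterior["β"]` and the fitted scaler.

```python
import random
import statistics

rng = random.Random(7)
# Stand-ins (assumptions): posterior draws of the standardized clicks coefficient,
# and the training standard deviation of clicks (what StandardScaler.scale_[1] holds)
beta_clicks_draws = [rng.gauss(0.8, 0.05) for _ in range(4000)]
clicks_scale = 8.0

# Convert each draw from "USD per 1 sd of clicks" to "USD per click"
per_click = sorted(b / clicks_scale for b in beta_clicks_draws)
n = len(per_click)
mean_per_click = statistics.fmean(per_click)
lo, hi = per_click[int(0.03 * n)], per_click[int(0.97 * n) - 1]  # central 94% interval

print(f"Marginal cost per click: ${mean_per_click:.3f} "
      f"(94% interval ${lo:.3f} to ${hi:.3f})")
```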

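import numpy as np
import arviz as az
import matplotlib.pyplot as plt

# Sweep 'clicks' (column 1) while holding the other metrics at their training median
n_grid = 100
grid = np.repeat(np.median(X_train_s, axis=0)[None, :], n_grid, axis=0)
grid[:, 1] = np.linspace(X_train_s[:, 1].min(), X_train_s[:, 1].max(), n_grid)

# The model graph is built on X_train_s, so pm.sample_posterior_predictive would
# not evaluate it at `grid`; instead, compute predictive draws from posterior samples
post = trace.posterior
α_draws = post["α"].values.reshape(-1)
β_draws = post["β"].values.reshape(-1, grid.shape[1])
σ_draws = post["σ"].values.reshape(-1)

rng = np.random.default_rng(1)
idx = rng.choice(α_draws.size, size=min(1000, α_draws.size), replace=False)
μ_grid = α_draws[idx, None] + β_draws[idx] @ grid.T     # expected cost, (1000, n_grid)
preds = rng.normal(μ_grid, σ_draws[idx, None])          # add observation noise

mean_pred = μ_grid.mean(axis=0)
hpd = az.hdi(preds[np.newaxis], hdi_prob=0.94)          # (n_grid, 2); leading axis = 1 chain

# Back-transform 'clicks' to original scale
clicks_orig = scaler.inverse_transform(grid)[:, 1]

plt.figure(figsize=(8, 5))
plt.plot(clicks_orig, mean_pred, label="Posterior mean")
plt.fill_between(clicks_orig, hpd[:, 0], hpd[:, 1], alpha=0.3,
                 label="94% credible interval")
plt.scatter(X_test[:, 1], y_test, color="k", alpha=0.5, label="Test data")
plt.xlabel("Number of Clicks")
plt.ylabel("Engagement Cost (USD)")
plt.title("Bayesian Regression: Cost vs. Clicks")
plt.legend()
plt.tight_layout()
plt.show()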

Summary

This Bayesian Regression workflow for Customer Engagement Cost Prediction provides:

  • Point estimates of per‑user engagement cost from early engagement metrics.
  • Credible intervals that quantify uncertainty from behavioural variability and channel-rate fluctuations.
  • Actionable insights: marketing teams can allocate budgets and set campaign targets with explicit cost‑risk bounds—optimising both spend efficiency and customer experience.


ProjectGurukul Team

The ProjectGurukul Team delivers project-based tutorials on programming, machine learning, and web development. We simplify learning by providing hands-on projects to help you master real-world skills. We also provide free major and minor projects for engineering students.
