Customer Support Cost Prediction using Bayesian Regression in ML


Customer‑support leaders need to forecast the resolution cost per support ticket—before staffing shifts and budget approvals—using early‐ticket features such as priority level, issue category, time to first response, customer tenure, and number of past tickets. Handling cost per ticket grows nonlinearly with complexity (critical tickets often require senior engineers) and carries uncertainty from ticket reopens and variable labour rates. A simple point estimate hides this risk. By applying Bayesian Regression, we obtain both:

1. A point forecast of expected resolution cost per ticket.

2. A credible interval quantifying uncertainty—enabling data‑driven staffing and budget planning.

Libraries Required

import pandas as pd                              # data loading & manipulation  
import numpy as np                               # numerical operations  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

import pymc3 as pm                               # Bayesian modeling  
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error

Dataset

Customer Support Ticket Dataset

Step-by-Step Code Implementation

Data Loading & Preprocessing

One‑hot encode priority and category.
Standardise numeric features (time_to_first_response, customer_tenure_days, past_tickets) using statistics from the training split only, so the coefficient priors act on a comparable scale and the sampler converges well.

import pandas as pd

# Load ticket data
df = pd.read_csv("data/customer_support_tickets.csv")

# Select relevant fields and drop missing
df = df[['priority','category','time_to_first_response',
         'customer_tenure_days','past_tickets','resolution_cost']].dropna()

# One‑hot encode categorical features
df = pd.get_dummies(df, columns=['priority','category'], drop_first=True)

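# Split into predictors and target
# Cast to float: dummy columns may be bool/uint8, and an integer or object
# array would break (or silently truncate) the standardised values below
X = df.drop(columns='resolution_cost').values.astype(float)
y = df['resolution_cost'].values.astype(float)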

# Train/test split (80% train / 20% test)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize numeric columns for stable sampling
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(X_train[:, :3])  # first three cols numeric
X_train_s = X_train.copy()
X_train_s[:, :3] = scaler.transform(X_train[:, :3])
X_test_s  = X_test.copy()
X_test_s[:, :3]  = scaler.transform(X_test[:, :3])
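
To illustrate why the scaler is fitted on the training rows only, here is a minimal sketch on synthetic data. The arrays and names (toy_train, toy_test) are made up for illustration and are not part of the ticket dataset. The arithmetic mirrors what StandardScaler does (subtract the column mean, divide by the population standard deviation), written in plain NumPy so it runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical numeric features: response time (hours), tenure (days), past tickets
centers = [4.0, 400.0, 3.0]
spreads = [2.0, 150.0, 1.5]
toy_train = rng.normal(loc=centers, scale=spreads, size=(200, 3))
toy_test = rng.normal(loc=centers, scale=spreads, size=(50, 3))

# "Fit" on the training rows only: column means and population standard deviations
train_mean = toy_train.mean(axis=0)
train_std = toy_train.std(axis=0)

# Apply the same training statistics to both splits
toy_train_s = (toy_train - train_mean) / train_std
toy_test_s = (toy_test - train_mean) / train_std

print(toy_train_s.mean(axis=0).round(3))  # zero up to floating-point error
print(toy_train_s.std(axis=0).round(3))   # one up to floating-point error
print(toy_test_s.mean(axis=0).round(3))   # near zero, but not exactly
```

The test split is only approximately centred, which is expected: its statistics never influenced the transform, so the held-out evaluation stays free of leakage.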

Define & Fit Bayesian Regression Model

Model:

α (intercept) ∼ Normal(0, 100).
β (slopes) ∼ Normal(0, 50) for each predictor.
σ (noise) ∼ HalfNormal(50).

Likelihood: resolution_cost ∼ Normal(α + β·X_std, σ).

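Inference: 2,000 posterior draws per chain (after 1,000 tuning steps) with target_accept=0.9, which makes NUTS take smaller steps and usually reduces divergences. Convergence should still be checked through r_hat and the effective sample size in the posterior summary.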
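
Before fitting, it can help to see what these priors imply about cost on their own. The following is a rough prior predictive sketch in plain NumPy, not part of the original pipeline. The number of predictors (5) and the example ticket x are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_features = 1000, 5   # 5 predictors is an arbitrary choice for this sketch

# Draw parameters from the priors stated above
alpha = rng.normal(0.0, 100.0, size=n_draws)
beta = rng.normal(0.0, 50.0, size=(n_draws, n_features))
sigma = np.abs(rng.normal(0.0, 50.0, size=n_draws))   # HalfNormal(50) = |Normal(0, 50)|

# One hypothetical standardised ticket (every feature one SD above its mean)
x = np.ones(n_features)

# Costs implied by the priors alone, before seeing any data
mu = alpha + beta @ x
prior_cost = rng.normal(mu, sigma)

low, high = np.percentile(prior_cost, [3, 97])
print(f"94% of prior-simulated costs fall between {low:.0f} and {high:.0f}")
print(f"Share of negative simulated costs: {(prior_cost < 0).mean():.0%}")
```

Because the priors are centred on zero and the likelihood is Normal, roughly half of the simulated costs are negative. That is harmless once real data dominates, but it suggests the priors are only weakly informative; centring the intercept prior near the observed mean cost would be one possible refinement.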

import pymc3 as pm

with pm.Model() as support_cost_model:
    # Priors
    α = pm.Normal("α", mu=0, sigma=100)                          # intercept
    β = pm.Normal("β", mu=0, sigma=50, shape=X_train_s.shape[1]) # slopes
    σ = pm.HalfNormal("σ", sigma=50)                             # residual noise

    # Linear predictor
    μ = α + pm.math.dot(X_train_s, β)

    # Likelihood: observed resolution_cost
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)

    # MCMC sampling
    trace = pm.sample(
        draws=2000,       # posterior draws
        tune=1000,        # burn‑in
        target_accept=0.9,
        return_inferencedata=True
    )
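
Whether the sampler actually converged is usually judged with the r_hat diagnostic, which az.summary reports for each parameter. As a sketch of the idea, the function below computes the classic Gelman-Rubin statistic in NumPy on synthetic chains. It is a simplified version: ArviZ uses a rank-normalised split variant, so the numbers will not match exactly. The helper name gelman_rubin_rhat and the simulated chains are assumptions for illustration.

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Classic (non-split, non-rank-normalised) R-hat for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    n_chains, n_draws = chains.shape
    chain_means = chains.mean(axis=1)
    within = chains.var(axis=1, ddof=1).mean()       # W: mean within-chain variance
    between = n_draws * chain_means.var(ddof=1)      # B: between-chain variance
    var_hat = (n_draws - 1) / n_draws * within + between / n_draws
    return np.sqrt(var_hat / within)

rng = np.random.default_rng(2)
good = rng.normal(0.0, 1.0, size=(4, 2000))              # four chains exploring the same target
bad = good + np.array([0.0, 0.0, 0.0, 3.0])[:, None]     # one chain stuck in a different region

print(f"R-hat, well-mixed chains: {gelman_rubin_rhat(good):.3f}")
print(f"R-hat, one shifted chain: {gelman_rubin_rhat(bad):.3f}")
```

Values very close to 1 indicate that the chains agree with each other; values clearly above 1 mean the draws should not yet be trusted.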

Posterior Analysis & Point Predictions

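Posterior Predictive: Sampling Y_obs yields full predictive distributions for the training tickets, from which the posterior mean and 94% Highest Density Intervals (HDI) can be extracted. The test-set point forecast below uses the posterior means of α and β.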
Evaluation: MAE on held‑out tickets quantifies average forecasting error.
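
To make the interval definition concrete: a 94% HDI is the narrowest interval that contains 94% of the draws. The sketch below computes it by hand for a single variable and compares it with an equal-tailed interval. The helper hdi_from_samples and the lognormal draws are illustrative assumptions; in the pipeline itself az.hdi does this job.

```python
import numpy as np

def hdi_from_samples(samples, prob=0.94):
    """Shortest interval containing `prob` of the samples (assumes a unimodal distribution)."""
    sorted_s = np.sort(np.asarray(samples, dtype=float))
    n = sorted_s.size
    window = int(np.floor(prob * n))             # number of steps the interval must span
    widths = sorted_s[window:] - sorted_s[: n - window]
    start = int(np.argmin(widths))               # left edge of the narrowest window
    return sorted_s[start], sorted_s[start + window]

rng = np.random.default_rng(3)
# Hypothetical right-skewed predictive draws for one ticket's cost (USD)
draws = rng.lognormal(mean=4.0, sigma=0.5, size=5000)

hdi_low, hdi_high = hdi_from_samples(draws)
eq_low, eq_high = np.percentile(draws, [3, 97])
print(f"94% HDI:          {hdi_low:.1f} to {hdi_high:.1f}")
print(f"94% equal-tailed: {eq_low:.1f} to {eq_high:.1f}")
```

For a skewed distribution the HDI is narrower and sits closer to the bulk of the mass; for a symmetric one the two intervals nearly coincide.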

import arviz as az
from sklearn.metrics import mean_absolute_error

# Summarize posterior (means, HDIs, r_hat, effective sample size)
print(az.summary(trace, round_to=2))

# Posterior predictive sampling
with support_cost_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

# Extract posterior means
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values

# Compute point predictions on test set
y_pred = α_post + X_test_s.dot(β_post)

# Evaluate MAE
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f} per ticket")
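
MAE only scores the point forecast. If per-ticket predictive draws for the held-out tickets are available, the intervals can be checked as well by counting how often the actual cost lands inside them. The sketch below uses synthetic stand-ins (true_mu, noise_sd, pred_draws are all assumptions, simulated from a deliberately well-specified model) purely to show the calculation.

```python
import numpy as np

rng = np.random.default_rng(4)
n_draws, n_tickets = 2000, 300

# Hypothetical stand-ins: true mean cost per held-out ticket and a common noise level
true_mu = rng.uniform(40.0, 120.0, size=n_tickets)
noise_sd = 15.0
y_holdout = rng.normal(true_mu, noise_sd)

# Stand-in predictive draws with shape (n_draws, n_tickets)
pred_draws = rng.normal(true_mu, noise_sd, size=(n_draws, n_tickets))

# Point error: MAE of the predictive mean
toy_mae = np.abs(y_holdout - pred_draws.mean(axis=0)).mean()

# Calibration: share of tickets whose actual cost falls inside the 94% interval
lower, upper = np.percentile(pred_draws, [3, 97], axis=0)
coverage = ((y_holdout >= lower) & (y_holdout <= upper)).mean()

print(f"MAE: {toy_mae:.2f}")
print(f"Empirical coverage of 94% intervals: {coverage:.1%}")
```

Coverage far below the nominal 94% would suggest intervals that are too narrow (overconfident); coverage near 100% would suggest intervals too wide to be useful for planning.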

Visualise Predictions & Credible Intervals

Sweeping time_to_first_response while holding the other features at their training medians, the plot shows how predicted cost changes with response time and how uncertain that relationship is.

import numpy as np
import matplotlib.pyplot as plt

# Sweep time_to_first_response; hold others at median
grid_times = np.linspace(X_train_s[:,0].min(), X_train_s[:,0].max(), 100)
grid = np.median(X_train_s, axis=0)[None,:].repeat(100, axis=0)
grid[:,0] = grid_times

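# Posterior predictive on the grid, computed directly from the posterior draws.
# (Sampling Y_obs from the model would reuse the training rows, not the grid.)
post    = trace.posterior
α_draws = post["α"].values.reshape(-1)                   # (n_samples,)
β_draws = post["β"].values.reshape(-1, grid.shape[1])    # (n_samples, n_features)
σ_draws = post["σ"].values.reshape(-1)                   # (n_samples,)

rng     = np.random.default_rng(42)
μ_grid  = α_draws[:, None] + β_draws @ grid.T            # (n_samples, 100)
preds   = rng.normal(μ_grid, σ_draws[:, None])           # add residual noise

mean_pred = preds.mean(axis=0)
hpd       = az.hdi(preds[None, ...], hdi_prob=0.94)      # (chain, draw, point) -> (100, 2)

# Back‑transform time_to_first_response (the scaler covers the first three columns only)
times_orig = scaler.inverse_transform(grid[:, :3])[:, 0]

plt.figure(figsize=(8,5))
plt.plot(times_orig, mean_pred, label="Posterior mean")
plt.fill_between(times_orig, hpd[:,0], hpd[:,1], alpha=0.3,
                 label="94% credible interval")
plt.scatter(
    X_test[:, 0],  # unscaled time_to_first_response
    y_test, color="k", alpha=0.5, label="Test data"
)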
plt.xlabel("Time to First Response (hours)")
plt.ylabel("Resolution Cost (USD)")
plt.title("Bayesian Regression: Cost vs. First Response Time")
plt.legend()
plt.tight_layout()
plt.show()

Summary

This Bayesian Regression framework for Customer Support Cost Prediction provides:

1. Point forecasts of ticket resolution cost from early‐ticket metrics, with accuracy measured by MAE on held‑out tickets.

2. Credible intervals quantifying uncertainty from operational variability.

3. Actionable insights: support leaders can staff teams, set SLAs, and budget with confidence bounds—optimising both cost efficiency and service quality.
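
As a rough illustration of budgeting with these bounds, per-ticket predictive draws can be summed within each draw to obtain a distribution over total spend. The numbers below are synthetic and every name (cost_draws, expected_cost, the 90% level) is an assumption for the sketch rather than output from the model above.

```python
import numpy as np

rng = np.random.default_rng(5)
n_draws, n_tickets = 4000, 500

# Hypothetical predictive draws of cost per ticket (USD), shape (n_draws, n_tickets)
expected_cost = rng.uniform(30.0, 150.0, size=n_tickets)
cost_draws = rng.normal(expected_cost, 20.0, size=(n_draws, n_tickets))

# Summing across tickets within each draw gives a distribution over total spend
total_draws = cost_draws.sum(axis=1)

budget_mean = total_draws.mean()
budget_p90 = np.percentile(total_draws, 90)   # budget that covers 90% of simulated outcomes

print(f"Expected total cost: ${budget_mean:,.0f}")
print(f"Budget covering 90% of scenarios: ${budget_p90:,.0f}")
```

These synthetic draws treat tickets as independent. Real posterior predictive draws share the same α, β and σ within each draw, so summing them draw by draw carries parameter uncertainty through to the total and typically gives a wider, more honest budget range.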
