Festival Attendance Cost Prediction using Bayesian Regression in ML
Festival organisers need to forecast total ticket revenue (used here as a proxy for attendance costs) before setting final ticket tiers, using early event indicators such as ticket price, previous-event attendance, event duration, day of week, and weather index. Revenue (the aggregate cost to attendees) scales nonlinearly with ticket price (higher prices can dampen attendance) and with duration (longer festivals justify premium pricing), and it is also affected by weather and competing events. A simple point-estimate model hides this uncertainty, risking poor pricing decisions. By applying Bayesian Regression, we produce:
1. A point estimate of total revenue (attendance × price).
2. A credible interval that quantifies forecast uncertainty—enabling risk‐aware pricing and resource planning.
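To make the two outputs concrete, here is a minimal sketch of how both are read off a set of predictive draws for a single event. The draws are synthetic (the mean of 50,000 and spread of 8,000 are made-up illustration values, not results from the dataset), and `summarize_draws` is a hypothetical helper. It reports an equal-tailed interval for simplicity, whereas the workflow below uses the highest-density interval from ArviZ.

```python
import numpy as np

def summarize_draws(draws, prob=0.94):
    """Return (mean, lower, upper) using an equal-tailed credible interval."""
    draws = np.asarray(draws, dtype=float)
    tail = (1.0 - prob) / 2.0
    lower, upper = np.quantile(draws, [tail, 1.0 - tail])
    return draws.mean(), lower, upper

# Synthetic predictive draws for one event (illustration only)
rng = np.random.default_rng(0)
draws = rng.normal(loc=50_000, scale=8_000, size=4_000)

point, lo, hi = summarize_draws(draws)
print(f"Point estimate: ${point:,.0f}; 94% interval: ${lo:,.0f} to ${hi:,.0f}")
```

The interval width is what a single point forecast discards: two events with the same expected revenue can carry very different risk.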
Libraries Required
import pandas as pd # data loading & manipulation
import numpy as np # numerical operations
import matplotlib.pyplot as plt # plotting
import seaborn as sns # visualization
import pymc3 as pm # Bayesian modeling
import arviz as az # posterior analysis
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error
Dataset
Step-by-Step Code Implementation
Data Loading & Cost Computation
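We load the event data and define the target, total_revenue, as ticket price × previous attendance. The predictors are z-scored in the next step so that the Normal priors on the coefficients share a common scale and sampling is stable.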
import pandas as pd
# Load the event dataset
df = pd.read_csv("data/event_dataset.csv")
# Compute total revenue = ticket price × previous attendance
df['total_revenue'] = df['Ticket Price'] * df['Previous Attendance']
# Select features and target
features = [
'Ticket Price',
'Previous Attendance',
'Duration', # hours
'Day of Week', # 0=Mon…6=Sun
'Weather Index' # 0=poor…1=excellent
]
X = df[features].values
y = df['total_revenue'].values # USD per event
# Quick peek
df[features + ['total_revenue']].head()
Train/Test Split & Standardisation
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Random 80/20 split (prefer a chronological split if the events are time-ordered)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Standardize numeric features for stable MCMC
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
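The standardisation step can be sketched by hand with NumPy to show what the scaler does. The small matrices below are hypothetical (two columns standing in for ticket price and duration). The key points are that the mean and standard deviation come from the training split only, and that the population standard deviation (ddof=0) is used, which is the convention StandardScaler follows.

```python
import numpy as np

# Hypothetical feature matrices: columns are ticket price (USD) and duration (hours)
X_train = np.array([[40.0, 6.0], [60.0, 8.0], [80.0, 10.0], [100.0, 12.0]])
X_test = np.array([[70.0, 9.0]])

# Fit on the training split only, so no test information leaks into the scaling
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # population std (ddof=0)

X_train_s = (X_train - mean) / std
X_test_s = (X_test - mean) / std

# Inverting the transform recovers the original units
X_test_back = X_test_s * std + mean
print(X_test_s, X_test_back)
```

After this transform each slope is read as the change in revenue per one standard deviation of the feature, which is why a single prior scale can be shared across all coefficients.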
Define & Fit Bayesian Regression Model
Priors:
- α ∼ Normal(0, 1e5) accommodates large‐scale revenues.
- β ∼ Normal(0, 1e4) reflects moderate uncertainty on each standardised slope.
- σ ∼ HalfNormal(1e4) enforces positive residual noise at revenue scale.
Model: Linear predictor μ = α + β·X_standardized.
Likelihood: total_revenue ∼ Normal(μ, σ).
Sampling: We draw 2,000 posterior samples per chain after 1,000 tuning steps, with target_accept=0.9 to reduce divergences and make inference more robust.
import pymc3 as pm
with pm.Model() as model:
    # Priors
    α = pm.Normal("α", mu=0, sigma=1e5) # intercept prior
    β = pm.Normal("β", mu=0, sigma=1e4, shape=X_train_s.shape[1]) # slopes prior
    σ = pm.HalfNormal("σ", sigma=1e4) # noise scale
    # Expected revenue
    μ = α + pm.math.dot(X_train_s, β)
    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)
    # MCMC sampling
    trace = pm.sample(
        draws=2000, # number of posterior samples
        tune=1000, # burn‑in
        target_accept=0.9,
        return_inferencedata=True
    )
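Before trusting the fit, it can help to check what the priors alone imply. The following is a rough prior predictive sketch in plain NumPy, not part of the original workflow: it draws α, β and σ from the priors stated above and simulates revenue for three hypothetical standardised events (the feature rows are random placeholders).

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_features = 1_000, 5

# Draw parameters from the priors used in the model
alpha = rng.normal(0.0, 1e5, size=n_draws)
beta = rng.normal(0.0, 1e4, size=(n_draws, n_features))
sigma = np.abs(rng.normal(0.0, 1e4, size=n_draws))  # HalfNormal is |Normal|

# Hypothetical z-scored events (placeholder rows, not real data)
X_s = rng.normal(size=(3, n_features))

# Simulated revenue under the prior: one row per draw, one column per event
mu = alpha[:, None] + beta @ X_s.T
y_sim = rng.normal(mu, sigma[:, None])

print("Prior predictive 3%/97% quantiles (USD):", np.quantile(y_sim, [0.03, 0.97]))
print("Share of negative simulated revenues:", (y_sim < 0).mean())
```

Because the priors are centred on zero, roughly half of the simulated revenues come out negative. That is a sign the priors are deliberately weak rather than realistic; if domain knowledge is available, centring the intercept prior near a typical revenue would be one way to tighten them.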
Posterior Analysis & Point Predictions
Posterior predictive: Sampling Y_obs yields full predictive distributions; from these, we compute the posterior mean forecast and 94% Highest Posterior Density interval at new Ticket Price values.
Evaluation: Mean Absolute Error (MAE) on held‑out events quantifies point‐forecast accuracy.
import arviz as az
from sklearn.metrics import mean_absolute_error
# Summarize posterior distributions
az.summary(trace, round_to=2)
# Posterior predictive sampling
with model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])
# Posterior means
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values
# Point predictions on test set
y_pred = α_post + X_test_s.dot(β_post)
# Evaluate accuracy
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f}")
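MAE only scores the point forecast. A complementary check, sketched here under the assumption that predictive draws are available as an array of shape (n_draws, n_events), is the empirical coverage of the 94% interval: the share of held-out events whose actual revenue falls inside it. The helper `mae_and_coverage` and the synthetic data are hypothetical, and the interval is equal-tailed for simplicity.

```python
import numpy as np

def mae_and_coverage(y_true, pred_draws, prob=0.94):
    """Score predictive draws of shape (n_draws, n_events) against y_true."""
    y_true = np.asarray(y_true, dtype=float)
    tail = (1.0 - prob) / 2.0
    lower, upper = np.quantile(pred_draws, [tail, 1.0 - tail], axis=0)
    point = pred_draws.mean(axis=0)
    mae = np.abs(y_true - point).mean()
    coverage = np.mean((y_true >= lower) & (y_true <= upper))
    return mae, coverage

# Synthetic check: draws and outcomes share the same distribution,
# so coverage should land near the nominal 94%
rng = np.random.default_rng(2)
true_mean = rng.uniform(20_000, 80_000, size=200)
y_true = rng.normal(true_mean, 5_000)
pred_draws = rng.normal(true_mean, 5_000, size=(2_000, 200))

mae, coverage = mae_and_coverage(y_true, pred_draws)
print(f"MAE: ${mae:,.0f}, 94% interval coverage: {coverage:.2f}")
```

Coverage well below 94% on real held-out events would suggest the intervals are too narrow; coverage near 100% would suggest they are too wide to be useful for pricing.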
Visualise Predictions & Credible Intervals
By sweeping Ticket Price and holding other features fixed, we plot the posterior mean of the revenue curve and its credible band—revealing both the expected revenue sensitivity and the associated uncertainty.
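import numpy as np
import matplotlib.pyplot as plt
import arviz as az
# Sweep Ticket Price (standardised scale); hold other features at their median
price_grid = np.linspace(X_train_s[:, 0].min(), X_train_s[:, 0].max(), 100)
grid = np.median(X_train_s, axis=0)[None, :].repeat(100, axis=0)
grid[:, 0] = price_grid
# Flatten posterior draws across chains: one row per draw
α_s = trace.posterior["α"].values.reshape(-1)
β_s = trace.posterior["β"].values.reshape(-1, grid.shape[1])
σ_s = trace.posterior["σ"].values.reshape(-1)
# Predictive draws on the grid: μ = α + grid·β, plus Normal(0, σ) noise.
# (Re-sampling Y_obs inside the model would only reproduce the training rows.)
rng = np.random.default_rng(42)
mu_grid = α_s[:, None] + β_s @ grid.T # shape (n_draws, 100)
preds = rng.normal(mu_grid, σ_s[:, None])
mean_pred = preds.mean(axis=0)
# Add a leading chain axis so ArviZ reads the array as (chain, draw, grid point)
hpd = az.hdi(preds[None, :, :], hdi_prob=0.94) # shape (100, 2): lower, upper
# Back-transform Ticket Price to USD
price_orig = scaler.inverse_transform(grid)[:, 0]
plt.figure(figsize=(8, 5))
plt.plot(price_orig, mean_pred, label="Posterior mean")
plt.fill_between(price_orig, hpd[:, 0], hpd[:, 1], alpha=0.3,
                 label="94% credible interval")
plt.scatter(
    scaler.inverse_transform(X_test_s)[:, 0], y_test,
    color="k", alpha=0.5, label="Test data"
)
plt.xlabel("Ticket Price (USD)")
plt.ylabel("Total Revenue (USD)")
plt.title("Bayesian Regression: Revenue vs. Ticket Price")
plt.legend()
plt.tight_layout()
plt.show()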
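The slopes are estimated on standardised features, so reading revenue sensitivity in dollars per unit of ticket price requires converting them back. A minimal sketch, assuming the feature means and standard deviations are known (with a fitted StandardScaler these are its `mean_` and `scale_` attributes); the numbers and the helper `unstandardize` are hypothetical:

```python
import numpy as np

def unstandardize(alpha_s, beta_s, mean, std):
    """Convert intercept and slopes from z-scored features to original units."""
    beta = beta_s / std
    alpha = alpha_s - np.sum(beta * mean)
    return alpha, beta

# Hypothetical values for two features: ticket price (USD) and duration (hours)
mean = np.array([70.0, 9.0])
std = np.array([20.0, 2.0])
alpha_s = 50_000.0
beta_s = np.array([4_000.0, 1_000.0])  # revenue change per one standard deviation

alpha, beta = unstandardize(alpha_s, beta_s, mean, std)

# Both parameterisations give the same prediction for the same event
x = np.array([90.0, 10.0])
pred_std = alpha_s + beta_s @ ((x - mean) / std)
pred_orig = alpha + beta @ x
print(beta, alpha, pred_std, pred_orig)
```

Applying the same conversion to every posterior draw, rather than only to the posterior means, would give a credible interval for the per-dollar slope as well.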
Summary
This Bayesian Regression workflow for Festival Attendance Cost Prediction provides:
1. Point estimates of total ticket revenue from early event indicators.
2. Credible intervals that quantify forecasting uncertainty—crucial for risk‐aware pricing.
3. Actionable insights: event planners can set ticket tiers, negotiate vendor contracts, and allocate marketing budgets against explicit uncertainty bounds, which supports decisions that balance revenue and attendee satisfaction.