Festival Attendance Cost Prediction using Bayesian Regression in ML


Festival organisers need to forecast total ticket revenue (used here as a proxy for attendance cost) before setting final ticket tiers, using early event indicators such as ticket price, previous-event attendance, event duration, day of week, and weather index. Revenue scales nonlinearly with ticket price (higher prices can dampen attendance) and with duration (longer festivals justify premium pricing), and it is also affected by weather and competing events. A model that returns only a point estimate hides this uncertainty, risking poor pricing decisions. By applying Bayesian Regression, we produce:

1. A point estimate of total revenue (attendance × price).

2. A credible interval that quantifies forecast uncertainty—enabling risk‐aware pricing and resource planning.

Libraries Required

import pandas as pd                              # data loading & manipulation  
import numpy as np                               # numerical operations  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

import pymc3 as pm                               # Bayesian modeling  
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error

Dataset

Event Attendance Dataset

Step-by-Step Code Implementation

Data Loading & Cost Computation

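We load the event data, compute the revenue target, and select the predictors. Each predictor is z‑scored in the next step so that the Normal priors on the coefficients operate on a common scale and sampling is stable.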

import pandas as pd

# Load the event dataset
df = pd.read_csv("data/event_dataset.csv")

# Compute total revenue = ticket price × previous attendance
# (the target is therefore an exact function of two of the predictors selected below)
df['total_revenue'] = df['Ticket Price'] * df['Previous Attendance']

# Select features and target
features = [
    'Ticket Price',
    'Previous Attendance',
    'Duration',           # hours
    'Day of Week',        # 0=Mon…6=Sun
    'Weather Index'       # 0=poor…1=excellent
]
X = df[features].values
y = df['total_revenue'].values  # USD per event

# Quick peek
df[features + ['total_revenue']].head()

Train/Test Split & Standardisation

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Random 80/20 split (use a chronological split instead if events are time-ordered)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize numeric features for stable MCMC
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s  = scaler.transform(X_test)
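
To make the standardisation step concrete, the sketch below reproduces the z-score arithmetic by hand with NumPy. It relies on the fact that StandardScaler uses the column mean and the population standard deviation (ddof=0). The toy arrays and their values are hypothetical and are not taken from the event dataset.

```python
import numpy as np

# Hypothetical toy data: columns are ticket price (USD) and duration (hours)
X_train_toy = np.array([[20.0, 4.0], [35.0, 8.0], [50.0, 12.0], [65.0, 16.0]])
X_test_toy = np.array([[42.5, 10.0], [80.0, 6.0]])

# "Fit": column means and population standard deviations from the training rows only
mu = X_train_toy.mean(axis=0)
sd = X_train_toy.std(axis=0)  # ddof=0, matching StandardScaler

# "Transform": apply the training statistics to both splits
X_train_z = (X_train_toy - mu) / sd
X_test_z = (X_test_toy - mu) / sd

print(X_train_z.mean(axis=0))  # approximately 0 in every column
print(X_train_z.std(axis=0))   # 1 in every column
print(X_test_z)                # test rows need not have mean 0 or std 1
```

Fitting on the training rows only keeps information from the held-out events out of the scaling, which is why the code above calls fit on X_train and only transform on X_test.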

Define & Fit Bayesian Regression Model

Priors:

α ∼ Normal(0, 1e5) accommodates large‐scale revenues.
β ∼ Normal(0, 1e4) reflects moderate uncertainty on each standardised slope.
σ ∼ HalfNormal(1e4) enforces positive residual noise at revenue scale.

Model: Linear predictor μ = α + β·X_standardized.

Likelihood: total_revenue ∼ Normal(μ, σ).

Sampling: We draw 2,000 posterior samples per chain after 1,000 tuning steps, with target_accept=0.9 for robust inference.

import pymc3 as pm

with pm.Model() as model:
    # Priors
    α = pm.Normal("α", mu=0, sigma=1e5)                                 # intercept prior
    β = pm.Normal("β", mu=0, sigma=1e4, shape=X_train_s.shape[1])       # slopes prior
    σ = pm.HalfNormal("σ", sigma=1e4)                                   # noise scale

    # Expected revenue
    μ = α + pm.math.dot(X_train_s, β)

    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)

    # MCMC sampling
    trace = pm.sample(
        draws=2000,        # posterior samples per chain
        tune=1000,         # tuning (warm‑up) steps, discarded
        target_accept=0.9,
        return_inferencedata=True
    )
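
Before trusting the priors, it can help to see what revenues they imply on their own. The following is a minimal prior predictive sketch in plain NumPy, not part of the PyMC3 model above. It draws from the three stated priors and simulates revenue for one hypothetical standardised event; the event vector `x_new`, the seed, and the number of draws are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws, n_features = 1000, 5

# Draws from the priors stated above
alpha = rng.normal(0.0, 1e5, size=n_draws)               # intercept ~ Normal(0, 1e5)
beta = rng.normal(0.0, 1e4, size=(n_draws, n_features))  # slopes ~ Normal(0, 1e4)
sigma = np.abs(rng.normal(0.0, 1e4, size=n_draws))       # HalfNormal(1e4) as |Normal(0, 1e4)|

# Hypothetical standardised event: every feature one standard deviation above its mean
x_new = np.ones(n_features)

mu = alpha + beta @ x_new      # linear predictor for each prior draw
y_sim = rng.normal(mu, sigma)  # simulated revenue under the priors alone

lo, hi = np.percentile(y_sim, [3, 97])
print(f"94% prior predictive range: {lo:,.0f} to {hi:,.0f} USD")
```

Because the intercept prior is centred at zero, roughly half of the simulated revenues are negative. If that is judged implausible for a given dataset, centring the intercept prior on the mean of the training target is one possible adjustment.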

Posterior Analysis & Point Predictions

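Posterior predictive: Sampling Y_obs yields full predictive distributions for the events the model was fitted on. For test-set point forecasts, we plug the posterior means of α and β into the linear predictor; the 94% Highest Posterior Density interval at new Ticket Price values is computed in the visualisation step.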

Evaluation: Mean Absolute Error (MAE) on held‑out events quantifies point‐forecast accuracy.

import arviz as az
from sklearn.metrics import mean_absolute_error

# Summarize posterior distributions
az.summary(trace, round_to=2)

# Posterior predictive sampling (for the training events the model was fitted on)
with model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

# Posterior means
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values

# Point predictions on test set
y_pred = α_post + X_test_s.dot(β_post)

# Evaluate accuracy
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f}")
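
MAE only scores the point forecast. A complementary check is whether the predictive intervals cover the held-out revenues about as often as they claim. The sketch below shows one way to do this with NumPy, assuming flattened posterior draws are available (with the objects above, for example, `trace.posterior["α"].values.reshape(-1)`). The helper names `predictive_draws` and `interval_coverage` are hypothetical, the stand-in draws and data are synthetic, and the interval used is the central (equal-tailed) one rather than the HDI.

```python
import numpy as np

def predictive_draws(alpha_draws, beta_draws, sigma_draws, X_s, seed=0):
    """Simulate outcomes for each posterior draw; returns shape (n_draws, n_rows)."""
    rng = np.random.default_rng(seed)
    mu = alpha_draws[:, None] + beta_draws @ X_s.T  # linear predictor per draw and row
    return rng.normal(mu, sigma_draws[:, None])     # add observation noise

def interval_coverage(y_true, draws, prob=0.94):
    """Fraction of y_true values inside the central `prob` predictive interval."""
    tail = 100 * (1 - prob) / 2
    lower, upper = np.percentile(draws, [tail, 100 - tail], axis=0)
    return np.mean((y_true >= lower) & (y_true <= upper))

# Synthetic stand-in values so the sketch runs on its own (not real posterior output)
rng = np.random.default_rng(1)
n_draws, n_rows, n_features = 500, 40, 5
alpha_draws = rng.normal(50_000, 500, n_draws)
beta_draws = rng.normal(2_000, 200, (n_draws, n_features))
sigma_draws = np.abs(rng.normal(3_000, 100, n_draws))
X_toy = rng.normal(size=(n_rows, n_features))
y_toy = 50_000 + X_toy @ np.full(n_features, 2_000.0) + rng.normal(0, 3_000, n_rows)

draws = predictive_draws(alpha_draws, beta_draws, sigma_draws, X_toy)
coverage = interval_coverage(y_toy, draws)
print(draws.shape)                     # (500, 40)
print(f"Empirical coverage: {coverage:.2f}")
```

Coverage far below the nominal 94% on real held-out events would suggest the intervals are too narrow; coverage near 100% would suggest they are wider than necessary.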

Visualise Predictions & Credible Intervals

By sweeping Ticket Price and holding the other features fixed at their medians, we plot the posterior mean of the revenue curve and its 94% credible band, showing both the expected sensitivity of revenue to price and the associated uncertainty.

import numpy as np
import matplotlib.pyplot as plt

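# Sweep Ticket Price (standardised); hold other features at their median
price_grid = np.linspace(X_train_s[:,0].min(), X_train_s[:,0].max(), 100)
grid = np.median(X_train_s, axis=0)[None,:].repeat(100, axis=0)
grid[:,0] = price_grid

# Flatten posterior draws across chains: one entry (or row) per draw
post = trace.posterior
α_s = post["α"].values.reshape(-1)
β_s = post["β"].values.reshape(-1, grid.shape[1])
σ_s = post["σ"].values.reshape(-1)

# Posterior predictive on the grid: μ = α + β·x, plus Normal(0, σ) noise.
# (pm.sample_posterior_predictive on this model would predict for the
#  training rows, because X_train_s is fixed inside the model, not the grid.)
rng = np.random.default_rng(42)
preds = rng.normal(α_s[:,None] + β_s @ grid.T, σ_s[:,None])   # shape (n_draws, 100)

mean_pred = preds.mean(axis=0)
hpd       = az.hdi(preds[None, ...], hdi_prob=0.94)            # chain axis added; shape (100, 2)

# Back‑transform Ticket Price
price_orig = scaler.inverse_transform(grid)[:,0]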

plt.figure(figsize=(8,5))
plt.plot(price_orig, mean_pred, label="Posterior mean")
plt.fill_between(price_orig, hpd[:,0], hpd[:,1], alpha=0.3,
                 label="94% credible interval")
plt.scatter(
    scaler.inverse_transform(X_test_s)[:,0], y_test,
    color="k", alpha=0.5, label="Test data"
)
plt.xlabel("Ticket Price (USD)")
plt.ylabel("Total Revenue (USD)")
plt.title("Bayesian Regression: Revenue vs. Ticket Price")
plt.legend()
plt.tight_layout()
plt.show()
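
The band in the plot is a Highest Posterior Density interval, which is the narrowest interval containing the stated probability mass. The sketch below illustrates that idea directly on a set of samples and compares it with the equal-tailed percentile interval. The function `hdi_from_samples` is a hypothetical helper written for illustration, not ArviZ's implementation, and it assumes a unimodal distribution; the skewed samples are synthetic.

```python
import numpy as np

def hdi_from_samples(samples, prob=0.94):
    """Narrowest interval spanning a fraction `prob` of the samples (unimodal case)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.floor(prob * n))  # index distance the interval must span
    widths = x[k:] - x[: n - k]  # width of every candidate window
    i = int(np.argmin(widths))   # start of the narrowest window
    return x[i], x[i + k]

# Synthetic right-skewed samples, where the two interval types differ visibly
rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=0.75, size=20_000)

hdi_lo, hdi_hi = hdi_from_samples(skewed)
eti_lo, eti_hi = np.percentile(skewed, [3, 97])  # equal-tailed 94% interval

print(f"HDI: [{hdi_lo:.3f}, {hdi_hi:.3f}]  width {hdi_hi - hdi_lo:.3f}")
print(f"ETI: [{eti_lo:.3f}, {eti_hi:.3f}]  width {eti_hi - eti_lo:.3f}")
```

For a symmetric predictive distribution, such as the Normal likelihood used in this tutorial, the two intervals nearly coincide; the difference matters mainly for skewed posteriors.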

Summary

This Bayesian Regression workflow for Festival Attendance Cost Prediction provides:

1. Point estimates of total ticket revenue from early event indicators.

2. Credible intervals that quantify forecasting uncertainty—crucial for risk‐aware pricing.

3. Actionable insights: event planners can set ticket tiers, negotiate vendor contracts, and allocate marketing budgets with explicit credible bounds on expected revenue.
