Fitness Program Cost Prediction using Bayesian Regression in ML
Wellness centres and corporate fitness providers need to forecast the per-participant cost of a multi-week fitness program before launching the next cohort, using early-enrollment metrics such as age, body mass index (BMI), initial fitness score, program duration, and attendance commitment level. Delivery costs scale nonlinearly: older or higher-BMI participants may require more individualised coaching (increasing labour cost), while longer programs often earn volume discounts on facility usage. Moreover, uncertainty in actual attendance rates and staff overtime means that point estimates alone risk budget overruns. By applying Bayesian Regression, we obtain both a best-estimate cost per person and a credible interval that quantifies our uncertainty, supporting data-driven pricing, staffing, and resource allocation.
Libraries Required
import pandas as pd                # data loading & manipulation
import numpy as np                 # numerical operations
import matplotlib.pyplot as plt    # plotting
import seaborn as sns              # visualization
import pymc3 as pm                 # Bayesian modeling
import arviz as az                 # posterior analysis
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error
Dataset
Step-by-Step Code Implementation
Data Loading & Feature Engineering
- We convert MonthlyFee into a per-program cost (Program_Cost = MonthlyFee × weeks × 7/30, i.e. the monthly fee prorated over the program's days, assuming a 30-day month).
- Predictors: Age, BMI, initial fitness level, program length, and attendance commitment.
import pandas as pd
# Load simulated gym membership data
df = pd.read_csv("data/gym-membership-dataset/Gym_Membership_Data.csv")
# Assume the dataset includes:
# Age, BMI, Initial_Fitness_Score, Program_Duration_Weeks, Attendance_Commitment (%),
# MonthlyFee (USD)
df = df[['Age','BMI','Initial_Fitness_Score',
'Program_Duration_Weeks','Attendance_Commitment','MonthlyFee']].dropna()
# Convert MonthlyFee to a total per-program cost: prorate the fee over weeks × 7 days (30-day month)
df['Program_Cost'] = df['MonthlyFee'] * (df['Program_Duration_Weeks'] * 7 / 30)
# Select predictors and target
X = df[['Age','BMI','Initial_Fitness_Score',
'Program_Duration_Weeks','Attendance_Commitment']].values
y = df['Program_Cost'].values # USD per participant
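The proration rule is easy to sanity-check on a few hand-picked rows. Below is a small sketch with a hypothetical helper `program_cost` and hypothetical fee and duration values (not taken from the dataset). It applies the same Program_Cost = MonthlyFee × weeks × 7/30 rule as the code above:

```python
import numpy as np

DAYS_PER_MONTH = 30  # the 7/30 factor treats every month as 30 days


def program_cost(monthly_fee, weeks):
    """Hypothetical helper: prorate a monthly fee over a program of `weeks` weeks.

    Program_Cost = MonthlyFee * (weeks * 7 / 30)
    """
    monthly_fee = np.asarray(monthly_fee, dtype=float)
    weeks = np.asarray(weeks, dtype=float)
    days = weeks * 7                          # program length in days
    return monthly_fee * (days / DAYS_PER_MONTH)


# Hypothetical participants (illustrative values only)
fees = [60.0, 45.0, 90.0]   # MonthlyFee in USD
weeks = [12, 8, 4]          # Program_Duration_Weeks
costs = program_cost(fees, weeks)
for fee, w, c in zip(fees, weeks, costs):
    print(f"${fee:.0f}/month for {w} weeks -> ${c:.2f}")
```

For example, 12 weeks is 84 days, so a $60/month fee prorates to 60 × 84/30 = $168. Program_Duration_Weeks appears both in this formula and among the predictors, so part of the target is determined by a predictor by construction. The product form (fee × weeks) is also something a linear model can only approximate.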
Preprocessing & Train/Test Split
Standardize each predictor to zero mean and unit variance, using training-set statistics only. This helps the Bayesian sampler converge reliably and makes a single prior scale on β comparable across features.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Split data (80% train / 20% test)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Standardize predictors for stable MCMC
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
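Standardisation statistics are estimated on the training split only and then reused for the test split (and later for the attendance grid). `scaler.inverse_transform` maps standardized values back to original units. The NumPy sketch below reproduces the same computation on a hypothetical predictor matrix. It assumes StandardScaler's documented behaviour: population standard deviation, with zero-variance columns left unscaled.

```python
import numpy as np


def fit_standardizer(X_train):
    """Per-column mean and scale, computed on the TRAINING rows only.

    Uses the population standard deviation (ddof=0), as StandardScaler does,
    and maps zero-variance columns to a scale of 1.
    """
    mean = X_train.mean(axis=0)
    scale = X_train.std(axis=0)                  # ddof=0
    scale = np.where(scale == 0, 1.0, scale)     # avoid division by zero
    return mean, scale


def transform(X, mean, scale):
    return (X - mean) / scale


def inverse_transform(Z, mean, scale):
    return Z * scale + mean


rng = np.random.default_rng(0)
n = 200
# Hypothetical predictor matrix in the article's column order:
# Age, BMI, Initial_Fitness_Score, Program_Duration_Weeks, Attendance_Commitment
X_demo = np.column_stack([
    rng.uniform(18, 65, n),
    rng.normal(26, 4, n),
    rng.uniform(20, 90, n),
    rng.integers(4, 17, n),
    rng.uniform(40, 100, n),
])
X_new = X_demo[:20] + 1.0   # stand-in "new" rows, transformed with TRAINING statistics

mean, scale = fit_standardizer(X_demo)
Z_demo = transform(X_demo, mean, scale)
Z_new = transform(X_new, mean, scale)
print("train column means:", Z_demo.mean(axis=0).round(6))
print("train column sds:  ", Z_demo.std(axis=0).round(6))
```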
Define & Fit Bayesian Regression Model
Model Priors:
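- α ∼ Normal(0, 50): intercept prior. Because the predictors are standardised, α is the expected cost of a participant at the training-set average on every feature.
- β ∼ Normal(0, 20): moderate uncertainty on each standardised coefficient (USD change per one-standard-deviation change in the feature).
- σ ∼ HalfNormal(20): positive residual-noise scale (USD).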
Likelihood: observed Program_Cost ∼ Normal(μ, σ), with μ = α + X_standardized · β.
Sampling: We draw 2,000 posterior samples per chain (after 1,000 tuning steps) with target_accept=0.9. This makes NUTS take smaller steps, which helps reduce divergent transitions.
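A prior predictive check simulates costs from the priors alone, before sampling, to see whether they are plausible. Because the predictors are standardized, the cost of a participant at the training average on every feature is α plus noise. The NumPy sketch below (hypothetical seed and draw count) shows that Normal(0, 50) centres that cost at $0, so roughly half the prior mass is negative. Whether that matters depends on the scale of Program_Cost in the data. If typical costs are in the hundreds of dollars, one common adjustment, not used in the model here, is to centre α's prior on y_train.mean(). Inside PyMC, `pm.sample_prior_predictive` performs the same kind of simulation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_features = 4000, 5

# Draw parameters from the priors listed above
alpha = rng.normal(0, 50, n_draws)                   # α ~ Normal(0, 50)
beta = rng.normal(0, 20, (n_draws, n_features))      # β ~ Normal(0, 20), one per feature
sigma = np.abs(rng.normal(0, 20, n_draws))           # σ ~ HalfNormal(20) = |Normal(0, 20)|

# Hypothetical participant at the training mean: all zeros after standardization
x_mean = np.zeros(n_features)
mu = alpha + beta @ x_mean                           # prior draws of μ (equal to α here)
y_prior = rng.normal(mu, sigma)                      # prior predictive cost (USD)

lo, hi = np.percentile(y_prior, [3, 97])
frac_negative = (y_prior < 0).mean()
print(f"94% prior predictive range at the average participant: {lo:.0f} to {hi:.0f} USD")
print(f"Share of prior draws with negative cost: {frac_negative:.2f}")
```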
import pymc3 as pm
with pm.Model() as fitness_cost_model:
    # Priors
    α = pm.Normal("α", mu=0, sigma=50)                            # intercept
    β = pm.Normal("β", mu=0, sigma=20, shape=X_train_s.shape[1])  # slopes
    σ = pm.HalfNormal("σ", sigma=20)                              # residual noise
    # Linear predictor
    μ = α + pm.math.dot(X_train_s, β)
    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)
    # MCMC sampling
    trace = pm.sample(
        draws=2000, tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
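On real data the fitted parameters cannot be compared with ground truth, so a parameter-recovery test on synthetic data is a common sanity check. The sketch below is a simplification, not the PyMC model. It treats σ as known, in which case the Normal priors on α and β are conjugate and the posterior is Gaussian in closed form. The posterior precision is X₁ᵀX₁/σ² plus the prior precision, where X₁ is X with an intercept column prepended. The "true" values and the helper name `conjugate_posterior` are hypothetical. Fitting the PyMC model to the same `X_syn`, `y_syn` would be expected to recover similar α and β.

```python
import numpy as np


def conjugate_posterior(X, y, sigma, prior_sd_alpha=50.0, prior_sd_beta=20.0):
    """Closed-form posterior of (α, β) for y ~ Normal(α + Xβ, σ) with σ treated as KNOWN.

    Priors: α ~ Normal(0, prior_sd_alpha), β_j ~ Normal(0, prior_sd_beta).
    Returns the posterior mean [α, β_1..β_p] and its covariance matrix.
    """
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])                    # prepend intercept column
    prior_prec = np.diag([1 / prior_sd_alpha**2] + [1 / prior_sd_beta**2] * p)
    post_prec = X1.T @ X1 / sigma**2 + prior_prec
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (X1.T @ y) / sigma**2             # prior mean is zero
    return post_mean, post_cov


# Synthetic, already-standardized data with hypothetical "true" parameters
rng = np.random.default_rng(7)
n, p = 500, 5
X_syn = rng.standard_normal((n, p))
true_alpha, true_sigma = 40.0, 10.0
true_beta = np.array([8.0, 5.0, -3.0, 12.0, -6.0])
y_syn = true_alpha + X_syn @ true_beta + rng.normal(0, true_sigma, n)

post_mean, post_cov = conjugate_posterior(X_syn, y_syn, sigma=true_sigma)
post_sd = np.sqrt(np.diag(post_cov))
for name, m, s in zip(["α"] + [f"β[{j}]" for j in range(p)], post_mean, post_sd):
    print(f"{name}: {m:6.2f} ± {s:.2f}")
```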
Posterior Analysis & Point Predictions
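- Posterior Predictive: Sampling Y_obs produces replicated costs for the training participants, because the model's linear predictor is built on X_train_s. These replicates are useful for checking model fit. Separately, az.summary reports posterior means and 94% Highest Density Intervals (HDIs) for every parameter.
- Evaluation: Test-set point forecasts use the posterior-mean parameters. Mean Absolute Error (MAE) on the held-out data quantifies their average error in USD.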
import arviz as az
from sklearn.metrics import mean_absolute_error
# Summarize posterior distributions (means, sds, 94% HDIs, R-hat, ESS)
print(az.summary(trace, round_to=2))
# Posterior predictive sampling (replicates the training observations)
with fitness_cost_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])
# Compute posterior means of parameters
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values
# Point predictions on test set
y_pred = α_post + X_test_s.dot(β_post)
# Evaluate MAE
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f}")
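The MAE above measures point-forecast error only. The sketch below turns posterior draws into 94% predictive intervals for test rows and checks their empirical coverage. The helper `hdi_1d` mirrors the sorted-window method ArviZ uses for unimodal samples. The posterior draws are hypothetical stand-ins generated around known values. With a real trace, they could be extracted via `trace.posterior.stack(sample=("chain", "draw"))`.

```python
import numpy as np


def hdi_1d(samples, prob=0.94):
    """Shortest interval containing a fraction `prob` of the samples (unimodal HDI)."""
    s = np.sort(samples)
    n = len(s)
    k = int(np.floor(prob * n))           # index span of the interval
    widths = s[k:] - s[: n - k]           # width of every candidate window
    i = int(np.argmin(widths))
    return s[i], s[i + k]


def predictive_summary(alpha_s, beta_s, sigma_s, X, prob=0.94, seed=0):
    """Posterior predictive mean and HDI bounds for each row of X.

    alpha_s: (S,), beta_s: (S, p), sigma_s: (S,) posterior draws.
    """
    rng = np.random.default_rng(seed)
    mu = alpha_s[:, None] + beta_s @ X.T             # (S, n) draws of the mean cost
    y_rep = rng.normal(mu, sigma_s[:, None])         # (S, n) draws incl. residual noise
    bounds = np.array([hdi_1d(y_rep[:, j], prob) for j in range(X.shape[0])])
    return y_rep.mean(axis=0), bounds[:, 0], bounds[:, 1]


# Hypothetical stand-in posterior draws around known "true" values
rng = np.random.default_rng(3)
S, p, n_test = 2000, 5, 300
true_alpha, true_sigma = 40.0, 10.0
true_beta = np.array([8.0, 5.0, -3.0, 12.0, -6.0])
alpha_s = rng.normal(true_alpha, 0.5, S)
beta_s = true_beta + rng.normal(0, 0.5, (S, p))
sigma_s = np.abs(rng.normal(true_sigma, 0.3, S))

# Synthetic standardized test rows drawn from the same "true" model
X_demo = rng.standard_normal((n_test, p))
y_demo = true_alpha + X_demo @ true_beta + rng.normal(0, true_sigma, n_test)

mean_demo, lower, upper = predictive_summary(alpha_s, beta_s, sigma_s, X_demo)
coverage = np.mean((y_demo >= lower) & (y_demo <= upper))
print(f"Empirical coverage of the 94% intervals: {coverage:.2f}")
```

Because μ is linear in α and β, the mean of the predictive draws agrees with the plug-in forecast from posterior-mean parameters (as used for the MAE) up to Monte Carlo error. On real held-out data, a coverage close to 0.94 suggests the intervals are reasonably calibrated.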
Visualise Predictions & Credible Intervals
We sweep attendance commitment (a key cost driver) while holding the other features at their medians. We then plot the posterior mean program cost and its 94% posterior predictive band, showing whether higher commitment lowers per-participant cost and how much uncertainty surrounds that estimate.
import numpy as np
import matplotlib.pyplot as plt
# Vary Attendance Commitment; hold other features at their median
commit_grid = np.linspace(X_train_s[:,4].min(), X_train_s[:,4].max(), 100)
grid = np.median(X_train_s, axis=0)[None,:].repeat(100, axis=0)
grid[:,4] = commit_grid
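# The model's μ is built on X_train_s, so sample_posterior_predictive would return
# training-set draws, not grid predictions; build grid draws from the posterior instead
post = trace.posterior.stack(sample=("chain", "draw"))
α_s = post["α"].values                    # (S,)
β_s = post["β"].values.T                  # (S, 5)
σ_s = post["σ"].values                    # (S,)
rng = np.random.default_rng(42)
μ_grid = α_s[:, None] + β_s @ grid.T      # (S, 100) draws of the mean cost
preds = rng.normal(μ_grid, σ_s[:, None])  # (S, 100) predictive draws incl. noise σ
mean_pred = preds.mean(axis=0)
hpd = az.hdi(preds[None, ...], hdi_prob=0.94)  # add a chain axis -> shape (100, 2)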
# Back-transform attendance commitment
commit_orig = scaler.inverse_transform(grid)[:,4]
plt.figure(figsize=(8,5))
plt.plot(commit_orig, mean_pred, label="Posterior mean")
plt.fill_between(commit_orig, hpd[:,0], hpd[:,1], alpha=0.3,
label="94% credible interval")
plt.scatter(
scaler.inverse_transform(X_test_s)[:,4],
y_test, color="k", alpha=0.5, label="Test data"
)
plt.xlabel("Attendance Commitment (%)")
plt.ylabel("Program Cost per Participant (USD)")
plt.title("Bayesian Regression: Cost vs. Attendance Commitment")
plt.legend()
plt.tight_layout()
plt.show()
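The shaded band is built from predictive draws that include the residual noise σ, so it describes where an individual participant's cost is likely to fall. For budgeting an average cost across a cohort, the relevant band comes from draws of μ = α + β·x alone, which is narrower. The NumPy sketch below compares the two bands using hypothetical posterior draws for a single attendance slope. For brevity it uses equal-tailed percentile intervals rather than the HDI used above.

```python
import numpy as np


def equal_tailed(draws, prob=0.94):
    """Per-column equal-tailed (percentile) interval of an (S, n) array of draws."""
    tail = (1 - prob) / 2 * 100
    return np.percentile(draws, [tail, 100 - tail], axis=0)   # shape (2, n)


rng = np.random.default_rng(5)
S = 4000
# Hypothetical posterior draws: intercept (other features' contributions folded in),
# attendance slope on the standardized scale, and residual sd σ
alpha_s = rng.normal(150.0, 2.0, S)
slope_s = rng.normal(-10.0, 1.5, S)
sigma_s = np.abs(rng.normal(25.0, 1.0, S))

x_grid = np.linspace(-2, 2, 50)                               # standardized attendance values
mu_draws = alpha_s[:, None] + slope_s[:, None] * x_grid       # (S, 50): uncertainty in the mean
y_draws = rng.normal(mu_draws, sigma_s[:, None])              # (S, 50): adds residual noise σ

mean_band = equal_tailed(mu_draws)
pred_band = equal_tailed(y_draws)
mean_width = mean_band[1] - mean_band[0]
pred_width = pred_band[1] - pred_band[0]
print(f"Mean-cost band width:  {mean_width.min():.1f} to {mean_width.max():.1f} USD")
print(f"Predictive band width: {pred_width.min():.1f} to {pred_width.max():.1f} USD")
```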
Summary
This Bayesian Regression workflow for Fitness Program Cost Prediction delivers:
1. Point estimates of participant-level program cost from early-enrollment metrics, with accuracy measured by test-set MAE.
2. Credible intervals that quantify forecasting uncertainty, which is crucial for budget risk management.
3. Actionable insights: fitness operators can set program prices, allocate coaching staff, and negotiate facility contracts with an explicit, model-based view of cost bounds, supporting both profitability and participant outcomes.