Clinic Wait Cost Prediction using Bayesian Regression in ML
Healthcare administrators need to forecast the additional operational costs associated with patient wait times before the clinic’s daily schedule is finalised, using early-day indicators such as average prior-day wait time, appointment volume, no-show rate, and staffing levels. Wait-time costs grow nonlinearly (for example, overtime pay kicks in beyond a threshold and idle rooms add overhead) and carry uncertainty from day-of-service variability (no-shows, emergencies). By applying Bayesian Regression, we obtain:
1. A point estimate of expected daily wait‐time cost.
2. A credible interval capturing forecasting uncertainty—enabling data‐driven staffing, scheduling, and budget planning.
Libraries Required
import pandas as pd                 # data loading & manipulation
import numpy as np                  # numerical operations
import matplotlib.pyplot as plt     # plotting
import seaborn as sns               # enhanced visualization
import pymc3 as pm                  # Bayesian modeling
import arviz as az                  # posterior analysis
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error
Dataset
Step-by-Step Code Implementation
Import Libraries & Load Data
import pandas as pd
# Load appointment data
df = pd.read_csv("data/noshowappointments/No-show.csv")
# Parse dates and compute wait days
df['ScheduledDay'] = pd.to_datetime(df['ScheduledDay'])
df['AppointmentDay'] = pd.to_datetime(df['AppointmentDay'])
df['wait_days'] = (df['AppointmentDay'] - df['ScheduledDay']).dt.days.clip(lower=0)
# Preview relevant columns
df[['wait_days','No-show','Age','Hipertension','SMS_received']].head()
Feature Engineering & Wait‐Cost Target
- Cost function: We model each appointment’s wait cost as wait_days × $500/day (overtime + idle-room); the daily target is the sum of these costs over that day’s appointments.
- Daily aggregation: We summarise each calendar day with mean wait, no-show rate, appointment volume, and SMS reminder rate.
# Define a simple cost function:
# cost = wait_days×(8 hrs×$50 overtime + $100 idle per room) = wait_days×(400+100)=wait_days×500
df['wait_cost'] = df['wait_days'] * 500
# Aggregate daily indicators
daily = df.groupby(df['AppointmentDay'].dt.date).agg({
    'wait_cost': 'sum',
    'wait_days': 'mean',
    'No-show': lambda x: (x == 'Yes').mean(),
    'PatientId': 'count',       # appointment volume
    'SMS_received': 'mean'      # reminder rate
}).rename(columns={
    'No-show': 'no_show_rate',
    'PatientId': 'volume',
    'SMS_received': 'sms_rate'
}).reset_index().rename(columns={'AppointmentDay': 'date'})
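# Restore a datetime column, sort chronologically, and drop weekends if no operations
daily['date'] = pd.to_datetime(daily['date'], errors='coerce')
daily = daily.sort_values('date')
daily = daily[daily['date'].dt.dayofweek < 5].reset_index(drop=True)  # Monday=0 ... Friday=4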
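To make the cost function and the daily aggregation concrete, here is a minimal sketch on a hypothetical five-row table (the dates and values are invented for illustration and are not taken from the dataset). It applies the same $500/day rate, the same per-day summaries, and a weekend filter, then confirms that the summed daily cost equals 500 × mean wait × volume.

```python
import pandas as pd

# Hypothetical toy appointments: a Monday, a Tuesday, and a Saturday
toy = pd.DataFrame({
    "AppointmentDay": pd.to_datetime(
        ["2016-05-02", "2016-05-02", "2016-05-03", "2016-05-03", "2016-05-07"]
    ),
    "wait_days": [0, 4, 2, 6, 1],
    "No-show": ["No", "Yes", "No", "No", "Yes"],
    "SMS_received": [0, 1, 1, 1, 0],
})
toy["wait_cost"] = toy["wait_days"] * 500  # same $500/day rate as above

# One row per calendar day, mirroring the aggregation in the tutorial
toy_daily = (
    toy.groupby(toy["AppointmentDay"].dt.date)
    .agg(
        wait_cost=("wait_cost", "sum"),
        wait_days=("wait_days", "mean"),
        no_show_rate=("No-show", lambda s: (s == "Yes").mean()),
        volume=("wait_days", "count"),
        sms_rate=("SMS_received", "mean"),
    )
    .reset_index()
    .rename(columns={"AppointmentDay": "date"})
)

# Keep weekdays only (Monday=0 ... Friday=4)
toy_daily["date"] = pd.to_datetime(toy_daily["date"])
toy_daily = toy_daily[toy_daily["date"].dt.dayofweek < 5].reset_index(drop=True)

# The summed cost is 500 * mean wait * volume on every remaining day
check = 500 * toy_daily["wait_days"] * toy_daily["volume"]
print(toy_daily)
print((check == toy_daily["wait_cost"]).all())
```

Because the target is a product of two of the predictors, a model that is linear in the standardised features can only approximate it; that is worth keeping in mind when reading the coefficients later.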
Train/Test Split & Standardisation
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Features & target
X = daily[['wait_days','no_show_rate','volume','sms_rate']].values
y = daily['wait_cost'].values
# Chronological split: first 75% days train, last 25% test
split = int(len(X)*0.75)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
# Standardize predictors for stable MCMC
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
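The point of fitting the scaler on the training days only is that the held-out days must not leak into the statistics used for scaling. The sketch below reproduces the same arithmetic with plain NumPy on a hypothetical 8-day matrix (the values and the `_demo` names are illustrative assumptions). StandardScaler uses the training mean and the population standard deviation (ddof=0), which is what `ndarray.std` computes by default.

```python
import numpy as np

# Hypothetical daily feature matrix: 8 days x 2 features, already in date order
X_demo = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [4.0, 15.0],
                   [5.0, 14.0], [6.0, 18.0], [7.0, 20.0], [8.0, 19.0]])

split_demo = int(len(X_demo) * 0.75)      # first 6 days train, last 2 days test
X_tr, X_te = X_demo[:split_demo], X_demo[split_demo:]

# Training-only statistics, as StandardScaler().fit(X_train) would store them
mu_tr = X_tr.mean(axis=0)
sd_tr = X_tr.std(axis=0)

X_tr_s = (X_tr - mu_tr) / sd_tr
X_te_s = (X_te - mu_tr) / sd_tr           # test rows reuse the training statistics

print(X_tr_s.mean(axis=0).round(6))       # approximately zero by construction
print(X_te_s.mean(axis=0).round(2))       # not zero: later days differ from the training mean
```

With a chronological split, the test rows are generally not centred at zero after scaling, so a trend over time shows up as test features drifting away from the training range.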
Define & Fit Bayesian Regression Model
Priors:
- α ∼ Normal(0, 10,000), broad intercept.
- β ∼ Normal(0, 1,000), reflecting uncertainty on each standardised covariate’s effect.
- σ ∼ HalfNormal(10,000), large residual scale to start.
Model: Linear predictor μ = α + β·X_standardized; observed daily cost ∼ Normal(μ, σ).
MCMC: 2,000 posterior draws (after 1,000 tuning) with target_accept=0.9 for stable inference.
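Written out in full, the model described above reads as follows. Here x_i is the standardised feature vector for day i, K = 4 is the number of predictors, and the second argument of each Normal is a variance, so the stated standard deviations appear squared. The notation is ours, introduced only to restate the priors and likelihood listed above.

```latex
\begin{aligned}
\alpha &\sim \mathcal{N}\!\left(0,\ (10^{4})^{2}\right), \\
\beta_k &\sim \mathcal{N}\!\left(0,\ (10^{3})^{2}\right), \qquad k = 1, \dots, K, \\
\sigma &\sim \text{HalfNormal}\!\left(10^{4}\right), \\
\mu_i &= \alpha + \sum_{k=1}^{K} \beta_k \, x_{ik}, \\
y_i \mid \alpha, \beta, \sigma &\sim \mathcal{N}\!\left(\mu_i,\ \sigma^{2}\right).
\end{aligned}
```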
import pymc3 as pm
with pm.Model() as wait_cost_model:
    # Priors
    α = pm.Normal("α", mu=0, sigma=1e4)
    β = pm.Normal("β", mu=0, sigma=1e3, shape=X_train_s.shape[1])
    σ = pm.HalfNormal("σ", sigma=1e4)
    # Linear predictor
    μ = α + pm.math.dot(X_train_s, β)
    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)
    # Sample posterior
    trace = pm.sample(
        draws=2000,
        tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
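Before trusting the fit, it can help to see what daily costs the stated priors imply on their own. The sketch below is a prior predictive simulation in plain NumPy rather than PyMC3; the standardised day `x_day` and the number of simulations are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_features = 5000, 4

# Draws from the priors stated above
alpha = rng.normal(0.0, 1e4, size=n_sims)
beta = rng.normal(0.0, 1e3, size=(n_sims, n_features))
sigma = np.abs(rng.normal(0.0, 1e4, size=n_sims))   # HalfNormal(1e4) as |Normal(0, 1e4)|

# Hypothetical standardised day: one standard deviation above average on every feature
x_day = np.ones(n_features)

# Simulated daily cost under the priors alone (no data involved)
mu = alpha + beta @ x_day
y_sim = rng.normal(mu, sigma)

lo, hi = np.percentile(y_sim, [3, 97])
print(f"94% of prior-predictive costs fall between {lo:,.0f} and {hi:,.0f}")
print(f"Share of simulated costs below zero: {(y_sim < 0).mean():.2f}")
```

Two things are worth comparing against `y_train`. First, the priors are centred on zero, so roughly half of the simulated costs are negative even though the target cannot be. Second, if the observed daily totals are much larger than 10,000, the intercept prior is no longer broad relative to the data, and centring it near the training mean with a wider scale may be a more reasonable starting point.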
Posterior Analysis & Point Predictions
- Posterior predictive: Sampling from Y_obs yields full predictive distributions, enabling both point forecasts (posterior means) and 94% Highest Posterior Density intervals.
- Evaluation: We compute MAE on held‑out days to gauge prediction accuracy.
import arviz as az
from sklearn.metrics import mean_absolute_error
# Posterior summary
az.summary(trace, round_to=2)
# Posterior predictive sampling
with wait_cost_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])
# Extract posterior means
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values
# Point predictions
y_pred = α_post + X_test_s.dot(β_post)
# Evaluate MAE
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f} per day")
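MAE only scores the point forecasts. To illustrate how posterior draws turn into a 94% highest-density interval for each held-out day, and how often the observed cost lands inside it, here is a minimal NumPy sketch. The posterior draws, the two-feature test rows, and the helper `hdi_1d` are hypothetical stand-ins rather than output from the fitted model; in practice the draws would come from `trace.posterior`.

```python
import numpy as np

def hdi_1d(samples, prob=0.94):
    """Shortest interval containing about `prob` of the samples (unimodal case)."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    k = int(np.floor(prob * n))        # index offset spanned by each candidate interval
    widths = x[k:] - x[: n - k]        # width of every candidate interval
    i = int(np.argmin(widths))
    return x[i], x[i + k]

rng = np.random.default_rng(1)

# Hypothetical posterior draws (assumed values, not results from the tutorial's model)
n_draws = 4000
alpha_d = rng.normal(5000.0, 50.0, n_draws)
beta_d = rng.normal([800.0, -100.0], 30.0, size=(n_draws, 2))
sigma_d = np.abs(rng.normal(300.0, 20.0, n_draws))

# Hypothetical standardised held-out days and their observed costs
X_new = np.array([[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]])
y_new = np.array([5600.0, 6100.0, 4500.0])

# Predictive draws: one simulated cost per posterior draw and per day
mu_d = alpha_d[:, None] + beta_d @ X_new.T       # shape (n_draws, n_days)
y_draws = rng.normal(mu_d, sigma_d[:, None])

point = y_draws.mean(axis=0)                     # posterior predictive mean per day
bounds = np.array([hdi_1d(y_draws[:, j]) for j in range(X_new.shape[0])])
covered = (y_new >= bounds[:, 0]) & (y_new <= bounds[:, 1])

print(point.round(0))
print(bounds.round(0))
print(f"Interval coverage on these days: {covered.mean():.2f}")
```

On real held-out data, coverage well below 94% would suggest the intervals are too narrow, which MAE alone would not reveal.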
Visualise Predictions & Credible Intervals
By varying one key feature (mean wait_days) while holding others at their median, we plot the posterior mean cost curve with its 94% credible band—illustrating both the expected cost impact and our uncertainty.
import numpy as np
import matplotlib.pyplot as plt
# Vary mean wait_days; hold other features at median
grid_wait = np.linspace(X_train_s[:,0].min(), X_train_s[:,0].max(), 100)
grid = np.tile(np.median(X_train_s, axis=0), (100,1))
grid[:,0] = grid_wait
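# The model above stores X_train_s as a fixed array rather than a pm.Data container,
# so pm.set_data cannot swap in the grid; simulate predictions from the posterior draws instead
post = trace.posterior
α_draws = post["α"].values.reshape(-1)
β_draws = post["β"].values.reshape(-1, grid.shape[1])
σ_draws = post["σ"].values.reshape(-1)
μ_grid = α_draws[:, None] + β_draws @ grid.T   # shape (n_draws, 100)
preds = np.random.default_rng(42).normal(μ_grid, σ_draws[:, None])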
mean_pred = preds.mean(axis=0)
hpd = az.hdi(preds, hdi_prob=0.94)
# Back‐transform wait_days
wait_orig = scaler.inverse_transform(
np.column_stack([grid[:,0],grid[:,1],grid[:,2],grid[:,3]])
)[:,0]
plt.figure(figsize=(8,5))
plt.plot(wait_orig, mean_pred, label="Posterior mean")
plt.fill_between(wait_orig, hpd[:,0], hpd[:,1], alpha=0.3,
label="94% credible interval")
plt.scatter(scaler.inverse_transform(X_test_s)[:,0], y_test,
color="k", alpha=0.5, label="Test data")
plt.xlabel("Average Wait Days")
plt.ylabel("Daily Wait Cost (USD)")
plt.title("Bayesian Regression: Wait Cost vs. Wait Days")
plt.legend()
plt.tight_layout()
plt.show()
Summary
This Bayesian Regression workflow for clinic wait‐time cost forecasting provides:
1. Point estimates for daily wait-time cost based on early indicators (mean wait days, no-show rate, volume, SMS reminders), with accuracy checked by MAE on held-out days.
2. Credible intervals quantifying uncertainty—critical for staffing and budget planning under variable demand.
3. Actionable insights: clinic managers can schedule staff and resources with both expected cost and uncertainty bounds, optimising service levels and financial performance.