Clinic Operation Cost Prediction using Bayesian Regression in ML


Healthcare administrators need to forecast the daily operational costs of an outpatient clinic—before staffing and supply decisions are made—using early‑day indicators such as patient arrival rate, average consultation time, number of procedures, staff count, and supply usage. Clinic costs scale nonlinearly (e.g., high patient volume may trigger overtime pay, and procedure mix drives supply spikes) and are subject to uncertainty from no‑shows, emergency walk‑ins, and variable supply prices. A single point‑estimate hides this risk. By applying Bayesian Regression, we derive both:

1. A point forecast of expected daily clinic cost.

2. A credible interval quantifying our uncertainty—enabling risk‑aware staffing, supply ordering, and budget planning.

Libraries Required

import pandas as pd                              # data loading & manipulation  
import numpy as np                               # numerical operations  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

import pymc3 as pm                               # Bayesian modeling  
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error

Dataset

Medical Facility Operational Data

Step-by-Step Code Implementation

Data Loading & Synthetic Cost Computation

Synthetic target: We translate operational drivers into a daily_cost combining labour (consult min × $50/hr), procedures ($200 each), staff ($300/day), and supplies ($5/unit).

import pandas as pd

# Load operational data
df = pd.read_csv("data/medical-facility-operational-data.csv")

# Select and rename relevant fields
# Assume columns: 'Date','WalkIn_Count','Scheduled_Count','Avg_Consult_Min',
#                'Procedure_Count','Staff_Count','Supply_Units_Used'
df = df.rename(columns={
    'WalkIn_Count':     'walkins',
    'Scheduled_Count':  'scheduled',
    'Avg_Consult_Min':  'consult_time',
    'Procedure_Count':  'procedures',
    'Staff_Count':      'staff',
    'Supply_Units_Used':'supplies'
})

# Compute synthetic daily cost:
# - $50 per hour of consultation time, i.e. patient-minutes * 50/60 (labour)
# - $200 per procedure (equipment & supplies)
# - $300 per staff member per day (salary/fringe)
# - $5 per supply unit used
df['daily_cost'] = (
    (df['walkins'] + df['scheduled']) * df['consult_time'] * 50/60
  + df['procedures'] * 200
  + df['staff'] * 300
  + df['supplies'] * 5
)

# Features & target
features = ['walkins','scheduled','consult_time','procedures','staff','supplies']
X = df[features].values
y = df['daily_cost'].values  # USD/day
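
As a quick check on the cost formula, the arithmetic for a single day can be worked through by hand. The sketch below restates the same rates in a small helper; the function name `synthetic_daily_cost` and the example day are hypothetical and are not part of the dataset.

```python
def synthetic_daily_cost(walkins, scheduled, consult_time,
                         procedures, staff, supplies):
    """Daily cost in USD using the rates stated above."""
    # Total consultation minutes, billed at $50 per hour
    labour = (walkins + scheduled) * consult_time * 50 / 60
    return labour + procedures * 200 + staff * 300 + supplies * 5

# Hypothetical day: 20 walk-ins, 40 scheduled, 15-minute consults,
# 10 procedures, 8 staff, 120 supply units
example_cost = synthetic_daily_cost(20, 40, 15, 10, 8, 120)
print(example_cost)  # 750 + 2000 + 2400 + 600 = 5750.0
```

Note that the labour term multiplies patient count by consultation time, so the synthetic target contains an interaction that a purely additive linear model can only approximate.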

Preprocessing & Train/Test Split

Scaling: centres each feature at zero with unit variance, using statistics from the training split only, so that the same Normal(0,1) prior applies to every β and MCMC sampling is better conditioned.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Random 80/20 split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize features for stable MCMC
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s  = scaler.transform(X_test)
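
To make the scaling step concrete, here is a minimal NumPy sketch of what standardisation with training-only statistics does. The arrays are randomly generated stand-ins (not the clinic data), and the manual formula is assumed to mirror StandardScaler, which also divides by the population standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical raw features on very different scales:
# daily patient count and supply units used
X_train_demo = rng.normal(loc=[60, 500], scale=[10, 80], size=(200, 2))
X_test_demo = rng.normal(loc=[60, 500], scale=[10, 80], size=(50, 2))

mu = X_train_demo.mean(axis=0)
sd = X_train_demo.std(axis=0)   # ddof=0, the population standard deviation

X_train_demo_s = (X_train_demo - mu) / sd
X_test_demo_s = (X_test_demo - mu) / sd   # reuse the training statistics

print(X_train_demo_s.mean(axis=0).round(3), X_train_demo_s.std(axis=0).round(3))
print(X_test_demo_s.mean(axis=0).round(3))
```

The training columns come out with mean 0 and standard deviation 1 by construction, while the test columns are only approximately centred. That is expected: fitting the scaler on the test split as well would leak information from the held-out days.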

Define & Fit Bayesian Regression Model

Priors:

α ∼ Normal(mean_cost, sd_cost) centers the intercept on observed costs;
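β ∼ Normal(0,1) shrinks each standardised driver's effect toward zero; because the target is left in raw dollars, this is a tight prior (roughly ±$2 per standard deviation of a driver), so widen it (e.g., to sd_cost) or standardise the target if the data are too few to overwhelm it;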
σ ∼ HalfNormal(sd_cost) enforces positive residual noise.

Model: Observed cost ∼ Normal(α + β·X_std, σ).

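Inference: 2,000 posterior draws per chain (after 1,000 tuning steps) with target_accept=0.9, which makes NUTS take smaller steps and usually reduces divergences. Convergence should still be checked via r_hat and effective sample size in the posterior summary.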
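
Before fitting, it can help to simulate from the priors and see what costs they imply. The sketch below is a plain NumPy prior predictive simulation, not PyMC3 code; `mean_cost` and `sd_cost` are hypothetical stand-ins for `y_train.mean()` and `y_train.std()`.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_features = 1000, 6

# Hypothetical stand-ins for y_train.mean() and y_train.std()
mean_cost, sd_cost = 6000.0, 1200.0

alpha = rng.normal(mean_cost, sd_cost, size=n_draws)
beta = rng.normal(0.0, 1.0, size=(n_draws, n_features))
sigma = np.abs(rng.normal(0.0, sd_cost, size=n_draws))   # HalfNormal(sd_cost)

# A standardised day that is +2 SD on every driver
x_busy = np.full(n_features, 2.0)
mu_busy = alpha + beta @ x_busy
y_busy = rng.normal(mu_busy, sigma)      # prior predictive cost for that day

# How far do the priors let a busy day move the expected cost?
shift = mu_busy - alpha
print(f"prior SD of busy-day shift: {shift.std():.2f} USD")
print(f"prior predictive SD of cost: {y_busy.std():.0f} USD")
```

Under these assumptions the busy-day shift has a prior standard deviation of about 2·√6 ≈ 5 USD, tiny next to the spread of the intercept. If that looks implausible for the clinic in question, the β prior scale is the parameter to revisit.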

import pymc3 as pm

with pm.Model() as clinic_cost_model:
    # Priors
    α = pm.Normal("α", mu=y_train.mean(), sigma=y_train.std())  
    β = pm.Normal("β", mu=0, sigma=1, shape=X_train_s.shape[1])
    σ = pm.HalfNormal("σ", sigma=y_train.std())

    # Linear predictor
    μ = α + pm.math.dot(X_train_s, β)

    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)

    # MCMC sampling
    trace = pm.sample(
        draws=2000,
        tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
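
Whether the chains actually mixed is usually judged with the R-hat statistic. As an illustration of the idea only, here is a classic Gelman-Rubin R-hat in NumPy applied to simulated chains; ArviZ reports a more robust rank-normalised split variant in the `r_hat` column of `az.summary`, so this simplified function is an assumption-laden sketch rather than a replacement.

```python
import numpy as np

def gelman_rubin(chains):
    """Classic R-hat for an array of shape (n_chains, n_draws)."""
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)     # variance of chain means, scaled by n
    var_hat = (n - 1) / n * within + between / n      # pooled variance estimate
    return float(np.sqrt(var_hat / within))

rng = np.random.default_rng(2)
mixed = rng.normal(0.0, 1.0, size=(4, 1000))                  # four chains, same target
stuck = mixed + np.array([0.0, 0.0, 3.0, 3.0])[:, None]       # two chains sit elsewhere

print(f"well-mixed chains: {gelman_rubin(mixed):.3f}")
print(f"stuck chains:      {gelman_rubin(stuck):.3f}")
```

Values close to 1 suggest the chains agree; values well above 1 (a common rule of thumb is anything over about 1.01) suggest the sampler needs more tuning, more draws, or a reparameterised model.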

Posterior Analysis & Point Predictions

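Posterior predictive: Sampling Y_obs yields full predictive distributions for the training days the model was built on. For held‑out days, the point forecast is computed from the posterior means of α and β; 94% Highest Posterior Density intervals can be derived the same way from the full set of posterior draws.
Evaluation: MAE on held‑out days quantifies average cost‑forecast error.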

import arviz as az
from sklearn.metrics import mean_absolute_error

# Summarize the posterior
az.summary(trace, round_to=2)

# Posterior predictive sampling
with clinic_cost_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

# Compute point predictions
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values
y_pred = α_post + X_test_s.dot(β_post)

# Evaluate MAE
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f} per day")
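
MAE scores only the point forecast. A complementary check is whether the 94% intervals actually contain about 94% of held-out costs. The sketch below shows that calculation in NumPy with simulated stand-ins for the posterior draws and test data (all values are invented for illustration), and it uses a central percentile interval rather than an HDI for simplicity.

```python
import numpy as np

rng = np.random.default_rng(3)
n_draws, n_test, n_features = 2000, 40, 6

# Simulated stand-ins for posterior draws of α, β, σ and for held-out data
alpha_draws = rng.normal(6000, 20, size=n_draws)
beta_draws = rng.normal(300, 10, size=(n_draws, n_features))
sigma_draws = np.abs(rng.normal(150, 5, size=n_draws))
X_test_demo = rng.normal(size=(n_test, n_features))
y_test_demo = (6000 + X_test_demo @ np.full(n_features, 300.0)
               + rng.normal(0, 150, size=n_test))

# Posterior predictive draws per held-out day: shape (n_draws, n_test)
mu_draws = alpha_draws[:, None] + beta_draws @ X_test_demo.T
y_draws = rng.normal(mu_draws, sigma_draws[:, None])

# Central 94% interval and the share of test days falling inside it
lower, upper = np.percentile(y_draws, [3, 97], axis=0)
coverage = np.mean((y_test_demo >= lower) & (y_test_demo <= upper))
mae = np.mean(np.abs(y_test_demo - y_draws.mean(axis=0)))
print(f"coverage: {coverage:.2f}, MAE: {mae:.1f} USD")
```

Coverage far below the nominal level would suggest the intervals are too narrow (overconfident); coverage near 100% on many days would suggest they are wider than needed.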

Visualise Predictions & Credible Intervals

By varying walk‑in count and holding other features fixed, we plot both the expected cost curve and its credible band—revealing how volume drives cost and our uncertainty around it.

import numpy as np
import matplotlib.pyplot as plt

# Sweep walkins while holding other features at median
grid = np.median(X_train_s, axis=0)[None,:].repeat(100, axis=0)
walkin_vals = np.linspace(X_train_s[:,0].min(), X_train_s[:,0].max(), 100)
grid[:,0] = walkin_vals

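# The model was built on X_train_s, so sample_posterior_predictive would
# return draws for the training days, not for the grid. Build the
# predictive draws for the grid directly from the posterior samples instead.
post = trace.posterior
α_s = post["α"].values.ravel()                       # (n_samples,)
β_s = post["β"].values.reshape(-1, grid.shape[1])    # (n_samples, n_features)
σ_s = post["σ"].values.ravel()                       # (n_samples,)

rng = np.random.default_rng(42)
idx = rng.choice(α_s.size, size=min(1000, α_s.size), replace=False)  # thin to at most 1,000 draws
μ_grid = α_s[idx, None] + β_s[idx] @ grid.T          # expected cost, (n_draws, 100)
preds  = rng.normal(μ_grid, σ_s[idx, None])          # add observation noise

mean_pred = preds.mean(axis=0)
hpd       = az.hdi(preds[None, ...], hdi_prob=0.94)  # (chain, draw, grid) -> (100, 2)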

# Back‑transform walk‑in counts
walkin_orig = scaler.inverse_transform(grid)[:,0]

plt.figure(figsize=(8,5))
plt.plot(walkin_orig, mean_pred, label="Posterior mean")
plt.fill_between(walkin_orig, hpd[:,0], hpd[:,1], alpha=0.3,
                 label="94% credible interval")
plt.scatter(
    scaler.inverse_transform(X_test_s)[:,0],
    y_test, color="k", alpha=0.5, label="Test data"
)
plt.xlabel("Daily Walk‑In Count")
plt.ylabel("Daily Clinic Cost (USD)")
plt.title("Bayesian Regression: Cost vs. Walk‑Ins")
plt.legend()
plt.tight_layout()
plt.show()
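
Because the model is fitted on standardised features, each β is the cost change per one standard deviation of a driver. To read effects in natural units (for example, dollars per additional walk-in), the coefficients can be mapped back with the scaler's mean and scale. The sketch below uses invented data and hypothetical posterior means to show the conversion and to confirm that both parameterisations give the same prediction.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical raw drivers: walk-ins and scheduled visits per day
X_raw = rng.normal(loc=[25, 40], scale=[6, 9], size=(300, 2))
mu, sd = X_raw.mean(axis=0), X_raw.std(axis=0)

# Hypothetical posterior means on the standardised scale
alpha_std = 6000.0
beta_std = np.array([120.0, 180.0])

# Convert to the original scale
beta_raw = beta_std / sd                             # USD per one extra unit of each driver
alpha_raw = alpha_std - np.sum(beta_std * mu / sd)   # intercept at zero patients

# Both forms should predict the same cost for a new day
x_new = np.array([30.0, 45.0])
pred_std = alpha_std + ((x_new - mu) / sd) @ beta_std
pred_raw = alpha_raw + x_new @ beta_raw
print(beta_raw.round(2), round(float(pred_std), 2), round(float(pred_raw), 2))
```

With a fitted scikit-learn scaler, `scaler.mean_` and `scaler.scale_` would play the roles of `mu` and `sd` here. Applying the same division to every posterior draw of β, rather than only to the mean, gives credible intervals in the original units.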

Summary

This Bayesian Regression workflow for Clinic Operation Cost Prediction provides:

1. Point estimates of daily clinic cost from early operational metrics.

2. Credible intervals that quantify forecast uncertainty—crucial for staffing, supply ordering, and budget risk management.

3. Actionable insights: healthcare administrators can allocate resources, set fee schedules, and plan contingencies with explicit cost‑risk bounds—optimising both patient service and financial stewardship.
