Clinic Operation Cost Prediction using Bayesian Regression in ML

Healthcare administrators need to forecast the daily operational costs of an outpatient clinic—before staffing and supply decisions are made—using early‑day indicators such as patient arrival rate, average consultation time, number of procedures, staff count, and supply usage. Clinic costs scale nonlinearly (e.g., high patient volume may trigger overtime pay, and procedure mix drives supply spikes) and are subject to uncertainty from no‑shows, emergency walk‑ins, and variable supply prices. A single point‑estimate hides this risk. By applying Bayesian Regression, we derive both:

1. A point forecast of expected daily clinic cost.

2. A credible interval quantifying our uncertainty—enabling risk‑aware staffing, supply ordering, and budget planning.

Libraries Required

import pandas as pd                              # data loading & manipulation  
import numpy as np                               # numerical operations  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

import pymc3 as pm                               # Bayesian modeling  
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error  

Dataset

Medical Facility Operational Data

Step-by-Step Code Implementation

Data Loading & Synthetic Cost Computation

Synthetic target: We translate the operational drivers into a synthetic daily_cost that sums four terms: labour (total consultation minutes billed at $50/hr), procedures ($200 each), staff ($300 per staff member per day), and supplies ($5/unit).

import pandas as pd

# Load operational data
df = pd.read_csv("data/medical-facility-operational-data.csv")

# Select and rename relevant fields
# Assume columns: 'Date','WalkIn_Count','Scheduled_Count','Avg_Consult_Min',
#                'Procedure_Count','Staff_Count','Supply_Units_Used'
df = df.rename(columns={
    'WalkIn_Count':     'walkins',
    'Scheduled_Count':  'scheduled',
    'Avg_Consult_Min':  'consult_time',
    'Procedure_Count':  'procedures',
    'Staff_Count':      'staff',
    'Supply_Units_Used':'supplies'
})

# Compute synthetic daily cost:
# - $50 per hour of consultation time, i.e. 50/60 USD per patient-minute (labor)
# - $200 per procedure (equipment & supplies)
# - $300 per staff member per day (salary/fringe)
# - $5 per supply unit used
df['daily_cost'] = (
    (df['walkins'] + df['scheduled']) * df['consult_time'] * 50/60
  + df['procedures'] * 200
  + df['staff'] * 300
  + df['supplies'] * 5
)

# Features & target
features = ['walkins','scheduled','consult_time','procedures','staff','supplies']
X = df[features].values
y = df['daily_cost'].values  # USD/day
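
As a quick sanity check on the cost formula above, the following sketch recomputes daily_cost for a single made-up day. The input values are illustrative assumptions, not rows from the dataset; only the rates ($50/hr, $200, $300, $5) come from the formula.

```python
# Hypothetical inputs for one clinic day (illustrative values only)
walkins, scheduled = 30, 50      # patients
consult_time = 15                # average minutes per consultation
procedures, staff, supplies = 10, 8, 120

# Same four terms as the daily_cost formula
labor_cost     = (walkins + scheduled) * consult_time * 50 / 60  # $50 per hour
procedure_cost = procedures * 200
staff_cost     = staff * 300
supply_cost    = supplies * 5

daily_cost = labor_cost + procedure_cost + staff_cost + supply_cost
print(f"labor={labor_cost:.0f}, procedures={procedure_cost}, "
      f"staff={staff_cost}, supplies={supply_cost}, total={daily_cost:.0f}")
```

For this example day, 80 patients at 15 minutes each give 1,200 consultation minutes, or $1,000 of labour, and the four terms add up to $6,000.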

Preprocessing & Train/Test Split

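Scaling: Each feature is centered to zero mean and scaled to unit variance, using statistics from the training split only, so that the Normal(0,1) prior on every β coefficient applies on the same scale and MCMC sampling is more stable.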

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Random 80/20 split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize features for stable MCMC
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s  = scaler.transform(X_test)
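
To make explicit what the scaler does, here is a small NumPy sketch of the same transformation on made-up numbers. It assumes StandardScaler's default behaviour (population standard deviation, statistics taken from the training rows only); the arrays and their names are hypothetical.

```python
import numpy as np

# Hypothetical raw features: columns are (walkins, consult_time)
X_train_demo = np.array([[20.0, 10.0], [30.0, 15.0], [40.0, 20.0], [50.0, 35.0]])
X_test_demo  = np.array([[35.0, 20.0]])

# Statistics come from the training rows only, so no test information leaks in
mu_train = X_train_demo.mean(axis=0)
sd_train = X_train_demo.std(axis=0)   # ddof=0, matching StandardScaler's default

X_train_demo_s = (X_train_demo - mu_train) / sd_train
X_test_demo_s  = (X_test_demo - mu_train) / sd_train

print(X_train_demo_s.mean(axis=0))  # approximately 0 per column
print(X_train_demo_s.std(axis=0))   # approximately 1 per column
print(X_test_demo_s)                # test row expressed in training-set units
```

The test row happens to sit exactly at the training means, so it maps to zero in both columns; any other row would be expressed as a number of training standard deviations from the mean.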

Define & Fit Bayesian Regression Model

Priors:

  • α ∼ Normal(mean_cost, sd_cost) centers the intercept on observed costs;
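  • β ∼ Normal(0,1) puts a unit‑scale prior on the effect of each standardised driver. Because the target stays in dollars, this prior is tight relative to typical cost effects and shrinks the coefficients toward zero; a wider scale such as sd_cost may be more appropriate;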
  • σ ∼ HalfNormal(sd_cost) enforces positive residual noise.

Model: Observed cost ∼ Normal(α + β·X_std, σ).

Inference: 2,000 posterior draws per chain after 1,000 tuning steps, with target_accept=0.9 to make the sampler take smaller steps and reduce the risk of divergences.

import pymc3 as pm

with pm.Model() as clinic_cost_model:
    # Priors
    α = pm.Normal("α", mu=y_train.mean(), sigma=y_train.std())  
    β = pm.Normal("β", mu=0, sigma=1, shape=X_train_s.shape[1])
    σ = pm.HalfNormal("σ", sigma=y_train.std())

    # Linear predictor
    μ = α + pm.math.dot(X_train_s, β)

    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)

    # MCMC sampling
    trace = pm.sample(
        draws=2000,
        tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
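
One thing worth checking is how the β ∼ Normal(0,1) prior interacts with a target measured in dollars. The sketch below is not the PyMC3 model: it uses the closed-form posterior mean for a linear regression with a Normal(0, prior_sd) prior on β and, as a simplifying assumption, a known noise level. The simulated data, the true coefficients, and the helper name posterior_mean_beta are all hypothetical.

```python
import numpy as np

def posterior_mean_beta(X, y, prior_sd, noise_sd):
    """Posterior mean of beta for y ~ Normal(X @ beta, noise_sd) with
    beta ~ Normal(0, prior_sd) and noise_sd treated as known."""
    lam = (noise_sd / prior_sd) ** 2            # ridge-style penalty
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(0)
n = 200
X = rng.standard_normal((n, 2))                 # stand-in for standardised features
X = X - X.mean(axis=0)                          # center so no intercept is needed
true_beta = np.array([300.0, -150.0])           # USD per standard deviation (made up)
noise_sd = 100.0
y = X @ true_beta + rng.normal(0.0, noise_sd, size=n)
y_centered = y - y.mean()

# Same data, two prior scales: unit scale vs. the scale of the target
beta_tight = posterior_mean_beta(X, y_centered, prior_sd=1.0, noise_sd=noise_sd)
beta_wide  = posterior_mean_beta(X, y_centered, prior_sd=y.std(), noise_sd=noise_sd)

print("prior sd = 1     :", beta_tight.round(1))
print("prior sd = sd(y) :", beta_wide.round(1))
```

In this toy setting the unit-scale prior pulls both coefficients almost to zero, while a prior on the scale of the target leaves them close to the simulated values. In the full model σ is estimated rather than fixed, so the effect will differ in detail; comparing az.summary output under both prior scales is the more direct check.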

Posterior Analysis & Point Predictions

  • Posterior predictive: Sampling yields full predictive distributions for the training days, from which 94% Highest Posterior Density intervals can be extracted. Point forecasts for the held‑out days are computed from the posterior means of α and β.
  • Evaluation: MAE on held‑out days quantifies the average cost‑forecast error.

import arviz as az
from sklearn.metrics import mean_absolute_error

# Summarize the posterior
print(az.summary(trace, round_to=2))

# Posterior predictive sampling
with clinic_cost_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

# Compute point predictions
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values
y_pred = α_post + X_test_s.dot(β_post)

# Evaluate MAE
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f} per day")
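
MAE only scores the point forecast. A complementary check is how often the held-out costs fall inside the predictive interval. The sketch below uses simulated stand-ins for the posterior draws (alpha_s, beta_s, sigma_s) and for the test data, and it uses equal-tailed percentile intervals rather than HDIs for simplicity; with the real trace these arrays would come from trace.posterior.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_test, n_feat = 2000, 200, 3

# Stand-in posterior draws (hypothetical values, tightly concentrated)
alpha_s = rng.normal(5000.0, 10.0, size=n_draws)
beta_s  = rng.normal([400.0, 250.0, -100.0], 10.0, size=(n_draws, n_feat))
sigma_s = np.abs(rng.normal(150.0, 5.0, size=n_draws))

# Stand-in held-out data generated from the same linear model
X_test_demo = rng.standard_normal((n_test, n_feat))
noise = rng.normal(0.0, 150.0, size=n_test)
y_test_demo = 5000.0 + X_test_demo @ np.array([400.0, 250.0, -100.0]) + noise

# One predictive draw per posterior draw and test day
mu    = alpha_s[:, None] + beta_s @ X_test_demo.T        # (n_draws, n_test)
preds = rng.normal(mu, sigma_s[:, None])                 # adds observation noise

lower, upper = np.percentile(preds, [3, 97], axis=0)     # 94% equal-tailed
coverage = np.mean((y_test_demo >= lower) & (y_test_demo <= upper))
mae = np.mean(np.abs(y_test_demo - mu.mean(axis=0)))

print(f"MAE: {mae:.1f}  coverage of 94% interval: {coverage:.2%}")
```

If the model is well calibrated, coverage should land near the nominal 94%. Coverage far below that suggests the intervals are too narrow; coverage near 100% suggests they are wider than necessary.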

Visualise Predictions & Credible Intervals

By varying walk‑in count and holding other features fixed, we plot both the expected cost curve and its credible band—revealing how volume drives cost and our uncertainty around it.

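import numpy as np
import matplotlib.pyplot as plt
import arviz as az

# Sweep walkins while holding other features at median
grid = np.median(X_train_s, axis=0)[None,:].repeat(100, axis=0)
walkin_vals = np.linspace(X_train_s[:,0].min(), X_train_s[:,0].max(), 100)
grid[:,0] = walkin_vals

# Predict on the grid directly from the posterior draws. Calling
# pm.sample_posterior_predictive here would re-simulate the training rows,
# because the model was built on X_train_s, not on the grid.
post = trace.posterior
α_s = post["α"].values.reshape(-1)                  # (n_draws,)
β_s = post["β"].values.reshape(-1, grid.shape[1])   # (n_draws, n_features)
σ_s = post["σ"].values.reshape(-1)                  # (n_draws,)

μ_grid = α_s[:, None] + β_s @ grid.T                # (n_draws, 100)
rng    = np.random.default_rng(42)
preds  = rng.normal(μ_grid, σ_s[:, None])           # adds observation noise

mean_pred = μ_grid.mean(axis=0)
hpd       = np.array([az.hdi(col, hdi_prob=0.94) for col in preds.T])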

# Back‑transform walk‑in counts
walkin_orig = scaler.inverse_transform(grid)[:,0]

plt.figure(figsize=(8,5))
plt.plot(walkin_orig, mean_pred, label="Posterior mean")
plt.fill_between(walkin_orig, hpd[:,0], hpd[:,1], alpha=0.3,
                 label="94% credible interval")
plt.scatter(
    scaler.inverse_transform(X_test_s)[:,0],
    y_test, color="k", alpha=0.5, label="Test data"
)
plt.xlabel("Daily Walk‑In Count")
plt.ylabel("Daily Clinic Cost (USD)")
plt.title("Bayesian Regression: Cost vs. Walk‑Ins")
plt.legend()
plt.tight_layout()
plt.show()
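
For readers curious what az.hdi computes, the following is a minimal NumPy sketch of a highest-density interval for a one-dimensional, unimodal sample: the narrowest window that contains the requested share of the sorted draws. The function name hdi_1d and the exponential stand-in sample are assumptions for illustration; ArviZ's own implementation handles more cases.

```python
import numpy as np

def hdi_1d(samples, prob=0.94):
    """Narrowest interval containing roughly `prob` of the samples (unimodal case)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.floor(prob * n))          # index offset spanned by each window
    widths = x[k:] - x[:n - k]           # width of every candidate window
    i = int(np.argmin(widths))           # start of the narrowest one
    return x[i], x[i + k]

rng = np.random.default_rng(2)
draws = rng.exponential(scale=1.0, size=10_000)   # skewed stand-in sample

hdi_lo, hdi_hi = hdi_1d(draws, prob=0.94)
et_lo, et_hi   = np.percentile(draws, [3, 97])    # equal-tailed, for comparison

print(f"HDI:          [{hdi_lo:.3f}, {hdi_hi:.3f}]")
print(f"Equal-tailed: [{et_lo:.3f}, {et_hi:.3f}]")
```

On a skewed sample like this one, the HDI starts close to zero, where the density is highest, and is narrower than the equal-tailed interval. For a roughly symmetric predictive distribution the two intervals nearly coincide.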

Summary

This Bayesian Regression workflow for Clinic Operation Cost Prediction provides:

1. Point estimates of daily clinic cost from early operational metrics.

2. Credible intervals that quantify forecast uncertainty—crucial for staffing, supply ordering, and budget risk management.

3. Actionable insights: healthcare administrators can allocate resources, set fee schedules, and plan contingencies with explicit cost‑risk bounds—optimising both patient service and financial stewardship.

ProjectGurukul Team
