ER Wait Time Trend Prediction with Polynomial Regression in ML

Hospital administrators and operations analysts need to forecast the average emergency-room (ER) wait time, in minutes, from early-shift indicators (patient arrival rate, triage severity mix, staffing level, and time of day) so they can allocate resources proactively and reduce bottlenecks. Historical ER data exhibit nonlinear wait-time curves: at low arrival rates, additional staff sharply reduce wait times, but beyond a threshold further staffing yields diminishing returns; severity mix and time of day interact to amplify peaks during evening hours. A simple linear model underfits these dynamics, while a naïve high-degree polynomial overfits. By applying Polynomial Regression on engineered features with Ridge (L2) regularisation, we can capture smooth, interpretable wait-time curves that generalise more reliably to new demand patterns.
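To make the underfitting argument concrete, here is a minimal NumPy-only sketch. It assumes a hypothetical diminishing-returns curve (wait time falling exponentially with staff count) and fits closed-form Ridge models of degree 1, 2 and 3. The curve, noise level and alpha value are illustrative assumptions, not values taken from the ER dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical curve: large gains from the first few staff, then a plateau.
staff = rng.uniform(1, 10, size=400)
wait = 10 + 90 * np.exp(-0.4 * staff) + rng.normal(0, 2, size=400)

# Hold out the second half of the sample for testing.
s_train, s_test = staff[:200], staff[200:]
w_train, w_test = wait[:200], wait[200:]
mu, sigma = s_train.mean(), s_train.std()

def poly_design(x, degree):
    """Standardise x with training statistics, then stack powers 1..degree."""
    z = (x - mu) / sigma
    return np.column_stack([z ** d for d in range(1, degree + 1)])

def ridge_fit(X, y, alpha):
    """Closed-form Ridge; the intercept is left unpenalised by centring."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    return coef, y_mean - x_mean @ coef

def holdout_rmse(degree, alpha=1.0):
    coef, intercept = ridge_fit(poly_design(s_train, degree), w_train, alpha)
    pred = poly_design(s_test, degree) @ coef + intercept
    return float(np.sqrt(np.mean((w_test - pred) ** 2)))

rmse_by_degree = {d: holdout_rmse(d) for d in (1, 2, 3)}
for d, r in rmse_by_degree.items():
    print(f"degree {d}: hold-out RMSE = {r:.2f} min")
```

On this toy curve the straight-line fit should show a clearly higher hold-out error than the curved fits, which is the behaviour the full pipeline below is designed to handle.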

Libraries Required

Purpose                        Library
Data loading & handling        pandas, numpy
Visualization                  matplotlib, seaborn
Feature preprocessing          scikit-learn: ColumnTransformer, StandardScaler, PolynomialFeatures
Categorical encoding           scikit-learn: OneHotEncoder
Regression & model selection   scikit-learn: Ridge, train_test_split, GridSearchCV
Evaluation                     scikit-learn: mean_squared_error, r2_score

Dataset

Simulated ER Wait Time Dataset

Step-by-Step Code Implementation

1. Import Libraries & Load Data

import pandas as pd
import numpy as np

# Load the simulated ER wait-time data
df = pd.read_csv("er_wait_time.csv")

# Inspect key columns
df.head()[[
    'arrival_rate_per_hr',   # patients/hour
    'pct_high_severity',     # % of arrivals triaged as high severity
    'staff_count',           # number of doctors on duty
    'hour_of_day',           # 0–23
    'wait_time_min'          # observed avg wait time (target)
]]
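If er_wait_time.csv is not at hand, a stand-in with the same five columns can be generated. The sketch below is only an assumption about what such data might look like: the function name simulate_er_data and every coefficient in the wait-time formula are hypothetical and are not derived from the actual file.

```python
import numpy as np
import pandas as pd

def simulate_er_data(n_rows=2000, seed=42):
    """Build a toy ER dataset with the column names used in this tutorial.

    All distributions and coefficients are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    hour = rng.integers(0, 24, size=n_rows)                 # 0..23
    evening = ((hour >= 17) & (hour <= 22)).astype(float)   # assumed peak window
    arrival = rng.gamma(shape=4.0, scale=2.5, size=n_rows) + 3.0 * evening
    severity = rng.uniform(5, 40, size=n_rows)              # % high severity
    staff = rng.integers(2, 11, size=n_rows)                # 2..10 doctors

    load = arrival / staff                                  # patients per doctor
    wait = (
        5
        + 12 * load
        + 1.5 * load ** 2                                   # curvature
        + 0.3 * severity * (1 + evening)                    # severity x evening interaction
        + rng.normal(0, 4, size=n_rows)
    )
    return pd.DataFrame({
        "arrival_rate_per_hr": arrival,
        "pct_high_severity": severity,
        "staff_count": staff,
        "hour_of_day": hour,
        "wait_time_min": np.clip(wait, 0, None),            # no negative waits
    })

sim_df = simulate_er_data()
print(sim_df.describe().round(1))
```

Calling sim_df.to_csv("er_wait_time.csv", index=False) would then let the loading step above run unchanged.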

2. Feature Engineering & Exploratory Analysis

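PolynomialFeatures (used in the pipeline in step 4) expands the three scaled numeric features and the 23 hour-of-day dummies into higher-order terms such as squares and pairwise interactions (e.g. arrival_rate_per_hr², arrival_rate_per_hr × pct_high_severity, staff_count × hour_of_day_18), modelling curvature and cross-effects such as staff efficacy under peak loads. Before building those features, a scatter plot helps confirm that the relationship between arrival rate and wait time is in fact nonlinear.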

import seaborn as sns
import matplotlib.pyplot as plt

# Visualize nonlinear trend: arrival rate vs wait time
sns.scatterplot(
    x='arrival_rate_per_hr', y='wait_time_min',
    hue='staff_count', palette='viridis', data=df, alpha=0.6
)
plt.title("Arrival Rate vs Wait Time (colored by Staff Count)")
plt.xlabel("Arrival Rate (patients/hr)")
plt.ylabel("Average Wait Time (min)")
plt.show()

3. Define Features & Target

# Categorical feature: hour_of_day (captures diurnal pattern)
# Numeric features: arrival_rate_per_hr, pct_high_severity, staff_count
X = df[[
    'arrival_rate_per_hr',
    'pct_high_severity',
    'staff_count',
    'hour_of_day'
]]
y = df['wait_time_min']

4. Build Polynomial Regression Pipeline

  • StandardScaler standardises arrival_rate_per_hr, pct_high_severity, and staff_count so the Ridge penalty treats them uniformly.
  • OneHotEncoder converts hour_of_day into 23 binary features (24 hours minus the reference hour removed by drop='first') to capture diurnal effects without imposing ordinality.
  • PolynomialFeatures expands the preprocessed columns into powers and interaction terms up to the chosen degree.
  • Ridge applies an L2 penalty to control the complexity of the expanded feature space.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge

# Preprocessing: scale numeric, one-hot encode hour_of_day
numeric_cols = ['arrival_rate_per_hr','pct_high_severity','staff_count']
categorical_cols = ['hour_of_day']

preprocessor = ColumnTransformer([
    ('num', StandardScaler(), numeric_cols),
    ('cat', OneHotEncoder(drop='first'), categorical_cols)
])

pipe = Pipeline([
    ('prep', preprocessor),
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge(max_iter=20000, random_state=42))
])

5. Train/Test Split & Hyperparameter Search

from sklearn.model_selection import train_test_split, GridSearchCV

# Random split; time ordering less critical in simulated cross-sectional data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 2, 6)  # 0.001 → 100
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)

print("Best polynomial degree:", gs.best_params_['poly__degree'])
print("Best Ridge alpha      :", gs.best_params_['ridge__alpha'])
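Beyond the single best setting, it is often useful to see how cross-validated RMSE varies over the whole grid. The following sketch runs a small grid search on synthetic one-feature data (a quadratic with noise, chosen purely for illustration) and pivots the results into a degree-by-alpha table. The names demo_pipe, demo_gs and rmse_table are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(300, 1))
y_demo = 2.0 * X_demo[:, 0] ** 2 + rng.normal(0, 1, size=300)

demo_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(include_bias=False)),
    ("ridge", Ridge()),
])
demo_grid = {"poly__degree": [1, 2, 3], "ridge__alpha": [0.01, 1.0, 100.0]}
demo_gs = GridSearchCV(
    demo_pipe, demo_grid, cv=5, scoring="neg_root_mean_squared_error"
)
demo_gs.fit(X_demo, y_demo)

# One row per grid point; the scorer is negated, so flip the sign back.
results = pd.DataFrame(demo_gs.cv_results_["params"])
results["cv_rmse"] = -demo_gs.cv_results_["mean_test_score"]
rmse_table = results.pivot(
    index="poly__degree", columns="ridge__alpha", values="cv_rmse"
)
print(rmse_table.round(2))
```

The same three lines that build results and rmse_table should work on the gs object fitted above, since they rely only on cv_results_.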

6. Evaluate Model

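The grid search in step 5 tuned two hyperparameters:

  • degree (1–3) balances underfitting against overfitting.
  • alpha (10⁻³ to 10²) regulates shrinkage strength.
  • 5-fold cross-validation selects the configuration that minimises RMSE.

The held-out test set, which played no part in that search, is now used to estimate how well the selected model generalises.

from sklearn.metrics import mean_squared_error, r2_score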

y_pred = gs.predict(X_test)
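# Take the square root explicitly; the squared=False argument is deprecated
# and removed in newer scikit-learn releases
rmse = np.sqrt(mean_squared_error(y_test, y_pred))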
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE: {rmse:.2f} minutes")
print(f"Test R²  : {r2:.3f}")
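An RMSE figure is easier to judge next to a trivial baseline. The sketch below defines a small NumPy helper for RMSE, independent of the scikit-learn version, and compares a model's error against always predicting the training mean. The helper names and the toy numbers are illustrative assumptions.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error computed directly with NumPy."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def baseline_rmse(y_train, y_test):
    """RMSE obtained by always predicting the training-set mean."""
    constant = np.full(len(y_test), np.mean(y_train))
    return rmse(y_test, constant)

# Toy wait times in minutes (made-up values).
y_tr = [30.0, 40.0, 50.0]
y_te = [35.0, 45.0]
model_pred = [36.0, 43.0]

model_err = rmse(y_te, model_pred)
base_err = baseline_rmse(y_tr, y_te)
print(f"model RMSE   : {model_err:.2f} min")
print(f"baseline RMSE: {base_err:.2f} min")
```

With the tutorial's variables, baseline_rmse(y_train, y_test) gives the number that the pipeline's test RMSE should comfortably beat.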

7. Inspect Key Polynomial Coefficients

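Coefficients with the largest magnitudes (absolute values) highlight which nonlinear and interaction terms, such as arrival_rate_per_hr² or staff_count×hour_of_day_18, most influence predicted wait times, which can guide staffing and scheduling decisions. Treat the ranking as a rough guide: polynomial terms and hour dummies are not all on the same scale, and Ridge shrinks correlated terms together.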

# Reconstruct input feature names after preprocessing
prep = gs.best_estimator_.named_steps['prep']
num_feats = numeric_cols
cat_feats = prep.named_transformers_['cat'] \
                .get_feature_names_out(categorical_cols).tolist()
all_feats = num_feats + cat_feats

# Get polynomial-expanded names and coefficients
poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=all_feats)
coefs = gs.best_estimator_.named_steps['ridge'].coef_

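# Rank terms by absolute coefficient size; note that .abs() discards the sign.
# pandas (pd) and matplotlib (plt) were already imported in steps 1 and 2.
imp = pd.Series(coefs, index=feat_names) \
            .abs().sort_values(ascending=False).head(10)
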
plt.figure(figsize=(8,5))
imp.plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving ER Wait Time")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()
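Because the ranking above keeps only magnitudes, the direction of each effect is lost. A possible variant, sketched here with made-up coefficient values and a hypothetical helper named top_signed, orders terms by absolute size while keeping their signs.

```python
import pandas as pd

def top_signed(coefs, names, k=10):
    """Return the k largest-magnitude coefficients with their signs intact."""
    s = pd.Series(coefs, index=names)
    order = s.abs().sort_values(ascending=False).index
    return s.reindex(order).head(k)

# Made-up coefficients for illustration only.
example = top_signed(
    [-5.0, 2.0, 3.0, -0.5],
    ["staff_count",
     "arrival_rate_per_hr",
     "arrival_rate_per_hr^2",
     "staff_count hour_of_day_18"],
    k=3,
)
print(example)
```

Applied to the tutorial's variables, top_signed(coefs, feat_names) would show whether a term such as staff_count raises or lowers the predicted wait.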

Summary

By combining polynomial feature engineering with Ridge regularisation in a clean pipeline, this approach provides:

  • Nonlinear modelling of ER wait-time dynamics, with accuracy checked on held-out data through RMSE and R².
  • More robust generalisation, with overfitting controlled by tuning the polynomial degree and α through cross-validation.
  • Actionable insights, with interpretable polynomial features that inform staffing levels and scheduling decisions aimed at minimising wait times.
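One way to turn a fitted model into a staffing insight is a what-if sweep: hold the arrival rate fixed and vary the staff count. The sketch below trains a small stand-in pipeline on synthetic data, since the real fitted model is not available here. The data-generating formula, the two-feature setup and the fixed arrival rate of 18 patients per hour are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(1)
n = 1000
train = pd.DataFrame({
    "arrival_rate_per_hr": rng.uniform(2, 25, n),
    "staff_count": rng.integers(2, 11, n).astype(float),
})
# Hypothetical ground truth: wait grows with patients per staff member.
load = train["arrival_rate_per_hr"] / train["staff_count"]
wait = 5 + 10 * load + rng.normal(0, 2, n)

model = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(degree=3, include_bias=False)),
    ("ridge", Ridge(alpha=1.0)),
]).fit(train, wait)

# Sweep staffing from 3 to 10 at a fixed arrival rate.
scenario = pd.DataFrame({
    "arrival_rate_per_hr": 18.0,
    "staff_count": np.arange(3, 11, dtype=float),
})
scenario["predicted_wait_min"] = model.predict(
    scenario[["arrival_rate_per_hr", "staff_count"]]
)
print(scenario.round(1))
```

Reading down the predicted_wait_min column shows where extra staff stop buying much reduction in wait time, which is the diminishing-returns pattern described in the introduction. With the full model, the scenario frame would also need pct_high_severity and hour_of_day columns.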


ProjectGurukul Team
