ER Wait Time Trend Prediction with Polynomial Regression in ML


Hospital administrators and operations analysts need to forecast the average emergency-room (ER) wait time, in minutes, from early-shift indicators (patient arrival rate, triage severity mix, staffing level, and time of day) so they can allocate resources proactively and reduce bottlenecks. Historical ER data exhibit nonlinear wait-time curves: at low arrival rates, additional staff sharply reduce wait times, but beyond a threshold, further staffing yields diminishing returns; severity mix and time of day interact to amplify peaks during evening hours. A simple linear model underfits these dynamics, while a naïve high-degree polynomial overfits. Applying Polynomial Regression to engineered features with Ridge (ℓ2) regularisation captures smooth, interpretable wait-time curves while keeping the expanded model from overfitting, so it is more likely to generalise to new demand patterns.

Libraries Required

Purpose	Library
Data loading & handling	pandas, numpy
Visualization	matplotlib, seaborn
Feature preprocessing	scikit-learn → Pipeline, ColumnTransformer, StandardScaler, PolynomialFeatures
Categorical encoding	OneHotEncoder
Regression & model selection	Ridge, train_test_split, GridSearchCV
Evaluation	mean_squared_error, r2_score

Dataset

Simulated ER Wait Time Dataset
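The dataset file itself is not reproduced here. As a stand-in, the sketch below generates a hypothetical table with the same five column names used in the rest of the walkthrough. The function name simulate_er_data, the distributions, and every coefficient in the wait-time formula are assumptions chosen only to produce a plausible nonlinear shape; they are not taken from the original data.

```python
import numpy as np
import pandas as pd

def simulate_er_data(n_rows=2000, seed=42):
    """Generate a hypothetical ER dataset with the columns used in this tutorial."""
    rng = np.random.default_rng(seed)
    hour = rng.integers(0, 24, size=n_rows)            # hour of day, 0-23
    # Assumed evening bump in demand, centred on 19:00
    evening = np.exp(-0.5 * ((hour - 19) / 3.0) ** 2)
    arrival = rng.gamma(shape=4.0, scale=1.5, size=n_rows) + 6.0 * evening
    severity = rng.uniform(5, 40, size=n_rows)         # % high-severity arrivals
    staff = rng.integers(2, 11, size=n_rows)           # 2 to 10 doctors on duty
    # Assumed nonlinear target: squared load per doctor, plus a
    # severity effect that is stronger in the evening, plus noise
    load = arrival / staff
    wait = (10 + 12 * load ** 2
            + 0.3 * severity * (1 + evening)
            + rng.normal(0, 4, size=n_rows))
    return pd.DataFrame({
        'arrival_rate_per_hr': arrival.round(2),
        'pct_high_severity': severity.round(1),
        'staff_count': staff,
        'hour_of_day': hour,
        'wait_time_min': np.clip(wait, 0, None).round(1),
    })

sim_df = simulate_er_data()
print(sim_df.head())
# sim_df.to_csv("er_wait_time.csv", index=False)  # uncomment to write a file
```

Writing the frame to er_wait_time.csv would let the remaining steps run unchanged, although the numbers they produce would then reflect these assumed dynamics rather than any real ER.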

Step-by-Step Code Implementation

1. Import Libraries & Load Data

import pandas as pd
import numpy as np

# Load the simulated ER wait-time data
df = pd.read_csv("er_wait_time.csv")

# Inspect key columns (print so the preview also shows outside a notebook)
print(df[[
    'arrival_rate_per_hr',   # patients/hour
    'pct_high_severity',     # % of arrivals triaged as high severity
    'staff_count',           # number of doctors on duty
    'hour_of_day',           # 0–23
    'wait_time_min'          # observed avg wait time (target)
]].head())

2. Feature Engineering & Exploratory Analysis

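A scatter plot of arrival rate against wait time, coloured by staff count, shows whether the relationship is curved and therefore worth modelling with polynomial terms. The engineered features themselves are created inside the pipeline in step 4: PolynomialFeatures expands the three scaled numerics and 23 hour dummies into squares and interactions (e.g. arrival_rate_per_hr², arrival_rate_per_hr×pct_high_severity, staff_count×hour_of_day_18), modelling curvature and cross-effects such as staff efficacy under peak loads.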

import seaborn as sns
import matplotlib.pyplot as plt

# Visualize nonlinear trend: arrival rate vs wait time
sns.scatterplot(
    x='arrival_rate_per_hr', y='wait_time_min',
    hue='staff_count', palette='viridis', data=df, alpha=0.6
)
plt.title("Arrival Rate vs Wait Time (colored by Staff Count)")
plt.xlabel("Arrival Rate (patients/hr)")
plt.ylabel("Average Wait Time (min)")
plt.show()

3. Define Features & Target

# Categorical feature: hour_of_day (captures diurnal pattern)
# Numeric features: arrival_rate_per_hr, pct_high_severity, staff_count
X = df[[
    'arrival_rate_per_hr',
    'pct_high_severity',
    'staff_count',
    'hour_of_day'
]]
y = df['wait_time_min']

4. Build Polynomial Regression Pipeline

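StandardScaler normalises arrival_rate_per_hr, pct_high_severity, and staff_count so the Ridge penalty treats them uniformly.
OneHotEncoder (with drop='first') converts hour_of_day into 23 binary features, with the first hour (0) serving as the baseline, to capture diurnal effects without imposing ordinality.
PolynomialFeatures generates the squares and interaction terms; its degree is left at the default here and tuned in step 5.
Ridge applies an ℓ2 penalty to control the complexity of the expanded feature space.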

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge

# Preprocessing: scale numeric, one-hot encode hour_of_day
numeric_cols = ['arrival_rate_per_hr','pct_high_severity','staff_count']
categorical_cols = ['hour_of_day']

preprocessor = ColumnTransformer([
    ('num', StandardScaler(), numeric_cols),
    ('cat', OneHotEncoder(drop='first'), categorical_cols)
])

pipe = Pipeline([
    ('prep', preprocessor),
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge(max_iter=20000, random_state=42))
])

5. Train/Test Split & Hyperparameter Search

from sklearn.model_selection import train_test_split, GridSearchCV

# Random split; time ordering less critical in simulated cross-sectional data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 2, 6)  # 0.001 → 100
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)

print("Best polynomial degree:", gs.best_params_['poly__degree'])
print("Best Ridge alpha      :", gs.best_params_['ridge__alpha'])
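Beyond the single best configuration, it can help to look at the whole grid. The sketch below runs the same kind of search on a small toy problem and reshapes cv_results_ into a degree-by-alpha table of cross-validated RMSE. The toy data (one feature with a quadratic effect) and the names toy_pipe, toy_gs, and rmse_table are assumptions for illustration; the same reshaping should apply to the gs object above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Toy stand-in data (assumption): one feature with a quadratic effect
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 10, size=(200, 1))
y_toy = 5 + 2 * X_toy[:, 0] ** 2 + rng.normal(0, 3, size=200)

toy_pipe = Pipeline([
    ('scale', StandardScaler()),
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge()),
])
toy_gs = GridSearchCV(
    toy_pipe,
    {'poly__degree': [1, 2, 3], 'ridge__alpha': np.logspace(-3, 2, 6)},
    cv=5, scoring='neg_root_mean_squared_error',
).fit(X_toy, y_toy)

# cv_results_ holds one row per grid point; the score is negated RMSE
results = pd.DataFrame(toy_gs.cv_results_)
results['degree'] = results['param_poly__degree'].astype(int)
results['alpha'] = results['param_ridge__alpha'].astype(float)
results['cv_rmse'] = -results['mean_test_score']

rmse_table = results.pivot_table(index='degree', columns='alpha', values='cv_rmse')
print(rmse_table.round(2))
```

A table like this shows whether the chosen configuration sits in a flat region, where neighbouring settings perform almost as well, or on a sharp edge of the grid that might warrant a wider search.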

6. Evaluate Model

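The grid search in step 5 tuned two hyperparameters:
degree (1–3) balances underfit vs overfit,
alpha (10⁻³…10²) regulates shrinkage strength.
5-fold cross-validation selected the configuration minimising RMSE; that model is now scored on the held-out test set.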

from sklearn.metrics import mean_squared_error, r2_score

y_pred = gs.predict(X_test)
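# The squared=False argument was deprecated in scikit-learn 1.4 and later removed,
# so take the square root of the MSE explicitly
rmse = np.sqrt(mean_squared_error(y_test, y_pred))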
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE: {rmse:.2f} minutes")
print(f"Test R²  : {r2:.3f}")
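For readers who want to see what the two metrics measure, here is a small worked example on three made-up wait times (the values are illustrative, not model output). It computes RMSE and R² by hand and checks the RMSE against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([30.0, 45.0, 60.0])   # made-up observed waits (minutes)
y_hat = np.array([33.0, 41.0, 60.0])    # made-up predictions

errors = y_hat - y_true                  # [3, -4, 0]
rmse_manual = np.sqrt(np.mean(errors ** 2))        # sqrt((9 + 16 + 0) / 3)
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_hat))

ss_res = np.sum(errors ** 2)                       # 25
ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # 225 + 0 + 225 = 450
r2_manual = 1 - ss_res / ss_tot

print(f"RMSE: {rmse_manual:.3f} min")   # RMSE: 2.887 min
print(f"R²  : {r2_manual:.3f}")         # R²  : 0.944
```

RMSE is in the same units as the target, so a value of about 2.9 reads as a typical error of roughly three minutes. R² compares the squared error with that of always predicting the mean.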

7. Inspect Key Polynomial Coefficients

Coefficients with the largest magnitudes (absolute value) highlight which nonlinear and interaction terms—such as arrival_rate_per_hr² or staff_count×hour_of_day_18—most influence predicted wait times, guiding staffing and scheduling decisions.

# Reconstruct input feature names after preprocessing
prep = gs.best_estimator_.named_steps['prep']
num_feats = numeric_cols
cat_feats = prep.named_transformers_['cat'] \
                .get_feature_names_out(categorical_cols).tolist()
all_feats = num_feats + cat_feats

# Get polynomial-expanded names and coefficients
poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=all_feats)
coefs = gs.best_estimator_.named_steps['ridge'].coef_

# pandas (pd) and matplotlib (plt) were already imported in steps 1 and 2
# Rank terms by absolute coefficient and keep the ten largest
imp = pd.Series(coefs, index=feat_names) \
            .abs().sort_values(ascending=False).head(10)

plt.figure(figsize=(8,5))
imp.plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving ER Wait Time")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()
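Coefficients on standardised, interacting features can be hard to read directly, so a complementary view is a what-if sweep: hold a scenario fixed and vary only the staffing level. The sketch below does this on toy data. The data-generating formula, the scenario values, and the names toy, model, and sweep are all assumptions for illustration; with the fitted search object from step 5, gs.predict could be used in place of model.predict.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures, StandardScaler

# Toy training data (assumption): quadratic staffing effect with diminishing returns
rng = np.random.default_rng(1)
n = 1500
toy = pd.DataFrame({
    'arrival_rate_per_hr': rng.uniform(2, 20, n),
    'pct_high_severity': rng.uniform(5, 40, n),
    'staff_count': rng.integers(2, 11, n),
    'hour_of_day': rng.integers(0, 24, n),
})
toy['wait_time_min'] = (
    70 + 5 * toy['arrival_rate_per_hr']
    - 12 * toy['staff_count'] + 0.6 * toy['staff_count'] ** 2
    + rng.normal(0, 4, n)
)

features = ['arrival_rate_per_hr', 'pct_high_severity', 'staff_count', 'hour_of_day']
model = Pipeline([
    ('prep', ColumnTransformer([
        ('num', StandardScaler(), features[:3]),
        ('cat', OneHotEncoder(drop='first'), ['hour_of_day']),
    ], sparse_threshold=0)),   # force dense output to keep the sketch simple
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),
    ('ridge', Ridge(alpha=1.0)),
]).fit(toy[features], toy['wait_time_min'])

# Hypothetical scenario: 12 arrivals/hr, 20% high severity, 18:00, staff 2 to 10
sweep = pd.DataFrame({
    'arrival_rate_per_hr': 12.0,
    'pct_high_severity': 20.0,
    'staff_count': np.arange(2, 11),
    'hour_of_day': 18,
})
sweep['predicted_wait_min'] = model.predict(sweep[features])
# Minutes saved by each additional doctor (first row has no predecessor)
sweep['gain_from_extra_doctor'] = -sweep['predicted_wait_min'].diff()
print(sweep[['staff_count', 'predicted_wait_min', 'gain_from_extra_doctor']].round(1))
```

If the gain column shrinks as staff_count rises, the model has learned a diminishing-returns curve for that scenario, which is usually easier to communicate to schedulers than a list of coefficients.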

Summary

By combining polynomial feature engineering with Ridge regularisation in a clean pipeline, this approach provides:

Nonlinear modelling of ER wait-time dynamics, with accuracy quantified by test-set RMSE and R².
Robust generalisation, with overfitting controlled by tuning α and the polynomial degree through cross-validation.
Actionable insights, with interpretable polynomial features that can inform staffing levels and shift scheduling to reduce wait times.
