Hospital Equipment Cost Prediction with Ridge Regression in ML
Hospital finance teams purchase, lease, maintain, and eventually replace hundreds of pieces of clinical and support equipment—from CT scanners and ventilators to linen carts. If they can predict next year’s equipment‑related cost for each department using data already in the supply‑chain system, they can:
- Negotiate service contracts and warranties before renewal dates,
- Identify assets that should be replaced rather than repaired, and
- Defend capital‑budget requests to the board.
We will build a Ridge‑regression model that predicts a department’s annual equipment cost in USD from routinely captured inventory and utilisation variables:
- equipment age and original purchase price
- category (diagnostic / therapeutic / facilities)
- annual usage hours or scan counts
- number of corrective and preventive work orders
- whether the item is under a service contract
- location type (in‑patient, ambulatory, support)
- calendar year (captures inflation and supply‑chain shocks)
Ridge regression maintains a linear relationship—every input has a fixed coefficient—while its L2 penalty prevents unstable weights when correlated variables (such as age, usage, and repairs) move in tandem.
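The stabilising effect of the L2 penalty can be seen on synthetic data. The sketch below is illustrative only: the two nearly identical predictors, the true weights, and the choice of `alpha=10.0` are all assumptions, not values from the hospital dataset. It refits ordinary least squares and Ridge on 50 fresh samples and compares how much the fitted coefficients move from sample to sample.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)

def fit_once(model):
    """Draw a fresh sample with two almost collinear predictors and fit `model`."""
    x1 = rng.normal(size=200)
    x2 = x1 + rng.normal(scale=0.01, size=200)   # near-duplicate of x1
    X = np.column_stack([x1, x2])
    y = 3.0 * x1 + 3.0 * x2 + rng.normal(size=200)
    return model.fit(X, y).coef_

# Refit each model on 50 independent samples
ols_coefs = np.array([fit_once(LinearRegression()) for _ in range(50)])
ridge_coefs = np.array([fit_once(Ridge(alpha=10.0)) for _ in range(50)])

# Spread of each coefficient across the refits
ols_spread = ols_coefs.std(axis=0)
ridge_spread = ridge_coefs.std(axis=0)
print("OLS coefficient std  :", ols_spread)
print("Ridge coefficient std:", ridge_spread)
```

The OLS weights swing widely between samples because the data cannot tell the two predictors apart, while the Ridge weights stay close together.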
Libraries Required
- pandas # data loading and cleaning
- numpy # numeric helpers
- matplotlib.pyplot # optional quick plots
- scikit‑learn # preprocessing, RidgeCV, metrics
- joblib # persist the fitted model
Dataset Link
Step-by-Step Code Implementation
1. Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score, mean_absolute_error
import joblib
2. Load the dataset
# CSV file from the Kaggle dataset (see Dataset Link above)
df = pd.read_csv("hospital_supply_chain.csv")
print(df.head())
Expected columns
| column | example value |
| --- | --- |
| total_equipment_cost_usd | 147,320 |
| purchase_price_usd | 680,000 |
| equip_age_yrs | 6.4 |
| annual_usage_hrs | 1,980 |
| corrective_WOs | 11 |
| preventive_WOs | 4 |
| service_contract | Yes / No |
| equip_category | Diagnostic |
| location_type | In‑patient |
| year | 2022 |
3. Minimal cleaning
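Drop any row that is missing a required column. No manual rescaling is needed at this stage: the pipeline in step 5 standardises the numeric columns, bringing purchase price (hundreds of thousands of dollars) and work‑order counts (single digits) onto a comparable scale so that Ridge’s penalty treats them evenly.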
req = ['total_equipment_cost_usd', 'purchase_price_usd', 'equip_age_yrs',
       'annual_usage_hrs', 'corrective_WOs', 'preventive_WOs',
       'service_contract', 'equip_category', 'location_type', 'year']
df = df.dropna(subset=req).copy()
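A slightly more defensive version of this cleaning step might check that the expected columns exist before dropping rows, and report how many rows were lost. The sketch below is an assumption about how one could do that; the helper name `clean` and the toy frame are hypothetical and not part of the original workflow.

```python
import numpy as np
import pandas as pd

def clean(df: pd.DataFrame, required: list) -> pd.DataFrame:
    """Fail fast on missing columns, then drop rows with gaps in `required`."""
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise KeyError(f"CSV is missing required columns: {missing}")
    out = df.dropna(subset=required).copy()
    print(f"Dropped {len(df) - len(out)} of {len(df)} rows with missing values")
    return out

# Toy frame standing in for the real CSV (values are made up)
toy = pd.DataFrame({
    "total_equipment_cost_usd": [147_320, np.nan, 88_000],
    "equip_age_yrs": [6.4, 3.0, np.nan],
    "notes": [None, "ok", "ok"],  # not required, so gaps here are tolerated
})
cleaned = clean(toy, ["total_equipment_cost_usd", "equip_age_yrs"])
print(cleaned)
```

Reporting the dropped-row count makes it easier to notice when a data-quality problem, rather than a handful of stray blanks, is shrinking the training set.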
4. Define feature groups
num_cols = ['purchase_price_usd', 'equip_age_yrs', 'annual_usage_hrs',
            'corrective_WOs', 'preventive_WOs']
cat_cols = ['service_contract','equip_category','location_type','year']
target = 'total_equipment_cost_usd'
X = df[num_cols + cat_cols]
y = df[target]
5. Pre‑processing and Ridge pipeline
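OneHotEncoder converts the yes/no contract flag, category, location, and year into binary columns; dropping the first level of each prevents perfect collinearity. With handle_unknown='ignore', a level that was not seen during training (for example, a new year) is encoded as all zeros, which is the same encoding as the dropped reference level. StandardScaler standardises the numeric columns, and RidgeCV picks α from the listed grid.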
preprocess = ColumnTransformer([
    ('cat', OneHotEncoder(drop='first', handle_unknown='ignore'), cat_cols),
    ('num', StandardScaler(), num_cols)
])
ridge = RidgeCV(alphas=[0.1, 1, 10, 50, 100, 250], cv=5)
model = Pipeline([
    ('prep', preprocess),
    ('ridge', ridge)
])
6. Train‑test split and fit
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, shuffle=True)
model.fit(X_train, y_train)
7. Evaluate hold‑out accuracy
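During fitting, RidgeCV used five-fold cross-validation on the training split to select the α that minimises validation error, balancing bias and variance. The 20% hold-out set played no part in that selection, so it now provides the accuracy estimate.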
pred = model.predict(X_test)
print(f"α chosen by CV : {model.named_steps['ridge'].alpha_}")
print(f"R² (test set) : {r2_score(y_test, pred):.3f}")
print(f"MAE (test set) : ${mean_absolute_error(y_test, pred):,.0f}")
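An MAE in dollars is easier to judge next to a trivial baseline. The sketch below is one way to do that, assuming synthetic data in place of the real CSV: it compares RidgeCV with a `DummyRegressor` that always predicts the training mean. The feature weights and noise level are invented for the illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the standardised features and the USD target
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))
y = (100_000 + 40_000 * X[:, 0] - 15_000 * X[:, 1]
     + rng.normal(scale=10_000, size=300))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Baseline: always predict the mean cost seen in training
baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
ridge = RidgeCV(alphas=[0.1, 1, 10, 50, 100, 250], cv=5).fit(X_tr, y_tr)

baseline_mae = mean_absolute_error(y_te, baseline.predict(X_te))
ridge_mae = mean_absolute_error(y_te, ridge.predict(X_te))
print(f"Baseline MAE : ${baseline_mae:,.0f}")
print(f"Ridge MAE    : ${ridge_mae:,.0f}")
```

If the Ridge MAE on the real data is not clearly below the mean-only baseline, the features are adding little and the coefficients should not be over-interpreted.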
8. Coefficient inspection
Because the target is not scaled, the coefficients remain in dollars. For example, a +$95,000 coefficient on equip_age_yrs (per σ) would show how strongly older devices drive costs, and a −$12,000 weight on service_contract_Yes would indicate that contracted equipment tends to cost less per year.
ohe = model.named_steps['prep'].named_transformers_['cat']
ohe_names = ohe.get_feature_names_out(cat_cols)
feature_names = np.concatenate([ohe_names, num_cols])
coefs = (pd.Series(model.named_steps['ridge'].coef_, index=feature_names)
         .sort_values())
print("\nCost reducers (most negative):")
print(coefs.head(6))
print("\nCost drivers (most positive):")
print(coefs.tail(6))
Numeric coefficients represent USD change for a one‑standard‑deviation increase; dummy‑variable coefficients show the dollar shift relative to the reference level.
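A per-σ coefficient can be converted to dollars per raw unit by dividing it by the standard deviation that StandardScaler learned (its `scale_` attribute). The sketch below shows the arithmetic on synthetic data, where the assumed true effect is $5,000 per year of age; the data and the single-feature pipeline are illustrative and not taken from the article's model.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: cost rises by about $5,000 per year of equipment age
rng = np.random.default_rng(1)
age = rng.uniform(0, 15, size=500)
cost = 20_000 + 5_000 * age + rng.normal(scale=2_000, size=500)

pipe = Pipeline([("scale", StandardScaler()), ("ridge", Ridge(alpha=1.0))])
pipe.fit(age.reshape(-1, 1), cost)

per_sigma = pipe.named_steps["ridge"].coef_[0]   # USD per one-σ increase in age
sigma = pipe.named_steps["scale"].scale_[0]      # σ of age, in years
per_year = per_sigma / sigma                     # USD per additional year

print(f"σ of age          : {sigma:.2f} years")
print(f"Coefficient per σ : ${per_sigma:,.0f}")
print(f"Coefficient per yr: ${per_year:,.0f}")
```

In the pipeline built above, the corresponding standard deviations would be available from `model.named_steps['prep'].named_transformers_['num'].scale_`, in the same order as `num_cols`.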
9. Persist for production dashboards
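Purchase price, age, and usage are correlated, so OLS can assign them unstable, contradictory weights; Ridge stabilises the solution while preserving a linear business story. Saving the whole pipeline with joblib keeps the preprocessing and the fitted Ridge model together, so a dashboard can load a single file and predict directly from raw rows.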
joblib.dump(model, "ridge_hospital_equip_cost.pkl")
Summary
By coupling standard preprocessing with Ridge regression, we produced an explainable hospital‑equipment cost predictor:
- Immediate value: Supply-chain managers can preview next year’s maintenance and depreciation costs by department.
- Transparent levers: Each coefficient directly maps to an actionable variable (e.g., age, usage, service contract).
- Benchmark: Any gradient-boosted model must beat this Ridge MAE and remain interpretable enough for capital-planning committees.