Hospital Equipment Cost Prediction with Ridge Regression in ML

Hospital finance teams purchase, lease, maintain, and eventually replace hundreds of pieces of clinical and support equipment—from CT scanners and ventilators to linen carts. If they can predict next year’s equipment‑related cost for each department using data already in the supply‑chain system, they can:

Negotiate service contracts and warranties before renewal dates,
Identify assets that should be replaced rather than repaired, and
Defend capital‑budget requests to the board.

We will build a Ridge‑regression model that predicts a department’s annual equipment cost in USD from routinely captured inventory and utilisation variables:

equipment age and original purchase price
category (diagnostic / therapeutic / facilities)
annual usage hours or scan counts
number of corrective and preventive work orders
whether the item is under a service contract
location type (in‑patient, ambulatory, support)
calendar year (captures inflation and supply‑chain shocks)

Ridge regression maintains a linear relationship—every input has a fixed coefficient—while its L2 penalty prevents unstable weights when correlated variables (such as age, usage, and repairs) move in tandem.
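The stabilising effect of the L2 penalty can be seen on a small synthetic example. The sketch below is not part of the hospital dataset: it invents two nearly identical predictors, refits plain least squares and Ridge on bootstrap resamples, and compares how much the fitted coefficients vary. The sample size, noise levels, and alpha=10 are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 200

# Two nearly identical (highly correlated) synthetic predictors.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])

# Invented ground truth: each predictor contributes 1.0, plus noise.
y = x1 + x2 + rng.normal(scale=1.0, size=n)


def bootstrap_coef_std(estimator, n_boot=200):
    """Standard deviation of each coefficient across bootstrap refits."""
    coefs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        coefs.append(estimator.fit(X[idx], y[idx]).coef_.copy())
    return np.std(coefs, axis=0)


ols_std = bootstrap_coef_std(LinearRegression())
ridge_std = bootstrap_coef_std(Ridge(alpha=10.0))

print("OLS coefficient std  :", ols_std)
print("Ridge coefficient std:", ridge_std)
```

A smaller spread for Ridge means the weights change less from one resample to the next, which is the property that keeps the coefficient story stable when age, usage, and repairs move together.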

Libraries Required

pandas # data loading and cleaning
numpy # numeric helpers
matplotlib.pyplot # optional quick plots
scikit‑learn # preprocessing, RidgeCV, metrics
joblib # persist the fitted model

Dataset Link

Hospital Supply Chain dataset

Step-by-Step Code Implementation

1. Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score, mean_absolute_error
import joblib

2. Load the dataset

# file downloaded from the dataset link above
df = pd.read_csv("hospital_supply_chain.csv")
print(df.head())

Expected columns

column	example value
total_equipment_cost_usd	147 320
purchase_price_usd	680 000
equip_age_yrs	6.4
annual_usage_hrs	1 980
corrective_WOs	11
preventive_WOs	4
service_contract	Yes / No
equip_category	Diagnostic
location_type	In‑patient
year	2022
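If the CSV is not at hand, the remaining steps can still be tried on made-up data. The sketch below generates a DataFrame with the column names listed above. The function name make_synthetic_equipment_data, the value ranges, the category labels, and the cost formula are all invented for illustration and say nothing about the real dataset.

```python
import numpy as np
import pandas as pd


def make_synthetic_equipment_data(n_rows=500, seed=0):
    """Return a made-up DataFrame with the tutorial's expected columns."""
    rng = np.random.default_rng(seed)

    price = rng.uniform(5_000, 900_000, n_rows).round(0)
    age = rng.uniform(0.5, 15, n_rows).round(1)
    usage = rng.uniform(100, 4_000, n_rows).round(0)
    corrective = rng.poisson(1 + 0.8 * age)  # assumption: older items fail more
    preventive = rng.integers(1, 7, n_rows)
    contract = rng.choice(["Yes", "No"], n_rows)
    category = rng.choice(["Diagnostic", "Therapeutic", "Facilities"], n_rows)
    location = rng.choice(["In-patient", "Ambulatory", "Support"], n_rows)
    year = rng.choice([2020, 2021, 2022], n_rows)

    # Invented cost formula, purely so the model has something to learn.
    cost = (0.08 * price + 3_000 * age + 10 * usage
            + 2_500 * corrective + 800 * preventive
            - 5_000 * (contract == "Yes")
            + rng.normal(0, 5_000, n_rows))

    return pd.DataFrame({
        "total_equipment_cost_usd": np.clip(cost, 0, None).round(0),
        "purchase_price_usd": price,
        "equip_age_yrs": age,
        "annual_usage_hrs": usage,
        "corrective_WOs": corrective,
        "preventive_WOs": preventive,
        "service_contract": contract,
        "equip_category": category,
        "location_type": location,
        "year": year,
    })


demo_df = make_synthetic_equipment_data()
print(demo_df.head())
```

Assigning the result to df in place of the read_csv call lets the rest of the walkthrough run unchanged, although the resulting scores and coefficients then describe only the invented formula.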

3. Minimal cleaning

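Keep only rows that have a value in every required column, since neither the encoder nor Ridge accepts missing values here. Scaling is handled later by the StandardScaler in step 5, which brings purchase price (hundreds of thousands) and work‑order counts (single digits) onto a similar variance so Ridge’s penalty treats them evenly.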

req = ['total_equipment_cost_usd','purchase_price_usd','equip_age_yrs',
       'annual_usage_hrs','corrective_WOs','preventive_WOs',
       'service_contract','equip_category','location_type','year']
df = df.dropna(subset=req).copy()
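Before dropping rows it can help to see how many values are actually missing, because dropna silently discards them. The helper below is a hypothetical addition, not part of the original walkthrough; the name missing_report and the toy frame are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def missing_report(df, required):
    """Count missing values per required column; raise if a column is absent."""
    absent = [c for c in required if c not in df.columns]
    if absent:
        raise KeyError(f"columns not found: {absent}")
    counts = df[required].isna().sum()
    # Keep only the columns that actually have gaps, worst first.
    return counts[counts > 0].sort_values(ascending=False)


# Tiny invented frame with one missing age value.
toy = pd.DataFrame({"equip_age_yrs": [6.4, np.nan, 2.0],
                    "corrective_WOs": [11, 3, 5]})
report = missing_report(toy, ["equip_age_yrs", "corrective_WOs"])
print(report)
```

Calling missing_report(df, req) before the dropna line shows whether a single sparse column is responsible for most of the lost rows.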

4. Define feature groups

num_cols = ['purchase_price_usd','equip_age_yrs','annual_usage_hrs',
            'corrective_WOs','preventive_WOs']
cat_cols = ['service_contract','equip_category','location_type','year']
target   = 'total_equipment_cost_usd'

X = df[num_cols + cat_cols]
y = df[target]

5. Pre‑processing and Ridge pipeline

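OneHotEncoder converts the yes/no contract flag, category, location, and year into binary columns; dropping the first level prevents perfect collinearity among the dummies. StandardScaler puts the numeric columns on a common scale. Note that year is treated as a category here: with handle_unknown='ignore' and drop='first' (supported together in scikit‑learn 1.0 and later), a year that did not appear in training is encoded as all zeros, which is the same encoding as the reference year.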

preprocess = ColumnTransformer([
        ('cat', OneHotEncoder(drop='first', handle_unknown='ignore'), cat_cols),
        ('num', StandardScaler(),                                  num_cols)
])

ridge = RidgeCV(alphas=[0.1, 1, 10, 50, 100, 250], cv=5)

model = Pipeline([
        ('prep',  preprocess),
        ('ridge', ridge)
])

6. Train‑test split and fit

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, shuffle=True)

model.fit(X_train, y_train)

7. Evaluate hold‑out accuracy

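During fitting, RidgeCV ran five-fold cross-validation on the training data and kept the α that minimised validation error, automatically balancing bias and variance. The test set was not used in that selection, so the scores below estimate performance on unseen equipment records.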

pred = model.predict(X_test)

print(f"α chosen by CV : {model.named_steps['ridge'].alpha_}")
print(f"R² (test set)  : {r2_score(y_test, pred):.3f}")
print(f"MAE (test set) : ${mean_absolute_error(y_test, pred):,.0f}")
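An MAE in dollars is easier to judge next to a naive reference. One common choice, assumed here rather than taken from the original walkthrough, is to predict the training-set mean for every row. The helper name mae_vs_mean_baseline and the demo numbers are invented.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error


def mae_vs_mean_baseline(y_train, y_test, pred):
    """Return (model MAE, MAE of always predicting the training mean)."""
    baseline_pred = np.full(len(y_test), np.mean(y_train))
    return (mean_absolute_error(y_test, pred),
            mean_absolute_error(y_test, baseline_pred))


# Made-up numbers, for illustration only.
y_train_demo = np.array([100.0, 200.0, 300.0])
y_test_demo = np.array([150.0, 250.0])
pred_demo = np.array([160.0, 240.0])

model_mae, baseline_mae = mae_vs_mean_baseline(
    y_train_demo, y_test_demo, pred_demo)

print(f"model MAE    : {model_mae:.1f}")
print(f"baseline MAE : {baseline_mae:.1f}")
```

With the tutorial's variables the call would be mae_vs_mean_baseline(y_train, y_test, pred). A model MAE close to the baseline suggests the features add little beyond the average cost.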

8. Coefficient inspection

Coefficients remain in dollars because the target is not scaled. As an illustrative example, a coefficient of +$95,000 on equip_age_yrs (per σ) would highlight how older devices drive costs, and a weight of −$12,000 on service_contract_Yes would indicate that contracted equipment tends to cost less per year.

ohe = model.named_steps['prep'].named_transformers_['cat']
ohe_names = ohe.get_feature_names_out(cat_cols)
feature_names = np.concatenate([ohe_names, num_cols])

coefs = (pd.Series(model.named_steps['ridge'].coef_, index=feature_names)
         .sort_values())

print("\nCost reducers (most negative):")
print(coefs.head(6))

print("\nCost drivers (most positive):")
print(coefs.tail(6))

Numeric coefficients represent USD change for a one‑standard‑deviation increase; dummy‑variable coefficients show the dollar shift relative to the reference level.
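A per-σ coefficient can be converted to dollars per original unit (per year of age, per usage hour) by dividing by the standard deviation the scaler learned. The sketch below assumes the same step names as the pipeline above ('prep', 'num', 'ridge') and that the numeric block comes last in the ColumnTransformer. The helper name per_unit_numeric_coefs and the toy data are invented.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def per_unit_numeric_coefs(model, num_cols):
    """Convert per-sigma Ridge coefficients into dollars per original unit."""
    scaler = model.named_steps["prep"].named_transformers_["num"]
    # Numeric features come last because 'num' is the last transformer.
    per_sigma = model.named_steps["ridge"].coef_[-len(num_cols):]
    return pd.Series(per_sigma / scaler.scale_, index=num_cols)


# Invented toy data: cost = 1000 per year of age, plus 500 without a contract.
toy = pd.DataFrame({
    "equip_age_yrs": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "service_contract": ["Yes", "No", "Yes", "No", "Yes", "No"],
})
toy_cost = 1_000 * toy["equip_age_yrs"] + 500 * (toy["service_contract"] == "No")

toy_model = Pipeline([
    ("prep", ColumnTransformer([
        ("cat", OneHotEncoder(drop="first"), ["service_contract"]),
        ("num", StandardScaler(), ["equip_age_yrs"]),
    ])),
    ("ridge", Ridge(alpha=1e-6)),  # near-zero penalty so the toy slope is recovered
]).fit(toy, toy_cost)

per_unit = per_unit_numeric_coefs(toy_model, ["equip_age_yrs"])
print(per_unit)
```

On the fitted tutorial pipeline the call would be per_unit_numeric_coefs(model, num_cols). With a larger α the recovered values are shrunk toward zero, so they are penalised estimates rather than exact marginal costs.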

9. Persist for production dashboards

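Saving the whole pipeline keeps the preprocessing and the fitted Ridge weights in one file, so a dashboard can load it and call predict on raw rows. Ridge suits this role because purchase price, age, and usage are correlated: OLS can assign unstable, contradictory weights, whereas Ridge stabilises the solution while preserving a linear business story.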

joblib.dump(model, "ridge_hospital_equip_cost.pkl")
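On the consuming side, a dashboard would load the file and pass new rows with the same column names used in training. The sketch below shows that round trip on a stand-in pipeline trained on invented numbers, writing to a temporary directory; the file name demo_model.pkl and all values are assumptions for illustration.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in pipeline on invented data; in the tutorial this would be `model`.
train = pd.DataFrame({"equip_age_yrs": [1.0, 3.0, 5.0, 7.0],
                      "annual_usage_hrs": [500.0, 1500.0, 1000.0, 2000.0]})
cost = pd.Series([20_000.0, 45_000.0, 50_000.0, 80_000.0])
demo_model = Pipeline([("scale", StandardScaler()),
                       ("ridge", Ridge(alpha=1.0))]).fit(train, cost)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_model.pkl")  # hypothetical file name
    joblib.dump(demo_model, path)
    loaded = joblib.load(path)

# New rows must carry the same column names the pipeline was trained on.
new_items = pd.DataFrame({"equip_age_yrs": [4.0],
                          "annual_usage_hrs": [1200.0]})
loaded_pred = loaded.predict(new_items)
print(loaded_pred)
```

Pickled models should only be loaded from trusted sources, and the loading environment should run the same scikit-learn version that wrote the file.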

Summary

By coupling standard preprocessing with Ridge regression, we produced an explainable hospital‑equipment cost predictor:

Immediate value: Supply-chain managers can preview next year’s maintenance and depreciation costs by department.
Transparent levers: Each coefficient directly maps to an actionable variable (e.g., age, usage, service contract).
Benchmark: Any more complex model, such as gradient boosting, must beat this Ridge MAE and remain interpretable enough for capital-planning committees.
