Clinic Equipment Cost Prediction with Ridge Regression in ML
Outpatient clinics replace or service dozens of devices every year, including autoclaves, ultrasound scanners, vital-sign monitors, exam tables, and even simple otoscopes. Finance teams need a forward‑looking estimate of each department’s equipment cost for the coming fiscal year so they can:
- decide whether to buy, lease, or extend warranties,
- negotiate service‑contract pricing, and
- defend capital‑budget requests to executives.
We will build a Ridge‑regression model that predicts a clinic’s annual equipment cost (USD) from information already stored in the asset‑management system:
- purchase price and equipment age
- utilisation hours (patient scans, autoclave cycles, etc.)
- preventive and corrective work‑order counts
- service‑contract status (in or out of warranty)
- equipment category (diagnostic / therapeutic / support)
- clinic type (primary‑care, surgical centre, imaging)
- reporting year (captures inflation)
Ridge regression keeps the model linear, so each coefficient translates directly into a dollar effect, while its L2 penalty stabilises the weights when correlated variables (age, work orders, utilisation) move together.
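To make the stabilising effect concrete, here is a minimal sketch on synthetic data (not the clinic dataset). Two made-up predictors, age and work_orders, are almost perfectly correlated; the script refits OLS and Ridge on bootstrap resamples and compares how much each coefficient moves between fits. The variable names, the alpha of 10, and the noise levels are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 60

# Synthetic, illustrative data: work orders track age almost exactly.
age = rng.normal(size=n)
work_orders = age + rng.normal(scale=0.05, size=n)
X = np.column_stack([age, work_orders])
y = 3.0 * age + 3.0 * work_orders + rng.normal(scale=1.0, size=n)


def coef_spread(model, n_boot=200):
    """Std. dev. of each coefficient across bootstrap refits."""
    coefs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        coefs.append(model.fit(X[idx], y[idx]).coef_.copy())
    return np.std(coefs, axis=0)


ols_spread = coef_spread(LinearRegression())
ridge_spread = coef_spread(Ridge(alpha=10.0))

print("OLS coefficient spread  :", ols_spread)
print("Ridge coefficient spread:", ridge_spread)
```

A smaller spread means the coefficient changes less from one sample to the next, which is what makes the dollar interpretation usable in a budget discussion.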
Libraries Required
- pandas # data wrangling
- numpy # numeric helpers
- matplotlib.pyplot # optional diagnostics
- scikit‑learn # preprocessing, RidgeCV, metrics
- joblib # save & reload the fitted pipeline
Dataset Link
Medical Equipment Spare Parts Inventories
Step-by-Step Code Implementation
Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score, mean_absolute_error
import joblib
Load the dataset
df = pd.read_csv("medical_equipment_spare_parts_Inventories.csv") # path after un‑zipping
print(df.head())
Minimal cleaning
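Drop any row that is missing one of the required columns. The numeric columns are standardised later in the pipeline, so purchase price (hundreds of thousands of dollars) and work-order counts (single digits) end up with unit variance and Ridge's L2 penalty shrinks them on a comparable scale.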
req = ['annual_cost_usd', 'purchase_price_usd', 'equip_age_yrs',
       'utilisation_hours', 'pm_workorders', 'cm_workorders',
       'service_contract', 'equip_category', 'clinic_type', 'year']
df = df.dropna(subset=req).copy()
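If the CSV's headers differ from the names above, dropna raises a KeyError that is easy to misread. The sketch below wraps the cleaning step in a small helper that fails fast on missing columns and reports how many rows were dropped. The helper name clean_required and the toy rows are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

req = ['annual_cost_usd', 'purchase_price_usd', 'equip_age_yrs',
       'utilisation_hours', 'pm_workorders', 'cm_workorders',
       'service_contract', 'equip_category', 'clinic_type', 'year']


def clean_required(df, required):
    """Fail fast on missing columns, then drop incomplete rows."""
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in the file: {missing}")
    clean = df.dropna(subset=required).copy()
    return clean, len(df) - len(clean)


# Tiny made-up frame: the second row lacks a purchase price.
toy = pd.DataFrame({
    'annual_cost_usd': [12000.0, 8000.0, 15000.0],
    'purchase_price_usd': [90000.0, np.nan, 150000.0],
    'equip_age_yrs': [3, 7, 1],
    'utilisation_hours': [1200, 800, 2000],
    'pm_workorders': [2, 4, 1],
    'cm_workorders': [1, 3, 0],
    'service_contract': ['Yes', 'No', 'Yes'],
    'equip_category': ['diagnostic', 'support', 'therapeutic'],
    'clinic_type': ['imaging', 'primary-care', 'surgical centre'],
    'year': [2022, 2023, 2023],
})

toy_clean, n_dropped = clean_required(toy, req)
print(f"Dropped {n_dropped} of {len(toy)} rows")
```

Logging the dropped-row count is a cheap way to notice when a data export has started arriving with a new gap.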
Define feature blocks
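Split the predictors into a numeric block and a categorical block. The categorical block (the yes/no contract flag, equipment category, clinic type, and calendar year) is one-hot encoded into binary columns in the pipeline; dropping the first level of each gives every flag a reference category and prevents the dummy-variable trap. Because year is treated as a category, a fiscal year that never appeared in training has no column of its own, so the encoder falls back to all zeros for it, which is the same encoding as the reference year.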
num_cols = ['purchase_price_usd', 'equip_age_yrs', 'utilisation_hours',
            'pm_workorders', 'cm_workorders']
cat_cols = ['service_contract','equip_category','clinic_type','year']
target = 'annual_cost_usd'
X = df[num_cols + cat_cols]
y = df[target]
Pre‑processing & Ridge pipeline
RidgeCV runs five-fold cross-validation over the candidate α values and keeps the one with the best average validation score, which balances bias against variance automatically.
pre = ColumnTransformer([
    ('cats', OneHotEncoder(drop='first', handle_unknown='ignore'), cat_cols),
    ('nums', StandardScaler(), num_cols)
])
ridge = RidgeCV(alphas=[0.1, 1, 10, 50, 100, 300], cv=5)
pipe = Pipeline([
    ('prep', pre),
    ('model', ridge)
])
Train‑test split & fit
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, shuffle=True)
pipe.fit(X_train, y_train)
Evaluate hold‑out accuracy
pred = pipe.predict(X_test)
print(f"α selected by CV : {pipe.named_steps['model'].alpha_}")
print(f"Test‑set R² : {r2_score(y_test, pred):.3f}")
print(f"Test‑set MAE : ${mean_absolute_error(y_test, pred):,.0f}")
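An MAE figure is easier to judge next to a naive reference. As a sketch, the snippet below compares Ridge with a predictor that always returns the training-set mean, using synthetic data rather than the clinic dataset. The coefficients, noise level, and alpha are assumptions chosen only to make the example run.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 300

# Synthetic, illustrative drivers of annual cost.
age = rng.uniform(0, 15, n)
hours = rng.uniform(200, 3000, n)
Xs = np.column_stack([age, hours])
cost = 4000 + 1500 * age + 3 * hours + rng.normal(scale=2000, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(
    Xs, cost, test_size=0.2, random_state=42)

# Naive reference: always predict the training-set mean.
baseline = DummyRegressor(strategy='mean').fit(X_tr, y_tr)
model = Ridge(alpha=1.0).fit(X_tr, y_tr)

mae_base = mean_absolute_error(y_te, baseline.predict(X_te))
mae_ridge = mean_absolute_error(y_te, model.predict(X_te))
print(f"Baseline MAE : ${mae_base:,.0f}")
print(f"Ridge MAE    : ${mae_ridge:,.0f}")
```

On the real pipeline the same comparison could be made by fitting DummyRegressor on X_train and y_train and scoring it on the same hold-out set.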
Inspect cost drivers
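How to read the output, using hypothetical numbers: a coefficient of +95,000 on equip_age_yrs would mean that a one-standard-deviation increase in equipment age adds about $95,000 to predicted annual cost, and a coefficient of −12,000 on service_contract_Yes would mean that contracted equipment costs about $12,000 less than equipment without a contract, all else being equal. The actual values, and the exact flag names, depend on the data.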
ohe = pipe.named_steps['prep'].named_transformers_['cats']
ohe_names = ohe.get_feature_names_out(cat_cols)
feature_names = np.concatenate([ohe_names, num_cols])
coefs = (pd.Series(pipe.named_steps['model'].coef_, index=feature_names)
         .sort_values())
print("\nCost reducers (most negative):")
print(coefs.head(6))
print("\nCost drivers (most positive):")
print(coefs.tail(6))
Numeric coefficients show USD change for a one‑standard‑deviation increase; categorical flags are dollar shifts versus the reference category.
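Finance teams often prefer "dollars per year of age" to "dollars per standard deviation". Dividing a numeric coefficient by the standard deviation the scaler learned converts one into the other. The sketch below shows the arithmetic on synthetic data with a single made-up feature; in the pipeline above, the fitted scaler would be found at pipe.named_steps['prep'].named_transformers_['nums'].

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic data: each extra year of age adds about $2,000 to annual cost.
age = rng.uniform(0, 15, size=200)
cost = 5000 + 2000 * age + rng.normal(scale=500, size=200)

demo = Pipeline([('scale', StandardScaler()), ('model', Ridge(alpha=1.0))])
demo.fit(age.reshape(-1, 1), cost)

coef_per_sd = demo.named_steps['model'].coef_[0]   # USD per one std. dev.
sd_years = demo.named_steps['scale'].scale_[0]     # std. dev. in years
coef_per_year = coef_per_sd / sd_years             # USD per year of age

print(f"Per standard deviation: ${coef_per_sd:,.0f}")
print(f"Per year of age       : ${coef_per_year:,.0f}")
```

The recovered per-year figure sits slightly below the true slope because Ridge shrinks coefficients toward zero; the gap grows with α.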
Persist for dashboards
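Saving the whole pipeline keeps the preprocessing and the fitted model in one file, so a dashboard can reload it and score raw asset records directly. As a reminder of why Ridge suits this job: purchase price, age and utilisation are correlated, so OLS can assign unstable or even opposite-sign weights, whereas Ridge stabilises them while keeping the business narrative clear.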
joblib.dump(pipe, "ridge_clinic_equipment_cost.pkl")
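On the dashboard side, the saved file is loaded with joblib.load and called with a DataFrame that has the same column names used in training. The round trip below is a minimal sketch on a two-feature, made-up pipeline written to a temporary directory; the demo data, file name, and new_asset row are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up training rows, just enough to fit a small pipeline.
train = pd.DataFrame({
    'equip_age_yrs': [1, 3, 5, 8, 10, 12],
    'service_contract': ['Yes', 'Yes', 'No', 'No', 'Yes', 'No'],
    'annual_cost_usd': [5000.0, 7000.0, 12000.0, 16000.0, 15000.0, 22000.0],
})
features = ['equip_age_yrs', 'service_contract']

demo = Pipeline([
    ('prep', ColumnTransformer([
        ('cats', OneHotEncoder(drop='first'), ['service_contract']),
        ('nums', StandardScaler(), ['equip_age_yrs']),
    ])),
    ('model', Ridge(alpha=1.0)),
])
demo.fit(train[features], train['annual_cost_usd'])

# Save, reload, and score one new asset record.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_pipeline.pkl")
    joblib.dump(demo, path)
    reloaded = joblib.load(path)

new_asset = pd.DataFrame({'equip_age_yrs': [6], 'service_contract': ['No']})
pred_reloaded = reloaded.predict(new_asset)
pred_original = demo.predict(new_asset)
print(f"Predicted annual cost: ${pred_reloaded[0]:,.0f}")
```

Pickled scikit-learn objects are not guaranteed to load across library versions, so it is worth recording the scikit-learn version alongside the saved file.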
Summary
With a few dozen lines of Python, we transformed routine asset‑management data into an explainable clinic‑equipment cost forecaster:
- Budgeting power: Finance can project next year’s repair and depreciation spend by department.
- Transparent levers: Every coefficient directly ties cost to age, usage, service contract status, or category.
- Strong baseline: Any future gradient‑boosted or Bayesian model must beat this Ridge model’s mean‑absolute error and remain interpretable for capital‑planning committees.