Clinic Equipment Cost Prediction with Ridge Regression in ML


Outpatient clinics replace or service dozens of devices every year, including autoclaves, ultrasound scanners, vital-sign monitors, exam tables, and even simple otoscopes. Finance teams need a forward‑looking estimate of each department’s equipment cost for the coming fiscal year so they can:

decide whether to buy, lease, or extend warranties,
negotiate service‑contract pricing, and
defend capital‑budget requests to executives.

We will build a Ridge‑regression model that predicts a clinic’s annual equipment cost (USD) from information already stored in the asset‑management system:

purchase price and equipment age
utilisation hours (patient scans, autoclave cycles, etc.)
preventive and corrective work‑order counts
service‑contract status (in or out of warranty)
equipment category (diagnostic / therapeutic / support)
clinic type (primary‑care, surgical centre, imaging)
reporting year (captures inflation)

Ridge regression keeps the model linear, so each coefficient translates directly into a dollar effect, while its L2 penalty stabilises the weights when correlated variables (age, work orders, utilisation) move together.
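The stabilising effect of the L2 penalty can be seen on a small synthetic example. The sketch below is not part of the tutorial's pipeline: the data are randomly generated, the variable names are illustrative, and the closed-form solver is a simplified stand-in for scikit-learn's Ridge, assuming centred data and no intercept.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two nearly collinear predictors (think: equipment age and corrective work orders).
age = rng.normal(size=n)
work_orders = age + 0.05 * rng.normal(size=n)
X = np.column_stack([age, work_orders])
X = (X - X.mean(axis=0)) / X.std(axis=0)    # standardise both columns

# Synthetic target: both features matter equally, plus noise.
y = 3.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(scale=2.0, size=n)
y = y - y.mean()                             # centre so no intercept is needed


def ridge_coefs(X, y, alpha):
    """Closed-form ridge solution (X'X + alpha*I)^-1 X'y; alpha=0 gives OLS."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(k), X.T @ y)


ols = ridge_coefs(X, y, alpha=0.0)
ridge = ridge_coefs(X, y, alpha=10.0)

print("OLS coefficients  :", np.round(ols, 2))
print("Ridge coefficients:", np.round(ridge, 2))
print("Coefficient norms :", np.linalg.norm(ols), "vs", np.linalg.norm(ridge))
```

With collinear inputs the OLS weights can swing far apart from one sample to the next, while the penalised weights always have a smaller overall norm and tend to sit closer together.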

Libraries Required

pandas # data wrangling
numpy # numeric helpers
matplotlib.pyplot # optional diagnostics
scikit‑learn # preprocessing, RidgeCV, metrics
joblib # save & reload the fitted pipeline

Dataset Link

Medical Equipment Spare Parts Inventories

Step-by-Step Code Implementation

Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score, mean_absolute_error
import joblib

Load the dataset

df = pd.read_csv("medical_equipment_spare_parts_Inventories.csv")      # path after un‑zipping
print(df.head())

Minimal cleaning

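Keep only the rows in which every required column is populated. No scaling happens at this stage; it is done later inside the pipeline, where StandardScaler gives purchase price (hundreds of thousands of dollars) and work-order counts (single digits) unit variance so that Ridge's L2 penalty shrinks them proportionally.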

req = ['annual_cost_usd','purchase_price_usd','equip_age_yrs',
       'utilisation_hours','pm_workorders','cm_workorders',
       'service_contract','equip_category','clinic_type','year']
df = df.dropna(subset=req).copy()
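Before dropping rows, it can help to confirm that the required columns exist and to see how many rows the filter removes. The following is a minimal sketch on a toy frame; the values are hypothetical, and only three of the tutorial's columns are included to keep it short.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values, standing in for the real CSV.
toy = pd.DataFrame({
    "annual_cost_usd":    [12000.0, np.nan, 8500.0, 23000.0],
    "purchase_price_usd": [150000.0, 90000.0, np.nan, 310000.0],
    "equip_age_yrs":      [3.0, 7.0, 2.0, 11.0],
})
req_toy = ["annual_cost_usd", "purchase_price_usd", "equip_age_yrs"]

# 1) Fail early if a required column is absent from the file.
missing_cols = [c for c in req_toy if c not in toy.columns]
if missing_cols:
    raise KeyError(f"Columns not found in the data: {missing_cols}")

# 2) Count missing values per required column.
missing_per_col = toy[req_toy].isna().sum()

# 3) Apply the same filter the tutorial uses and report the loss.
clean = toy.dropna(subset=req_toy).copy()
rows_dropped = len(toy) - len(clean)

print(missing_per_col)
print(f"Dropped {rows_dropped} of {len(toy)} rows")
```

If a large share of rows disappears, imputation may be a better choice than dropping, but that is a modelling decision outside this walkthrough.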

Define feature blocks

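Split the predictors into a numeric block and a categorical block. The pipeline later converts the categorical block (the yes/no contract flag, equipment category, clinic type, and calendar year) into binary columns; dropping the first level of each prevents the dummy-variable trap.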

num_cols = ['purchase_price_usd','equip_age_yrs','utilisation_hours',
            'pm_workorders','cm_workorders']
cat_cols = ['service_contract','equip_category','clinic_type','year']
target   = 'annual_cost_usd'

X = df[num_cols + cat_cols]
y = df[target]

Pre‑processing & Ridge pipeline

Five‑fold cross‑validation selects the α value that minimises the validation error, automatically balancing bias and variance.

pre = ColumnTransformer([
        ('cats', OneHotEncoder(drop='first', handle_unknown='ignore'), cat_cols),
        ('nums', StandardScaler(),                                  num_cols)
])

ridge = RidgeCV(alphas=[0.1, 1, 10, 50, 100, 300], cv=5)

pipe = Pipeline([
        ('prep',  pre),
        ('model', ridge)
])

Train‑test split & fit

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, shuffle=True)

pipe.fit(X_train, y_train)

Evaluate hold‑out accuracy

pred = pipe.predict(X_test)

print(f"α selected by CV : {pipe.named_steps['model'].alpha_}")
print(f"Test‑set R²      : {r2_score(y_test, pred):.3f}")
print(f"Test‑set MAE     : ${mean_absolute_error(y_test, pred):,.0f}")
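A mean absolute error is easier to judge next to a naive baseline. The helper below, a hypothetical addition that is not part of the original code, computes the MAE obtained by always predicting the training-set mean cost; the demo numbers are invented.

```python
import numpy as np


def baseline_mae(y_train, y_test):
    """MAE of always predicting the training-set mean cost."""
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    return float(np.mean(np.abs(y_test - y_train.mean())))


# Hypothetical costs in USD, for illustration only.
y_train_demo = [10000, 20000, 30000]
y_test_demo = [15000, 25000]

# Training mean is 20,000, so both test errors are 5,000.
print(f"Baseline MAE: ${baseline_mae(y_train_demo, y_test_demo):,.0f}")  # prints Baseline MAE: $5,000
```

Calling baseline_mae(y_train, y_test) on the tutorial's split gives the figure the Ridge MAE should clearly beat.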

Inspect cost drivers

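As an illustration, a coefficient of +$95,000 on equip_age_yrs would mean that a one-standard-deviation increase in device age adds about $95,000 to the predicted annual cost, and a coefficient of −$12,000 on service_contract_Yes would mean that contracted equipment costs about $12,000 less per year than the reference level. The values printed for your data will differ.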

ohe = pipe.named_steps['prep'].named_transformers_['cats']
ohe_names = ohe.get_feature_names_out(cat_cols)
feature_names = np.concatenate([ohe_names, num_cols])

coefs = (pd.Series(pipe.named_steps['model'].coef_, index=feature_names)
         .sort_values())

print("\nCost reducers (most negative):")
print(coefs.head(6))

print("\nCost drivers (most positive):")
print(coefs.tail(6))

Numeric coefficients show USD change for a one‑standard‑deviation increase; categorical flags are dollar shifts versus the reference category.
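Because the numeric coefficients are expressed per standard deviation, dividing by the feature's standard deviation converts them to dollars per original unit. The sketch below shows the arithmetic; the 4-year standard deviation for equipment age is an assumed figure, and per_unit_effect is a hypothetical helper name.

```python
def per_unit_effect(coef_per_sd, feature_sd):
    """Convert a per-standard-deviation coefficient to a per-unit effect."""
    if feature_sd <= 0:
        raise ValueError("feature_sd must be positive")
    return coef_per_sd / feature_sd


# Assumed numbers: +$95,000 per SD of age, and an SD of 4 years.
effect = per_unit_effect(95_000, 4.0)
print(f"${effect:,.0f} per additional year of age")  # prints $23,750 per additional year of age
```

In the fitted pipeline, the standard deviations learned by the scaler are available as pipe.named_steps['prep'].named_transformers_['nums'].scale_, in the same order as num_cols.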

Persist for dashboards

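Save the fitted pipeline, preprocessing included, so a dashboard can reload it and score new records without retraining. The model is worth keeping for a reason beyond accuracy: purchase price, age and utilisation are correlated, so OLS can assign them unstable or even opposite-sign weights, whereas Ridge stabilises the weights while keeping the business narrative clear.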

joblib.dump(pipe, "ridge_clinic_equipment_cost.pkl")
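A dashboard would reload the saved file and call predict on new rows. The sketch below shows that round trip on a tiny stand-in pipeline; the rows, the two-feature layout, the fixed alpha, and the file name demo_pipe.pkl are all assumptions made for the example.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny stand-in for the tutorial's pipeline, fitted on hypothetical rows.
toy = pd.DataFrame({
    "equip_age_yrs":    [1, 3, 5, 8, 10, 12],
    "service_contract": ["Yes", "Yes", "No", "No", "Yes", "No"],
    "annual_cost_usd":  [4000, 6500, 11000, 15000, 14000, 21000],
})
features = ["equip_age_yrs", "service_contract"]

demo_pipe = Pipeline([
    ("prep", ColumnTransformer([
        ("cats", OneHotEncoder(drop="first"), ["service_contract"]),
        ("nums", StandardScaler(), ["equip_age_yrs"]),
    ])),
    ("model", Ridge(alpha=1.0)),
])
demo_pipe.fit(toy[features], toy["annual_cost_usd"])

# Save to a temporary directory, then load the file back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_pipe.pkl")
    joblib.dump(demo_pipe, path)
    reloaded = joblib.load(path)

# New rows must carry the same column names the pipeline was trained on.
new_rows = pd.DataFrame({"equip_age_yrs": [6], "service_contract": ["No"]})
pred_before = demo_pipe.predict(new_rows)
pred_after = reloaded.predict(new_rows)
same = bool(np.allclose(pred_before, pred_after))

print(f"Predicted annual cost: ${pred_after[0]:,.0f}")
print("Reloaded pipeline matches the original:", same)
```

Pickled pipelines are generally tied to the library versions that created them, so it is safer to load the file with the same scikit-learn version used for training.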

Summary

With a few dozen lines of Python, we transformed routine asset-management data into an explainable clinic-equipment cost forecaster:

Budgeting power: Finance can project next year’s repair and depreciation spend by department.
Transparent levers: Every coefficient directly ties cost to age, usage, service contract status, or category.
Strong baseline: Any future gradient‑boosted or Bayesian model must beat this Ridge model’s mean‑absolute error and remain interpretable for capital‑planning committees.
