Equipment Rental Price Prediction using Linear Regression in ML
Rental-equipment brokers and construction planners need a fast, data-driven way to quote a fair daily rental price for excavators, dozers, loaders, lifts, and other heavy machinery. Quoting too high drives customers to competitors; quoting too low erodes margin.
Using the open “Heavy Equipment Pricing Data” set on Kaggle, whose rows list make, model, year, engine power, meter hours, geographic region, and observed market price for thousands of machines, we’ll train a linear-regression baseline that predicts a machine’s expected daily rental rate (USD) from its physical specs and age. A transparent linear model tells fleet managers how strongly each spec nudges the price and supplies a benchmark before moving to tree ensembles or hedonic-index methods.
While the dataset records advertised sale prices, industry practice prices rentals as a percentage of the current market value (often 4 to 6 % of resale price per month). We first convert the sale price to an estimated daily rental rate (≈ 0.05 × sale price / 30, using 5 % as the midpoint of that range) and then model that figure.
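The conversion is simple enough to check by hand. The sketch below applies the 4 to 6 % monthly range to a single machine; the function name `daily_rental_rate` and the $142,000 example price are illustrative assumptions, not part of the dataset.

```python
def daily_rental_rate(sale_price_usd, monthly_factor=0.05, days_per_month=30.0):
    """Estimate a daily rental rate as a monthly share of resale value."""
    return sale_price_usd * monthly_factor / days_per_month


example_price = 142_000  # hypothetical sale price in USD

low = daily_rental_rate(example_price, monthly_factor=0.04)
mid = daily_rental_rate(example_price, monthly_factor=0.05)
high = daily_rental_rate(example_price, monthly_factor=0.06)

print(f"4 %: ${low:,.2f}/day  5 %: ${mid:,.2f}/day  6 %: ${high:,.2f}/day")
```

Because the target is a fixed multiple of the sale price, the choice of factor rescales every prediction and coefficient by the same constant but does not change which features matter.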
Libraries Required
- pandas # data wrangling
- numpy # numerical helpers
- matplotlib.pyplot # quick EDA plots
- scikit‑learn # preprocessing, linear model, metrics
- joblib # save the trained pipeline
Dataset Link
Step-by-Step Code Implementation
Why linear regression first? Rental companies typically quote rates as a base value plus additive premiums for high horsepower, young age, or premium brands. A straight‑line model captures these additive effects and outputs coefficients that managers can verify against rules of thumb.
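To see why a straight-line model suits additive pricing, the sketch below builds noise-free synthetic rates from a made-up rule (a base value plus a horsepower premium and an age discount) and checks that linear regression recovers the rule. The constants `BASE`, `PER_HP`, and `PER_YEAR` are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
horsepower = rng.uniform(80, 400, size=200)
age_yrs = rng.uniform(0, 20, size=200)

# Hypothetical additive quoting rule: base + premium per hp - discount per year
BASE, PER_HP, PER_YEAR = 150.0, 0.50, -8.0
daily_rent = BASE + PER_HP * horsepower + PER_YEAR * age_yrs

X = np.column_stack([horsepower, age_yrs])
model = LinearRegression().fit(X, daily_rent)

# The fitted intercept and coefficients mirror the rule used to build the data
print("base     :", round(model.intercept_, 4))
print("per hp   :", round(model.coef_[0], 4))
print("per year :", round(model.coef_[1], 4))
```

Real listings are noisy and not perfectly additive, so the recovered coefficients there are estimates rather than exact rule values.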
1. Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error
import joblib
2. Load the data
df = pd.read_csv("heavy_equipment_pricing.csv") # rename after unzip
print(df.head())
Minimum columns you should see
| column | sample values |
| make | Caterpillar, Komatsu |
| model | 320DL, PC200 … |
| year | 2008 |
| horsepower | 150 |
| meter_hours | 6 400 |
| region | Southeast |
| sale_price_usd | 142 000 |
3. Create daily rental‑rate target
MONTHLY_FACTOR = 0.05 # 5 % of value per month
df['daily_rent_usd'] = df['sale_price_usd'] * MONTHLY_FACTOR / 30.0
4. Quick cleaning
core_cols = ['daily_rent_usd', 'make', 'model', 'year',
'horsepower', 'meter_hours', 'region']
df = df.dropna(subset=core_cols).copy()
# machine age in years
CURRENT_YEAR = 2025
df['age_yrs'] = CURRENT_YEAR - df['year']
5. Define features & label
Standard scaling puts horsepower, meter hours, and age on a common zero-mean, unit-variance scale, so each numeric coefficient reads as dollars per day for a one-standard-deviation (σ) change in that feature.
num_cols = ['horsepower', 'meter_hours', 'age_yrs']
cat_cols = ['make', 'model', 'region']
target = 'daily_rent_usd'
X = df[num_cols + cat_cols]
y = df[target]
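As a quick illustration of what standard scaling does to the numeric columns, the sketch below z-scores four made-up machines. The values in `raw` are hypothetical; only `StandardScaler` itself comes from the tutorial.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical rows; columns are horsepower, meter_hours, age_yrs
raw = np.array([
    [150.0,  6400.0, 17.0],
    [220.0,  1200.0,  3.0],
    [ 95.0, 11000.0, 22.0],
    [310.0,  3500.0,  8.0],
])

scaler = StandardScaler().fit(raw)
z = scaler.transform(raw)

# After scaling, every column has mean 0 and standard deviation 1
print("column means:", z.mean(axis=0).round(6))
print("column stds :", z.std(axis=0).round(6))

# scale_ holds the per-column σ (in raw units) that one z-score step represents
print("σ per column:", scaler.scale_.round(2))
```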
6. Pre‑processing & pipeline
One-hot encoding gives every make, model, and region its own dollar adjustment without pretending the categories have a numeric order.
pre = ColumnTransformer([
('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols),
('num', StandardScaler(), num_cols)
])
lin = LinearRegression()
pipe = Pipeline([
('prep', pre),
('model', lin)
])
7. Train‑test split & training
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, shuffle=True)
pipe.fit(X_train, y_train)
8. Evaluation
MAE in dollars per day tells the sales desk how far off a typical quote is, e.g., about ±$45/day.
y_pred = pipe.predict(X_test)
print(f"R² : {r2_score(y_test, y_pred):.3f}")
print(f"MAE : ${mean_absolute_error(y_test, y_pred):,.0f} per day")
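For readers who want to see what the two metrics measure, the sketch below computes MAE and R² by hand on four made-up quotes and compares them with scikit-learn's results. The arrays `y_true` and `y_hat` are hypothetical values, not model output.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical observed and predicted daily rates (USD)
y_true = np.array([200.0, 250.0, 300.0, 350.0])
y_hat = np.array([210.0, 240.0, 330.0, 330.0])

# MAE: average absolute miss, in the same units as the target
mae_manual = np.mean(np.abs(y_true - y_hat))

# R²: 1 minus (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(f"MAE manual {mae_manual:.2f} | sklearn {mean_absolute_error(y_true, y_hat):.2f}")
print(f"R²  manual {r2_manual:.3f} | sklearn {r2_score(y_true, y_hat):.3f}")
```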
9. Inspect price drivers
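The coefficient table highlights actionable levers: a large positive coefficient on a one-hot feature such as make_Caterpillar or region_West indicates where premium pricing is defensible, while a large negative coefficient on meter_hours or age_yrs quantifies the discount that heavily used or older machines carry. The model has no separate "over 10 000 h" flag, so the usage discount appears as a single slope on meter_hours.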
ohe_feats = pipe.named_steps['prep']\
.named_transformers_['cat']\
.get_feature_names_out(cat_cols)
all_feats = list(ohe_feats) + num_cols
coefs = (pd.Series(pipe.named_steps['model'].coef_, index=all_feats)
.sort_values())
print("\nDiscount factors (negative coefficients):")
print(coefs.head(8))
print("\nPremium factors (positive coefficients):")
print(coefs.tail(8))
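Because numeric inputs are z-scored, each numeric coefficient reads as the dollars-per-day change for a one-σ shift in that feature; each one-hot coefficient reads as the dollars-per-day adjustment for belonging to that category.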
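If a manager prefers raw units (dollars per meter hour rather than dollars per σ), dividing a numeric coefficient by the scaler's stored σ converts it back. The sketch below demonstrates this on synthetic data built from a hypothetical rule of $0.01/day lost per meter hour; the pipeline `demo` and its step names are illustrative, not the tutorial's `pipe`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
meter_hours = rng.uniform(0, 12_000, size=300)

# Hypothetical rule: each extra meter hour lowers the rate by $0.01/day
daily_rent = 400.0 - 0.01 * meter_hours

demo = Pipeline([("scale", StandardScaler()), ("model", LinearRegression())])
demo.fit(meter_hours.reshape(-1, 1), daily_rent)

coef_per_sigma = demo.named_steps["model"].coef_[0]  # $/day per one-σ shift
sigma = demo.named_steps["scale"].scale_[0]          # σ in meter hours
coef_per_hour = coef_per_sigma / sigma               # $/day per meter hour

print(f"per σ: {coef_per_sigma:.2f}   σ: {sigma:.0f} h   per hour: {coef_per_hour:.4f}")
```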
10. Persist the pipeline
Joblib persistence packages both preprocessing and weights; tomorrow’s quoting tool can joblib.load() the .pkl and return an instant rate for any new machine spec.
joblib.dump(pipe, "equipment_rent_linreg.pkl")
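A quoting tool would reload the saved file and pass it a one-row DataFrame with the same column names used in training. The sketch below is self-contained, so it first fits a pipeline of the same shape on four made-up machines; every value in `train`, `rent`, and `new_machine` is hypothetical, and the file is written to a temporary directory.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

num_cols = ["horsepower", "meter_hours", "age_yrs"]
cat_cols = ["make", "model", "region"]

# Tiny hypothetical training frame standing in for the real dataset
train = pd.DataFrame({
    "horsepower": [150, 220, 95, 310],
    "meter_hours": [6400, 1200, 11000, 3500],
    "age_yrs": [17, 3, 22, 8],
    "make": ["Caterpillar", "Komatsu", "Caterpillar", "Volvo"],
    "model": ["320DL", "PC200", "320DL", "EC220"],
    "region": ["Southeast", "West", "West", "Southeast"],
})
rent = pd.Series([190.0, 310.0, 120.0, 280.0])

pre = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ("num", StandardScaler(), num_cols),
])
pipe = Pipeline([("prep", pre), ("model", LinearRegression())]).fit(train, rent)

# Save, then reload as a separate quoting process would
path = os.path.join(tempfile.mkdtemp(), "equipment_rent_linreg.pkl")
joblib.dump(pipe, path)
loaded = joblib.load(path)

# One new machine, using the same column names as the training frame
new_machine = pd.DataFrame([{
    "horsepower": 180, "meter_hours": 4000, "age_yrs": 10,
    "make": "Komatsu", "model": "PC200", "region": "Southeast",
}])
quote = loaded.predict(new_machine)[0]
print(f"Suggested rate: ${quote:,.0f} per day")
```

Pickled pipelines are generally only safe to reload with the same scikit-learn version that saved them, so recording library versions alongside the file is a sensible precaution.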
Summary
With under 120 lines of Python, we transformed raw resale listings into an explainable equipment‑rental pricing engine:
- Instant rate recommendations help sales teams provide accurate quotes quickly and consistently.
- Transparent coefficients show how specs, age, usage, and brand shift the daily rent in dollars.
Keep this linear baseline as your yardstick; every boosted‑tree or deep‑hedonic model you deploy next must beat its mean‑absolute‑error while still making sense to the rental‑fleet manager.