Equipment Rental Price Prediction using Linear Regression in ML


Rental-equipment brokers and construction planners need a fast, data-driven way to quote a fair daily rental price for excavators, dozers, loaders, lifts, and other heavy machinery. Quote too high and customers go to competitors; quote too low and the fleet gives away margin.

Using the open “Heavy Equipment Pricing Data” set on Kaggle, whose rows list make, model, year, engine power, meter hours, geographic region, and observed market price for thousands of machines, we’ll train a linear-regression baseline that predicts a machine’s expected daily rental rate (USD) from its specs, age, brand, and region. A linear model shows fleet managers exactly how strongly each feature moves the price and provides a benchmark before moving to tree ensembles or hedonic-index methods.

The dataset records advertised sale prices, not rental rates. Industry practice prices rentals as a percentage of current market value (often 4 – 6 % of resale price per month), so we first convert each sale price to an estimated daily rental rate using the 5 % midpoint (≈ 0.05 × sale-price / 30) and then model that figure.

Libraries Required

pandas # data wrangling
numpy # numerical helpers
matplotlib.pyplot # quick EDA plots
scikit‑learn # preprocessing, linear model, metrics
joblib # save the trained pipeline

Dataset Link

Heavy Equipment Pricing Data

Step-by-Step Code Implementation

Why linear regression first? Rental companies typically quote rates as a base value plus additive premiums for high horsepower, low age, or premium brands. A linear model captures these additive effects and outputs coefficients that managers can check against their rules of thumb.

1. Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error
import joblib

2. Load the data

df = pd.read_csv("heavy_equipment_pricing.csv")     # use the CSV name from the unzipped Kaggle download
print(df.head())

Minimum columns you should see

column	sample values
make	Caterpillar, Komatsu
model	320DL, PC200 …
year	2008
horsepower	150
meter_hours	6,400
region	Southeast
sale_price_usd	142,000
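The rest of the code assumes these exact column names, and the headers in your Kaggle CSV may differ. A minimal sketch of a fail-fast check, assuming the names from the table above; the helper `missing_columns` and the toy frame are hypothetical:

```python
import pandas as pd

# Column names assumed by this tutorial (taken from the table above)
REQUIRED_COLS = ["make", "model", "year", "horsepower",
                 "meter_hours", "region", "sale_price_usd"]

def missing_columns(frame, required=REQUIRED_COLS):
    """Return the required column names that are absent from `frame`."""
    return [c for c in required if c not in frame.columns]

# Hypothetical toy frame that lacks the 'region' column
toy = pd.DataFrame({
    "make": ["Caterpillar"], "model": ["320DL"], "year": [2008],
    "horsepower": [150], "meter_hours": [6400], "sale_price_usd": [142000],
})
missing = missing_columns(toy)
if missing:
    print("Missing columns:", missing)   # prints: Missing columns: ['region']
```

In the tutorial you would call `missing_columns(df)` right after `read_csv` and, if needed, map the CSV's headers onto these names with `df.rename(columns={...})`.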

3. Create daily rental‑rate target

MONTHLY_FACTOR = 0.05          # 5 % of value per month
df['daily_rent_usd'] = df['sale_price_usd'] * MONTHLY_FACTOR / 30.0
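As a sanity check on this conversion, here is the sample sale price from the column table run through the same formula at 4 %, 5 %, and 6 % per month. The helper name `daily_rent_from_sale` is hypothetical; it just wraps the expression above:

```python
def daily_rent_from_sale(sale_price_usd, monthly_factor=0.05, days_per_month=30.0):
    """Convert a sale price to an estimated daily rental rate (USD/day)."""
    return sale_price_usd * monthly_factor / days_per_month

sale = 142_000  # sample sale price from the column table
for factor in (0.04, 0.05, 0.06):
    print(f"{factor:.0%} per month -> ${daily_rent_from_sale(sale, factor):,.2f}/day")
# 4% per month -> $189.33/day
# 5% per month -> $236.67/day
# 6% per month -> $284.00/day
```

Because the factor is a single constant, changing it rescales every target by the same amount; the model's coefficients scale with it, but R² does not change.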

4. Quick cleaning

core_cols = ['daily_rent_usd', 'make', 'model', 'year',
             'horsepower', 'meter_hours', 'region']
df = df.dropna(subset=core_cols).copy()

# machine age in years
CURRENT_YEAR = 2025
df['age_yrs'] = CURRENT_YEAR - df['year']
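`dropna` only removes missing values. Scraped listings can also contain text in numeric fields or impossible values. The sketch below adds a few extra guards. The plausibility rules (no future model years, non-negative hours, positive horsepower) are assumptions rather than part of the original recipe, and the helper `add_age_with_guards` is hypothetical:

```python
import pandas as pd

def add_age_with_guards(frame, current_year=2025):
    """Coerce numeric columns, derive age, and drop rows that cannot be real machines."""
    out = frame.copy()
    for col in ["year", "horsepower", "meter_hours"]:
        # Non-numeric entries such as "n/a" become NaN instead of raising
        out[col] = pd.to_numeric(out[col], errors="coerce")
    out = out.dropna(subset=["year", "horsepower", "meter_hours"])
    out["age_yrs"] = current_year - out["year"]
    # Assumed plausibility rules: no future model years, no negative hours or power
    mask = (out["age_yrs"] >= 0) & (out["meter_hours"] >= 0) & (out["horsepower"] > 0)
    return out[mask]

# Hypothetical messy rows: one text year, one future year, one negative horsepower
toy = pd.DataFrame({
    "year": [2008, "n/a", 2030, 2015],
    "horsepower": [150, 200, 180, -5],
    "meter_hours": [6400, 3000, 100, 2000],
})
clean = add_age_with_guards(toy)
print(clean)   # only the 2008 machine survives, with age_yrs = 17
```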

5. Define features & label

Standard scaling gives horsepower, meter hours, and age zero mean and unit variance, so each numeric coefficient reads as dollars per day per one-standard-deviation change in that feature.

num_cols = ['horsepower', 'meter_hours', 'age_yrs']
cat_cols = ['make', 'model', 'region']
target   = 'daily_rent_usd'

X = df[num_cols + cat_cols]
y = df[target]
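Here is what `StandardScaler` does to the numeric columns, on a few hypothetical rows of horsepower, meter hours, and age. It subtracts each column's mean and divides by its population standard deviation (ddof=0), which matches a manual z-score:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric features: horsepower, meter_hours, age_yrs
X_num = np.array([[150.0, 6400.0, 17.0],
                  [200.0, 3000.0,  5.0],
                  [250.0, 9800.0, 12.0]])
scaler = StandardScaler().fit(X_num)
Z = scaler.transform(X_num)

# Same result by hand: subtract the column mean, divide by the population std (ddof=0)
Z_manual = (X_num - X_num.mean(axis=0)) / X_num.std(axis=0)
print(np.round(Z, 3))
print("means:", scaler.mean_, "std devs:", scaler.scale_)
```

The fitted `mean_` and `scale_` are stored inside the pipeline, so new machines are scaled with the training statistics rather than their own.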

6. Pre‑processing & pipeline

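One-hot encoding gives every make, model, and region its own dollar adjustment without imposing a false ordering on the categories. With handle_unknown='ignore', a make or model not seen during training is encoded as all zeros, so its quote falls back on the intercept and the remaining features.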

pre = ColumnTransformer([
        ('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols),
        ('num', StandardScaler(),                      num_cols)
])

lin = LinearRegression()

pipe = Pipeline([
        ('prep',  pre),
        ('model', lin)
])

7. Train‑test split & training

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, shuffle=True)

pipe.fit(X_train, y_train)
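A single random 80/20 split gives only one error estimate. A minimal sketch of 5-fold cross-validation to see how much MAE moves between folds follows. It uses synthetic stand-in data so it runs on its own. The helper `cv_mae` is hypothetical, and in the tutorial you would pass `pipe`, `X`, and `y` instead:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

def cv_mae(model, X, y, n_splits=5, seed=42):
    """Return per-fold MAE (positive numbers) from shuffled k-fold cross-validation."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=folds,
                             scoring="neg_mean_absolute_error")
    return -scores  # scikit-learn reports errors as negative "scores"

# Synthetic stand-in data (hypothetical): rent rises linearly with horsepower
rng = np.random.default_rng(0)
hp = rng.uniform(50, 400, size=(300, 1))
rent = 0.8 * hp[:, 0] + 50 + rng.normal(0, 10, size=300)

maes = cv_mae(LinearRegression(), hp, rent)
print(f"MAE per fold: {np.round(maes, 1)}; mean {maes.mean():.1f} +/- {maes.std():.1f}")
```

`cross_val_score` clones the estimator for each fold, so passing the full pipeline refits the encoder and scaler inside every fold and avoids leaking test statistics.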

8. Evaluation

Because MAE is measured in dollars per day, it tells the sales desk how far a typical quote misses the estimated rate. For example, an MAE of $45 would mean predictions are off by about $45/day on average.

y_pred = pipe.predict(X_test)
print(f"R²  : {r2_score(y_test, y_pred):.3f}")
print(f"MAE : ${mean_absolute_error(y_test, y_pred):,.0f} per day")
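An MAE figure means more next to a naive benchmark. The sketch below compares a fitted model against scikit-learn's `DummyRegressor`, which always predicts the training median, on synthetic stand-in data. The helper `mae_vs_median_baseline` is hypothetical; in the tutorial you would call it with `pipe, X_train, X_test, y_train, y_test`:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

def mae_vs_median_baseline(model, X_train, X_test, y_train, y_test):
    """Return (model MAE, median-baseline MAE) on the test split; model must be fitted."""
    baseline = DummyRegressor(strategy="median").fit(X_train, y_train)
    model_mae = mean_absolute_error(y_test, model.predict(X_test))
    base_mae = mean_absolute_error(y_test, baseline.predict(X_test))
    return model_mae, base_mae

# Synthetic stand-in data (hypothetical): rent rises linearly with horsepower
rng = np.random.default_rng(0)
hp = rng.uniform(50, 400, size=(200, 1))
rent = 0.8 * hp[:, 0] + 50 + rng.normal(0, 5, size=200)
X_tr, X_te, y_tr, y_te = hp[:150], hp[150:], rent[:150], rent[150:]

lin_demo = LinearRegression().fit(X_tr, y_tr)
model_mae, base_mae = mae_vs_median_baseline(lin_demo, X_tr, X_te, y_tr, y_te)
print(f"linear MAE ${model_mae:,.0f}/day vs median baseline ${base_mae:,.0f}/day")
```

If the linear model does not clearly beat the median baseline on the real data, the features are not explaining much of the price.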

9. Inspect price drivers

The coefficient table highlights actionable levers. A large positive coefficient on a one-hot column such as make_Caterpillar or region_Southeast suggests where premium pricing is defensible. A large negative coefficient on meter_hours or age_yrs quantifies the discount that heavily used or older machines command.

ohe_feats = (pipe.named_steps['prep']
                 .named_transformers_['cat']
                 .get_feature_names_out(cat_cols))

all_feats = list(ohe_feats) + num_cols
coefs = (pd.Series(pipe.named_steps['model'].coef_, index=all_feats)
         .sort_values())

print("\nDiscount factors (negative coefficients):")
print(coefs.head(8))
print("\nPremium factors (positive coefficients):")
print(coefs.tail(8))

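Because the numeric inputs are z-scored, each numeric coefficient is the change in dollars per day for a one-standard-deviation shift in that feature. The one-hot coefficients are plain dollars-per-day adjustments. However, every category keeps its own column alongside the intercept, so those coefficients are only determined relative to one another. Compare makes with makes and regions with regions rather than reading any single value as an absolute premium.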
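Managers often prefer "dollars per extra horsepower" to "dollars per σ". Dividing each numeric coefficient by the scaler's stored standard deviation converts it back to raw units. The sketch checks this on synthetic data with a known rule (hypothetical: $0.80 per hp and −$10 per year of age):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (hypothetical): rent = 100 + 0.8 $/hp - 10 $/year of age
rng = np.random.default_rng(1)
hp = rng.uniform(80, 400, 200)
age = rng.uniform(0, 20, 200)
X_demo = np.column_stack([hp, age])
y_demo = 100 + 0.8 * hp - 10 * age

demo = Pipeline([("scale", StandardScaler()), ("model", LinearRegression())])
demo.fit(X_demo, y_demo)

per_sigma = demo.named_steps["model"].coef_   # $/day per 1 std-dev shift
sigma = demo.named_steps["scale"].scale_      # std-dev of each raw feature
per_unit = per_sigma / sigma                  # $/day per 1 hp, per 1 year
print(np.round(per_unit, 3))
```

With the tutorial's pipeline the equivalent expression would be `coefs[num_cols] / pipe.named_steps['prep'].named_transformers_['num'].scale_`.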

10. Persist the pipeline

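Saving the whole pipeline with joblib stores the fitted preprocessing and the regression weights together, so a quoting tool can call joblib.load() on the .pkl file and return a rate for any new machine spec. Load it with the same scikit-learn version that saved it, since pickled models are not guaranteed to work across versions.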

joblib.dump(pipe, "equipment_rent_linreg.pkl")
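On the quoting side, a minimal sketch of loading the saved file and pricing one machine might look like this. So that it runs on its own, it first fits and saves a tiny stand-in pipeline on hypothetical toy rows (without `model` and `meter_hours`) in a temporary directory. The helper `quote_daily_rate` is hypothetical:

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def quote_daily_rate(model, spec):
    """Predict a daily rate (USD) for one machine described by a dict of features."""
    return float(model.predict(pd.DataFrame([spec]))[0])

# Hypothetical toy rows standing in for the cleaned Kaggle frame
toy = pd.DataFrame({
    "make": ["Caterpillar", "Komatsu", "Caterpillar", "Komatsu", "Caterpillar", "Komatsu"],
    "region": ["West", "Southeast", "Southeast", "West", "West", "Southeast"],
    "horsepower": [150, 200, 250, 120, 300, 180],
    "age_yrs": [5, 10, 3, 12, 2, 8],
})
rent = [240.0, 180.0, 300.0, 130.0, 360.0, 170.0]

stand_in = Pipeline([
    ("prep", ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["make", "region"]),
        ("num", StandardScaler(), ["horsepower", "age_yrs"]),
    ])),
    ("model", LinearRegression()),
]).fit(toy, rent)

# Save, then load exactly as a separate quoting tool would
path = os.path.join(tempfile.mkdtemp(), "equipment_rent_linreg.pkl")
joblib.dump(stand_in, path)
loaded = joblib.load(path)

spec = {"make": "Komatsu", "region": "West", "horsepower": 180, "age_yrs": 6}
print(f"Quoted rate: ${quote_daily_rate(loaded, spec):,.2f}/day")
```

The dict keys must match the training column names, because the `ColumnTransformer` selects columns by name.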

Summary

With a few dozen lines of Python, we turned raw resale listings into an explainable equipment-rental pricing engine:

Instant rate recommendations help sales teams quote accurately and consistently.
Transparent coefficients show how specs, age, usage, brand, and region affect daily rent.

Keep this linear baseline as your yardstick: every boosted-tree or deep hedonic model you deploy next must beat its mean absolute error while still making sense to the rental-fleet manager.
