Tool Rental Cost Trend Prediction with Polynomial Regression in ML
Rental company analysts need to forecast week-over-week changes in average tool-rental cost (USD/day) so that budgets and dynamic prices can be set before operational adjustments are made. Historical rental logs indicate that cost changes depend nonlinearly on the prior week's average cost (momentum/saturation), rental volume (demand pressure), tool category mix (premium vs. standard), and seasonal factors (e.g., holiday spikes). A simple linear model misses curvature, such as price plateaus at high demand, while an unregularised high-degree polynomial overfits noise. Fitting a polynomial regression on engineered features with Ridge (ℓ2) regularisation aims to capture smooth cost-trend dynamics and to deliver interpretable forecasts that guide pricing strategy.
Dataset
Step-by-Step Code Implementation
1. Libraries Required
import pandas as pd                 # data handling
import numpy as np                  # numerical ops
import matplotlib.pyplot as plt     # plotting
import seaborn as sns               # visualization
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score
2. Load Data & Compute Features
import pandas as pd
# Load weekly rentals (adjust path)
df = pd.read_csv("data/commercial-tool-rental-data-for-2016-and-2017/rentals.csv")
# Compute average cost per day by week
df['rental_date'] = pd.to_datetime(df['rental_date'])
df['week'] = df['rental_date'].dt.to_period('W').dt.start_time  # first day of each week
weekly = df.groupby('week').agg({
    'daily_cost': 'mean',
    'rental_id': 'count',  # volume
    'tool_category': lambda x: x.mode()[0]
}).rename(columns={'daily_cost': 'avg_cost', 'rental_id': 'volume'}).reset_index()
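Before pointing the aggregation at the real file, it can help to run the same logic on a tiny frame where the answer is easy to check by hand. The sketch below assumes the column names used above (rental_date, daily_cost, rental_id, tool_category); the rows themselves are invented for illustration, and it uses pandas named aggregation as an equivalent of the dict-plus-rename form.

```python
import pandas as pd

# Synthetic rentals (values are invented; column names follow the code above)
toy = pd.DataFrame({
    "rental_id": [1, 2, 3, 4, 5],
    "rental_date": pd.to_datetime(
        ["2016-01-04", "2016-01-05", "2016-01-06", "2016-01-11", "2016-01-12"]
    ),
    "daily_cost": [10.0, 20.0, 30.0, 40.0, 60.0],
    "tool_category": ["Standard", "Premium", "Standard", "Premium", "Premium"],
})

# Map each rental to the first day (Monday) of its week
toy["week"] = toy["rental_date"].dt.to_period("W").dt.start_time

toy_weekly = (
    toy.groupby("week")
    .agg(
        avg_cost=("daily_cost", "mean"),                         # mean USD/day
        volume=("rental_id", "count"),                           # rentals that week
        tool_category=("tool_category", lambda x: x.mode()[0]),  # most frequent category
    )
    .reset_index()
)
print(toy_weekly)
```

The first week averages the three January 4 to 6 rentals and the second week the remaining two, so each output row can be verified against the input by eye.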
3. Target Engineering & Lag Features
- Lag features (cost_prev, volume_prev) capture momentum and demand saturation.
- One-hot encoding of tool_category models category-specific pricing effects.
- PolynomialFeatures generates squared and interaction terms, e.g. cost_prev², cost_prev × volume_prev, and volume_prev × tool_category_Premium, to capture curvature and synergy in cost dynamics. The exact dummy column name depends on which category drop_first keeps.
# Sort and lag
weekly = weekly.sort_values('week')
weekly['cost_prev'] = weekly['avg_cost'].shift(1)
weekly['volume_prev'] = weekly['volume'].shift(1)
weekly.dropna(subset=['cost_prev','volume_prev'], inplace=True)
# One‑hot encode category
weekly = pd.get_dummies(weekly, columns=['tool_category'], drop_first=True)
# Compute cost growth target
weekly['cost_growth_pct'] = (weekly['avg_cost'] - weekly['cost_prev']) / weekly['cost_prev'] * 100
# Features & target
feature_cols = ['cost_prev', 'volume_prev'] + [
    c for c in weekly.columns if c.startswith('tool_category_')
]
X = weekly[feature_cols]
y = weekly['cost_growth_pct']
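The lag and growth arithmetic is easiest to verify on three weeks of made-up numbers. This is a minimal sketch with invented averages; only the column names (avg_cost, cost_prev, cost_growth_pct) come from the code above.

```python
import pandas as pd

# Hypothetical weekly averages (USD/day), invented for illustration
toy = pd.DataFrame({
    "week": pd.to_datetime(["2016-01-04", "2016-01-11", "2016-01-18"]),
    "avg_cost": [50.0, 55.0, 49.5],
})

toy["cost_prev"] = toy["avg_cost"].shift(1)  # prior week's average cost
toy["cost_growth_pct"] = (toy["avg_cost"] - toy["cost_prev"]) / toy["cost_prev"] * 100

# The first week has no prior value, so it cannot be used for training
toy_model = toy.dropna(subset=["cost_prev"]).reset_index(drop=True)
print(toy_model[["week", "cost_prev", "avg_cost", "cost_growth_pct"]])
```

Moving from 50.0 to 55.0 is a rise of one tenth of the prior cost, and moving from 55.0 to 49.5 is a fall of one tenth, so the two surviving rows carry growth values of roughly +10 and -10 percent.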
4. Build Polynomial Regression Pipeline
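- StandardScaler standardizes the raw inputs before expansion, so the polynomial terms are built from comparably scaled variables and Ridge's ℓ2 penalty does not favour a feature merely because of its units.
- Ridge regression's alpha sets the penalty strength, which limits overfitting from high-order terms.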
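Written out, the estimator that these two steps feed is the standard ridge objective. The symbols are notation introduced for this note rather than names from the code: $z_i$ is the standardized feature vector for week $i$, $\phi_d$ is the degree-$d$ polynomial expansion, $y_i$ is that week's cost growth, and $n$ is the number of training weeks.

```latex
(\hat{\beta}_0, \hat{\beta})
  = \arg\min_{\beta_0,\, \beta}
    \sum_{i=1}^{n} \bigl( y_i - \beta_0 - \beta^{\top} \phi_d(z_i) \bigr)^2
    + \alpha \, \lVert \beta \rVert_2^2
```

The intercept $\beta_0$ is not penalized. As $\alpha$ approaches 0 the fit approaches ordinary least squares on the expanded features, and larger $\alpha$ shrinks every coefficient toward zero.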
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge(random_state=42))
])
5. Train/Test Split & Hyperparameter Search
GridSearchCV tunes the polynomial degree (1–3) and regularisation strength α (10⁻³ to 10³, seven log-spaced values) via 5-fold CV on the training split, selecting the combination with the lowest validation RMSE.
from sklearn.model_selection import train_test_split, GridSearchCV
import numpy as np
# Chronological split: the last 20% of weeks form the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)
param_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7)
}
gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)
print("Best params:", gs.best_params_)
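Because the rows are ordered weeks and the features are lags, plain k-fold CV can validate on weeks that come before some of the training weeks. One commonly used alternative is scikit-learn's TimeSeriesSplit, which always validates on data later than the data it trains on. The sketch below shows the swap on synthetic data (X_demo and y_demo are invented stand-ins); whether it changes the selected hyperparameters on the real rental data is not something this walkthrough reports.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the weekly training data (assumption: 80 weeks, 2 features)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(80, 2))
y_demo = 1.5 * X_demo[:, 0] - 0.8 * X_demo[:, 0] ** 2 + rng.normal(scale=0.5, size=80)

pipe_demo = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(include_bias=False)),
    ("ridge", Ridge()),
])

# Each fold trains on an initial stretch of weeks and validates on the weeks after it
tscv = TimeSeriesSplit(n_splits=5)
folds_are_ordered = all(
    train_idx.max() < val_idx.min() for train_idx, val_idx in tscv.split(X_demo)
)

gs_demo = GridSearchCV(
    pipe_demo,
    {"poly__degree": [1, 2, 3], "ridge__alpha": np.logspace(-3, 3, 7)},
    cv=tscv,  # the only change from the k-fold setup
    scoring="neg_root_mean_squared_error",
)
gs_demo.fit(X_demo, y_demo)
print("Folds respect time order:", folds_are_ordered)
print("Best params:", gs_demo.best_params_)
```

The trade-off is that early folds train on few weeks, so validation scores can be noisier than with k-fold on a short series.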
6. Evaluate Model
from sklearn.metrics import mean_squared_error, r2_score
y_pred = gs.predict(X_test)
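# np.sqrt(MSE) avoids the squared=False argument, which newer scikit-learn releases no longer accept
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"Test RMSE: {rmse:.2f} percentage points of growth")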
print(f"Test R² : {r2:.3f}")
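A test RMSE is easier to judge against a naive reference. The sketch below compares a set of predictions with a "no change" baseline that always predicts 0% growth. All numbers are invented, and the baseline is a suggested sanity check, not part of the original workflow.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Invented growth values (percent) standing in for y_test and the model's predictions
y_true_demo = np.array([2.0, -1.0, 3.0, 0.0])
y_model_demo = np.array([1.5, -0.5, 2.0, 0.5])
y_naive_demo = np.zeros_like(y_true_demo)  # baseline: cost stays flat week to week

# RMSE is the square root of the mean squared error
rmse_model = np.sqrt(mean_squared_error(y_true_demo, y_model_demo))
rmse_naive = np.sqrt(mean_squared_error(y_true_demo, y_naive_demo))
r2_model = r2_score(y_true_demo, y_model_demo)

print(f"Model RMSE: {rmse_model:.3f} percentage points")
print(f"Naive RMSE: {rmse_naive:.3f} percentage points")
print(f"Model R2  : {r2_model:.3f}")
```

If the fitted pipeline cannot beat the flat baseline on the held-out weeks, the polynomial terms are probably fitting noise rather than cost dynamics.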
7. Inspect Key Polynomial Coefficients
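Inspecting the coefficients surfaces the most influential terms, which can guide pricing actions such as moderating cost during peak-volume weeks or adjusting premium-tool surcharges. The plot below ranks terms by absolute magnitude, so it shows how strongly each term matters but not the direction of its effect.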
poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=feature_cols)
coefs = gs.best_estimator_.named_steps['ridge'].coef_
import pandas as pd
import matplotlib.pyplot as plt
coef_series = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False)
plt.figure(figsize=(8,5))
coef_series.head(10).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Cost Growth")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()
Summary
This Polynomial Regression pipeline with Ridge regularisation provides:
1. Nonlinear forecasts of weekly tool-rental cost growth that can capture diminishing-return and synergy effects, with accuracy judged by test RMSE and R².
2. Balanced complexity, avoiding overfitting through α tuning.
3. Interpretable insights, identifying the polynomial features (such as cost_prev² and cost_prev × volume_prev) that drive cost trends, to support data-driven dynamic pricing decisions.