Library Book Borrowing Trend Prediction with Polynomial Regression in ML
Public‐library planners and branch managers need to forecast the week‑over‑week percentage change in the number of books borrowed—using only early‑week indicators such as prior‑week borrow volume, number of active patrons, new‐arrival counts, and day‑of‑week mix—before allocating staff and shelf space. Historical transaction logs show nonlinear patterns: borrow growth accelerates with promotional events up to a capacity limit, weekend vs. weekday mixes interact with new‐arrival buzz, and patron activity saturates after peak weeks. A simple linear model underfits these curves, while an unrestricted high‑degree polynomial overfits the noise. By fitting a Polynomial Regression model on engineered features with Ridge (ℓ²) regularisation, we capture smooth, interpretable borrowing‑trend curves and deliver accurate growth forecasts for proactive resource planning.
Dataset
Step-by-Step Code Implementation
1. Libraries Required
import pandas as pd                # data loading & handling
import numpy as np                 # numerical operations
import matplotlib.pyplot as plt    # plotting
import seaborn as sns              # visualization
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score
2. Load Data & Compute Weekly Metrics
import pandas as pd
# Load transactions
tx = pd.read_csv("data/library_transaction_dataset.csv", parse_dates=["checkout_time","return_time"])
# Define checkout week
tx["week_start"] = tx["checkout_time"].dt.to_period("W").dt.start_time
# Aggregate weekly borrow volume and active patrons
weekly = tx.groupby("week_start").agg({
    "checkout_time": "count",   # total borrows
    "patron_id": "nunique",     # unique active patrons
    "book_id": "nunique",       # unique titles borrowed
}).rename(columns={
    "checkout_time": "borrows",
    "patron_id": "active_patrons",
    "book_id": "unique_titles",
}).reset_index()
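To make the weekly roll-up concrete, here is a minimal sketch on a handful of made-up transactions. The `toy_tx` and `toy_weekly` names and all values are hypothetical; only the column names follow the code above. It uses pandas named aggregation, which produces the same three columns without a separate rename step.

```python
import pandas as pd

# Hypothetical toy transactions; column names mirror those used above.
toy_tx = pd.DataFrame({
    "checkout_time": pd.to_datetime([
        "2024-01-01", "2024-01-03", "2024-01-06",  # week starting Mon 2024-01-01
        "2024-01-08", "2024-01-13",                # week starting Mon 2024-01-08
    ]),
    "patron_id": [1, 2, 1, 3, 3],
    "book_id": [10, 11, 10, 12, 13],
})

# Map every checkout to the Monday that starts its week.
toy_tx["week_start"] = toy_tx["checkout_time"].dt.to_period("W").dt.start_time

# Named aggregation gives the output columns their final names directly.
toy_weekly = (
    toy_tx.groupby("week_start")
    .agg(
        borrows=("checkout_time", "count"),
        active_patrons=("patron_id", "nunique"),
        unique_titles=("book_id", "nunique"),
    )
    .reset_index()
)
print(toy_weekly)
```

Each output row is one calendar week, so a patron who borrows twice in the same week adds two borrows but only one active patron.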
3. Feature Engineering & Target
- Lag features (borrows_prev, patrons_prev, titles_prev) capture momentum, patron activity saturation, and title diversity effects.
- Weekend percentage (weekend_pct) models the day‑of‑week mix impacts on borrowing patterns.
- PolynomialFeatures expands inputs into squared and interaction terms (e.g., borrows_prev², borrows_prev×titles_prev) to capture nonlinear saturation and synergy effects.
# Sort chronologically and create lag features
weekly = weekly.sort_values("week_start")
weekly["borrows_prev"] = weekly["borrows"].shift(1)
weekly["patrons_prev"] = weekly["active_patrons"].shift(1)
weekly["titles_prev"] = weekly["unique_titles"].shift(1)
# Derive weekday mix: share of each week's checkouts that fall on a weekend (%)
tx["is_weekend"] = tx["checkout_time"].dt.weekday >= 5
weekend_pct = (tx.groupby("week_start")["is_weekend"].mean()*100).reset_index(name="weekend_pct")
weekly = weekly.merge(weekend_pct, on="week_start", how="left")
weekly.dropna(subset=["borrows_prev","patrons_prev","titles_prev"], inplace=True)
# Compute week‑over‑week borrow growth (%)
weekly["borrow_growth_pct"] = (
    (weekly["borrows"] - weekly["borrows_prev"])
    / weekly["borrows_prev"] * 100
)
# Features & target
feature_cols = [
    "borrows_prev", "patrons_prev", "titles_prev", "weekend_pct",
]
X = weekly[feature_cols]
y = weekly["borrow_growth_pct"]
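The lag-and-growth arithmetic can be checked by hand on three made-up weeks. This is only a sketch: the `demo` frame and its borrow counts are hypothetical, while the column names follow the code above.

```python
import pandas as pd

# Three hypothetical weeks of borrow counts.
demo = pd.DataFrame({
    "week_start": pd.to_datetime(["2024-01-01", "2024-01-08", "2024-01-15"]),
    "borrows": [200, 250, 225],
})

# shift(1) moves each value down one row, so every week sees the prior week's count.
demo["borrows_prev"] = demo["borrows"].shift(1)

# The first week has no predecessor, so it is dropped.
demo = demo.dropna(subset=["borrows_prev"]).copy()

# Percentage change relative to the prior week.
demo["borrow_growth_pct"] = (
    (demo["borrows"] - demo["borrows_prev"]) / demo["borrows_prev"] * 100
)
print(demo)
```

Going from 200 to 250 borrows is a 25% rise, and falling back to 225 is a 10% drop relative to 250. A week with zero prior borrows would produce an infinite value, so real data may need a guard for that case.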
4. Build Polynomial Regression Pipeline
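- StandardScaler centres each raw predictor at zero with unit variance before the polynomial expansion, so the expanded terms stay on comparable scales and Ridge's ℓ² penalty does not favour a feature merely because of its units.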
- Ridge Regression applies ℓ² regularisation (alpha) to shrink noisy high‑order coefficients, preventing overfitting in the expanded feature space.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(include_bias=False)),
    ("ridge", Ridge(random_state=42)),
])
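The shrinkage effect of alpha can be seen on synthetic data. The sketch below is not the library dataset: `X_demo`, `y_demo`, and the helper `coef_norm` are hypothetical, and the target is an invented saturating curve plus noise. It fits the same three-step pipeline at three alpha values and records the overall size of the coefficient vector.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data: a saturating effect of the first input plus noise (assumption).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(60, 2))
y_demo = 5 * np.sqrt(X_demo[:, 0]) - 0.3 * X_demo[:, 1] + rng.normal(0, 0.5, 60)


def coef_norm(alpha):
    """Fit scale -> poly(3) -> ridge and return the L2 norm of the coefficients."""
    model = Pipeline([
        ("scale", StandardScaler()),
        ("poly", PolynomialFeatures(degree=3, include_bias=False)),
        ("ridge", Ridge(alpha=alpha)),
    ])
    model.fit(X_demo, y_demo)
    return float(np.linalg.norm(model.named_steps["ridge"].coef_))


norms = {alpha: coef_norm(alpha) for alpha in (0.001, 1.0, 1000.0)}
for alpha, norm in norms.items():
    print(f"alpha={alpha:<8} coefficient norm={norm:.3f}")
```

Larger alpha values pull the coefficient vector toward zero, which is the mechanism that keeps cubic and interaction terms from chasing noise.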
5. Train/Test Split & Hyperparameter Search
GridSearchCV tunes polynomial degree (1–3) and regularisation strength α (10⁻³…10³) via 5‑fold CV, optimising for lowest RMSE on held‑out growth forecasts.
from sklearn.model_selection import GridSearchCV
# Time-aware split: the earliest 80% of weeks train, the latest 20% test
split = int(len(X) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]
param_grid = {
    "poly__degree": [1, 2, 3],
    "ridge__alpha": np.logspace(-3, 3, 7),
}
gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
    n_jobs=-1, verbose=1,
)
gs.fit(X_train, y_train)
print("Best params:", gs.best_params_)
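One point worth considering: `cv=5` uses ordinary K-fold splits, so some folds validate on weeks that come before part of their training data. If the folds should respect time order as the train/test split does, scikit-learn's `TimeSeriesSplit` is one option. The sketch below only prints the fold layout for 12 hypothetical weeks; it is an alternative, not the configuration used above.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve hypothetical weeks, indexed 0..11 in chronological order.
n_weeks = 12
weeks = np.arange(n_weeks)

# Each fold trains on an expanding window and validates on the weeks right after it.
tscv = TimeSeriesSplit(n_splits=5)
folds = list(tscv.split(weeks))

for train_idx, val_idx in folds:
    print(
        f"train weeks {train_idx.min()}-{train_idx.max()}"
        f"  ->  validate weeks {val_idx.tolist()}"
    )
```

Since `GridSearchCV` accepts a splitter object for its `cv` argument, passing `cv=TimeSeriesSplit(n_splits=5)` would apply this scheme to the hyperparameter search.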
6. Evaluate Model
y_pred = gs.predict(X_test)
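# Take the square root explicitly: the squared=False argument was removed in recent scikit-learn releases
rmse = np.sqrt(mean_squared_error(y_test, y_pred))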
r2 = r2_score(y_test, y_pred)
print(f"Test RMSE : {rmse:.2f} percentage points of growth")
print(f"Test R² : {r2:.3f}")
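For readers who want to see what the two metrics measure, the following sketch recomputes them by hand on four made-up growth values and compares the results with scikit-learn. The arrays `y_true` and `y_hat` are hypothetical and unrelated to the library data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted growth values, in percent.
y_true = np.array([5.0, -2.0, 10.0, 0.0])
y_hat = np.array([4.0, 0.0, 8.0, 1.0])

# RMSE: square root of the mean squared residual, in the target's own units.
residuals = y_true - y_hat
rmse_manual = np.sqrt(np.mean(residuals ** 2))

# R^2: one minus the ratio of residual variation to variation around the mean.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_hat))
r2_sklearn = r2_score(y_true, y_hat)

print(f"RMSE manual={rmse_manual:.4f} sklearn={rmse_sklearn:.4f}")
print(f"R2   manual={r2_manual:.4f} sklearn={r2_sklearn:.4f}")
```

R² compares the model against always predicting the mean of the evaluated values, so on a held-out set it can be negative when the model does worse than that constant guess.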
7. Inspect Top Polynomial Coefficients
Coefficient inspection reveals which nonlinear and interaction terms most influence predicted borrow‑growth—guiding targeted promotions, collection development, and staffing.
poly = gs.best_estimator_.named_steps["poly"]
feat_names = poly.get_feature_names_out(input_features=feature_cols)
coefs = gs.best_estimator_.named_steps["ridge"].coef_
import pandas as pd
import matplotlib.pyplot as plt
coef_series = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False).head(10)
plt.figure(figsize=(8,5))
coef_series.plot(kind="barh")
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Borrow Growth")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()
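Ranking by absolute value hides whether a term pushes predicted growth up or down. A small variation, sketched here with invented coefficient values (the `demo_coefs` series is hypothetical), orders terms by magnitude while keeping their signs.

```python
import pandas as pd

# Hypothetical coefficients, for illustration only.
demo_coefs = pd.Series({
    "borrows_prev": -3.2,
    "borrows_prev^2": 1.1,
    "borrows_prev titles_prev": 2.4,
    "weekend_pct": -0.5,
})

# Sort by magnitude, then look the signed values back up in that order.
order = demo_coefs.abs().sort_values(ascending=False).index
top_signed = demo_coefs.reindex(order).head(3)
print(top_signed)
```

Because the predictors are standardised before expansion, the magnitudes describe effects per standard deviation of the raw inputs rather than per borrowed book or per patron.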
Summary
This Polynomial Regression pipeline with Ridge regularisation delivers:
- Nonlinear forecasts of borrow-volume growth that can capture diminishing returns and synergistic effects, with accuracy checked via test-set RMSE and R².
- Controlled model complexity, avoiding overfitting via α tuning.
- Interpretability: the top-ranked polynomial features, such as squared prior borrows and interactions between borrows and title diversity, inform data-driven service planning, marketing, and resource allocation.