Library Book Borrowing Trend Prediction with Polynomial Regression in ML


Public-library planners and branch managers need to forecast the week-over-week percentage change in the number of books borrowed before allocating staff and shelf space. They can use only early-week indicators, such as prior-week borrow volume, number of active patrons, new-arrival counts, and day-of-week mix. Historical transaction logs show nonlinear patterns: borrow growth accelerates with promotional events up to a capacity limit, weekend vs. weekday mixes interact with new-arrival buzz, and patron activity saturates after peak weeks. A simple linear model underfits these curves, while an unrestricted high-degree polynomial overfits the noise. A Polynomial Regression model fitted on engineered features with Ridge (ℓ²) regularisation sits between the two. It captures smooth, interpretable borrowing-trend curves and produces growth forecasts for proactive resource planning.
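To make the underfit/overfit trade-off concrete, here is a minimal sketch on synthetic data. The curve, noise level, and alpha are illustrative assumptions, not values from the library dataset. A straight line cannot follow a curve that rises and then bends over, while a degree-2 polynomial with Ridge can.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)

# Synthetic (hypothetical) curve that rises steeply and then bends over, plus noise
x = np.linspace(0, 10, 200).reshape(-1, 1)
y = 20 * x.ravel() - 1.5 * x.ravel() ** 2 + rng.normal(0, 3, size=200)

# Interleave points into train/test so both halves cover the whole x range
train, test = np.arange(0, 200, 2), np.arange(1, 200, 2)

linear = LinearRegression().fit(x[train], y[train])
poly_ridge = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, include_bias=False),
    Ridge(alpha=1.0),
).fit(x[train], y[train])

rmse_linear = np.sqrt(mean_squared_error(y[test], linear.predict(x[test])))
rmse_poly = np.sqrt(mean_squared_error(y[test], poly_ridge.predict(x[test])))
print(f"linear RMSE: {rmse_linear:.2f}, degree-2 + Ridge RMSE: {rmse_poly:.2f}")
```

The linear model's error is dominated by the curvature it cannot represent. The polynomial model's error is close to the injected noise level.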

Dataset

Library Transaction Dataset

Step-by-Step Code Implementation

1. Libraries Required

import pandas as pd                            # data loading & handling  
import numpy as np                             # numerical operations  

import matplotlib.pyplot as plt                # plotting  
import seaborn as sns                          # visualization  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import StandardScaler, PolynomialFeatures  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score

2. Load Data & Compute Weekly Metrics

import pandas as pd

# Load transactions
tx = pd.read_csv("data/library_transaction_dataset.csv", parse_dates=["checkout_time","return_time"])

# Assign each checkout to its week; weekly ("W") periods end on Sunday, so start_time is the Monday
tx["week_start"] = tx["checkout_time"].dt.to_period("W").dt.start_time

# Aggregate weekly borrow volume and active patrons
weekly = tx.groupby("week_start").agg(
    borrows=("checkout_time", "count"),          # total borrows
    active_patrons=("patron_id", "nunique"),     # unique active patrons
    unique_titles=("book_id", "nunique"),        # unique titles borrowed
).reset_index()
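The weekly aggregation can be checked on a tiny hand-made frame. The rows below are hypothetical, and only the column names follow the dataset used above. The sketch uses pandas named aggregation and also computes the weekend share of checkouts that the next step derives.

```python
import pandas as pd

# Hypothetical transactions spanning two calendar weeks (2024-01-01 is a Monday)
tx = pd.DataFrame({
    "checkout_time": pd.to_datetime([
        "2024-01-01 10:00", "2024-01-03 15:30", "2024-01-06 11:00",  # week 1: Mon, Wed, Sat
        "2024-01-08 09:00", "2024-01-13 14:00", "2024-01-14 16:00",  # week 2: Mon, Sat, Sun
    ]),
    "patron_id": [1, 2, 1, 3, 3, 4],
    "book_id":   [10, 11, 10, 12, 13, 14],
})

# "W" periods end on Sunday, so start_time is the Monday of each week
tx["week_start"] = tx["checkout_time"].dt.to_period("W").dt.start_time
tx["is_weekend"] = tx["checkout_time"].dt.weekday >= 5   # Saturday=5, Sunday=6

weekly = tx.groupby("week_start").agg(
    borrows=("checkout_time", "count"),
    active_patrons=("patron_id", "nunique"),
    unique_titles=("book_id", "nunique"),
    weekend_pct=("is_weekend", "mean"),
).reset_index()
weekly["weekend_pct"] *= 100   # share of checkouts made on weekends, in percent
print(weekly)
```

Each week has 3 borrows. Week 1 has 2 distinct titles and a 33.3% weekend share. Week 2 has 3 distinct titles and a 66.7% weekend share.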

3. Feature Engineering & Target

  • Lag features (borrows_prev, patrons_prev, titles_prev) capture momentum, patron activity saturation, and title diversity effects.
  • Weekend percentage (weekend_pct), the share of checkouts made on Saturday or Sunday, captures how the day-of-week mix affects borrowing.
  • PolynomialFeatures expands inputs into squared and interaction terms (e.g., borrows_prev², borrows_prev×titles_prev) to capture nonlinear saturation and synergy effects.
# Sort chronologically and create lag features
weekly = weekly.sort_values("week_start")
weekly["borrows_prev"]        = weekly["borrows"].shift(1)
weekly["patrons_prev"]        = weekly["active_patrons"].shift(1)
weekly["titles_prev"]         = weekly["unique_titles"].shift(1)

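# Derive weekday mix: share of each week's checkouts made on Saturday or Sunday
tx["is_weekend"] = tx["checkout_time"].dt.weekday >= 5
weekend_pct = (tx.groupby("week_start")["is_weekend"].mean() * 100).reset_index(name="weekend_pct")
weekly = weekly.merge(weekend_pct, on="week_start", how="left")
# Lag by one week: a week's mix is only known once that week is over,
# so using it unshifted would leak target-week information into the features
weekly["weekend_pct"] = weekly["weekend_pct"].shift(1)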

weekly.dropna(subset=["borrows_prev","patrons_prev","titles_prev"], inplace=True)

# Compute week‑over‑week borrow growth (%)
weekly["borrow_growth_pct"] = (
    (weekly["borrows"] - weekly["borrows_prev"])
    / weekly["borrows_prev"] * 100
)

# Features & target
feature_cols = [
    "borrows_prev","patrons_prev","titles_prev","weekend_pct"
]
X = weekly[feature_cols]
y = weekly["borrow_growth_pct"]
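The lag and growth arithmetic can be verified on three hypothetical weeks of borrow counts. Growth for a week is (borrows this week - borrows last week) / borrows last week × 100. The first week has no predecessor, so it is dropped.

```python
import pandas as pd

# Hypothetical weekly totals, already sorted chronologically
weekly = pd.DataFrame({
    "week_start": pd.date_range("2024-01-01", periods=3, freq="W-MON"),
    "borrows": [200, 250, 225],
})

# Last week's volume becomes this week's predictor
weekly["borrows_prev"] = weekly["borrows"].shift(1)

# Week-over-week growth in percent: (current - previous) / previous * 100
weekly["borrow_growth_pct"] = (
    (weekly["borrows"] - weekly["borrows_prev"]) / weekly["borrows_prev"] * 100
)

# The first week has no predecessor, so its lag and growth are NaN and it is dropped
weekly = weekly.dropna(subset=["borrows_prev"]).reset_index(drop=True)
print(weekly)
```

The second week grows by (250 - 200) / 200 × 100 = 25%. The third shrinks by (225 - 250) / 250 × 100 = -10%.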

4. Build Polynomial Regression Pipeline

  • StandardScaler zero‑means and unit‑scales predictors so Ridge’s ℓ² penalty treats each term uniformly.
  • Ridge Regression applies ℓ² regularisation (alpha) to shrink noisy high‑order coefficients, preventing overfitting in the expanded feature space.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

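pipe = Pipeline([
    ("scale", StandardScaler()),                       # standardize raw predictors
    ("poly", PolynomialFeatures(include_bias=False)),  # degree is tuned by the grid search (default 2)
    ("scale_poly", StandardScaler()),                  # re-standardize expanded terms so the l2 penalty treats them evenly
    ("ridge", Ridge(random_state=42))
])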
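The shrinkage described in the Ridge bullet can be observed directly. This minimal sketch refits a scaled polynomial pipeline on synthetic data with increasing alpha and tracks the ℓ² norm of the coefficient vector. The data-generating formula and the alpha values are illustrative assumptions. The second StandardScaler after the expansion is a choice made here so every expanded term is penalised on the same scale.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(80, 4))                      # four hypothetical predictors
y = 2 * X[:, 0] - X[:, 1] ** 2 + rng.normal(0, 0.5, size=80)

norms = {}
for alpha in [0.01, 1.0, 100.0, 10000.0]:
    model = make_pipeline(
        StandardScaler(),
        PolynomialFeatures(degree=3, include_bias=False),
        StandardScaler(),                          # put expanded terms on a common scale
        Ridge(alpha=alpha),
    ).fit(X, y)
    # make_pipeline names each step after its lowercase class name
    norms[alpha] = np.linalg.norm(model.named_steps["ridge"].coef_)
    print(f"alpha={alpha:>8}: ||coef||_2 = {norms[alpha]:.3f}")
```

The coefficient norm shrinks toward zero as alpha increases. This is the effect the grid search trades off against fit quality when it selects alpha.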

5. Train/Test Split & Hyperparameter Search

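GridSearchCV tunes the polynomial degree (1 to 3) and the regularisation strength α (10⁻³ to 10³) with 5-split forward-chaining cross-validation on the training weeks. It selects the combination with the lowest validation RMSE. Plain 5-fold CV would validate on weeks that precede the training data, so TimeSeriesSplit is used to keep every validation fold later in time.

from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Time-aware split: first 80% of weeks for training, last 20% for testing
split = int(len(X) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

param_grid = {
    "poly__degree": [1, 2, 3],
    "ridge__alpha": np.logspace(-3, 3, 7)    # 0.001, 0.01, ..., 1000
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=TimeSeriesSplit(n_splits=5),           # each validation fold follows its training fold in time
    scoring="neg_root_mean_squared_error",
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)
print("Best params:", gs.best_params_)
print("Best CV RMSE:", -gs.best_score_)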
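For weekly data, a forward-chaining splitter such as scikit-learn's `TimeSeriesSplit` keeps every validation fold strictly later than the weeks it was trained on. This mirrors the chronological 80/20 split. A quick check on 12 hypothetical, chronologically ordered weeks:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

weeks = np.arange(12).reshape(-1, 1)  # stand-in for 12 chronologically ordered rows

tscv = TimeSeriesSplit(n_splits=3)
folds = list(tscv.split(weeks))
for i, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"fold {i}: train weeks {train_idx.min()}-{train_idx.max()}, "
          f"validate weeks {val_idx.min()}-{val_idx.max()}")
```

The training window grows with each fold: weeks 0-2, then 0-5, then 0-8. Each fold validates on the next three weeks, so no fold ever scores the model on the past.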

6. Evaluate Model

y_pred = gs.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # `squared=False` was deprecated in scikit-learn 1.4 and removed in 1.6
r2   = r2_score(y_test, y_pred)
print(f"Test RMSE : {rmse:.2f} percentage points of growth")
print(f"Test R²   : {r2:.3f}")
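Both test metrics are simple formulas. RMSE is the square root of the mean squared error, expressed here in percentage points of growth. R² is 1 - SS_res / SS_tot. A hand computation on four made-up values matches scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([5.0, -2.0, 10.0, 3.0])   # hypothetical actual growth (%)
y_hat  = np.array([4.0, -1.0, 8.0, 3.0])    # hypothetical predicted growth (%)

# RMSE: square root of the mean squared residual
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))
# R^2: one minus residual sum of squares over total sum of squares around the mean
r2_manual = 1 - np.sum((y_true - y_hat) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(rmse_manual, np.sqrt(mean_squared_error(y_true, y_hat)))
print(r2_manual, r2_score(y_true, y_hat))
```

Here the residuals are 1, -1, 2, 0, so RMSE = √1.5 ≈ 1.22 and R² = 1 - 6/74 ≈ 0.919. R² becomes negative when a model does worse than always predicting the test-set mean.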

7. Inspect Top Polynomial Coefficients

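Inspecting the coefficients shows which nonlinear and interaction terms carry the most weight in predicted borrow growth. This can guide targeted promotions, collection development, and staffing. Because the inputs are standardized, magnitudes are roughly comparable across terms. They describe associations in the fitted model, not causal effects.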

poly       = gs.best_estimator_.named_steps["poly"]
feat_names = poly.get_feature_names_out(input_features=feature_cols)
coefs      = gs.best_estimator_.named_steps["ridge"].coef_

import pandas as pd
import matplotlib.pyplot as plt

coef_series = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False).head(10)
plt.figure(figsize=(8,5))
coef_series.plot(kind="barh")
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Borrow Growth")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()

Summary

This Polynomial Regression pipeline with Ridge regularisation delivers:

  1. Nonlinear forecasts of borrow-volume growth that can capture diminishing returns and synergistic effects, evaluated by RMSE and R² on a chronologically held-out test set.
  2. Controlled model complexity, avoiding overfitting by tuning α and the polynomial degree with cross-validation.
  3. Interpretability: the top-ranked polynomial terms (for example, squared prior borrows or interactions between borrows and title diversity) inform data-driven service planning, marketing, and resource allocation.


ProjectGurukul Team

