Helpdesk Response Time Trend Prediction with Polynomial Regression in ML


Support-desk managers need to forecast the average response time (in hours) for incoming helpdesk tickets from attributes known early in the incident (ticket priority, issue category, time of day submitted, and historical daily ticket volume) so they can act before response SLAs are breached. In practice, response times depend nonlinearly on ticket load because queues saturate, priority interacts with time of day (peak vs. off-peak hours), and specific categories (e.g., "Network") incur larger delays. A plain linear model underfits this curvature, while an unconstrained high-degree polynomial overfits noise. Applying Polynomial Regression to engineered features with Ridge (ℓ₂) regularisation captures smooth response-time trends and yields forecasts that can inform staffing and SLA adherence.
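To make the underfitting claim concrete, here is a minimal sketch on synthetic data. The quadratic relationship between daily volume and response time, the noise level, and the helper name cv_rmse are all assumptions made for illustration; none of them come from the ticket dataset.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (assumption): response time grows quadratically with load
rng = np.random.default_rng(0)
volume = rng.uniform(50, 500, size=300)
hours = 1.0 + 4e-5 * volume**2 + rng.normal(0, 0.3, size=300)
X = volume.reshape(-1, 1)

def cv_rmse(degree):
    """5-fold cross-validated RMSE of a Ridge model on polynomial features."""
    model = make_pipeline(
        StandardScaler(),
        PolynomialFeatures(degree=degree, include_bias=False),
        Ridge(alpha=1.0),
    )
    scores = cross_val_score(
        model, X, hours, cv=5, scoring="neg_root_mean_squared_error"
    )
    return -scores.mean()

rmse_linear = cv_rmse(1)      # straight line: misses the curvature
rmse_quadratic = cv_rmse(2)   # adds the squared term
print(f"degree 1 CV RMSE: {rmse_linear:.3f} hrs")
print(f"degree 2 CV RMSE: {rmse_quadratic:.3f} hrs")
```

On data like this the degree-2 model should show the lower error, because the straight line cannot follow the curve. Real ticket data may behave differently, which is why the tutorial selects the degree by cross-validation.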

Dataset

Customer Support Ticket Dataset

Step-by-Step Code Implementation

1. Libraries Required

import pandas as pd                            # data loading & handling  
import numpy as np                             # numerical operations  

import matplotlib.pyplot as plt                # plotting  
import seaborn as sns                          # enhanced visualization  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import StandardScaler, PolynomialFeatures, OneHotEncoder  
from sklearn.compose import ColumnTransformer  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score

2. Load Data & Compute Response Time

import pandas as pd

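# Load tickets (adjust the path; the column names CreatedTime and ResponseTime
# are assumed here and may differ in your copy of the dataset)
df = pd.read_csv(
    "data/customer_support_ticket_dataset.csv",
    parse_dates=["CreatedTime", "ResponseTime"],
)

# Keep only tickets that have both timestamps, then compute response time in hours
df = df.dropna(subset=["CreatedTime", "ResponseTime"])
df["ResponseHours"] = (df["ResponseTime"] - df["CreatedTime"]).dt.total_seconds() / 3600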

# Inspect
df[['Priority','Category','CreatedTime','ResponseHours']].head()

3. Feature Engineering & Exploratory Analysis

  • Priority & Category: one‑hot encoded to model different SLA tiers and issue types.
  • HourOfDay: encodes time‑of‑day effects (peak vs. off‑peak).
  • DailyVolume: total tickets submitted per day, capturing queue load.
  • ResponseHours: computed in hours from ticket creation to first response.

import seaborn as sns
import matplotlib.pyplot as plt

# Derive time-of-day and daily volume (tickets created on the same calendar day)
df["HourOfDay"] = df["CreatedTime"].dt.hour
df["DailyVolume"] = (
    df.groupby(df["CreatedTime"].dt.date)["CreatedTime"].transform("count")
)

# Visualize nonlinear load effect
sns.scatterplot(x="DailyVolume", y="ResponseHours", hue="Priority", data=df, alpha=0.4)
plt.title("Daily Ticket Volume vs. Response Time")
plt.xlabel("Tickets per Day")
plt.ylabel("Response Time (hrs)")
plt.show()
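To check the derived features on data small enough to verify by hand, the sketch below builds a toy frame of five tickets and attaches the hour of day and the per-day ticket count with groupby and transform. The timestamps are made up, and the column name CreatedTime simply mirrors the one used in this tutorial.

```python
import pandas as pd

# Five made-up tickets: three on 1 January, two on 2 January
toy = pd.DataFrame({
    "CreatedTime": pd.to_datetime([
        "2024-01-01 09:15", "2024-01-01 13:40", "2024-01-01 22:05",
        "2024-01-02 08:30", "2024-01-02 17:45",
    ])
})

# Hour of submission (0-23)
toy["HourOfDay"] = toy["CreatedTime"].dt.hour

# Number of tickets created on the same calendar day, repeated on every row
toy["DailyVolume"] = (
    toy.groupby(toy["CreatedTime"].dt.date)["CreatedTime"].transform("count")
)

print(toy)
```

Each ticket from 1 January gets a DailyVolume of 3 and each ticket from 2 January gets 2. Note that a same-day total is only known once the day is over, so for live forecasting you may prefer a count from a previous day or a running count.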

4. Define Features & Target

PolynomialFeatures: generates squares and interactions—for example, DailyVolume² (queue saturation) and HourOfDay × Priority_High (peak‑hour priority effects).

# Select predictors
X = df[[
    "DailyVolume",       # queue load
    "HourOfDay",         # time‑of‑day
    "Priority",          # categorical priority
    "Category"           # categorical issue type
]]
y = df["ResponseHours"]

5. Build Polynomial Regression Pipeline

StandardScaler: standardises numeric features (zero mean, unit variance) so Ridge's ℓ₂ penalty treats them on a comparable scale.

from sklearn.preprocessing import OneHotEncoder, StandardScaler, PolynomialFeatures
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

# Preprocess numeric vs categorical
num_cols = ["DailyVolume","HourOfDay"]
cat_cols = ["Priority","Category"]

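preprocessor = ColumnTransformer([
    ("num", StandardScaler(), num_cols),
    # handle_unknown="ignore" avoids errors when a CV fold or the test set
    # contains a category that was absent from the training rows
    ("cat", OneHotEncoder(drop="first", handle_unknown="ignore"), cat_cols)
])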

pipe = Pipeline([
    ("prep", preprocessor),
    ("poly", PolynomialFeatures(include_bias=False)),
    ("ridge", Ridge(random_state=42))
])

6. Train/Test Split & Hyperparameter Search

  • GridSearchCV: selects the degree (1 to 3) and alpha (10⁻³ to 10³) with the lowest 5-fold cross-validated RMSE.
  • Ridge Regression: applies ℓ₂ regularisation (alpha) to shrink noisy high-order coefficients and limit overfitting.

from sklearn.model_selection import train_test_split, GridSearchCV
import numpy as np

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Grid of polynomial degrees and regularisation strengths
param_grid = {
    "poly__degree": [1, 2, 3],
    "ridge__alpha": np.logspace(-3, 3, 7)
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)

print("Best parameters:", gs.best_params_)
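Beyond the single best setting, the full grid of cross-validation scores shows how sensitive the model is to degree and alpha. Here is a self-contained sketch on synthetic data (the target formula is an assumption chosen to include a squared term and an interaction) that turns cv_results_ into a readable table.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (assumption): one squared effect plus one interaction
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(200, 2))
y = 3 * X[:, 0] ** 2 + X[:, 0] * X[:, 1] + rng.normal(0, 0.1, size=200)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(include_bias=False)),
    ("ridge", Ridge()),
])
grid = {
    "poly__degree": [1, 2, 3],
    "ridge__alpha": np.logspace(-3, 3, 7),
}
search = GridSearchCV(pipe, grid, cv=5, scoring="neg_root_mean_squared_error")
search.fit(X, y)

# Scores are negated RMSE values, so flip the sign to get RMSE back
results = pd.DataFrame(search.cv_results_)
results["cv_rmse"] = -results["mean_test_score"]
cols = ["param_poly__degree", "param_ridge__alpha", "cv_rmse"]
print(results[cols].sort_values("cv_rmse").head())
print("Best parameters:", search.best_params_)
```

If several settings score almost the same, the lower degree or the larger alpha is usually the safer choice, since it gives a simpler model for similar error.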

7. Evaluate Model

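# Predict and score
y_pred = gs.predict(X_test)
# Take the square root of MSE directly; the `squared` argument is deprecated
# in recent scikit-learn releases
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2   = r2_score(y_test, y_pred)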

print(f"Test RMSE: {rmse:.2f} hrs")
print(f"Test R²  : {r2:.3f}")
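For readers who want to see what the two metrics measure, this sketch computes RMSE and R² by hand on four made-up values and compares them with scikit-learn's functions. RMSE is the square root of the mean squared error, in hours; R² is one minus the ratio of residual variation to total variation around the mean.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted response times (hours)
y_true = np.array([2.0, 4.0, 6.0, 8.0])
y_hat = np.array([2.5, 3.5, 6.5, 7.0])

# RMSE: square root of the average squared error
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))

# R²: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"RMSE manual: {rmse_manual:.4f}, sklearn: {np.sqrt(mean_squared_error(y_true, y_hat)):.4f}")
print(f"R2   manual: {r2_manual:.4f}, sklearn: {r2_score(y_true, y_hat):.4f}")
```

Here the squared errors sum to 1.75 and the total variation is 20, so R² is 1 − 1.75/20 = 0.9125 and RMSE is about 0.66 hours.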

8. Inspect Key Polynomial Coefficients

# Retrieve expanded feature names
prep      = gs.best_estimator_.named_steps["prep"]
cat_feats = prep.named_transformers_["cat"] \
                 .get_feature_names_out(cat_cols).tolist()
input_feats = num_cols + cat_feats

poly      = gs.best_estimator_.named_steps["poly"]
feat_names = poly.get_feature_names_out(input_features=input_feats)
coefs     = gs.best_estimator_.named_steps["ridge"].coef_

# Top 10 by magnitude
import pandas as pd
imp = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False).head(10)

import matplotlib.pyplot as plt
plt.figure(figsize=(8,5))
imp.plot(kind="barh")
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Response Time")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()

Summary

This Polynomial Regression pipeline with Ridge regularisation is designed to deliver:

  • Forecasts of helpdesk response times that capture nonlinear queue and time-of-day dynamics; judge their accuracy by the test RMSE and R² reported in step 7.
  • Controlled complexity, limiting overfitting via the grid-searched degree and ℓ₂ penalty.
  • Interpretable insights, with top polynomial features (such as DailyVolume² and HourOfDay×Priority_High) guiding staffing and prioritisation decisions aimed at meeting SLAs.


ProjectGurukul Team
