Worker Productivity Curve Prediction with Polynomial Regression in ML


Operations managers and HR analysts need to forecast individual worker productivity scores based on early‑week indicators—hours logged, task completion counts, collaboration events, and digital‑tool usage—before the week ends so that they can adjust staffing and support in real time. Empirical data show that productivity growth over the week follows a nonlinear curve: gains may plateau or even dip due to fatigue or task complexity. A simple linear model underfits these dynamics, while an unconstrained high‑degree polynomial overfits noise. By applying Polynomial Regression (linear regression on polynomially expanded features) with Ridge regularisation, we can capture smooth productivity curves and deliver reliable forecasts for proactive workforce management.
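To make the underfitting claim concrete, here is a minimal sketch on synthetic data. The curve shape, noise level, and variable names below are assumptions chosen for illustration, not values from the real dataset. It compares the cross-validated RMSE of a straight-line fit with a degree-2 polynomial Ridge fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (an assumption, not the real dataset): productivity rises,
# plateaus and then dips as hours increase.
rng = np.random.default_rng(0)
hours = rng.uniform(2, 12, size=200)
score = 20 + 12 * hours - 0.8 * hours**2 + rng.normal(0, 4, size=200)
X_demo = hours.reshape(-1, 1)


def cv_rmse(model):
    """Mean 5-fold cross-validated RMSE of a model on the synthetic data."""
    scores = cross_val_score(
        model, X_demo, score, cv=5, scoring="neg_root_mean_squared_error"
    )
    return -scores.mean()


rmse_linear = cv_rmse(make_pipeline(StandardScaler(), LinearRegression()))
rmse_poly2 = cv_rmse(
    make_pipeline(
        StandardScaler(),
        PolynomialFeatures(degree=2, include_bias=False),
        Ridge(alpha=1.0),
    )
)

print(f"Linear   CV RMSE: {rmse_linear:.2f}")
print(f"Degree 2 CV RMSE: {rmse_poly2:.2f}")
```

On a curve like this one, the straight line cannot follow the plateau and dip, so its error should be clearly higher than that of the degree-2 model.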

Dataset

Remote Worker Productivity Dataset

Step-by-Step Code Implementation

1. Libraries Required

import pandas as pd                                      # data manipulation  
import numpy as np                                       # numerical operations  

import matplotlib.pyplot as plt                          # plotting  
import seaborn as sns                                    # enhanced visualisation  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import StandardScaler, PolynomialFeatures  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score

2. Load Libraries & Data

import pandas as pd

# Adjust filename as needed
df = pd.read_csv("data/remote_worker_productivity.csv")

# Preview relevant columns
df[['day_of_week','hours_worked','tasks_completed','meetings_count',
    'tool_usage_minutes','productivity_score']].head()

3. Exploratory Analysis

import seaborn as sns
import matplotlib.pyplot as plt

# Productivity vs hours shows curvature
sns.scatterplot(x='hours_worked', y='productivity_score', data=df, alpha=0.5)
plt.title("Hours Worked vs Productivity")
plt.xlabel("Hours Worked")
plt.ylabel("Productivity Score")
plt.show()

4. Feature Engineering & Target

PolynomialFeatures expands inputs to include squares and interactions (e.g., hours_worked², hours_worked×tasks_completed), capturing curvature and synergy effects in productivity growth.

# Use day‑of‑week as numeric (1=Mon…7=Sun)
df['day_num'] = df['day_of_week'].map({
    'Monday':1,'Tuesday':2,'Wednesday':3,
    'Thursday':4,'Friday':5,'Saturday':6,'Sunday':7
})

# Predictor matrix and target vector
X = df[['day_num','hours_worked','tasks_completed',
        'meetings_count','tool_usage_minutes']]
y = df['productivity_score']
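One thing worth checking: Series.map returns NaN for any value that is not a key in the dictionary, and Ridge will not accept NaN inputs. The sketch below shows how to list unmapped labels on a toy frame. The spellings in the toy data are hypothetical, since the exact format of day_of_week in the dataset is not shown here.

```python
import pandas as pd

day_map = {
    'Monday': 1, 'Tuesday': 2, 'Wednesday': 3,
    'Thursday': 4, 'Friday': 5, 'Saturday': 6, 'Sunday': 7
}

# Toy frame with two labels that do not match the mapping keys exactly
toy = pd.DataFrame({"day_of_week": ["Monday", "friday", "Sun", "Sunday"]})
toy["day_num"] = toy["day_of_week"].map(day_map)

# Any label that failed to map shows up as NaN in day_num
unmapped = toy.loc[toy["day_num"].isna(), "day_of_week"].unique()
print("Unmapped labels:", list(unmapped))
```

Running the same check on df before building X would reveal whether the labels need cleaning (for example with str.strip() or str.title()) first.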

5. Build Polynomial Regression Pipeline

Ridge regression applies shrinkage to control overfitting from high‑dimensional polynomial terms.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

pipe = Pipeline([
    ('scale', StandardScaler()),  
    ('poly', PolynomialFeatures(include_bias=False)),  
    ('ridge', Ridge(random_state=42))  
])
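To see the shrinkage at work, the following sketch fits the same kind of pipeline at three α values and records the Euclidean norm of the Ridge coefficients. The data are synthetic and the α values are arbitrary picks for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (assumption): 60 rows, 3 features, one quadratic effect
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(60, 3))
y_demo = X_demo[:, 0] - 0.5 * X_demo[:, 1] ** 2 + rng.normal(0, 0.5, size=60)

norms = {}
for alpha in [0.001, 1.0, 1000.0]:
    model = make_pipeline(
        StandardScaler(),
        PolynomialFeatures(degree=3, include_bias=False),
        Ridge(alpha=alpha),
    )
    model.fit(X_demo, y_demo)
    # model[-1] is the fitted Ridge step at the end of the pipeline
    norms[alpha] = float(np.linalg.norm(model[-1].coef_))

for alpha, norm in norms.items():
    print(f"alpha={alpha:<8} coefficient norm={norm:.3f}")
```

Larger α pulls the coefficient vector toward zero, which is the mechanism that keeps the many degree-3 terms from fitting noise.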

6. Train/Test Split & Hyperparameter Search

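  • GridSearchCV tunes the polynomial degree (1–3) and Ridge α (10⁻³ to 10³) with 5‑fold CV on the training split, selecting the combination with the lowest mean RMSE on held‑out folds.
  • StandardScaler standardises each input feature before expansion, so the polynomial terms are built from comparably scaled inputs and Ridge’s ℓ² penalty is not dominated by a feature’s original units.
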
from sklearn.model_selection import train_test_split, GridSearchCV
import numpy as np

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    'poly__degree' : [1, 2, 3],
    'ridge__alpha' : np.logspace(-3, 3, 7)
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)

print("Best parameters:", gs.best_params_)
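Beyond best_params_, the fitted search object keeps every fold-averaged score in cv_results_. The sketch below, run on synthetic data with hypothetical names such as demo_gs, pivots those scores into a degree-by-α table of mean CV RMSE so the whole grid can be read at a glance.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the real data (assumption)
rng = np.random.default_rng(42)
X_demo = pd.DataFrame({
    "hours_worked": rng.uniform(2, 12, 150),
    "tasks_completed": rng.integers(0, 15, 150),
})
y_demo = (20 + 12 * X_demo["hours_worked"] - 0.8 * X_demo["hours_worked"] ** 2
          + 1.5 * X_demo["tasks_completed"] + rng.normal(0, 4, 150))

demo_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(include_bias=False)),
    ("ridge", Ridge()),
])
demo_gs = GridSearchCV(
    demo_pipe,
    {"poly__degree": [1, 2, 3], "ridge__alpha": np.logspace(-3, 3, 7)},
    cv=5,
    scoring="neg_root_mean_squared_error",
)
demo_gs.fit(X_demo, y_demo)

# Flip the sign of the negated score and lay the grid out as a table
results = pd.DataFrame(demo_gs.cv_results_)
results["degree"] = results["param_poly__degree"].astype(int)
results["alpha"] = results["param_ridge__alpha"].astype(float)
results["cv_rmse"] = -results["mean_test_score"]
rmse_table = results.pivot(index="degree", columns="alpha", values="cv_rmse").round(2)
print(rmse_table)
```

A table like this shows whether the best cell sits on the edge of the grid, which would suggest widening the α range or the degree range.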

7. Evaluate Model

from sklearn.metrics import mean_squared_error, r2_score

y_pred = gs.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # RMSE; the squared=False argument is no longer available in recent scikit-learn releases
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE: {rmse:.2f} points")
print(f"Test R²  : {r2:.3f}")
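An RMSE value is easier to judge next to a naive reference. As a sketch under stated assumptions (synthetic data, a fixed degree of 2 and α of 1.0 rather than the tuned values), the block below compares a polynomial Ridge model with a DummyRegressor that always predicts the training mean.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def rmse_of(y_true, y_hat):
    """Root mean squared error, computed in a version-independent way."""
    return float(np.sqrt(mean_squared_error(y_true, y_hat)))


# Synthetic stand-in for the real data (assumption)
rng = np.random.default_rng(7)
hours = rng.uniform(2, 12, size=300)
score = 20 + 12 * hours - 0.8 * hours**2 + rng.normal(0, 4, size=300)
X_demo = hours.reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, score, test_size=0.2, random_state=42
)

baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
model = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, include_bias=False),
    Ridge(alpha=1.0),
).fit(X_tr, y_tr)

rmse_baseline = rmse_of(y_te, baseline.predict(X_te))
rmse_model = rmse_of(y_te, model.predict(X_te))
print(f"Baseline RMSE: {rmse_baseline:.2f}")
print(f"Model RMSE   : {rmse_model:.2f}")
```

With the real data, the same comparison can be made by fitting DummyRegressor on X_train, y_train and scoring it on X_test alongside gs.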

8. Inspect Key Polynomial Coefficients

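Coefficient inspection highlights which nonlinear or interaction terms most influence predicted productivity, guiding operational interventions (e.g., optimal hours/tasks balance). Because correlated polynomial terms share weight under Ridge, treat the ranking as indicative rather than causal.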

# Retrieve feature names after polynomial expansion
poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=X.columns)

# Get Ridge coefficients
coefs = gs.best_estimator_.named_steps['ridge'].coef_

# Present top 10 by absolute value
import pandas as pd
coef_series = pd.Series(coefs, index=feat_names)
top10 = coef_series.abs().sort_values(ascending=False).head(10)

plt.figure(figsize=(8,5))
top10.plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Productivity")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()
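The chart above ranks terms by absolute value, which hides whether a term raises or lowers the prediction. A small variation, sketched here with made-up coefficient values purely for illustration, orders by magnitude but keeps the sign.

```python
import pandas as pd

# Hypothetical coefficients (illustrative numbers, not model output)
coef_demo = pd.Series({
    "hours_worked": 4.2,
    "hours_worked^2": -3.1,
    "tasks_completed": 2.5,
    "hours_worked tasks_completed": 0.4,
})

# Rank by magnitude, then look the signed values back up in that order
order = coef_demo.abs().sort_values(ascending=False).index
signed_top = coef_demo.loc[order].head(3)
print(signed_top)
```

Applied to coef_series, the same two lines would give a signed top 10. A negative coefficient on a squared term such as hours_worked² would be consistent with diminishing returns.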

Summary

By integrating polynomial feature engineering with Ridge regularisation in a concise pipeline, we achieve:

  1. Nonlinear forecasts of worker productivity, with accuracy checked on a held‑out test set via RMSE and R².
  2. Controlled complexity to avoid overfitting while capturing essential curve effects (diminishing returns, synergy).
  3. Interpretability: the most influential polynomial terms (e.g., hours_worked², hours_worked×tasks_completed) reveal actionable levers for workforce planning and real‑time support.


ProjectGurukul Team

ProjectGurukul Team specializes in creating project-based learning resources for programming, Java, Python, Android, AI, web development, and machine learning. Our mission is to help learners build practical skills through engaging, hands-on projects. We also offer free major and minor projects with source code for engineering students.
