Worker Efficiency Prediction using Bayesian Regression in ML

Operations managers need to predict individual worker efficiency scores, quantified as a productivity index, before the end of the evaluation period, using early-week indicators such as hours worked, number of tasks completed, break frequency, and tool-usage metrics. Efficiency often shows nonlinear dependencies (e.g., diminishing returns on long hours, thresholds in task batching) and uncertainty due to human variability. This tutorial fits a linear Bayesian regression as a baseline: it produces both a point estimate of each worker’s expected efficiency and credible intervals that quantify our uncertainty, enabling targeted coaching and more informed staffing decisions.

Libraries Required

import pandas as pd                              # data loading & handling  
import numpy as np                               # numerical operations  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # enhanced visualization  

import pymc3 as pm                               # Bayesian modeling  
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error  

Dataset

Remote Worker Productivity Dataset

Step-by-Step Code Implementation

Import Libraries & Load Data

import pandas as pd

# Load dataset
df = pd.read_csv("data/remote-worker-productivity-dataset/worker_productivity.csv")
df.head()

Preprocessing & Train/Test Split

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Select predictors and target
# Features: 'hours_worked','tasks_completed','breaks_taken','tool_usage_rate'
X = df[['hours_worked','tasks_completed','breaks_taken','tool_usage_rate']]
y = df['productivity_score']

# Split into train/test (80/20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize numeric predictors for stable MCMC sampling
scaler = StandardScaler().fit(X_train)
X_train_s = pd.DataFrame(scaler.transform(X_train), columns=X.columns)
X_test_s  = pd.DataFrame(scaler.transform(X_test),  columns=X.columns)

Define & Fit the Bayesian Regression Model

  • Priors (α, β, σ): We use weakly-informative Normal(0, 1) priors for the intercept and coefficients and a HalfNormal(1) prior for the noise scale. These scales suit standardized predictors and a target on roughly unit scale; if productivity_score spans a much larger range (for example 0 to 100), widen the priors or rescale the target.
  • Linear predictor: μ = α + X_standardized·β relates the standardized features to expected productivity.
  • Likelihood: Observed productivity scores are modelled as Normal(μ, σ).
  • MCMC sampling: 2,000 posterior draws per chain (plus 1,000 tuning steps) with target_accept=0.9 to reduce divergences.

import pymc3 as pm

with pm.Model() as model:
    # Wrap the features in pm.Data so pm.set_data can swap in new inputs later
    X = pm.Data("X", X_train_s.values)

    # Priors for intercept, coefficients, and noise scale
    α = pm.Normal("α", mu=0, sigma=1)
    β = pm.Normal("β", mu=0, sigma=1, shape=X_train_s.shape[1])
    σ = pm.HalfNormal("σ", sigma=1)

    # Expected productivity
    μ = α + pm.math.dot(X, β)

    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)

    # Sample posterior
    trace = pm.sample(
        draws=2000, tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
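
The sampler settings above aim at stable convergence, which is worth verifying before reading the posterior. As a rough illustration of what such a check measures, the sketch below computes the classic Gelman-Rubin R-hat statistic with NumPy on synthetic chains. The function name `gelman_rubin_rhat` and the simulated chains are assumptions made for this example. ArviZ reports a more robust rank-normalized variant in the `r_hat` column of `az.summary`, so this explains the idea rather than replacing that output.

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Classic (non-split) Gelman-Rubin R-hat for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    n_chains, n_draws = chains.shape
    chain_means = chains.mean(axis=1)
    # Between-chain variance B and mean within-chain variance W
    B = n_draws * chain_means.var(ddof=1)
    W = chains.var(axis=1, ddof=1).mean()
    # Pooled estimate of the posterior variance
    var_hat = (n_draws - 1) / n_draws * W + B / n_draws
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(0)
# Four hypothetical chains that all sample the same distribution
mixed = rng.normal(loc=0.0, scale=1.0, size=(4, 2000))
# Four hypothetical chains stuck around different values
stuck = mixed + np.array([[0.0], [1.0], [2.0], [3.0]])

rhat_mixed = gelman_rubin_rhat(mixed)
rhat_stuck = gelman_rubin_rhat(stuck)
print(f"well-mixed chains: R-hat = {rhat_mixed:.3f}")
print(f"stuck chains:      R-hat = {rhat_stuck:.3f}")
```

Values very close to 1 suggest the chains agree with each other. Values clearly above 1, as in the second case, indicate that the chains have not mixed and the posterior summaries should not be trusted yet.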

Posterior Analysis & Point Predictions

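  • Point predictions: We plug the posterior means of α and β into the linear predictor and evaluate the result on the test set with MAE.
  • Posterior predictive: Sampling from the posterior predictive distribution simulates productivity scores that reflect both parameter uncertainty and observation noise. Below it is run on the training inputs as a check of model fit.

import arviz as az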

# Summarize the posterior distributions
az.summary(trace, round_to=2)

# Posterior predictive sampling on the training data
with model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

# Compute posterior means of parameters
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values

# Point predictions on standardized test set
y_pred = α_post + X_test_s.dot(β_post)

# Evaluate with MAE
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: {mae:.2f} productivity points")
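
Point predictions from posterior means discard the uncertainty in the parameters. As a hedged sketch of how per-worker intervals could be produced instead, the NumPy code below simulates posterior predictive draws from arrays of parameter draws. The draws here are synthetic stand-ins with made-up values, not results from the fitted model, and `posterior_predictive_draws` is a hypothetical helper. With a real fit, the arrays would be extracted from `trace.posterior` after combining the chain and draw dimensions.

```python
import numpy as np

def posterior_predictive_draws(alpha, beta, sigma, X_new, rng):
    """Simulate one predictive value per posterior draw and per row of X_new.

    alpha: (S,), beta: (S, p), sigma: (S,), X_new: (n, p) -> returns (S, n)
    """
    mu = alpha[:, None] + beta @ X_new.T                  # linear predictor per draw
    noise = rng.normal(size=mu.shape) * sigma[:, None]    # observation noise per draw
    return mu + noise

rng = np.random.default_rng(42)
S, p = 4000, 4
# Synthetic stand-ins for posterior draws (hypothetical values, not fitted results)
alpha = rng.normal(0.5, 0.05, size=S)
beta = rng.normal([0.3, 0.6, -0.2, 0.1], 0.05, size=(S, p))
sigma = np.abs(rng.normal(0.4, 0.02, size=S))

# Two hypothetical workers described by standardized features
X_new = np.array([[0.0, 0.0, 0.0, 0.0],
                  [1.0, 2.0, -1.0, 0.5]])

draws = posterior_predictive_draws(alpha, beta, sigma, X_new, rng)
point = draws.mean(axis=0)
lower, upper = np.percentile(draws, [3, 97], axis=0)      # equal-tailed 94% interval
for i in range(len(X_new)):
    print(f"worker {i}: mean={point[i]:.2f}, 94% interval=({lower[i]:.2f}, {upper[i]:.2f})")
```

Each column of `draws` is a full predictive distribution for one worker, so any summary (mean, interval, probability of falling below a target) can be read from it.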

Visualisation of Posterior Predictive with Credible Intervals

Plotting predictive mean and 94% credible bands against one feature (tasks completed) illustrates both central tendency and uncertainty in forecasts.

# Example: vary tasks_completed, hold others at median
tasks_grid = np.linspace(
    X_train_s['tasks_completed'].min(),
    X_train_s['tasks_completed'].max(), 50
)
grid = pd.DataFrame({
    'hours_worked':    X_train_s['hours_worked'].median(),
    'tasks_completed': tasks_grid,
    'breaks_taken':    X_train_s['breaks_taken'].median(),
    'tool_usage_rate': X_train_s['tool_usage_rate'].median()
})

with model:
    pm.set_data({'X': grid.values})
    ppc_grid = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

# Extract predictive mean and 94% credible interval
preds = ppc_grid["Y_obs"]
mean_pred = preds.mean(axis=0)
hpd = az.hdi(preds, hdi_prob=0.94)

import matplotlib.pyplot as plt

# Transform grid back to original scale for tasks_completed
tasks_orig = scaler.inverse_transform(
    np.column_stack([
      grid['hours_worked'], 
      grid['tasks_completed'],
      grid['breaks_taken'],
      grid['tool_usage_rate']
    ])
)[:,1]

plt.figure(figsize=(8,5))
plt.plot(tasks_orig, mean_pred, color="blue", label="Posterior mean")
plt.fill_between(tasks_orig, hpd[:,0], hpd[:,1], color="blue", alpha=0.3,
                 label="94% Credible interval")
plt.xlabel("Tasks Completed")
plt.ylabel("Predicted Productivity Score")
plt.title("Bayesian Regression: Productivity vs. Tasks Completed")
plt.legend()
plt.show()
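
The bands in the plot come from `az.hdi`, which returns a highest-density interval: the narrowest interval that contains the requested share of the samples. To make that definition concrete, here is a minimal NumPy sketch for a one-dimensional sample. The helper `hdi_1d` is hypothetical and assumes a unimodal distribution; it is not ArviZ's implementation. The toy sample is right-skewed, so the result can be compared with an equal-tailed percentile interval.

```python
import numpy as np

def hdi_1d(samples, prob=0.94):
    """Narrowest interval containing `prob` of the samples (assumes a unimodal distribution)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    k = int(np.floor(prob * n))          # number of sample steps spanned by the interval
    widths = x[k:] - x[: n - k]          # width of every candidate interval
    start = int(np.argmin(widths))       # narrowest candidate wins
    return x[start], x[start + k]

rng = np.random.default_rng(1)
skewed = rng.gamma(shape=2.0, scale=1.0, size=20000)  # right-skewed toy "posterior"

hdi_low, hdi_high = hdi_1d(skewed, prob=0.94)
eq_low, eq_high = np.percentile(skewed, [3, 97])      # equal-tailed interval for comparison

print(f"HDI:          ({hdi_low:.2f}, {hdi_high:.2f}), width {hdi_high - hdi_low:.2f}")
print(f"equal-tailed: ({eq_low:.2f}, {eq_high:.2f}), width {eq_high - eq_low:.2f}")
```

For a symmetric predictive distribution, such as the Normal likelihood used in this tutorial, the two kinds of interval are nearly identical. They differ mainly for skewed posteriors.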

Summary

By employing Bayesian Regression for worker efficiency:

1. Point estimates of individual productivity are accompanied by credible intervals that quantify forecast uncertainty.

2. Regularisation via priors mitigates overfitting when training data are limited or noisy.

3. Actionable insights: managers can flag workers whose predicted efficiency is low or whose credible interval is wide for targeted training, and can read the posterior of each coefficient to see how early-week behaviours relate to overall efficiency, which supports proactive operational decisions.
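
Point 2 above can be made concrete. With a Normal(0, τ) prior on the coefficients and Gaussian noise of known scale σ, the posterior mode coincides with ridge regression using penalty λ = σ²/τ². The sketch below compares ordinary least squares with that estimate on a small, noisy synthetic dataset. All data and values are hypothetical, and the setup is simplified (σ treated as known, no intercept), unlike the tutorial's model, where σ is inferred by MCMC.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 12, 4                          # deliberately small sample
true_beta = np.array([0.5, 1.0, -0.3, 0.0])
X = rng.normal(size=(n, p))           # stand-in for standardized features
sigma = 2.0                           # known noise scale (simplifying assumption)
y = X @ true_beta + rng.normal(scale=sigma, size=n)

tau = 1.0                             # prior standard deviation, as in Normal(0, 1)
lam = sigma**2 / tau**2               # equivalent ridge penalty

# Ordinary least squares: no prior, fits the noise freely
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
# Posterior mode under the Normal(0, tau) prior: ridge-regularized solution
beta_map = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print("OLS coefficients:", np.round(beta_ols, 2))
print("MAP coefficients:", np.round(beta_map, 2))
print("norms:", np.linalg.norm(beta_ols).round(2), np.linalg.norm(beta_map).round(2))
```

The regularized coefficients are pulled toward zero relative to least squares, and the pull is stronger when the noise is large relative to the prior scale.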

ProjectGurukul Team
