Worker Efficiency Prediction using Bayesian Regression in ML

Operations managers need to predict each worker's efficiency score, quantified as a productivity index, before the end of the evaluation period, using early‑week indicators such as hours worked, tasks completed, break frequency, and tool‑usage metrics. Efficiency often depends on these inputs nonlinearly (e.g., diminishing returns on long hours, thresholds in task batching) and varies from person to person. The model in this tutorial is linear in the features, so it captures such effects only approximately. What Bayesian Regression adds is a point estimate of each worker's expected efficiency together with credible intervals that quantify the uncertainty, which supports targeted coaching and better‑informed staffing decisions.

Libraries Required

import pandas as pd                              # data loading & handling  
import numpy as np                               # numerical operations  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # enhanced visualization  

import pymc3 as pm                               # Bayesian modeling  
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error

Dataset

Remote Worker Productivity Dataset

Step-by-Step Code Implementation

Import Libraries & Load Data

import pandas as pd

# Load dataset
df = pd.read_csv("data/remote-worker-productivity-dataset/worker_productivity.csv")
df.head()

Preprocessing & Train/Test Split

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Select predictors and target
# Features: 'hours_worked','tasks_completed','breaks_taken','tool_usage_rate'
X = df[['hours_worked','tasks_completed','breaks_taken','tool_usage_rate']]
y = df['productivity_score']

# Split into train/test (80/20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize numeric predictors for stable MCMC sampling
scaler = StandardScaler().fit(X_train)
X_train_s = pd.DataFrame(scaler.transform(X_train), columns=X.columns)
X_test_s  = pd.DataFrame(scaler.transform(X_test),  columns=X.columns)
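
As a rough illustration of what the scaling step does, the sketch below standardizes by hand with NumPy. The feature values are synthetic stand-ins, not the tutorial's dataset, and the names `raw_train` and `raw_test` are hypothetical. The point is that the mean and standard deviation come from the training rows only and are then reused for the test rows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for two raw features (e.g. hours worked, tasks completed)
raw_train = rng.normal(loc=[40.0, 25.0], scale=[5.0, 8.0], size=(200, 2))
raw_test = rng.normal(loc=[40.0, 25.0], scale=[5.0, 8.0], size=(50, 2))

# Statistics are estimated on the training rows only
train_mean = raw_train.mean(axis=0)
train_std = raw_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# The same training statistics are applied to both splits
train_scaled = (raw_train - train_mean) / train_std
test_scaled = (raw_test - train_mean) / train_std

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
```

The training columns end up with mean 0 and standard deviation 1. The test columns will be close to that but not exactly, which is expected because no test information leaks into the scaler.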

Define & Fit the Bayesian Regression Model

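Priors (α, β, σ): Normal(0, 1) priors on the intercept and coefficients and a HalfNormal(1) prior on the noise scale. These are only weakly informative when the target is on a roughly unit scale; if productivity_score spans a wider range (for example 0 to 100), standardize y or widen the priors, otherwise they pull the intercept strongly toward zero.
Linear predictor: μ = α + X_standardized·β, a linear relationship between the standardized features and productivity.
Likelihood: Observed productivity scores are modelled as Normal(μ, σ).
MCMC sampling: 2,000 posterior draws per chain after 1,000 tuning steps, with target_accept=0.9, which makes the NUTS sampler take smaller steps and reduces divergences.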
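
Written out as a full model, the description above corresponds to the following specification. It is a restatement under the assumption of four standardized features, where x_{ij} is the standardized value of feature j for worker i and the second argument of each normal distribution is a standard deviation, as in the code.

```latex
\begin{aligned}
\alpha &\sim \mathcal{N}(0,\,1), \\
\beta_j &\sim \mathcal{N}(0,\,1), \quad j = 1,\dots,4, \\
\sigma &\sim \text{HalfNormal}(1), \\
\mu_i &= \alpha + \sum_{j=1}^{4} \beta_j\, x_{ij}, \\
y_i &\sim \mathcal{N}(\mu_i,\,\sigma), \quad i = 1,\dots,n.
\end{aligned}
```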

import pymc3 as pm

with pm.Model() as model:
    # Priors for intercept and coefficients
    α = pm.Normal("α", mu=0, sigma=1)
    β = pm.Normal("β", mu=0, sigma=1, shape=X_train_s.shape[1])
    σ = pm.HalfNormal("σ", sigma=1)
    
    # Expected productivity (pass plain NumPy arrays rather than pandas objects)
    μ = α + pm.math.dot(X_train_s.values, β)
    
    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train.values)
    
    # Sample posterior
    trace = pm.sample(
        draws=2000, tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
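
A higher target_accept helps the sampler, but it does not by itself show that the chains converged, so it is worth checking a diagnostic such as R-hat. The sketch below implements the classic (non-split) Gelman-Rubin statistic in NumPy on synthetic chains. It is only meant to illustrate the idea: the r_hat column reported by az.summary uses a more robust rank-normalized split variant, and the function name `gelman_rubin` is hypothetical.

```python
import numpy as np

def gelman_rubin(chains):
    """Classic (non-split) R-hat for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled estimate of posterior variance
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(1)
mixed = rng.normal(size=(4, 2000))                        # four chains that agree
stuck = mixed + np.array([0.0, 0.0, 0.0, 3.0])[:, None]   # one chain sits elsewhere

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
print(f"agreeing chains: {rhat_mixed:.3f}, one offset chain: {rhat_stuck:.3f}")
```

Values very close to 1 suggest the chains are sampling the same distribution. A value clearly above 1, as for the offset chain, means the posterior summaries should not be trusted yet.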

Posterior Analysis & Point Predictions

Point predictions: We use posterior means of α and β for MAE evaluation.
Posterior predictive: Sampling from the posterior predictive distribution generates predictive uncertainty for each new input.

import arviz as az

# Summarize the posterior distributions
az.summary(trace, round_to=2)

# Posterior predictive sampling on the training data
with model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])

# Compute posterior means of parameters
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values

# Point predictions on standardized test set
y_pred = α_post + X_test_s.dot(β_post)

# Evaluate with MAE
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: {mae:.2f} productivity points")
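
MAE only scores the point predictions. A complementary check is whether the predictive intervals are calibrated, meaning that roughly 94% of held-out scores fall inside their 94% intervals. The sketch below shows one way to compute this with NumPy. All arrays are synthetic stand-ins for posterior draws and a standardized test set, the names are hypothetical, and it uses equal-tailed percentile intervals rather than highest-density intervals for simplicity.

```python
import numpy as np

rng = np.random.default_rng(2)
n_draws, n_test = 4000, 100

# Synthetic stand-ins for posterior draws of the intercept, coefficients, noise
alpha_s = rng.normal(5.0, 0.05, size=n_draws)
beta_s = rng.normal([1.0, -0.5], 0.05, size=(n_draws, 2))
sigma_s = np.abs(rng.normal(0.8, 0.02, size=n_draws))

# Synthetic standardized test features and observed scores
X_new = rng.normal(size=(n_test, 2))
y_new = 5.0 + X_new @ np.array([1.0, -0.5]) + rng.normal(0.0, 0.8, size=n_test)

# One predictive draw per posterior draw and test row: shape (n_draws, n_test)
mu = alpha_s[:, None] + beta_s @ X_new.T
y_rep = rng.normal(mu, sigma_s[:, None])

# Equal-tailed 94% interval per test row, its empirical coverage, and the MAE
lower, upper = np.percentile(y_rep, [3, 97], axis=0)
coverage = np.mean((y_new >= lower) & (y_new <= upper))
mae = np.mean(np.abs(y_new - y_rep.mean(axis=0)))
print(f"coverage: {coverage:.2f}, MAE: {mae:.2f}")
```

Coverage far below the nominal level would suggest the model is overconfident. Coverage far above it would suggest the intervals are wider than they need to be.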

Visualisation of Posterior Predictive with Credible Intervals

Plotting predictive mean and 94% credible bands against one feature (tasks completed) illustrates both central tendency and uncertainty in forecasts.

# Example: vary tasks_completed, hold others at median
tasks_grid = np.linspace(
    X_train_s['tasks_completed'].min(),
    X_train_s['tasks_completed'].max(), 50
)
grid = pd.DataFrame({
    'hours_worked':    X_train_s['hours_worked'].median(),
    'tasks_completed': tasks_grid,
    'breaks_taken':    X_train_s['breaks_taken'].median(),
    'tool_usage_rate': X_train_s['tool_usage_rate'].median()
})

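# Posterior predictive draws on the grid, computed directly from the posterior
# samples (pm.set_data would require X to be registered as a pm.Data container)
post = trace.posterior
α_s = post["α"].values.reshape(-1)                   # (n_samples,)
β_s = post["β"].values.reshape(-1, grid.shape[1])    # (n_samples, n_features)
σ_s = post["σ"].values.reshape(-1)                   # (n_samples,)
μ_grid = α_s[:, None] + β_s @ grid.values.T          # (n_samples, n_grid)
rng = np.random.default_rng(42)
preds = rng.normal(μ_grid, σ_s[:, None])             # predictive draws

# Extract predictive mean and 94% highest-density interval
mean_pred = preds.mean(axis=0)
hpd = az.hdi(preds[np.newaxis, ...], hdi_prob=0.94)  # shape (n_grid, 2)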

import matplotlib.pyplot as plt

# Transform grid back to original scale for tasks_completed
tasks_orig = scaler.inverse_transform(
    np.column_stack([
      grid['hours_worked'], 
      grid['tasks_completed'],
      grid['breaks_taken'],
      grid['tool_usage_rate']
    ])
)[:,1]

plt.figure(figsize=(8,5))
plt.plot(tasks_orig, mean_pred, color="blue", label="Posterior mean")
plt.fill_between(tasks_orig, hpd[:,0], hpd[:,1], color="blue", alpha=0.3,
                 label="94% Credible interval")
plt.xlabel("Tasks Completed")
plt.ylabel("Predicted Productivity Score")
plt.title("Bayesian Regression: Productivity vs. Tasks Completed")
plt.legend()
plt.show()
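
The shaded band in the plot comes from az.hdi, which returns a highest-density interval: the narrowest interval that contains the requested share of the samples. For readers who want to see the mechanics, here is a minimal NumPy sketch for a single set of draws. The function name `hdi_1d` is hypothetical, the draws are synthetic, and the sketch assumes a unimodal distribution.

```python
import numpy as np

def hdi_1d(samples, prob=0.94):
    """Narrowest interval containing about `prob` of the samples (unimodal case)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.floor(prob * n))        # index offset between the two endpoints
    widths = x[k:] - x[:n - k]         # width of every candidate interval
    i = int(np.argmin(widths))         # the narrowest candidate wins
    return x[i], x[i + k]

rng = np.random.default_rng(0)
draws = rng.normal(size=20000)         # synthetic stand-in for predictive draws
lo, hi = hdi_1d(draws)
inside = np.mean((draws >= lo) & (draws <= hi))
print(f"94% HDI: [{lo:.2f}, {hi:.2f}], share of draws inside: {inside:.3f}")
```

For a symmetric distribution the result is close to the equal-tailed interval. For a skewed one the highest-density interval shifts toward the mode and is shorter.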

Summary

By employing Bayesian Regression for worker efficiency:

1. Point estimates of individual productivity are accompanied by credible intervals that quantify forecast uncertainty.

2. Regularisation via priors mitigates overfitting when training data are limited or noisy.

3. Actionable insights: managers can flag workers whose predicted intervals are wide or fall below a target for coaching, and read the posterior of β to see how early‑week behaviours relate to overall efficiency, which supports proactive operational decisions.
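
To make the regularisation point concrete: if the noise standard deviation σ is treated as known and each coefficient has a Normal(0, τ) prior, the posterior mode of the coefficients equals a ridge-regression solution with penalty λ = σ²/τ². The sketch below shows this on synthetic data with no intercept. It is an illustration of the mechanism only, since the tutorial's model also infers σ, and the values of `sigma` and `tau` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 15, 4                               # deliberately few training rows
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.0, -0.5, 0.0]) + rng.normal(0.0, 1.0, size=n)

sigma, tau = 1.0, 1.0                      # assumed noise sd and prior sd
lam = sigma**2 / tau**2                    # equivalent ridge penalty

# Ordinary least squares versus the posterior mode under the normal prior
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_map = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print("OLS norm:", np.linalg.norm(beta_ols))
print("MAP norm:", np.linalg.norm(beta_map))
```

The prior shrinks the coefficient vector toward zero, and the shrinkage is strongest when the data are few or noisy relative to the prior scale.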
