Worker Efficiency Prediction using Bayesian Regression in ML
Operations managers need to predict individual worker efficiency scores, quantified as a productivity index, before the end of the evaluation period, using early-week indicators such as hours worked, number of tasks completed, break frequency, and tool-usage metrics. Efficiency often shows nonlinear dependencies (e.g., diminishing returns on long hours, thresholds in task batching) and uncertainty due to human variability. By applying Bayesian Regression, we can produce both a point estimate of each worker's expected efficiency and credible intervals that quantify our uncertainty, enabling targeted coaching and more informed staffing decisions. The model built below is linear in the standardized features, so treat it as a baseline: capturing the nonlinear effects would require extra terms such as squared features.
Libraries Required
import pandas as pd                 # data loading & handling
import numpy as np                  # numerical operations
import matplotlib.pyplot as plt     # plotting
import seaborn as sns               # enhanced visualization
import pymc3 as pm                  # Bayesian modeling
import arviz as az                  # posterior analysis
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error
Dataset
Remote Worker Productivity Dataset
Step-by-Step Code Implementation
Import Libraries & Load Data
import pandas as pd
# Load dataset
df = pd.read_csv("data/remote-worker-productivity-dataset/worker_productivity.csv")
df.head()
Preprocessing & Train/Test Split
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Select predictors and target
# Features: 'hours_worked','tasks_completed','breaks_taken','tool_usage_rate'
X = df[['hours_worked','tasks_completed','breaks_taken','tool_usage_rate']]
y = df['productivity_score']
# Split into train/test (80/20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Standardize numeric predictors for stable MCMC sampling
scaler = StandardScaler().fit(X_train)
X_train_s = pd.DataFrame(scaler.transform(X_train), columns=X.columns)
X_test_s = pd.DataFrame(scaler.transform(X_test), columns=X.columns)
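Standardizing changes how the fitted coefficients read: each one becomes the change in score per one standard deviation of its feature, and the intercept becomes the prediction for a worker with average feature values. The sketch below illustrates this with ordinary least squares on synthetic data (the hours and score values are made up, not taken from the dataset), using the population standard deviation as StandardScaler does.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for one predictor (hours worked) and a score
hours = rng.normal(40.0, 5.0, size=200)
score = 20.0 + 1.2 * hours + rng.normal(0.0, 2.0, size=200)

# Standardize the predictor the same way StandardScaler does (population SD)
mean, sd = hours.mean(), hours.std()
hours_s = (hours - mean) / sd

def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # [intercept, slope]

a_raw, b_raw = fit_line(hours, score)
a_std, b_std = fit_line(hours_s, score)

# Slope per standard deviation, divided by the SD, gives slope per hour
print(b_std / sd, b_raw)
# The standardized intercept equals the raw-scale prediction at the mean
print(a_std, a_raw + b_raw * mean)
```

The same division by the feature's standard deviation (available as scaler.scale_ on a fitted StandardScaler) converts the posterior means of β back to original units.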
Define & Fit the Bayesian Regression Model
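- Priors (α, β, σ): Normal(0, 1) priors on the intercept and coefficients and a HalfNormal(1) prior on the noise scale. These are weakly informative only if the target is on roughly a unit scale; if productivity_score spans a wider range (for example 0 to 100), standardize y as well or widen the priors.
- Linear predictor: μ = α + β·X_standardized relates the standardized features to productivity.
- Likelihood: Observed productivity scores are modelled as Normal(μ, σ).
- MCMC sampling: 2,000 posterior draws per chain (after 1,000 tuning steps) with target_accept=0.9, which makes the sampler take smaller steps and reduces divergences.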
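Before fitting, it can help to see what scores these priors imply. The following is a minimal prior predictive sketch in plain NumPy, not the PyMC3 model itself. The feature vector for the example worker is hypothetical. If the simulated range looks nothing like plausible productivity scores, the priors (or the scale of y) likely need adjusting.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_features = 5000, 4

# Draw parameters from the priors described above
alpha = rng.normal(0.0, 1.0, size=n_sims)
beta = rng.normal(0.0, 1.0, size=(n_sims, n_features))
sigma = np.abs(rng.normal(0.0, 1.0, size=n_sims))  # half-normal via |N(0, 1)|

# One hypothetical worker with standardized features (values are made up)
x = np.array([0.5, -1.0, 0.0, 1.5])

# Simulate a score for that worker under each prior draw
mu = alpha + beta @ x
y_sim = rng.normal(mu, sigma)

lo, hi = np.percentile(y_sim, [3, 97])
print(f"94% of prior-predictive scores fall in [{lo:.1f}, {hi:.1f}]")
```

The simulated scores cluster within a few units of zero, which is why these priors suit a standardized target far better than a raw score on a large scale.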
import pymc3 as pm
with pm.Model() as model:
    # Priors for intercept and coefficients
    α = pm.Normal("α", mu=0, sigma=1)
    β = pm.Normal("β", mu=0, sigma=1, shape=X_train_s.shape[1])
    σ = pm.HalfNormal("σ", sigma=1)
    # Expected productivity
    μ = α + pm.math.dot(X_train_s, β)
    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)
    # Sample posterior
    trace = pm.sample(
        draws=2000, tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
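Whether sampling actually converged is worth checking rather than assuming. ArviZ's az.summary reports an r_hat column for this purpose, computed with a more robust variant than the one shown here. As an illustration of the idea only, the sketch below implements the classic Gelman-Rubin statistic in NumPy on simulated chains (the chains are synthetic, not draws from the model above): values close to 1 indicate that the chains agree.

```python
import numpy as np

def classic_rhat(chains):
    """Classic Gelman-Rubin R-hat for an array shaped (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    n_draws = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()            # W: mean within-chain variance
    between = n_draws * chains.mean(axis=1).var(ddof=1)   # B: scaled variance of chain means
    var_hat = (n_draws - 1) / n_draws * within + between / n_draws
    return np.sqrt(var_hat / within)

rng = np.random.default_rng(2)

# Four chains sampling the same distribution: well mixed
mixed = rng.normal(0.0, 1.0, size=(4, 2000))

# Same chains, but two of them are shifted: they disagree about the mean
stuck = mixed + np.array([[0.0], [0.0], [3.0], [3.0]])

print(classic_rhat(mixed), classic_rhat(stuck))
```

A commonly used rule of thumb is to investigate any parameter whose r_hat exceeds about 1.01.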
Posterior Analysis & Point Predictions
- Point predictions: We use posterior means of α and β for MAE evaluation.
- Posterior predictive: Sampling from the posterior predictive distribution generates predictive uncertainty for each new input.
import arviz as az
# Summarize the posterior distributions
az.summary(trace, round_to=2)
# Posterior predictive sampling on the training data
with model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])
# Compute posterior means of parameters
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values
# Point predictions on standardized test set
y_pred = α_post + X_test_s.dot(β_post)
# Evaluate with MAE
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: {mae:.2f} productivity points")
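The MAE above uses only point predictions. To attach an interval to each test-set worker, the parameter draws can be pushed through the model by hand. The sketch below shows the mechanics in NumPy with hypothetical posterior draws and hypothetical standardized features standing in for trace.posterior and X_test_s. It reports equal-tailed 94% intervals, which are simpler to compute than the highest-density intervals ArviZ uses and usually close for roughly symmetric predictive distributions.

```python
import numpy as np

rng = np.random.default_rng(3)
n_draws, n_workers, n_features = 4000, 5, 4

# Hypothetical posterior draws standing in for values from trace.posterior
alpha_draws = rng.normal(70.0, 0.5, size=n_draws)
beta_draws = rng.normal([3.0, 5.0, -1.0, 2.0], 0.4, size=(n_draws, n_features))
sigma_draws = np.abs(rng.normal(4.0, 0.3, size=n_draws))

# Hypothetical standardized features for five test-set workers
X_new = rng.normal(0.0, 1.0, size=(n_workers, n_features))

# Mean prediction for every (draw, worker) pair, shape (n_draws, n_workers)
mu = alpha_draws[:, None] + beta_draws @ X_new.T

# Add observation noise to obtain posterior predictive draws
y_rep = rng.normal(mu, sigma_draws[:, None])

# Point prediction and equal-tailed 94% interval per worker
point = y_rep.mean(axis=0)
lower, upper = np.percentile(y_rep, [3, 97], axis=0)
for i in range(n_workers):
    print(f"worker {i}: {point[i]:.1f} [{lower[i]:.1f}, {upper[i]:.1f}]")
```

Because the noise scale σ is shared by all workers in this model, interval widths differ between workers mainly through uncertainty in α and β at their feature values.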
Visualisation of Posterior Predictive with Credible Intervals
Plotting predictive mean and 94% credible bands against one feature (tasks completed) illustrates both central tendency and uncertainty in forecasts.
# Example: vary tasks_completed, hold others at median
tasks_grid = np.linspace(
    X_train_s['tasks_completed'].min(),
    X_train_s['tasks_completed'].max(), 50
)
grid = pd.DataFrame({
    'hours_worked': X_train_s['hours_worked'].median(),
    'tasks_completed': tasks_grid,
    'breaks_taken': X_train_s['breaks_taken'].median(),
    'tool_usage_rate': X_train_s['tool_usage_rate'].median()
})
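# The model holds the training matrix as a constant, so pm.set_data has no
# "X" container to update; predict on the grid from the posterior draws instead
post = trace.posterior
α_s = post["α"].values.reshape(-1)                   # (n_samples,)
β_s = post["β"].values.reshape(-1, grid.shape[1])    # (n_samples, 4)
σ_s = post["σ"].values.reshape(-1)                   # (n_samples,)
μ_grid = α_s[:, None] + β_s @ grid.values.T          # (n_samples, 50)
# Add observation noise to get posterior predictive draws
preds = np.random.default_rng(42).normal(μ_grid, σ_s[:, None])
# Extract predictive mean and 94% credible interval
mean_pred = preds.mean(axis=0)
hpd = az.hdi(preds[np.newaxis], hdi_prob=0.94)       # shape (50, 2)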
import matplotlib.pyplot as plt
# Transform grid back to original scale for tasks_completed
tasks_orig = scaler.inverse_transform(
    np.column_stack([
        grid['hours_worked'],
        grid['tasks_completed'],
        grid['breaks_taken'],
        grid['tool_usage_rate']
    ])
)[:, 1]
plt.figure(figsize=(8,5))
plt.plot(tasks_orig, mean_pred, color="blue", label="Posterior mean")
plt.fill_between(tasks_orig, hpd[:, 0], hpd[:, 1], color="blue", alpha=0.3,
                 label="94% Credible interval")
plt.xlabel("Tasks Completed")
plt.ylabel("Predicted Productivity Score")
plt.title("Bayesian Regression: Productivity vs. Tasks Completed")
plt.legend()
plt.show()
Summary
Applying Bayesian Regression to worker efficiency gives three benefits:
1. Point estimates of individual productivity are accompanied by credible intervals that quantify forecast uncertainty.
2. Regularisation via priors mitigates overfitting when training data are limited or noisy.
3. Actionable insights: managers can flag workers whose predicted efficiency is low or whose intervals are wide for targeted training, and see how early-week behaviours drive overall efficiency, enabling proactive operational decisions.
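The regularisation point can be made concrete. For a linear model with a Normal(0, τ²) prior on the coefficients and a known noise standard deviation σ, the posterior mean of β has the same closed form as ridge regression with penalty σ²/τ². The sketch below compares it with ordinary least squares on a small synthetic sample. It is a simplification of the model in this article, which also places a prior on σ and includes an intercept, and all values are made up.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 12, 4                       # deliberately small, noisy sample
X = rng.normal(size=(n, p))
true_beta = np.array([1.0, 0.5, 0.0, -0.5])
sigma, tau = 2.0, 1.0              # noise SD (assumed known) and prior SD
y = X @ true_beta + rng.normal(0.0, sigma, size=n)

# Ordinary least squares: no prior on the coefficients
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Posterior mean under beta ~ Normal(0, tau^2) with known sigma
lam = sigma**2 / tau**2
beta_post = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# The prior pulls the coefficient vector toward zero
print(np.linalg.norm(beta_ols), np.linalg.norm(beta_post))
```

A tighter prior (smaller τ) increases the penalty and shrinks the coefficients further, which is the mechanism that limits overfitting when data are scarce.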