Manufacturing Efficiency Cost Prediction in ML
To remain competitive, manufacturers must reduce the cost per unit of effective output. This depends on various factors like machine utilization, energy consumption, material yield loss, and labor efficiency.
In this manufacturing efficiency cost prediction ML project, we’ll predict the efficiency cost—defined as total production cost divided by effective output—using stepwise linear regression on smart‑manufacturing telemetry (cycle time, downtime percentage, scrap rate, energy use, and labor hours).
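Written out as a formula, with $C$ = `total_production_cost_usd`, $N$ = `production_units`, and $s$ = `scrap_rate_pct` (the column names assumed in Block 2 below), the target is:

$$
\text{efficiency cost} = \frac{C}{N\,(1 - s/100)}
$$

With illustrative numbers (not taken from the dataset), $C = 9{,}500$ USD, $N = 1{,}000$ units, and $s = 5$ give an effective output of $1{,}000 \times 0.95 = 950$ units and an efficiency cost of $9{,}500 / 950 = 10$ USD per effective unit. Because $s$ sits in the denominator, scrap rate influences the target by construction as well as through operations, which is worth keeping in mind when interpreting its coefficient.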
By isolating the most significant predictors, the resulting model will help operations teams identify key levers to reduce cost and boost throughput.
Libraries Required
import pandas as pd  # Data loading & manipulation
import numpy as np  # Numerical operations
import statsmodels.api as sm  # Ordinary Least Squares regression
from sklearn.model_selection import train_test_split  # Train/test split
from sklearn.metrics import r2_score, mean_squared_error  # Evaluation metrics
import matplotlib.pyplot as plt  # Visualization
Dataset
Smart Manufacturing Resource Efficiency Dataset
Step-by-Step Code Implementation
Data Loading & Initial Inspection
We load a smart-manufacturing dataset that captures cycle times, downtime, scrap rates, energy and labor usage, and total production cost. Initial checks (.head(), .info(), .describe()) help spot wrong column types, missing values, and implausible ranges before modeling.
# Block 1: Load dataset
# Smart Manufacturing Resource Efficiency Dataset – Kaggle
df = pd.read_csv("smart_manufacturing_resource_efficiency.csv")
print(df.head())      # glimpse at columns
df.info()             # types & missingness (info() prints directly and returns None)
print(df.describe())  # summary statistics
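Because Block 2 assumes specific column names, it can help to fail fast if any are absent and to see where values are missing. A minimal sketch: the helper `check_required_columns` and the toy DataFrame are hypothetical, and the column list simply mirrors the assumption stated in Block 2 rather than a confirmed schema of the Kaggle file.

```python
import numpy as np
import pandas as pd

# Columns Block 2 assumes exist (an assumption about the CSV's schema)
REQUIRED = [
    "cycle_time_sec", "downtime_pct", "scrap_rate_pct",
    "energy_kwh_per_unit", "labor_hours_per_unit",
    "production_units", "total_production_cost_usd",
]

def check_required_columns(df: pd.DataFrame, required=REQUIRED) -> pd.Series:
    """Raise KeyError if any required column is absent; else return missing counts."""
    absent = [c for c in required if c not in df.columns]
    if absent:
        raise KeyError(f"Missing expected columns: {absent}")
    return df[required].isna().sum()

# Hypothetical toy frame with one missing value, for illustration only
toy = pd.DataFrame({
    "cycle_time_sec": [42.0, 40.5],
    "downtime_pct": [3.0, np.nan],
    "scrap_rate_pct": [2.0, 4.0],
    "energy_kwh_per_unit": [1.2, 1.1],
    "labor_hours_per_unit": [0.3, 0.35],
    "production_units": [1000, 800],
    "total_production_cost_usd": [9500.0, 8200.0],
})
missing_counts = check_required_columns(toy)
print(missing_counts[missing_counts > 0])  # only downtime_pct, with 1 missing value
```

On the real data, the same call would run right after `pd.read_csv`, before any feature engineering.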
Feature Engineering & Cost Definition
We compute the effective output by removing the scrapped share of production and define efficiency_cost_usd as total production cost divided by that output. Rows with missing fields or non-positive effective output are removed.
# Block 2: Compute efficiency cost and clean data
# Assume df has: 'cycle_time_sec','downtime_pct','scrap_rate_pct',
# 'energy_kwh_per_unit','labor_hours_per_unit','production_units',
# and 'total_production_cost_usd'
# Calculate effective output (units minus scrap)
df['effective_output'] = df['production_units'] * (1 - df['scrap_rate_pct']/100)
# Define cost per effective unit
df['efficiency_cost_usd'] = df['total_production_cost_usd'] / df['effective_output']
# Drop rows with missing predictors or target, then rows with non-positive effective output
df = df.dropna(subset=[
    'cycle_time_sec', 'downtime_pct', 'scrap_rate_pct',
    'energy_kwh_per_unit', 'labor_hours_per_unit', 'efficiency_cost_usd'
])
df = df[df['effective_output'] > 0]
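Block 2 computes `efficiency_cost_usd` before dropping non-positive outputs, so a zero-output row with a positive cost briefly carries an `inf` cost (pandas division by zero yields `inf`, which `dropna` does not remove) until the last line filters it out. The end result is the same, but filtering first avoids ever creating `inf`. A minimal sketch of that ordering; the function name `add_efficiency_cost` and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

def add_efficiency_cost(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with effective_output and efficiency_cost_usd,
    keeping only rows whose effective output is positive."""
    out = df.copy()
    out["effective_output"] = out["production_units"] * (1 - out["scrap_rate_pct"] / 100)
    # Filter before dividing so no inf values are ever created
    out = out[out["effective_output"] > 0].copy()
    out["efficiency_cost_usd"] = out["total_production_cost_usd"] / out["effective_output"]
    return out

# Hypothetical rows: a normal batch, a zero-output batch, and a 100%-scrap batch
toy = pd.DataFrame({
    "production_units": [1000, 0, 500],
    "scrap_rate_pct": [5.0, 2.0, 100.0],
    "total_production_cost_usd": [9500.0, 300.0, 4000.0],
})
result = add_efficiency_cost(toy)
print(result[["effective_output", "efficiency_cost_usd"]])  # one row, cost 10.0
```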
Prepare Predictors and Split Data
Predictors (X) are five operational metrics; the response (y) is the calculated cost per effective unit. An 80/20 split keeps feature selection and fitting on the training rows only, so the held-out test rows give an out-of-sample estimate of performance.
# Block 3: Define feature matrix X and target y
X = df[[
    'cycle_time_sec', 'downtime_pct', 'scrap_rate_pct',
    'energy_kwh_per_unit', 'labor_hours_per_unit'
]]
y = df['efficiency_cost_usd']
# Split into training and test sets (80/20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
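With a fractional `test_size`, scikit-learn rounds the test count up and gives the remaining rows to training, and a fixed `random_state` reproduces the same row assignment on every run. A small sketch on hypothetical data (103 rows, so the 20% share is fractional) to confirm both points:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for X / y with 103 rows
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(103, 2)),
                      columns=["downtime_pct", "energy_kwh_per_unit"])
y_demo = pd.Series(rng.normal(size=103), name="efficiency_cost_usd")

split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

X_tr, X_te, y_tr, y_te = split_a
print(len(X_tr), len(X_te))  # 82 21: the test count is ceil(0.2 * 103)

# Same random_state, so the same rows land in the test set
same = X_te.index.equals(split_b[1].index)
print(same)
```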
Stepwise Regression Function
The stepwise_selection function alternates a forward step (add the excluded feature with the lowest OLS p-value, if it is below 0.01) and a backward step (drop the included feature with the highest p-value, if it exceeds 0.05), repeating until neither step changes the set. The result is a smaller, more parsimonious predictor set.
# Block 4: Implement forward–backward stepwise selection
def stepwise_selection(X, y,
                       initial_list=None,
                       threshold_in=0.01,
                       threshold_out=0.05,
                       verbose=True):
    included = list(initial_list) if initial_list else []
    while True:
        changed = False
        # Forward step: test adding each excluded feature
        excluded = [f for f in X.columns if f not in included]
        new_pvals = pd.Series(index=excluded, dtype=float)
        for col in excluded:
            model = sm.OLS(y, sm.add_constant(X[included + [col]])).fit()
            new_pvals[col] = model.pvalues[col]
        best_pval = new_pvals.min()  # NaN if nothing is left to add
        if best_pval < threshold_in:
            best_feat = new_pvals.idxmin()
            included.append(best_feat)
            changed = True
            if verbose:
                print(f"Add  {best_feat:20} p-value {best_pval:.4f}")
        # Backward step: test removing each included feature
        if included:  # skip when nothing has been selected yet
            model = sm.OLS(y, sm.add_constant(X[included])).fit()
            pvals = model.pvalues.iloc[1:]  # drop intercept (add_constant prepends it)
            worst_pval = pvals.max()
            if worst_pval > threshold_out:
                worst_feat = pvals.idxmax()
                included.remove(worst_feat)
                changed = True
                if verbose:
                    print(f"Drop {worst_feat:20} p-value {worst_pval:.4f}")
        if not changed:
            break
    return included
Model Building & Evaluation
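We fit an OLS regression on the selected features using statsmodels. The .summary() output reports coefficients, p-values, R², adjusted R², AIC, and the F-statistic. Each coefficient estimates the change in cost per effective unit for a one-unit change in that metric, holding the other selected metrics fixed, which shows which operational levers matter most.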
Predictions on the held-out test set give R² (share of variance explained) and RMSE (root mean squared error, in USD per effective unit), quantifying out-of-sample accuracy.
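To make the two metrics concrete: $R^2 = 1 - SS_{res}/SS_{tot}$, and RMSE is the square root of the mean squared residual, expressed in the target's units. A tiny hand-checkable sketch with made-up numbers, compared against scikit-learn's `r2_score` and `mean_squared_error`:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted costs (USD per effective unit), for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.0, 7.5, 9.0])

residuals = y_true - y_hat                      # [0.5, 0.0, -0.5, 0.0]
ss_res = np.sum(residuals ** 2)                 # 0.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # 20.0
r2_manual = 1 - ss_res / ss_tot                 # 0.975
rmse_manual = np.sqrt(np.mean(residuals ** 2))  # sqrt(0.125), about 0.354

print(r2_manual, rmse_manual)
print(r2_score(y_true, y_hat), np.sqrt(mean_squared_error(y_true, y_hat)))
```

On held-out data, R² can even be negative when the model predicts worse than the test-set mean, so it is not guaranteed to fall between 0 and 1.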
# Block 5: Feature selection and model fitting
selected = stepwise_selection(X_train, y_train)
# Fit final OLS model
X_train_sel = sm.add_constant(X_train[selected])
model = sm.OLS(y_train, X_train_sel).fit()
print(model.summary())
# Predict on test set
X_test_sel = sm.add_constant(X_test[selected])
y_pred = model.predict(X_test_sel)
# Compute R² and RMSE
print("Test R²:", r2_score(y_test, y_pred))
print("Test RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
Residual Diagnostics
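Plotting residuals against predicted values shows whether they scatter randomly around zero; curved patterns suggest a missing nonlinear effect, and a funnel shape suggests heteroscedasticity (non-constant variance), both of which weaken the OLS assumptions behind the reported p-values.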
# Block 6: Residual plot
residuals = y_test - y_pred
plt.scatter(y_pred, residuals, alpha=0.6)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted Efficiency Cost (USD/unit)")
plt.ylabel("Residuals")
plt.title("Residuals vs. Predicted Efficiency Cost")
plt.show()
Summary
By applying stepwise regression to smart-manufacturing telemetry, we distill the key drivers of per-unit cost (for example, downtime percentage or energy usage, if they survive selection) while discarding less informative metrics.
The resulting linear model aims to balance interpretability (few, significant predictors) with predictive performance (judged by test R² and RMSE), giving manufacturing teams a transparent tool to forecast and reduce efficiency costs.