Manufacturing Efficiency Cost Prediction in ML


To remain competitive, manufacturers must reduce the cost per unit of effective output. That cost depends on factors such as machine utilization, energy consumption, material yield loss, and labor efficiency.

In this manufacturing efficiency cost prediction ML project, we predict the efficiency cost, defined as total production cost divided by effective output, using stepwise linear regression on smart-manufacturing telemetry (cycle time, downtime percentage, scrap rate, energy use, and labor hours).
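Written out with symbols (our own notation, not taken from the dataset): let U be production units, s the scrap rate in percent, C the total production cost in USD, and S the set of telemetry features x_j that stepwise selection retains. The target and the model can then be sketched as follows, with a small made-up numeric example on the last line.

```latex
E = U\left(1 - \frac{s}{100}\right), \qquad
c = \frac{C}{E}, \qquad
c = \beta_0 + \sum_{j \in S} \beta_j x_j + \varepsilon

\text{e.g. } U = 1000,\ s = 5,\ C = 19\,000
\;\Rightarrow\; E = 950,\quad c = \frac{19\,000}{950} = 20 \text{ USD per effective unit}
```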

By isolating the most significant predictors, the resulting model will help operations teams identify key levers to reduce cost and boost throughput.

Libraries Required

import pandas as pd               # Data loading & manipulation  
import numpy as np                # Numerical operations  
import statsmodels.api as sm      # Ordinary Least Squares regression  
from sklearn.model_selection import train_test_split   # Train/test split  
from sklearn.metrics import r2_score, mean_squared_error  # Evaluation metrics  
import matplotlib.pyplot as plt   # Visualization

Dataset

Smart Manufacturing Resource Efficiency Dataset

Step-by-Step Code Implementation

Data Loading & Initial Inspection

We load a smart-manufacturing dataset that records cycle times, downtime, scrap rates, energy, labor usage, and total production cost. Initial checks (.head(), .info(), .describe()) reveal column types, missing values, and value ranges before any modelling.

# Block 1: Load dataset
# Smart Manufacturing Resource Efficiency Dataset (Kaggle)
df = pd.read_csv("smart_manufacturing_resource_efficiency.csv")

print(df.head())      # glimpse at columns
df.info()             # types & missingness (info() prints directly and returns None)
print(df.describe())  # summary statistics
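The inspection calls above print summaries but do not flag problems automatically. As a minimal sketch, a few explicit checks could look like this; since the Kaggle CSV is not bundled here, the rows below are invented toy values, and the column names follow the schema assumed in Block 2.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for the Kaggle CSV (values are invented)
df = pd.DataFrame({
    "cycle_time_sec":   [42.0, 39.5, np.nan, 41.2],
    "downtime_pct":     [3.1, 2.4, 5.0, 120.0],  # 120 % is deliberately impossible
    "scrap_rate_pct":   [1.2, 0.8, 2.5, 1.0],
    "production_units": [1000, 950, 0, 1010],
})

# Missing values per column (complements the non-null counts from df.info())
missing = df.isna().sum()

# Fully duplicated rows
n_duplicates = int(df.duplicated().sum())

# Percentage columns should lie in [0, 100]
pct_cols = ["downtime_pct", "scrap_rate_pct"]
out_of_range = ~df[pct_cols].apply(lambda s: s.between(0, 100)).all(axis=1)

# Zero output would make cost per effective unit undefined
zero_output = df["production_units"] <= 0

print(missing)
print("Duplicate rows:", n_duplicates)
print("Out-of-range percentage rows:", df[out_of_range].index.tolist())
print("Zero-output rows:", df[zero_output].index.tolist())
```

Rows flagged this way can be inspected or dropped before the efficiency cost is computed.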

Feature Engineering & Cost Definition

We compute the effective output by adjusting for scrap and define efficiency_cost_usd as the total cost divided by that output. Rows with missing fields or zero effective output are removed.

# Block 2: Compute efficiency cost and clean data
# Assume df has: 'cycle_time_sec','downtime_pct','scrap_rate_pct',
# 'energy_kwh_per_unit','labor_hours_per_unit','production_units',
# and 'total_production_cost_usd'

# Calculate effective output (units minus scrap)
df['effective_output'] = df['production_units'] * (1 - df['scrap_rate_pct']/100)

# Define cost per effective unit
df['efficiency_cost_usd'] = df['total_production_cost_usd'] / df['effective_output']

# Drop rows with missing or zero effective output
df = df.dropna(subset=[
    'cycle_time_sec','downtime_pct','scrap_rate_pct',
    'energy_kwh_per_unit','labor_hours_per_unit','efficiency_cost_usd'
])
df = df[df['effective_output'] > 0]
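To make the cost definition concrete, here is a small worked sketch on three invented rows (hypothetical numbers, same assumed column names). It filters zero effective output before dividing, which gives the same result as Block 2 but never creates infinite values; note that dropna() would not catch an inf produced by dividing by zero.

```python
import pandas as pd

# Hypothetical three-row example; column names follow Block 2's assumed schema
df = pd.DataFrame({
    "production_units": [1000, 500, 0],
    "scrap_rate_pct": [5.0, 10.0, 0.0],
    "total_production_cost_usd": [19000.0, 9900.0, 1200.0],
})

# Effective output = units that survive scrap
df["effective_output"] = df["production_units"] * (1 - df["scrap_rate_pct"] / 100)

# Drop zero-output rows BEFORE dividing, so no inf values are created
df = df[df["effective_output"] > 0].copy()
df["efficiency_cost_usd"] = df["total_production_cost_usd"] / df["effective_output"]

print(df[["effective_output", "efficiency_cost_usd"]])
```

The first row yields 950 effective units at 20 USD each; the second yields 450 units at 22 USD each, so the second line is more expensive per good unit despite its lower total cost.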

Prepare Predictors and Split Data

Predictors (X) are the five operational metrics; the response (y) is the calculated cost per effective unit. An 80/20 split holds out a test set, so feature selection and model fitting see only the training data and evaluation is genuinely out of sample.

# Block 3: Define feature matrix X and target y
X = df[[
    'cycle_time_sec','downtime_pct','scrap_rate_pct',
    'energy_kwh_per_unit','labor_hours_per_unit'
]]
y = df['efficiency_cost_usd']

# Split into training and test sets (80/20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
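A quick sanity check on the split, sketched here on random stand-in data (not the real dataset), confirms the 80/20 sizes and that X and y rows stay paired after shuffling.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
cols = ["cycle_time_sec", "downtime_pct", "scrap_rate_pct",
        "energy_kwh_per_unit", "labor_hours_per_unit"]

# Hypothetical stand-in data: 100 rows of random predictors and target
X = pd.DataFrame(rng.normal(size=(100, 5)), columns=cols)
y = pd.Series(rng.normal(size=100), name="efficiency_cost_usd")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))                    # 80 20
print(X_train.index.equals(y_train.index))          # True: rows stay paired
print(set(X_train.index).isdisjoint(X_test.index))  # True: no row in both sets
```

Fixing random_state makes the split reproducible, so the stepwise selection below always sees the same training rows.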

Stepwise Regression Function

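The stepwise_selection function alternates two steps. The forward step adds the excluded feature with the smallest p-value, provided that p-value is below 0.01; the backward step drops the included feature with the largest p-value if it exceeds 0.05. The loop repeats until neither step changes the feature set, yielding a parsimonious predictor set. Keeping threshold_in below threshold_out helps prevent a feature from being added and then immediately dropped in an endless cycle.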

# Block 4: Implement forward–backward stepwise selection
def stepwise_selection(X, y,
                       initial_list=None,
                       threshold_in=0.01,
                       threshold_out=0.05,
                       verbose=True):
    """Forward-backward selection on OLS p-values; returns the chosen column names."""
    # Avoid a mutable default argument; copy so the caller's list is untouched
    included = list(initial_list) if initial_list else []
    while True:
        changed = False

        # Forward step: test adding each excluded feature
        excluded = [f for f in X.columns if f not in included]
        new_pvals = pd.Series(index=excluded, dtype=float)
        for col in excluded:
            model = sm.OLS(y, sm.add_constant(X[included + [col]])).fit()
            new_pvals[col] = model.pvalues[col]
        best_pval = new_pvals.min()  # NaN when nothing is left to add
        if best_pval < threshold_in:
            best_feat = new_pvals.idxmin()
            included.append(best_feat)
            changed = True
            if verbose:
                print(f"Add  {best_feat:20} p-value {best_pval:.4f}")

        # Backward step: test removing each included feature
        if included:
            model = sm.OLS(y, sm.add_constant(X[included])).fit()
            pvals = model.pvalues.drop("const", errors="ignore")  # ignore intercept
            worst_pval = pvals.max()
            if worst_pval > threshold_out:
                worst_feat = pvals.idxmax()
                included.remove(worst_feat)
                changed = True
                if verbose:
                    print(f"Drop {worst_feat:20} p-value {worst_pval:.4f}")

        if not changed:
            break
    return included

Model Building & Evaluation

We fit an OLS regression on the selected features using statsmodels. The .summary() output reports coefficients, p-values, R², adjusted R², AIC, and the F-statistic, clarifying which operational levers most affect cost.

Predictions on the held-out test set yield R² (the share of variance explained) and RMSE (root mean squared error, in USD per effective unit), quantifying accuracy out of sample.

# Block 5: Feature selection and model fitting
selected = stepwise_selection(X_train, y_train)

# Fit final OLS model
X_train_sel = sm.add_constant(X_train[selected])
model = sm.OLS(y_train, X_train_sel).fit()
print(model.summary())

# Predict on test set
X_test_sel = sm.add_constant(X_test[selected])
y_pred = model.predict(X_test_sel)

# Compute R² and RMSE
print("Test R²:", r2_score(y_test, y_pred))
print("Test RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))

Residual Diagnostics

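Plotting residuals against predicted values reveals non-random patterns or heteroscedasticity (for example, a funnel shape). Such patterns indicate that OLS assumptions are violated, which makes the reported standard errors and p-values less reliable.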

# Block 6: Residual plot
residuals = y_test - y_pred
plt.scatter(y_pred, residuals, alpha=0.6)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted Efficiency Cost (USD/unit)")
plt.ylabel("Residuals")
plt.title("Residuals vs. Predicted Efficiency Cost")
plt.show()

Summary

By applying stepwise regression to smart-manufacturing telemetry, we isolate the predictors that most influence per-unit cost (for example, downtime percentage or energy usage, depending on which features the selection retains) while discarding less informative metrics.

The resulting linear model balances interpretability (a few statistically significant predictors) against predictive performance (measured by test R² and RMSE), giving manufacturing teams a transparent tool to forecast and reduce efficiency costs.

Did you like our efforts? If Yes, please give ProjectGurukul 5 Stars on Google | Facebook