Assembly Line Efficiency Prediction using Polynomial Regression in ML


Manufacturing engineers and operations managers need to forecast the efficiency of an assembly line (measured here as the fraction of defect-free units produced) from early indicators such as machine downtime, throughput rate, number of operators, and maintenance hours, before full-shift data are available. Real-world observations show that efficiency responds nonlinearly to downtime (small reductions yield significant gains up to a point), to operator count (diminishing returns beyond optimal staffing), and to maintenance hours (too little or too much both hurt). A simple linear model underfits these curves; a high-degree polynomial without regularisation overfits to noise. By applying Polynomial Regression with Ridge (ℓ₂) regularisation to a small set of engineered numeric features, we can capture smooth efficiency trends and deliver predictions that stay interpretable enough to support proactive resource planning.
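
To make the underfitting point concrete, here is a minimal sketch on synthetic data. The inverted-U relationship between maintenance hours and efficiency, the noise level, and the helper name fit_rmse are all assumptions made for illustration; none of it comes from the Bosch data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inverted-U curve: efficiency peaks at 1.5 maintenance hours
hours = rng.uniform(0, 3, size=200)
efficiency = 0.95 - 0.05 * (hours - 1.5) ** 2 + rng.normal(0, 0.005, size=200)

def fit_rmse(degree):
    """Fit a polynomial of the given degree and return its training RMSE."""
    coeffs = np.polyfit(hours, efficiency, deg=degree)
    residuals = efficiency - np.polyval(coeffs, hours)
    return float(np.sqrt(np.mean(residuals ** 2)))

rmse_linear = fit_rmse(1)
rmse_quadratic = fit_rmse(2)
print(f"Linear RMSE:    {rmse_linear:.4f}")
print(f"Quadratic RMSE: {rmse_quadratic:.4f}")
```

On data shaped like this, the straight line cannot follow the peak, so its error should come out well above that of the quadratic fit.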

Libraries Required

import pandas as pd                                # data loading & manipulation  
import numpy as np                                 # numerical operations  

import matplotlib.pyplot as plt                    # plotting  
import seaborn as sns                              # enhanced visualization  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import StandardScaler, PolynomialFeatures, OneHotEncoder  
from sklearn.compose import ColumnTransformer  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score

Dataset

Bosch Production Line Performance

Step-by-Step Code Implementation

Load Data & Libraries

We load the part-level measurements from train_numeric.csv, which also carries the pass/fail label in its Response column (1 = failed quality control, 0 = passed), and derive a per-part Passed flag. The next step groups parts by an inferred LineID to compute session-level efficiency (fraction passed) and throughput (parts processed). train_date.csv holds only timestamps and has no Response column, so it is not needed here.

import pandas as pd
import numpy as np

# Load part-level measurements (adjust the path). The 'Response' label
# lives in train_numeric.csv, so no merge with train_date.csv is required.
features = pd.read_csv("data/train_numeric.csv", nrows=500000)

# Keep only the columns used later and sample down for speed
df = features[['Id', 'Response']].sample(100000, random_state=42)

# Per-part pass flag: Response == 0 means the part passed quality control
df['Passed'] = (df['Response'] == 0).astype(int)
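
Since only Id and Response are used downstream, one option is to parse just those two columns, which keeps memory use low on a wide file. The sketch below uses an in-memory CSV as a stand-in for train_numeric.csv; the sample rows are invented for illustration.

```python
import io
import pandas as pd

# Stand-in for data/train_numeric.csv (values invented for illustration)
csv_text = """Id,L0_S0_F0,Response
4,0.03,0
6,,0
7,0.088,1
9,-0.036,0
"""

# usecols limits parsing to the columns needed later
df = pd.read_csv(io.StringIO(csv_text), usecols=["Id", "Response"])

# Response == 0 means the part passed quality control
df["Passed"] = (df["Response"] == 0).astype(int)

pass_rate = df["Passed"].mean()
print(df)
print(f"Pass rate: {pass_rate:.2f}")
```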

Feature Engineering & Aggregation

In practice, you'd extract actual downtime, operator counts, and maintenance logs; here, we simulate them for demonstration purposes. Because the simulated columns are random draws, they carry no real signal, so swap in real measurements before drawing conclusions from the fitted model.

# For simplicity, aggregate at a mock line-session level.
# Assumption: each block of 1,000 consecutive Ids stands in for one session.
# (Dividing by 1,000,000 would likely leave only a handful of groups for
# this slice of the data, too few for a train/test split and 5-fold CV.)
df['LineID'] = (df['Id'] // 1000).astype(int)

# Group by LineID to get:
#   - Efficiency = fraction of parts that passed
#   - Throughput = number of parts in the session
agg = df.groupby('LineID').agg({
    'Passed': ['mean','count']
})
agg.columns = ['Efficiency','Throughput']

# Mock additional features (random draws, for demonstration only)
np.random.seed(42)
agg['Downtime_Hours']       = np.random.uniform(0, 2, size=len(agg))
agg['Operator_Count']       = np.random.randint(5, 15, size=len(agg))
agg['Maintenance_Hours']    = np.random.uniform(0, 3, size=len(agg))
agg = agg.reset_index()
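
The aggregation step can be checked on a toy frame where the answer is easy to verify by hand. This sketch uses pandas named aggregation as an alternative to renaming the columns afterwards; the LineID values and pass flags are invented.

```python
import pandas as pd

# Toy part-level data: two sessions with known pass counts
parts = pd.DataFrame({
    "LineID": [0, 0, 0, 0, 1, 1],
    "Passed": [1, 1, 1, 0, 1, 0],
})

# Named aggregation gives the output columns their final names directly
toy_agg = (
    parts.groupby("LineID")
         .agg(Efficiency=("Passed", "mean"), Throughput=("Passed", "count"))
         .reset_index()
)
print(toy_agg)
```

Session 0 has three passes out of four parts and session 1 has one out of two, so the Efficiency column should read 0.75 and 0.5.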

Define Features & Target

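The four session-level inputs below form the feature matrix X, and Efficiency is the target y. The PolynomialFeatures step in the pipeline later expands these inputs into squared and interaction terms (e.g., Throughput², Throughput×Downtime_Hours), capturing nonlinear returns and trade-offs.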

X = agg[['Throughput','Downtime_Hours','Operator_Count','Maintenance_Hours']]
y = agg['Efficiency']  # fraction passed per session

Build Polynomial Regression Pipeline

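Standard Scaler: Z-scores each feature so that Ridge's ℓ₂ penalty treats them equally, regardless of original scale.
Polynomial Features: Expands the scaled inputs into powers and interaction terms; the degree is tuned in the next step.
Ridge Regression: Applies an ℓ₂ penalty (controlled by alpha) that shrinks the coefficients, damping high-order terms and helping to prevent overfitting.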

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

pipe = Pipeline([
    ('scale', StandardScaler()),  
    ('poly', PolynomialFeatures(include_bias=False)),  
    ('ridge', Ridge(random_state=42))  
])
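
To see what the alpha parameter does, the following sketch solves the ridge problem in closed form, w = (XᵀX + αI)⁻¹Xᵀy, on synthetic data and prints the coefficient norm for several alpha values. It omits the intercept and is not meant to mirror how scikit-learn's Ridge is implemented internally; the data and the helper name ridge_closed_form are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic design matrix (roughly zero-mean) and a noisy linear target
X = rng.normal(size=(50, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=50)

def ridge_closed_form(X, y, alpha):
    """Solve (X^T X + alpha * I) w = X^T y for w (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

alphas = [0.001, 1.0, 100.0, 10000.0]
norms = [float(np.linalg.norm(ridge_closed_form(X, y, a))) for a in alphas]
for a, n in zip(alphas, norms):
    print(f"alpha={a:>8}: ||w|| = {n:.3f}")
```

The norm of the coefficient vector falls as alpha grows, which is the shrinkage effect the grid search trades off against fit quality.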

Train/Test Split & Hyperparameter Search

GridSearchCV: Tunes polynomial degree (1–3) and alpha (10⁻³…10³) via 5‑fold CV, optimising for lowest RMSE on held‑out folds.

from sklearn.model_selection import train_test_split, GridSearchCV

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7)
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)

print("Best parameters:", gs.best_params_)
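
Beyond the single best setting, it can help to look at the full grid of cross-validated scores. The sketch below runs a small search on synthetic data and lists the results; the data, the reduced alpha grid, and the demo_ names are assumptions, so the numbers say nothing about the Bosch data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data with a clear quadratic signal
X_demo = rng.uniform(-2, 2, size=(200, 1))
y_demo = X_demo[:, 0] ** 2 + rng.normal(scale=0.1, size=200)

demo_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(include_bias=False)),
    ("ridge", Ridge()),
])
demo_grid = {"poly__degree": [1, 2, 3], "ridge__alpha": [0.01, 1.0, 100.0]}

demo_gs = GridSearchCV(demo_pipe, demo_grid, cv=5,
                       scoring="neg_root_mean_squared_error")
demo_gs.fit(X_demo, y_demo)

# Scores are negated RMSE, so flip the sign before reading them
results = pd.DataFrame(demo_gs.cv_results_)
results["cv_rmse"] = -results["mean_test_score"]
cols = ["param_poly__degree", "param_ridge__alpha", "cv_rmse"]
print(results[cols].sort_values("cv_rmse").round(3))

best_rmse = -demo_gs.best_score_
print("Best:", demo_gs.best_params_, f"CV RMSE = {best_rmse:.3f}")
```

Because the scorer is a negated error, best_score_ is negative; negating it recovers the RMSE in the units of the target.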

Evaluate Model

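import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_pred = gs.predict(X_test)
# Take the square root explicitly: the squared=False argument is deprecated
# in recent scikit-learn releases
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE: {rmse:.3f} (efficiency fraction)")
print(f"Test R²  : {r2:.3f}")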
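
A test RMSE is easier to judge next to a naive baseline. The sketch below compares a model's RMSE with that of always predicting the training mean; the numbers are invented, and the helper rmse is defined here, not taken from scikit-learn.

```python
import numpy as np

def rmse(y_true, y_hat):
    """Root mean squared error between two equal-length arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_hat) ** 2)))

# Invented values standing in for y_train, y_test and model predictions
y_train_demo = np.array([0.99, 0.98, 1.00, 0.97])
y_test_demo = np.array([0.98, 1.00, 0.96])
y_pred_demo = np.array([0.985, 0.99, 0.97])

# Baseline: predict the training mean for every test row
baseline_pred = np.full_like(y_test_demo, y_train_demo.mean())

model_rmse = rmse(y_test_demo, y_pred_demo)
baseline_rmse = rmse(y_test_demo, baseline_pred)
print(f"Model RMSE:    {model_rmse:.4f}")
print(f"Baseline RMSE: {baseline_rmse:.4f}")
```

Because the extra features in this walkthrough are random draws, a model RMSE close to the baseline would not be surprising; a clear gap should only be expected once real downtime, staffing, and maintenance data are used.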

Inspect Key Polynomial Coefficients

Ranks the polynomial and interaction terms by coefficient magnitude to show which ones most strongly influence predicted efficiency. With real rather than simulated features, a large term such as Downtime_Hours² would point to an actionable lever for process improvement.

# Retrieve feature names after polynomial expansion
poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=X.columns)
coefs = gs.best_estimator_.named_steps['ridge'].coef_

import pandas as pd
important = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False).head(10)

import matplotlib.pyplot as plt
plt.figure(figsize=(8,5))
important.plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Assembly Efficiency")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()
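
Taking absolute values before ranking hides whether a term raises or lowers predicted efficiency. One possible variation, sketched here with invented coefficients, orders the terms by magnitude but keeps their signs.

```python
import pandas as pd

# Invented coefficients for a few expanded terms
coef_demo = pd.Series({
    "Downtime_Hours": -0.40,
    "Throughput": 0.10,
    "Maintenance_Hours^2": -0.25,
    "Operator_Count": 0.05,
})

# Order by magnitude, but keep the original signed values
order = coef_demo.abs().sort_values(ascending=False).index
signed_top = coef_demo.reindex(order).head(3)
print(signed_top)
```

Plotting signed_top with the same barh call would then show negative bars for terms associated with lower efficiency.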

Summary

By integrating polynomial feature engineering with Ridge regularisation in a streamlined pipeline, this approach provides:

1. Accurate modelling of assembly line efficiency, capturing nonlinear effects of throughput, downtime, staffing, and maintenance planning.

2. Controlled complexity, avoiding overfitting to idiosyncratic noise via α‑tuning.

3. Interpretable insights, highlighting the most influential polynomial terms—guiding targeted interventions to maximize defect‑free output.
