Assembly Line Efficiency Prediction using Polynomial Regression in ML


Manufacturing engineers and operations managers need to forecast the efficiency of an assembly line, measured as the percentage of defect‑free units produced per hour, from early indicators such as machine downtime, throughput rate, number of operators, and maintenance hours, before full‑shift data are available. Real‑world observations show that efficiency responds nonlinearly to downtime (small reductions yield significant gains up to a point), to operator count (diminishing returns beyond optimal staffing), and to maintenance hours (too little or too much both hurt). A simple linear model underfits these curves; a high‑degree polynomial without regularisation overfits to noise. By applying Polynomial Regression with Ridge (ℓ2) regularisation to a set of engineered numeric features, we can capture smooth efficiency trends and deliver interpretable predictions for proactive resource planning.
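To make the underfitting point concrete, here is a minimal sketch on synthetic data (not the Bosch set). It assumes a hypothetical inverted‑U relationship between maintenance hours and efficiency and compares a degree‑1 fit with a degree‑2 fit. The curve shape, noise level, and all variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)

# Hypothetical inverted-U: efficiency peaks near 1.5 maintenance hours
hours = rng.uniform(0, 3, size=200).reshape(-1, 1)
efficiency = 0.95 - 0.05 * (hours.ravel() - 1.5) ** 2 + rng.normal(0, 0.005, size=200)

def fit_r2(degree):
    """Fit a scaled polynomial Ridge model of the given degree; return in-sample R^2."""
    model = make_pipeline(
        StandardScaler(),
        PolynomialFeatures(degree=degree, include_bias=False),
        Ridge(alpha=1e-3),
    )
    model.fit(hours, efficiency)
    return model.score(hours, efficiency)

r2_linear = fit_r2(1)
r2_quadratic = fit_r2(2)
print(f"degree 1 R^2: {r2_linear:.3f}")
print(f"degree 2 R^2: {r2_quadratic:.3f}")
```

On a symmetric curve like this one, a straight line has almost nothing to latch onto, while the squared term recovers most of the signal.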

Libraries Required

import pandas as pd                                # data loading & manipulation  
import numpy as np                                 # numerical operations  

import matplotlib.pyplot as plt                    # plotting  
import seaborn as sns                              # enhanced visualization  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import StandardScaler, PolynomialFeatures, OneHotEncoder  
from sklearn.compose import ColumnTransformer  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score  

Dataset

Bosch Production Line Performance

Step-by-Step Code Implementation

Load Data & Libraries

We load part‑level records from train_numeric.csv, which in the Kaggle release also carries the pass/fail label (Response); train_date.csv holds station timestamps, not labels, so it is not needed here. We then group by an inferred LineID to compute session‑level efficiency (fraction passed) and throughput (parts processed).

import pandas as pd
import numpy as np

# Load only Id and the label (adjust path); Response is a column of train_numeric.csv
df = pd.read_csv("data/train_numeric.csv", usecols=['Id', 'Response'], nrows=500000)

# Sample down for speed
df = df.sample(100000, random_state=42)

# Compute per‑Id pass/fail as efficiency indicator
df['Passed'] = (df['Response'] == 0).astype(int)

Feature Engineering & Aggregation

In practice, you’d extract actual downtime, operator counts, and maintenance logs; here, we simulate for demonstration purposes.

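# For simplicity, aggregate at the line-session level by Id prefix.
# Assumption: Ids encode the line in their leading digits; this 'LineID' is a mock.
# The divisor is arbitrary. With Ids in the low millions, dividing by 1,000,000
# would leave only a handful of groups, too few for a train/test split and 5-fold CV.
df['LineID'] = (df['Id'] // 1000).astype(int)

# Group by LineID to get:
#   - Efficiency = fraction of parts that passed
#   - Throughput = number of sampled parts in the group
# Downtime, operator count, and maintenance hours are not in the data;
# they are filled with random mock values below.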
agg = df.groupby('LineID').agg({
    'Passed': ['mean','count']
})
agg.columns = ['Efficiency','Throughput']
# Mock additional features
np.random.seed(42)
agg['Downtime_Hours']       = np.random.uniform(0, 2, size=len(agg))
agg['Operator_Count']       = np.random.randint(5, 15, size=len(agg))
agg['Maintenance_Hours']    = np.random.uniform(0, 3, size=len(agg))
agg = agg.reset_index()
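The aggregation step can be checked by hand on a toy frame. This sketch uses made‑up Ids and responses (hypothetical values, not drawn from the dataset) and a divisor of 10 instead of the tutorial's larger one, to show how the mean of Passed becomes Efficiency and the count becomes Throughput.

```python
import pandas as pd

# Made-up parts: Id and Response (1 = failed, 0 = passed)
toy = pd.DataFrame({
    "Id":       [1, 4, 7, 12, 15, 18, 19],
    "Response": [0, 0, 1,  0,  0,  0,  1],
})
toy["Passed"] = (toy["Response"] == 0).astype(int)
toy["LineID"] = toy["Id"] // 10   # mock grouping key, same idea as in the tutorial

# Named aggregation: mean of Passed -> Efficiency, row count -> Throughput
toy_agg = toy.groupby("LineID")["Passed"].agg(Efficiency="mean", Throughput="count")
print(toy_agg)
```

Group 0 contains three parts of which two passed, and group 1 contains four parts of which three passed, so the efficiencies are 2/3 and 0.75.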

Define Features & Target

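X holds the four numeric inputs and y the per‑session efficiency. In the pipeline that follows, PolynomialFeatures expands these inputs into squared and interaction terms (e.g., Throughput², Throughput×Downtime_Hours), capturing nonlinear returns and trade‑offs.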

X = agg[['Throughput','Downtime_Hours','Operator_Count','Maintenance_Hours']]
y = agg['Efficiency']  # fraction passed per session

Build Polynomial Regression Pipeline

  • Standard Scaler: Z‑scores each feature so that Ridge's ℓ2 penalty treats them equally, regardless of original scale.
  • Polynomial Features: Generates powers and interaction terms of the scaled inputs up to the chosen degree.
  • Ridge Regression: Applies an ℓ2 penalty (controlled by alpha) to shrink coefficients, high‑order ones included, and prevent overfitting.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

pipe = Pipeline([
    ('scale', StandardScaler()),  
    ('poly', PolynomialFeatures(include_bias=False)),  
    ('ridge', Ridge(random_state=42))  
])
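The effect of alpha on the coefficients can be observed directly. The sketch below fits the same three‑step pipeline at degree 3 on synthetic data (an assumed quadratic signal plus noise; all names are hypothetical) and reports the overall size of the coefficient vector for three alpha values.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(1)
X_demo = rng.uniform(0, 3, size=(60, 2))
y_demo = 0.9 - 0.04 * (X_demo[:, 0] - 1.5) ** 2 + rng.normal(0, 0.01, size=60)

def coef_norm(alpha):
    """Fit the degree-3 pipeline with the given alpha; return the L2 norm of its coefficients."""
    demo_pipe = Pipeline([
        ("scale", StandardScaler()),
        ("poly", PolynomialFeatures(degree=3, include_bias=False)),
        ("ridge", Ridge(alpha=alpha)),
    ])
    demo_pipe.fit(X_demo, y_demo)
    return float(np.linalg.norm(demo_pipe.named_steps["ridge"].coef_))

norms = {alpha: coef_norm(alpha) for alpha in (0.001, 1.0, 1000.0)}
for alpha, norm in norms.items():
    print(f"alpha={alpha:<8} coefficient norm={norm:.4f}")
```

Larger alpha values pull the coefficient vector toward zero, which is the mechanism that keeps a degree‑3 expansion from chasing noise.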

Train/Test Split & Hyperparameter Search

  • GridSearchCV: Tunes polynomial degree (1–3) and alpha (seven log‑spaced values from 10⁻³ to 10³) via 5‑fold CV, selecting the combination with the lowest RMSE on held‑out folds.

from sklearn.model_selection import train_test_split, GridSearchCV

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7)
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)

print("Best parameters:", gs.best_params_)
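Beyond the single best setting, it can help to look at the whole grid. This is a sketch on synthetic data with a smaller, hypothetical grid. It reads GridSearchCV's cv_results_ into a DataFrame and flips the sign of the score, since neg_root_mean_squared_error reports negated RMSE so that higher is better.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(2)
X_demo = rng.uniform(0, 3, size=(120, 2))
y_demo = 0.9 - 0.04 * (X_demo[:, 0] - 1.5) ** 2 + rng.normal(0, 0.01, size=120)

demo_search = GridSearchCV(
    Pipeline([
        ("scale", StandardScaler()),
        ("poly", PolynomialFeatures(include_bias=False)),
        ("ridge", Ridge()),
    ]),
    {"poly__degree": [1, 2, 3], "ridge__alpha": [0.01, 1.0, 100.0]},
    cv=5,
    scoring="neg_root_mean_squared_error",
)
demo_search.fit(X_demo, y_demo)

# Scores are negated RMSE, so flip the sign to read them as plain RMSE
results = pd.DataFrame(demo_search.cv_results_)
results["cv_rmse"] = -results["mean_test_score"]
table = results[["param_poly__degree", "param_ridge__alpha", "cv_rmse"]].sort_values("cv_rmse")
print(table.head())
print("Best CV RMSE:", -demo_search.best_score_)
```

If several settings sit within a hair of the best RMSE, choosing the lower degree or the larger alpha among them is a reasonable way to favour the simpler model.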

Evaluate Model

y_pred = gs.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # avoids squared=False, which newer scikit-learn releases no longer accept
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE: {rmse:.3f} (efficiency fraction)")
print(f"Test R²  : {r2:.3f}")
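An RMSE is easier to judge next to a naive baseline. The sketch below, on synthetic data with an assumed quadratic signal, compares a fitted polynomial Ridge model against scikit‑learn's DummyRegressor, which always predicts the training mean. The data and names are hypothetical.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(3)
X_demo = rng.uniform(0, 3, size=(300, 1))
y_demo = 0.9 - 0.04 * (X_demo[:, 0] - 1.5) ** 2 + rng.normal(0, 0.01, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

def rmse(y_true, y_hat):
    """Root mean squared error, computed without version-specific keyword arguments."""
    return float(np.sqrt(mean_squared_error(y_true, y_hat)))

baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
model = make_pipeline(
    StandardScaler(), PolynomialFeatures(degree=2, include_bias=False), Ridge(alpha=0.01)
).fit(X_tr, y_tr)

baseline_rmse = rmse(y_te, baseline.predict(X_te))
model_rmse = rmse(y_te, model.predict(X_te))
model_r2 = r2_score(y_te, model.predict(X_te))
print(f"baseline RMSE: {baseline_rmse:.4f}")
print(f"model RMSE   : {model_rmse:.4f}  (R^2 = {model_r2:.3f})")
```

In this tutorial the downtime, operator, and maintenance columns are random draws, so they cannot explain efficiency. A test score close to the baseline is the expected outcome until real measurements replace them.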

Inspect Key Polynomial Coefficients

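This step identifies which polynomial or interaction terms most strongly influence predicted efficiency. With real downtime, staffing, and maintenance data, large terms would point to actionable levers (e.g., a strong Downtime_Hours² term suggests that cutting downtime pays off disproportionately); with the mock features used here, the ranking is illustrative only.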

# Retrieve feature names after polynomial expansion
poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=X.columns)
coefs = gs.best_estimator_.named_steps['ridge'].coef_

import pandas as pd
important = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False).head(10)

import matplotlib.pyplot as plt
plt.figure(figsize=(8,5))
important.plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving Assembly Efficiency")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()

Summary

By integrating polynomial feature engineering with Ridge regularisation in a streamlined pipeline, this approach provides:

1. Accurate modelling of assembly line efficiency, capturing nonlinear effects of throughput, downtime, staffing, and maintenance planning.

2. Controlled complexity, avoiding overfitting to idiosyncratic noise via α‑tuning.

3. Interpretable insights, highlighting the most influential polynomial terms—guiding targeted interventions to maximize defect‑free output.


ProjectGurukul Team

The ProjectGurukul Team delivers project-based tutorials on programming, machine learning, and web development. We simplify learning by providing hands-on projects to help you master real-world skills. We also provide free major and minor projects for engineering students.
