Factory Production Rate Prediction with Polynomial Regression in ML

Manufacturing engineers need to forecast a production line's throughput (units/hour) from early-stage process parameters before scaling up or scheduling maintenance. Historical records show that the output rate depends nonlinearly on machine cycle time, the number of active workstations, the material feed rate, and the energy input. A simple linear model underfits because it misses these curved relationships, while unregularised high-degree polynomials risk overfitting. Polynomial Regression (linear regression on engineered polynomial features) combined with regularisation (e.g., Ridge or Lasso) can capture smooth nonlinear effects and give interpretable predictions of production rate for resource planning and bottleneck analysis.

Libraries Required

import pandas as pd                   # data handling  
import numpy as np                    # numeric operations  
import matplotlib.pyplot as plt       # plotting  
import seaborn as sns                 # enhanced visualization  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import PolynomialFeatures, StandardScaler  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score

Dataset

Manufacturing Production Data

Step-by-Step Code Implementation

1. Load Libraries & Data

import pandas as pd
df = pd.read_csv("data/production_data.csv")   # adjust filename

# Preview key columns
df.head()[['cycle_time_s','workstations','feed_rate_kg_h','energy_kW','output_units_h']]
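
If the CSV is not at hand, the remaining steps can still be exercised on a synthetic stand-in. The sketch below is an assumption for illustration only: the column names match the ones used above, but the value ranges, the formula linking inputs to output, and the helper name make_synthetic_production_data are invented and do not describe any real production line.

```python
import numpy as np
import pandas as pd

def make_synthetic_production_data(n_rows=500, seed=0):
    """Hypothetical helper: build a toy table with the tutorial's column names."""
    rng = np.random.default_rng(seed)
    cycle_time = rng.uniform(20, 60, n_rows)     # seconds per cycle (assumed range)
    workstations = rng.integers(2, 11, n_rows)   # 2..10 active stations (assumed range)
    feed_rate = rng.uniform(50, 200, n_rows)     # kg/h (assumed range)
    energy = rng.uniform(10, 80, n_rows)         # kW (assumed range)

    # Invented relationship with curvature and an interaction, plus Gaussian noise
    output = (
        0.1 * workstations * 3600.0 / cycle_time
        + 0.002 * feed_rate * energy
        - 0.0004 * feed_rate ** 2
        + rng.normal(0, 2.0, n_rows)
    )
    return pd.DataFrame({
        'cycle_time_s': cycle_time,
        'workstations': workstations,
        'feed_rate_kg_h': feed_rate,
        'energy_kW': energy,
        'output_units_h': output,
    })

df_demo = make_synthetic_production_data()
print(df_demo.head())
```

Assigning df = df_demo lets the later steps run unchanged, although any scores obtained this way say nothing about a real line.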

2. Exploratory Data Analysis

import seaborn as sns
import matplotlib.pyplot as plt

sns.pairplot(df[['cycle_time_s','workstations','feed_rate_kg_h','energy_kW','output_units_h']])
plt.suptitle("Pairwise relationships", y=1.02)
plt.show()

3. Define Features & Target

X = df[['cycle_time_s','workstations','feed_rate_kg_h','energy_kW']]
y = df['output_units_h']

4. Build Pipeline with Polynomial Features

PolynomialFeatures augments inputs with squared and interaction terms, capturing curvature and synergies (e.g., faster feed may be more effective when cycle time is low).
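StandardScaler standardizes each raw input before expansion, so the polynomial terms are built from features on a comparable scale and the Ridge penalty is not dominated by measurement units.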
Ridge regression applies an ℓ² penalty to shrink noisy coefficients, mitigating overfitting from high‑dimensional polynomial terms.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

pipe = Pipeline([
    ('scale', StandardScaler()),  
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),
    ('ridge', Ridge())
])

5. Hyperparameter Search & Train/Test Split

GridSearchCV explores polynomial degrees {1,2,3} and Ridge α values {1e‑3…1e3}, using 5‑fold CV to minimise RMSE.

from sklearn.model_selection import train_test_split, GridSearchCV

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7)
}

gs = GridSearchCV(pipe, param_grid,
                  cv=5,
                  scoring='neg_root_mean_squared_error',
                  n_jobs=-1, verbose=1)
gs.fit(X_train, y_train)

print("Best params:", gs.best_params_)
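
Beyond the single best setting, it can help to look at the whole grid. The sketch below runs the same kind of search on synthetic stand-in data (the data and the names X_demo, y_demo, gs_demo and cv_table are assumptions for illustration) and tabulates the mean cross-validated RMSE per degree and α. Because scikit-learn maximises scores, neg_root_mean_squared_error is stored negated, so the sign is flipped back before display.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (assumption): two inputs with a quadratic effect and an interaction
rng = np.random.default_rng(0)
X_demo = rng.uniform(-2, 2, size=(200, 2))
y_demo = 3 * X_demo[:, 0] ** 2 + 2 * X_demo[:, 0] * X_demo[:, 1] + rng.normal(0, 0.5, 200)

pipe_demo = Pipeline([
    ('scale', StandardScaler()),
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge()),
])
grid_demo = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7),
}
gs_demo = GridSearchCV(pipe_demo, grid_demo, cv=5,
                       scoring='neg_root_mean_squared_error')
gs_demo.fit(X_demo, y_demo)

# One row per grid point; flip the sign so the values read as plain RMSE
rows = [
    {'degree': p['poly__degree'], 'alpha': p['ridge__alpha'], 'cv_rmse': -score}
    for p, score in zip(gs_demo.cv_results_['params'],
                        gs_demo.cv_results_['mean_test_score'])
]
cv_table = pd.DataFrame(rows).pivot(index='degree', columns='alpha', values='cv_rmse')

print(cv_table.round(3))
print("Best CV RMSE:", -gs_demo.best_score_)
```

A table like this shows whether the chosen degree wins clearly or only marginally, which is useful when a simpler model is preferred for interpretability.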

6. Evaluate Model

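import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_pred = gs.predict(X_test)
# Take the square root explicitly; the squared=False argument is deprecated in newer scikit-learn releases
rmse = np.sqrt(mean_squared_error(y_test, y_pred))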
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE: {rmse:.2f} units/h")
print(f"Test R²  : {r2:.3f}")
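
Test scores are easier to judge against a baseline. The following sketch compares a plain linear model with a degree-2 polynomial Ridge pipeline on the same held-out split. It uses synthetic stand-in data with a known quadratic effect, and the names X_demo, y_demo, rmse_linear and rmse_poly as well as the fixed alpha=1.0 are assumptions for illustration, not tuned values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (assumption): target is quadratic in the inputs
rng = np.random.default_rng(0)
X_demo = rng.uniform(-2, 2, size=(300, 2))
y_demo = 3 * X_demo[:, 0] ** 2 + 2 * X_demo[:, 0] * X_demo[:, 1] + rng.normal(0, 0.5, 300)

Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Baseline: ordinary least squares on the raw inputs
linear = LinearRegression().fit(Xtr, ytr)

# Candidate: scale, expand to degree 2, then Ridge
poly_ridge = Pipeline([
    ('scale', StandardScaler()),
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),
    ('ridge', Ridge(alpha=1.0)),
]).fit(Xtr, ytr)

rmse_linear = np.sqrt(mean_squared_error(yte, linear.predict(Xte)))
rmse_poly = np.sqrt(mean_squared_error(yte, poly_ridge.predict(Xte)))

print(f"Linear baseline RMSE : {rmse_linear:.2f}")
print(f"Polynomial Ridge RMSE: {rmse_poly:.2f}")
```

On real production data the gap may be smaller or absent; if the polynomial model does not beat the linear baseline on the test set, the extra terms are not earning their complexity.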

7. Inspect Coefficients

Coefficient inspection on scaled features highlights which squared or cross‑terms most influence throughput, guiding process engineers on levers to adjust.

# Extract feature names from PolynomialFeatures
poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(X.columns)

# Ridge coefficients, expressed on the standardized-input scale
# (no conversion back to raw units is attempted here)
coefs = gs.best_estimator_.named_steps['ridge'].coef_

coef_series = pd.Series(coefs, index=feat_names).sort_values(key=abs, ascending=False)
plt.figure(figsize=(8,6))
coef_series.head(10).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top polynomial features driving throughput")
plt.xlabel("Ridge coefficient (standardized inputs)")
plt.tight_layout()
plt.show()

Summary

By combining polynomial feature expansion with Ridge regularisation in a clean Pipeline, we achieve:

Nonlinear throughput forecasts for production planning, with accuracy checked on a held-out test set (RMSE and R²).
Controlled model complexity via α tuning, preventing overfitting to spurious interactions.
Interpretability: coefficient magnitudes for key interactions (e.g., feed_rate² or cycle_time × workstations) reveal actionable process insights for capacity optimisation.
