Factory Production Rate Prediction with Polynomial Regression in ML
Manufacturing engineers need to forecast a production line's throughput rate (units/hour) before scaling up or scheduling maintenance, based solely on early‑stage process parameters. Historical records show that the output rate depends nonlinearly on machine cycle time, the number of active workstations, the material feed rate, and the energy input. A simple linear model underfits because it misses these curved relationships, while an unregularised high‑degree polynomial tends to overfit by chasing noise. Polynomial Regression (linear regression on engineered polynomial features) with regularisation (e.g., Ridge or Lasso) captures smooth nonlinear effects while keeping coefficients in check, giving interpretable predictions of production rate for resource planning and bottleneck analysis.
Libraries Required
import pandas as pd                 # data handling
import numpy as np                  # numeric operations
import matplotlib.pyplot as plt     # plotting
import seaborn as sns               # enhanced visualization
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score
Dataset
Step-by-Step Code Implementation
1. Load Libraries & Data
import pandas as pd
df = pd.read_csv("data/production_data.csv") # adjust filename
# Preview key columns
df.head()[['cycle_time_s','workstations','feed_rate_kg_h','energy_kW','output_units_h']]
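If the CSV is not to hand, a synthetic stand-in with the same column names lets the remaining steps run end to end. This is only a sketch: the function name `make_synthetic_production_data`, the value ranges, and the formula linking inputs to output are all invented for illustration and do not describe any real production line.

```python
import numpy as np
import pandas as pd

def make_synthetic_production_data(n_rows=500, seed=0):
    """Hypothetical stand-in for production_data.csv (invented relationship)."""
    rng = np.random.default_rng(seed)
    cycle_time = rng.uniform(20, 60, n_rows)     # seconds per unit
    workstations = rng.integers(2, 11, n_rows)   # 2 to 10 active stations
    feed_rate = rng.uniform(50, 200, n_rows)     # kg/h
    energy = rng.uniform(10, 80, n_rows)         # kW
    # Assumed ground truth: linear terms, one squared term, one interaction, noise
    output = (
        200
        - 2.0 * cycle_time
        + 15.0 * workstations
        + 0.4 * feed_rate
        - 0.001 * feed_rate ** 2
        - 0.2 * cycle_time * workstations
        + 0.3 * energy
        + rng.normal(0, 3, n_rows)
    )
    return pd.DataFrame({
        'cycle_time_s': cycle_time,
        'workstations': workstations,
        'feed_rate_kg_h': feed_rate,
        'energy_kW': energy,
        'output_units_h': output,
    })

df = make_synthetic_production_data()
print(df.head())
```

Because the assumed ground truth is itself a degree‑2 polynomial, a degree‑2 model should fit this data well; real plant data will rarely be that tidy.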
2. Exploratory Data Analysis
import seaborn as sns, matplotlib.pyplot as plt
sns.pairplot(df[['cycle_time_s','workstations','feed_rate_kg_h','energy_kW','output_units_h']])
plt.suptitle("Pairwise relationships", y=1.02)
plt.show()
3. Define Features & Target
X = df[['cycle_time_s','workstations','feed_rate_kg_h','energy_kW']]
y = df['output_units_h']
4. Build Pipeline with Polynomial Features
- PolynomialFeatures augments inputs with squared and interaction terms, capturing curvature and synergies (e.g., faster feed may be more effective when cycle time is low).
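- StandardScaler standardises each raw input to zero mean and unit variance before the polynomial expansion, so the generated terms stay on comparable scales and the Ridge penalty does not favour a feature merely because of its units.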
- Ridge regression applies an ℓ² penalty to shrink noisy coefficients, mitigating overfitting from high‑dimensional polynomial terms.
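Written out, the three steps above amount to the following model, where $z_j$ denotes the standardised version of input $j$ and a degree of 2 is assumed for concreteness (the symbols are introduced here for illustration):

```latex
\hat{y} = \beta_0 + \sum_{j=1}^{4} \beta_j z_j + \sum_{1 \le j \le k \le 4} \beta_{jk}\, z_j z_k,
\qquad
\min_{\beta}\; \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2
  + \alpha \Bigl( \sum_{j} \beta_j^2 + \sum_{j \le k} \beta_{jk}^2 \Bigr)
```

The intercept $\beta_0$ is not penalised. Setting $\alpha = 0$ recovers ordinary least squares on the expanded features, while larger $\alpha$ shrinks all slope coefficients towards zero.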
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
pipe = Pipeline([
('scale', StandardScaler()),
('poly', PolynomialFeatures(degree=2, include_bias=False)),
('ridge', Ridge())
])
5. Hyperparameter Search & Train/Test Split
GridSearchCV explores polynomial degrees {1,2,3} and Ridge α values {1e‑3…1e3}, using 5‑fold CV to minimise RMSE.
from sklearn.model_selection import train_test_split, GridSearchCV
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
param_grid = {
'poly__degree': [1, 2, 3],
'ridge__alpha': np.logspace(-3, 3, 7)
}
gs = GridSearchCV(pipe, param_grid,
cv=5,
scoring='neg_root_mean_squared_error',
n_jobs=-1, verbose=1)
gs.fit(X_train, y_train)
print("Best params:", gs.best_params_)
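Beyond the single best setting, it can help to look at the whole grid to judge how sensitive the cross‑validated error is to degree and α. The sketch below is self‑contained and runs on invented synthetic data with a smaller α grid; with the tutorial's fitted `gs` object, the same `cv_results_` handling should apply.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (invented relationship, for illustration only)
rng = np.random.default_rng(0)
n = 300
X_demo = pd.DataFrame({
    'cycle_time_s': rng.uniform(20, 60, n),
    'workstations': rng.integers(2, 11, n),
    'feed_rate_kg_h': rng.uniform(50, 200, n),
    'energy_kW': rng.uniform(10, 80, n),
})
y_demo = (
    200 - 2.0 * X_demo['cycle_time_s'] + 15 * X_demo['workstations']
    + 0.4 * X_demo['feed_rate_kg_h'] - 0.001 * X_demo['feed_rate_kg_h'] ** 2
    - 0.2 * X_demo['cycle_time_s'] * X_demo['workstations']
    + rng.normal(0, 3, n)
)

pipe_demo = Pipeline([
    ('scale', StandardScaler()),
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge()),
])
grid_demo = GridSearchCV(
    pipe_demo,
    {'poly__degree': [1, 2, 3], 'ridge__alpha': [0.01, 1.0, 100.0]},
    cv=5,
    scoring='neg_root_mean_squared_error',
)
grid_demo.fit(X_demo, y_demo)

# Scores are negated RMSE, so flip the sign to get RMSE in units/h
cv = grid_demo.cv_results_
summary = pd.DataFrame({
    'degree': [p['poly__degree'] for p in cv['params']],
    'alpha': [p['ridge__alpha'] for p in cv['params']],
    'cv_rmse': -cv['mean_test_score'],
})
table = summary.pivot(index='degree', columns='alpha', values='cv_rmse')
print(table.round(2))
```

Rows are polynomial degrees and columns are α values; a flat region around the minimum suggests the choice is not critical, whereas a sharp minimum suggests a finer grid may be worthwhile.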
6. Evaluate Model
from sklearn.metrics import mean_squared_error, r2_score
y_pred = gs.predict(X_test)
rmse = mean_squared_error(y_test, y_pred) ** 0.5  # square root of MSE; the squared=False argument is deprecated in newer scikit-learn
r2 = r2_score(y_test, y_pred)
print(f"Test RMSE: {rmse:.2f} units/h")
print(f"Test R² : {r2:.3f}")
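For reference, both metrics are simple to compute by hand, which makes clear what the reported numbers mean. The sketch below uses four made‑up observations (not results from the tutorial's data): RMSE is the square root of the mean squared error, in the target's units, and R² is one minus the ratio of residual to total sum of squares.

```python
import numpy as np

# Made-up values purely to illustrate the formulas
y_true = np.array([100.0, 120.0, 140.0, 160.0])
y_hat = np.array([110.0, 115.0, 135.0, 165.0])

residuals = y_true - y_hat
rmse_manual = np.sqrt(np.mean(residuals ** 2))      # typical error size, in units/h

ss_res = np.sum(residuals ** 2)                     # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)      # total variation around the mean
r2_manual = 1.0 - ss_res / ss_tot

print(f"RMSE: {rmse_manual:.2f} units/h")
print(f"R^2 : {r2_manual:.4f}")
```

An R² near 1 means the model explains most of the variation in throughput, while the RMSE tells planners how many units per hour a typical forecast is off by.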
7. Inspect Coefficients
Coefficient inspection on scaled features highlights which squared or cross‑terms most influence throughput, guiding process engineers on levers to adjust.
# Extract feature names from PolynomialFeatures
poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(X.columns)
# Ridge coefficients, one per polynomial term
coefs = gs.best_estimator_.named_steps['ridge'].coef_
# Coefficients are left on the standardised scale (inputs are scaled before
# expansion), so their magnitudes are directly comparable across terms
coef_series = pd.Series(coefs, index=feat_names).sort_values(key=abs, ascending=False)
plt.figure(figsize=(8,6))
coef_series.head(10).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top polynomial features driving throughput")
plt.xlabel("Ridge coefficient (standardised inputs)")
plt.tight_layout()
plt.show()
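Coefficients on standardised polynomial terms can be hard to translate into plant units. A complementary view is a one‑factor sweep: hold the other inputs at their medians, vary one lever across its observed range, and read off the predicted throughput. The sketch below is self‑contained and uses invented synthetic data with a fixed degree and α; with the tutorial's objects, `gs.best_estimator_` and `X` would presumably take the place of `model_demo` and `X_demo`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (invented relationship, for illustration only)
rng = np.random.default_rng(0)
n = 300
X_demo = pd.DataFrame({
    'cycle_time_s': rng.uniform(20, 60, n),
    'workstations': rng.integers(2, 11, n),
    'feed_rate_kg_h': rng.uniform(50, 200, n),
    'energy_kW': rng.uniform(10, 80, n),
})
y_demo = (
    200 - 2.0 * X_demo['cycle_time_s'] + 15 * X_demo['workstations']
    + 0.4 * X_demo['feed_rate_kg_h'] - 0.001 * X_demo['feed_rate_kg_h'] ** 2
    - 0.2 * X_demo['cycle_time_s'] * X_demo['workstations']
    + rng.normal(0, 3, n)
)

model_demo = Pipeline([
    ('scale', StandardScaler()),
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),
    ('ridge', Ridge(alpha=1.0)),
]).fit(X_demo, y_demo)

# Hold every input at its median, then sweep feed rate over its observed range
baseline = X_demo.median()
feed_grid = np.linspace(X_demo['feed_rate_kg_h'].min(),
                        X_demo['feed_rate_kg_h'].max(), 5)
sweep = pd.DataFrame({c: np.full(len(feed_grid), baseline[c]) for c in X_demo.columns})
sweep['feed_rate_kg_h'] = feed_grid

# Predict before adding the result column so the inputs match the training columns
predictions = model_demo.predict(sweep)
sweep['predicted_units_h'] = predictions
print(sweep.round(1))
```

The sweep only describes the fitted model's behaviour within the range of the training data; extrapolating a polynomial beyond that range is unreliable.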
Summary
By combining polynomial feature expansion with Ridge regularisation in a clean Pipeline, we achieve:
- Nonlinear throughput forecasts for production planning, with accuracy checked on a held‑out test set (RMSE, R²).
- Controlled model complexity via α tuning, preventing overfitting to spurious interactions.
- Interpretability: coefficient magnitudes for key interactions (e.g., feed_rate² or cycle_time × workstations) reveal actionable process insights for capacity optimisation.