Industrial Production Cost Prediction using ElasticNet Algorithm in ML


Manufacturing planners must quote accurate, part‑level production costs long before the first chip is cut. Total cost depends on a mix of continuous factors—batch size, machining time, material weight—and categorical choices such as material grade or machine group. These predictors are often highly collinear (e.g., batch size ↔ setup labour).

Under multicollinearity, ordinary least squares produces unstable coefficient estimates with inflated variance, while pure Lasso (ℓ1) tends to keep one predictor from a correlated group more or less arbitrarily and drop the rest, even when they are genuinely helpful. Elastic Net combines Ridge's stability (ℓ2) with Lasso's sparsity to yield a robust, interpretable model that forecasts manufacturing cost (USD) for a new job, helping estimators bid competitively without bleeding profit.
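The effect described above can be seen on a tiny synthetic example. This is only a sketch under invented assumptions: the two predictors, the true coefficients (3 and 3) and the penalty settings are made up for illustration and do not come from the dataset used below.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression

rng = np.random.default_rng(0)
n = 200

# Two nearly identical predictors (hypothetical stand-ins for batch size and setup labour)
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
X = np.column_stack([x1, x2])

# The invented cost signal uses both predictors equally, plus noise
y = 3.0 * x1 + 3.0 * x2 + rng.normal(size=n)

ols = LinearRegression().fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10_000).fit(X, y)

# OLS may split the shared effect erratically between the twins;
# the ℓ2 part of Elastic Net pulls their coefficients towards each other.
print("OLS coefficients:        ", ols.coef_)
print("Elastic Net coefficients:", enet.coef_)
```

On data like this the Elastic Net coefficients should stay close to each other and slightly below 3, since the penalty shrinks them, whereas the OLS pair can land far from (3, 3) while still summing to roughly 6.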

Libraries Required

Purpose	Python Library
Data wrangling	pandas, numpy
Visualisation	matplotlib, seaborn
ML pipeline	scikit‑learn → ColumnTransformer, OneHotEncoder, StandardScaler, ElasticNet, GridSearchCV, Pipeline, train_test_split
Metrics	mean_squared_error, r2_score

Dataset Link

Manufacturing cost

Step-by-Step Code Implementation

Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score

Download and load the dataset

The Kaggle Manufacturing Cost file lists simulated jobs with material, labour, overhead, machine group, and batch size, making it ideal for a cost‑prediction tutorial.

# One‑time shell (requires Kaggle API token in ~/.kaggle/kaggle.json)
# kaggle datasets download -d vinicius150987/manufacturing-cost -p data --unzip

df = pd.read_csv("data/manufacturing_cost_dataset.csv")  # adjust filename if needed

Dataset snapshot (columns): ['Units', 'Material_Type', 'Machine_Group', 'Setup_Hours', 'Run_Time_Hours', 'Labour_Rate_USD', 'Material_Cost_USD', 'Overhead_USD', 'Total_Cost_USD']

Initial inspection

print(df.head())
sns.histplot(df['Total_Cost_USD'], kde=True)
plt.title('Distribution of Manufacturing Cost'); plt.show()
print(df.isna().mean())       # check missing values

Define target & features

y = df['Total_Cost_USD']

X = df.drop(columns=['Total_Cost_USD'])   # predictors only

cat_cols = ['Material_Type', 'Machine_Group']
num_cols = [c for c in X.columns if c not in cat_cols]

Pre‑processing & Elastic Net pipeline

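All transformations live inside a single Pipeline object. During cross-validation the encoder and scaler are therefore fitted on the training folds only, which keeps validation data from leaking into the preprocessing statistics. The same fitted preprocessing is re-applied automatically during .predict().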
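A minimal sketch of that behaviour, on a toy frame whose values are invented (only the column names mirror the dataset snapshot). The scaler's stored mean reflects the training rows alone, and predict() accepts raw, unseen rows.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy jobs; the numbers are made up for illustration
toy = pd.DataFrame({
    "Material_Type": ["Carbon", "Alloy", "Carbon", "Alloy", "Carbon", "Alloy"],
    "Run_Time_Hours": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "Total_Cost_USD": [150.0, 520.0, 260.0, 640.0, 370.0, 760.0],
})
train, new_jobs = toy.iloc[:4], toy.iloc[4:]

toy_pipe = Pipeline([
    ("prep", ColumnTransformer([
        ("cat", OneHotEncoder(drop="first"), ["Material_Type"]),
        ("num", StandardScaler(), ["Run_Time_Hours"]),
    ])),
    ("model", ElasticNet(alpha=0.01, max_iter=20_000)),
])

# fit() learns encoder categories and scaler statistics from the training rows only
toy_pipe.fit(train.drop(columns="Total_Cost_USD"), train["Total_Cost_USD"])

toy_scaler = toy_pipe.named_steps["prep"].named_transformers_["num"]
print("Scaler mean learned from training rows:", toy_scaler.mean_)

# predict() re-applies the stored statistics to raw, unseen rows
toy_preds = toy_pipe.predict(new_jobs.drop(columns="Total_Cost_USD"))
print(toy_preds)
```

The stored mean is 2.5, the average of the four training run times, not 3.5, the average over all six rows.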

preprocess = ColumnTransformer([
    ('cat', OneHotEncoder(drop='first', sparse_output=False), cat_cols),  # scikit-learn >= 1.2; older releases use sparse=False
    ('num', StandardScaler(), num_cols)
])

pipe = Pipeline([
    ('prep', preprocess),
    ('model', ElasticNet(max_iter=20000, random_state=42))
])

Train/test split + hyper‑parameter search

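α (alpha) sets the overall penalty strength;
l1_ratio (0 → pure Ridge penalty, 1 → pure Lasso penalty) balances stability against sparsity.
The grid below holds 18 α values × 9 ratios = 162 candidates. With 5-fold cross-validation that is 810 fits, after which the candidate with the lowest mean validation RMSE is refit on the full training set.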
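For reference, the two hyper-parameters enter the objective that scikit-learn documents for ElasticNet as follows. Here n is the number of training samples, w the coefficient vector, and ρ is shorthand (introduced here) for l1_ratio.

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
  \;+\; \alpha\,\rho\,\lVert w \rVert_1
  \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```

Setting ρ = 1 leaves only the ℓ1 term (Lasso), and ρ = 0 leaves only the ℓ2 term (a Ridge-style penalty). Raising α strengthens both terms at once.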

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

param_grid = {
    'model__alpha'   : np.logspace(-3, 1, 18),    # 0.001 → 10
    'model__l1_ratio': np.linspace(0.1, 0.9, 9)   # Ridge‑heavy → Lasso‑heavy
}

gs = GridSearchCV(pipe, param_grid,
                  cv=5,
                  scoring='neg_root_mean_squared_error',
                  n_jobs=-1, verbose=1)
gs.fit(X_train, y_train)

print("Best α:",       gs.best_params_['model__alpha'])
print("Best l1_ratio:", gs.best_params_['model__l1_ratio'])

Evaluate on the hold‑out set

y_pred = gs.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # avoids the deprecated squared=False argument
r2   = r2_score(y_test, y_pred)

print(f"Hold‑out RMSE: ${rmse:,.0f} | R²: {r2:.3f}")
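An RMSE figure is easier to judge against a naive reference. The sketch below is an optional sanity check that the original workflow does not include: it compares Elastic Net with a predict-the-mean baseline (scikit-learn's DummyRegressor) on synthetic data. The data, coefficients and variable names are invented for illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Invented data: three standardised drivers, one of them irrelevant
X_demo = rng.normal(size=(300, 3))
y_demo = 1000 + X_demo @ np.array([200.0, 50.0, 0.0]) + rng.normal(scale=20, size=300)

Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)


def rmse_of(y_true, y_hat):
    """Root mean squared error, in the units of the target."""
    return float(np.sqrt(mean_squared_error(y_true, y_hat)))


baseline = DummyRegressor(strategy="mean").fit(Xtr, ytr)   # always predicts the training mean
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(Xtr, ytr)

baseline_rmse = rmse_of(yte, baseline.predict(Xte))
model_rmse = rmse_of(yte, model.predict(Xte))
print(f"Baseline RMSE: {baseline_rmse:,.0f} | Elastic Net RMSE: {model_rmse:,.0f}")
```

In the real notebook the same comparison would use X_train, X_test, y_train and y_test from the split above. A tuned model that barely beats the mean baseline is a sign that the features carry little signal.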

Interpret coefficients

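The coefficient plot shows each driver's marginal effect in dollars, with the other predictors held fixed. As an illustration, a run-time coefficient of 58 would mean that each extra run-time hour adds about $58, and an Alloy Steel coefficient of 320 would mean that this grade costs about $320 more than the baseline grade, i.e. the category removed by drop='first'. Coefficients shrunk to exactly zero mark features that, given the others, contribute negligible incremental cost.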

# Recover feature names
ohe = gs.best_estimator_.named_steps['prep'].named_transformers_['cat']
ohe_names = ohe.get_feature_names_out(cat_cols)
feature_names = np.hstack([ohe_names, num_cols])

# Un‑scale numeric coeffs
scales = gs.best_estimator_.named_steps['prep'].named_transformers_['num'].scale_
coef   = gs.best_estimator_.named_steps['model'].coef_.copy()  # copy, so the fitted model is not mutated below
coef[-len(num_cols):] = coef[-len(num_cols):] / scales

imp = (pd.Series(coef, index=feature_names)
         .sort_values(key=abs, ascending=False))

plt.figure(figsize=(9,5))
imp.head(15).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title('Top Drivers of Manufacturing Cost (Elastic Net)')
plt.xlabel('Δ Cost (USD)'); plt.tight_layout(); plt.show()
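Dividing a coefficient by the scaler's scale_ is what converts "USD per standard deviation" into "USD per raw unit". The sketch below checks this on invented data. It uses plain LinearRegression rather than Elastic Net, because without a penalty the un-scaled coefficient must match a fit on the raw feature exactly. The 58 USD/hour slope simply reuses the illustrative figure from the text.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Invented data: cost grows by 58 USD per run-time hour, plus noise
hours = rng.uniform(1, 40, size=100)
cost = 500 + 58 * hours + rng.normal(scale=5, size=100)
X_raw = hours.reshape(-1, 1)

demo_scaler = StandardScaler().fit(X_raw)

# Coefficient learned on the standardised feature: USD per standard deviation of hours
coef_scaled = LinearRegression().fit(demo_scaler.transform(X_raw), cost).coef_

# Coefficient learned on the raw feature: USD per hour
coef_raw = LinearRegression().fit(X_raw, cost).coef_

# Un-scaling: divide by the standard deviation stored in scale_
coef_unscaled = coef_scaled / demo_scaler.scale_
print(coef_unscaled, coef_raw)
```

With Elastic Net the penalty is applied in the standardised space, so the un-scaled values will not equal a raw-feature fit, but the unit conversion is the same.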

Summary

By coupling Elastic Net regression with a tidy Pipeline, we created a transparent estimator that accepts a little bias in exchange for lower variance and that:

Predicts per-job manufacturing cost, with the hold-out RMSE and R² quantifying the error band.
Handles multicollinearity among production metrics while shrinking uninformative coefficients to zero.
Explains itself via dollar‑impact coefficients—providing engineers with instant levers for cost reduction.

Updating the model is straightforward: load next quarter's ERP extract into the notebook, rebuild X and y, and call gs.fit(X_train, y_train) again. Cost estimation just moved from back-of-envelope to reproducible data science.
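One way to package that refresh step is sketched below. The helper name refit_on_extract, the numeric-only extract and the small grid are all hypothetical, chosen to keep the example self-contained. In the notebook the first argument would be the gs object defined earlier.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV


def refit_on_extract(search, extract, target="Total_Cost_USD"):
    """Hypothetical helper: fit an unfitted copy of `search` on a new extract."""
    fresh = clone(search)  # same estimator and grid, no fitted state carried over
    fresh.fit(extract.drop(columns=[target]), extract[target])
    return fresh


# Invented numeric-only extract standing in for next quarter's ERP data
rng = np.random.default_rng(7)
extract = pd.DataFrame({
    "Units": rng.integers(10, 500, size=60),
    "Run_Time_Hours": rng.uniform(1, 40, size=60),
})
extract["Total_Cost_USD"] = (
    200 + 2.0 * extract["Units"] + 58.0 * extract["Run_Time_Hours"]
    + rng.normal(scale=10, size=60)
)

search = GridSearchCV(ElasticNet(max_iter=20_000), {"alpha": [0.01, 0.1, 1.0]},
                      cv=3, scoring="neg_root_mean_squared_error")
refreshed = refit_on_extract(search, extract)
print("Best alpha on the new extract:", refreshed.best_params_["alpha"])
```

Cloning keeps the previous quarter's fitted search untouched, so its predictions can still be compared with the refreshed model before switching over.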
