Digital Marketing ROI Prediction using ElasticNet Algorithm in ML
Growth teams live or die by ROI (Return on Investment): every media dollar must return with interest. Yet most marketers learn the true ROI only after a campaign ends, when the budget has already been spent. Using historical Facebook ad logs, we will build a mixed-penalty Elastic Net model that:
- Forecasts the ROI of an ad set before launch, using metrics known at planning time (audience, creative, impressions, clicks, spend, conversions).
- Balances Ridge's stability and Lasso's sparsity, so that correlated features (e.g., clicks and CTR) keep stable coefficients while weak features are shrunk to zero. The result is an interpretable driver chart that media buyers can act on.
Libraries Required
| Role | Package |
| --- | --- |
| Data wrangling | pandas, numpy |
| Visuals | matplotlib, seaborn |
| ML pipeline | scikit-learn: ColumnTransformer, OneHotEncoder, StandardScaler, ElasticNet, GridSearchCV, Pipeline, train_test_split |
| Metrics | mean_squared_error, r2_score |
Dataset Link
Online Advertising Digital Marketing Data
Step-by-Step Code Implementation
Why Elastic Net? Impressions, Clicks, and Spent are correlated; the Ridge (L2) part of the penalty keeps their coefficients stable, and the Lasso (L1) part prunes noisy audience dummies by shrinking their coefficients to exactly zero.
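For reference, the objective minimized by scikit-learn's ElasticNet can be written as below. The symbols are notation chosen here rather than taken from the article: w is the coefficient vector, n the number of training rows, α the overall penalty strength (the `alpha` parameter), and ρ the L1 share of the penalty (the `l1_ratio` parameter).

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
\;+\; \alpha\,\rho\,\lVert w \rVert_1
\;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```

With ρ = 1 the penalty is pure Lasso, and with ρ = 0 it is a pure Ridge-style L2 penalty; values in between trade sparsity against stability for correlated features.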
Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score
Download & load dataset
Dataset: each row represents a Facebook ad, including spend, impressions, clicks, demographic targeting, and conversions.
# one‑time shell (requires Kaggle API key):
# kaggle datasets download -d naniruddhan/online-advertising-digital-marketing-data -p data --unzip
ads = pd.read_csv("data/online_ads.csv") # adjust filename if necessary
Target engineering – ROI
Target (ROI): using a fixed conversion value turns counts + spend into a dimensionless profitability ratio that finance teams understand.
Assumption: every approved conversion brings $100 in revenue.
VALUE_PER_CONV = 100  # business-specific; tune as needed
# Keep converting ads only, and require positive spend because ROI divides by Spent
ads = ads[(ads['Approved_Conversion'] > 0) & (ads['Spent'] > 0)].copy()
ads['Revenue'] = ads['Approved_Conversion'] * VALUE_PER_CONV
ads['ROI'] = (ads['Revenue'] - ads['Spent']) / ads['Spent']
y = ads['ROI']
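As a quick sanity check of the formula, here is a minimal worked example for a single ad. The helper `roi` and the input numbers are hypothetical and only illustrate the arithmetic; the $100 conversion value is the same assumption stated above.

```python
VALUE_PER_CONV = 100  # assumed revenue per approved conversion, as in the tutorial


def roi(approved_conversions: int, spent: float,
        value_per_conv: float = VALUE_PER_CONV) -> float:
    """Return (revenue - spend) / spend for a single ad."""
    if spent <= 0:
        raise ValueError("spent must be positive to compute ROI")
    revenue = approved_conversions * value_per_conv
    return (revenue - spent) / spent


# 3 approved conversions on $120 of spend: (300 - 120) / 120
example = roi(approved_conversions=3, spent=120.0)
print(example)  # 1.5
```

An ROI of 1.5 means each dollar spent came back with $1.50 of profit on top; an ROI of 0 is break-even, and negative values are losses.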
Feature matrix
X = ads.drop(columns=['ROI', 'Revenue', 'Approved_Conversion', 'ad_id'])
cat_cols = X.select_dtypes(include='object').columns
num_cols = X.select_dtypes(exclude='object').columns
Elastic Net pipeline
Pipeline: encoding, scaling, and modelling are wrapped in a single Pipeline, so each cross-validation fold fits the encoder and scaler on its own training portion only. That prevents leakage from validation rows and leaves one object to deploy.
preprocess = ColumnTransformer([
    # sparse_output replaces the old `sparse` argument (scikit-learn >= 1.2)
    ('cat', OneHotEncoder(drop='first', sparse_output=False), cat_cols),
    ('num', StandardScaler(), num_cols)
])
pipe = Pipeline([
    ('prep', preprocess),
    ('enet', ElasticNet(max_iter=15_000, random_state=42))
])
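To illustrate why the preprocessing belongs inside the Pipeline, here is a small sketch on synthetic data. The toy columns `gender` and `spend`, the coefficients used to generate the target, and the fixed `alpha`/`l1_ratio` values are all made up for the example. Because the whole pipeline is passed to `cross_val_score`, the encoder and scaler are refit on the training part of every fold.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in for the ad table (hypothetical columns)
toy = pd.DataFrame({
    "gender": rng.choice(["F", "M"], size=n),
    "spend": rng.gamma(2.0, 50.0, size=n),
})
toy_y = 0.01 * toy["spend"] + (toy["gender"] == "F") * 0.5 + rng.normal(0, 0.1, size=n)

toy_prep = ColumnTransformer([
    ("cat", OneHotEncoder(drop="first"), ["gender"]),
    ("num", StandardScaler(), ["spend"]),
])
toy_pipe = Pipeline([
    ("prep", toy_prep),
    ("enet", ElasticNet(alpha=0.01, l1_ratio=0.5)),
])

# Each of the 5 folds fits encoder + scaler + model on its training rows only
scores = cross_val_score(toy_pipe, toy, toy_y, cv=5,
                         scoring="neg_root_mean_squared_error")
cv_rmse = -scores.mean()
print(f"5-fold CV RMSE on toy data: {cv_rmse:.3f}")
```

Scaling the full table before splitting would let validation rows influence the scaler's mean and variance, which is the leakage the Pipeline avoids.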
Train/test split + grid search
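Hyperparameter search: a grid of 162 candidates (18 α values × 9 l1_ratio mixes) is scored with 5-fold CV, and the combination with the lowest cross-validated RMSE is kept. Larger α and higher l1_ratio values produce sparser models.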
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=ads['campaign_id'])
param_grid = {
    'enet__alpha'   : np.logspace(-3, 1, 18),   # overall shrinkage
    'enet__l1_ratio': np.linspace(0.1, 0.9, 9)  # 0.1≈Ridge-heavy … 0.9≈Lasso-heavy
}
gs = GridSearchCV(pipe, param_grid,
                  cv=5,
                  scoring='neg_root_mean_squared_error',
                  n_jobs=-1, verbose=1)
gs.fit(X_train, y_train)
print("Optimal α :", gs.best_params_['enet__alpha'])
print("Optimal l1_ratio :", gs.best_params_['enet__l1_ratio'])
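The size of the search can be checked without fitting anything. The sketch below rebuilds the same grid and counts the candidates with scikit-learn's `ParameterGrid`; the variable names are chosen for this example.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

grid = {
    "enet__alpha": np.logspace(-3, 1, 18),      # 18 values from 0.001 to 10
    "enet__l1_ratio": np.linspace(0.1, 0.9, 9), # 9 mixes from 0.1 to 0.9
}

n_candidates = len(ParameterGrid(grid))
n_fits = n_candidates * 5  # 5-fold CV; the final refit on all training rows adds one more
print(n_candidates, n_fits)  # 162 810
```

So the search trains 810 models during cross-validation, which is worth knowing before widening the grid.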
Evaluate model
y_pred = gs.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # `squared=False` was removed in recent scikit-learn
r2 = r2_score(y_test, y_pred)
print(f"Test RMSE: {rmse:.3f} ROI points | R²: {r2:.3f}")
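An RMSE is easier to judge next to a naive baseline. The sketch below, on synthetic data with made-up coefficients, compares an Elastic Net against a `DummyRegressor` that always predicts the training mean. This comparison is a suggested addition, not part of the original workflow; with the tutorial's objects the same idea would apply to `X_train`, `X_test`, and `gs`.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic regression problem: two informative features, two pure-noise features
X_toy = rng.normal(size=(300, 4))
y_toy = 2.0 * X_toy[:, 0] - 1.0 * X_toy[:, 1] + rng.normal(scale=0.5, size=300)

Xtr, Xte, ytr, yte = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)

baseline = DummyRegressor(strategy="mean").fit(Xtr, ytr)  # always predicts mean(ytr)
model = ElasticNet(alpha=0.01, l1_ratio=0.5).fit(Xtr, ytr)

rmse_baseline = np.sqrt(mean_squared_error(yte, baseline.predict(Xte)))
rmse_model = np.sqrt(mean_squared_error(yte, model.predict(Xte)))
print(f"baseline RMSE: {rmse_baseline:.3f} | Elastic Net RMSE: {rmse_model:.3f}")
```

If the model's RMSE is not clearly below the baseline's, the features carry little predictive signal for ROI.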
Interpret top coefficients
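Interpretation: the bar chart ranks the largest coefficients. In the run described here, the female 25-34 segment in interest cluster three boosts ROI by 0.12, while overspending on impressions or running weekend ads drags ROI down, which gives media buyers actionable guidance. The exact values depend on the data and on the chosen VALUE_PER_CONV.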
# Recover full feature names
ohe = gs.best_estimator_.named_steps['prep'].named_transformers_['cat']
ohe_names = ohe.get_feature_names_out(cat_cols)
feat_names = np.hstack([ohe_names, num_cols])
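# Convert numeric coefficients from "per standard deviation" back to "per raw unit"
scales = gs.best_estimator_.named_steps['prep'].named_transformers_['num'].scale_
# Copy first: editing coef_ in place would silently change the fitted model
coef = gs.best_estimator_.named_steps['enet'].coef_.copy()
coef[-len(num_cols):] = coef[-len(num_cols):] / scales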
imp = (pd.Series(coef, index=feat_names)
         .sort_values(key=abs, ascending=False)
         .head(15))
plt.figure(figsize=(9,5))
imp.plot(kind='barh')
plt.gca().invert_yaxis()
plt.title('Elastic Net – Top ROI Drivers')
plt.xlabel('Δ ROI')
plt.tight_layout()
plt.show()
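The division by `scale_` in the code above can be verified on a toy case. The sketch below uses a plain `LinearRegression` instead of Elastic Net, so that no shrinkage blurs the result, and a hypothetical `spend` feature whose true effect is fixed at +0.004 per dollar. Dividing the coefficient learned on the standardized feature by the scaler's `scale_` recovers that per-dollar effect.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

spend = rng.uniform(10, 500, size=(200, 1))  # hypothetical raw feature, in dollars
target = 0.004 * spend[:, 0] + 1.0           # noiseless: +0.004 per dollar

scaler = StandardScaler().fit(spend)
lin = LinearRegression().fit(scaler.transform(spend), target)

coef_per_std = lin.coef_[0]                      # effect of one standard deviation of spend
coef_per_unit = coef_per_std / scaler.scale_[0]  # effect of one raw dollar
print(f"per std: {coef_per_std:.4f} | per dollar: {coef_per_unit:.4f}")
```

Keep in mind that per-unit coefficients are not directly comparable across features with different units, so a ranking by absolute value mixes scales.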
Summary
This Elastic Net workflow turns raw digital‑ad logs into a pre‑launch ROI forecaster that is:
- Accurate: low test RMSE and solid R².
- Interpretable: clear coefficient rankings identify high‑return segments and wasteful settings.
- Easy to refresh: drop in next month's campaign CSV, call .fit(), and the pipeline retrains end-to-end in seconds.
With a single notebook, growth teams can stop guessing and spend every ad dollar where it multiplies fastest.