Ad Campaign Revenue Prediction with Mixed Ridge & Lasso Regression in ML
Marketing managers often bet millions across TV, radio, and print, but they seldom know, before launching, how much revenue that spend will actually drive. Linear regression can model the relationship, yet pure Ridge (ℓ₂ penalty) shrinks coefficients without ever setting them to zero, so it keeps every noisy variable, while pure Lasso (ℓ₁ penalty) tends to keep one member of a group of correlated predictors and drop the rest somewhat arbitrarily. Elastic Net blends both penalties, retaining Lasso's feature-selection power and Ridge's stability. In this project, we will:
- Predict campaign revenue (product sales in $000s) from planned media spend.
- Balance bias and variance with Elastic Net’s dual penalty, yielding a sparse yet stable model that is easy to interpret for budget reallocations.
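The contrast between the two penalties is easy to see on a toy problem. The sketch below is illustrative only: it uses synthetic data (not the advertising dataset) in which `x2` is a near-copy of `x1` and `x3` is pure noise. The seed, penalty strengths, and data-generating coefficients are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# Synthetic data (hypothetical): two highly correlated "channels" plus one noise column.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly a copy of x1
x3 = rng.normal(size=n)              # unrelated to y
X = np.column_stack([x1, x2, x3])
y = 3 * x1 + rng.normal(size=n)

models = {
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=1.0),
    "elastic_net": ElasticNet(alpha=1.0, l1_ratio=0.5),
}
# fit() returns the estimator itself, so coef_ can be read straight off it.
coefs = {name: m.fit(X, y).coef_ for name, m in models.items()}
for name, c in coefs.items():
    print(f"{name:>12}: {np.round(c, 3)}")
```

On data like this, Lasso typically concentrates the shared signal on one of the correlated columns and sets the noise column exactly to zero, Ridge keeps all three coefficients non-zero, and Elastic Net tends to spread weight across the correlated pair while still shrinking hard; the exact numbers depend on the seed and on `alpha`.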
Libraries Required
| Purpose | Library |
|---|---|
| Core data wrangling | pandas, numpy |
| Visualisation | matplotlib, seaborn |
| ML workflow | scikit-learn → StandardScaler, ColumnTransformer, ElasticNet, GridSearchCV, Pipeline, train_test_split |
| Evaluation | scikit-learn → mean_squared_error, r2_score |
Dataset: Advertising Sales dataset on Kaggle (yasserh/advertising-sales-dataset)
Step-by-Step Code Implementation
1. Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score
2. Download and load the dataset
The dataset covers 200 campaigns, each pairing TV, Radio, and Newspaper spend (in thousands of dollars) with the resulting Sales revenue.
# one‑time shell command (requires Kaggle API token):
# kaggle datasets download -d yasserh/advertising-sales-dataset -p data --unzip
data = pd.read_csv("data/Advertising.csv") # 200 rows, 4 columns
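Before modelling, it can be worth confirming that the file really has the four expected columns and no gaps. The helper below, `load_advertising`, is hypothetical (not part of the original workflow), and it assumes that some copies of the CSV carry a leading unnamed index column, which pandas labels `Unnamed: 0`; that column is dropped. The demo writes a tiny made-up file so the snippet runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

REQUIRED = ["TV", "Radio", "Newspaper", "Sales"]

def load_advertising(path):
    """Hypothetical helper: read the CSV, drop unnamed index columns, validate."""
    df = pd.read_csv(path)
    # Assumption: an empty header field becomes "Unnamed: 0" and is just a row index.
    df = df.drop(columns=[c for c in df.columns if str(c).startswith("Unnamed")])
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if df[REQUIRED].isna().any().any():
        raise ValueError("dataset contains missing values")
    return df[REQUIRED]

# Demo on a tiny illustrative file (values are made up, not from the dataset).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.csv"
    demo_path.write_text(",TV,Radio,Newspaper,Sales\n1,100.0,20.0,30.0,12.0\n2,50.0,5.0,10.0,7.5\n")
    demo = load_advertising(demo_path)

print(demo.shape)  # (2, 4)
```

With the real file, `load_advertising("data/Advertising.csv")` would be a drop-in replacement for the plain `read_csv` call.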
3. Quick EDA
print(data.head())
sns.pairplot(data)
plt.show()
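A pairplot shows relationships visually; a numeric check of the collinearity that motivates Elastic Net is the predictor correlation matrix plus variance inflation factors (VIFs), where each VIF is the matching diagonal entry of the inverse correlation matrix. The sketch below runs on a synthetic stand-in (hypothetical data with arbitrary coefficients, Newspaper deliberately tied to Radio) so it works without the CSV; with the real data, reuse `data` from step 2.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the advertising table (hypothetical, arbitrary coefficients).
rng = np.random.default_rng(1)
n = 200
tv = rng.uniform(0, 300, n)
radio = rng.uniform(0, 50, n)
newspaper = 0.5 * radio + rng.uniform(0, 60, n)  # deliberately correlated with radio
sales = 3 + 0.05 * tv + 0.2 * radio + rng.normal(0, 1.5, n)
data = pd.DataFrame({"TV": tv, "Radio": radio, "Newspaper": newspaper, "Sales": sales})

features = ["TV", "Radio", "Newspaper"]
corr = data[features].corr()
print(corr.round(2))

# VIF_j = j-th diagonal element of the inverse predictor correlation matrix.
vif = pd.Series(np.diag(np.linalg.inv(corr.to_numpy())), index=features)
print(vif.round(2))
```

A VIF near 1 means a channel is nearly uncorrelated with the others; larger values flag the kind of overlap where Lasso's pick-one behaviour becomes unstable.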
4. Define features & target
X = data[['TV', 'Radio', 'Newspaper']]  # spends in $000s
y = data['Sales']                       # revenue in $000s
5. Build pipeline & hyper‑parameter grid
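A ColumnTransformer applies StandardScaler to the spend columns, so the penalty acts on all coefficients on a common scale instead of on whatever units each spend happens to be recorded in. Wrapping the scaler and the model in a single Pipeline means the scaler is refit on each training fold during cross-validation, which prevents data leakage.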
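The "scale-agnostic" claim can be checked directly: re-express one column in different units and compare predictions. This sketch assumes synthetic data and fixed, untuned `alpha`/`l1_ratio` values; it mirrors the ColumnTransformer setup but is not the tutorial's tuned model.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic spends in $000s (hypothetical data).
rng = np.random.default_rng(2)
X = pd.DataFrame({"TV": rng.uniform(0, 300, 100),
                  "Radio": rng.uniform(0, 50, 100),
                  "Newspaper": rng.uniform(0, 100, 100)})
y = 3 + 0.05 * X["TV"] + 0.2 * X["Radio"] + rng.normal(0, 1.5, 100)

def make_pipe():
    prep = ColumnTransformer([("num", StandardScaler(), list(X.columns))], remainder="drop")
    return Pipeline([("prep", prep), ("model", ElasticNet(alpha=0.5, l1_ratio=0.5))])

# Same information, but TV recorded in dollars instead of $000s.
X_dollars = X.assign(TV=X["TV"] * 1000)

pred_scaled = make_pipe().fit(X, y).predict(X)
pred_scaled_dollars = make_pipe().fit(X_dollars, y).predict(X_dollars)

# Without standardisation the penalty depends on the units chosen.
pred_raw = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y).predict(X)
pred_raw_dollars = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X_dollars, y).predict(X_dollars)

print("with scaler, max |Δ prediction|:", np.abs(pred_scaled - pred_scaled_dollars).max())
print("without scaler, max |Δ prediction|:", np.abs(pred_raw - pred_raw_dollars).max())
```

With the scaler the two fits agree up to floating-point noise; without it, changing TV's units changes how hard its coefficient is penalised, so the predictions typically differ.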
preprocess = ColumnTransformer(
[('num', StandardScaler(), X.columns)],
remainder='drop'
)
pipe = Pipeline([
('prep', preprocess),
('model', ElasticNet(max_iter=10000, random_state=42))
])
param_grid = {
'model__alpha': np.logspace(-3, 1, 20), # overall strength
'model__l1_ratio': np.linspace(0.1, 0.9, 9) # 0.1 ≈ Ridge‑heavy, 0.9 ≈ Lasso‑heavy
}
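Before launching the search, it can help to confirm how many models it will fit. scikit-learn's `ParameterGrid` enumerates the same combinations that `GridSearchCV` expands internally; the sketch below rebuilds the grid above and assumes the five-fold CV used in step 6.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

param_grid = {
    "model__alpha": np.logspace(-3, 1, 20),
    "model__l1_ratio": np.linspace(0.1, 0.9, 9),
}
grid = ParameterGrid(param_grid)
n_folds = 5  # matches cv=5 in the grid search

print("candidates:", len(grid))            # 180
print("CV fits:", len(grid) * n_folds)      # 900, plus one final refit on all training data
print("first candidate:", grid[0])
```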
6. Train/Test split & grid search
- TV, radio, and newspaper budgets can be correlated; pure Lasso tends to pick one and drop the rest, while Ridge keeps all of them. Elastic Net blends both:
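$$\min_{\beta}\ \frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \alpha\Bigl[\frac{1-\rho}{2}\lVert\beta\rVert_2^2 + \rho\lVert\beta\rVert_1\Bigr]$$
where α (alpha) controls the total penalty strength, ρ (l1_ratio) sets the L1/L2 mix, and n is the number of training samples. The 1/(2n) factor on the squared-error term follows scikit-learn's ElasticNet definition, so α here is the same alpha tuned in the grid.
- GridSearchCV explores 20 α values × 9 l1_ratio values = 180 candidate models, scores each with five-fold cross-validation (900 fits), and keeps the one with the lowest mean cross-validated RMSE.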
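To tie the formula to the library, the sketch below writes the objective out by hand, in the form scikit-learn documents for ElasticNet (squared error scaled by 1/(2n), no intercept here because `fit_intercept=False`), and checks that nudging any fitted coefficient never lowers it. The synthetic data, `alpha`, `rho`, and the tight `tol` are assumptions chosen for the check.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def enet_objective(beta, X, y, alpha, rho):
    """Elastic Net objective: ||y - X beta||^2 / (2n) + alpha * [rho*L1 + (1-rho)/2 * L2^2]."""
    n = X.shape[0]
    resid = y - X @ beta
    penalty = rho * np.abs(beta).sum() + 0.5 * (1 - rho) * (beta @ beta)
    return (resid @ resid) / (2 * n) + alpha * penalty

# Synthetic data (hypothetical): one feature has a true coefficient of zero.
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=150)

alpha, rho = 0.3, 0.5
model = ElasticNet(alpha=alpha, l1_ratio=rho, fit_intercept=False,
                   tol=1e-10, max_iter=100_000).fit(X, y)
best = enet_objective(model.coef_, X, y, alpha, rho)

# At the optimum, small moves in any single coefficient should not reduce the objective.
nudged = [enet_objective(model.coef_ + step * np.eye(3)[j], X, y, alpha, rho)
          for j in range(3) for step in (-1e-3, 1e-3)]
print(f"objective at fit: {best:.6f}; smallest nudged value: {min(nudged):.6f}")
```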
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42)
search = GridSearchCV(
pipe, param_grid,
cv=5, scoring='neg_root_mean_squared_error',
n_jobs=-1, verbose=1
)
search.fit(X_train, y_train)
print("Best α:", search.best_params_['model__alpha'])
print("Best l1_ratio:", search.best_params_['model__l1_ratio'])
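Beyond the single best pair, `search.cv_results_` records every candidate's cross-validated score, which shows how flat the error surface is around the optimum. The sketch below assumes synthetic data, a smaller grid, and a plain StandardScaler step instead of the ColumnTransformer, all to keep it fast and self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical).
rng = np.random.default_rng(4)
X = pd.DataFrame({"TV": rng.uniform(0, 300, 160),
                  "Radio": rng.uniform(0, 50, 160),
                  "Newspaper": rng.uniform(0, 100, 160)})
y = 3 + 0.05 * X["TV"] + 0.2 * X["Radio"] + rng.normal(0, 1.5, 160)

pipe = Pipeline([("prep", StandardScaler()), ("model", ElasticNet(max_iter=10000))])
search = GridSearchCV(
    pipe,
    {"model__alpha": np.logspace(-3, 1, 5), "model__l1_ratio": [0.1, 0.5, 0.9]},
    cv=5, scoring="neg_root_mean_squared_error",
).fit(X, y)

results = pd.DataFrame(search.cv_results_)
results["cv_rmse"] = -results["mean_test_score"]  # undo the "neg_" sign convention
cols = ["param_model__alpha", "param_model__l1_ratio", "cv_rmse", "std_test_score"]
top = results.sort_values("cv_rmse")[cols].head(5)
print(top.to_string(index=False))
```

If the top few rows have nearly identical `cv_rmse`, the exact choice of α and l1_ratio matters little, and a simpler (more Ridge- or more Lasso-like) setting among them is a defensible pick.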
7. Evaluate model
y_pred = search.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # `squared=False` is deprecated since scikit-learn 1.4
r2 = r2_score(y_test, y_pred)
print(f"Hold‑out RMSE: {rmse:.2f} | R²: {r2:.3f}")
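An RMSE is easiest to judge against a naive reference. One option, an addition rather than part of the original workflow, is scikit-learn's `DummyRegressor`, which always predicts the training mean. The sketch assumes synthetic data and fixed placeholder values for `alpha` and `l1_ratio` instead of the tuned ones.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical).
rng = np.random.default_rng(5)
X = pd.DataFrame({"TV": rng.uniform(0, 300, 200),
                  "Radio": rng.uniform(0, 50, 200),
                  "Newspaper": rng.uniform(0, 100, 200)})
y = 3 + 0.05 * X["TV"] + 0.2 * X["Radio"] + rng.normal(0, 1.5, 200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

def rmse(y_true, y_hat):
    # Version-independent RMSE.
    return np.sqrt(mean_squared_error(y_true, y_hat))

model = make_pipeline(StandardScaler(), ElasticNet(alpha=0.01, l1_ratio=0.5)).fit(X_train, y_train)
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)

model_rmse = rmse(y_test, model.predict(X_test))
baseline_rmse = rmse(y_test, baseline.predict(X_test))
print(f"Elastic Net RMSE: {model_rmse:.2f} | mean-baseline RMSE: {baseline_rmse:.2f}")
print(f"Elastic Net R²: {r2_score(y_test, model.predict(X_test)):.3f}")
```

A model whose hold-out RMSE is not clearly below the mean baseline is not adding useful signal, whatever its absolute value.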
8. Inspect coefficients
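The bar plot shows how much each channel adds to Sales per additional $1,000 of spend, with the largest effects at the top. A coefficient driven exactly to zero (if any) marks a channel that the penalty judged to add too little predictive power once the other channels are accounted for.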
# Convert standardised coefficients back to original units (Δ Sales per extra $1,000)
scaler = search.best_estimator_.named_steps['prep'].named_transformers_['num']
coefs = search.best_estimator_.named_steps['model'].coef_ / scaler.scale_
imp = pd.Series(coefs, index=X.columns).sort_values(key=abs, ascending=False)
plt.figure(figsize=(6, 4))
imp.plot(kind='barh')
plt.gca().invert_yaxis()  # largest |coefficient| at the top
plt.title('Elastic Net Coefficients')
plt.xlabel('Δ Sales ($000) per extra $1,000 of spend')
plt.show()
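Dividing by `scale_` works because the pipeline predicts b₀ + Σⱼ wⱼ(xⱼ − μⱼ)/σⱼ. Expanding that expression also yields the intercept in original units, and the sketch below checks both against the pipeline's own predictions. It assumes synthetic data, fixed penalty values, and a plain StandardScaler step (named `scale`, a hypothetical name) instead of the ColumnTransformer.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical).
rng = np.random.default_rng(6)
X = pd.DataFrame({"TV": rng.uniform(0, 300, 200),
                  "Radio": rng.uniform(0, 50, 200),
                  "Newspaper": rng.uniform(0, 100, 200)})
y = 3 + 0.05 * X["TV"] + 0.2 * X["Radio"] + rng.normal(0, 1.5, 200)

pipe = Pipeline([("scale", StandardScaler()),
                 ("model", ElasticNet(alpha=0.05, l1_ratio=0.5))]).fit(X, y)
scaler, model = pipe.named_steps["scale"], pipe.named_steps["model"]

# y_hat = b0 + sum_j w_j * (x_j - mean_j) / scale_j, rewritten in original units.
coef_raw = model.coef_ / scaler.scale_
intercept_raw = model.intercept_ - np.sum(model.coef_ * scaler.mean_ / scaler.scale_)

manual = X.to_numpy() @ coef_raw + intercept_raw
print(pd.Series(coef_raw, index=X.columns).round(4))
print("intercept in original units:", round(intercept_raw, 3))
```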
Summary
This notebook shows how an Elastic Net model—combining Ridge and Lasso penalties—can turn a small advertising table into a reliable, interpretable revenue predictor:
- Forecast accuracy: low RMSE and solid R² on unseen data.
- Business insight: clear ranking of channels’ revenue impact, with automatic elimination of redundant features.
- Easy upkeep: a single fit() retrains the whole pipeline when fresh media‑mix data arrive.
Armed with these predictions, marketers can allocate budgets toward the highest‑return channels before spending a cent—maximising revenue while keeping modelling transparent and defensible.
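Turning the model into a budgeting aid can be as simple as scoring a few candidate media plans with `predict`. In the sketch below the data are synthetic, the model uses fixed rather than tuned penalties, and the three plans (each totalling $200k, expressed in $000s) are hypothetical examples.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data and model (hypothetical).
rng = np.random.default_rng(7)
X = pd.DataFrame({"TV": rng.uniform(0, 300, 200),
                  "Radio": rng.uniform(0, 50, 200),
                  "Newspaper": rng.uniform(0, 100, 200)})
y = 3 + 0.05 * X["TV"] + 0.2 * X["Radio"] + rng.normal(0, 1.5, 200)
model = make_pipeline(StandardScaler(), ElasticNet(alpha=0.05, l1_ratio=0.5)).fit(X, y)

# Hypothetical plans with the same $200k total, in $000s.
plans = pd.DataFrame(
    {"TV": [200, 150, 100], "Radio": [0, 40, 50], "Newspaper": [0, 10, 50]},
    index=["TV-only", "TV+Radio", "Spread"],
)
assert (plans.sum(axis=1) == 200).all()
plans["predicted_sales"] = model.predict(plans[["TV", "Radio", "Newspaper"]])
print(plans.sort_values("predicted_sales", ascending=False))
```

Because the model is linear, it has no notion of diminishing returns: pushed to extremes it always favours whichever channel has the largest per-$1,000 coefficient, so plans far outside the historical spend range deserve scepticism.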