Ad Campaign Revenue Prediction with Mixed Ridge & Lasso Regression in ML


Marketing managers often bet millions across TV, radio, and print, but they seldom know before launch how much revenue that spend will actually drive. Linear regression can model the relationship, yet pure Ridge (ℓ₂ penalty) keeps every variable in the model, however noisy, while pure Lasso (ℓ₁ penalty) tends to keep one of several collinear predictors somewhat arbitrarily and zero out the rest. Elastic Net blends both penalties, retaining Lasso’s feature‑selection power and Ridge’s stability. In this project, we will:

  • Predict campaign revenue (product sales in $000s) from planned media spend.
  • Balance bias and variance with Elastic Net’s dual penalty, yielding a sparse yet stable model that is easy to interpret for budget reallocations.
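
To make the contrast between the three penalties concrete, here is a minimal sketch on synthetic data (not the advertising table). The two near-duplicate columns, the unrelated column, and the alpha values are all assumptions chosen for illustration; note that scikit-learn scales Ridge's alpha differently from Lasso's, so the strengths are not directly comparable.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)

# Synthetic stand-ins: two nearly identical "channels" plus one unrelated column
x1 = signal + 0.01 * rng.normal(size=n)
x2 = signal + 0.01 * rng.normal(size=n)
unrelated = rng.normal(size=n)
X_demo = np.column_stack([x1, x2, unrelated])
y_demo = 3.0 * signal + 0.5 * rng.normal(size=n)

# Illustrative penalty strengths, not tuned values
models = {
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.3),
    "elastic_net": ElasticNet(alpha=0.3, l1_ratio=0.5),
}
coefs_demo = {name: est.fit(X_demo, y_demo).coef_ for name, est in models.items()}

# Columns: near-duplicate 1, near-duplicate 2, unrelated
for name, coef in coefs_demo.items():
    print(f"{name:12s}", np.round(coef, 3))
```

In runs like this, Ridge typically leaves a small nonzero weight on every column, whereas Lasso sets the unrelated column to exactly zero and is free to load the shared signal onto either duplicate. Elastic Net usually lands between the two.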

Libraries Required

Purpose | Library
Core data wrangling | pandas, numpy
Visualisation | matplotlib, seaborn
ML workflow | scikit‑learn: StandardScaler, ColumnTransformer, ElasticNet, GridSearchCV, Pipeline, train_test_split
Evaluation | scikit‑learn: mean_squared_error, r2_score

Dataset Link

Advertising Sales Dataset

Step-by-Step Code Implementation

1. Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score

2. Download and load the dataset

The dataset holds 200 campaigns, each pairing TV, Radio, and Newspaper spend (in thousands of dollars) with the resulting Sales revenue.

# one‑time shell command (requires Kaggle API token):
# kaggle datasets download -d yasserh/advertising-sales-dataset -p data --unzip

data = pd.read_csv("data/Advertising.csv")      # 200 rows, 4 columns

3. Quick EDA

print(data.head())
sns.pairplot(data); plt.show()

4. Define features & target

X = data[['TV', 'Radio', 'Newspaper']]      # spends in $000s
y = data['Sales']                           # revenue in $000s
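
Copies of this dataset do not always use the same headers, so the column selection above can raise a KeyError. The sketch below is one possible guard, not part of the original workflow: `normalise_columns` is a hypothetical helper, and the long header names and numbers in the toy frame are made up for illustration.

```python
import pandas as pd

EXPECTED = ["TV", "Radio", "Newspaper", "Sales"]

def normalise_columns(df):
    """Hypothetical helper: map headers that start with a known channel
    name onto the short names TV / Radio / Newspaper / Sales."""
    rename = {}
    for col in df.columns:
        key = str(col).strip().lower()
        for short in EXPECTED:
            if key.startswith(short.lower()):
                rename[col] = short
    out = df.rename(columns=rename)
    missing = [c for c in EXPECTED if c not in out.columns]
    if missing:
        raise KeyError(f"Could not find columns for: {missing}")
    return out[EXPECTED]  # drops index-like extras such as 'Unnamed: 0'

# Toy frame with made-up headers and values, standing in for pd.read_csv(...)
raw = pd.DataFrame({
    "Unnamed: 0": [1, 2],
    "TV Ad Budget ($)": [100.0, 50.0],
    "Radio Ad Budget ($)": [20.0, 30.0],
    "Newspaper Ad Budget ($)": [10.0, 5.0],
    "Sales ($)": [15.0, 11.0],
})
tidy = normalise_columns(raw)
print(tidy.columns.tolist())
```

If the CSV already uses the short names, the helper returns the same four columns unchanged.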

5. Build pipeline & hyper‑parameter grid

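The StandardScaler inside the ColumnTransformer standardises each spend column so that the penalty treats all channels on the same scale; wrapping scaler + model inside a Pipeline means the scaler is re‑fitted on the training folds only, which prevents data leakage in cross‑validation.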

preprocess = ColumnTransformer(
    [('num', StandardScaler(), X.columns)],
    remainder='drop'
)

pipe = Pipeline([
    ('prep', preprocess),
    ('model', ElasticNet(max_iter=10000, random_state=42))
])
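
The leakage point can be checked directly. The following sketch uses synthetic data and a plain StandardScaler step (rather than the ColumnTransformer above) and imitates what a cross-validation loop does for one fold: clone the pipeline and fit it on the training rows only. All names ending in `_cv` are illustrative assumptions.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X_cv = rng.uniform(0, 300, size=(40, 3))                      # synthetic spends
y_cv = 0.05 * X_cv[:, 0] + 0.2 * X_cv[:, 1] + rng.normal(size=40)

pipe_cv = Pipeline([
    ("scale", StandardScaler()),
    ("model", ElasticNet(alpha=0.1)),
])

# Take the first of five folds and fit a fresh copy on its training rows
train_idx, val_idx = next(KFold(n_splits=5, shuffle=True, random_state=0).split(X_cv))
fold_pipe = clone(pipe_cv).fit(X_cv[train_idx], y_cv[train_idx])

fold_mean = fold_pipe.named_steps["scale"].mean_
uses_train_only = np.allclose(fold_mean, X_cv[train_idx].mean(axis=0))
differs_from_full = not np.allclose(fold_mean, X_cv.mean(axis=0))
print(uses_train_only, differs_from_full)
```

The scaler's stored means match the training fold rather than the full table, so the validation rows never influence the standardisation.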

param_grid = {
    'model__alpha':  np.logspace(-3, 1, 20),   # overall strength
    'model__l1_ratio': np.linspace(0.1, 0.9, 9)  # 0.1 ≈ Ridge‑heavy, 0.9 ≈ Lasso‑heavy
}
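
As a quick check on the size of this grid, scikit-learn's ParameterGrid can enumerate the combinations without fitting anything. The sketch rebuilds the same grid under a separate name (`param_grid_demo`) and assumes five-fold cross-validation when counting fits.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

param_grid_demo = {
    "model__alpha": np.logspace(-3, 1, 20),      # 0.001 ... 10
    "model__l1_ratio": np.linspace(0.1, 0.9, 9),
}

n_candidates = len(ParameterGrid(param_grid_demo))
n_fits = n_candidates * 5   # assumes cv=5
print(n_candidates, n_fits)  # 180 900
```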

6. Train/Test split & grid search

  • TV, radio, and newspaper budgets can be correlated; pure Lasso tends to pick one and drop the rest, while Ridge keeps all of them. Elastic Net blends both. In scikit‑learn's parameterisation it minimises:
    $$\min_{\beta}\; \frac{1}{2n}\|y - X\beta\|_{2}^{2} + \alpha\bigl[(1 - \rho)\|\beta\|_{2}^{2}/2 + \rho\|\beta\|_{1}\bigr]$$
    where n is the number of training rows, α controls total penalty strength, and ρ (l1_ratio) sets the L1/L2 mix.
  • GridSearchCV explores 20 α values × 9 l1‑ratios = 180 candidate models (900 fits with five folds), selecting the one with the smallest mean five‑fold RMSE.

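
The penalty can be written out in a few lines of NumPy. This sketch follows the objective that scikit-learn documents for ElasticNet, which puts a 1/(2n) factor on the squared error, and checks on synthetic data that random perturbations of the fitted coefficients never score better. The helper `enet_objective` and the toy data are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def enet_objective(X, y, beta, intercept, alpha, l1_ratio):
    """Elastic Net loss in scikit-learn's parameterisation."""
    n = len(y)
    resid = y - (X @ beta + intercept)
    fit_term = resid @ resid / (2 * n)
    l1 = np.abs(beta).sum()
    l2 = 0.5 * (beta @ beta)
    return fit_term + alpha * (l1_ratio * l1 + (1 - l1_ratio) * l2)

rng = np.random.default_rng(1)
X_toy = rng.normal(size=(100, 3))
y_toy = X_toy @ np.array([2.0, 0.0, -1.0]) + 0.1 * rng.normal(size=100)

alpha, rho = 0.1, 0.5
# Tight tolerance so the solver stops very close to the true minimum
fitted = ElasticNet(alpha=alpha, l1_ratio=rho, tol=1e-10, max_iter=100_000)
fitted.fit(X_toy, y_toy)
best = enet_objective(X_toy, y_toy, fitted.coef_, fitted.intercept_, alpha, rho)

# Nudge the coefficients at random; none of the trials should beat the fit
trial_values = [
    enet_objective(X_toy, y_toy, fitted.coef_ + 0.05 * rng.normal(size=3),
                   fitted.intercept_, alpha, rho)
    for _ in range(100)
]
all_not_better = all(v >= best - 1e-8 for v in trial_values)
print(round(best, 4), all_not_better)
```

Raising alpha increases both penalty terms together, while moving l1_ratio towards 1 shifts weight from the squared (Ridge) term to the absolute-value (Lasso) term.
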
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

search = GridSearchCV(
    pipe, param_grid,
    cv=5, scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
search.fit(X_train, y_train)

print("Best α:", search.best_params_['model__alpha'])
print("Best l1_ratio:", search.best_params_['model__l1_ratio'])

7. Evaluate model

y_pred = search.predict(X_test)
# squared=False is deprecated in recent scikit-learn releases; taking the root works on every version
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2   = r2_score(y_test, y_pred)

print(f"Hold‑out RMSE: {rmse:.2f} | R²: {r2:.3f}")
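
For readers who want to see what the two metrics measure, here is a small worked example on made-up numbers (four toy sales figures, not results from the advertising data). It computes RMSE and R² by hand and compares them with scikit-learn's functions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up values purely for illustration
y_true = np.array([10.0, 12.0, 14.0, 16.0])
y_hat = np.array([11.0, 12.0, 13.0, 18.0])

errors = y_true - y_hat                          # [-1, 0, 1, -2]
rmse_manual = np.sqrt(np.mean(errors ** 2))      # sqrt(6 / 4)

ss_res = np.sum(errors ** 2)                     # 6
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # 20
r2_manual = 1 - ss_res / ss_tot                  # 0.7

print(f"RMSE: {rmse_manual:.3f} | R²: {r2_manual:.3f}")  # RMSE: 1.225 | R²: 0.700

# Same numbers from scikit-learn
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_hat))
r2_sklearn = r2_score(y_true, y_hat)
```

RMSE is in the same units as the target ($000s here), so it reads as a typical prediction error, while R² is the share of variance in sales that the model explains.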

8. Inspect coefficients

The bar plot shows which channel the model associates with the largest change in sales per additional $1,000 of spend. Coefficients driven exactly to zero (if any) indicate channels that add no predictive power once the others are in the model.

# Coefficients after inverse‑scaling
scaler = search.best_estimator_.named_steps['prep'].named_transformers_['num']
coefs  = search.best_estimator_.named_steps['model'].coef_ / scaler.scale_

imp = pd.Series(coefs, index=X.columns).sort_values(key=abs, ascending=False)
plt.figure(figsize=(6,4))
imp.plot(kind='barh'); plt.gca().invert_yaxis()
plt.title('Elastic Net Coefficients'); plt.xlabel('Δ Sales ($000)'); plt.show()
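
Dividing by scaler.scale_ converts the slopes back to raw units, but the intercept also shifts because the scaler subtracts each column's mean. The sketch below, on synthetic data with a plain StandardScaler step (an assumption made to keep it short), recovers both and confirms that the raw-scale formula reproduces the pipeline's predictions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X_syn = rng.uniform(0, 300, size=(50, 3))        # synthetic spends in $000s
y_syn = 3.0 + 0.05 * X_syn[:, 0] + 0.2 * X_syn[:, 1] + rng.normal(size=50)

pipe_demo = Pipeline([
    ("scale", StandardScaler()),
    ("model", ElasticNet(alpha=0.1, l1_ratio=0.5)),
]).fit(X_syn, y_syn)

scaler_demo = pipe_demo.named_steps["scale"]
model_demo = pipe_demo.named_steps["model"]

# Slopes per raw unit of spend, and the matching intercept
raw_coefs = model_demo.coef_ / scaler_demo.scale_
raw_intercept = model_demo.intercept_ - np.sum(raw_coefs * scaler_demo.mean_)

# The raw-scale linear formula should match the pipeline exactly
manual_pred = X_syn @ raw_coefs + raw_intercept
matches = np.allclose(manual_pred, pipe_demo.predict(X_syn))
print(np.round(raw_coefs, 3), round(float(raw_intercept), 3), matches)
```

With the raw intercept in hand, the fitted model can be written as a single equation in dollars of spend, which is often easier to share with non-technical stakeholders.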

Summary

This notebook shows how an Elastic Net model—combining Ridge and Lasso penalties—can turn a small advertising table into a reliable, interpretable revenue predictor:

  • Forecast accuracy: low RMSE and solid R² on unseen data.
  • Business insight: clear ranking of channels’ revenue impact, with automatic elimination of redundant features.
  • Easy upkeep: a single fit() retrains the whole pipeline when fresh media‑mix data arrive.

Armed with these predictions, marketers can allocate budgets toward the highest‑return channels before spending a cent—maximising revenue while keeping modelling transparent and defensible.


ProjectGurukul Team
