Ad Engagement Prediction using Quantile Regression in ML
Most digital advertising models forecast the average engagement rate (e.g., clicks or conversions per impression). Still, media buyers also need to prepare for variability—understanding what engagement might look like in underperforming campaigns (10th percentile) or exceptionally viral ones (90th percentile).
In this project, we predict three quantiles (the 10th, 50th, and 90th percentiles) of the daily click-through rate (CTR) for paid ads from features such as impressions, budget, channel type, ad length, and campaign duration.
By fitting separate quantile regression models, we’ll uncover how each driver affects low‑, median‑, and high‑engagement scenarios—helping marketers plan for both conservative and optimistic outcomes.
Libraries Required
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf  # Quantile regression via formula API
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_pinball_loss
Dataset
Social Media Advertising Dataset
Step-by-Step Code Implementation
Load & Inspect Data
We import daily campaign data—covering impressions, clicks, spend, channel, ad length, and run duration—and inspect its schema and summary statistics to ensure data quality.
# Load the Social Media Advertising dataset (Kaggle)
# Contains daily campaign stats: Impressions, Clicks, Cost_USD, Channel, Ad_Length_s, Duration_Days
df = pd.read_csv("social_media_advertising_dataset.csv")
# Inspect structure and basic stats
print(df.head())
print(df.info())
print(df.describe())
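Summary statistics can hide individual rows that would corrupt the CTR target, so explicit row-level checks are a useful complement. The following is a minimal standard-library sketch, assuming the column names listed in the comment above. The sample rows are made up for illustration, and find_bad_rows is a hypothetical helper, not part of any library.

```python
import csv
import io

# Made-up sample rows, for illustration only; column names follow the dataset comment above.
SAMPLE_CSV = """Impressions,Clicks,Cost_USD,Channel,Ad_Length_s,Duration_Days
1000,25,12.5,Social,15,7
0,3,4.0,Display,30,3
500,,8.0,Search,10,5
"""

def find_bad_rows(rows):
    """Return (row_index, reason) pairs for rows that would corrupt the CTR target."""
    problems = []
    for i, row in enumerate(rows):
        # An empty field means the value is missing in the CSV.
        if any(value == "" for value in row.values()):
            problems.append((i, "missing value"))
            continue
        impressions = float(row["Impressions"])
        clicks = float(row["Clicks"])
        if impressions <= 0:
            # CTR = clicks / impressions is undefined here.
            problems.append((i, "non-positive impressions"))
        elif clicks > impressions:
            # A CTR above 1 usually signals a logging or join error.
            problems.append((i, "more clicks than impressions"))
    return problems

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
bad_rows = find_bad_rows(rows)
print(bad_rows)
```

Rows flagged this way can be dropped or investigated before the CTR column is computed.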
Preprocessing & Feature Engineering
- We compute the click‑through rate (CTR) as clicks divided by impressions, our continuous response.
- We one‑hot encode the Channel categorical variable (e.g., Channel_Display, Channel_Social) to convert campaign medium into numeric predictors.
- We assemble our final modelling DataFrame with four numeric inputs (impressions, cost, ad length, duration) plus the channel dummies, dropping any rows with missing values.
# Compute CTR as target: clicks per impression (a fraction between 0 and 1)
df['CTR'] = df['Clicks'] / df['Impressions']
# One-hot encode categorical Channel field as numeric 0/1 columns
df = pd.get_dummies(df, columns=['Channel'], drop_first=True, dtype=float)
# Define predictor list and response
features = [
    'Impressions', 'Cost_USD', 'Ad_Length_s', 'Duration_Days'
] + [c for c in df.columns if c.startswith('Channel_')]
# Rows with zero impressions produce infinite or undefined CTR; treat them as missing and drop them
data = df[features + ['CTR']].replace([np.inf, -np.inf], np.nan).dropna()
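Because the encoding drops the first channel, every Channel_ coefficient in the later models is read relative to that omitted baseline channel. The sketch below mimics this behaviour in plain Python to make the baseline explicit. It assumes the dropped level is the first one in alphabetical order, which is how pandas typically orders plain string columns. The channel values are made up, and one_hot_drop_first is a hypothetical helper.

```python
def one_hot_drop_first(values, prefix):
    """Mimic one-hot encoding with the first (sorted) category dropped as baseline."""
    categories = sorted(set(values))
    kept = categories[1:]  # the first category becomes the implicit baseline
    return [{f"{prefix}_{c}": int(v == c) for c in kept} for v in values]

channels = ["Social", "Display", "Search", "Display"]  # made-up values
encoded = one_hot_drop_first(channels, "Channel")

for channel, row in zip(channels, encoded):
    print(channel, row)
```

Here "Display" is the baseline: its rows have zeros in every dummy column, so a coefficient on Channel_Social would describe the CTR difference between Social and Display at the given quantile.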
Train/Test Split
We reserve 20% of campaigns for evaluation, ensuring our quantile models are tested on unseen data.
# Reserve 20% of data for out-of-sample evaluation
train, test = train_test_split(data, test_size=0.2, random_state=42)
Fit Quantile Regression Models
For each target quantile (10th, 50th, 90th percentiles):
- We define a formula string linking CTR to our predictors.
- We fit a QuantReg model on the training set at that quantile.
- We print the coefficient table, which shows how each predictor’s effect differs across low‑, median‑, and high‑engagement scenarios (for example, cost may have a stronger positive effect on the 90th percentile than on the 10th).
quantiles = [0.10, 0.50, 0.90]
results = {}
formula = "CTR ~ " + " + ".join(features)
for q in quantiles:
    model = smf.quantreg(formula, train)
    res = model.fit(q=q)
    results[q] = res
    print(f"\n--- {int(q*100)}th Percentile Coefficients ---")
    print(res.summary().tables[1])  # coefficient table only
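One practical caveat with building the formula by joining column names: a dummy column whose name contains a space or other punctuation is not a valid term and would make the formula fail to parse. The following sketch wraps such names in Q("..."), the quoting helper provided by patsy, assuming statsmodels is using patsy as its formula engine. The column name "Channel_Social Media" is hypothetical, and build_formula is an illustrative helper rather than a library function.

```python
import re

def build_formula(response, predictors):
    """Build a formula string, wrapping non-identifier column names in Q("...")."""
    def term(name):
        # Plain identifiers can be used as-is in the formula.
        if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            return name
        # Anything else (spaces, hyphens, ...) is quoted for the formula parser.
        return f'Q("{name}")'
    return f"{response} ~ " + " + ".join(term(p) for p in predictors)

# "Channel_Social Media" is a hypothetical column name used only for illustration.
example = build_formula("CTR", ["Impressions", "Cost_USD", "Channel_Social Media"])
print(example)
```

If every channel label in the dataset is already a simple identifier, the plain join used above is sufficient and this helper returns the same string.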
Evaluation with Pinball Loss
- We generate quantile‑specific CTR predictions on the test set.
- We compute pinball loss for each quantile—a loss function tailored to quantile estimates—to quantify forecast accuracy. Lower pinball loss indicates better alignment of predicted and actual engagement at each percentile.
for q, res in results.items():
    preds = res.predict(test[features])
    loss = mean_pinball_loss(test['CTR'], preds, alpha=q)
    print(f"{int(q*100)}th quantile pinball loss: {loss:.4f}")
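To make the metric concrete, here is a dependency-free sketch of the pinball loss. It follows the standard definition: under-predictions are weighted by q and over-predictions by 1 - q. The CTR values and the flat prediction are made up for illustration.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction (y >= p) is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)

actual_ctr = [0.010, 0.020, 0.030, 0.040]  # made-up CTR values
high_pred = [0.035] * 4                    # one flat, fairly high prediction

loss_p10 = pinball_loss(actual_ctr, high_pred, 0.10)
loss_p90 = pinball_loss(actual_ctr, high_pred, 0.90)
print(f"q=0.10: {loss_p10:.5f}, q=0.90: {loss_p90:.5f}")
```

The same high prediction scores much better as a 90th-percentile estimate than as a 10th-percentile one, which is the asymmetry that makes the loss suitable for quantiles. It also means losses are only comparable between models at the same quantile level, not across quantile levels.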
Summary
Quantile regression provides distribution‑aware insights into ad engagement: rather than a single average CTR estimate, marketers obtain forecasts for pessimistic (10th percentile), typical (50th percentile), and optimistic (90th percentile) scenarios.
For instance, short ads may boost median CTR, but longer videos might drive outsized engagement only in top‑performing slots.
By modeling multiple quantiles, media planners can set conservative budgets, anticipate standard performance, and allocate additional spend where high‑engagement outliers are likely—optimizing ROI under uncertainty.
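As a rough illustration of how the three forecasts could feed a plan, the sketch below converts predicted CTR quantiles into a range of expected clicks for a given impression volume. The numbers are made up, and click_range is a hypothetical helper. Since the three models are fitted independently, their predictions can occasionally cross, so the sketch sorts them first as a simple safeguard.

```python
def click_range(impressions, ctr_p10, ctr_p50, ctr_p90):
    """Translate predicted CTR quantiles into a planning range of clicks."""
    # Independently fitted quantile models can cross, so sort to keep low <= mid <= high.
    low, mid, high = sorted([ctr_p10, ctr_p50, ctr_p90])
    return {
        "conservative": round(impressions * low),
        "typical": round(impressions * mid),
        "optimistic": round(impressions * high),
    }

# Made-up inputs: 200,000 planned impressions and three predicted CTR quantiles.
plan = click_range(impressions=200_000, ctr_p10=0.008, ctr_p50=0.015, ctr_p90=0.031)
print(plan)
```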