Ad Engagement Prediction using Quantile Regression in ML

Most digital advertising models forecast only the average engagement rate (e.g., clicks or conversions per impression). Media buyers also need to plan for variability: what engagement might look like in underperforming campaigns (10th percentile) or exceptionally viral ones (90th percentile).

In this ad engagement prediction project, we’ll predict multiple quantiles (10th, 50th, and 90th percentiles) of the daily click‑through rate (CTR) for paid ads from features such as impressions, spend, channel type, ad length, and campaign duration.

By fitting separate quantile regression models, we’ll uncover how each driver affects low‑, median‑, and high‑engagement scenarios—helping marketers plan for both conservative and optimistic outcomes.

Libraries Required

import pandas as pd  
import numpy as np  
import statsmodels.formula.api as smf    # Quantile regression via formula API  
from sklearn.model_selection import train_test_split  
from sklearn.metrics import mean_pinball_loss

Dataset

Social Media Advertising Dataset

Step-by-Step Code Implementation

Load & Inspect Data

We import daily campaign data—covering impressions, clicks, spend, channel, ad length, and run duration—and inspect its schema and summary statistics to ensure data quality.

# Load the Social Media Advertising dataset (Kaggle)
# Contains daily campaign stats: Impressions, Clicks, Cost_USD, Channel, Ad_Length_s, Duration_Days
df = pd.read_csv("social_media_advertising_dataset.csv")

# Inspect structure and basic stats
print(df.head())
print(df.info())
print(df.describe())
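
Beyond `head()`, `info()`, and `describe()`, a few targeted checks can catch rows that would distort CTR later. The following is a minimal sketch on a small made-up frame; the column names `Impressions` and `Clicks` follow the comment above, and the helper name `quality_report` is hypothetical.

```python
import numpy as np
import pandas as pd

def quality_report(frame: pd.DataFrame) -> dict:
    """Count rows that would make CTR undefined or implausible."""
    return {
        "missing_cells": int(frame.isna().sum().sum()),
        "zero_impressions": int((frame["Impressions"] == 0).sum()),
        "clicks_exceed_impressions": int((frame["Clicks"] > frame["Impressions"]).sum()),
    }

# Toy data for illustration only (not taken from the real dataset)
toy = pd.DataFrame({
    "Impressions": [1000, 0, 500, 200],
    "Clicks": [25, 0, 600, np.nan],
})
report = quality_report(toy)
print(report)
```

Running the same helper on the loaded `df` would show whether any rows need to be dropped or corrected before CTR is computed.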

Preprocessing & Feature Engineering

We compute the click‑through rate (CTR) as clicks divided by impressions, our continuous response.
We one‑hot encode the Channel categorical variable (e.g., Channel_Display, Channel_Social) to convert campaign medium into numeric predictors.
We assemble our final modelling DataFrame with four numeric inputs (impressions, cost, ad length, duration) plus the channel dummies, dropping any rows with missing values.

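# Compute CTR as target: clicks per impression (a fraction, not per thousand)
# Zero-impression rows would yield inf/NaN; mark them NaN so dropna() below removes them
df['CTR'] = df['Clicks'] / df['Impressions'].replace(0, np.nan)

# One-hot encode categorical Channel field (dtype=int gives 0/1 columns rather than booleans)
df = pd.get_dummies(df, columns=['Channel'], drop_first=True, dtype=int)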

# Define predictor list and response
features = [
    'Impressions', 'Cost_USD', 'Ad_Length_s', 'Duration_Days'
] + [c for c in df.columns if c.startswith('Channel_')]
data = df[features + ['CTR']].dropna()
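
Because `drop_first=True` removes one dummy column, every channel coefficient is later read relative to the dropped baseline channel. A small sketch of which category becomes the baseline, using made-up channel labels (the real dataset's labels may differ):

```python
import pandas as pd

# Toy data for illustration only; the channel labels are made up
toy = pd.DataFrame({
    "Channel": ["Search", "Display", "Social", "Display"],
    "Clicks": [10, 4, 7, 3],
})

encoded = pd.get_dummies(toy, columns=["Channel"], drop_first=True, dtype=int)
dummy_cols = [c for c in encoded.columns if c.startswith("Channel_")]
print(dummy_cols)   # the alphabetically first label has no column of its own
print(encoded)
```

Here "Display" sorts first, so it becomes the baseline: a Display row has zeros in every `Channel_` column.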

Train/Test Split

We reserve 20% of campaigns for evaluation, ensuring our quantile models are tested on unseen data.

# Reserve 20% of data for out‑of‑sample evaluation
train, test = train_test_split(data, test_size=0.2, random_state=42)

Fit Quantile Regression Models

For each target quantile (10th, 50th, 90th percentiles):

We define a formula string linking CTR to our predictors.
We fit a QuantReg model on the training set at that quantile.
We print the coefficient table, which shows how each predictor’s effect differs across low‑, median‑, and high‑engagement scenarios (for example, cost may have a stronger positive effect on the 90th percentile than on the 10th).

quantiles = [0.10, 0.50, 0.90]
results   = {}
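# Q("...") quotes each column name so names with spaces or symbols still parse in the formula
formula   = "CTR ~ " + " + ".join(f'Q("{c}")' for c in features)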

for q in quantiles:
    model = smf.quantreg(formula, train)
    res   = model.fit(q=q)
    results[q] = res
    print(f"\n--- {int(q*100)}th Percentile Coefficients ---")
    print(res.summary().tables[1])   # coefficient table only
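
Reading three separate summary tables makes it hard to see how a coefficient changes across quantiles. One option is to collect the fitted parameters side by side. The sketch below uses synthetic data (an assumption, so that it runs on its own) in which the noise grows with `x`, which makes the slope differ by quantile.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration: the noise scale grows with x
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 500)
y = 1 + 2 * x + (0.5 + x) * rng.normal(size=500)
toy = pd.DataFrame({"x": x, "y": y})

quantiles = [0.10, 0.50, 0.90]
fits = {q: smf.quantreg("y ~ x", toy).fit(q=q) for q in quantiles}

# One column per quantile, one row per model term
coef_table = pd.DataFrame({f"q={q:.2f}": fit.params for q, fit in fits.items()})
print(coef_table.round(3))
```

On the campaign data, the same idea would be `pd.DataFrame({q: res.params for q, res in results.items()})`, giving one row per predictor and one column per quantile.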

Evaluation with Pinball Loss

We generate quantile‑specific CTR predictions on the test set.
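We compute pinball loss for each quantile, a loss function tailored to quantile estimates, to quantify forecast accuracy. Lower pinball loss indicates better alignment of predicted and actual engagement at that percentile. Because the loss weights errors differently at each quantile, compare values between models at the same quantile rather than across quantiles.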

for q, res in results.items():
    preds = res.predict(test[features])
    loss  = mean_pinball_loss(test['CTR'], preds, alpha=q)
    print(f"{int(q*100)}th quantile pinball loss: {loss:.4f}")
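
To make the metric concrete: at quantile q, each unit of under-prediction costs q and each unit of over-prediction costs 1 - q. The following is a minimal NumPy sketch of that definition with made-up numbers; the function name `pinball_loss` is hypothetical and is not the scikit-learn implementation.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss at quantile q."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-predictions (diff > 0) cost q per unit; over-predictions cost (1 - q)
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

# Made-up values: one over-prediction by 1, one exact hit, one under-prediction by 2
y_true = [1.0, 2.0, 3.0]
y_pred = [2.0, 2.0, 1.0]
loss_90 = pinball_loss(y_true, y_pred, q=0.9)
loss_10 = pinball_loss(y_true, y_pred, q=0.1)
print(loss_90, loss_10)
```

The same predictions score worse at q = 0.9 than at q = 0.1 because the large miss is an under-prediction, which the 90th-percentile loss penalizes most heavily.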

Summary

Quantile regression provides distribution‑aware insights into ad engagement: rather than a single average CTR estimate, marketers obtain forecasts for pessimistic (10th percentile), typical (50th percentile), and optimistic (90th percentile) scenarios.

For instance, the fitted coefficients might show that short ads boost median CTR, while longer videos drive outsized engagement only in top‑performing slots.

By modeling multiple quantiles, media planners can set conservative budgets, anticipate standard performance, and allocate additional spend where high‑engagement outliers are likely—optimizing ROI under uncertainty.
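
Before planning around the pessimistic and optimistic forecasts, it can help to check how often actual CTR falls between them; for a 10th to 90th percentile band, roughly 80% coverage would be expected if the models are well calibrated. A minimal sketch with made-up numbers follows; the helper name `band_coverage` is hypothetical.

```python
import numpy as np

def band_coverage(y_true, lower, upper):
    """Share of observations falling inside [lower, upper]."""
    y_true, lower, upper = (np.asarray(a, dtype=float) for a in (y_true, lower, upper))
    return float(np.mean((y_true >= lower) & (y_true <= upper)))

# Made-up CTR values and quantile predictions, for illustration only
actual = [0.010, 0.022, 0.035, 0.018, 0.050]
p10 = [0.008, 0.015, 0.020, 0.020, 0.030]
p90 = [0.030, 0.040, 0.045, 0.040, 0.045]
coverage = band_coverage(actual, p10, p90)
print(coverage)
```

With the fitted models, the inputs would be `test['CTR']`, `results[0.10].predict(test[features])`, and `results[0.90].predict(test[features])`.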
