Digital Ad Cost Prediction using Quantile Regression in ML
Digital advertisers and finance teams typically forecast the average daily ad spend per campaign, but they also must prepare for variability—from low‑cost days (10th percentile) to expensive bursts (90th percentile).
In this digital ad cost prediction project, we’ll predict the 10th, 50th, and 90th percentiles of daily ad cost (Cost_USD) using campaign attributes such as impressions, clicks, conversions, channel type, ad creative length, and campaign duration. By fitting separate quantile regression models, we’ll uncover how each driver’s influence shifts across the cost distribution—enabling marketing managers to budget conservatively, plan around typical spend, and provision for peak‑cost days.
Libraries Required
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf  # Quantile regression via formula API
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_pinball_loss  # Proper loss for quantile forecasts
Dataset
Social Media Advertising Dataset
Step-by-Step Code Implementation
Load & Inspect Data
We load the daily campaign data (spend in Cost_USD, impressions, clicks, conversions, ad length, duration, and channel) and inspect it with .head(), .info(), and .describe() to verify completeness and understand the cost distribution.
# Load the Social Media Advertising dataset
# Source: Kaggle
df = pd.read_csv("social_media_advertising_dataset.csv")
# Inspect structure and core statistics
print(df.head())
df.info()  # prints its own report and returns None, so no print() is needed
print(df['Cost_USD'].describe())
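Since .describe() reports only the 25th, 50th, and 75th percentiles by default, it can help to look at the raw 10th, 50th, and 90th percentiles of the target before modeling. The snippet below is a minimal sketch on made-up numbers (the values in `costs` are hypothetical, not taken from the dataset); these unconditional percentiles give a rough baseline against which the regression forecasts can later be compared.

```python
import pandas as pd

# Hypothetical daily costs in USD (illustrative values, not from the dataset)
costs = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110], name="Cost_USD")

# Empirical 10th, 50th, and 90th percentiles of the raw target
baseline = costs.quantile([0.10, 0.50, 0.90])
print(baseline)
```

On the real data the equivalent call would be df['Cost_USD'].quantile([0.10, 0.50, 0.90]).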
Preprocessing & Feature Engineering
- We drop incomplete records to maintain data integrity.
- We one‑hot encode Channel to capture medium‑specific cost effects.
- We compute CTR (clicks / impressions) as an efficiency metric.
- We assemble our feature matrix (features) and rename the cost target to Cost.
# Drop any rows missing key fields
df = df.dropna(subset=[
'Impressions','Clicks','Conversions',
'Cost_USD','Channel','Ad_Length_s','Duration_Days'
])
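# One-hot encode the ad channel (e.g., Social, Display, Search).
# dtype=int keeps the dummies as numeric 0/1 columns, so the formula API
# treats them as plain regressors rather than boolean categoricals.
df = pd.get_dummies(df, columns=['Channel'], drop_first=True, dtype=int)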
# Compute click-through rate (CTR) as an additional predictor
df['CTR'] = df['Clicks'] / df['Impressions']
# Define predictor list and target
features = [
'Impressions','Clicks','Conversions',
'Ad_Length_s','Duration_Days','CTR'
] + [col for col in df.columns if col.startswith('Channel_')]
# Rename target for convenience
df.rename(columns={'Cost_USD':'Cost'}, inplace=True)
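One detail the CTR step glosses over: a row with zero impressions makes clicks / impressions undefined, and the resulting inf or NaN values would disturb the later model fit. The helper below is a small sketch of one way to guard against that. The name `safe_ctr` is hypothetical, and whether the dataset actually contains zero-impression rows is an assumption worth checking.

```python
import numpy as np

def safe_ctr(clicks, impressions):
    """Click-through rate that is NaN (not inf) where impressions are zero."""
    clicks = np.asarray(clicks, dtype=float)
    impressions = np.asarray(impressions, dtype=float)
    ctr = np.full_like(clicks, np.nan)
    # Divide only where the denominator is positive; other entries stay NaN
    np.divide(clicks, impressions, out=ctr, where=impressions > 0)
    return ctr

# Hypothetical example: the middle row has no impressions
ctr = safe_ctr([5, 0, 3], [100, 0, 60])
print(ctr)
```

In the pipeline above this could replace the plain division, e.g. df['CTR'] = safe_ctr(df['Clicks'], df['Impressions']), followed by dropping rows where CTR is NaN.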
Train/Test Split
A random 80/20 split reserves 20% of campaigns for unbiased, out‑of‑sample evaluation of our quantile models.
# Reserve 20% for out‑of‑sample evaluation
train, test = train_test_split(
    df[features + ['Cost']],
    test_size=0.2,
    random_state=42,
)
Fit Quantile Regression Models
For each quantile (10th, 50th, 90th percentiles):
- We define a formula string, e.g., “Cost ~ Impressions + Clicks + …”.
- We fit a QuantReg model at that quantile on the training set.
- We print the coefficient table, revealing how each predictor’s marginal effect on cost changes across the distribution—for instance, Conversions may reduce the 90th‑percentile cost more than the 10th.
quantiles = [0.10, 0.50, 0.90]
results = {}
formula = "Cost ~ " + " + ".join(features)
for q in quantiles:
    mod = smf.quantreg(formula, train)
    res = mod.fit(q=q)
    results[q] = res
    print(f"\n--- {int(q*100)}th Percentile Coefficients ---")
    print(res.summary().tables[1])  # coefficient estimates only
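To build intuition for what fitting at a given q means: quantile regression chooses coefficients that minimize the pinball (check) loss at that quantile. The sketch below strips the model down to a single constant and uses synthetic, right-skewed data (an assumption for illustration, not the Kaggle dataset). The helper `pinball` and the grid search are illustrative only, not how statsmodels fits the model internally.

```python
import numpy as np

def pinball(y, pred, q):
    """Mean pinball loss of a single constant prediction at quantile q."""
    diff = y - pred
    # Under-prediction (diff > 0) is weighted by q, over-prediction by (1 - q)
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

# Synthetic, right-skewed "costs" (illustrative only)
rng = np.random.default_rng(0)
cost = rng.lognormal(mean=3.0, sigma=0.6, size=2000)

# Grid-search the constant that minimizes the loss at each quantile
candidates = np.linspace(cost.min(), cost.max(), 4001)
best_constant = {}
for q in (0.10, 0.50, 0.90):
    losses = [pinball(cost, c, q) for c in candidates]
    best_constant[q] = float(candidates[int(np.argmin(losses))])
    print(f"q={q:.2f}  minimizer={best_constant[q]:.2f}  "
          f"sample quantile={np.quantile(cost, q):.2f}")
```

The minimizing constant should land close to the sample quantile in each case. With predictors added, the same loss produces a different coefficient vector for each q, which is why the three coefficient tables can differ.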
Evaluation with Pinball Loss
- We predict quantile‑specific costs on the held‑out test set.
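- We compute pinball loss, an asymmetric loss tailored to quantile forecasts: at quantile q, each unit of under-prediction is weighted by q and each unit of over-prediction by 1 − q. For the 90th percentile, under-predicting by $20 contributes 0.9 × 20 = 18, while over-predicting by $20 contributes only 0.1 × 20 = 2. Lower pinball loss indicates a more accurate forecast of that quantile; compare losses between models at the same quantile, not across quantiles.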
for q, res in results.items():
    preds = res.predict(test[features])
    loss = mean_pinball_loss(test['Cost'], preds, alpha=q)
    print(f"{int(q*100)}th percentile pinball loss: {loss:.2f}")
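Pinball loss is hard to interpret on its own, so a complementary sanity check is empirical coverage: the share of held-out observations that fall at or below the predicted quantile, which should be close to q for a well-calibrated model. The sketch below uses made-up numbers, and the helper name `empirical_coverage` is hypothetical.

```python
import numpy as np

def empirical_coverage(y_true, y_pred):
    """Share of observations at or below the predicted quantile."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(y_true <= y_pred))

# Hypothetical actual costs and 90th-percentile predictions for ten days
actual = [42, 55, 61, 38, 90, 47, 52, 70, 66, 58]
pred_q90 = [60, 62, 75, 50, 85, 55, 65, 80, 72, 64]

coverage = empirical_coverage(actual, pred_q90)
print(f"Coverage of 90th-percentile forecast: {coverage:.2f}")
```

Applied to the models above, this would be empirical_coverage(test['Cost'], res.predict(test[features])) for each q. Coverage far from q suggests the quantile forecast is miscalibrated even if its pinball loss looks reasonable.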
Summary
By applying quantile regression to digital ad spend data, we generate distribution‑aware cost forecasts:
- The 10th‑percentile model estimates a floor for low‑spend days: the cost level that only about one in ten comparable campaign‑days is expected to fall below.
- The median (50th‑percentile) model predicts typical campaign costs for routine planning.
- The 90th‑percentile model provisions for high‑cost spikes—such as viral content or bidding wars.
These quantile forecasts equip marketing, finance, and operations teams with nuanced spending insights—optimizing budget allocation, reducing financial risk, and improving ROI under spend variability.