Digital Ad Cost Prediction using Quantile Regression in ML

Digital advertisers and finance teams typically forecast the average daily ad spend per campaign, but they also must prepare for variability—from low‑cost days (10th percentile) to expensive bursts (90th percentile).

In this digital ad cost prediction project, we’ll predict the 10th, 50th, and 90th percentiles of daily ad cost (Cost_USD) using campaign attributes such as impressions, clicks, conversions, channel type, ad creative length, and campaign duration. By fitting separate quantile regression models, we’ll uncover how each driver’s influence shifts across the cost distribution—enabling marketing managers to budget conservatively, plan around typical spend, and provision for peak‑cost days.

Libraries Required

import pandas as pd  
import numpy as np  
import statsmodels.formula.api as smf    # Quantile regression via formula API  
from sklearn.model_selection import train_test_split  
from sklearn.metrics import mean_pinball_loss  # Proper loss for quantile forecasts

Dataset

Social Media Advertising Dataset

Step-by-Step Code Implementation

Load & Inspect Data

We load daily campaign data, including spend (Cost_USD), impressions, clicks, conversions, ad length, duration, and channel, then inspect it with .head(), .info(), and .describe() to verify completeness and understand the cost distribution.

# Load the Social Media Advertising dataset
# Source: Kaggle
df = pd.read_csv("social_media_advertising_dataset.csv")

# Inspect structure and core statistics
print(df.head())
df.info()  # prints its report directly; wrapping it in print() would also emit "None"
print(df['Cost_USD'].describe())

Preprocessing & Feature Engineering

We drop incomplete records to maintain data integrity.
We one‑hot encode Channel to capture medium‑specific cost effects.
We compute CTR (clicks / impressions) as an efficiency metric.
We collect the predictor column names in a list (features) and rename the cost target to Cost.

# Drop any rows missing key fields
df = df.dropna(subset=[
    'Impressions','Clicks','Conversions',
    'Cost_USD','Channel','Ad_Length_s','Duration_Days'
])

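# One-hot encode the ad channel (e.g., Social, Display, Search);
# dtype=int keeps the dummies numeric (0/1) so they enter the formula as plain regressors
df = pd.get_dummies(df, columns=['Channel'], drop_first=True, dtype=int)

# Compute click-through rate (CTR) as an additional predictor;
# rows with zero impressions are dropped first to avoid an infinite or undefined CTR
df = df[df['Impressions'] > 0].copy()
df['CTR'] = df['Clicks'] / df['Impressions']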

# Define predictor list and target
features = [
    'Impressions','Clicks','Conversions',
    'Ad_Length_s','Duration_Days','CTR'
] + [col for col in df.columns if col.startswith('Channel_')]

# Rename target for convenience
df.rename(columns={'Cost_USD':'Cost'}, inplace=True)

Train/Test Split

A random 80/20 split reserves 20% of campaigns for unbiased, out‑of‑sample evaluation of our quantile models.

# Reserve 20% for out‑of‑sample evaluation
train, test = train_test_split(df[features + ['Cost']],
                               test_size=0.2,
                               random_state=42)

Fit Quantile Regression Models

For each quantile (10th, 50th, 90th percentiles):

We define a formula string, e.g., “Cost ~ Impressions + Clicks + …”.
We fit a QuantReg model at that quantile on the training set.
We print the coefficient table, revealing how each predictor’s marginal effect on cost changes across the distribution—for instance, Conversions may reduce the 90th‑percentile cost more than the 10th.

quantiles = [0.10, 0.50, 0.90]
results   = {}
formula   = "Cost ~ " + " + ".join(features)

for q in quantiles:
    mod = smf.quantreg(formula, train)
    res = mod.fit(q=q)
    results[q] = res
    print(f"\n--- {int(q*100)}th Percentile Coefficients ---")
    print(res.summary().tables[1])   # coefficient estimates only
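
The formula string is built by joining raw column names, which only parses while every name is a valid Python identifier. A one-hot column created from a channel label with a space in it (the name Channel_Social Media below is hypothetical) would break the formula. A minimal sketch of one workaround, assuming patsy's Q() quoting helper, which the statsmodels formula API accepts; build_formula is an illustrative helper, not part of any library:

```python
def build_formula(target, predictors):
    """Build a patsy-style formula, wrapping each predictor in Q("...")
    so that names containing spaces or punctuation still parse."""
    quoted = [f'Q("{name}")' for name in predictors]
    return f"{target} ~ " + " + ".join(quoted)


# Hypothetical predictor names; in the tutorial the list comes from `features`.
example_features = ["Impressions", "CTR", "Channel_Social Media"]
formula = build_formula("Cost", example_features)
print(formula)
# Cost ~ Q("Impressions") + Q("CTR") + Q("Channel_Social Media")
```

With this variant the coefficient table labels each term as Q("..."). Renaming the columns to identifier-safe names before building the formula is an equally reasonable choice.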

Evaluation with Pinball Loss

We predict quantile‑specific costs on the held‑out test set.
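We compute pinball loss, an asymmetric loss function tailored to quantile forecasts: at quantile q, each unit of under-prediction is weighted by q and each unit of over-prediction by 1 - q. A lower pinball loss at a given quantile indicates a better forecast of that quantile. Because the weights differ by quantile, compare models at the same quantile rather than comparing the loss across quantiles.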

for q, res in results.items():
    preds = res.predict(test[features])
    loss  = mean_pinball_loss(test['Cost'], preds, alpha=q)
    print(f"{int(q*100)}th percentile pinball loss: {loss:.2f}")
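
To make the asymmetry concrete, here is a minimal pure-Python sketch of the pinball loss, written to follow the same definition mean_pinball_loss uses. The function name pinball_loss and the numbers are made up for illustration and are not taken from the dataset:

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss at quantile q (0 < q < 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction (diff > 0) costs q per unit of error;
        # over-prediction (diff < 0) costs (1 - q) per unit of error.
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


# Made-up daily costs and a constant forecast of 110, for illustration only.
actual = [100.0, 120.0, 80.0, 200.0]
forecast = [110.0, 110.0, 110.0, 110.0]

loss_q10 = pinball_loss(actual, forecast, 0.10)
loss_q50 = pinball_loss(actual, forecast, 0.50)
loss_q90 = pinball_loss(actual, forecast, 0.90)
print(loss_q10, loss_q50, loss_q90)  # roughly 11.5, 17.5, 23.5
```

The same forecast scores worst at q = 0.90 because its large under-prediction on the 200-dollar day is weighted by 0.9, whereas at q = 0.10 that miss is weighted by only 0.1.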

Summary

By applying quantile regression to digital ad spend data, we generate distribution‑aware cost forecasts:

The 10th‑percentile model guides conservative budgeting for low‑spend days.
The median (50th‑percentile) model predicts typical campaign costs for routine planning.
The 90th‑percentile model provisions for high‑cost spikes—such as viral content or bidding wars.

These quantile forecasts equip marketing, finance, and operations teams with nuanced spending insights—optimizing budget allocation, reducing financial risk, and improving ROI under spend variability.
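
Before relying on the three forecasts as a budget band, two sanity checks are worth considering. Because each quantile model is fitted separately, nothing forces the predictions to stay ordered (10th <= 50th <= 90th), and a well-calibrated 10th-to-90th band should contain roughly 80% of held-out costs. The sketch below assumes three aligned lists of predictions; check_quantile_band and all numbers are hypothetical illustrations, not results from this project:

```python
def check_quantile_band(actual, p10, p50, p90):
    """Return (crossing_rows, coverage) for aligned quantile forecasts.

    crossing_rows: number of rows where p10 <= p50 <= p90 does not hold.
    coverage: share of actual values that fall inside [p10, p90].
    """
    crossings = sum(
        1 for lo, mid, hi in zip(p10, p50, p90) if not (lo <= mid <= hi)
    )
    inside = sum(1 for y, lo, hi in zip(actual, p10, p90) if lo <= y <= hi)
    return crossings, inside / len(actual)


# Made-up numbers for illustration only.
actual = [90.0, 150.0, 60.0, 210.0, 120.0]
p10 = [70.0, 100.0, 65.0, 150.0, 90.0]
p50 = [95.0, 140.0, 80.0, 180.0, 85.0]   # last row falls below its p10
p90 = [130.0, 190.0, 110.0, 200.0, 160.0]

crossings, coverage = check_quantile_band(actual, p10, p50, p90)
print(crossings, coverage)  # 1 0.6
```

In the tutorial's setting, the three lists would come from results[q].predict(test[features]) for q = 0.10, 0.50, and 0.90. Coverage far from 0.8, or many crossing rows, would suggest revisiting the features or the model form.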
