Marketing Campaign Cost Prediction using Bayesian Regression in ML


Marketing managers need to forecast the total cost of a digital media campaign before finalising budgets, using early campaign metrics such as impressions, clicks, click‑through rate (CTR), engagement rate, and channel spend allocations (e.g., TV, radio, social media). Campaign costs can depend nonlinearly on these metrics (e.g., diminishing returns in high-CTR segments and bulk-buy discounts on high-volume impressions), and they are subject to uncertainty from bid-auction dynamics and creative performance. A single point estimate hides this risk and can lead to under‑ or overspending. By applying Bayesian Regression, we obtain both:

1. A point forecast of total campaign cost.

2. A credible interval quantifying forecast uncertainty—enabling data‑driven budget allocation and risk management.

Libraries Required

import pandas as pd                              # data loading & manipulation  
import numpy as np                               # numerical operations  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # visualization  

import pymc3 as pm                               # Bayesian modeling  
import arviz as az                               # posterior analysis  

from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.metrics import mean_absolute_error

Dataset

Media Campaign Cost Prediction

Step-by-Step Code Implementation

Data Loading & Preprocessing

Data preparation: We load all campaign metrics, drop missing values, and designate Cost as our target.
Scaling: We z‑score the predictors, using statistics fitted on the training split only, so that the same Normal(0,1) coefficient prior applies to every predictor and MCMC sampling is more stable.

import pandas as pd

# Load the campaign data
df = pd.read_csv("data/media-campaign-cost-prediction.csv")
df = df.dropna()

# Assume the CSV has a 'Cost' column and the rest are predictors
features = [c for c in df.columns if c != 'Cost']
X = df[features].values
y = df['Cost'].values  # USD total campaign cost

# Train/test split (80% train / 20% test)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize predictors for stable MCMC sampling
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s  = scaler.transform(X_test)
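
The scaling step can also be written out by hand, which makes it easier to see what a coefficient on a standardised predictor means. The sketch below is a minimal NumPy illustration, not part of the original pipeline: the two predictors, their scales, and the beta_std values are hypothetical. It fits the mean and standard deviation on the training split only and then converts a per-standard-deviation coefficient into a per-raw-unit one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw predictors on very different scales (impressions, clicks)
X_train = np.column_stack([rng.normal(50_000, 12_000, 200),
                           rng.normal(900, 250, 200)])
X_test = np.column_stack([rng.normal(50_000, 12_000, 50),
                          rng.normal(900, 250, 50)])

# Fit the scaling on the training split only, then reuse it for the test split
mean = X_train.mean(axis=0)
scale = X_train.std(axis=0)          # population std (ddof=0), as StandardScaler uses
X_train_s = (X_train - mean) / scale
X_test_s = (X_test - mean) / scale

# A coefficient per standard deviation converts to a coefficient per raw unit
beta_std = np.array([120.0, 45.0])   # hypothetical: USD per 1 SD of each predictor
beta_raw = beta_std / scale          # USD per impression, USD per click

print("train means after scaling:", X_train_s.mean(axis=0).round(6))
print("train stds after scaling: ", X_train_s.std(axis=0).round(6))
print("USD per raw unit:", beta_raw)
```

Reusing the training mean and scale for the test split keeps information about the test campaigns out of the fitted model.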

Define & Fit Bayesian Regression Model

Priors:

α ∼ Normal(mean of costs, 2×std) centres the intercept on the typical cost scale.
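β ∼ Normal(0, 1) gives every standardised predictor the same prior. Because Cost itself is not standardised, this prior is expressed in USD per standard deviation of the predictor; if costs vary by much more than a few dollars, widen it (for example, in proportion to the std of costs) so that it does not shrink the coefficients too strongly.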
σ ∼ HalfNormal(std of costs) enforces positive residual noise.

Model: Observed Cost ∼ Normal(α + β·X_std, σ).

Sampling: We draw 2,000 posterior samples per chain after 1,000 tuning steps, with target_accept=0.9 so that the NUTS sampler takes smaller steps and produces fewer divergences.
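
Before fitting, it can help to check what the priors listed above imply about costs on their own. The following is a rough prior predictive sketch in plain NumPy rather than PyMC3; y_train and X_train_s here are synthetic stand-ins (an assumption for illustration), and the HalfNormal draw is obtained as the absolute value of a zero-mean Normal.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for the tutorial's y_train and X_train_s
y_train = rng.normal(100.0, 30.0, size=500)     # costs in USD
X_train_s = rng.standard_normal((500, 4))       # already z-scored predictors

n_sims = 1000
y_mean, y_std = y_train.mean(), y_train.std()

# Draw parameters from the priors described in the text
alpha = rng.normal(y_mean, 2 * y_std, size=n_sims)
beta = rng.normal(0.0, 1.0, size=(n_sims, X_train_s.shape[1]))
sigma = np.abs(rng.normal(0.0, y_std, size=n_sims))   # HalfNormal(y_std)

# Simulate one full cost vector per prior draw
mu = alpha[:, None] + beta @ X_train_s.T              # shape (n_sims, n_obs)
y_sim = rng.normal(mu, sigma[:, None])

lo, hi = np.percentile(y_sim, [3, 97])
print(f"94% of prior-simulated costs fall in [{lo:.1f}, {hi:.1f}] USD")
print(f"Share of negative simulated costs: {(y_sim < 0).mean():.1%}")
```

If the simulated range is far wider or narrower than plausible campaign costs, or a large share of simulated costs is negative, that is a hint to revisit the prior scales before sampling.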

import pymc3 as pm

with pm.Model() as campaign_cost_model:
    # Priors for intercept and coefficients
    α = pm.Normal("α", mu=np.mean(y_train), sigma=np.std(y_train)*2)
    β = pm.Normal("β", mu=0, sigma=1, shape=X_train_s.shape[1])
    σ = pm.HalfNormal("σ", sigma=np.std(y_train))

    # Expected cost linear predictor
    μ = α + pm.math.dot(X_train_s, β)

    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=μ, sigma=σ, observed=y_train)

    # Sample the posterior
    trace = pm.sample(
        draws=2000,
        tune=1000,
        target_accept=0.9,
        return_inferencedata=True
    )
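
Once sampling finishes, convergence is usually judged by comparing chains with each other. As an illustration of the idea, here is a minimal NumPy sketch of the split R-hat statistic for a single scalar parameter. It is a simplified version: ArviZ's default R-hat is rank-normalised, so its values can differ slightly. The chains below are synthetic, not draws from the model above.

```python
import numpy as np

def split_rhat(chains):
    """Split R-hat for one scalar parameter; `chains` has shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    half = chains.shape[1] // 2
    # Split each chain in two so that drift within a chain also inflates the statistic
    halves = np.concatenate([chains[:, :half], chains[:, half:2 * half]], axis=0)
    n = halves.shape[1]
    within = halves.var(axis=1, ddof=1).mean()       # average within-chain variance
    between = n * halves.mean(axis=1).var(ddof=1)    # variance of the chain means
    var_hat = (n - 1) / n * within + between / n     # pooled variance estimate
    return np.sqrt(var_hat / within)

rng = np.random.default_rng(2)
good = rng.normal(0.0, 1.0, size=(4, 2000))            # four chains, same target
bad = good + np.array([0.0, 0.0, 0.0, 3.0])[:, None]   # one chain stuck elsewhere

print(f"well-mixed chains: {split_rhat(good):.3f}")
print(f"one shifted chain: {split_rhat(bad):.3f}")
```

Values close to 1 suggest the chains agree; a common rule of thumb treats values above roughly 1.01 as a reason to sample longer or reparameterise.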

Posterior Analysis & Point Predictions

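We summarise the posterior with az.summary, which reports each parameter's posterior mean and 94% Highest Density Interval (HDI), and we sample Y_obs to obtain posterior predictive draws for the training campaigns as a check on model fit.
Point forecasts for the test set use the posterior means of α and β; because the model is linear in its parameters, this equals the posterior mean of the predicted cost.

Evaluation: Mean Absolute Error (MAE) on the held‑out test set quantifies average forecast error.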

import arviz as az
from sklearn.metrics import mean_absolute_error

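# Summarize posterior distributions (means, 94% HDIs, r_hat, effective sample size)
print(az.summary(trace, round_to=2))

# Posterior predictive sampling for the training observations (model-fit check)
with campaign_cost_model:
    ppc = pm.sample_posterior_predictive(trace, var_names=["Y_obs"])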

# Posterior means of parameters
α_post = trace.posterior["α"].mean().item()
β_post = trace.posterior["β"].mean(dim=["chain","draw"]).values

# Point predictions on the test set
y_pred = α_post + X_test_s.dot(β_post)

# Evaluate Mean Absolute Error
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: ${mae:.2f}")
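
MAE only scores the point forecasts. To check the uncertainty bounds as well, one option is to build a predictive interval for every test campaign and count how often the observed cost falls inside it. The sketch below assumes posterior draws of the intercept, coefficients, and noise scale are available as NumPy arrays; here they are simulated, and every number in it is hypothetical. The hdi_columns helper is a hand-written highest-density interval for unimodal draws, not an ArviZ function.

```python
import numpy as np

def hdi_columns(draws, prob=0.94):
    """Narrowest interval holding `prob` of the draws, per column of (n_draws, n_points)."""
    s = np.sort(draws, axis=0)
    n = s.shape[0]
    k = int(np.floor(prob * n))
    widths = s[k:] - s[: n - k]          # width of every candidate interval
    start = widths.argmin(axis=0)        # narrowest candidate per column
    cols = np.arange(s.shape[1])
    return s[start, cols], s[start + k, cols]

rng = np.random.default_rng(3)
n_draws, n_test, n_feat = 4000, 200, 3

# Hypothetical posterior draws standing in for the fitted intercept, slopes, noise
alpha_s = rng.normal(100.0, 0.5, n_draws)
beta_s = rng.normal([8.0, -3.0, 1.5], 0.3, size=(n_draws, n_feat))
sigma_s = np.abs(rng.normal(10.0, 0.4, n_draws))

# Hypothetical standardised test predictors and observed costs
X_test_s = rng.standard_normal((n_test, n_feat))
y_test = 100.0 + X_test_s @ np.array([8.0, -3.0, 1.5]) + rng.normal(0, 10.0, n_test)

# Posterior predictive draws: linear mean plus observation noise
mu = alpha_s[:, None] + beta_s @ X_test_s.T        # shape (n_draws, n_test)
y_rep = rng.normal(mu, sigma_s[:, None])

lower, upper = hdi_columns(y_rep)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"Empirical coverage of 94% intervals: {coverage:.1%}")
```

With a well-calibrated model, the empirical coverage should sit near the nominal 94%; much lower coverage suggests the intervals are too narrow for budgeting decisions.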

Visualise Predictions & Credible Intervals

By sweeping a single predictor (e.g., impressions) across its observed range and holding the others at their training medians, we plot the posterior mean cost curve and its 94% credible band. The curve shows how sensitive the expected cost is to that predictor, and the band shows the uncertainty around that relation.

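import numpy as np
import matplotlib.pyplot as plt
import arviz as az

# Sweep one key predictor (e.g., impressions) if it's the first column
grid_vals = np.linspace(X_train_s[:,0].min(), X_train_s[:,0].max(), 100)
grid = np.median(X_train_s, axis=0)[None,:].repeat(100, axis=0)
grid[:,0] = grid_vals

# The model was built on X_train_s, so sampling Y_obs again would only
# reproduce training-set predictions. Evaluate the grid directly from the
# posterior draws instead, flattening (chain, draw) into one sample axis.
α_s = trace.posterior["α"].values.ravel()
β_s = trace.posterior["β"].values.reshape(-1, grid.shape[1])
σ_s = trace.posterior["σ"].values.ravel()

# Posterior predictive draws on the grid: linear mean plus observation noise
rng = np.random.default_rng(42)
μ_grid    = α_s[:, None] + β_s @ grid.T       # shape (n_samples, 100)
preds     = rng.normal(μ_grid, σ_s[:, None])
mean_pred = preds.mean(axis=0)
hpd       = az.hdi(preds, hdi_prob=0.94)      # shape (100, 2): lower, upper

# Back-transform the first predictor to original scale
orig_vals = scaler.inverse_transform(grid)[:,0]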

plt.figure(figsize=(8,5))
plt.plot(orig_vals, mean_pred, label="Posterior mean")
plt.fill_between(orig_vals, hpd[:,0], hpd[:,1], alpha=0.3,
                 label="94% credible interval")
plt.scatter(
    scaler.inverse_transform(X_test_s)[:,0],
    y_test, color="k", alpha=0.5, label="Test data"
)
plt.xlabel(features[0])
plt.ylabel("Campaign Cost (USD)")
plt.title("Bayesian Regression: Cost vs. " + features[0])
plt.legend()
plt.tight_layout()
plt.show()

Summary

This Bayesian Regression workflow for Marketing Campaign Cost Prediction provides:

1. Point forecasts of campaign costs from early performance metrics, scored by MAE on a held‑out test set.

2. Credible intervals quantifying forecast uncertainty—essential for budget risk management.

3. Actionable insights: marketing leaders can allocate budgets, set bid strategies, and plan contingencies with explicit cost‑risk bounds—optimising both spend efficiency and campaign effectiveness.
