Ad Placement Cost Prediction using Stepwise Regression in ML


Digital advertisers allocate budgets across a variety of placements—banner ads, native content slots, pre‑roll videos—yet the cost efficiency of each placement can vary widely based on factors such as site position, time of day, and targeting parameters.

In this ad placement cost prediction ML project, we’ll predict the daily placement cost (media_cost_usd) for social and display ads based on features including impressions, clicks, conversions, ad position in content, and campaign duration.

Libraries Required

import pandas as pd               # Data loading & manipulation  
import numpy as np                # Numerical operations  
import statsmodels.api as sm      # Ordinary Least Squares regression  
from sklearn.model_selection import train_test_split   # Train/test split  
from sklearn.metrics import r2_score, mean_squared_error  # Evaluation metrics  
import matplotlib.pyplot as plt   # Visualization

Dataset

Marketing Campaign Dataset

Step-by-Step Code Implementation

Data Loading & Initial Inspection

We load daily ad campaign spend data—including impressions, clicks, conversions, campaign duration, and ad position—and inspect its schema and descriptive statistics.

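# Block 1: Load dataset
# The Kaggle link below needs a signed-in browser download (it serves a zip, not a raw CSV),
# so pd.read_csv cannot read it directly. Download the file from:
# https://www.kaggle.com/datasets/rahulchavan99/marketing-campaign-dataset/download
# then point csv_path at the extracted CSV.
csv_path = "marketing_campaign.csv"  # hypothetical local filename; adjust to your download
df = pd.read_csv(csv_path)

# Inspect structure
print(df.head())
df.info()              # info() prints its report itself and returns None
print(df.describe())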

The dataset contains daily rows with media_cost_usd, impressions, clicks, conversions, duration_in_days, and position_in_content.
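Because the rest of the tutorial relies on these exact column names, a quick schema check can fail early if the downloaded CSV names them differently. The sketch below uses a hypothetical helper, check_required_columns (not a pandas function), demonstrated on a small made-up DataFrame rather than the real file.

```python
import pandas as pd

# Columns the tutorial expects, as listed in the text above
REQUIRED_COLUMNS = ['media_cost_usd', 'impressions', 'clicks',
                    'conversions', 'duration_in_days', 'position_in_content']

def check_required_columns(df, required=REQUIRED_COLUMNS):
    """Hypothetical helper: raise KeyError naming any expected columns that are missing."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")
    return True

# Toy frame standing in for the real CSV (values are invented for illustration)
toy = pd.DataFrame({
    'media_cost_usd': [120.0, 80.5],
    'impressions': [10000, 6500],
    'clicks': [150, 90],
    'conversions': [12, 5],
    'duration_in_days': [30, 15],
    'position_in_content': [1, 3],
})
print(check_required_columns(toy))  # prints True
```

On the real data, calling check_required_columns(df) right after loading would surface a naming mismatch before the feature-engineering step fails with a less obvious error.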

Data Preprocessing

Incomplete records are dropped. We log‑transform impressions to reduce skew and compute CTR (clicks/impressions) as an efficiency metric.
Predictors (X) include transformed and raw features; the target (y) is media_cost_usd. We split the data 80/20 for training and testing.

# Block 2: Clean & feature engineering
# Drop incomplete records
df = df.dropna(subset=['media_cost_usd','impressions','clicks','conversions','duration_in_days','position_in_content'])

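# Keep rows with at least one impression so CTR (clicks / impressions) is defined
df = df[df['impressions'] > 0].copy()

# Log-transform impressions to reduce right skew: log(impressions + 1)
df['log_impressions'] = np.log1p(df['impressions'])
df['ctr'] = df['clicks'] / df['impressions']  # click-through rate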

# Define predictors and target
X = df[['log_impressions','ctr','conversions','duration_in_days','position_in_content']]
y = df['media_cost_usd']

# Split into training and test sets (80/20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
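To illustrate why log1p helps, the sketch below compares skewness before and after the transform on synthetic lognormal impression counts (simulated data, not the Kaggle dataset; the distribution parameters are assumptions), and computes CTR only where impressions are positive.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic, right-skewed impression counts standing in for real campaign data
impressions = pd.Series(np.round(rng.lognormal(mean=8, sigma=1.5, size=5000)))
clicks = np.round(impressions * rng.uniform(0.005, 0.03, size=5000))

raw_skew = impressions.skew()
log_skew = np.log1p(impressions).skew()
print(f"skew before: {raw_skew:.2f}, after log1p: {log_skew:.2f}")

# CTR is only defined where impressions > 0
mask = impressions > 0
ctr = clicks[mask] / impressions[mask]
print(f"CTR range: {ctr.min():.4f} to {ctr.max():.4f}")
```

The raw counts have a long right tail, while the log-transformed values are close to symmetric, which keeps a handful of very large campaigns from dominating the OLS fit.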

Stepwise Regression Function

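The stepwise_selection function alternates two steps. In the forward step, it adds the excluded variable with the lowest p-value, provided that p-value is below threshold_in (0.01). In the backward step, it removes the included variable with the highest p-value if that p-value exceeds threshold_out (0.05). It stops when neither step changes the feature set.
Keeping threshold_in below threshold_out helps prevent a variable from being added and dropped over and over. The result is a compact set of predictors that are statistically significant on the training data; because the same data chose the features, the final p-values are optimistic and are best read as a guide rather than a formal test.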

# Block 3: Forward–backward stepwise selection
def stepwise_selection(X, y,
                       initial_list=None,
                       threshold_in=0.01,
                       threshold_out=0.05,
                       verbose=True):
    """Return the columns of X chosen by p-value-based forward-backward OLS selection."""
    included = list(initial_list) if initial_list is not None else []
    while True:
        changed = False

        # Forward step: fit OLS with each excluded feature added; keep the best one
        excluded = [col for col in X.columns if col not in included]  # stable column order
        if excluded:
            new_pvals = pd.Series(index=excluded, dtype=float)
            for col in excluded:
                model = sm.OLS(y, sm.add_constant(X[included + [col]])).fit()
                new_pvals[col] = model.pvalues[col]
            best_pval = new_pvals.min()
            if best_pval < threshold_in:
                best_var = new_pvals.idxmin()
                included.append(best_var)
                changed = True
                if verbose:
                    print(f"Add  {best_var:25} p-value {best_pval:.4f}")

        # Backward step: drop the included feature with the highest p-value
        if included:
            model = sm.OLS(y, sm.add_constant(X[included])).fit()
            pvals = model.pvalues.drop('const', errors='ignore')  # exclude intercept by name
            worst_pval = pvals.max()
            if worst_pval > threshold_out:
                worst_var = pvals.idxmax()
                included.remove(worst_var)
                changed = True
                if verbose:
                    print(f"Drop {worst_var:25} p-value {worst_pval:.4f}")

        if not changed:
            break

    return included

Model Building & Evaluation

Using the selected features, we fit an Ordinary Least Squares regression via statsmodels.
The .summary() output reports coefficient estimates (the change in cost per unit change in each predictor, holding the others fixed), p-values (statistical significance), R², adjusted R², and diagnostics such as the F-statistic and AIC, giving a transparent view of spend drivers.
Predictions on the held-out test set yield R² (share of variance explained) and RMSE (root-mean-square error, in USD), which quantify how well the model generalizes to unseen days.
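Writing the model out helps when reading the coefficients. Assuming all five candidate features survive selection (in practice only the selected ones appear), the fitted equation is below; because impressions enter as log(1 + impressions), their coefficient is not a cost per impression.

```latex
\begin{aligned}
\widehat{\text{cost}} &= \beta_0 + \beta_1 \ln(1 + \text{impressions}) + \beta_2\,\text{ctr} + \beta_3\,\text{conversions} \\
&\quad + \beta_4\,\text{duration\_in\_days} + \beta_5\,\text{position\_in\_content} \\
\Delta\widehat{\text{cost}} &= \beta_1 \ln(1.01) \approx 0.00995\,\beta_1 \quad \text{(1\% rise in } 1 + \text{impressions, other predictors fixed)}
\end{aligned}
```

For instance, a purely illustrative coefficient of 500 on log_impressions would mean roughly 5 USD of extra daily cost per 1% more impressions. Similarly, a unit change in ctr spans 0% to 100% CTR, so β₂/100 is the effect of one percentage point.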

# Block 4: Feature selection
selected_features = stepwise_selection(X_train, y_train)

# Fit final OLS model
X_train_sel = sm.add_constant(X_train[selected_features])
model = sm.OLS(y_train, X_train_sel).fit()
print(model.summary())

# Predict on test set
X_test_sel = sm.add_constant(X_test[selected_features])
y_pred = model.predict(X_test_sel)

# Compute performance metrics
print("Test R²:", r2_score(y_test, y_pred))
print("Test RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
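The two test metrics can be checked by hand. The sketch below uses three made-up cost values (not model output) to compute R² = 1 − SS_res/SS_tot and RMSE = sqrt(SS_res/n) directly and compare them with scikit-learn's r2_score and mean_squared_error.

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error

# Tiny invented example: actual vs. predicted daily cost in USD
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 7.5])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares = 0.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares = 8.0
r2_manual = 1 - ss_res / ss_tot                  # 1 - 0.5 / 8 = 0.9375
rmse_manual = np.sqrt(ss_res / len(y_true))      # sqrt(0.5 / 3), about 0.408

print(r2_manual, r2_score(y_true, y_pred))
print(rmse_manual, np.sqrt(mean_squared_error(y_true, y_pred)))
```

RMSE is in the same units as media_cost_usd, so it reads directly as a typical daily prediction error in dollars, while R² is unitless.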

Residual Diagnostics

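We plot residuals against predicted costs to look for heteroscedasticity (a funnel shape, where errors grow with predicted cost) or systematic patterns such as curvature. An evenly spread cloud around zero is consistent with the OLS assumptions of constant variance and linearity; clear structure suggests the model needs transformed features or a different specification.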

# Block 5: Residual plot
residuals = y_test - y_pred
plt.scatter(y_pred, residuals, alpha=0.5)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted Media Cost (USD)")
plt.ylabel("Residuals")
plt.title("Residuals vs. Predicted Cost")
plt.show()

Summary

By applying stepwise regression to ad placement data, we keep only those candidate cost drivers (log-impressions, CTR, conversions, campaign length, and ad position) that are statistically significant on the training data and prune the rest.

The resulting linear model aims to balance interpretability (a few significant predictors) with predictive accuracy, which the test-set R² and RMSE should confirm. This gives media planners a transparent, data-driven tool to forecast placement costs and optimize budget allocation across channels.
