Digital Marketing Cost Prediction using Stepwise Regression in ML

Marketing leaders need to forecast how much they’ll spend on campaigns—across channels like TV, radio, social, and search—to plan budgets effectively and maximise ROI. In this project, we’ll predict weekly media spend based on campaign performance metrics (impressions, clicks, conversions), channel type, and temporal factors (week of year).

With stepwise regression, we’ll find the strongest cost drivers and build a linear model that will help CMOs and media planners allocate budgets more strategically.

Libraries Required

import pandas as pd                                       # Data manipulation  
import numpy as np                                        # Numerical operations  
import statsmodels.api as sm                              # OLS regression  
from sklearn.model_selection import train_test_split      # Data splitting  
from sklearn.metrics import r2_score, mean_squared_error  # Evaluation  
import matplotlib.pyplot as plt                           # Visualization

Dataset

Sample Media Spends Data

Step-by-Step Code Implementation

Data Loading & Initial Inspection

We load a sample media‑spends dataset containing weekly metrics—Impressions, Clicks, Conversions, and Cost—for multiple channels (TV, Radio, Digital). We inspect its schema and summary statistics to understand variable distributions.

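# Block 1: Load dataset
# Sample Media Spends Data (Kaggle):
# https://www.kaggle.com/datasets/yugagrawal95/sample-media-spends-data
# Kaggle downloads require a login, so pd.read_csv cannot fetch the URL directly.
# Download the CSV manually first; the local filename below is an assumption.
csv_path = "sample_media_spends_data.csv"  # assumed name, adjust to your download
df = pd.read_csv(csv_path)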

# Inspect the first few rows and structure
print(df.head())
print(df.info())
print(df.describe())

Data Preprocessing

The categorical Channel field is one-hot encoded into numeric dummy variables. Because drop_first=True omits the alphabetically first category, a dataset with Digital, Radio, and TV channels yields Channel_Radio and Channel_TV, with Digital as the baseline. We drop any incomplete records to ensure a clean dataset. We separate predictors (X) from the target cost (y) and split the data 80/20 into training and test sets.

# Block 2: Encode categoricals and clean
# Assume columns: 'Channel', 'Week', 'Impressions', 'Clicks', 'Conversions', 'Cost'
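# dtype=float keeps the dummies numeric; pandas >= 2.0 defaults to bool,
# which statsmodels OLS can reject when mixed with float columns
df_enc = pd.get_dummies(df, columns=["Channel"], drop_first=True, dtype=float)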

# Drop any missing values
df_enc = df_enc.dropna()

# Define predictors and target
X = df_enc.drop("Cost", axis=1)
y = df_enc["Cost"]

# Train–test split (80/20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
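
To make the encoding step concrete, here is a small sketch on a made-up three-row frame (the values are invented for illustration and are not taken from the dataset). It shows that with drop_first=True the first channel in sorted order becomes the baseline, represented by zeros in the remaining dummy columns.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Channel": ["TV", "Radio", "Digital"],
    "Cost": [1200.0, 300.0, 800.0],
})

# drop_first=True drops the first category in sorted order ("Digital"),
# which then acts as the reference level for the other dummies
toy_enc = pd.get_dummies(toy, columns=["Channel"], drop_first=True, dtype=float)
print(toy_enc)
```

Each channel coefficient in the later regression is then read as the difference in spend relative to that baseline channel.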

Stepwise Regression Function

The stepwise_selection function runs a hybrid forward-backward algorithm. In each iteration, the forward step adds the excluded predictor with the smallest p-value, provided it is below 0.01, and the backward step removes the included predictor with the largest p-value, provided it is above 0.05. The loop stops when neither step changes the model. Keeping the entry threshold stricter than the removal threshold prevents a variable from being added and then immediately dropped. The result is a concise subset of significant predictors.

# Block 3: Forward–backward stepwise selection
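def stepwise_selection(X, y,
                       initial_list=None,
                       threshold_in=0.01,
                       threshold_out=0.05,
                       verbose=True):
    # Avoid a mutable default argument; start from an empty list if none is given
    included = list(initial_list) if initial_list is not None else []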
    while True:
        changed = False

        # Forward step: consider adding each excluded predictor
        # Keep the original column order so runs are reproducible
        excluded = [c for c in X.columns if c not in included]
        pvals = pd.Series(index=excluded, dtype=float)
        for col in excluded:
            model = sm.OLS(y, sm.add_constant(X[included + [col]])).fit()
            pvals[col] = model.pvalues[col]
        best_pval = pvals.min()
        if best_pval < threshold_in:
            best_var = pvals.idxmin()
            included.append(best_var)
            changed = True
            if verbose:
                print(f"Add  {best_var:25} p-value {best_pval:.4f}")

        # Backward step: consider removing each included predictor
        model = sm.OLS(y, sm.add_constant(X[included])).fit()
        pvals_included = model.pvalues.iloc[1:]  # exclude intercept
        worst_pval = pvals_included.max()
        if worst_pval > threshold_out:
            worst_var = pvals_included.idxmax()
            included.remove(worst_var)
            changed = True
            if verbose:
                print(f"Drop {worst_var:25} p-value {worst_pval:.4f}")

        if not changed:
            break

    return included

Model Building & Evaluation

Using the selected features, we fit an Ordinary Least Squares regression via statsmodels. The printed .summary() provides coefficient estimates (spend impact per unit change in each predictor), p‑values (statistical significance), R², and diagnostic statistics (AIC, F‑statistic), allowing interpretation of each factor’s effect on spend.

We predict weekly spend on the held‑out test set and compute R² (variance explained) and RMSE (root‑mean‑square error) to quantify the model’s generalisation performance.

# Block 4: Feature selection
selected_features = stepwise_selection(X_train, y_train)

# Fit final OLS model
X_train_sel = sm.add_constant(X_train[selected_features])
model = sm.OLS(y_train, X_train_sel).fit()
print(model.summary())

# Predict on test set
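# has_constant="add" forces the intercept column even if a test-set column
# happens to be constant (e.g. a channel dummy that is all zeros in this split)
X_test_sel = sm.add_constant(X_test[selected_features], has_constant="add")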
y_pred = model.predict(X_test_sel)

# Compute performance metrics
print("Test R²:", r2_score(y_test, y_pred))
print("Test RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
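
For readers who want to see what the two metrics measure, here is a minimal worked example that computes R² and RMSE by hand with NumPy. The four actual and predicted values are made up for illustration; the formulas are the standard definitions that r2_score and mean_squared_error implement.

```python
import numpy as np

# Made-up actual and predicted weekly spend, for illustration only
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_hat = np.array([110.0, 190.0, 310.0, 390.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares = 400
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares = 50000
r2 = 1 - ss_res / ss_tot                        # 1 - 400/50000 = 0.992
rmse = np.sqrt(ss_res / len(y_true))            # sqrt(400/4) = 10.0

print(f"R2 = {r2:.3f}, RMSE = {rmse:.1f}")      # R2 = 0.992, RMSE = 10.0
```

RMSE is in the same units as Cost, so a value of 10 here means predictions miss by about 10 currency units in a typical week.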

Residual Diagnostics

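A residual plot (residuals against predicted values) checks for non-random patterns or heteroscedasticity, validating key assumptions of linear regression. Residuals scattered evenly around zero support the model; a curve suggests a missing non-linear term, and a funnel shape suggests non-constant variance.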

# Block 5: Residual plot
residuals = y_test - y_pred
plt.scatter(y_pred, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted Weekly Spend")
plt.ylabel("Residuals")
plt.title("Residuals vs. Predicted Spend")
plt.show()

Summary

By applying stepwise regression to digital and traditional media metrics, we isolate the most influential cost drivers—such as campaign impressions, clicks, and specific channel dummies—while pruning less informative predictors.

The resulting linear model is easy to interpret through its coefficient estimates and p-values, and its predictive accuracy can be judged from the test-set R² and RMSE computed above.

Marketing teams can leverage these insights to forecast weekly media spend more precisely and optimise budget allocation across channels.
