Retail Promotion Cost Prediction using Stepwise Regression in ML

Retailers run promotions (discounts, coupons, multi‑buy offers) to boost sales, but these campaigns incur costs—both direct (rebate value, printing/distribution) and indirect (increased handling, spoilage). In this project, we will predict the total promotion cost for individual campaigns based on features such as promotional type, discount rate, expected uplift in units sold, product category, store location, and campaign duration.

By applying stepwise regression, we’ll isolate the most influential cost drivers and build an interpretable linear model that balances simplicity with predictive performance—helping marketing teams budget promotions more accurately and maximise ROI.

Libraries Required

import pandas as pd               # Data loading & manipulation  
import numpy as np                # Numerical operations  
import statsmodels.api as sm      # Ordinary Least Squares regression  
from sklearn.model_selection import train_test_split   # Train/test split  
from sklearn.metrics import r2_score, mean_squared_error  # Evaluation metrics  
import matplotlib.pyplot as plt   # Visualization

Dataset

Cost Prediction for Acquiring Customers

Step-by-Step Code Implementation

Data Loading & Initial Inspection

We load a Food Mart promotions dataset containing ~60,000 campaigns, each with product category, store, promotion type, discount rate, expected sales uplift, duration, and observed cost.

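# Block 1: Load dataset
# Media Campaign Cost Prediction – Food Mart (60K campaigns)
# The Kaggle download link below requires a login and serves an archive,
# so pd.read_csv cannot read it directly. Download and extract it first.
url = "https://www.kaggle.com/datasets/ramjasmaurya/medias-cost-prediction-in-foodmart/download"
csv_path = "foodmart_promotions.csv"  # hypothetical local path to the extracted CSV; adjust as needed
df = pd.read_csv(csv_path)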

print(df.head())
df.info()  # info() prints its report directly and returns None
print(df.describe())

Data Preprocessing

Rows missing any core field are removed. We one‑hot encode categorical predictors (Product_Category, Promo_Type, Store_ID) to prepare for regression. We separate predictors (X) from the target cost (y) and perform an 80/20 train/test split.

# Block 2: Clean & encode
# Assume columns include: 'Campaign_ID', 'Product_Category', 'Store_ID',
# 'Promo_Type', 'Discount_Rate', 'Expected_Uplift', 'Duration_Days', 'Cost_USD'

# Drop any rows with missing critical values
df = df.dropna(subset=[
    'Product_Category','Store_ID','Promo_Type',
    'Discount_Rate','Expected_Uplift','Duration_Days','Cost_USD'
])

# One‑hot encode categorical columns
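# dtype=float keeps the dummies numeric; recent pandas versions default to bool,
# which makes statsmodels reject the mixed-type design matrix
df_enc = pd.get_dummies(df,
                        columns=['Product_Category','Promo_Type','Store_ID'],
                        drop_first=True,
                        dtype=float)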

# Define predictors and target
X = df_enc.drop(['Campaign_ID','Cost_USD'], axis=1)
y = df_enc['Cost_USD']

# Split into training and testing sets (80% train / 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
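
To make the encoding step concrete, here is a minimal sketch on a toy frame. The three rows and their values are invented for illustration; only the column names follow the assumed schema above. With drop_first=True, the first level of each categorical column becomes the baseline and gets no column of its own, which avoids perfect collinearity with the intercept.

```python
import pandas as pd

# Toy data: values are hypothetical, chosen only to illustrate the encoding
toy = pd.DataFrame({
    "Promo_Type": ["Coupon", "Discount", "MultiBuy"],
    "Discount_Rate": [0.10, 0.25, 0.15],
})

# drop_first=True drops the first level ("Coupon"), which becomes the baseline;
# dtype=float keeps every column numeric for statsmodels
toy_enc = pd.get_dummies(toy, columns=["Promo_Type"], drop_first=True, dtype=float)
print(toy_enc)
```

A row with zeros in both dummy columns is a "Coupon" campaign, so each dummy coefficient in the regression is read as a cost difference relative to that baseline.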

Stepwise Regression Function

The stepwise_selection function alternates forward inclusion (adding predictors with p < 0.01) and backward elimination (dropping predictors with p > 0.05) until no further changes, yielding a concise set of statistically significant features.

# Block 3: Forward–backward stepwise selection
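def stepwise_selection(X, y,
                       initial_list=None,
                       threshold_in=0.01,
                       threshold_out=0.05,
                       verbose=True):
    # Copy the starting list; a None default avoids a shared mutable default argument
    included = list(initial_list) if initial_list is not None else []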
    while True:
        changed = False

        # Forward step: test each excluded predictor
        # Preserve column order so that ties are broken the same way on every run
        excluded = [c for c in X.columns if c not in included]
        new_pvals = pd.Series(index=excluded, dtype=float)
        for col in excluded:
            model = sm.OLS(y, sm.add_constant(X[included + [col]])).fit()
            new_pvals[col] = model.pvalues[col]
        best_pval = new_pvals.min()
        if best_pval < threshold_in:
            best_var = new_pvals.idxmin()
            included.append(best_var)
            changed = True
            if verbose:
                print(f"Add  {best_var:25} p-value {best_pval:.4f}")

        # Backward step: test each included predictor
        model = sm.OLS(y, sm.add_constant(X[included])).fit()
        pvals = model.pvalues.iloc[1:]  # exclude intercept
        worst_pval = pvals.max()
        if worst_pval > threshold_out:
            worst_var = pvals.idxmax()
            included.remove(worst_var)
            changed = True
            if verbose:
                print(f"Drop {worst_var:25} p-value {worst_pval:.4f}")

        if not changed:
            break

    return included

Model Building & Evaluation

Using the selected features, we fit an Ordinary Least Squares regression via statsmodels. The .summary() output provides coefficient estimates (cost impact per unit change), p-values (significance), R², and diagnostic statistics (AIC, F‑statistic), enabling interpretation of each driver’s effect on promotion cost.
Predictions on unseen test data yield R² (share of variance explained) and RMSE (typical error magnitude, in USD), quantifying how well the model generalises.

# Block 4: Perform stepwise feature selection
selected_features = stepwise_selection(X_train, y_train)

# Fit the final OLS model
X_train_sel = sm.add_constant(X_train[selected_features])
model = sm.OLS(y_train, X_train_sel).fit()
print(model.summary())

# Predict on test set
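# has_constant='add' forces the intercept column even if a dummy happens to be
# constant in the test split, which would otherwise cause a column mismatch
X_test_sel = sm.add_constant(X_test[selected_features], has_constant='add')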
y_pred = model.predict(X_test_sel)

# Compute performance metrics
print("Test R²:", r2_score(y_test, y_pred))
print("Test RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
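
The two metrics can also be worked out by hand, which makes their meaning clearer. The sketch below uses four invented values (not taken from the dataset) and computes the same quantities that r2_score and the square root of mean_squared_error return.

```python
import numpy as np

# Hypothetical observed and predicted costs, chosen for easy arithmetic
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 6.5, 9.5])

# R² = 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_hat) ** 2)           # sum of squared residuals
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
r2_manual = 1.0 - ss_res / ss_tot

# RMSE = square root of the mean squared residual
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))

print("R²:", r2_manual)
print("RMSE:", rmse_manual)
```

Here every prediction is off by 0.5, so the RMSE is 0.5, and the residual sum of squares (1.0) is small relative to the total variation (20.0), giving R² = 0.95.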

Residual Diagnostics

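A scatter plot of residuals against predicted costs checks two key OLS assumptions. A visible curve suggests the linear form is misspecified, and a funnel shape (spread that grows or shrinks with the predicted value) indicates heteroscedasticity, that is, non-constant error variance.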

# Block 5: Plot residuals to check assumptions
residuals = y_test - y_pred
plt.scatter(y_pred, residuals, alpha=0.5)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted Promotion Cost (USD)")
plt.ylabel("Residuals")
plt.title("Residuals vs. Predicted Cost")
plt.show()

Summary

Applying stepwise regression to retail promotion data isolates the major cost drivers (for example discount rate, expected uplift, campaign duration, and specific promo or store categories) while pruning redundant variables. The resulting linear model balances interpretability (coefficient estimates and p-values for each retained feature) with predictive accuracy, which is measured on held-out data by test‑set R² and RMSE. This gives retail marketers a transparent tool to forecast promotion costs and optimise campaign planning.
