Ad Revenue Growth Prediction with Polynomial Regression in ML


Digital marketing teams need to forecast next-period ad revenue (USD) from planning-stage metrics (channel mix, spend, impressions, clicks, and creative complexity) before campaigns launch or budgets are locked in. The relationship between spend and revenue growth is nonlinear: the return on each additional dollar tends to diminish as spend rises, and at high spend you may hit saturation or need bid adjustments. A purely linear model underfits these curves, whereas an unregularised high‑degree polynomial tends to overfit. Polynomial Regression (i.e., linear regression on polynomially expanded features) with ℓ² regularisation (Ridge) can capture smooth nonlinear effects while keeping the coefficients inspectable, which makes the resulting forecasts useful for guiding budget allocation.

Dataset

Online Advertising Digital Marketing Data

Step-by-Step Code Implementation

1. Libraries Required

import pandas as pd                             # data handling  
import numpy as np                              # numerical operations  

import matplotlib.pyplot as plt                 # plotting  
import seaborn as sns                           # visualization enhancements  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import PolynomialFeatures, StandardScaler  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score

2. Load Data & Libraries

import pandas as pd
import numpy as np

# Load dataset
df = pd.read_csv("data/online_advertising_digital_marketing_data.csv")

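# Engineer revenue target (flat 100 USD per approved conversion)
df['Revenue'] = df['Approved_Conversion'] * 100

# Row-over-row growth rate as target.
# A zero in the previous row yields ±inf, so map those to NaN before filling.
df['Revenue_Growth'] = (
    df['Revenue'].pct_change()
    .replace([np.inf, -np.inf], np.nan)
    .fillna(0)
)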
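
One caveat with a pct_change-based target is worth checking on any dataset: whenever the previous row's revenue is zero, the growth rate is infinite, and fillna(0) alone does not remove infinities. The toy series below uses made-up values purely to show the behaviour and one possible way to clean it up.

```python
import numpy as np
import pandas as pd

# Made-up revenue values; the second row is zero on purpose
revenue = pd.Series([100.0, 0.0, 200.0, 300.0])

# Raw growth: first row has no predecessor (NaN), third row divides by zero (inf)
growth_raw = revenue.pct_change()

# Treat infinite growth as missing, then fill missing values with 0
growth_clean = growth_raw.replace([np.inf, -np.inf], np.nan).fillna(0)

print(growth_raw.tolist())
print(growth_clean.tolist())
```

Filling with 0 is only one option. Dropping those rows instead may be preferable if zero-revenue rows are common, since a regression target containing infinities will make the model fit fail.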

3. Exploratory Analysis

import seaborn as sns
import matplotlib.pyplot as plt

# Inspect nonlinear patterns between Spend and Revenue_Growth
sns.scatterplot(x='Spend', y='Revenue_Growth', data=df, alpha=0.3)
plt.title('Spend vs Growth Rate')
plt.xlabel('Spend (USD)')
plt.ylabel('Revenue Growth (fraction)')  # pct_change returns a fraction, not a percentage
plt.show()

4. Define Features & Target

# Features at planning stage
X = df[['Spend','Impressions','Clicks']]
y = df['Revenue_Growth']

5. Build Pipeline with PolynomialFeatures

  • StandardScaler standardises raw spend, impressions, and clicks first, so the polynomial terms are built from inputs on a comparable scale and the Ridge penalty treats them more evenly.
  • PolynomialFeatures expands the standardised inputs into powers and interaction terms up to the chosen degree, capturing curved ROI dynamics (e.g., diminishing returns on spend).
  • Ridge (ℓ²) regularisation shrinks noisy high‑order coefficients, limiting overfitting while preserving key nonlinear patterns.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge())
])

6. Train/Test Split & Hyperparameter Search

GridSearchCV tunes the polynomial degree (1–3) and Ridge α (10⁻³–10³) with 5‑fold CV, optimising for the lowest RMSE on growth‐rate prediction.

from sklearn.model_selection import train_test_split, GridSearchCV

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7)
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5, scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)
print("Best params:", gs.best_params_)
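
Beyond best_params_, it can help to look at how close the runner-up settings were. The sketch below runs the same kind of search on synthetic data (the data-generating formula and the demo_ names are assumptions for illustration) and turns cv_results_ into a ranked table with RMSE as a positive number.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the three planning-stage features
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 10, size=(200, 3))
y_demo = (
    np.log1p(X_demo[:, 0])
    + 0.01 * X_demo[:, 1] * X_demo[:, 2]
    + rng.normal(0, 0.1, size=200)
)

demo_pipe = Pipeline([
    ('scale', StandardScaler()),
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge()),
])
demo_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7),
}
demo_gs = GridSearchCV(
    demo_pipe, demo_grid, cv=5, scoring='neg_root_mean_squared_error'
)
demo_gs.fit(X_demo, y_demo)

# The scorer is negated RMSE, so flip the sign for readability
cv_table = (
    pd.DataFrame(demo_gs.cv_results_)[
        ['param_poly__degree', 'param_ridge__alpha',
         'mean_test_score', 'std_test_score', 'rank_test_score']
    ]
    .assign(cv_rmse=lambda d: -d['mean_test_score'])
    .sort_values('rank_test_score')
)
print(cv_table.head())
```

If several combinations sit within one standard deviation of the best score, choosing the lower degree or the larger α among them is a reasonable way to favour the simpler model.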

7. Evaluate Model

y_pred = gs.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # squared=False was removed in newer scikit-learn releases
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE: {rmse:.4f}")
print(f"Test R²  : {r2:.3f}")
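
An RMSE figure is easier to judge next to a naive baseline. The sketch below, again on synthetic data with assumed names and an assumed response curve, compares a polynomial Ridge model against scikit-learn's DummyRegressor, which always predicts the training mean.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data: concave response to the first feature
rng = np.random.default_rng(7)
X_demo = rng.uniform(0, 10, size=(300, 3))
y_demo = np.sqrt(X_demo[:, 0]) + 0.05 * X_demo[:, 2] + rng.normal(0, 0.1, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

baseline = DummyRegressor(strategy='mean').fit(X_tr, y_tr)
model = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, include_bias=False),
    Ridge(alpha=1.0),
).fit(X_tr, y_tr)

def rmse_of(estimator):
    """Held-out RMSE for a fitted estimator."""
    return float(np.sqrt(mean_squared_error(y_te, estimator.predict(X_te))))

baseline_rmse = rmse_of(baseline)
model_rmse = rmse_of(model)
baseline_r2 = r2_score(y_te, baseline.predict(X_te))

print(f"Baseline RMSE: {baseline_rmse:.3f} (R² {baseline_r2:.3f})")
print(f"Model RMSE   : {model_rmse:.3f}")
```

A mean-only predictor scores an R² of zero or slightly below on held-out data, so a fitted model whose R² is near zero is adding little beyond the average growth rate.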

8. Inspect Key Polynomial Coefficients

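Coefficient inspection shows which terms (such as Spend², Spend × Clicks, or Clicks²) carry the most weight in the fitted model, which can point to areas for spend reallocation or creative focus. Because the inputs are standardised before expansion, the coefficients are expressed in standardised units, and the degree selected by the search determines which terms appear.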

# Feature names
poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=['Spend','Impressions','Clicks'])

# Coefficients
coefs = gs.best_estimator_.named_steps['ridge'].coef_
coeff_df = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False)

# Plot top 10
coeff_df.head(10).plot(kind='barh', figsize=(8,5))
plt.gca().invert_yaxis()
plt.title('Top Polynomial Features for Revenue Growth')
plt.xlabel('|Coefficient|')
plt.tight_layout()
plt.show()

Summary

By integrating polynomial feature engineering with Ridge regularisation in a single pipeline, we deliver a model that:

  • Predicts revenue growth rate from pre‐launch metrics, with accuracy checked on a held-out test set via RMSE and R².
  • Captures nonlinear campaign dynamics, such as diminishing returns and synergy effects, while Ridge regularisation limits overfitting.
  • Provides interpretable insights: the most influential polynomial terms point to spend levels and engagement metrics worth prioritising to maximise return on ad investment.


ProjectGurukul Team
