Ad Revenue Growth Prediction with Polynomial Regression in ML
Digital marketing teams need to forecast next-period ad revenue (USD) from planning-stage metrics (channel mix, spend, impressions, clicks, and creative complexity) before campaigns launch or budgets are locked in. The relationship between spend and revenue growth is nonlinear: returns diminish as spend rises, and at high spend you may hit saturation or need bid adjustments. A pure linear model underfits these curves, whereas an unregularised high‑degree polynomial overfits. Polynomial Regression (i.e., linear regression on polynomially expanded features) with ℓ² regularisation (Ridge) can capture smooth nonlinear effects and deliver interpretable revenue forecasts to guide budget allocation.
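To make the underfitting point concrete, here is a minimal sketch on synthetic data. The saturating curve, noise level, and the helper `r2_of_polyfit` are assumptions made for illustration and do not come from the advertising dataset. It compares the in-sample R² of a straight-line fit with that of a quadratic fit.

```python
import numpy as np

# Synthetic, illustrative data only: revenue saturates as spend grows.
rng = np.random.default_rng(0)
spend = np.linspace(0, 10, 200)
revenue = 100 * (1 - np.exp(-0.4 * spend)) + rng.normal(0, 2, spend.size)

def r2_of_polyfit(x, y, degree):
    """Fit a least-squares polynomial of the given degree and return its in-sample R^2."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return 1 - residuals.var() / y.var()

r2_linear = r2_of_polyfit(spend, revenue, degree=1)
r2_quadratic = r2_of_polyfit(spend, revenue, degree=2)
print(f"R^2 linear: {r2_linear:.3f}, R^2 quadratic: {r2_quadratic:.3f}")
```

On a curve like this the quadratic fit should explain more variance than the line. In-sample R² always rises with degree, though, which is why the pipeline below relies on cross-validation and a Ridge penalty rather than on training fit alone.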
Dataset
Online Advertising Digital Marketing Data
Step-by-Step Code Implementation
1. Libraries Required
import pandas as pd                 # data handling
import numpy as np                  # numerical operations
import matplotlib.pyplot as plt     # plotting
import seaborn as sns               # visualization enhancements
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score
2. Load Data & Libraries
import pandas as pd
import numpy as np
# Load dataset
df = pd.read_csv("data/online_advertising_digital_marketing_data.csv")
# Engineer revenue target
df['Revenue'] = df['Approved_Conversion'] * 100
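# Growth rate relative to the previous row (assumes rows are in chronological order)
df['Revenue_Growth'] = df['Revenue'].pct_change()
# A zero in the previous row yields inf; treat inf and the first-row NaN as zero growth
df['Revenue_Growth'] = df['Revenue_Growth'].replace([np.inf, -np.inf], np.nan).fillna(0)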
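Because `pct_change` divides by the previous row's value, a previous revenue of zero produces an infinite growth rate, which would break the regression fit. The toy series below uses made-up numbers, not values from the dataset. It shows the behaviour and one possible cleanup; treating such rows as zero growth is an assumption, and dropping them is an equally reasonable choice.

```python
import numpy as np
import pandas as pd

# Made-up revenue values for illustration; the third entry is zero.
revenue = pd.Series([100.0, 200.0, 0.0, 300.0])

growth = revenue.pct_change()
print(growth.tolist())   # first entry is NaN, last entry is inf

# Replace infinities with NaN, then treat all undefined growth as 0.
clean = growth.replace([np.inf, -np.inf], np.nan).fillna(0)
print(clean.tolist())
```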
3. Exploratory Analysis
import seaborn as sns
import matplotlib.pyplot as plt
# Inspect nonlinear patterns between Spend and Revenue_Growth
sns.scatterplot(x='Spend', y='Revenue_Growth', data=df, alpha=0.3)
plt.title('Spend vs Growth Rate')
plt.xlabel('Spend (USD)')
plt.ylabel('Revenue Growth (fractional change)')  # pct_change returns a fraction, not a percentage
plt.show()
4. Define Features & Target
# Features at planning stage
X = df[['Spend','Impressions','Clicks']]
y = df['Revenue_Growth']
5. Build Pipeline with PolynomialFeatures
- StandardScaler standardises the raw inputs first, so spend, impressions, and clicks enter the expansion on a comparable scale and the Ridge penalty does not favour whichever feature has the largest units.
- PolynomialFeatures expands the scaled spend, impressions, and clicks into powers and interaction terms up to the chosen degree, capturing curved ROI dynamics (e.g., diminishing returns on spend).
- Ridge (ℓ²) regularisation shrinks noisy high‑order coefficients, limiting overfitting while preserving key nonlinear patterns.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
pipe = Pipeline([
('scale', StandardScaler()),
('poly', PolynomialFeatures(include_bias=False)),
('ridge', Ridge())
])
6. Train/Test Split & Hyperparameter Search
GridSearchCV tunes the polynomial degree (1–3) and Ridge α (10⁻³–10³) with 5‑fold CV, optimising for the lowest RMSE on growth‐rate prediction.
from sklearn.model_selection import train_test_split, GridSearchCV
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
param_grid = {
'poly__degree': [1, 2, 3],
'ridge__alpha': np.logspace(-3, 3, 7)
}
gs = GridSearchCV(
pipe, param_grid,
cv=5, scoring='neg_root_mean_squared_error',
n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)
print("Best params:", gs.best_params_)
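Beyond the single best setting, it can help to look at the whole grid. The sketch below runs the same kind of search on synthetic stand-in data and turns `cv_results_` into a table of positive RMSE values. The names `X_demo`, `y_demo`, `demo_pipe`, and `demo_gs` are hypothetical, and the alpha grid is shortened for speed. scikit-learn reports this scorer as a negative number, so the sign is flipped.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (not the advertising dataset): y is quadratic in the first column.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo[:, 0] ** 2 + 0.5 * X_demo[:, 1] + rng.normal(0, 0.1, 200)

demo_pipe = Pipeline([
    ('scale', StandardScaler()),
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge()),
])
demo_gs = GridSearchCV(
    demo_pipe,
    {'poly__degree': [1, 2, 3], 'ridge__alpha': [0.01, 1.0, 100.0]},
    cv=5, scoring='neg_root_mean_squared_error',
)
demo_gs.fit(X_demo, y_demo)

# One row per grid point; negate the score to get a positive RMSE.
results = pd.DataFrame(demo_gs.cv_results_)
results['cv_rmse'] = -results['mean_test_score']
summary = results[['param_poly__degree', 'param_ridge__alpha', 'cv_rmse']].sort_values('cv_rmse')
print(summary.head())
```

A table like this shows whether the best setting wins clearly or whether several degree and α combinations are nearly tied, in which case the simpler model is usually the safer pick.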
7. Evaluate Model
y_pred = gs.predict(X_test)
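# Take the square root explicitly; the squared=False argument is deprecated in recent scikit-learn releases
rmse = np.sqrt(mean_squared_error(y_test, y_pred))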
r2 = r2_score(y_test, y_pred)
print(f"Test RMSE: {rmse:.4f}")
print(f"Test R² : {r2:.3f}")
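An RMSE is easier to judge next to a naive reference. One common choice, sketched below with toy numbers, is the error from always predicting the training-set mean. The helper names `rmse_of` and `baseline_rmse` are hypothetical and not part of the original walkthrough.

```python
import numpy as np

def rmse_of(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def baseline_rmse(y_train, y_test):
    """RMSE obtained by always predicting the training-set mean."""
    constant = np.full(len(y_test), np.mean(y_train))
    return rmse_of(y_test, constant)

# Toy growth rates for illustration only.
y_train_demo = [0.0, 0.2, 0.4]
y_test_demo = [0.1, 0.3]
print(f"Baseline RMSE: {baseline_rmse(y_train_demo, y_test_demo):.3f}")
```

If the tuned model's test RMSE is not clearly below `baseline_rmse(y_train, y_test)`, the polynomial terms are adding little beyond the average growth rate.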
8. Inspect Key Polynomial Coefficients
Coefficient inspection reveals which terms—like Spend², Spend × Clicks, or Clicks²—most strongly drive predicted revenue growth, highlighting areas for spend reallocation or creative focus.
# Feature names
poly = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=['Spend','Impressions','Clicks'])
# Coefficients
coefs = gs.best_estimator_.named_steps['ridge'].coef_
coeff_df = pd.Series(coefs, index=feat_names).abs().sort_values(ascending=False)
# Plot top 10
coeff_df.head(10).plot(kind='barh', figsize=(8,5))
plt.gca().invert_yaxis()
plt.title('Top Polynomial Features for Revenue Growth')
plt.xlabel('|Coefficient|')
plt.tight_layout()
plt.show()
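Taking absolute values ranks the terms but hides whether each one raises or lowers predicted growth. A small variation, sketched below with made-up coefficients, ranks by magnitude while keeping the sign. The helper `top_terms_signed` is hypothetical.

```python
import pandas as pd

def top_terms_signed(names, coefs, k=10):
    """Return the k largest-magnitude coefficients with their original signs."""
    series = pd.Series(coefs, index=names)
    order = series.abs().sort_values(ascending=False).index
    return series.reindex(order).head(k)

# Made-up coefficients for illustration only.
demo = top_terms_signed(['Spend', 'Spend^2', 'Spend Clicks'], [0.8, -1.5, 0.3], k=2)
print(demo)
```

With the fitted model, it could be called as `top_terms_signed(feat_names, coefs)`. A negative coefficient on a squared term such as Spend² would be consistent with diminishing returns, although correlated polynomial terms make individual coefficients hard to read in isolation.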
Summary
By integrating polynomial feature engineering with Ridge regularisation in a single pipeline, we deliver a model that:
- Predicts revenue growth rate from pre‐launch metrics, with accuracy reported as RMSE and R² on a held-out test set.
- Captures nonlinear campaign dynamics (diminishing returns, synergy effects) while Ridge regularisation limits overfitting.
- Provides interpretable insights: the most influential polynomial terms point to optimal spend levels and engagement metrics for maximizing return on ad investments.