Ad Engagement Trend Prediction using Polynomial Regression in ML


Digital marketers and campaign managers need to forecast week-over-week growth (%) in ad engagement, measured by click-through rate (CTR), before making mid-week optimisations. Only early-week indicators are available: previous-week CTR, impressions, average ad position, and budget pacing. Engagement curves are nonlinear: additional impressions bring diminishing returns, ad position shows threshold effects, and budget spend interacts with ad quality. A simple linear model underfits these dynamics, while an unregularised polynomial model overfits the noise in short-term fluctuations.

Libraries Required

import pandas as pd                              # data loading & handling  
import numpy as np                               # numerical operations  

import matplotlib.pyplot as plt                  # plotting  
import seaborn as sns                            # enhanced visualization  

from sklearn.model_selection import train_test_split, GridSearchCV  
from sklearn.preprocessing import StandardScaler, PolynomialFeatures  
from sklearn.linear_model import Ridge  
from sklearn.pipeline import Pipeline  
from sklearn.metrics import mean_squared_error, r2_score  

Dataset

CTR In Advertisement

Step-by-Step Code Implementation

Load Data & Libraries

import pandas as pd
import numpy as np

# Load anonymized ad logs
df = pd.read_csv("data/advertising.csv")

# Preview key columns
df.head()[[
    'week','impressions','clicks','avg_position','cost','device_type'
]]

Target Engineering & Feature Creation

1. ctr_prev: inertia in engagement.

2. impressions: exposure scale.

3. avg_position: quality of placement (lower is better).

4. cost: spend, reflecting bid aggressiveness.

5. CTR growth target: percentage change in click‑through rate (CTR) from the previous week, isolating momentum.

# Compute CTR and week-over-week growth
df['ctr'] = df['clicks'] / df['impressions']
df = df.sort_values(['device_type','week'])

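# Lag previous-week CTR per device
df['ctr_prev'] = df.groupby('device_type')['ctr'].shift(1)
df['ctr_growth_pct'] = (df['ctr'] - df['ctr_prev']) / df['ctr_prev'] * 100

# A zero previous-week CTR makes the ratio infinite; treat it as missing
df['ctr_growth_pct'] = df['ctr_growth_pct'].replace([np.inf, -np.inf], np.nan)

# Drop first-week rows without lag, plus rows with undefined growth
df = df.dropna(subset=['ctr_growth_pct'])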

# Select features and target
X = df[['ctr_prev','impressions','avg_position','cost']]
y = df['ctr_growth_pct']
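To make the target concrete, here is a small worked example on a made-up log (the `toy` frame and its numbers are purely illustrative, not taken from the dataset). It follows the same lag-and-percentage-change steps and also shows what happens when the previous week's CTR is zero.

```python
import numpy as np
import pandas as pd

# Hypothetical toy log: two devices, three weeks each
toy = pd.DataFrame({
    'week':        [1, 2, 3, 1, 2, 3],
    'device_type': ['mobile'] * 3 + ['desktop'] * 3,
    'impressions': [1000, 1000, 1000, 500, 500, 500],
    'clicks':      [20, 25, 20, 10, 0, 5],
})

toy['ctr'] = toy['clicks'] / toy['impressions']
toy = toy.sort_values(['device_type', 'week'])

# Previous-week CTR within each device, then percentage change
toy['ctr_prev'] = toy.groupby('device_type')['ctr'].shift(1)
toy['ctr_growth_pct'] = (toy['ctr'] - toy['ctr_prev']) / toy['ctr_prev'] * 100

# Desktop week 3 follows a week with CTR = 0, so its growth is infinite;
# treat it as missing and drop it together with the first-week rows
toy['ctr_growth_pct'] = toy['ctr_growth_pct'].replace([np.inf, -np.inf], np.nan)
toy_clean = toy.dropna(subset=['ctr_growth_pct'])

print(toy_clean[['device_type', 'week', 'ctr_prev', 'ctr_growth_pct']])
```

Mobile CTR goes from 2.0% to 2.5% and back to 2.0%, which gives growth of +25% and then -20%. The asymmetry is worth noticing: equal absolute moves up and down do not give equal percentage changes.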

Build Polynomial Regression Pipeline

1. StandardScaler rescales each input feature to zero mean and unit variance before expansion, so the polynomial terms stay on comparable scales and the Ridge penalty is not dominated by large-valued features such as impressions.

2. PolynomialFeatures expands the inputs to squares and interactions (e.g., ctr_prev², ctr_prev×avg_position), capturing nonlinear saturation and synergistic effects.

3. Ridge regression applies an ℓ2 penalty (alpha) that shrinks all coefficients toward zero, controlling overfitting in the enlarged feature space.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

pipe = Pipeline([
    ('scale', StandardScaler()),            # normalize scales
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge(random_state=42))
])

Train/Test Split & Hyperparameter Search

Hyperparameter tuning: grid‑search over polynomial degree (1–3) and regularisation strength α (10⁻³…10³) via 5‑fold CV, optimising for lowest RMSE on growth forecasts.

from sklearn.model_selection import train_test_split, GridSearchCV

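# Split chronologically: order rows by week so the test set holds the most recent weeks
# (assumes 'week' sorts in time order; df was sorted by device_type above)
week_order = df['week'].sort_values(kind='stable').index
X_train, X_test, y_train, y_test = train_test_split(
    X.loc[week_order], y.loc[week_order], test_size=0.2, shuffle=False
)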

param_grid = {
    'poly__degree': [1, 2, 3],
    'ridge__alpha': np.logspace(-3, 3, 7)
}

gs = GridSearchCV(
    pipe, param_grid,
    cv=5,
    scoring='neg_root_mean_squared_error',
    n_jobs=-1, verbose=1
)
gs.fit(X_train, y_train)

print("Best polynomial degree:", gs.best_params_['poly__degree'])
print("Best Ridge α          :", gs.best_params_['ridge__alpha'])
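Because weekly ad data is time-ordered, one alternative to plain 5-fold CV is scikit-learn's TimeSeriesSplit, which always validates on rows that come after the rows used for training. The following is a minimal sketch on synthetic data, assuming the rows are already in time order; `X_demo`, `y_demo`, `pipe_demo` and `gs_demo` are illustrative names, not part of the tutorial's dataset.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the ad features (rows assumed to be in time order)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 4))
y_demo = (X_demo[:, 0] ** 2 - X_demo[:, 1] * X_demo[:, 2]
          + rng.normal(scale=0.1, size=120))

pipe_demo = Pipeline([
    ('scale', StandardScaler()),
    ('poly', PolynomialFeatures(include_bias=False)),
    ('ridge', Ridge()),
])

# Each fold trains on an initial block of rows and validates on the next block
tscv = TimeSeriesSplit(n_splits=5)

gs_demo = GridSearchCV(
    pipe_demo,
    {'poly__degree': [1, 2, 3], 'ridge__alpha': np.logspace(-3, 3, 7)},
    cv=tscv,
    scoring='neg_root_mean_squared_error',
)
gs_demo.fit(X_demo, y_demo)

print("Best params:", gs_demo.best_params_)
print("CV RMSE    :", -gs_demo.best_score_)
```

The trade-off is that early folds train on fewer rows, so CV scores can be noisier than with ordinary k-fold.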

Evaluate Model

from sklearn.metrics import mean_squared_error, r2_score

y_pred = gs.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # avoids the 'squared' argument, deprecated in recent scikit-learn
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE: {rmse:.2f} percentage points of growth")
print(f"Test R²  : {r2:.3f}")
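An RMSE figure is easier to judge next to a naive baseline. The sketch below uses made-up numbers (not results from the dataset) to compute RMSE by hand and compare it with a baseline that always predicts the mean training growth, using scikit-learn's DummyRegressor.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error

# Made-up growth values (%), for illustration only
y_train_demo = np.array([5.0, -3.0, 10.0, 0.0, 8.0])
y_test_demo  = np.array([4.0, -2.0, 6.0])
y_pred_demo  = np.array([3.0, -1.0, 7.0])   # pretend model predictions

# RMSE = square root of the mean squared error
rmse_model = np.sqrt(mean_squared_error(y_test_demo, y_pred_demo))

# Baseline: always predict the mean of the training target.
# DummyRegressor ignores the features, so zeros are used as placeholders.
baseline = DummyRegressor(strategy='mean')
baseline.fit(np.zeros((len(y_train_demo), 1)), y_train_demo)
y_base = baseline.predict(np.zeros((len(y_test_demo), 1)))
rmse_base = np.sqrt(mean_squared_error(y_test_demo, y_base))

print(f"Model RMSE   : {rmse_model:.2f}")
print(f"Baseline RMSE: {rmse_base:.2f}")
```

If the tuned pipeline does not beat this kind of baseline on the test set, the polynomial terms are probably fitting noise rather than signal.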

Inspect Key Polynomial Coefficients

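Coefficient inspection: ranking coefficients by absolute value shows which nonlinear and cross terms (such as squared prior CTR or the interaction of position and spend) most drive predicted growth, offering interpretable levers for campaign adjustment. Because the inputs are standardised before expansion, the magnitudes are roughly comparable across terms but refer to scaled features, not raw units.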

# Retrieve expanded feature names
poly       = gs.best_estimator_.named_steps['poly']
feat_names = poly.get_feature_names_out(input_features=X.columns)
coefs      = gs.best_estimator_.named_steps['ridge'].coef_

import pandas as pd
import matplotlib.pyplot as plt

coef_series = pd.Series(coefs, index=feat_names).abs() \
                    .sort_values(ascending=False)

plt.figure(figsize=(8,5))
coef_series.head(10).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title("Top Polynomial Features Driving CTR Growth")
plt.xlabel("Coefficient Magnitude")
plt.tight_layout()
plt.show()

Summary

By integrating polynomial feature engineering with Ridge regularisation, this workflow provides:

1. Nonlinear forecasts of ad engagement growth, with accuracy measured by RMSE and R² on a held-out test set.

2. Controlled model complexity, avoiding spurious high‑order effects through α tuning.

3. Actionable insights, with top polynomial features highlighting which engagement inertia, volume, position, and spend interactions most influence CTR growth—enabling data‑driven, timely campaign optimisations.


ProjectGurukul Team

The ProjectGurukul Team delivers project-based tutorials on programming, machine learning, and web development. We simplify learning by providing hands-on projects to help you master real-world skills. We also provide free major and minor projects for engineering students.
