Ad Placement Cost Prediction with Lasso Regression in ML

Media buyers juggle dozens of variables—audience traits, creative formats, bid strategy—yet they seldom know beforehand what a single placement on a major ad network will cost. This project builds a Lasso‑regularised linear model that:

  • Predicts the placement cost (USD) for a planned ad impression bundle using features such as ad type, campaign objective, target age‑band, gender, industry vertical, day‑part, device, and estimated reach.
  • Isolates the handful of drivers that truly inflate or deflate cost: Lasso's ℓ1 penalty shrinks the coefficients of uninformative features to exactly zero, giving planners an immediate, interpretable shortlist for budget optimisation.
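
For reference, the penalty can be written out explicitly. The form below follows scikit-learn's parameterisation of the Lasso objective; the symbols are notation introduced here, not taken from the dataset: n is the number of training rows, p the number of encoded features, X the feature matrix, y the placement cost, w the coefficient vector, b the intercept, and α the penalty strength.

```latex
\min_{w,\,b}\; \frac{1}{2n}\,\lVert y - Xw - b\mathbf{1} \rVert_2^2 \;+\; \alpha\,\lVert w \rVert_1,
\qquad \lVert w \rVert_1 = \sum_{j=1}^{p} \lvert w_j \rvert
```

Because the second term grows with the absolute size of every coefficient, a larger α pushes more coefficients to exactly zero.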

Libraries Required

Purpose          Library
Data wrangling   pandas, numpy
Visualisation    matplotlib, seaborn
ML workflow      scikit‑learn (Lasso, Pipeline, ColumnTransformer, StandardScaler, OneHotEncoder, GridSearchCV)
Evaluation       scikit‑learn metrics (mean_squared_error, r2_score)

Dataset Link

Online Advertising Digital Marketing

Step-by-Step Code Implementation

1. Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score

2. Download and load the dataset

The dataset logs three months of performance data for an e‑commerce brand, including audience parameters, creative type, bid strategy, impressions, clicks, conversions, and spend. The last field, spend, becomes our placement cost target.

# One–time command (needs Kaggle API token):
# kaggle datasets download -d naniruddhan/online-advertising-digital-marketing-data -p data --unzip

df = pd.read_csv("data/online_ad_performance.csv")   # adjust filename if required

3. Target engineering

This step assumes the dataset has Spend and Impressions columns and treats Spend as the price of placing the ad block. Adjust the column names if your copy of the file differs.

y = df['Spend']                          # placement cost (USD)
X = df.drop(columns=['Spend', 'Campaign_ID'])  # drop leakage / ID

4. Pre‑processing pipeline

Categorical columns such as Age_Band, Gender, and Ad_Type are one‑hot encoded (dropping the first level to avoid the dummy‑variable trap); numeric columns like Impressions, Clicks, and CTR are z‑scaled so the Lasso penalty treats every feature on equal footing.

cat_cols = X.select_dtypes(include='object').columns
num_cols = X.select_dtypes(exclude='object').columns

preprocess = ColumnTransformer([
        # sparse_output replaces the older sparse argument (scikit-learn >= 1.2)
        ('cat', OneHotEncoder(drop='first', sparse_output=False), cat_cols),
        ('num', StandardScaler(), num_cols)
    ])

5. Train/test split

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=df['Ad_Type'])

6. Build and tune the Lasso pipeline

Wrapping scaling and modelling in a single Pipeline prevents data leakage between folds. A log‑spaced α sweep finds the optimal balance between sparsity and fit, using five‑fold CV for robustness.

pipe = Pipeline([
        ('prep', preprocess),
        ('model', Lasso(max_iter=10_000, random_state=42))
    ])

param_grid = {'model__alpha': np.logspace(-3, 1, 30)}  # 0.001 → 10
grid = GridSearchCV(pipe, param_grid, cv=5,
                    scoring='neg_root_mean_squared_error', n_jobs=-1)
grid.fit(X_train, y_train)

print("Best α:", grid.best_params_['model__alpha'])
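
The trade-off between sparsity and fit is easier to see on data where the answer is known. The sketch below uses synthetic data (not the advertising dataset) in which only three of ten features carry signal, and counts how many coefficients survive at a few α values taken from the same range as the grid above. All names here (`X_demo`, `true_coef`, and so on) are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_samples, n_features = 200, 10

# Synthetic features: only the first three influence the target.
X_demo = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [5.0, -3.0, 2.0]
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=n_samples)

nonzero_counts = []
for alpha in [0.001, 0.1, 1.0, 10.0]:
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X_demo, y_demo)
    # Count coefficients the penalty has not driven to exactly zero.
    nonzero_counts.append(int(np.sum(model.coef_ != 0)))
    print(f"alpha={alpha:<6} non-zero coefficients: {nonzero_counts[-1]}")
```

A very small α keeps nearly every feature, while a very large α zeroes everything and leaves only the intercept; cross-validation picks a value between those extremes.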

7. Evaluate model

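RMSE reports the typical prediction error in dollars (it weights large misses more heavily than a plain average would), which media buyers grasp easily, while R² gives the share of variance in cost that the model explains.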

y_pred = grid.predict(X_test)
# np.sqrt works on every scikit-learn version; squared=False is deprecated in recent releases
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2   = r2_score(y_test, y_pred)
print(f"Test RMSE: ${rmse:,.2f} | R²: {r2:.3f}")
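
To make the two metrics concrete, here is a small worked example in plain Python with four made-up cost values (they are not from the dataset). It also computes the mean absolute error for comparison, to show how a single large miss pulls RMSE above the plain average error.

```python
import math

# Made-up actual and predicted placement costs in USD.
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 190.0, 280.0]

errors = [a - p for a, p in zip(actual, predicted)]

# RMSE: square the errors, average them, take the square root.
mse_demo = sum(e ** 2 for e in errors) / len(errors)
rmse_demo = math.sqrt(mse_demo)

# MAE: plain average of the absolute errors.
mae_demo = sum(abs(e) for e in errors) / len(errors)

# R²: 1 minus (residual sum of squares / total sum of squares).
mean_actual = sum(actual) / len(actual)
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r2_demo = 1 - ss_res / ss_tot

print(f"RMSE: {rmse_demo:.2f} | MAE: {mae_demo:.2f} | R²: {r2_demo:.3f}")
# prints: RMSE: 17.32 | MAE: 15.00 | R²: 0.904
```

The $30 miss on the last row is what lifts RMSE above MAE here.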

8. Interpret coefficients

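The coefficient bar plot shows which knobs (for example mobile‑only placement, late‑night day‑part, or broad interest targeting) inflate cost the most. A coefficient of exactly zero means the feature added no predictive value at the chosen α beyond the features that were kept; it does not prove the feature is irrelevant, since Lasso tends to keep only one of a group of correlated features. Either way, zeroed features trim the analyst's focus.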

ohe = grid.best_estimator_.named_steps['prep'].named_transformers_['cat']
ohe_names = ohe.get_feature_names_out(cat_cols)
feature_names = np.concatenate([ohe_names, num_cols])

coef = grid.best_estimator_.named_steps['model'].coef_
imp = (pd.Series(coef, index=feature_names)
         .sort_values(key=abs, ascending=False))

plt.figure(figsize=(9,6))
imp.head(20).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title('Top Drivers of Ad Placement Cost (Lasso Coefficients)')
plt.xlabel('Coefficient (Δ USD)')
plt.show()
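
One caveat when reading the plot: numeric features were z‑scaled, so their coefficients are in USD per one standard deviation, not per raw unit. Dividing a coefficient by the scaler's fitted `scale_` converts it back. The sketch below shows this on synthetic data with a known slope of 0.002 USD per impression; the data and the `demo_pipe` name are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic data: spend rises by 0.002 USD per impression, plus noise.
impressions = rng.uniform(1_000, 5_000, size=500)
spend = 0.002 * impressions + rng.normal(scale=0.1, size=500)

demo_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", Lasso(alpha=0.01)),
])
demo_pipe.fit(impressions.reshape(-1, 1), spend)

# Coefficient on the scaled feature: USD per one standard deviation.
coef_per_sd = demo_pipe.named_steps["model"].coef_[0]
# Standard deviation learned by the scaler during fit.
sd = demo_pipe.named_steps["scale"].scale_[0]
# Back in raw units: USD per single impression.
coef_per_unit = coef_per_sd / sd

print(f"per SD: {coef_per_sd:.3f} USD | per impression: {coef_per_unit:.5f} USD")
```

The recovered per-impression value lands slightly below 0.002 because the ℓ1 penalty shrinks every retained coefficient a little.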

Summary

In under 120 lines of code, we produced an interpretable, cross‑validated pipeline that forecasts ad placement costs before a campaign goes live and delivers a ranked list of cost drivers. Media planners can plug hypothetical settings into the model, compare projected spend scenarios, and confidently allocate the budget toward the most efficient placements—turning ad buying from guesswork into a data-driven strategy.
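
As a sketch of the scenario comparison described above, the block below trains a small pipeline on synthetic data and scores two hypothetical placements that differ only in device. The column names `Device` and `Estimated_Reach`, the cost rule, and the `scenario_model` name are all assumptions for illustration; with the real data you would pass a scenario frame with the training columns to `grid.predict` instead.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(7)
n = 300

# Synthetic training data standing in for the real campaign log.
train = pd.DataFrame({
    "Device": rng.choice(["mobile", "desktop"], size=n),
    "Estimated_Reach": rng.uniform(1_000, 10_000, size=n),
})
# Made-up rule: cost grows with reach, and mobile adds a fixed 20 USD premium.
cost = (0.01 * train["Estimated_Reach"]
        + 20.0 * (train["Device"] == "mobile")
        + rng.normal(scale=2.0, size=n))

scenario_model = Pipeline([
    ("prep", ColumnTransformer(
        [
            ("cat", OneHotEncoder(drop="first"), ["Device"]),
            ("num", StandardScaler(), ["Estimated_Reach"]),
        ],
        sparse_threshold=0,  # dense output
    )),
    ("model", Lasso(alpha=0.01)),
])
scenario_model.fit(train, cost)

# Two hypothetical plans: same reach, different device.
scenarios = pd.DataFrame({
    "Device": ["mobile", "desktop"],
    "Estimated_Reach": [5_000.0, 5_000.0],
})
scenarios["Predicted_Cost"] = scenario_model.predict(scenarios)
print(scenarios)
```

The gap between the two predicted costs approximates the mobile premium built into the synthetic rule, which is the kind of side-by-side comparison a planner would run before committing budget.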

ProjectGurukul Team