Advertising Sales Impact Prediction using Linear Regression in ML

To promote product sales, companies invest in several advertising channels, such as television, radio, and online ads. However, determining how much each channel contributes to sales can be challenging. In this project, we will build a linear regression model that predicts sales from advertising spend across these channels. By fitting a line (or hyperplane) to historical spend versus sales data, the model estimates which channels are most strongly associated with sales and provides sales forecasts for future budgets.

Libraries Required

  • pandas: for data loading and manipulation
  • numpy: for numerical computations
  • matplotlib: for visualizing relationships and results
  • seaborn: for enhanced statistical plotting
  • scikit-learn: for building and evaluating the regression model

Dataset Link

Advertising Sales Dataset

Step by Step Code Implementation

1. Import Libraries

We load pandas and numpy for data handling, matplotlib and seaborn for informative visualizations, and scikit-learn for model training and evaluation.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

2. Loading the Dataset

The CSV file includes three input columns—TV, Radio, and Online advertising budgets—and one output column, Sales. We read it into a DataFrame.

# 'advertising.csv' contains columns: 'TV', 'Radio', 'Online', and 'Sales'
data = pd.read_csv('advertising.csv')
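
Before moving on, it can help to confirm that the file really has the columns described above. The following is a minimal sketch, not part of the original project: the helper name check_advertising_data is hypothetical, and dropping rows with missing values is an assumed cleaning choice that may not suit every dataset.

```python
import pandas as pd

# Column names follow the description of advertising.csv given above.
EXPECTED_COLUMNS = ['TV', 'Radio', 'Online', 'Sales']


def check_advertising_data(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of df, or raise if an expected column is absent."""
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    # Keep only the modelling columns and drop rows with missing values
    # (an assumption; imputation may be preferable for some datasets).
    return df[EXPECTED_COLUMNS].dropna().reset_index(drop=True)
```

With the DataFrame loaded above, this could be used as data = check_advertising_data(data), followed by data.info() to review the column types.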

3. Exploratory Data Analysis

  • Pairplot: Plots each ad spend against sales, helping us check whether the relationships look roughly linear.
  • Heatmap: Displays pairwise correlation coefficients, indicating which channels have the strongest linear relationship with sales.

# Show first few records
print(data.head())

# Pairplot to inspect relationships
sns.pairplot(data, x_vars=['TV', 'Radio', 'Online'], y_vars='Sales', height=4, aspect=1)
plt.suptitle('Advertising Spend vs Sales', y=1.02)
plt.show()

# Correlation heatmap
sns.heatmap(data.corr(), annot=True, fmt=".2f")
plt.title('Feature Correlations')
plt.show()
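
The heatmap can also be summarized numerically. As a rough sketch (the helper name sales_correlations is hypothetical and not part of the original code), the function below ranks each numeric column by the absolute value of its Pearson correlation with Sales:

```python
import pandas as pd


def sales_correlations(df: pd.DataFrame, target: str = 'Sales') -> pd.Series:
    """Correlation of each numeric column with the target, strongest first."""
    # Restrict to numeric columns so text columns cannot break corr().
    numeric = df.select_dtypes(include='number')
    corr_with_target = numeric.corr()[target].drop(target)
    # Order by magnitude so strong negative correlations are not buried.
    order = corr_with_target.abs().sort_values(ascending=False).index
    return corr_with_target.loc[order]
```

Calling print(sales_correlations(data)) would list the channels from the strongest to the weakest linear association with sales. Correlation alone does not show that a channel causes sales to rise.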

4. Preparing Features and Target

We assign the three ad-spend columns to X (features) and Sales to y (target).

X = data[['TV', 'Radio', 'Online']]  # Feature matrix: ad spends
y = data['Sales']                    # Target vector: sales volume

5. Train–Test Split

We hold out 20% of the data for testing to evaluate model generalization, using a fixed random_state for reproducibility.

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

6. Training the Linear Regression Model

LinearRegression learns one coefficient per advertising channel plus an intercept. Each coefficient estimates the change in sales for a one-unit increase in that channel's spend, with the other channels held fixed.

model = LinearRegression()
model.fit(X_train, y_train)
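
To see what the model has learned, the fitted coefficients can be paired with their column names. This is an illustrative sketch; coefficient_table is a hypothetical helper, while coef_ and intercept_ are the standard attributes of a fitted scikit-learn LinearRegression.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


def coefficient_table(model: LinearRegression, feature_names) -> pd.Series:
    """Pair each fitted coefficient with its feature, largest magnitude first."""
    coefs = pd.Series(model.coef_, index=list(feature_names))
    order = coefs.abs().sort_values(ascending=False).index
    return coefs.loc[order]
```

For the model above, print(coefficient_table(model, X_train.columns)) and print(model.intercept_) would show the estimated effect of each channel and the baseline sales level at zero spend. The coefficients are only directly comparable when all budgets are measured in the same units.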

7. Making Predictions

The trained model estimates sales for the test set budgets via predict().

y_pred = model.predict(X_test)
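
The same predict() call can forecast sales for a budget that is not in the dataset. The sketch below assumes the model was trained on a DataFrame with the columns TV, Radio, and Online, as in this project; the helper name forecast_sales is hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


def forecast_sales(model: LinearRegression, tv: float, radio: float,
                   online: float) -> float:
    """Predict sales for one planned budget across the three channels."""
    # Build a one-row frame with the same column names and order used in training.
    new_budget = pd.DataFrame([{'TV': tv, 'Radio': radio, 'Online': online}])
    return float(model.predict(new_budget)[0])
```

For example, forecast_sales(model, tv=150, radio=30, online=20) would return the predicted sales for that plan; these budget figures are made up for illustration. Forecasts for budgets far outside the range seen in training should be treated with caution.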

8. Evaluating Model Performance

  • MSE (mean squared error) is the average of the squared differences between actual and predicted sales; lower values indicate a closer fit.
  • The R² score is the proportion of sales variance explained by ad spend; a value close to 1.0 means the model accounts for most of the variation in the test set.

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.3f}")
print(f"R² Score: {r2:.3f}")
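
To make the two metrics less of a black box, they can be computed by hand with NumPy. This sketch uses a hypothetical helper, regression_metrics, and also reports RMSE (the square root of MSE), which is not in the original code but is expressed in the same units as sales.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """Compute MSE, RMSE, and R² directly from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = float(np.mean(residuals ** 2))
    # R² = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {'mse': mse, 'rmse': float(np.sqrt(mse)), 'r2': 1.0 - ss_res / ss_tot}
```

Calling regression_metrics(y_test, y_pred) should reproduce the values from mean_squared_error and r2_score up to floating-point rounding.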

9. Visualizing Actual vs Predicted

A scatter plot of actual versus predicted sales, with a 45° reference line, reveals how closely predictions match real outcomes.

plt.figure(figsize=(6,6))
plt.scatter(y_test, y_pred)
plt.plot([y.min(), y.max()], [y.min(), y.max()], 'r--', linewidth=2)
plt.xlabel('Actual Sales')
plt.ylabel('Predicted Sales')
plt.title('Actual vs Predicted Sales')
plt.show()

Summary

This linear regression project illustrates how to quantify the relationship between advertising spend on each channel and sales. By fitting historical spend and sales data, the model forecasts sales for future budgets, and its coefficients indicate which channel is associated with the largest change in sales per unit of spend.

ProjectGurukul Team
