Advertising Sales Impact Prediction using Linear Regression in ML
To promote product sales, companies invest in various advertising channels, such as television and radio. However, determining how much each medium contributes to sales can be challenging. In this project, we build a linear regression model that predicts sales from advertising spend across several channels. By fitting a line (or, with several channels, a hyperplane) to historical spend-versus-sales data, the model estimates how strongly each channel is associated with sales and provides sales forecasts for future budgets.
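Written out, the model fitted in this project (using the three channel columns TV, Radio, and Online that appear in the steps below) has the form shown here. Each slope is the expected change in sales for a one-unit increase in that channel's spend, holding the other two channels fixed. Ordinary least squares, which is what scikit-learn's LinearRegression fits, chooses the coefficients that minimise the sum of squared errors over the n training rows.

```latex
\widehat{\text{Sales}} = \hat\beta_0 + \hat\beta_1\,\text{TV} + \hat\beta_2\,\text{Radio} + \hat\beta_3\,\text{Online}

(\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3)
  = \arg\min_{\beta_0,\dots,\beta_3} \sum_{i=1}^{n}
    \bigl(\text{Sales}_i - \beta_0 - \beta_1\,\text{TV}_i - \beta_2\,\text{Radio}_i - \beta_3\,\text{Online}_i\bigr)^2
```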
Libraries Required
- pandas: for data loading and manipulation
- numpy: for numerical computations
- matplotlib: for visualizing relationships and results
- seaborn: for enhanced statistical plotting
- scikit-learn: for building and evaluating the regression model
Dataset Link
Step by Step Code Implementation
1. Import Libraries
We load pandas and numpy for data handling, matplotlib and seaborn for informative visualizations, and scikit-learn for model training and evaluation.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
2. Loading the Dataset
The CSV file contains three input columns (the TV, Radio, and Online advertising budgets) and one output column, Sales. We read it into a pandas DataFrame.
# 'advertising.csv' contains columns: 'TV', 'Radio', 'Online', and 'Sales'
data = pd.read_csv('advertising.csv')
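If you do not have advertising.csv at hand, the later snippets can be tried on a stand-in. The sketch below builds a hypothetical synthetic DataFrame with the same four columns; the helper name make_synthetic_advertising, the spend ranges, and the "true" coefficients are all invented for illustration and say nothing about the real file. It also checks that the expected columns are present, a check that is worth running on the real CSV too.

```python
import numpy as np
import pandas as pd

def make_synthetic_advertising(n=200, seed=0):
    """Hypothetical stand-in for advertising.csv (invented numbers)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "TV": rng.uniform(0, 300, n),
        "Radio": rng.uniform(0, 50, n),
        "Online": rng.uniform(0, 100, n),
    })
    # Invented linear relationship plus Gaussian noise
    df["Sales"] = (3.0 + 0.045 * df["TV"] + 0.19 * df["Radio"]
                   + 0.002 * df["Online"] + rng.normal(0, 1.0, n))
    return df

data = make_synthetic_advertising()

# Fail early if any expected column is missing
expected = {"TV", "Radio", "Online", "Sales"}
missing = expected - set(data.columns)
if missing:
    raise ValueError(f"Missing columns: {sorted(missing)}")

print(data.shape)  # (200, 4)
```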
3. Exploratory Data Analysis
- Pairplot: Visualizes the relationship between each ad spend and sales, helping check whether the trends look roughly linear.
- Heatmap: Displays pairwise correlation coefficients, indicating which channels have the strongest linear relationship with sales.
# Show first few records
print(data.head())
# Pairplot to inspect relationships
sns.pairplot(data, x_vars=['TV', 'Radio', 'Online'], y_vars='Sales', height=4, aspect=1)
plt.suptitle('Advertising Spend vs Sales', y=1.02)
plt.show()
# Correlation heatmap
sns.heatmap(data.corr(), annot=True, fmt=".2f")
plt.title('Feature Correlations')
plt.show()
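The heatmap can also be read as numbers. This minimal sketch, run on hypothetical synthetic data standing in for the CSV, ranks each channel by its Pearson correlation with Sales. Note that a correlation looks at one channel at a time; unlike the regression coefficients fitted later, it does not adjust for the other channels.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200
# Hypothetical stand-in data (invented numbers)
data = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Online": rng.uniform(0, 100, n),
})
data["Sales"] = 3.0 + 0.045 * data["TV"] + 0.19 * data["Radio"] + rng.normal(0, 1.0, n)

# Correlation of each feature with Sales, strongest first
sales_corr = (data.corr()["Sales"]
              .drop("Sales")
              .sort_values(ascending=False))
print(sales_corr.round(2))
```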
4. Preparing Features and Target
We assign the three ad-spend columns to X (features) and Sales to y (target).
X = data[['TV', 'Radio', 'Online']]  # Feature matrix: ad spends
y = data['Sales']                    # Target vector: sales volume
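The double brackets around the column list matter: scikit-learn estimators expect a 2-D feature array of shape (n_samples, n_features), while the target is 1-D. A small sketch on hypothetical random data with the article's column names shows the shapes that each kind of selection produces.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Hypothetical stand-in data with the article's column names
data = pd.DataFrame(rng.uniform(0, 100, size=(50, 4)),
                    columns=["TV", "Radio", "Online", "Sales"])

X = data[["TV", "Radio", "Online"]]  # list of names -> DataFrame, 2-D
y = data["Sales"]                    # single name -> Series, 1-D
tv_only = data[["TV"]]               # still 2-D: one feature column

print(X.shape, y.shape, tv_only.shape)  # (50, 3) (50,) (50, 1)
```

Selecting a single feature for a one-channel model therefore still needs data[["TV"]], not data["TV"].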
5. Train–Test Split
We hold out 20% of the data for testing to evaluate model generalization, using a fixed random_state for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
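To make the split sizes concrete: with test_size=0.2, scikit-learn rounds the number of test rows up, so a hypothetical 200-row dataset gives 40 test rows and 160 training rows. The placeholder arrays below are invented; the second call shows that reusing random_state=42 reproduces exactly the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays: a hypothetical 200 rows and 3 features
X = np.arange(600, dtype=float).reshape(200, 3)
y = np.arange(200, dtype=float)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 160 40

# Same random_state -> identical split on a rerun
X_train2, _, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
print(np.array_equal(X_train, X_train2))  # True
```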
6. Training the Linear Regression Model
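LinearRegression learns an intercept plus one coefficient for each advertising channel. Each coefficient quantifies the expected change in sales for a one-unit increase in that channel's spend, with the other channels held fixed.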
model = LinearRegression()
model.fit(X_train, y_train)
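After fitting, the learned values are stored in model.intercept_ and model.coef_. This sketch fits hypothetical synthetic data generated from known, invented slopes (0.045, 0.19 and 0.002) so you can see the recovered coefficients land close to them, and pairs each slope with its column name for easier reading.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 500
# Hypothetical data generated from known (invented) slopes
X = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Online": rng.uniform(0, 100, n),
})
true_coef = np.array([0.045, 0.19, 0.002])
y = 3.0 + X.to_numpy() @ true_coef + rng.normal(0, 0.5, n)

model = LinearRegression()
model.fit(X, y)

# Pair each slope with its column name
coefs = pd.Series(model.coef_, index=X.columns)
print(f"Intercept: {model.intercept_:.3f}")
print(coefs.round(4))
```

Raw slopes are only directly comparable across channels when all spends are in the same units (for example, all in thousands of dollars).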
7. Making Predictions
The trained model estimates sales for the test set budgets via predict().
y_pred = model.predict(X_test)
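For a linear model, predict() simply evaluates the fitted equation row by row: the intercept plus the dot product of each row's spend with coef_. The same call produces a forecast for a planned budget, provided the new data uses the same column names as the training data. The training data and the budget figures below are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n = 200
# Hypothetical training data (invented numbers)
X = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Online": rng.uniform(0, 100, n),
})
y = 3.0 + 0.045 * X["TV"] + 0.19 * X["Radio"] + rng.normal(0, 1.0, n)
model = LinearRegression().fit(X, y)

# predict() equals intercept + X @ coef for every row
manual = model.intercept_ + X.to_numpy() @ model.coef_
print(np.allclose(model.predict(X), manual))  # True

# Forecast for a hypothetical future budget, using the training column names
budget = pd.DataFrame({"TV": [150.0], "Radio": [25.0], "Online": [40.0]})
print(model.predict(budget))
```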
8. Evaluating Model Performance
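- Mean Squared Error (MSE) is the average of the squared differences between actual and predicted sales; lower values indicate more accurate predictions. Because the errors are squared, MSE is expressed in squared sales units.
- The R² score is the proportion of the variance in test-set sales explained by the model; values close to 1.0 indicate strong predictive power, while 0 means the model does no better than always predicting the mean.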
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.3f}")
print(f"R² Score: {r2:.3f}")
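Both metrics are short formulas: MSE is the mean of the squared residuals, and R² = 1 - SS_res / SS_tot, where SS_tot is the squared error of always predicting the mean of the actual values. This sketch computes them by hand on a few made-up numbers, checks them against scikit-learn, and also takes the square root of MSE (RMSE), which is in the same units as Sales and is therefore easier to interpret.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted sales for illustration
y_true = np.array([10.0, 12.0, 15.0, 20.0])
y_pred = np.array([11.0, 12.0, 14.0, 18.0])

residuals = y_true - y_pred
mse = np.mean(residuals ** 2)                   # mean of squared errors
ss_res = np.sum(residuals ** 2)                 # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot
rmse = np.sqrt(mse)                             # back in Sales units

print(mse, round(r2, 4), round(rmse, 4))  # 1.5 0.8943 1.2247
print(np.isclose(mse, mean_squared_error(y_true, y_pred)))  # True
print(np.isclose(r2, r2_score(y_true, y_pred)))             # True
```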
9. Visualizing Actual vs Predicted
A scatter plot of actual versus predicted sales, with a 45° reference line, reveals how closely predictions match real outcomes.
plt.figure(figsize=(6,6))
plt.scatter(y_test, y_pred)
plt.plot([y.min(), y.max()], [y.min(), y.max()], 'r--', linewidth=2)
plt.xlabel('Actual Sales')
plt.ylabel('Predicted Sales')
plt.title('Actual vs Predicted Sales')
plt.show()
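A complementary check is to look at the residuals (actual minus predicted). For ordinary least squares with an intercept, the training residuals average to zero, so what matters is whether they show a systematic pattern. The sketch below uses hypothetical data with a deliberately invented diminishing-returns (square-root) TV effect, fits a straight line anyway, and averages the residuals within TV quartiles: a consistent sign pattern across bins (here negative, positive, positive, negative) suggests curvature that the linear model misses.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 400
# Hypothetical data with an invented diminishing-returns TV effect
X = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
})
y = 3.0 + 0.8 * np.sqrt(X["TV"]) + 0.19 * X["Radio"] + rng.normal(0, 0.5, n)

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)  # actual minus predicted

# OLS with an intercept: training residuals average to (numerically) zero
print(abs(residuals.mean()) < 1e-8)  # True

# Mean residual per TV quartile; a systematic sign pattern hints at curvature
tv_bin = pd.qcut(X["TV"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])
print(residuals.groupby(tv_bin, observed=True).mean().round(2))
```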
Summary
This linear regression project illustrates how to quantify the impact of various advertising channels on sales. By modelling sales as a function of historical advertising spend, the model forecasts sales for future budgets and indicates which channel is associated with the largest sales increase per unit of spend, a useful guide when comparing the return on investment of each channel.