Customer Spending Prediction using Linear Regression in ML


E‑commerce companies log everything—from a visitor’s session length to the number of minutes they spend browsing the mobile app. Converting those behavioural breadcrumbs into a dollar‑value forecast of annual spend lets marketers target discounts wisely, finance teams budget revenue more realistically, and product managers decide whether to invest in mobile or web features.

In this project, we build a linear regression baseline that predicts a customer’s Yearly Amount Spent (USD) from four readily available engagement metrics: average session length, time on app, time on website, and length of membership.

Libraries Required

pandas # data wrangling
numpy # numerical helpers
matplotlib.pyplot # sanity‑check visuals
seaborn # quick pair‑plots / heatmaps (optional)
scikit‑learn # model building & evaluation
joblib # save the trained model

Dataset Link

E-commerce Customer Data

Step-by-Step Code Implementation

Why linear regression? For marketing teams, a transparent line with four coefficients beats a black‑box forest when explaining why the model recommends a mobile‑channel push.

1. Import core libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error
import joblib

2. Load the data

Download the CSV file from Kaggle and point to the local path:

df = pd.read_csv("Ecommerce Customers.csv")
print(df.head())
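
Before modelling, it may be worth confirming that the columns used later are present and free of missing values. The helper below is a minimal sketch: `validate_columns` is a hypothetical name, and the small inline frame is made-up data standing in for the real CSV.

```python
import pandas as pd

# Columns the rest of this walkthrough relies on
REQUIRED = ['Avg. Session Length', 'Time on App',
            'Time on Website', 'Length of Membership',
            'Yearly Amount Spent']

def validate_columns(frame, required=REQUIRED):
    """Return (missing_columns, null_counts) for the required columns."""
    missing = [c for c in required if c not in frame.columns]
    present = [c for c in required if c in frame.columns]
    null_counts = frame[present].isna().sum().to_dict()
    return missing, null_counts

# Made-up rows standing in for pd.read_csv("Ecommerce Customers.csv");
# the target column is left out on purpose to show the check firing.
demo = pd.DataFrame({
    'Avg. Session Length': [34.5, 31.9, None],
    'Time on App': [12.7, 11.1, 11.3],
    'Time on Website': [39.6, 37.3, 37.1],
    'Length of Membership': [4.1, 2.7, 4.1],
})

missing, null_counts = validate_columns(demo)
print("Missing columns:", missing)
print("Null counts:", null_counts)
```

On the real data you would call `validate_columns(df)` and expect an empty list of missing columns.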

3. Pair‑plot first

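A quick scatter‑matrix highlights prominent linear trends. Read along the Yearly Amount Spent row and note which features show the tightest linear relationship, rather than assuming in advance that a single feature such as Time on App dominates.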

# Visual feel for relationships
sns.pairplot(df[['Avg. Session Length','Time on App',
                 'Time on Website','Length of Membership',
                 'Yearly Amount Spent']])
plt.show()

print(df.describe())
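
A numeric complement to the pair-plot is to rank features by their correlation with the target. The sketch below uses synthetic data with made-up column names (`strong_driver`, `weak_driver`, `target`) purely to show the pattern; it is not the project's dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 300

# Synthetic stand-in: one feature drives the target strongly, one weakly
demo = pd.DataFrame({
    'strong_driver': rng.normal(size=n),
    'weak_driver': rng.normal(size=n),
})
demo['target'] = (5.0 * demo['strong_driver']
                  + 0.5 * demo['weak_driver']
                  + rng.normal(size=n))

# Pearson correlation of every feature with the target, highest first
corr = (demo.corr()['target']
            .drop('target')
            .sort_values(ascending=False))
print(corr)
```

For the real frame, something like `df[features + ['Yearly Amount Spent']].corr()['Yearly Amount Spent']` should give the same kind of ranking, assuming `features` is the four-column list used in this walkthrough.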

4. Feature matrix & target vector

features = ['Avg. Session Length',
            'Time on App',
            'Time on Website',
            'Length of Membership']

X = df[features]
y = df['Yearly Amount Spent']

5. Train–test split

Holding out 20% of customers as a test set means we judge performance on data the model has never seen, which is critical when generalising to next quarter’s cohort.

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
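
To see what `test_size=0.2` and `random_state=42` do, here is a small sketch on a made-up ten-row array. Fixing `random_state` should make the split reproducible, so two calls return the same held-out rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up rows with two columns, plus a dummy target
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# Same arguments twice: the split should be identical both times
a_train, a_test, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)
b_train, b_test, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)

print("train rows:", a_train.shape[0], "test rows:", a_test.shape[0])
print("identical test sets:", np.array_equal(a_test, b_test))
```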

6. Model training

linreg = LinearRegression()
linreg.fit(X_train, y_train)
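
After fitting, the model is just an intercept plus one coefficient per feature. The sketch below checks this on synthetic, noiseless data (the coefficient values are made up for illustration): the manual sum `intercept_ + X @ coef_` should match `predict`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Four made-up features and made-up "true" coefficients
X_demo = rng.normal(size=(50, 4))
true_coef = np.array([25.0, 40.0, 0.5, 60.0])
y_demo = 100.0 + X_demo @ true_coef   # noiseless linear target

model = LinearRegression().fit(X_demo, y_demo)

# A prediction is the intercept plus the weighted sum of the features
manual = model.intercept_ + X_demo @ model.coef_
print("manual matches predict:", np.allclose(manual, model.predict(X_demo)))
print("recovered coefficients:", np.round(model.coef_, 2))
```

The same two attributes, `linreg.intercept_` and `linreg.coef_`, are available on the model trained above.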

7. Metrics chosen

R² (goodness of fit) tells us what fraction of the variance in yearly spend our four features explain.
MAE (mean absolute error, in dollars) gives a wallet‑level feel: if MAE ≈ $300, finance knows forecasts miss actual spend by about $300 on average.

y_pred = linreg.predict(X_test)
print(f"R²  : {r2_score(y_test, y_pred):.3f}")
print(f"MAE : ${mean_absolute_error(y_test, y_pred):,.2f}")
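
To make the two metrics concrete, the sketch below computes them by hand with NumPy on four made-up spend values. It follows the standard definitions: MAE is the mean of the absolute errors, and R² is one minus the ratio of residual to total sum of squares.

```python
import numpy as np

# Made-up actual and predicted yearly spend (USD)
y_true = np.array([500.0, 420.0, 610.0, 470.0])
y_hat = np.array([490.0, 440.0, 600.0, 480.0])

# MAE: average size of the miss, in dollars
mae = np.mean(np.abs(y_true - y_hat))

# R²: 1 - (unexplained variation / total variation around the mean)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"MAE : ${mae:,.2f}")
print(f"R²  : {r2:.3f}")
```

These hand-rolled values should agree with `mean_absolute_error` and `r2_score` from scikit-learn on the same arrays.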

8. Interpret coefficients

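The coefficient table ranks the levers by dollars gained per extra unit of each feature. Because the features may be measured in different units, compare raw magnitudes with care. If “Time on App” dominates, the product team can justify investing in mobile UX.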

coef_df = pd.DataFrame({
    'feature': features,
    'coefficient': linreg.coef_
}).sort_values('coefficient', ascending=False)

print(coef_df)

A positive coefficient means that, holding the other features constant, each extra unit of that feature raises predicted yearly spend by the coefficient’s value in dollars; a negative coefficient would imply the opposite.
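
Raw coefficients depend on each feature's units, so one optional way to compare levers is to standardise the features first. The sketch below uses synthetic data with made-up columns (`minutes`, `years`) and is an illustration under those assumptions, not part of the original pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Two made-up features on very different scales
demo = pd.DataFrame({
    'minutes': rng.normal(30, 10, 200),   # wide spread
    'years': rng.normal(3, 0.5, 200),     # narrow spread
})
y_demo = (2.0 * demo['minutes'] + 20.0 * demo['years']
          + rng.normal(0, 1, 200))

# Raw fit: coefficient = dollars per one unit of the feature
raw = LinearRegression().fit(demo, y_demo)

# Scaled fit: coefficient = dollars per one standard deviation
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(demo, y_demo)
scaled_coef = scaled.named_steps['linearregression'].coef_

table = pd.DataFrame({
    'feature': demo.columns,
    'raw_coef': raw.coef_,
    'per_std_coef': scaled_coef,
})
print(table)
```

In this toy example the ranking flips between the two columns, which is why a per-standard-deviation view can be a useful cross-check before declaring a winner.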

9. Model persistence

Model persistence with joblib lets you drop the .pkl file into a Flask API or nightly batch job without retraining.

joblib.dump(linreg, "customer_spending_linreg.pkl")
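
The round trip can be sketched as follows: dump a fitted model, load it back, and score a new customer. The training rows, the temporary file path, and the `new_customer` values are all made up for illustration. Loading generally works best with the same scikit-learn version that saved the file.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.linear_model import LinearRegression

features = ['Avg. Session Length', 'Time on App',
            'Time on Website', 'Length of Membership']

# Made-up training rows standing in for the real dataset
train = pd.DataFrame([[34.5, 12.7, 39.6, 4.1],
                      [31.9, 11.1, 37.3, 2.7],
                      [33.0, 11.3, 37.1, 4.1],
                      [34.3, 13.7, 36.7, 3.1],
                      [33.3, 12.8, 37.5, 4.4],
                      [32.0, 12.0, 38.0, 3.5]], columns=features)
spend = [588.0, 392.0, 488.0, 582.0, 599.0, 480.0]

model = LinearRegression().fit(train, spend)

# Save to a temporary location, then load as a serving job would
path = os.path.join(tempfile.mkdtemp(), "demo_linreg.pkl")
joblib.dump(model, path)
restored = joblib.load(path)

# Score one hypothetical new customer with the restored model
new_customer = pd.DataFrame([[33.0, 12.0, 37.0, 3.5]], columns=features)
prediction = restored.predict(new_customer)
print(f"Predicted yearly spend: ${prediction[0]:,.2f}")
```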

Summary

In under fifty lines of Python, we transformed raw engagement logs into an actionable spending‑prediction tool. The linear regression delivers two wins: (1) a numeric forecast every time marketing uploads fresh behaviour metrics, and (2) easy‑to‑explain coefficients that spotlight high‑ROI levers. Keep this interpretable baseline as your benchmark; when you later explore regularised regressors or gradient boosting, you’ll know exactly how much extra accuracy the added complexity buys.
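
One way to measure what a more complex model buys is to score it and the baseline under the same cross-validation. The sketch below does this for plain linear regression versus ridge regression on synthetic data; the coefficients, noise level, and `alpha=1.0` are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)

# Synthetic stand-in for the four engagement features and yearly spend
X_demo = rng.normal(size=(200, 4))
y_demo = (X_demo @ np.array([25.0, 40.0, 0.5, 60.0])
          + rng.normal(0, 10, 200))

candidates = {
    'linear': LinearRegression(),
    'ridge': Ridge(alpha=1.0),
}

# 5-fold cross-validated MAE for each candidate (lower is better)
results = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X_demo, y_demo, cv=5,
                             scoring='neg_mean_absolute_error')
    results[name] = -scores.mean()

for name, mae in results.items():
    print(f"{name:>6}: MAE = {mae:.2f}")
```

If the challenger's MAE is not clearly lower than the baseline's, the simpler, more explainable model is probably the better choice.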
