Customer Churn Cost Prediction with Lasso Regression in ML

Telecom providers know who is likely to churn, but the monetary impact of each departing customer often remains vague. Without a dollar tag, retention budgets cannot be prioritised efficiently. This project builds a Lasso‑regularised linear model that:

Forecasts the potential revenue loss (USD) if a current subscriber churns tomorrow, using demographic, service‑usage, and billing attributes available today.
Zeroes out weak predictors via Lasso’s ℓ₁ penalty, surfacing the handful of levers that most influence churn cost and deserve proactive incentives.
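
As a rough illustration of how the ℓ₁ penalty zeroes out weak predictors, the sketch below fits scikit-learn's Lasso to synthetic data in which only two of ten features drive the target. The data, the true coefficients, and alpha=0.5 are assumptions chosen for the demo, not values from this project.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))          # 10 candidate features
# Only features 0 and 1 influence the target; the other eight are pure noise
y_demo = 5 * X_demo[:, 0] - 3 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

lasso_demo = Lasso(alpha=0.5).fit(X_demo, y_demo)
kept = np.flatnonzero(lasso_demo.coef_)      # indices of surviving features

print("Non-zero coefficients at positions:", kept)
print("Coefficients:", lasso_demo.coef_.round(2))
```

In this setup the noise features should be shrunk to exactly zero, while the two informative ones survive with somewhat reduced magnitudes. That shrinkage is the price Lasso pays for sparsity.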

Libraries Required

Purpose	Library
Data wrangling	pandas, numpy
Visualization	matplotlib, seaborn
ML workflow	scikit‑learn → ColumnTransformer, OneHotEncoder, StandardScaler, Pipeline, Lasso, GridSearchCV
Metrics	mean_squared_error, r2_score

Dataset Link

Telco Customer Churn

Step-by-Step Code Implementation

1. Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score

2. Download and load the dataset

The dataset covers 7,043 telecom subscribers, with demographics, service bundles, billing info, and a churn flag for each.

# one‑time shell command (Kaggle API key required):
# kaggle datasets download -d blastchar/telco-customer-churn -p data --unzip

# the Kaggle archive unzips to WA_Fn-UseC_-Telco-Customer-Churn.csv
df = pd.read_csv("data/WA_Fn-UseC_-Telco-Customer-Churn.csv")     # 7,043 rows, 21 columns

3. Engineer the ‘churn cost’ target

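Assumption: if a subscriber quits today, the operator loses 20% of their monthly bill for every month remaining in a typical three-year (36-month) lifetime. Customers who have already served 36 months or more are assigned zero remaining months, and therefore zero churn cost.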

AVG_LIFETIME = 36           # months
INCENTIVE_RATE = 0.20       # 20 % of monthly charges

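# Clean tenure (months already served)
df['tenure'] = pd.to_numeric(df['tenure'], errors='coerce').fillna(0).astype(int)
df['remaining_months'] = (AVG_LIFETIME - df['tenure']).clip(lower=0)

# TotalCharges loads as text because a few rows are blank; make it numeric so it is
# scaled later instead of being one-hot encoded into thousands of columns
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce').fillna(0)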

# Monetary impact *if* customer churns now
df['churn_cost'] = df['MonthlyCharges'] * INCENTIVE_RATE * df['remaining_months']
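
To make the assumption concrete, here is a small worked example of the same arithmetic for two made-up customers. The helper name churn_cost and the customer figures are hypothetical; only the formula and the default values of 36 months and 20% come from the step above.

```python
def churn_cost(monthly_charges, tenure, avg_lifetime=36, rate=0.20):
    """Revenue at risk (USD) under the tutorial's assumption."""
    remaining_months = max(avg_lifetime - tenure, 0)   # never negative
    return monthly_charges * rate * remaining_months

# Hypothetical customer A: $70/month, 12 months served -> 24 months remaining
cost_a = churn_cost(70.0, 12)
# Hypothetical customer B: already past the 36-month lifetime -> nothing at risk
cost_b = churn_cost(70.0, 48)

print(f"A: ${cost_a:.2f}")   # 70 * 0.20 * 24 = 336.00
print(f"B: ${cost_b:.2f}")   # 0.00
```

Note that long-tenure customers get a cost of zero under this definition, which may understate their value; adjusting avg_lifetime is the simplest way to change that.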

4. Define features and target

churn_cost estimates revenue at risk if a customer leaves today: MonthlyCharges × INCENTIVE_RATE × remaining_months. Rate (20%) and lifecycle length (36 months) are tunable business assumptions.

y = df['churn_cost']                          # continuous USD value
X = df.drop(columns=['churn_cost', 'customerID'])

5. Pre‑processing recipe

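OneHotEncoder converts categorical variables into 0/1 indicator columns, while StandardScaler rescales numeric columns to zero mean and unit variance. Bundling both in a ColumnTransformer that sits inside a Pipeline means they are fitted on the training folds only during cross-validation, which prevents data leakage.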

cat_cols = X.select_dtypes('object').columns            # e.g. gender, contract
num_cols = X.select_dtypes(exclude='object').columns    # tenure, charges …

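preprocess = ColumnTransformer([
        # sparse_output replaces the old `sparse` argument (scikit-learn >= 1.2);
        # handle_unknown='ignore' avoids errors on categories unseen during fit
        ('cat', OneHotEncoder(drop='first', sparse_output=False,
                              handle_unknown='ignore'), cat_cols),
        ('num', StandardScaler(), num_cols)
    ])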

6. Train/test split

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=df['Churn'])
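
The stratify argument keeps the share of churners the same in both partitions, even though the target here is a continuous cost. A minimal sketch with invented labels (80 "No", 20 "Yes") shows the effect:

```python
import numpy as np
from sklearn.model_selection import train_test_split

labels = np.array(["No"] * 80 + ["Yes"] * 20)     # 20% churners
rows = np.arange(len(labels)).reshape(-1, 1)      # stand-in feature matrix

_, _, lab_train, lab_test = train_test_split(
    rows, labels, test_size=0.2, random_state=42, stratify=labels)

train_rate = (lab_train == "Yes").mean()
test_rate = (lab_test == "Yes").mean()
print(f"Churn share, train: {train_rate:.2f}, test: {test_rate:.2f}")
```

Both shares come out at 0.20, so the hold-out set has the same mix of churned and retained customers as the training data.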

7. Build & tune Lasso pipeline

A log-spaced α search (25 values from 0.001 to 10) balances sparsity against prediction error; five-fold cross-validation picks the α with the lowest RMSE.

pipe = Pipeline([
        ('prep', preprocess),
        ('model', Lasso(max_iter=10_000, random_state=42))
    ])

param_grid = {'model__alpha': np.logspace(-3, 1, 25)}   # 0.001 → 10
search = GridSearchCV(pipe, param_grid, cv=5,
                      scoring='neg_root_mean_squared_error',
                      n_jobs=-1)
search.fit(X_train, y_train)

print("Optimal α:", search.best_params_['model__alpha'])
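
For reference, the short sketch below prints what the log-spaced grid actually contains. It only re-creates the np.logspace call from param_grid; the variable names are illustrative.

```python
import numpy as np

alphas = np.logspace(-3, 1, 25)        # same grid as param_grid above
step_ratio = alphas[1] / alphas[0]     # constant factor between neighbours

print(f"{len(alphas)} values from {alphas[0]:g} to {alphas[-1]:g}")
print(f"Each alpha is about {step_ratio:.2f}x the previous one")
```

Because the spacing is multiplicative, small α values are sampled as densely as large ones. If the optimum lands on either end of the grid, widening the range is a sensible next step.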

8. Evaluate on the hold‑out set

RMSE expresses the typical prediction error in dollars (large misses are weighted more heavily); R² shows the share of variance explained.

y_pred = search.predict(X_test)
# squared=False was deprecated and later removed from mean_squared_error; take the root explicitly
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2   = r2_score(y_test, y_pred)

print(f"Test RMSE: ${rmse:,.0f} | R²: {r2:.3f}")
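
To show what the two metrics measure, this sketch computes them by hand on four made-up cost values and checks R² against scikit-learn. The numbers are invented for illustration and are not results from the model.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([100.0, 200.0, 300.0, 400.0])   # made-up churn costs (USD)
y_hat = np.array([110.0, 190.0, 320.0, 380.0])    # made-up predictions

rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))
ss_res = np.sum((y_true - y_hat) ** 2)            # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation
r2_manual = 1 - ss_res / ss_tot

print(f"RMSE: ${rmse_manual:.2f} | R²: {r2_manual:.3f}")
print("Matches sklearn:", np.isclose(r2_manual, r2_score(y_true, y_hat)))
```

Here the errors are ±10 and ±20 dollars, giving an RMSE of about $15.81 and an R² of 0.980.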

9. Interpret feature importance

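Non-zero coefficients highlight high-impact levers, e.g. “Contract = month-to-month” or high monthly charges. Because numeric features are standardised, their coefficients are in USD per one standard deviation; one-hot coefficients are in USD relative to the dropped baseline category. A zero coefficient means the feature adds no predictive value once the others are in the model, not that it is unrelated to lost revenue.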

# Retrieve one‑hot column names
ohe = search.best_estimator_.named_steps['prep'].named_transformers_['cat']
ohe_names = ohe.get_feature_names_out(cat_cols)
feature_names = np.hstack([ohe_names, num_cols])

coefs = search.best_estimator_.named_steps['model'].coef_
imp = (pd.Series(coefs, index=feature_names)
         .sort_values(key=abs, ascending=False))

plt.figure(figsize=(9,6))
imp.head(20).plot(kind='barh')
plt.gca().invert_yaxis()
plt.title('Top Drivers of Churn Cost (Lasso Coefficients)')
plt.xlabel('Coefficient (Δ USD)')
plt.show()
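
Since the numeric inputs are standardised, a coefficient for a numeric feature can be converted back to "USD per original unit" by dividing by the scaler's standard deviation. The sketch below demonstrates this on synthetic data with a known slope of 3; the data, step names, and alpha=0.01 are assumptions for the demo.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
x = rng.uniform(20, 120, size=(300, 1))   # e.g. a monthly charge in USD
y_toy = 3.0 * x[:, 0] + 50.0              # true effect: 3 USD per unit of x

toy_pipe = Pipeline([("scale", StandardScaler()),
                     ("model", Lasso(alpha=0.01))]).fit(x, y_toy)

coef_per_sd = toy_pipe.named_steps["model"].coef_[0]
sd = toy_pipe.named_steps["scale"].scale_[0]
coef_per_unit = coef_per_sd / sd          # undo the scaling

print(f"per SD: {coef_per_sd:.2f} | per unit: {coef_per_unit:.3f}")
```

The per-unit value should land very close to 3, slightly below it because of the Lasso shrinkage. In the tutorial's pipeline the same division would use the scale_ attribute of the 'num' transformer.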

Summary

This Lasso-based pipeline converts raw telecom data into a dollar-level forecast of churn impact and a ranked list of cost drivers. Retention teams can:

Prioritise expensive‑to‑lose customers for proactive offers.
Budget incentives based on expected ROI, not guesswork.
Refresh the model quarterly—thanks to the all-in-one Pipeline, a new fit is just one line of code.

By tying churn directly to money, the organisation moves from generic “save them all” tactics to precision‑guided retention economics.
