Customer Retention Cost Prediction with Lasso Regression in ML
Subscription businesses spend heavily on offers—such as cashbacks, plan discounts, and loyalty points—to prevent customers from churning. Yet, most teams cannot quantify exactly how much a new retention campaign should cost for a specific account. This project builds a Lasso‑regularised linear model that:
- Predicts the minimum incentive cost (USD) likely required to persuade a subscriber to stay, using their service usage, tenure, and payment behaviour.
- Identifies the small set of customer traits that truly drive retention spending, because Lasso’s ℓ1 penalty shrinks uninformative coefficients to zero.
The target variable (retention_cost) is engineered from monthly charges and tenure, which together act as a simple proxy for churn exposure, yielding a dollar estimate that marketing can act upon.
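To see the ℓ1 effect in isolation, here is a minimal sketch on synthetic data rather than the telecom dataset. The five-feature layout, the true coefficients (3 and -2), and alpha=0.5 are illustrative assumptions; only the first two features carry any signal.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (assumption for illustration): 5 standard-normal features,
# but only the first two influence the target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 5))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(size=500)

# A fairly strong penalty so that the shrinkage is easy to see.
lasso_demo = Lasso(alpha=0.5).fit(X_demo, y_demo)

for idx, coef in enumerate(lasso_demo.coef_):
    print(f"feature_{idx}: {coef:+.3f}")
```

The two informative coefficients are shrunk toward zero but survive, while the three noise features are set exactly to zero. That is the behaviour the project relies on to find the small set of real cost drivers.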
Libraries Required
| Purpose | Library |
|---|---|
| Data handling | pandas, numpy |
| Visualisation | matplotlib, seaborn |
| ML pipeline | scikit‑learn → Lasso, Pipeline, ColumnTransformer, StandardScaler, OneHotEncoder, GridSearchCV |
| Evaluation | mean_squared_error, r2_score |
Dataset Link
Telco Customer Churn on Kaggle (slug: blastchar/telco-customer-churn)
Step-by-Step Code Implementation
1. Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score
2. Download and load the dataset
The dataset combines demographic, contract, usage, and billing data for 7,043 telecom subscribers.
# One‑time download (requires Kaggle API):
# kaggle datasets download -d blastchar/telco-customer-churn -p data --unzip
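data = pd.read_csv("data/Telco-Customer-Churn.csv")  # 7,043 rows, 21 columns; adjust the path to match the unzipped file name
# TotalCharges is read as text because a few rows are blank; convert it so it is scaled, not one-hot encoded
data['TotalCharges'] = pd.to_numeric(data['TotalCharges'], errors='coerce').fillna(0)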
3. Create a “retention cost” target
Assumption: Retaining an at-risk customer usually requires ~20% of their monthly bill for each remaining month of a standard 5-year lifetime (60 months).
AVG_LIFETIME = 60      # months
INCENTIVE_RATE = 0.20  # 20% of monthly charges
data['remaining_months'] = AVG_LIFETIME - data['tenure']
data['remaining_months'] = data['remaining_months'].clip(lower=0)
data['retention_cost'] = data['MonthlyCharges'] * INCENTIVE_RATE * data['remaining_months']
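A quick worked example may make the formula easier to check by hand. The sketch below applies the same two constants to three made-up subscribers; the values are hypothetical and are not rows from the dataset.

```python
import pandas as pd

AVG_LIFETIME = 60       # months, same assumption as above
INCENTIVE_RATE = 0.20   # 20% of monthly charges

# Three made-up subscribers (hypothetical values, not rows from the dataset).
toy = pd.DataFrame({
    "MonthlyCharges": [70.0, 50.0, 100.0],
    "tenure": [12, 60, 72],
})

# Months left in the assumed lifetime; never negative.
remaining = (AVG_LIFETIME - toy["tenure"]).clip(lower=0)
toy["retention_cost"] = toy["MonthlyCharges"] * INCENTIVE_RATE * remaining
print(toy)
```

The first subscriber has 48 months remaining, so the cost is 70 × 0.20 × 48 = 672 USD. The other two have reached or passed 60 months, so the clip leaves them with zero remaining months and a cost of 0.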
4. Define features and target
The target is the dollar amount engineered in the previous step: 20% of the monthly bill for every month remaining in an assumed five-year relationship. Both constants can be adjusted for other industries. Every other column except the customerID identifier serves as a feature.
y = data['retention_cost']
X = data.drop(columns=['retention_cost', 'customerID'])  # drop identifier
5. Pre‑processing recipe
One‑hot encoding converts categorical variables (e.g., Contract, PaymentMethod) to numeric dummies; numeric columns (e.g., MonthlyCharges, tenure) are z‑scaled so Lasso’s penalty treats them equally.
cat_cols = X.select_dtypes('object').columns
num_cols = X.select_dtypes(exclude='object').columns
preprocess = ColumnTransformer([
    # sparse_output replaces the older `sparse` argument (scikit-learn 1.2+)
    ('cat', OneHotEncoder(drop='first', sparse_output=False), cat_cols),
    ('num', StandardScaler(), num_cols)
])
6. Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=data['Churn'])
7. Build & tune Lasso pipeline
A log‑spaced α grid balances sparsity and fit. Five‑fold CV mitigates variance in the modest dataset.
pipe = Pipeline([
    ('prep', preprocess),
    ('model', Lasso(max_iter=10_000, random_state=42))
])
param_grid = {'model__alpha': np.logspace(-3, 1, 30)}  # 0.001 → 10
search = GridSearchCV(pipe, param_grid, cv=5,
                      scoring='neg_root_mean_squared_error', n_jobs=-1)
search.fit(X_train, y_train)
print("Optimal α:", search.best_params_['model__alpha'])
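The trade-off between sparsity and fit can be seen by counting surviving coefficients along a coarse version of the grid. scikit-learn's Lasso minimises (1/(2n))·‖y − Xw‖² + α·‖w‖₁, so a larger α penalises coefficient size more heavily. The sketch below uses synthetic data with assumed true coefficients of 3 and -2; it illustrates the mechanism and is not a result from the telecom dataset.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (assumption): 6 features, only the first two matter.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(400, 6))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(size=400)

# Same range as the tutorial grid, but only 5 points for readability.
alphas = np.logspace(-3, 1, 5)  # 0.001, 0.01, 0.1, 1, 10
nonzero_counts = []
for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X_demo, y_demo)
    nonzero_counts.append(int(np.sum(model.coef_ != 0)))
    print(f"alpha={alpha:g}: {nonzero_counts[-1]} non-zero coefficients")
```

At the small end of the grid the model keeps nearly everything, and at α = 10 the penalty outweighs even the strongest signal in this toy data, so every coefficient is zero. Cross-validation picks the point in between that predicts best.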
8. Evaluate on hold‑out set
RMSE expresses the average incentive‑cost error in dollars, while R² shows the share of variance explained.
y_pred = search.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # squared=False is no longer accepted in recent scikit-learn
r2 = r2_score(y_test, y_pred)
print(f"Test RMSE: ${rmse:,.0f} | R²: {r2:.3f}")
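For readers who want to see what the two metrics measure, here is a small hand computation with NumPy. The four cost values are hypothetical and chosen so the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical incentive costs in USD (illustrative numbers only).
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_hat = np.array([110.0, 190.0, 310.0, 390.0])

residuals = y_true - y_hat
rmse_manual = np.sqrt(np.mean(residuals ** 2))   # root mean squared error
ss_res = np.sum(residuals ** 2)                  # variation left unexplained
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
r2_manual = 1.0 - ss_res / ss_tot

print(f"RMSE: {rmse_manual:.1f} | R²: {r2_manual:.3f}")
```

Every prediction is off by 10 USD, so the RMSE is 10. The residual sum of squares is 400 against a total of 50,000, giving R² = 1 − 400/50,000 = 0.992.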
9. Interpret coefficients
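Non-zero coefficients reveal the levers, such as month-to-month contracts (often high cost) versus two-year contracts (low cost). Because numeric features are z-scaled, a numeric coefficient is the dollar change per one standard deviation of that feature, while a dummy coefficient is the dollar change relative to the dropped baseline category. Zeroed features can be dropped from future data collection to save ETL effort.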
# Retrieve one‑hot column names
ohe = search.best_estimator_.named_steps['prep'].named_transformers_['cat']
ohe_names = ohe.get_feature_names_out(cat_cols)
feature_names = np.hstack([ohe_names, num_cols])
coefs = search.best_estimator_.named_steps['model'].coef_
importance = (pd.Series(coefs, index=feature_names)
              .sort_values(key=abs, ascending=False))
plt.figure(figsize=(9,6))
importance.head(20).plot(kind='barh')
plt.title('Top Drivers of Retention Cost (Lasso Coefficients)')
plt.gca().invert_yaxis()
plt.xlabel('Coefficient (USD change)')
plt.show()
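Separating the surviving features from the zeroed ones takes only a couple of lines. The sketch below uses a hypothetical Series shaped like `importance`; the feature names echo the dataset's columns, but the coefficient values are invented for illustration.

```python
import pandas as pd

# Hypothetical coefficients, shaped like the `importance` Series above.
toy_importance = pd.Series({
    "Contract_Two year": -310.0,
    "MonthlyCharges": 240.0,
    "tenure": -520.0,
    "gender_Male": 0.0,
    "PhoneService_Yes": 0.0,
})

# Features Lasso kept, ordered by absolute effect size.
kept = toy_importance[toy_importance != 0].sort_values(key=abs, ascending=False)
# Features whose coefficient was shrunk exactly to zero.
dropped = toy_importance[toy_importance == 0].index.tolist()

print("Kept features:\n", kept)
print("Zeroed by Lasso:", dropped)
```

On the real model, the same filter applied to `importance` would give the candidate list of columns to stop collecting.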
Summary
This Lasso-based pipeline converts raw churn data into a per-customer dollar estimate of retention spending and a ranked list of cost drivers. Marketing teams can rerun the notebook quarterly with fresh records, tweak the incentive formula, and immediately identify which customer segments require the highest budget, guiding smarter, data-backed retention campaigns.