Customer Retention Value Prediction using ElasticNet Algorithm in ML
Retention managers want an early, data‑driven estimate of customer retention value (USD). This is the total service revenue a customer is projected to generate while they remain subscribed. Historic subscriber data show that this value depends on several factors: tenure, monthly charges, service mix, contract type, payment method, the senior‑citizen flag, and household demographics (gender, partner, dependents). Many of these features are strongly collinear (longer tenure ↔ longer contracts ↔ higher accumulated charges). As a result, ordinary least squares gives unstable coefficients. Pure Lasso (ℓ¹), on the other hand, can over‑shrink and arbitrarily drop one of several correlated but relevant variables. Elastic Net (Ridge ℓ² + Lasso ℓ¹) blends stability and sparsity, producing a transparent linear model suitable for real‑time retention scoring.
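The collinearity argument can be made concrete with a minimal sketch. It runs on synthetic data, not the Telco dataset, and every value in it is an assumption chosen for illustration. One predictor is duplicated into a second column, and the sketch fits scikit-learn's `Lasso` and `ElasticNet` with the same `alpha`.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Toy data (NOT the Telco set): one informative predictor duplicated into a
# second column, an extreme stand-in for strongly collinear revenue drivers.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
x = (x - x.mean()) / x.std()            # standardise, as a scaling step would
X = np.column_stack([x, x])             # two perfectly collinear columns
y = 3.0 * x + rng.normal(scale=0.5, size=500)

lasso = Lasso(alpha=1.0).fit(X, y)
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)

# Lasso: the solution is not unique; coordinate descent tends to put the
# whole effect on one copy and leave the other at (or near) zero.
print("Lasso coefficients:      ", np.round(lasso.coef_, 3))
# Elastic Net: the ridge term makes the objective strictly convex, so the
# unique solution shares the effect equally between the two copies.
print("Elastic Net coefficients:", np.round(enet.coef_, 3))
```

This "grouping effect" is the behaviour the approach relies on. Correlated drivers are kept together rather than one being dropped arbitrarily.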
Libraries Required
| Task | Python packages / components |
|---|---|
| Core data | pandas, numpy |
| Charts | matplotlib, seaborn |
| ML workflow | scikit‑learn: ColumnTransformer, OneHotEncoder, StandardScaler, ElasticNet, GridSearchCV, Pipeline, train_test_split |
| Metrics | scikit‑learn: mean_squared_error, r2_score |
Dataset
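Telco Customer Churn (IBM sample data, available on Kaggle). It has one row per customer, about 7,000 in total. Columns cover demographics, subscribed services, contract and billing details, and the MonthlyCharges and TotalCharges revenue columns.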
Step-by-Step Code Implementation
1. Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score
2. Load and inspect data
df = pd.read_csv("Telco-Customer-Churn.csv")  # path to your copy of the Kaggle CSV (Kaggle ships it as WA_Fn-UseC_-Telco-Customer-Churn.csv)
# Convert TotalCharges to numeric, coerce errors to NaN then drop
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
df = df.dropna(subset=['TotalCharges'])
# Target: revenue accumulated so far ≈ retention value baseline
# (You may swap for a more sophisticated NPV target if available)
y = df['TotalCharges']
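In the Kaggle copy of the data, a handful of brand-new customers (tenure 0) store a blank string in TotalCharges. That is why the column loads as text. The coerce-then-drop step can be seen on a tiny frame. The three rows below are hypothetical stand-ins, not real records.

```python
import pandas as pd

# Hypothetical three-row stand-in for the Kaggle file: the third customer
# is brand new (tenure 0) and has a blank TotalCharges string.
toy = pd.DataFrame({
    "tenure": [12, 40, 0],
    "MonthlyCharges": [29.85, 56.95, 20.25],
    "TotalCharges": ["358.20", "2278.00", " "],
})

# errors='coerce' turns unparseable strings (here the blank) into NaN ...
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")
print(toy["TotalCharges"].isna().sum(), "row(s) could not be parsed")

# ... and dropna(subset=...) removes only the rows whose target is missing.
toy = toy.dropna(subset=["TotalCharges"])
print(toy.shape)
```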
3. Feature matrix
X = df[['gender', 'SeniorCitizen', 'Partner', 'Dependents',
'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
'OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
'TechSupport', 'StreamingTV', 'StreamingMovies',
'Contract', 'PaperlessBilling', 'PaymentMethod',
'MonthlyCharges']]
cat_cols = X.select_dtypes(exclude='number').columns.tolist()  # text columns (object or string dtype)
num_cols = [c for c in X.columns if c not in cat_cols]
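The dtype-based split can be checked on a tiny frame. This sketch assumes a hypothetical two-row subset of the Telco columns. It uses `select_dtypes(exclude="number")`, which also catches the dedicated string dtype that newer pandas versions may use instead of `object`. Note that SeniorCitizen is stored as 0/1 integers, so it lands among the numeric (scaled) columns.

```python
import pandas as pd

# Hypothetical two-row frame with a subset of the Telco columns.
X = pd.DataFrame({
    "gender": ["Female", "Male"],
    "SeniorCitizen": [0, 1],            # 0/1 integers, not Yes/No text
    "tenure": [1, 34],
    "Contract": ["Month-to-month", "One year"],
    "MonthlyCharges": [29.85, 56.95],
})

# Anything non-numeric is treated as categorical; column order is preserved.
cat_cols = X.select_dtypes(exclude="number").columns.tolist()
num_cols = [c for c in X.columns if c not in cat_cols]
print("categorical:", cat_cols)
print("numeric:    ", num_cols)
```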
4. Elastic Net pipeline
Pre‑processing:
- Categorical predictors become one‑hot vectors and numeric predictors are z‑scaled. This puts all coefficients on a comparable scale for Elastic Net's penalty.
- Every transformation lives inside the pipeline, so it is re‑fit on each training fold during cross‑validation. This prevents information leaking from the validation folds.
preprocess = ColumnTransformer([
('cat', OneHotEncoder(drop='first'), cat_cols),
('num', StandardScaler(), num_cols)
])
pipe = Pipeline([
('prep', preprocess),
('enet', ElasticNet(max_iter=20000, random_state=42))
])
5. Train/test split and grid search
ElasticNet rationale:
- alpha controls overall shrinkage (bias‑variance trade‑off).
- l1_ratio slides between Ridge (for handling multicollinearity) and Lasso (for feature selection).
- A grid of 162 models (18 alpha values × 9 mix ratios) is evaluated with 5‑fold CV to minimise RMSE.
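For reference, scikit-learn documents the objective that `ElasticNet` minimises as follows. Here n is the number of training rows, α is `alpha`, and ρ is `l1_ratio`:

$$
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 \;+\; \alpha\,\rho\,\lVert w \rVert_1 \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
$$

Setting ρ = 1 leaves only the ℓ¹ (Lasso) penalty, and ρ = 0 leaves only the ℓ² (Ridge) penalty. The grid below stays strictly between the two (0.1 to 0.9). The intercept is not penalised.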
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=df['Contract'])
param_grid = {
'enet__alpha' : np.logspace(-3, 1, 18), # 0.001 → 10
'enet__l1_ratio': np.linspace(0.1, 0.9, 9) # Ridge‑heavy → Lasso‑heavy
}
gs = GridSearchCV(pipe, param_grid,
cv=5,
scoring='neg_root_mean_squared_error',
n_jobs=-1, verbose=1).fit(X_train, y_train)
print("Best alpha :", gs.best_params_['enet__alpha'])
print("Best l1_ratio :", gs.best_params_['enet__l1_ratio'])
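The grid size quoted above can be checked without any data by rebuilding the same `param_grid` and counting its candidates with scikit-learn's `ParameterGrid`. The fold count of 5 mirrors `cv=5`.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same grid as above, rebuilt standalone so it can be checked without data.
param_grid = {
    "enet__alpha": np.logspace(-3, 1, 18),       # 0.001 ... 10
    "enet__l1_ratio": np.linspace(0.1, 0.9, 9),  # 0.1, 0.2, ..., 0.9
}
n_candidates = len(ParameterGrid(param_grid))
n_folds = 5
print(n_candidates, "candidates x", n_folds, "folds =", n_candidates * n_folds, "fits")
```

With `verbose=1`, `GridSearchCV` reports the same count before fitting. With the default `refit=True`, it then fits the best setting once more on the full training split.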
6. Evaluate model
y_pred = gs.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # 'squared=False' was deprecated in scikit-learn 1.4 and removed in 1.6
r2 = r2_score(y_test, y_pred)
print(f"Hold‑out RMSE: ${rmse:,.2f} | R²: {r2:.3f}")
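A hold-out RMSE is easier to judge next to a naive benchmark. This sketch compares against always predicting the training mean. `rmse` and `mean_baseline_rmse` are hypothetical helper names, and the toy numbers are illustrative only, not model output.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def rmse(y_true, y_pred):
    """RMSE via np.sqrt, which works on any scikit-learn version."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))

def mean_baseline_rmse(y_train, y_test):
    """Hypothetical helper: RMSE of always predicting the training mean."""
    y_test = np.asarray(y_test, dtype=float)
    constant = np.full_like(y_test, np.mean(y_train))
    return rmse(y_test, constant)

# Toy numbers for illustration only (not model output).
print(mean_baseline_rmse([1.0, 2.0, 3.0], [2.0, 4.0]))
```

With the article's variables, `mean_baseline_rmse(y_train, y_test)` gives the benchmark that `rmse(y_test, y_pred)` should clearly beat.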
7. Interpret key drivers
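The coefficient bar chart typically shows three patterns. Each additional month of tenure adds a roughly constant dollar amount. The one‑year and two‑year contract dummies shift the value up or down relative to the month‑to‑month baseline; that is the level removed by drop='first', because OneHotEncoder sorts categories alphabetically. Higher monthly charges raise projected retention value. The CRM team can use these insights for targeted incentives. Dummy coefficients read as dollar shifts for belonging to a category. The back‑transformed numeric coefficients read as dollars per unit (per month of tenure, per dollar of monthly charge).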
# Recover column names after one‑hot encoding
ohe = gs.best_estimator_.named_steps['prep'].named_transformers_['cat']
feature_names = np.hstack([ohe.get_feature_names_out(cat_cols), num_cols])
# Convert numeric coefficients back to USD per original unit (β_z / σ)
scales = gs.best_estimator_.named_steps['prep'].named_transformers_['num'].scale_
coef = gs.best_estimator_.named_steps['enet'].coef_.copy()  # copy, otherwise the fitted model's coef_ is overwritten in place
coef[-len(num_cols):] = coef[-len(num_cols):] / scales
(pd.Series(coef, index=feature_names)
.sort_values(key=abs, ascending=False)
.head(15)
.plot(kind='barh', figsize=(9,5)))
plt.gca().invert_yaxis()
plt.xlabel('Δ Retention Value (USD)')
plt.title('Elastic Net – Top Drivers of Retention Value')
plt.tight_layout()
plt.show()
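The back-transformation can be verified directly. Dividing a standardised coefficient by the scaler's `scale_` gives the per-unit effect, with a matching intercept of b − Σ β_j μ_j / σ_j. The sketch below fits a `StandardScaler` + `ElasticNet` pipeline on synthetic data; the column ranges and true coefficients are assumptions loosely shaped like tenure and MonthlyCharges. It confirms that the raw-unit coefficients reproduce the pipeline's predictions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic numeric data (hypothetical), on very different scales.
rng = np.random.default_rng(42)
X = np.column_stack([rng.integers(0, 72, 300), rng.uniform(18, 120, 300)]).astype(float)
y = 50 * X[:, 0] + 10 * X[:, 1] + rng.normal(0, 100, 300)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("enet", ElasticNet(alpha=0.1, l1_ratio=0.5)),
]).fit(X, y)
scaler, enet = pipe.named_steps["scale"], pipe.named_steps["enet"]

# Division creates a new array, so enet.coef_ itself is left untouched.
coef_raw = enet.coef_ / scaler.scale_
intercept_raw = enet.intercept_ - np.sum(enet.coef_ * scaler.mean_ / scaler.scale_)

# In original units, the back-transformed line reproduces the pipeline.
manual = X @ coef_raw + intercept_raw
print(np.allclose(manual, pipe.predict(X)))
```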
Summary
With a few dozen lines of Python, we built a transparent Elastic Net model that:
- Predicts customer retention value early, with out‑of‑sample accuracy reported as hold‑out RMSE and R².
- Stays stable under multicollinearity while still encouraging sparsity, retaining correlated revenue drivers and trimming noise.
- Provides actionable dollar impacts of tenure, service mix, contract type, and payment method, helping marketing and finance teams invest in the highest‑value customers.