Machine Learning

Customer Churn Prediction Model

Personal Project

I built a machine learning model to predict customer churn with 81% accuracy. The model identifies at-risk customers before they cancel and reveals actionable insights for retention strategy.

Python scikit-learn Logistic Regression pandas Google Colab Claude AI
81% Accuracy
57% At-Risk Identified
The Problem

Customer churn costs subscription businesses 20-30% annually. Acquiring new customers is 5-25x more expensive than retaining existing ones. Without prediction, retention efforts are reactive and unfocused.

Why ML

Rule-based approaches miss complex patterns. Machine learning can identify which combination of factors predict churn, enabling targeted interventions before customers leave.

Key Findings

1
Short Tenure = Highest Risk
New customers (less than 6 months) are most likely to churn. Each additional month significantly decreases churn probability.
2
Month-to-Month Contracts
Customers without annual commitments show dramatically higher churn. Two-year contracts reduce churn risk by 42%.
3
Service Bundling Increases Stickiness
Customers with fewer services more likely to churn. Streaming services and multiple lines increase retention.
4
Premium Service Paradox
Fiber optic customers churn more despite higher prices. Premium customers need differentiated retention strategy.
Feature Importance Top 10 feature importance chart showing tenure, monthly charges, and contract type as strongest predictors
Confusion Matrix Confusion matrix showing model predictions: 1023 true negatives, 213 true positives, 160 false positives, 163 false negatives

Business Impact

Metric Value Implication
At-Risk Customers Identified 212 Proactive retention possible before cancellation
Annual Revenue at Risk $127,200 Based on $50/month average customer value
Retention vs Acquisition Cost 10-25x $10-20 retention campaign vs $200-500 acquisition

Technical Approach

Component Technology Rationale
Model Logistic Regression Chose interpretability over marginal accuracy gains. Stakeholders need to understand which factors drive churn to take action.
Data Processing pandas, StandardScaler One-hot encoding, feature scaling for normalized inputs
Evaluation scikit-learn Precision, recall, F1-score, confusion matrix
Environment Google Colab Jupyter notebook for interactive development
Development Claude AI Technical collaboration for rapid implementation

From Insight to Action: Businesses can address potential churn proactively based on machine learning model's identification of which combination of factors predict churn, enabling targeted interventions before customers leave.

  • Implement 90-day onboarding for new customers (save ~$200K annually)
  • Offer incentives to convert month-to-month to annual contracts (reduce churn from 27% to 22%)
  • Cross-sell streaming services to increase customer lifetime value by 20%

What I Learned

  • 💡 Interpretability over complexity. CS teams need the "why" behind churn risk, not just a prediction. A model that explains beats one that doesn't.
  • 💡 The model's value isn't the 81% accuracy. It's identifying the factors that drive churn, enabling targeted retention programs.
  • 💡 Model outputs need translation to drive action. "Contract coefficient: -0.55" becomes "two-year contracts reduce churn by 42%."