2.2 On attrition

Attrition, or churn, has become a buzzword in recent years. Churn analysis can be seen as an economic problem for three main reasons. Firstly, customers are arguably the firm’s most valuable asset. Secondly, the firm’s resources for customer relationship management are limited, so they need to be allocated efficiently. Thirdly, churn is a risk the firm has to cope with under asymmetric information, since the firm cannot directly observe its customers’ intention to leave. With the development of advanced econometrics and data science, several methods can be used to estimate churn.

On the one hand, survival models are helpful in measuring customer lifetime. In her thesis, Pérez Marín (2006) applies duration analysis to model the behavior of customers from an insurance company with a threefold purpose:

Identify factors influencing customer loyalty.
Estimate the remaining lifetime of a client who has subscribed to multiple policies and cancelled one of them.
Study the influence of covariates changing over time.

Her study is motivated by the importance of the insurer/policyholder relationship in a digitalized environment where the cost of searching for information keeps falling and the risk of attrition rises accordingly. The author develops a two-part methodology to address her research question. She begins by restricting the sample to policyholders with at least two policies. Then, she fits a logit model to predict whether a policyholder will cancel all their policies at the same time (type 1) or sequentially (type 2). Finally, she applies duration models to type 2 clients to estimate the remaining time until all their policies are cancelled.
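To make the notion of customer lifetime concrete, the following is a minimal sketch of the Kaplan-Meier estimator, a simple nonparametric duration method. It is not the model used by Pérez Marín (2006); it only illustrates how still-active customers are handled as right-censored observations. The function name and the toy data are hypothetical.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct cancellation time.

    durations: time each customer was followed (e.g. in months)
    observed:  True if the customer cancelled at that time,
               False if still active at that time (right-censored)
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # cancellations and censorings alike
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Conditional probability of surviving past t, given at risk at t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone who left or was censored at t drops out of the risk set
        at_risk -= exits[t]
    return curve


# Hypothetical toy portfolio of eight customers (illustrative values only)
durations = [2, 3, 3, 5, 8, 8, 12, 12]
observed = [True, True, False, True, False, True, False, False]
curve = kaplan_meier(durations, observed)
```

Each pair in `curve` gives a cancellation time and the estimated probability that a customer is still in the portfolio just after it. Censored customers contribute to the risk set until they are last seen, which is what distinguishes duration analysis from a plain cancellation rate.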

On the other hand, machine learning classification algorithms can be used for churn detection, as illustrated by the work of Bellani (2019). Her objective is to develop a predictive model that detects customer churn in an insurance company while highlighting the key drivers of attrition. The underlying goals of her research are both to minimize the revenue loss caused by churn and to boost the firm’s competitiveness. Using data on vehicle insurance policies, Bellani incorporates features on the policyholders, the vehicles and the insurance policies, as well as marketing data, to predict the churn indicator variable. After missing data imputation and dimensionality reduction, the author resorts to under-sampling to overcome the issue of unbalanced classes: the dataset contains many more active policies than cancelled ones. Her methodology works as follows:

The set of active policies is divided into 7 groups equal in size to the number of cancelled policies.
For each group of active policies, classification models (logistic regression, random forest and neural network) are trained on a subset of the original dataset made up of all the cancelled policies and the corresponding group of active ones.
For each model, the predictions are aggregated across the 7 subsets for the final prediction.
Model selection is made by means of the Kappa performance metric.
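The steps above can be sketched as follows. This is an illustration under stated assumptions, not Bellani's implementation: a nearest-centroid rule stands in for the logistic regression, random forest and neural network; aggregation is assumed to be a majority vote, which the description above does not specify; the Kappa metric is assumed to be Cohen's kappa; and the data are synthetic.

```python
import random


def split_majority(majority_rows, n_groups, seed=0):
    """Shuffle the majority class and deal it into n_groups disjoint groups."""
    rows = list(majority_rows)
    random.Random(seed).shuffle(rows)
    return [rows[i::n_groups] for i in range(n_groups)]


def fit_centroids(rows):
    """Stand-in classifier: the mean feature vector of each class."""
    centroids = {}
    for label in (0, 1):
        xs = [x for x, y in rows if y == label]
        centroids[label] = [sum(col) / len(xs) for col in zip(*xs)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    dist = {label: sum((a - b) ** 2 for a, b in zip(x, c))
            for label, c in centroids.items()}
    return min(dist, key=dist.get)


def ensemble_predict(models, x):
    """Aggregate the per-group models by majority vote (an assumption)."""
    votes = sum(predict(m, x) for m in models)
    return int(2 * votes > len(models))


def cohen_kappa(y_true, y_pred):
    """Agreement beyond chance; assumes p_chance < 1."""
    n = len(y_true)
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_chance = sum((y_true.count(k) / n) * (y_pred.count(k) / n) for k in (0, 1))
    return (p_observed - p_chance) / (1 - p_chance)


# Synthetic data: 20 cancelled (label 1) and 140 active (label 0) policies
rng = random.Random(42)
cancelled = [([rng.gauss(2, 1), rng.gauss(2, 1)], 1) for _ in range(20)]
active = [([rng.gauss(0, 1), rng.gauss(0, 1)], 0) for _ in range(140)]

# 7 groups of active policies, each the size of the cancelled set
groups = split_majority(active, n_groups=7)
# One balanced model per group: all cancelled policies plus one active group
models = [fit_centroids(cancelled + group) for group in groups]

# Scored on the training data for brevity; held-out data would be used in practice
data = cancelled + active
y_true = [y for _, y in data]
y_pred = [ensemble_predict(models, x) for x, _ in data]
kappa = cohen_kappa(y_true, y_pred)
```

Every cancelled policy is seen by all seven models, while each active policy is seen by exactly one, so each model is trained on balanced classes without discarding any active policy from the ensemble as a whole.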

Ultimately, when a customer leaves the firm’s portfolio, it may be worthwhile to consider all the possible reasons why they churned. For instance, a client might leave their telecom company because of poor service quality or because the price is too high. In this context, competing risk analysis can be introduced, since its main interest is to determine the reason why the client churned. In their recent article, Slof, Frasincar, and Matsiiako (2021) try to predict both the likelihood of customer churn and the reasons for attrition using customer service data from a Dutch telecommunications service provider (TSP). They estimate duration and competing risk models. In the competing risk model, three possible outcome states are considered: Controllable risk, Uncontrollable risk and Unknown risk. Each type of risk is assumed to be independent of the others, which means a client cannot be at high risk for two risks simultaneously. In addition, the authors implement a Latent Dirichlet Allocation model (see Blei, Ng, and Jordan (2003) for more details) to identify the main topics in a set of emails sent by customers to the service center. Six topics are discovered by the algorithm, and each of them is then incorporated as an explanatory variable into the models. These topic variables increase the performance of both standard duration models and competing risk models for Controllable and Unknown risks. According to Slof, Frasincar, and Matsiiako, “customers who churn due to the Controllable risk or due to the Unknown risk tend to call the customer service center with a specific problem, while customers who churn due to the Uncontrollable risk do not call the customer service center with a specific problem”.
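As an illustration of what a competing risk analysis estimates, the sketch below computes nonparametric cumulative incidence functions, that is, the probability of having churned for a given reason by time t when other reasons compete. It is not the covariate-based model of Slof, Frasincar, and Matsiiako (2021); the function name and the toy data are hypothetical, and only the three risk labels are borrowed from the text.

```python
def cumulative_incidence(durations, causes):
    """Return {cause: [(t, CIF(t))]} for each churn cause.

    durations: follow-up time of each customer
    causes:    churn reason, or None if the customer is still active
               at that time (right-censored)
    """
    labels = sorted({c for c in causes if c is not None})
    cif = {c: 0.0 for c in labels}
    curves = {c: [] for c in labels}
    survival = 1.0  # probability of no churn of any cause before t
    at_risk = len(durations)
    for t in sorted(set(durations)):
        at_t = [c for d, c in zip(durations, causes) if d == t]
        churned = sum(c is not None for c in at_t)
        for c in labels:
            d_c = at_t.count(c)
            if d_c:
                # Still in the portfolio just before t, then churn for cause c at t
                cif[c] += survival * d_c / at_risk
                curves[c].append((t, cif[c]))
        survival *= 1 - churned / at_risk
        at_risk -= len(at_t)
    return curves


# Hypothetical toy data for eight customers (illustrative values only)
durations = [1, 2, 2, 3, 4, 5, 6, 6]
causes = ["controllable", "uncontrollable", None, "controllable",
          "unknown", None, "controllable", None]
curves = cumulative_incidence(durations, causes)
```

At any time, the cumulative incidences of the three causes and the overall survival probability sum to one, which is why churn reasons cannot be analysed one at a time by simply treating the other reasons as censoring.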

References

Bellani, Carolina. 2019. “Predictive Churn Models in Vehicle Insurance.” PhD thesis, Universidade Nova de Lisboa.

Blei, David M., Andrew Y. Ng, and Michael I. Jordan. 2003. “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3: 993–1022.

Pérez Marín, A. María. 2006. “Survival Methods for the Analysis of Customer Lifetime Duration in Insurance.” PhD thesis.

Slof, Frasincar, and Matsiiako. 2021. “A Competing Risks Model Based on Latent Dirichlet Allocation for Predicting Churn Reasons.” Decision Support Systems 146.