Churn prediction model

Adam Votava
Sep 7, 2021

Musing about a use case that’s been with me for a decade

No company likes to lose valuable customers. In the beginning, a company typically focuses on acquiring new clients, then grows by offering additional products to existing clients or by getting them to use its products more.

If all goes well, there comes a point when the company is large enough that it must also adopt a somewhat more defensive strategy and focus on retaining existing customers. Even with the best user experience, there will always be a group of clients who are dissatisfied and decide to leave.

The company then faces the problem of how to prevent these (voluntary) departures as effectively as possible. This is where the churn model, among others, comes to the rescue.

What is the churn model?

It’s a predictive model that estimates, for each individual customer, their propensity (or susceptibility) to leave. For each customer at any given time, it tells us how high the risk is that we will lose them in the future.

Technically, it’s a binary classifier that divides clients into two groups (classes): those who leave and those who don’t. In addition to assigning each client to one of the two groups, it will typically give us the probability that the client belongs to that group.

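It is important to note that this is the probability of belonging to the group of clients who leave. Strictly speaking, it is a propensity to leave rather than the probability of leaving: a higher score means a higher risk, but a score of 0.3 does not necessarily mean that three in ten such clients will actually leave. However, the probability of leaving can be estimated from the churn model, for example by calibrating its scores against observed outcomes.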
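
One common way to turn propensity scores into probability estimates is probability calibration. The sketch below is a hedged illustration, not the author’s method: it uses scikit-learn’s `CalibratedClassifierCV` with isotonic regression on synthetic data from `make_classification` (roughly 10% positives standing in for churners). The random forest, the data and all parameter values are assumptions chosen for the example.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for churn data: about 10% of clients "leave" (assumption).
X, y = make_classification(n_samples=4000, n_features=10, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Uncalibrated classifier: its scores rank clients by propensity to leave.
raw_model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
raw_scores = raw_model.predict_proba(X_test)[:, 1]

# Calibrated version: isotonic regression maps scores to observed churn frequencies,
# fitted on held-out folds of the training data (5-fold cross-validation).
calibrated_model = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=100, random_state=0), method="isotonic", cv=5
).fit(X_train, y_train)
churn_probability = calibrated_model.predict_proba(X_test)[:, 1]

print(f"observed churn rate in test set: {y_test.mean():.3f}")
print(f"mean calibrated probability:     {churn_probability.mean():.3f}")
```

A calibration curve (for example `sklearn.calibration.calibration_curve`) can then be used to check whether clients scored around 0.3 really leave about 30% of the time.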

What is it useful for?

By knowing which clients are at the highest risk of leaving, we can better target our rescue efforts. For example, we can reach out to these clients with a marketing campaign, reminding them that they haven’t purchased from us in a while, or even offering them a benefit.

In addition to knowing which clients to target, we can use the churn model to calculate the maximum cost of a benefit that is still worthwhile. For example, if the estimated probability of a particular client leaving is 10% and their annual revenue is $100, the expected value of their future annual revenue is $90. Therefore, an offer that typically reduces the probability of leaving to 5% (raising the expected revenue to $95) is worthwhile for this client as long as it costs no more than $5.
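
The break-even arithmetic above can be written out as a small helper. This is a minimal sketch assuming, as in the example, that the offer changes only the probability of leaving and that the client’s revenue is otherwise unaffected; the function names are illustrative.

```python
def expected_revenue(annual_revenue: float, p_leave: float) -> float:
    """Expected future annual revenue if the client leaves with probability p_leave."""
    return annual_revenue * (1.0 - p_leave)


def max_offer_cost(annual_revenue: float, p_before: float, p_after: float) -> float:
    """Highest offer cost that still pays off: the gain in expected revenue."""
    return expected_revenue(annual_revenue, p_after) - expected_revenue(annual_revenue, p_before)


without_offer = expected_revenue(100, 0.10)   # $90, as in the example
with_offer = expected_revenue(100, 0.05)      # $95
break_even = max_offer_cost(100, 0.10, 0.05)  # $5
print(f"{without_offer:.2f} {with_offer:.2f} {break_even:.2f}")  # 90.00 95.00 5.00
```

Computing `break_even` for every client shows where a given offer is worth its cost and where it is not.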

What do we need for the churn model?

Like any supervised machine learning model, a churn model needs training data with a response variable (target) and explanatory variables (features). From this training data, the model learns to capture the relationship between the features and the target as well as possible.

Typically, this is historical data in which we know which clients eventually left and which did not. Those who left have a positive target (yes, they left); the others have a negative target (no, they didn’t leave). The features, meanwhile, describe the clients at a point in time when that outcome was not yet known.
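
To make the point-in-time idea concrete, here is a minimal pandas sketch for a contractual setting. The `clients` table, its columns, the snapshot date and the 90-day horizon are all hypothetical: the feature may only use what was known on the snapshot date, while the target looks at what happened afterwards.

```python
import pandas as pd

# Hypothetical contract data: cancel_date is NaT while the client is still active.
clients = pd.DataFrame({
    "client_id": [1, 2, 3, 4],
    "signup_date": pd.to_datetime(["2019-03-01", "2020-06-15", "2018-11-20", "2021-02-01"]),
    "cancel_date": pd.to_datetime(["2021-02-10", None, "2020-12-01", None]),
})


def label_clients(df: pd.DataFrame, snapshot: str, horizon_days: int = 90) -> pd.DataFrame:
    """Target = 1 if the client cancels within horizon_days after the snapshot."""
    snapshot = pd.Timestamp(snapshot)
    end = snapshot + pd.Timedelta(days=horizon_days)
    # Only clients who were active on the snapshot date are eligible.
    active = df[(df["signup_date"] <= snapshot)
                & (df["cancel_date"].isna() | (df["cancel_date"] > snapshot))].copy()
    # Feature: uses only information known on the snapshot date.
    active["tenure_days"] = (snapshot - active["signup_date"]).dt.days
    # Target: observed after the snapshot, within the horizon (NaT compares as False).
    active["target"] = ((active["cancel_date"] > snapshot)
                        & (active["cancel_date"] <= end)).astype(int)
    return active[["client_id", "tenure_days", "target"]]


result = label_clients(clients, "2021-01-01")
print(result)
```

Client 3 had already left and client 4 had not yet joined on the snapshot date, so neither appears in the training rows.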

A properly defined target is key. In many cases this is simple (e.g., cancellation of the last product); sometimes less so (e.g., no transactions in the last three months). With a suitable definition, however, the churn model can be applied to both contractual (e.g., bank) and non-contractual (e.g., e-shop) client relationships.
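
For a non-contractual relationship such as an e-shop, the target has to be derived from behaviour. A minimal sketch, assuming a hypothetical transaction log and defining churn as no purchase during the three months following the snapshot date:

```python
import pandas as pd

# Hypothetical e-shop transaction log: one row per purchase.
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3],
    "date": pd.to_datetime(
        ["2021-05-02", "2021-07-15", "2021-04-10", "2021-06-01", "2021-06-20"]
    ),
})


def no_purchase_target(tx: pd.DataFrame, snapshot: str, months: int = 3) -> pd.Series:
    """1 = no purchase in the `months` after the snapshot (non-contractual churn)."""
    snapshot = pd.Timestamp(snapshot)
    end = snapshot + pd.DateOffset(months=months)
    # Population: customers who had purchased at least once by the snapshot date.
    customers = pd.Index(tx.loc[tx["date"] <= snapshot, "customer_id"].unique())
    # Customers who purchased again within the outcome window.
    returned = tx.loc[(tx["date"] > snapshot) & (tx["date"] <= end), "customer_id"].unique()
    return pd.Series((~customers.isin(returned)).astype(int), index=customers, name="target")


target = no_purchase_target(tx, "2021-06-30")
print(target)
```

The length of the window is a business decision: too short and occasional buyers are labelled as churners, too long and the label arrives late.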

Features include any data that can help identify clients who churn. Often this includes socio-demographic data, data on products owned, historical transactions, client-company interaction, e-commerce behaviour, and so on.
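
As an illustration of transaction-based features, the sketch below computes recency, frequency and monetary value (a common RFM-style summary, named here as one option rather than the article’s prescription) up to a snapshot date, then joins a hypothetical `profiles` table standing in for socio-demographic and product data. All table and column names are assumptions.

```python
import pandas as pd

# Hypothetical purchase history and client profiles.
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "date": pd.to_datetime(["2021-01-10", "2021-05-20", "2021-03-03",
                            "2021-02-01", "2021-04-15", "2021-06-25"]),
    "amount": [30.0, 45.0, 120.0, 15.0, 20.0, 25.0],
})
profiles = pd.DataFrame({"customer_id": [1, 2, 3], "age": [34, 51, 27], "n_products": [1, 3, 2]})


def transaction_features(tx: pd.DataFrame, snapshot: str) -> pd.DataFrame:
    """Recency, frequency and monetary value computed from history up to the snapshot."""
    snapshot = pd.Timestamp(snapshot)
    history = tx[tx["date"] <= snapshot]  # never look past the snapshot date
    feats = history.groupby("customer_id").agg(
        last_date=("date", "max"),
        frequency=("date", "count"),
        monetary=("amount", "sum"),
    )
    feats["recency_days"] = (snapshot - feats["last_date"]).dt.days
    return feats.drop(columns="last_date").reset_index()


features = transaction_features(tx, "2021-06-30").merge(profiles, on="customer_id", how="left")
print(features)
```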

It is also important to think carefully about how far in advance we want to estimate the propensity to leave. In other words, how much time should pass between the day we look at clients through the available features and the day we can tell whether they have left? If that time is too short, we won’t have enough time to respond in any way. If, on the other hand, it is too long, the model will be less accurate and its information less current.

What does such a model look like?

Modern churn models are often based on machine learning; specifically, on the binary classification algorithms mentioned above. There are a number of these algorithms, and it is necessary to test which one best fits a specific situation (the specific training data, the amount of data, etc.). Whether you use simple models such as logistic regression, more complex ones such as random forests or gradient boosting machines (GBM), or venture into neural networks, you need to pay attention to the following two things.
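
Testing which algorithm fits a given situation is usually done by cross-validating several candidates on the same data. A hedged sketch with scikit-learn on synthetic, imbalanced data (an assumption standing in for real client data): `GradientBoostingClassifier` plays the GBM role, and average precision is used as the score because it stays informative when churners are rare.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for churn data (about 10% positives).
X, y = make_classification(n_samples=3000, n_features=15, n_informative=6,
                           weights=[0.9], random_state=42)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "gbm": GradientBoostingClassifier(random_state=42),
}

# Stratified folds keep the churn rate similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="average_precision").mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

On real data the ranking can easily change, which is why the comparison has to be repeated for each situation.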

Classifiers have a variety of performance metrics. Since the churn rate is low for most companies, it is not enough to look at the accuracy of the churn model. For example, if churn is 10% and the churn model predicts that no client will leave, it will have 90% accuracy, yet it is useless. So, among other things, you need to look at sensitivity, also known as recall (the share of clients who actually left that the model detected), and precision (the share of clients flagged by the model who actually left).
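
The 10% example can be checked directly with scikit-learn’s metric functions. The second "model" below is hypothetical: it flags 15 clients, 8 of whom really left.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# 100 clients, 10 of whom actually left (10% churn).
y_true = np.array([1] * 10 + [0] * 90)

# Baseline that predicts nobody will leave.
y_never = np.zeros_like(y_true)

# Hypothetical model that flags 15 clients: 8 real leavers and 7 who stayed.
y_model = np.zeros_like(y_true)
y_model[:8] = 1     # correctly flagged leavers
y_model[10:17] = 1  # false alarms among clients who stayed

for name, y_pred in [("never leaves", y_never), ("model", y_model)]:
    acc = accuracy_score(y_true, y_pred)
    rec = recall_score(y_true, y_pred)
    prec = precision_score(y_true, y_pred, zero_division=0)  # 0 when nothing is flagged
    print(f"{name:>12}: accuracy={acc:.2f} recall={rec:.2f} precision={prec:.2f}")
```

The baseline reaches 0.90 accuracy with zero recall; the model’s accuracy is barely higher (0.91), but it detects 8 of the 10 leavers with a precision of about 0.53.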
Furthermore, it is advisable not to use the resulting model as a black box. Rather, try to understand the parameters on which its decisions are based. Not only can this reveal flaws in the model or the data, but it can also give product and marketing teams very useful information. For example, if we know that the absolute amount of a discount has less impact on churn than its relative amount, we can use this to create more effective campaigns and pricing strategies.
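
One model-agnostic way to see which inputs drive the predictions is permutation importance. The sketch below is a hedged illustration on synthetic data: the discount variables, the assumption that churn depends only on the relative discount, and the choice of logistic regression are all invented for the example. It shows how the technique can surface such a pattern, not that the pattern holds in practice.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 4000
price = rng.uniform(10, 1000, n)         # hypothetical price of the client's product
rel_discount = rng.uniform(0.0, 0.5, n)  # discount as a share of the price
abs_discount = rel_discount * price      # the same discount in currency units

# Assumed data-generating process: churn depends only on the relative discount.
logit = 1.0 - 8.0 * rel_discount
churned = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X = np.column_stack([abs_discount, rel_discount])
feature_names = ["abs_discount", "rel_discount"]
X_train, X_test, y_train, y_test = train_test_split(X, churned, test_size=0.3, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)

# Permutation importance: how much ROC AUC drops when one feature is shuffled.
result = permutation_importance(model, X_test, y_test, scoring="roc_auc",
                                n_repeats=10, random_state=0)
importance = dict(zip(feature_names, result.importances_mean))
print(importance)
```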

What next?

Once you have the churn model ready, you need to plug it into the day-to-day running of the company. This involves monitoring, evaluating and updating it on an ongoing basis (whether that means simply retraining it or adding new features).
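
Monitoring can start with checking whether the distribution of churn scores (or of key features) has drifted since training. The population stability index (PSI) is one widely used drift measure, named here as an option rather than something the article prescribes; the 0.1 and 0.25 thresholds often quoted for it are rules of thumb, and the beta-distributed scores below are synthetic.

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10):
    """PSI between a reference distribution (e.g. scores at training time) and a newer one."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Bin edges from quantiles of the reference, so each bin holds ~10% of it.
    edges = np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1))
    # Values outside the reference range are counted in the outermost bins.
    current = np.clip(current, edges[0], edges[-1])
    ref_share = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_share = np.histogram(current, bins=edges)[0] / len(current)
    # A small floor avoids division by zero and log(0) in empty bins.
    ref_share = np.clip(ref_share, 1e-6, None)
    cur_share = np.clip(cur_share, 1e-6, None)
    return float(np.sum((cur_share - ref_share) * np.log(cur_share / ref_share)))


rng = np.random.default_rng(1)
training_scores = rng.beta(2, 8, 10_000)  # synthetic scores at training time
stable_scores = rng.beta(2, 8, 10_000)    # same distribution a month later
shifted_scores = rng.beta(4, 6, 10_000)   # clients now look riskier overall

psi_stable = population_stability_index(training_scores, stable_scores)
psi_shifted = population_stability_index(training_scores, shifted_scores)
print(f"stable: {psi_stable:.3f}, shifted: {psi_shifted:.3f}")
```

A PSI above the chosen threshold is a signal to re-evaluate the model on recent outcomes and, if needed, retrain it.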

Next, you can start automatically detecting events that tend to increase the propensity to leave and that need to be responded to as quickly as possible.
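
Assuming the model rescores all clients regularly (say, daily or weekly), one simple way to surface such events is to flag clients whose score has risen sharply since the previous run. The sketch below is hypothetical: the `client_id`/`score` columns and both thresholds are illustrative choices, not values from the article.

```python
import pandas as pd


def flag_risk_jumps(previous: pd.DataFrame, current: pd.DataFrame,
                    min_increase: float = 0.15, min_score: float = 0.5) -> pd.DataFrame:
    """Return clients whose churn score rose sharply since the previous scoring run."""
    merged = previous.merge(current, on="client_id", suffixes=("_prev", "_curr"))
    merged["increase"] = merged["score_curr"] - merged["score_prev"]
    # Alert only on large jumps that also end in a high-risk score.
    alerts = merged[(merged["increase"] >= min_increase) & (merged["score_curr"] >= min_score)]
    return alerts.sort_values("increase", ascending=False)[
        ["client_id", "score_prev", "score_curr", "increase"]
    ]


previous = pd.DataFrame({"client_id": [1, 2, 3, 4], "score": [0.10, 0.40, 0.55, 0.30]})
current = pd.DataFrame({"client_id": [1, 2, 3, 4], "score": [0.12, 0.72, 0.60, 0.65]})
alerts = flag_risk_jumps(previous, current)
print(alerts)
```

Looking at what changed for the flagged clients between the two runs (a complaint, a cancelled product, a drop in activity) is how such score jumps can be traced back to concrete events.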

External data consultants can help you with both. But beware: for a churn model (even more than for other data projects), it is crucial to involve people with experience and a feel for the specific situation in the company and the industry.

***

The article was originally written in Czech and published on Bizztreat Blog. It was then translated and published on Medium: https://towardsdatascience.com/churn-prediction-model-8a3f669cc760

As ever, I’m infinitely grateful to Chelsea Wilkinson for patiently shaping my thoughts into a publishable format.

Photo by Drew Farwell on Unsplash