
Data mining for B2B churn and loyalty management in India and South Asia

Unintuitive UI/UX and poor customer experience are just some of the reasons behind B2B churn.

Sanket Jain, Solution Manager and Senior Inventor at IBM
Jatinder Joshi, Chief Architect at IBM
Joy Patra, Global Functional Solutioning Leader and Executive Architect at IBM
09 Aug 2022

B2C telecoms markets have illustrated the importance of churn prediction and the use of data mining to understand customer behavior. However, the problem of identifying and predicting churn can differ between B2B and B2C customers. In this paper, we review a number of B2B churn prediction solutions from other domains, and draw lessons from B2C markets to prepare a framework for B2B churn prediction for India and South Asia markets.

See figure 1 below for B2B Telecoms and Media churn predictive modeling.

Figure 1: B2B Telecom and Media churn predictive modeling.

It is important for B2B service providers to learn from B2C experiences because these fields are slowly converging.

Moreover, the B2B economy is almost twice as large as the business-to-consumer economy (source: Two Steps: A Primer on B2B Experiment, Mahima Hada, January 26th 2021, American Marketing Association).

B2C churn modelling

Two separate articles highlight some of the issues of churn in the Indian market in recent years. One reports that Airtel and Vodafone Idea lost 30 million customers while Jio added 9.4 million users in March 2019, and that India’s total wireless subscriber base fell to 1,161.8 million on March 31, 2019, shedding 21.87 million users over the previous month.

A PwC report titled “Finding value for the consumer in The Indian mobile industry” mentions the churn problem in India and suggests a switch from prepaid to a postpaid connection to avoid churn. However, in June 2018, according to a Business Today report, postpaid subscribers were shifting to prepaid connections.

If I were to dissect the two conflicting points of view from the above-mentioned articles (PwC and Business Today), it would be as follows: in India, people are hypersensitive to price changes, and are thus willing to try any offer that either reduces the price of their current plan or adds subsidies or freebies. Moreover, human beings in general want more control over their spending, and a prepaid connection offers more control to the user: the user pays for the plan up front, in a single transaction, and cannot spend beyond a fixed amount, say Rs. 400. This also reassures the user that any money fraudulently or unscrupulously taken from their account cannot exceed Rs. 400, whereas an unused or unpaid postpaid connection can result in bill shock. Another recent phenomenon is the tendency of many users to keep multiple SIMs so they can take advantage of offers, such as a 90-day package with 90 days free.

Customer profiling also raises behavioral questions: Does this person buy only when there are discounts? Does this person prefer a certain channel? Does this person buy only after having interacted with their friends on social media?

Below is an approach to solving this problem.

First, we need to define churn and the churn rate, which typically requires understanding the client's business. Then we build hypotheses, and then we typically profile customers. Some hypotheses worth testing for significance: senior citizens may not require a smartphone, while a young male may want at least three smartphones to project the image of an early adopter or to appear smart in his friend group. Prepaid customers may be characterized by a wish for anonymity, poor credit history, low usage, price sensitivity, willingness to experiment with different providers, etc. Postpaid customers, by contrast, may be characterized by affordability, a high propensity to call ‘on-net’ (i.e., family or friends within the provider's network), a richer residential location, etc.

Preparing meaningful clusters is akin to preparing derived attributes. Once customer segments have been chalked out, we move on to understanding how churn and the churn rate are defined for the service provider.
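
As an illustration of turning profiling hypotheses into derived attributes, here is a minimal Python sketch. The field names (`on_net_calls`, `plan_type`, `monthly_spend`) and the Rs. 400 threshold are illustrative assumptions, not a real schema:

```python
# Sketch: derive clustering attributes from a raw subscriber profile.
# All field names and thresholds are hypothetical.

def derive_attributes(subscriber):
    """Compute simple derived attributes for later segmentation."""
    total = subscriber["on_net_calls"] + subscriber["off_net_calls"]
    on_net_ratio = subscriber["on_net_calls"] / total if total else 0.0
    return {
        # High on-net ratio suggests a family/friends circle on the same network.
        "on_net_ratio": on_net_ratio,
        # Hypothesis: prepaid plus low spend suggests price sensitivity.
        "price_sensitive": subscriber["plan_type"] == "prepaid"
                           and subscriber["monthly_spend"] < 400,
    }

profile = {"on_net_calls": 80, "off_net_calls": 20,
           "plan_type": "prepaid", "monthly_spend": 250}
attrs = derive_attributes(profile)
```

Attributes like these feed directly into the clustering step described below.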

Churn in B2C is sometimes a healthy phenomenon, yet sometimes it catches an analyst by surprise. For instance, in Canada, customers are willing to switch communication service providers after just 21 months, just short of the 24-month service contract.

We need to find the key reasons for churn:

a) Did the customer churn because they got swayed by a mere “save 100 rupees” offer?

b) Or were they waiting for the right time to switch?

c) Did they migrate to a large city for a job? Today, Delhi, Chennai, Bangalore, New York, and Los Angeles have a high proportion of migrants, so a hypothesis can be: did migrants drive a change or churn? If the customer belongs to category a) above, then they are extremely price sensitive and should be allowed to churn.

There have been several studies using data science approaches to solve the customer churn prediction problem:

Cutting the Cord: Predicting Customer Churn for a Telecom Company

A Customer Churn Prediction Model in Telecom Industry Using Boosting

Deep Learning has been explored too.

India-specific trends are mentioned below. It appears that Reliance Jio had the lowest call drop rate as of November 16th, 2018.

Here’s how telecom operators are taking on call drops

Airtel tops in call drop rate and download speeds with Jio close behind in latest TRAI independent drive test report

All telcos, except Jio, fail Trai’s call drop test

Channel churn: TRAI’s attempts to micromanage the broadcasting sector have backfired

Can Data Analytics Stop India’s Telecom Churn Crisis?

Data analysis

In trying to ascertain churn rates, the data we get from clients may be at an aggregate level, not at an individual level. We had earlier obtained call drop rates by circle, along with the monthly net growth in the number of subscribers. Searching for authentic data on churn and the reasons contributing to it, we went to the TRAI website and found TRAI call drop data, which is by operator but not at circle-level granularity. There may have been an earlier study, but that pilot project could not deliver many benefits at the deployment stage because the churn rate in that circle was already much lower than the industry average (and perhaps because the model from that circle was replicated to the other circles). That model was based on finding leaders and followers: when churn happens, the leader takes the followers along.

Churn is not a published KPI on the TRAI website.

We want to correlate the churn data in each circle with call drop data. How many customers are lost in that period? The IBM Benchmarking Report for KPIs looks at functional and business KPIs, not churn KPIs.

Jio's arrival on the market meant a lot of subscribers were added to Jio, attracted by free data, free or very cheap calling, etc. So, now there are two segments of customers:

  1. Price sensitive (freebies only, not much loyalty).
  2. Needing more stability and reliability/wanting to continue exhibiting loyalty.

Postpaid subscribers will stay, but today network coverage is very poor for all operators, and some regions are very patchy. So, unless telcos improve their coverage (QoS), there will not be much benefit. CRM can give the number of complaints related to call drops; only the CDR will have information about the number of call drops.

Having said that, we typically design up to seven clusters when doing data profiling of telecoms subscribers. The number of clusters can go as high as twelve, such as in the retail domain, although that is rarely the case in telecoms.

Figure 2 below shows the call drop rate by circle and the subscriber growth rate by circle in India, based on data from 2011 compiled by an IBM team.

FIGURE 2: CALL DROP RATE AND SUBSCRIBER GROWTH RATE BY EACH CIRCLE IN INDIA

Next, we do data pre-processing (missing value imputation, outlier handling, etc.).

Then we filter our data to prepare a prediction modeling base. For instance, we discard a) involuntary churners and b) churners who have consistently been doing top-up but have zero MOU (minutes of usage).
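
The filtering step above can be sketched as follows; the record fields (`churn_type`, `mou`, `topups`) are hypothetical:

```python
# Sketch: filter raw churn records down to a prediction modeling base.
# Rule (a): drop involuntary churners.
# Rule (b): drop churners who keep topping up but have zero MOU.

records = [
    {"id": 1, "churn_type": "voluntary",   "mou": 120, "topups": 3},
    {"id": 2, "churn_type": "involuntary", "mou": 90,  "topups": 2},  # dropped: (a)
    {"id": 3, "churn_type": "voluntary",   "mou": 0,   "topups": 5},  # dropped: (b)
    {"id": 4, "churn_type": "voluntary",   "mou": 45,  "topups": 1},
]

modeling_base = [
    r for r in records
    if r["churn_type"] != "involuntary"          # rule (a)
    and not (r["topups"] > 0 and r["mou"] == 0)  # rule (b)
]
```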

Then, we create several hypotheses, which may require designing derived attributes. We design these feature vectors, or derived attributes, to increase the accuracy of the predictive model, and subsequently feed them into the model.

Upon doing analysis for a circle in Maharashtra, I found one of the predictors to be the ratio of incoming to outgoing calls. This was a predictor because it indicated that customers were using a new connection for outgoing calls and the older connection for receiving calls.
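
A hedged sketch of that feature: compute the incoming-to-outgoing ratio and flag SIMs kept mainly for receiving calls. The 5.0 threshold and the sample counts are illustrative assumptions, not values from the Maharashtra analysis:

```python
# Sketch: flag connections used mostly for receiving calls -- a churn signal
# per the circle analysis above. Threshold and data are illustrative.

def incoming_outgoing_ratio(incoming_calls, outgoing_calls):
    if outgoing_calls == 0:
        return float("inf")  # receive-only SIM: strongest signal
    return incoming_calls / outgoing_calls

# subscriber id -> (incoming calls, outgoing calls) over the analysis window
subscribers = {"A": (300, 20), "B": (150, 140)}
flags = {sid: incoming_outgoing_ratio(i, o) > 5.0
         for sid, (i, o) in subscribers.items()}
```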

Next, for India, we can test the following hypotheses:

  1. What is the best way to define churn and the churn rate? Then report data churn and voice churn (partial churn) separately.
  2. For each circle, find the number of complaints per month and the churn rate per month.
  3. How much churn can be attributed to the call drop rate and to network quality? A good research direction here is to identify, for each circle, the bins of call-drop frequency that correlate most strongly with voluntary churn.
  4. How sensitive are customers to call drops? For instance, customers living in rural areas may or may not be sensitive to call drops: they may be sensitive when they can easily switch at a nearby shop or vendor, making it easy for them to churn, and insensitive when they do not have much choice of other operators.
  5. Are pesky phone calls from unrecognizable numbers, despite the TrueCaller feature, causing a customer to consider churning the voice service, or the entire service, from that telecoms operator? We could use blockchain to force all callers to register themselves before attempting to call a customer. Source: IBM explores telecoms blockchain with Indian regulator – Ledger Insights – enterprise blockchain.
  6. Is the lack of innovation by Apple and Samsung, combined with the high price of their latest models, letting customers churn to Chinese mobile phones?
  7. International travelers may enjoy good call quality for a few days in a foreign country and, upon returning to India, want to churn because they find a big difference in quality between the plan abroad and what their existing service provider offers.
  8. For India, complaint attributes may give clues to customer churn: calls made to the customer center, and complaints data such as complaint type, complaints in process, etc.
  9. Additionally, we may also require the number of call drops before churn happened, whether a call drop happened right before churn, when the first call drop occurred, etc.
  10. Refer here for some features for customer churn prediction in B2C.
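
Hypothesis 3 above can be prototyped by binning subscribers by monthly call drops and comparing voluntary-churn rates per bin. Data and bin edges here are synthetic:

```python
# Sketch: churn rate per call-drop bin, per hypothesis 3. Synthetic data.

subscribers = [
    # (call drops per month, churned voluntarily?)
    (0, False), (1, False), (2, False), (3, True),
    (8, True), (9, True), (12, True), (1, False),
]

bins = {"low (0-2)": (0, 2), "mid (3-7)": (3, 7), "high (8+)": (8, 10**9)}

churn_rate = {}
for label, (lo, hi) in bins.items():
    group = [churned for drops, churned in subscribers if lo <= drops <= hi]
    # Fraction of subscribers in this bin who churned voluntarily.
    churn_rate[label] = sum(group) / len(group) if group else None
```

On real data, the bin whose churn rate departs most from the base rate is the one worth testing for significance.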

In telecoms customer-level analytics, we are often given CDR data, which has the caller number, callee number, timestamp, hour of day (to distinguish peak vs. off-peak time) and call duration. But because a customer can make many calls in a single day, we aggregate the data to daily level before doing analysis. Also, we choose data that is 0 to 60 days old because the prepaid market in India is quite dynamic.
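
A minimal sketch of the daily aggregation and the 60-day window, using illustrative CDR fields:

```python
from collections import defaultdict
from datetime import date, timedelta

# Sketch: roll raw CDRs up to one row per caller per day, keeping only the
# last 60 days. The record layout (caller, date, duration) is illustrative.

today = date(2022, 8, 9)
cdrs = [
    ("9812000001", date(2022, 8, 1), 120),
    ("9812000001", date(2022, 8, 1), 60),   # same caller, same day -> merged
    ("9812000001", date(2022, 8, 2), 30),
    ("9812000002", date(2022, 5, 1), 300),  # older than 60 days -> dropped
]

daily = defaultdict(lambda: {"calls": 0, "duration": 0})
for caller, day, duration in cdrs:
    if today - day <= timedelta(days=60):
        daily[(caller, day)]["calls"] += 1
        daily[(caller, day)]["duration"] += duration
```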

Next, from the data mart, we also get customer demographic information (age, gender, zip code, region, tariff plan ID, extra services information, loyalty indicator, number of phone connections they possess, their value segment identifier, etc.); their usage of the services (e.g., number of call drops, and voice-to-value ratio, where value is measured in rupee terms); their CUG, or Closed User Group, of friends and family whom they call most frequently; how many times they called the call centre to complain, the complaint nature and status, and the feedback they give about the services they are using; and their VAS such as SMS, Call Forward, Voice Mail, Ringtone, etc. From this, we can make some sample data for either wireless or wireline.

In [12], the authors propose a new churn prediction approach based on CC, or Collective Classification, which accounts for both the intrinsic and extrinsic factors by utilizing the local features of, and dependencies among, individuals during the prediction steps. They evaluate their CC approach using real data provided by an established mobile social networking site, with a primary focus on predicting churn in chat activities.

Their results demonstrate that using CC and social features derived from interaction records and network structure yields substantially improved prediction compared to using conventional classification and user-profile features only. One of their KPIs is the gap of inactivity between two consecutive chat sessions that a user has engaged in. They label a user as a churner if their activity in the churn window drops to some extent relative to the activity in the previous window. They keep the observation period one month ahead of the churn period, which can provide insights into a user's chat activities before they decide to churn. As their goal is prediction of chat-activity churn, they consider the construction of a chat graph, extracted from the chat session table in their database. The nodes in the chat graph represent the chat users, and the directed edges indicate the social ties between any two users. They only regard reciprocal edges as a sign of social ties between two individuals; in turn, the strength of the social tie between two nodes is encoded as an edge weight. Further, they chose a 2-month window to distinguish between past churners and the current users under observation, which allows the authors to investigate the influence of past churners on the churn propensity of the current users. Specifically, the past churners are users who appear in the train graph but not in the training instances (i.e., they never chat for more than one month).
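
The labelling rule can be sketched as follows; the 20% drop threshold is our assumption for illustration, not the exact value used in [12]:

```python
# Sketch of the churn-labelling rule in [12]: a user is a churner when chat
# activity in the churn window falls below some fraction of the previous
# window. The drop_ratio of 0.2 is an assumed threshold.

def label_churner(prev_window_sessions, churn_window_sessions, drop_ratio=0.2):
    if prev_window_sessions == 0:
        return False  # no baseline activity to decline from
    return churn_window_sessions / prev_window_sessions < drop_ratio

labels = [label_churner(50, 5),    # activity collapsed -> churner
          label_churner(50, 30),   # activity merely reduced -> not a churner
          label_churner(0, 0)]     # never active -> not labelled
```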

B2C loyalty modelling

Loyalty analytics is also a big factor in churn management and CRM. Too often, we find a customer leaving the CSP (Communications Service Provider) when least expected -- for example, once a customer has passed the ‘fledgling test’ of surviving the first 90 days (i.e., AON, or Age On Network, >= 90 days).

In fact, if that customer has survived well beyond that period, there is even more reason for the CSP to target them with offers or a tailored plan. So, why did that customer leave at that juncture? It could have been an incident or a negative (first) experience that happened several months previously. We would do better to break that journey into cohorts such as: AON <= 90 days, 90 < AON <= 135 days, 135 < AON <= 180 days, 180 < AON <= 630 days, 630 < AON <= 720 days, and AON > 720 days. We can then combine this statistical approach with spectral analysis [2].
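
The cohort bucketing can be sketched with the standard-library `bisect` module; the boundaries follow the cohorts above, and the labels are our own shorthand:

```python
import bisect

# Sketch: bucket subscribers into AON (Age On Network) cohorts.

BOUNDARIES = [90, 135, 180, 630, 720]  # upper edges of each cohort, in days
LABELS = ["<=90", "91-135", "136-180", "181-630", "631-720", ">720"]

def aon_cohort(aon_days):
    """Return the cohort label whose range contains aon_days."""
    return LABELS[bisect.bisect_left(BOUNDARIES, aon_days)]

cohorts = [aon_cohort(d) for d in (45, 90, 100, 700, 1000)]
```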

We would do well not to leave the story disjointed forever. We would need graph theory and more intelligent algorithms to come up with the next best offer for such customers. Those who come back to the same CSP within two years can be labelled as “change seekers”, “early adopters”, or even “risk takers”; they should not be merely clustered as “opportunists”.

B2B loyalty modelling

According to IBM IP titled “System and method for rare event modelling of customer loyalty in B2B and B2C contexts”, in SaaS (Software-as-a-Service) customers are not aware of what their next usage pattern will be. To avoid churn among customers at risk of not renewing their license, techniques such as license-renewal reminder emails are typically mass mailed, but we wish to target only loyal customers. Loyalty by itself in B2C is sometimes a rare event; likewise, a loyal customer failing to pass their loyalty benefits on to their kin can be a rare event. A single rare event occurrence is difficult to predict, but if there are leading events that can be defined and tracked, the shifts that cause, say, a global or a local stable point to become unstable are of interest and can be applied across the spectrum. Certain financial institutions have different plans for loyal customers, and their loyalty is converted into objective value; when this is done, the rare event is someone not transferring the loyalty benefit to any kin. Though local stability points exist as a concept and may exist in some forms, looking for possible black swan events (people still do it intuitively) and raising red flags so organizations can take steps to avoid them is a strategic business need. When loyalty points can be distributed, especially with a time value associated with the cost of keeping them longer, and customers keep the points unused, that becomes a factor for the model. Loyalty is a continuous event because of e-commerce's domination of our lives, and there are some market-driven factors as well.

According to an IBM IP titled “System and method for rare event modelling of customer loyalty in B2B and B2C contexts”, the B2B-specific KPIs are as follows: average deal size, phase of lead-generation maturity (strategic, etc.), year the company was founded, and MRR (Monthly Recurring Revenue). Behavioral KPIs are as follows: the number of emails they opened; which pages they visit; whether they tweeted; how long since they last visited; which articles they clicked to visit; whether they watched the video; whether they wrote a review. Also understand the acquisition type (controlling entity, hive-off, merger, acquisition) that both IBM and its rivals have been going after, and understand the industry and its sub-categories.

According to the same IBM IP, the loyalty-decline predictors would need to be built from data captured in a common data platform: upsell and subscription renewal are key loyalty indicators. Find what makes customers come to your business in the first place: Is it convenience? Is your business more destination based, with customers coming specifically for a certain product or service? Do customers pre-pay? Are there points-per-visit incentives available? How many tiers of loyalty have been defined? How mature are competitors in their loyalty programs? Is customer value more important than a data-driven loyalty approach? Find the reasons for the decline in loyalty: the percentage of loyalty memberships that are inactive; the percentage of customers who abandon a scheme without ever redeeming anything. Have the loyalty benefits been clearly communicated beforehand? What percentage of enterprise data is not being used for analytics? Is a co-brand in the pipeline? Does the competitor have an affiliate program?

B2B churn modelling

Considering the lack of academic research on churn prediction for B2B, the significant differences in fields of application, and the variety of methods applied in different papers, comparing B2B churn prediction methods is difficult to realize. Given this variation, interpreting the results is quite challenging, and drawing conclusions for B2B from results acquired in a B2C context is a considerable challenge as well, due to the differences between these markets. We have been making some general predictions about the performance of the different churn modelling techniques. The authors call the situation we currently find ourselves in “the great churn”, because it might require experimenting with all kinds of hypotheses and coming up with creative feature vectors to stitch together a data mining story that can be applied in a B2B setting.

There is an added temptation to apply transfer learning from B2B churn modelling in fast-moving consumer goods (FMCG) to the telecoms domain. According to [4], FMCG are considered relatively inexpensive and frequently purchased, and the high transaction volume makes customer retention, and accordingly churn prediction, more prominent. However, the fact that FMCG companies often operate in a non-contractual setting makes it more challenging as well: customers are not obliged to let companies know when they stop using their services or buying their products, so it is more difficult to determine exactly when a customer has churned. It has therefore been suggested to focus on partial churn instead of complete churn in retail settings, because customers typically defect progressively rather than abruptly discontinuing. Partial churn has a strong possibility of turning into complete churn in the long run; therefore, successfully predicting partial churn can prevent complete churn.

In addition, the authors note future research directions. In our opinion, we should also try mini-batch gradient descent, working with a flexible learning rate, to achieve the best results when compared to logistic regression or a decision tree-based approach.
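
A minimal, self-contained sketch of mini-batch gradient descent for logistic regression, the direction suggested above. The data is synthetic and one-dimensional, and the hyperparameters (learning rate, batch size, epochs) are illustrative, not tuned:

```python
import math
import random

# Sketch: logistic regression trained with mini-batch gradient descent.
random.seed(0)

# Synthetic, linearly separable data: label 1 when x > 0.
data = [(x / 10.0, 1 if x > 0 else 0) for x in range(-50, 51) if x != 0]

w, b = 0.0, 0.0
lr, batch_size = 0.5, 10
for epoch in range(200):
    random.shuffle(data)
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        gw = gb = 0.0
        for x, y in batch:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # sigmoid
            gw += (p - y) * x                          # gradient wrt w
            gb += (p - y)                              # gradient wrt b
        w -= lr * gw / len(batch)
        b -= lr * gb / len(batch)

def predict(x):
    return 1 if 1.0 / (1.0 + math.exp(-(w * x + b))) > 0.5 else 0
```

A flexible learning rate could be added by decaying `lr` per epoch; on churn data, `x` would be a feature vector rather than a scalar.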

According to “System and method for rare event modelling of customer loyalty in B2B and B2C contexts”, handling customer churn and loyalty is very important for SaaS businesses. Customers leave due to lack of scalability of the model/program, lack of adaptability to new tools and techniques, insufficient safety and governance controls, regulatory challenges, lack of dependable delivery, an insufficient level of tailored services, low responsiveness, low collaboration, the need for the product/service ceasing, easy-to-switch options, and loss of trust.

As an illustration, churn should also be viewed from the branding and loyalty aspect of a company. For instance, Apple may have created an exquisite brand with a view to retaining the most loyal customers. However, in price-conscious and economically growing countries like India and China, that loyalty is now eroding. Only now is Apple launching its previous versions for 50,000 rupees, and it may be too late. This is because a Chinese company launched four brands to capture market share in India and has now beaten both Samsung and Apple in market share because of their lower prices. That company has also ensured that there is not much cannibalization among its four brands: Oppo, Vivo, Xiaomi, and OnePlus.

Note that a similar strategy has been adopted by the Birla group in India (in the retail domain), and because of that, it owns almost all the apparel chains in India, such as Van Heusen, Louis Philippe, Zodiac, and Pantaloon.

In a study done by Qymatix, the authors propose spending time mapping your typical customer journey, getting as much input as you can from your salespeople. Quantify each stage in the journey and then cross this data with customer satisfaction or complaints, customer support cases opened, invoice payment frequency, the number of customer webinars attended, etc. The goal is to break down customer behavior and to identify the signals with the most significant impact on customer defection; Naïve Bayes can be applied for the modelling. They do this in their customer behavior analysis. They also propose customer financial modelling, where they suggest clustering your customers based on the combination of products they buy and then finding out which ones have not yet purchased your latest products. Ask your salespeople about them: why haven't they used a newly introduced product (or service)? They also mention that, unless artificial intelligence is used for predictive modelling, one significant disadvantage here is the lagging nature of the supporting sales data.

In another study, the Mosaic Data Science team was able to confirm which features were most impactful to churn, for which they may have used Random Forest in R, comparing their model with SVMs, or support vector machines. As the software firm expected, the number of product-cluster and overall service calls, the volume of service calls about products nearing the end of their maintenance contract, and whether the customer had previously cancelled contracts for other products were all predictors of cancellation of a contract for a particular product. They show the strongest predictors of a cancellation with a 6-month lead time, averaged across 1,000 independently generated decision trees.

They also found that their model gave the same accuracy as SVM when the contract end date is 15 or 18 months away.

In [13], the authors present an approach that enables the mapping of customer and end-user data based on “customer phases”, which allows the prediction model to take all critical influencing factors into consideration. In addition, they introduce a B2B customer churn prediction process based on the proposed data mapping.

Defining B2B churn from an IBM point of view:

Suppose IBM is a client of a telecoms operator, say AT&T, wherein IBM employees use AT&T products such as office phones. IBM is then a client of AT&T, consuming B2B products.

IBM employees use the corporate plan for IBM CUG calls, other IBM calls, or personal calls, so usage is a combination of B2B and B2C: personal calls are retail consumption of a B2C kind, while employee-to-employee calls are retail consumption of B2B services.

When the services or products are used specifically for B2B (e.g., a VPN, or Virtual Private Network), we should identify the city, region, location, geography, and building from which the problems or customer complaints are arriving. Find the volume of customer complaints about poor QoS (Quality of Service) if the VPN is not performing well. Then we should work to prevent this type of B2B churn by pre-empting a large volume of complaints of a similar nature rather than waiting for more complaints to accumulate.

How to do segmentation

Use k-means clustering with eight or nine clusters for B2C telco modeling.

We can use a similar number of clusters for B2B telco churn modeling as well.
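
A minimal k-means sketch with k = 8 on synthetic, already-normalized features (e.g., ARPU and data usage). A production model would use a library implementation and validate the choice of k; this only illustrates the assign-then-update loop:

```python
import random

# Sketch: k-means with k = 8 for telco segmentation. Synthetic 2-D features.
random.seed(42)
K = 8
points = [(random.random(), random.random()) for _ in range(400)]

def dist2(a, b):
    """Squared Euclidean distance; sufficient for nearest-centroid tests."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

centroids = random.sample(points, K)
for _ in range(20):  # fixed iteration budget instead of a convergence check
    # Assignment step: each point joins its nearest centroid's cluster.
    clusters = [[] for _ in range(K)]
    for p in points:
        nearest = min(range(K), key=lambda i: dist2(p, centroids[i]))
        clusters[nearest].append(p)
    # Update step: move each centroid to its cluster's mean.
    centroids = [
        (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
        if c else centroids[i]  # keep the old centroid if a cluster empties
        for i, c in enumerate(clusters)
    ]
```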

Content providers – how to design the optimal number of clusters in India and South Asia

There is massive consolidation of telcos that requires a close look at churn; it is a non-trivial problem. Content providers and service providers are converging now, which can be called a great churn problem.

We need to define OTT (over-the-top) content that is not on-demand. The steps to solve that problem can be as follows: first, define the intersection of OTT that is not on-demand. Then find how much addressable advertising is already doing this. There may have been a study in the USA mentioning that more than one thousand clusters were prepared for fifteen million customers; offers are then made to those groups. However, that is still not personalized, so some players are pursuing that level of personalization. Then ask relevant people to rate online content. When they have watched content for, say, 5 minutes, ask for payment for Advertisement 1; when they have watched for, say, 15 minutes, ask for payment for Advertisement 2. The USA may already be doing this: people watch their Netflix content on Telco A's bandwidth but are not charged for that bandwidth. Find what Netflix does with Airtel in India apart from giving subscriptions. Disney, too, had tied up with Reliance Jio in December 2018.

Note: Continuing with finding the optimal number of clusters, we should begin by profiling the customers. There may be segments of customers above sixty years of age who are mostly interested in videos with motivational content, while those in the 30-40 age group may want to watch yoga or exercise-related videos at certain times of day.

JioCinema is an on-demand video streaming platform and is in competition with the entertainment app of Bharti Airtel. Telcos are vying for more exclusive tie-ups in entertainment, sports, and news to attract more subscribers. This can be understood as a B2B association between content providers and service providers.

The way forward would be to understand how companies like Disney are approaching this problem. What if Disney decides to take its Marvel shows back from Jio? That would prompt customers to switch to Disney to watch the Marvel series.

Very soon, new shows from Sony, Zee, and Hotstar will be the key to retaining customers, and there will be competition there. We can further think about what the subscription plan should be. We also need to find the source publishing this year's (i.e., 2019) data on which service provider offers the best network coverage, as that has a bearing on the quality of content consumption. A pertinent question is whether Reliance Jio offers better network coverage than Airtel. Network quality will also be a function of whether significantly more customers than last year are watching content, the quality of coverage in the location, the trend of coverage quality by circle, etc.

According to this insight from Coredna, YouTube video ads are a relatively untapped goldmine.

It also says that more than 50% of Google queries result in no clicks, so creating content that ranks on Google using a question-driven content strategy would be key in 2020. However, we think this needs to be balanced against the potential breach of customer privacy: deliberately improving the click conversion ratio on Google may require designing tailored recommendations for customers or groups of customers, which in turn may require surveys with questions tailored to each customer.

One interpretation of this could be from a B2B point of view. For instance, the ads on YouTube appear to be from already established companies, whereas Facebook tends to promote ads from start-ups or less established companies. How to rank-order the top five video ads on YouTube can be a research problem.

Such a strategy, if adopted well, can help promote your product from a B2B point of view, thereby allowing better B2B targeting with YouTube ads. For instance, if a company “A” decides to showcase its video ads on YouTube, it must apply some analytical model to shortlist a few target segments or types of companies. More questions follow: How many ads should “A” promote? What should the duration of each video ad be? In what sequence should the ads run? How should it target companies or customers who do not want to view a video? Which top five video ads should “A” promote when a consumer or a company watches a certain video on YouTube? What can the ranking algorithm be? And when a company “B” competes with, say, five companies for a deal with a client, how does “B” shortlist or down-select vendors? Can we apply the same ranking algorithm, using graph theory and SVMs (support vector machines) [11], to solve the problems of “A” and “B”?

In [1], the authors categorize customer records into several groups in the B2B context using clustering analysis. The obtained categories are then used to make suitable marketing strategy recommendations for each group of customers. This is accomplished by first using the Length, Recency, Frequency, Monetary (LRFM) customer lifetime value model, which scores customers according to four attributes: the length of the relationship with the company (L), the recency of the latest transaction (R), purchasing frequency (F), and the monetary value of the customer (M). Secondly, the paper introduces an enhanced clustering model using the k-means++ algorithm, where customer records are segmented based on their respective LRFM values. The proposed model is also integrated with a bootstrapping phase, where the number of clusters is selected by employing both the Calinski-Harabasz and Rand cluster validity indices. In addition, the firmographics data of each customer are considered by analyzing groups based on both the sales sector and the location of customers, as a means of enhancing the clustering analysis. The authors performed this study on a dataset obtained from a well-known, multi-national, fast-moving consumer goods (FMCG) company in Egypt.
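
The LRFM scoring step can be sketched with a simple rank-based 1-4 score per attribute. The scoring scheme and data here are illustrative assumptions; the paper layers k-means++ and cluster-validity indices on top of scores like these:

```python
# Sketch: rank-based LRFM scoring (1 = worst quartile, 4 = best).
# Customer values are synthetic.

customers = {
    "C1": {"L": 800, "R": 10,  "F": 52, "M": 120000},
    "C2": {"L": 120, "R": 200, "F": 4,  "M": 3000},
    "C3": {"L": 400, "R": 45,  "F": 20, "M": 40000},
    "C4": {"L": 60,  "R": 300, "F": 2,  "M": 900},
}

def quartile_scores(values, reverse=False):
    """Rank-based 1-4 score; reverse=True rewards LOW values (used for
    Recency, where a more recent transaction is better)."""
    order = sorted(values, key=values.get, reverse=not reverse)
    n = len(order)
    return {cid: 4 - (rank * 4) // n for rank, cid in enumerate(order)}

scores = {"R": quartile_scores({c: v["R"] for c, v in customers.items()},
                               reverse=True)}
for attr in ("L", "F", "M"):
    scores[attr] = quartile_scores({c: v[attr] for c, v in customers.items()})

# One (L, R, F, M) score tuple per customer, ready for clustering.
lrfm = {c: tuple(scores[a][c] for a in "LRFM") for c in customers}
```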

Can we extend the learnings of this paper to the telecoms or media & entertainment industry?

Yet another area of research is the number of clusters that should be formed to offer personalized recommendations to each customer based on the customer’s behavior and transaction records. Xiaojun Chen et al. [3] present PurTreeClust, a clustering algorithm for customer segmentation from massive customer transaction data. They posit that a customer transaction data set can be compressed into a set of purchase trees, and propose a partitional clustering algorithm, named PurTreeClust, for fast clustering of purchase trees. A new distance metric is proposed to effectively compute the distance between two purchase trees. To cluster the purchase tree data, they first rank the purchase trees as candidate representative trees using a novel separate density, then select the top k customers as the representatives of k customer groups. Finally, the clustering results are obtained by assigning each customer to the nearest representative. It would be interesting to see whether such an algorithm can be extended to the media and entertainment industry, especially in markets with around 90% data users and 95% prepaid subscribers, such as India and Indonesia.
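The representative-selection idea can be sketched with plain feature vectors in place of purchase trees; the paper's tree distance metric and separate density are simplified here to Euclidean distance and a fixed-radius neighbour count, so this is an illustration of the scheme rather than PurTreeClust itself:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical customer feature vectors (stand-ins for purchase trees).
X = rng.normal(size=(100, 5))

def cluster_by_representatives(X, k, radius=1.5):
    """Rank customers by local density, take the top-k as representatives,
    and assign every customer to the nearest representative."""
    # Pairwise Euclidean distances (the paper uses a purchase-tree metric).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    density = (d < radius).sum(axis=1)          # neighbours within radius
    reps = np.argsort(density)[::-1][:k]        # k densest customers
    labels = d[:, reps].argmin(axis=1)          # nearest-representative label
    return reps, labels

reps, labels = cluster_by_representatives(X, k=3)
```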

Yet another reference is [9], where the authors generate customer income data and create data clusters to optimize customer potential. The result gives them insight into which groups of customers might not be properly served, considering their average income and spending behavior. It would be interesting to see whether such an approach is relevant in the prepaid-centric markets of India and Indonesia.

In [15], the author concludes that churn analysis is applicable even to the non-contractual B2B hotel industry.

So, why is churn happening in India?

In India, the customer is generally not happy with the service; customer experience is not a priority for many companies. For instance, customers get called at odd hours, during meetings and even on weekends, to provide feedback. Annoyed, the customer asks the caller to call back after office hours on a weekday, but the caller says they cannot, as they too must leave for home after regular office hours. Feedback should instead be solicited via email or SMS.

Time series analysis and other approaches

According to [5], graph-based representations of data have several advantages. In-degree and out-degree analysis can reveal properties of importance, which can be used to incorporate property-specific weights, leading to better prediction. Extracting and aggregating the plans corresponding to positive churn, and plotting them against competitors’ plans, can provide a deeper understanding of why a customer moves from one plan to another. The attractiveness of plans and features, and the level of attractiveness required for each category of customer, can be identified through customer grouping. Fuzzy classification can be incorporated into the model to identify the level of positive churn and, in turn, use the properties and plans identified via the property graphs to prevent churn.
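As a toy illustration of degree-based property weights on such a graph, assuming an invented edge list of subscriber interactions (caller → callee):

```python
from collections import Counter

# Hypothetical directed interaction edges between subscribers.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "a"), ("d", "b")]

out_deg = Counter(u for u, _ in edges)
in_deg = Counter(v for _, v in edges)

# A simple property-based weight: subscribers with high in-degree are
# influential, so their churn risk can be up-weighted in a model.
nodes = {n for e in edges for n in e}
weights = {n: 1.0 + 0.5 * in_deg.get(n, 0) for n in nodes}
```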

According to [8], an immediate way to improve the current results is to use grid search for hyper-parameter tuning of the Gradient Boosting Classifier, which happens to be the author’s best classifier. Another avenue worth exploring for customer churn is time series analysis. As businesses grow and become more complex, additional data sources and channels continuously open up and may hold valuable information. Figure 3 below lists some of those variables.
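A sketch of such a grid search over a Gradient Boosting Classifier, run here on synthetic data rather than the thesis’s e-retailer dataset (the parameter grid is illustrative, not the author’s):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the e-retailer churn features described in [8].
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

# Exhaustive search over the grid with 3-fold cross-validation on F1.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="f1",
)
search.fit(X, y)
best = search.best_params_
```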

FIGURE 3: FEATURE VARIABLES FOR PREDICTION OF CUSTOMER CHURN IN AN E-COMMERCE COMPANY

In addition, the author has organized his feature collection process into four broad categories.

Customer demographics

Customer demographics refers to factors that define the type, scale and other attributes of the customer alone, independent of the e-retailer: the size of the company, the number of employees, financial worth, geographical demography, the type of industry the customer belongs to, and so on. Some of these features, such as geographical location and industry, are categorical, while others, such as company size and number of employees, are ordinal. If a particular industry vertical is on the verge of decline, it is not surprising to see customers reduce their spending, leading to churn; similar reasons may be attributed to region. Thus, customer demographic information plays a vital role.

Enterprise sales data

The sales metrics and customer buying patterns are captured from point-of-sale or order management systems. Important features of enterprise sales data include total sales, recency of sales, frequency of sales, categories of products bought, and year-to-date buy ratio.

Customer interaction data

These metrics are captured from channels that handle and store customer interaction data, customer survey data, chat data, email marketing, marketing campaign outcomes, etc.

Customer behavior data

These metrics are captured from clickstream logs, which capture the customer’s overall interaction with the organization’s e-commerce platform. Valuable features include session lengths, cart activity, cart abandonment, user navigation experience, product-finding experience, visit-to-conversion ratio, and response to marketing emails.

Neural networks – the way forward

Referring to this news release from PR Newswire, will the above-mentioned B2B patent by Cogism, with the help of A.I. (where A.I. is an add-on), help accelerate the resolution of a company’s churn problem? Looking at it from another perspective, since Cogism says that bad data can be a challenge, can we assume that a lack of mature data is a leading cause of churn for B2B marketing companies?

We have been solving the problem of churn prediction in telecoms using decision trees, logistic regression and linear regression, and we have started to use neural networks for variable selection. When we use neural networks for modelling, it is still difficult to comprehend the results, let alone explain them convincingly to the client. However, according to a Medium article titled “How to generate neural network confidence intervals with Keras”, models can be re-engineered to return a set of (differing) predictions each time they perform inference; the distribution of these predictions can then be used to calculate the model’s confidence intervals. Such approaches are not always justified, though, and logistic regression remains well suited to classification tasks where interpretability matters.
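The Medium article does this with Keras; the same idea can be shown framework-free. In the sketch below, a tiny fixed network (random, untrained weights, purely for illustration) keeps dropout active at inference time, and the spread of repeated stochastic predictions yields an empirical interval for one customer’s churn probability:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A tiny fixed one-hidden-layer network (weights would normally be learned).
W1 = rng.normal(size=(4, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)

def predict_with_dropout(x, p_drop=0.2):
    """One stochastic forward pass with dropout left ON at inference time."""
    h = np.maximum(x @ W1 + b1, 0.0)           # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop        # Bernoulli dropout mask
    h = h * mask / (1.0 - p_drop)              # inverted dropout scaling
    return sigmoid(h @ W2 + b2).ravel()

x = rng.normal(size=(1, 4))  # one hypothetical customer's features
samples = np.array([predict_with_dropout(x) for _ in range(500)]).ravel()

# Empirical 95% interval for this customer's churn probability.
lo, hi = np.percentile(samples, [2.5, 97.5])
```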

How to use machine learning and A.I. for B2B datasets

Too often we get data that is either incomplete or missing in label or having some other kind of bias (e.g., selection bias). We can refer to the following two papers to address those issues.

According to [7], in practical regression and classification problems one faces two problems at opposite ends: underfitting and overfitting. On one end, poor design of the model and optimization process can result in large error on both the training and test datasets (underfitting). On the other end, the sample available to tune the model’s parameters is always finite, and the evaluation of the objective function in practice is only an empirical approximation of the true expectation of the target value over the sample space. Therefore, even with successful optimization and a low error rate on the training dataset (training error), the true expected error (test error) can be large (overfitting). The latter is the subject of their study. Regularization is a process of introducing additional information to manage this inevitable gap between the training error and the test error, often by augmenting the loss function with a so-called regularization term, which prevents the model from overfitting to the loss function evaluated on a finite set of sample points. In this study, the authors introduce a novel regularization method, applicable to semi-supervised learning, that identifies the direction in which the classifier’s behavior is most sensitive.

In fact, smoothing the output distribution often works to our advantage in practice. For example, label propagation is an algorithm that improves the performance of a classifier by assigning class labels to unlabeled training samples, based on the belief that close input data points tend to have similar class labels. It is also known that, for neural networks (NNs), one can improve generalization by applying random perturbations to each input to generate artificial input points and encouraging the model to assign similar outputs to the set of artificial inputs derived from the same point. Several studies have confirmed that this philosophy of making the predictor robust against random, local perturbation is effective in semi-supervised learning.

However, some researchers found a weakness in naive application of this philosophy. They found that standard isotropic smoothing via random noise and random data augmentation often leaves the predictor particularly vulnerable to a small perturbation in a specific direction, that is, the adversarial direction, which is the direction in the input space in which the label probability p(y = k|x) of the model is most sensitive. It has earlier been experimentally verified that the predictors trained with the standard regularization technique such as L1 and L2 regularization are likely to make mistakes when the signal is perturbed in the adversarial direction, even when the norm of the perturbation is so small that it cannot be perceived by human eyes.

The authors’ proposed regularization technique trains the output distribution to be isotropically smooth around each input data point by selectively smoothing the model in its most anisotropic direction. To quantify this idea, they introduce the notion of the virtual adversarial direction: the direction of the perturbation that can most greatly alter the output distribution in the sense of distributional divergence. The virtual adversarial direction is their interpretation of the ‘most’ anisotropic direction.

Adversarial training is a successful method for many supervised problems; however, full label information is not always available. The authors therefore use “virtual” labels that are probabilistically generated from p(y|x, θ) in place of labels unknown to the user, and compute the adversarial direction based on these virtual labels. Note also that, according to [6], a common misconception is that adversarial training is equivalent to training on noisy examples. Noise is a far weaker regularizer than adversarial perturbations because, in high-dimensional input spaces, an average noise vector is approximately orthogonal to the cost gradient, whereas adversarial perturbations are explicitly chosen to consistently increase the cost.
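The contrast between random noise and the adversarial direction can be demonstrated on a simple logistic-regression loss. This is an FGSM-style step on invented weights and input, not the virtual adversarial training of [7]:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(x, y, w):
    p = sigmoid(x @ w)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# A fixed logistic-regression "model" and one input with label y = 1.
w = rng.normal(size=20)
x = rng.normal(size=20)
y = 1.0

# Gradient of the loss w.r.t. the INPUT: (p - y) * w for logistic regression.
grad_x = (sigmoid(x @ w) - y) * w

eps = 0.1
x_adv = x + eps * np.sign(grad_x)                  # adversarial step
x_noise = x + eps * np.sign(rng.normal(size=20))   # random step, same norm

loss0 = log_loss(x, y, w)
loss_adv = log_loss(x_adv, y, w)      # consistently increases the cost
loss_noise = log_loss(x_noise, y, w)  # random direction: weaker effect
```

For this linear model the adversarial step moves the logit by the full eps·||w||₁, so the adversarial loss always dominates the noisy one, illustrating [6]’s point.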

See also the article titled “7 ways artificial intelligence is transforming the business landscape”; its ideas might need tailoring to B2B use cases.

Applying reinforcement learning using DQN (Deep Q-Networks) for customer churn prediction:

Churn models today are typically based on Random Forest or XGBoost; they can be augmented with AI or with Monte Carlo Simulation (MCS).

Temporal-difference learning methods such as Q-learning are off-policy reinforcement learning algorithms, and DQN is their deep-learning extension.

This assumes that Random Forest (RF) and XGBoost are advanced analytics algorithms, MCS is a simulation-based algorithm, and AI is the latest step. So, we propose that clients augment RF/XGBoost/MCS with AI (i.e., with the DQN reinforcement learning algorithm).
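Before reaching for DQN, the idea can be illustrated with tabular Q-learning, an off-policy temporal-difference method. The two-state churn MDP below (states, transition probabilities and rewards) is entirely invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP: states 0 = "engaged", 1 = "at risk"; actions 0 = do nothing,
# 1 = send retention offer. All dynamics and rewards are invented.
def step(state, action):
    if state == 1 and action == 1:
        # An offer to an at-risk customer usually re-engages them.
        next_state = 0 if rng.random() < 0.8 else 1
        reward = 5.0 if next_state == 0 else -1.0
    elif state == 1:
        next_state = 1
        reward = -2.0          # at-risk customer drifts toward churn
    else:
        next_state = 1 if rng.random() < 0.3 else 0
        reward = 1.0 - 0.5 * action  # offers cost money when not needed
    return next_state, reward

Q = np.zeros((2, 2))
alpha, gamma, epsilon = 0.1, 0.9, 0.1
state = 0
for _ in range(20000):
    action = rng.integers(2) if rng.random() < epsilon else int(Q[state].argmax())
    nxt, r = step(state, action)
    # Off-policy TD update: bootstrap on the greedy value of the next state.
    Q[state, action] += alpha * (r + gamma * Q[nxt].max() - Q[state, action])
    state = nxt

policy = Q.argmax(axis=1)  # learned best action per state
```

A DQN would replace the Q table with a neural network over a continuous customer-state vector; the update rule is the same in spirit.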

Some factors to consider:

According to the article “Preventing churn like a bandit with uplift modeling, causal inference and Thompson sampling” (Gerben Oostra, Jan 6, 2020), the author prepares a setup that brings together methods from different lines of thought: feedback and actionable (prescriptive) predictions from a reinforcement learning perspective; bias and causality handled with inverse propensity weighting; uncertainty and the balance between exploration and exploitation; and improvement of the chosen action using uplift modeling. In short, the approach combines uplift modeling, causal inference, and reinforcement learning.
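The exploration–exploitation piece can be sketched with Thompson sampling over two retention actions. The “true” save rates below are invented and used only to simulate customer responses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical retention actions: 0 = discount offer, 1 = personal call.
true_save_rate = [0.10, 0.25]  # unknown to the agent

successes = np.ones(2)   # Beta(1, 1) priors over each action's save rate
failures = np.ones(2)

picks = [0, 0]
for _ in range(2000):
    # Thompson sampling: draw a plausible save rate per action, act greedily.
    theta = rng.beta(successes, failures)
    a = int(theta.argmax())
    saved = rng.random() < true_save_rate[a]
    successes[a] += saved
    failures[a] += 1 - saved
    picks[a] += 1
```

Over time the posterior concentrates on the better action while still occasionally exploring the other, which is the balance the article refers to.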

Deep learning for B2B modeling in telecoms

We can refer to the deep learning architecture mentioned in [14].

Data processing

To use a recurrent network topology, the data need to be represented as a time series [14]. Each row is therefore transformed into a two-step series, with the first step containing data on previous purchases and the second on the current purchase. Some variables, such as the date of the purchase, are inadequate for such a transformation and can instead be represented as two steps with identical values. The target variable is then stored as a separate, one-dimensional vector.
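A sketch of this transformation, assuming an invented flat table with two dynamic features per step and one static column:

```python
import numpy as np

# Hypothetical flat table: one row per purchase, with features of the
# PREVIOUS purchase and of the CURRENT purchase side by side.
# Columns: [prev_amount, prev_items, curr_amount, curr_items, static_var]
rows = np.array([
    [120.0, 3, 80.0, 2, 7.0],
    [ 80.0, 2, 95.0, 4, 7.0],
])

n_features = 2  # dynamic features per step: amount, items

prev_steps = rows[:, :n_features]
curr_steps = rows[:, n_features:2 * n_features]
static = rows[:, 2 * n_features:]  # e.g. purchase date, repeated per step

# Static variables appear identically at both steps, as in [14].
step1 = np.hstack([prev_steps, static])
step2 = np.hstack([curr_steps, static])
series = np.stack([step1, step2], axis=1)  # (samples, 2 steps, features)

# Churn target kept as a separate 1-D vector.
y = np.array([0, 1])
```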

Model tuning

The prediction of customer churn can be performed using two base artificial neural network topologies [14]: a multilayer perceptron (MLP) with one or two fully connected dense layers, or a network with a recurrent layer (RNN) as the first hidden layer, optionally followed by an additional dense layer. The number of neurons can be selected by comparing the accuracy and F1 scores of models of different widths; when multiple models perform similarly, the simpler one should be selected. Each network is optimized using a binary cross-entropy loss function, and the output layer uses a sigmoid activation function. To prevent overfitting, each model can be augmented with dropout after every hidden layer, and both versions, with and without dropout, trained and compared. If the “dying ReLU” (Rectified Linear Unit) problem appears, each model can be trained in two versions: with the standard ReLU activation and with the Leaky ReLU activation. All models can be prepared and trained using the Keras library with a TensorFlow backend.

Learning

A 10-fold split can be performed over the dataset [14]. Each model is then trained independently 10 times, using consecutive data sections as validation sets and the remaining parts as training sets. Models consisting only of fully connected layers are trained for 40 epochs; models containing recurrent layers for 60 epochs. Model accuracy on the training and validation sets is measured after each epoch, and additional metrics are calculated after the last epoch.
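The 10-fold procedure can be sketched with scikit-learn in place of Keras; the synthetic data, the small MLP, and `max_iter` standing in for the epoch budget are all illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the e-commerce dataset of [14].
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

scores = []
kf = KFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, val_idx in kf.split(X):
    # A fresh model per fold; each fold serves once as the validation set.
    model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=400,
                          random_state=0)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[val_idx])
    scores.append((accuracy_score(y[val_idx], pred),
                   f1_score(y[val_idx], pred)))

mean_acc = float(np.mean([s[0] for s in scores]))
mean_f1 = float(np.mean([s[1] for s in scores]))
```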

Net Promoter Score

According to [16], “Predicting Customer Churn with Net Promoter Score”, for B2B companies, surveying customers on a regular basis through relationship surveys has been shown to be an effective way to predict customer churn.

In a recent study, it was shown that “would recommend” style survey questions such as the Net Promoter Score are a very effective way of predicting B2B customer churn 3-6 months before the churn event occurs. By asking customers the NPS question regularly, typically every three months, and looking for declining scores, it is possible to identify at-risk customers. The 3-6-month lead time is particularly important in B2B organizations because trying to save a customer at the last minute is almost impossible: consistency of supply is key in a B2B context, and before cancelling with one supplier, most B2B organizations will already have selected and onboarded a new one.
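A minimal sketch of flagging at-risk accounts from declining quarterly scores; the account names, score histories, and the three-survey decline rule are all invented for illustration:

```python
# Hypothetical quarterly "would recommend" scores (0-10) per B2B account.
history = {
    "acct_1": [9, 9, 8, 9],   # stable promoter
    "acct_2": [9, 8, 6, 5],   # steady decline: churn risk 3-6 months out
    "acct_3": [6, 7, 8, 8],   # improving
}

def nps_bucket(score):
    """Standard NPS buckets: 9-10 promoter, 7-8 passive, 0-6 detractor."""
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"

def at_risk(scores, window=3):
    """Flag an account whose score declined across the last `window` surveys."""
    recent = scores[-window:]
    return all(b < a for a, b in zip(recent, recent[1:]))

flags = {acct: at_risk(s) for acct, s in history.items()}
```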

With telecoms, SaaS and utility companies, a customer must actively inform the company that they will no longer purchase its services; when they cancel their agreement, they have churned. In non-contractual relationships, e.g. the retail, printing, consulting and graphic design industries, there is often no agreement to reference. Customers also purchase on an irregular basis, so it is very difficult to distinguish between a customer who simply has not purchased in a while and one who has churned and is now using another supplier. Things get even more complex when customers can only purchase at specific intervals. See Figure 4 below for the types of relationships with customers.

FIG 4 TYPE OF RELATIONSHIP WITH CUSTOMERS

Source: Probability Models for Customer-Base Analysis, Peter S. Fader, University of Pennsylvania, Bruce G. S. Hardie, London Business School, 20th Annual Advanced Research Techniques Forum, June 14–17, 2009

Note: The author would like to thank Vivek Chaudhuri in Gurgaon India, Tapasi Sengupta in Bangalore India, and Abeer Selim in Cairo Egypt for their review.

References

[1] “A Two-Phase Clustering Analysis for B2B Customer Segmentation”, Dalia Abdel Razek Kandeil, Amani Anwar Saad, and Sherin Moustafa Youssef, 2014 International Conference on Intelligent Networking and Collaborative Systems, IEEE, 2014.

[2] “Spread-gram: A Spreading-Activation Schema of Network Structural Learning”, Jie Bai, Linjing Li, and Daniel Zeng, The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China.

[3] “PurTreeClust: A Clustering Algorithm for Customer Segmentation from Massive Customer Transaction Data”, Xiaojun Chen, Yixiang Fang, Min Yang, Feiping Nie, Zhou Zhao, and Joshua Zhexue Huang, IEEE Transactions on Knowledge and Data Engineering, Vol. 30, Issue 3, March 2018.

[4] “Benchmarking Analytical Techniques for Churn Modelling in a B2B Context”, Jana Van Haver, Master of Science dissertation in Business Engineering, 2016-2017.

[5] “Effective Customer Churn Prediction on Large Scale Data using Metaheuristic Approach”, K. Sivasankar, Department of Computer Applications, National Institute of Technology, Trichy, India, Indian Journal of Science and Technology, Vol 9(33), DOI: 10.17485/ijst/2016/v9i33/99103, September 2016.

[6] “Adversarial Training Methods for Semi-Supervised Text Classification”, Takeru Miyato, Andrew M. Dai, and Ian Goodfellow, arXiv, 6 May 2017.

[7] “Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning”, Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii, arXiv, 27 June 2018.

[8] “Enhanced Feature Mining and Classifier Models to Predict Customer Churn for an E-Retailer”, Karthik Subramanya, Master of Science thesis, Electrical and Computer Engineering, Iowa State University, Ames, Iowa.

[9] “Monte Carlo Simulation and Clustering for Customer Segmentation in Business Organization”, Andry Alamsyah and Bellania Nurriz, 2017 3rd International Conference on Science and Technology – Computer (ICST).

[10] “Spectral Clustering of Customer Transaction Data With a Two-Level Subspace Weighting Method”, Xiaojun Chen, Wenya Sun, Bo Wang, Zhihui Li, Xizhao Wang, and Yunming Ye, IEEE Transactions on Cybernetics, Vol. 49, Issue 9, September 2019.

[11] “Graph Based Association Analysis for B2B”, Sanket Jain, Strategy and Analytics, Advanced Analytics Consultant, Bangalore, India, TM Forum, February 2015.

[12] “Collective Churn Prediction in Social Network”, Richard J. Oentaryo, Ee-Peng Lim, David Lo, Feida Zhu, and Philips K. Prasetyo, School of Information Systems, Singapore Management University, 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining.

[13] “Customer Churn Prediction in B2B Contexts”, Iris Figalist, Christoph Elsner, Jan Bosch, and Helena Holmström Olsson, International Conference on Software Business, ICSOB 2019: Software Business, pp. 378-386, October 2019.

[14] “Deep Learning for Customer Churn Prediction in E-Commerce Decision Support”, Pondel et al., Wroclaw University of Economics and Business, DOI: 10.52825/bis.v1i.42, July 2021.

[15] “An Analysis of Non-Contractual Churn in the B2B Hotel Industry”, L.S. de Wit, Master thesis, Data Science: Business and Governance, Tilburg University, 28 July 2017.

[16] “Customer Churn Prediction Approaches For B2B and B2C Industries”, Adam Ramshaw, 2020.

[17] “Applying Random Forest on Customer Churn Data”, Akhil Sharma, May 3, 2021.

Disclaimer 1: This document is intended to represent the views of the author rather than IBM.

Disclaimer 2: There is no recommendation or solution to technical problems proposed.

Disclaimer 3: The authors have referenced fictitious names or situations wherever possible.