Machine Learning Magic Triangle

A new way to think about the machine learning process

Oct 12, 2022

7 min read

Does the machine learning flow shown below look familiar? It is a flow that all data scientists and machine learning engineers are used to seeing and using in their work.

Standard machine learning flow (image by author)

Though it is a well-established machine learning flow, it has the following inefficiencies:

It is focused on data science development and misses out on an important part: developing a business hypothesis of the problem.
It is an open-ended flow. However, efficient machine learning needs to close the loop between the start and the finish.

In this blog post, I would like to introduce a new way of thinking about the machine learning process. I call it the Machine Learning Magic Triangle.

Machine Learning Magic Triangle (image by author)

It attempts to overcome the inefficiencies of the standard machine learning flow.

It focuses on developing a business hypothesis of the problem. This is very important, as it translates the findings of the data exploration step into a hypothesis stated in business language.
It is a closed-loop flow: it checks that the model explanations match the initial hypothesis.

Let me explain the process with an example from a telco company. The company collects data on customer demographics, the services each customer has, and whether the customer churned. Sample data is shown below.

Telecommunication churn dataset (image by author)

Using this data, we can develop a machine learning model that predicts which customers are likely to churn. The target variable, which we need to predict, is Churn. The other columns are features, either categorical (non-numeric) or numeric.

Let us now go through the steps required to develop a predictive model.

We will go through the following steps of the machine-learning magic triangle.

Data Exploration and Hypothesis: In this step, we will explore the data and develop a hypothesis on why customers are churning.

Machine Learning: We will train a machine learning model which will be able to predict if the customer will churn or not.

Model Explanation: We will understand the machine learning model by predicting known churn cases and explaining the results. The explanation can then be checked against the initial hypothesis.

Data Exploration and Hypothesis

Data exploration helps to develop an understanding of why customers are churning. One helpful data exploration technique is to analyze each categorical (non-numeric) feature against the target variable (churn).

Shown below is an example of gender vs churn/no churn as well as Tech Support vs churn/no churn. For gender, we see that the proportion of females to males is roughly the same for churners and non-churners. This suggests that gender has little impact on churn.

However, for TechSupport, we see that customers who do not have tech support churn more. So TechSupport is an important feature in determining churn.

Categorical (non-numeric) vs the target variable (churn) (image by author)

Shown below is an analysis of all categorical variables. You will observe that the different services (other than phone service) help differentiate between churners and non-churners.

All categorical (non-numeric) vs the target variable (churn) (image by author)
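The comparison behind these charts can be sketched in a few lines. This is a minimal illustration in plain Python, not the code used for the charts: the rows are invented, and the column names `TechSupport` and `Churn` are assumed to match the dataset. It computes the churn rate within each value of a categorical feature, which is what the bar charts compare visually.

```python
from collections import Counter

# Hypothetical sample rows, invented for illustration (not real data).
rows = [
    {"TechSupport": "No", "Churn": "Yes"},
    {"TechSupport": "No", "Churn": "Yes"},
    {"TechSupport": "No", "Churn": "No"},
    {"TechSupport": "Yes", "Churn": "No"},
    {"TechSupport": "Yes", "Churn": "No"},
    {"TechSupport": "Yes", "Churn": "Yes"},
    {"TechSupport": "Yes", "Churn": "No"},
]


def churn_rate_by_category(rows, feature, target="Churn", positive="Yes"):
    """Return {category value: share of rows where target == positive}."""
    totals = Counter(row[feature] for row in rows)
    churned = Counter(row[feature] for row in rows if row[target] == positive)
    return {value: churned[value] / totals[value] for value in totals}


rates = churn_rate_by_category(rows, "TechSupport")
for value, rate in sorted(rates.items()):
    print(f"TechSupport={value}: churn rate {rate:.0%}")
```

A large gap between the rates, as in this made-up sample, is the signal to look for. If the data is in a pandas DataFrame `df`, `pd.crosstab(df["TechSupport"], df["Churn"], normalize="index")` should give the same proportions as a table.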

We can do the same analysis for numeric variables using a box plot visualization. Shown below is an example of Tenure vs the target variable. We can observe that customers with shorter tenure are more likely to churn.

The numeric variable vs target (image by author)

Shown below is an analysis of all numeric variables. You will observe that high monthly charges and low total charges are also associated with churn. Total charges accumulate over time, so low total charges usually go together with low tenure.

All Numeric variables vs target (image by author)
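A box plot is essentially a picture of per-group summary statistics, so the same comparison can be checked numerically. The sketch below is a simplified stand-in under stated assumptions: the tenure values are invented, and only the median (the center line of each box) is computed.

```python
from statistics import median

# Hypothetical (tenure in months, churn label) pairs, invented for illustration.
records = [
    (2, "Yes"), (5, "Yes"), (9, "Yes"),
    (12, "No"), (24, "No"), (38, "No"), (60, "No"),
]


def median_by_group(records):
    """Group numeric values by label and return the median of each group."""
    groups = {}
    for value, label in records:
        groups.setdefault(label, []).append(value)
    return {label: median(values) for label, values in groups.items()}


medians = median_by_group(records)
for label, value in sorted(medians.items()):
    print(f"Churn={label}: median tenure {value} months")
```

In this toy sample the churners have a much lower median tenure than the non-churners, which is the pattern the box plot shows for the real data.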

Based on the above data exploration, we can make the following hypothesis:

Customers who have high monthly charges and low tenure (new customers) tend to churn more. Also, these relatively new customers, despite paying high monthly charges, do not have all services included. So they are probably churning because they are not getting enough value and services for the high monthly charges.

Developing such textual explanations helps in describing the problem in business language. This can also help the business line leaders to understand the objective of the machine learning work.

Machine Learning model

Now let us train a machine learning model. There are various machine learning algorithms, and the one used here is XGBoost.

The model is trained using all the demographic fields, types of services, and billing fields as independent variables (X). The target variable (Y) is Churn. The machine learning model will try to learn the relationship between the X variables and Y.

The data is split into a train set (70% of the data) and a test set (30% of the data). The machine learning model is developed on the train data and then used to predict churn for the test data. The predicted churn is then compared with the actual churn on the test data.

The accuracy on both the train and test data is good, so we can keep the XGBoost-based machine learning model.

The output of the machine learning model can be visualized as a confusion matrix as shown below.

Confusion Matrix (image by author)
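The split-and-compare procedure described above can be sketched without any machine learning library. This is an illustration only: the customer IDs are hypothetical, and the predicted labels are hard-coded placeholders for what a trained model (XGBoost in this post) would return.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and return (train, test) with the given test share."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def confusion_matrix(actual, predicted, positive="Yes"):
    """Count true/false positives and negatives for a binary target."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts


def accuracy(cm):
    """Share of predictions that match the actual label."""
    return (cm["TP"] + cm["TN"]) / sum(cm.values())


# Hypothetical customer IDs, split 70/30.
customer_ids = [f"C{i:02d}" for i in range(1, 11)]
train_ids, test_ids = train_test_split(customer_ids)
print(len(train_ids), "train rows,", len(test_ids), "test rows")

# Placeholder labels: actual churn vs. what a model might predict.
actual = ["Yes", "No", "Yes", "No", "No"]
predicted = ["Yes", "No", "No", "No", "Yes"]
cm = confusion_matrix(actual, predicted)
print(cm, f"accuracy={accuracy(cm):.2f}")
```

Computing the same matrix on both the train and the test predictions is one way to check that the model is not simply memorizing the training data.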

Predicting Churn and Explaining predictions

It is now time to make predictions using the machine learning model. The predictions are made on a dataset called the score dataset, which consists of a few records that were not used in model training.

Let us now explain some predictions. For example, customer 6894-LFHLY is predicted to churn. The SHAP explainer produces the visualization shown below.

A positive bar means the factor pushes the prediction toward churn. A negative bar means the factor pushes the prediction away from churn.

We see that the top factors contributing to churn are Contract and Total charges. The contract is a monthly contract, and the total charges (75) are relatively low, indicating a low tenure. This explanation also matches the initial hypothesis.

Now let us take customer 9767-FFLEM, who is predicted not to churn. The explanation of the prediction is shown below. We see that Monthly charges and tenure are the top factors keeping this customer from churning. The tenure (38) is high, so the customer is not predicted to churn. This explanation also matches the initial hypothesis.

SHAP Analysis for a non-churning customer (image by author)
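The signed bars in these plots are Shapley values: each one is a feature's share of the difference between this customer's prediction and a baseline prediction. The sketch below computes exact Shapley values by brute force for a tiny case. Everything in it is an assumption made for illustration: `toy_churn_score` is a hypothetical linear stand-in for the trained model, and the feature names and values are invented. The SHAP library computes these values far more efficiently for tree models such as XGBoost.

```python
from itertools import combinations
from math import factorial


def toy_churn_score(features):
    """Hypothetical stand-in for a trained model (NOT the XGBoost model)."""
    return (0.2
            + 0.3 * features["month_to_month"]
            - 0.005 * features["tenure"]
            + 0.002 * features["monthly_charges"])


def shapley_values(model, instance, baseline):
    """Exact Shapley values: features outside a subset take baseline values."""
    names = list(instance)
    n = len(names)

    def value(subset):
        mixed = {name: instance[name] if name in subset else baseline[name]
                 for name in names}
        return model(mixed)

    contributions = {}
    for name in names:
        others = [other for other in names if other != name]
        total = 0.0
        for size in range(n):
            # Weight of each subset of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {name})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        contributions[name] = total
    return contributions


# Invented baseline ("typical" customer) and one invented new customer.
baseline = {"month_to_month": 0, "tenure": 30, "monthly_charges": 60}
customer = {"month_to_month": 1, "tenure": 2, "monthly_charges": 90}

phi = shapley_values(toy_churn_score, customer, baseline)
for name, contribution in sorted(phi.items(), key=lambda item: -abs(item[1])):
    direction = "toward churn" if contribution > 0 else "away from churn"
    print(f"{name}: {contribution:+.3f} ({direction})")
```

The contributions add up to the gap between the customer's score and the baseline score, which is why the bars in a SHAP plot can be read as a decomposition of a single prediction.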

You will observe that we closed the loop between hypothesis and model explanation. This gives a purpose to the machine learning flow and helps ensure that the results match the business explanation of the problem.
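Closing the loop can also be made explicit as a small check. The sketch below is one possible way to do it, not a method prescribed in this post: the feature names, the contribution values, and the helper functions `top_factors` and `matches_hypothesis` are all hypothetical. It takes the strongest factors from an explanation and reports which of them appear in the hypothesis.

```python
def top_factors(contributions, k=2):
    """Return the k feature names with the largest absolute contribution."""
    ranked = sorted(contributions, key=lambda name: abs(contributions[name]),
                    reverse=True)
    return ranked[:k]


def matches_hypothesis(contributions, hypothesis_features, k=2):
    """Return the top-k factors that also appear in the hypothesis."""
    return set(top_factors(contributions, k)) & set(hypothesis_features)


# Features named in the hypothesis (total charges acts as a proxy for tenure).
hypothesis_features = {"tenure", "MonthlyCharges", "TotalCharges"}

# Invented contribution values for one customer predicted to churn.
explanation = {
    "Contract": 0.30,
    "TotalCharges": 0.22,
    "tenure": 0.05,
    "gender": 0.01,
}

matched = matches_hypothesis(explanation, hypothesis_features)
print("Top factors:", top_factors(explanation))
print("Supported by hypothesis:", sorted(matched))
```

An empty result for many customers would be a prompt to revisit either the hypothesis or the model.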

That is the power of the machine-learning magic triangle!

Conclusion

The standard machine learning flow is focused on data science development and misses out on developing a business hypothesis of the problem.

The Machine Learning Magic Triangle helps to overcome the inefficiencies of the standard flow by focusing on developing a business hypothesis of the problem, as well as closing the loop between predictions and hypothesis.

Dataset citation

The telecommunication dataset is available here. Both commercial and non-commercial use of it is permitted.

Telco customer churn (11.1.3+)

Please subscribe to stay informed whenever I release a new story.

Get an email whenever Pranay Dave publishes.

You can also join Medium with my referral link


Additional Resources

Website

You can visit my website to do analytics with zero coding. https://experiencedatascience.com

YouTube channel

Here is a link to my YouTube channel https://www.youtube.com/c/DataScienceDemonstrated
