Stacked Data Exploration - a new and advanced way to explore your data

Get started with limitless possibilities of stacked data exploration

Apr 2, 2022

"Innovation is taking things that exist and using them in a new way" – Tom Freston, Co-founder of MTV

In this story, I introduce stacked data exploration, a new and advanced way to explore your data. As far as my research shows, the term stacked data exploration does not exist yet, so consider this story a primer on the subject.

What is Stacked Data Exploration

Stacked data exploration means combining different data exploration techniques to obtain more advanced results. The output of one technique becomes the input to the next. The combined results are generally more powerful than those of the individual techniques.

Stacked Data Exploration (image by author)

Why is this useful

Stacking has proven very useful in machine learning, where a meta-learner is trained to combine the predictions of individual learners. You can apply a similar concept to data exploration.

During the data exploration phase, data scientists generally use each technique on its own. Histograms, correlation matrices, dimensionality reduction, clustering, and so on are all used individually to explore the data. If they give powerful results individually, imagine what they can do when combined!

Let us see this in action!

Stacked Data Exploration Example

Let us take a dataset on the customers of a telecommunication company. The dataset has demographic information, services, billing information, and whether or not the customer has churned.

Telecommunication churn dataset (image by author)

In this example, we will attempt the following stacked data exploration.

Stacked Data Exploration Example (image by author)

Here is a description of what each of these steps does. The final result will be revealed at the end.

Step 1 – Dimensionality Reduction (TSNE)

In this step, we reduce the high-dimensional data to two dimensions, which makes the exploration results easier to visualize. This step uses TSNE (t-distributed stochastic neighbor embedding), as it does a very good job of keeping points that are close in the high-dimensional space close to each other in the lower-dimensional space.

Here is the result of TSNE applied to the telecommunication dataset, where we reduce the data to two dimensions.

Result of TSNE reducing data to two dimensions (image by author)

Each point represents a customer. We can go one step further and color the points by the customer churn field.

Result of TSNE with information on churn (image by author)
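As a rough illustration of this step, here is a minimal sketch using scikit-learn's TSNE. It is not the exact code behind the figures above: the data is a synthetic stand-in generated with make_blobs, and the perplexity value and the scaling step are assumptions. With the real telecommunication dataset, the categorical columns would first have to be encoded as numbers.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the numerically encoded customer table (assumption:
# 300 customers, 10 numeric features). Replace with your own feature matrix.
X, _ = make_blobs(n_samples=300, n_features=10, centers=4, random_state=0)

# Scale the features so that no single column dominates the distances.
X_scaled = StandardScaler().fit_transform(X)

# Reduce to two dimensions. The perplexity of 30 is an assumed value;
# it usually needs tuning for a given dataset.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(X_scaled)

print(embedding.shape)  # one 2-D point per customer
```

The two columns of embedding can then be drawn as a scatter plot, with the churn field used as the point color.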

Now we take the result of TSNE and feed it into the next step, clustering.

Step 2 – Clustering (DBSCAN)

The TSNE output from the previous step shows nice cluster formation. We can take advantage of this fact and run a clustering algorithm on the TSNE output. This assigns a cluster number to each of the visually formed clusters in the visualization above.

The clustering technique used here is DBSCAN (Density-Based Spatial Clustering of Applications with Noise). The advantage of this technique is that it does not require specifying the number of clusters beforehand. Here is the result of DBSCAN applied to the TSNE output.

We can observe 5 clearly identified clusters. As DBSCAN is a density-based technique, the clusters correspond to the dense regions. The cluster on the extreme right is not very dense, so we can ignore it for the time being.
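The clustering step can be sketched as follows, again with scikit-learn. The 2-D points are a synthetic stand-in for the TSNE output, and the eps and min_samples values are assumptions that would need tuning on real data. DBSCAN labels points in sparse regions as -1 (noise), which is how a low-density group can end up being ignored.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Synthetic stand-in for the 2-D TSNE output: three well-separated groups.
embedding, _ = make_blobs(
    n_samples=300,
    centers=[(-10, -10), (0, 10), (10, -5)],
    cluster_std=1.0,
    random_state=0,
)

# eps is the neighborhood radius and min_samples the density threshold.
# Both values are assumptions chosen for this toy data.
dbscan = DBSCAN(eps=1.5, min_samples=5)
labels = dbscan.fit_predict(embedding)

# The label -1 marks noise points, so it is not counted as a cluster.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(f"clusters found: {n_clusters}, noise points: {n_noise}")
```

Note that the number of clusters is an output of the algorithm here, not a parameter.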

Insights after 2 stacking steps

With just two steps in the data exploration stacking process, we already have some very interesting insights:

We observe nice cluster formation, and we have been able to tag the densely formed clusters. This signifies that customers can be segmented into separate groups, which could be a very useful insight.
In each of the dense clusters, there is no clear separation of churners from non-churners. This suggests that if you are using machine learning to predict churners, you will likely need a more complex algorithm to separate churners from non-churners.
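One simple way to check the second insight numerically is to compute the churn rate inside each cluster. The sketch below uses a tiny hand-made table, so the column names cluster and churn and all the values are hypothetical. A rate close to 0 or 1 would mean a cluster is almost pure, while intermediate rates mean churners and non-churners are mixed.

```python
import pandas as pd

# Hypothetical example: cluster labels from DBSCAN and a 0/1 churn flag.
df = pd.DataFrame({
    "cluster": [0, 0, 0, 0, 1, 1, 1, 1, -1, -1],
    "churn":   [1, 0, 0, 1, 0, 0, 0, 1, 1, 0],
})

# Drop DBSCAN noise points (label -1), then summarize each cluster.
summary = (
    df[df["cluster"] != -1]
    .groupby("cluster")["churn"]
    .agg(customers="size", churn_rate="mean")
)
print(summary)
```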

Step 3 – Machine learning to interpret the clusters (Decision Tree)

Let us go to the next level in stacked data exploration. We can try to interpret each of the clusters to see what differentiates churners from non-churners. So in step 3, we fit a decision tree for each of the clusters.

Result of TSNE+DBSCAN + Decision Tree (image by author)

Five decision trees are computed, one per cluster. For simplicity, however, only the decision trees for cluster_3 and cluster_4 are shown below.

We can observe that for cluster_3, the most important field that differentiates churners from non-churners is Total Charges. This means that the predicted churners in cluster_3 are sensitive to total charges.

For cluster_4, the most important fields that differentiate churners from non-churners are Contract and Tenure. The predicted churners in this cluster are those with low tenure and a monthly contract.
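Fitting one tree per cluster can be sketched like this. The data is synthetic, the column names are hypothetical, and the churn rule is deliberately constructed to differ between the two clusters so that the trees have something to find. The tree depth of 3 is also an assumption, chosen to keep the trees readable.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 400

# Synthetic customers with hypothetical column names and cluster labels.
df = pd.DataFrame({
    "tenure": rng.integers(1, 72, n),
    "monthly_charges": rng.uniform(20, 120, n),
    "total_charges": rng.uniform(20, 8000, n),
    "cluster": rng.integers(0, 2, n),
})

# Made-up churn rule: cluster 0 churns on high total charges,
# cluster 1 churns on low tenure.
df["churn"] = np.where(
    df["cluster"] == 0,
    df["total_charges"] > 4000,
    df["tenure"] < 12,
).astype(int)

features = ["tenure", "monthly_charges", "total_charges"]
top_feature = {}

# Fit a shallow tree inside each cluster and record its most important field.
for cluster_id, group in df.groupby("cluster"):
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    tree.fit(group[features], group["churn"])
    best = features[int(np.argmax(tree.feature_importances_))]
    top_feature[int(cluster_id)] = best
    print(f"cluster_{cluster_id}: most important field = {best}")
    print(export_text(tree, feature_names=features))
```

Reading the printed rules for each cluster gives the same kind of interpretation as above: which field, and which threshold, separates churners from non-churners in that segment.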

Using the results from stacked data exploration

You can use the results obtained so far in the following ways:

Customer Segmentation Messaging: You can use the segments created above for customer segmentation and send marketing messages to customers to help avoid churn. The message can be fine-tuned for each segment.

For example, as the predicted churners in cluster_3 are sensitive to total charges, the messaging should focus on the value they are getting, which justifies the charges.

For cluster_4, as the predicted churners have low tenure and a monthly contract, the messaging should focus on the advantages of a long-term contract, with the objective of converting monthly contracts to yearly contracts.

Improving Machine learning model: If you are developing a machine learning model for predicting churn, it could be useful to train one model per cluster rather than one single model. As the underlying reasons for churn differ from cluster to cluster, this can give better overall results.

The figure below shows the confusion matrix for the one-model approach, as well as the confusion matrix for the multi-model approach. In the multi-model approach, one machine learning classifier is trained for each of the clusters above.

With the multi-model approach, there is an increase in true positives as well as a reduction in false positives.

Confusion Matrix - One model vs multi-model
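The comparison between the two approaches can be sketched as below. This is not the code or the classifier behind the figure: the data is synthetic, logistic regression is an assumed choice of classifier, and churn is constructed to depend on a different feature in each cluster. The improvement seen here is therefore true by construction and says nothing about the size of the gain on the telecommunication dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))
cluster = rng.integers(0, 2, n)

# Made-up churn rule: a different feature drives churn in each cluster.
y = np.where(cluster == 0, X[:, 0] > 0, X[:, 1] > 0).astype(int)

X_tr, X_te, y_tr, y_te, c_tr, c_te = train_test_split(
    X, y, cluster, test_size=0.3, random_state=0, stratify=cluster
)

# One-model approach: a single classifier for all customers.
single = LogisticRegression().fit(X_tr, y_tr)
cm_single = confusion_matrix(y_te, single.predict(X_te), labels=[0, 1])

# Multi-model approach: one classifier per cluster, each one predicting
# only for the test customers that belong to its cluster.
pred_multi = np.empty_like(y_te)
for c in np.unique(c_tr):
    model = LogisticRegression().fit(X_tr[c_tr == c], y_tr[c_tr == c])
    mask = c_te == c
    pred_multi[mask] = model.predict(X_te[mask])
cm_multi = confusion_matrix(y_te, pred_multi, labels=[0, 1])

print("one model:\n", cm_single)
print("multi-model:\n", cm_multi)
```

In each matrix, rows are the actual classes and columns the predicted classes, so the bottom-right cell holds the true positives and the top-right cell the false positives.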

Conclusion

Stacked data exploration is an advanced way to explore your data. The combined results are more powerful than those of any individual data exploration technique. In this story, I gave you one example of stacked data exploration. However, there are limitless ways in which you can combine different data exploration techniques.

Now it is your turn to come up with your own way of stacking different data exploration techniques! You can comment on this story to share which techniques you have used.

Dataset citation

The telecommunication dataset is available here. Both commercial and non-commercial use of it is permitted.

Telco customer churn (11.1.3+)

Additional Resources

Website

You can visit my website to do analytics with zero coding: https://experiencedatascience.com

Youtube channel

Here is a link to my YouTube channel https://www.youtube.com/c/DataScienceDemonstrated

Written By

Pranay Dave