Why Excel Is the Best Way to Truly Understand Machine Learning Models
The Illusion of Simplicity in Modern Training
With scikit-learn, everything feels easy.
And training is seemingly always done with the same method, fit. So we get used to the idea that training any model is similar and simple.
With AutoML, grid search, and generative AI, “training” a machine learning model can be done with a simple “prompt”.
But the reality is that, when we call model.fit, the process behind each model can be very different. And each model itself works very differently with the data.
Two Opposite Trends in Machine Learning
We can observe two very different trends, almost in two opposite directions:
- On the one hand, we train, use, manipulate, and predict with increasingly complex models (such as generative models).
- On the other hand, we are not always capable of explaining simple models (such as linear regression or the linear discriminant classifier), or of recalculating their results by hand.
Why Understanding Models Matters
It is important to understand the models we use, and the best way to understand them is to implement them ourselves. Some people do this in Python, R, or another programming language. But that remains a barrier for those who do not program, and nowadays, understanding AI is essential for everyone. Moreover, a programming language can hide operations behind existing functions. The process is not visually explained: each operation is not clearly shown, since the function is coded and then run, only to give the results.
So the best tool to explore with, in my opinion, is Excel, with formulas that clearly show every step of the calculations.
Why Excel Is the Best Tool to Learn ML
In fact, when we receive a dataset, most non-programmers will open it in Excel to understand what is inside. This is very common in the business world.
Even many data scientists, myself included, use Excel to take a quick look. And when it is time to explain the results, showing them directly in Excel is often the most effective way, especially in front of executives.
In Excel, everything is visible. There is no “black box”. You can see every formula, every number, every calculation.
This helps a lot to understand how the models really work, without shortcuts.
Also, you do not need to install anything. Just a spreadsheet.
A Series of Articles to Learn ML and DL in Excel
I will publish a series of articles about how to understand, implement, and visualize machine learning and deep learning models in Excel.
For the “Advent Calendar”, I will publish one article per day.
Who is this series for?
Different people use machine learning in very different ways. But all of them face the same problem: understanding what is really happening inside the models. This series shows how Excel can give each audience the clarity they need.
For Students: Making Complex Formulas Finally Make Sense
For students, I think these articles offer a practical point of view: a way to make sense of complex formulas.
Students often learn machine learning with a lot of algebra, matrices, and probability.
But they rarely see the calculations happen.
Excel changes this.
Every step becomes visible.
Every formula becomes concrete.
A model that looked abstract suddenly becomes intuitive.
For students, coding skills are great. But in the business world, Python alone is not enough to communicate. People understand Excel instantly. It becomes the bridge that connects your technical skills with real business understanding.
For ML and AI Developers: Opening the Black Box Behind model.fit
For ML or AI developers who have not always studied the theory: without complicated algebra, probability, or statistics, you can open the black box behind model.fit. For every model, you call model.fit, but in reality, the models can be very different.
Excel shows the hidden steps.
No heavy math.
No advanced notation.
Just simple formulas, laid out one by one.
So you can finally see why two models both use .fit but behave in totally different ways.
For developers, Python is perfect to run models. But in the business world, it can make you look like a geek from another planet. Excel helps you communicate your ideas clearly, and it builds the bridge with non-technical teams.
For Managers: Intuition First, Then Better Decisions
This is also for managers who may not have all the technical background, but to whom Excel will give the intuitive ideas behind the models. Combined with your business expertise, you can then better judge whether machine learning is really necessary, and which model might be more suitable.
For Teachers: A Tool That Makes Learning Simple
Teachers need ways to explain difficult concepts.
Excel gives students an immediate understanding of:
- what a model calculates
- why it works this way
- how each variable influences the result
It is a visual, step-by-step classroom.
For Beginners: A Friendly Starting Point
For beginners, this is the easiest way to start.
You often hear advice like: “If you want to learn Machine Learning, you must first study probability, statistics, linear algebra…”
No.
The prerequisite for this series is much simpler: you only need to know addition, multiplication, and how to open a spreadsheet with a tiny dataset (10 rows, or even fewer).
Because with such a small dataset, you can actually follow every calculation. You can understand where each number comes from. And you can add your own columns to test your ideas as you go.
You do not need to program, install packages, or configure anything.
You just open a spreadsheet.
You can change a number and instantly see how the model reacts.
You can add a feature, remove a row, modify a label, reorganize your sheet, and immediately observe the consequences.
This kind of direct experimentation is much harder to achieve in most programming environments.
And this is why spreadsheets are so powerful for beginners:
they give you the right intuition from the very beginning, and you learn by doing, not by memorizing formulas.
For Seniors: Excel Is Only a Pretext
For seniors, Excel is only a pretext: I will reveal many untaught lessons.
For experienced people, Excel becomes a way to reveal:
- the hidden logic behind models
- the untaught links between algorithms
- the common structure that connects many techniques
- the lessons that are usually missed in traditional ML courses
Excel is simple, but what it reveals is deep.
For Those Who Also Want Better Excel Skills
For people who want to use Excel for data manipulation and organisation, this series is also helpful.
It is not the main goal, but it is an opportunity to practise Excel with simple formulas: IF, SUM, SUMIF, SUMIFS, SUMPRODUCT, AVERAGE, VLOOKUP, even VLOOKUP with multiple conditions and more.
You will use them in a real context, and understand how to structure a sheet so the formulas can scale.
Because in Excel, people (myself included) tend to calculate wherever they want. Since cells are everywhere, formulas end up everywhere.
My main concern is actually to organise the calculations: sometimes in a visual way, sometimes so that formulas remain draggable.
The visualisation part is also quite painful to build. I actually use Google Sheets, and the plotting logic is not the same as in Excel. So some charts may not work perfectly when you download the file and open it in Excel, because the underlying principles are different. I still do not fully understand how Excel handles this, but in Google Sheets, I think I understand it much better now.
In Summary: The Real Goal
These articles exist for one reason:
To help everyone truly understand models, how they are trained, how to interpret them, and how different algorithms connect to each other.
Excel is the tool.
Understanding is the goal.
So, in summary, it is about better understanding the models, the training of the models, the interpretability of the models, and the links between different models.
And remember, in the business world, Excel is the universal language. Python impresses technical people, but Excel is what makes everyone else understand you. It is the bridge between your expertise and the people who will actually use your insights.
Structure of the articles
Why Classic ML Categories Are Not Enough
From a practitioner’s point of view, we usually categorize the models in the following two categories: supervised learning and unsupervised learning.
Then for supervised learning, we have regression and classification. And for unsupervised learning, we have clustering and dimensionality reduction.

But you have surely already noticed that some algorithms share the same or a similar approach, such as the KNN classifier vs. the KNN regressor, the decision tree classifier vs. the decision tree regressor, or linear regression vs. a “linear classifier”.
A regression tree and linear regression have the same objective, that is, to do a regression task. But when you try to implement them in Excel, you will see that the regression tree is very close to the classification tree. And linear regression is closer to a neural network.
And sometimes people confuse K-NN with K-means. Some may argue that their goals are completely different, and that confusing them is a beginner’s mistake. BUT, we also have to admit that they share the same approach of calculating distances between the data points. So there is a relationship between them.
The same goes for isolation forest: random forest also has a “forest” in its name.
Another Organization: Models by Theoretical Approaches
So I will organize all the models from a theoretical point of view. There are three main approaches, and we will clearly see how these approaches are implemented in very different ways in Excel.
This overview will help us to navigate through all the different models, and connect the dots between many of them.

- For distance-based models, we will calculate local or global distances between a new observation and the training dataset.
- For tree-based models, we have to define the splits, or rules, that will be used to cut the features into categories.
- For math-function models (weight-based models), the idea is to apply weights to the features. To train the model, gradient descent is mainly used.
- For deep learning models, we will see that the main point is feature engineering: creating an adequate representation of the data.
The Key Questions We Will Answer for Every Model
For each model, we will try to answer these questions.
General questions about the model:
- What is the nature of the model?
- How is the model trained?
- What are the hyperparameters of the model?
- How can the same model approach be used for regression, classification, or even clustering?
How features are modelled:
- How are categorical features handled?
- How are missing values managed?
- For continuous features, does scaling make a difference?
- How do we measure the importance of one feature?
How can we qualify the importance of the features? This question will also be discussed. You may know that packages like LIME and SHAP are very popular, and that they are model-agnostic. But the truth is that each model behaves quite differently, and it is also interesting, and important, to interpret features directly with the model itself.
The Hidden Links Between Models
Each model will be in a separate article, but we will discuss the links with other models.
We will also discuss the relationships between different models. Since we truly open each “black box”, we will also know how to make theoretical improvements to some models.
- KNN and LDA (Linear Discriminant Analysis) are very close. The first uses a local distance, and the latter uses a global distance.
- Gradient boosting is the same as gradient descent, only the vector space is different.
- Linear regression is also a classifier.
- Label encoding can, in a way, be used for categorical features, and it can be very useful and very powerful, but you have to choose the “labels” wisely.
- SVM is very close to linear regression, even closer to ridge regression.
- LASSO and SVM use one similar principle to select features or data points. Do you know that the second S in LASSO is for selection?
For each model, we will also discuss one particular point that most traditional courses miss. I call it the untaught lesson of the machine learning model.
What We Will Not Cover: Hyperparameter Tuning
In these articles, we will focus only on how the models work and how they are trained. We will not discuss hyperparameter tuning, because the process is essentially the same for every model. We typically use grid search.

Get the Excel Files on Ko-fi and Support the Project
Below is the list, which I will update as I publish one article per day, beginning December 1st!
You can find all the Excel files at this Ko-fi link. If you want to support my work, it means a lot to me. The price will increase during the month, so early supporters get the best value.

Part A. Distance-based models
We start with models built on distances between points. They are intuitive, visual, and perfect for Excel. You will start with something as simple as: “who are my closest neighbors?”
Day 1: KNN Regressor
To begin with machine learning, this is THE simplest model. So simple that we may ask: is this really machine learning?
Behind model.fit, we will see that the fit does NOTHING.
The untaught lesson: standardization or min-max scaling is not always the right way to handle feature scales.
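To make the “fit does nothing” point concrete, here is a minimal Python sketch (the series itself uses Excel; the tiny dataset, the function name, and the values are purely illustrative). All the work happens at prediction time: measure distances, pick the k closest points, average their targets.

```python
# Minimal k-NN regressor sketch: "fit" would only store the data;
# the real work happens at prediction time.
def knn_predict(X_train, y_train, x_new, k=3):
    # Squared Euclidean distance from x_new to every training point
    dists = [sum((a - b) ** 2 for a, b in zip(row, x_new)) for row in X_train]
    # Indices of the k closest training points
    nearest = sorted(range(len(dists)), key=lambda i: dists[i])[:k]
    # Prediction = average target of the neighbors
    return sum(y_train[i] for i in nearest) / k

X = [(1.0,), (2.0,), (3.0,), (10.0,)]
y = [1.0, 2.0, 3.0, 10.0]
print(knn_predict(X, y, (2.5,), k=2))  # → 2.5, the average of the targets at x=2 and x=3
```

In a spreadsheet, each of these steps becomes one column: a distance column, a rank column, and an average over the top-k rows.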
Day 2: KNN Classifier
The KNN classifier works in a very similar way to the KNN regressor.
What is interesting is that we can move from a local distance (Euclidean distance to nearby points) to a global distance, which generalizes the usual k-NN idea.
One drawback is that there is no variable weighting.
We will also see the links with Nearest Centroids, Gaussian Naive Bayes…
Day 3: GNB, LDA and QDA
Linear Discriminant Analysis and Quadratic Discriminant Analysis are very close to K-NN Classifier. Both introduce a form of “global distance”, known as the Mahalanobis distance, instead of the local Euclidean distance used in k-NN.
One more thing: the Kernel Density Estimator can also be used to customize the shape of the distribution. Keep this idea in mind, because we will see later how it appears again in other models.
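As a preview of that global distance, here is a hedged Python sketch of the Mahalanobis distance in 2D. The hand-written 2×2 matrix inverse is the kind of cell-by-cell arithmetic a spreadsheet makes visible; the numbers are illustrative.

```python
# Sketch: Mahalanobis distance in 2D, the "global distance" behind LDA/QDA.
# It rescales distances by the covariance of the data, unlike plain Euclidean.
def mahalanobis_2d(x, mean, cov):
    # Invert the 2x2 covariance matrix by hand
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    # d^2 = dx^T * inv(cov) * dx
    d2 = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
          + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return d2 ** 0.5

# With an identity covariance, it reduces to plain Euclidean distance
print(mahalanobis_2d((3.0, 4.0), (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))))  # → 5.0
```

Changing the covariance matrix stretches or tilts the distance, which is exactly what makes this distance “global”: it is shaped by the whole class, not by individual neighbors.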
Day 4: k-means
K-means is an unsupervised model that also uses distances. Sometimes it is confused with k-NN, and you will soon see why.
Spoiler: the two k do not mean the same thing.
And k-means is actually closer to another model we already saw.
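For intuition, one k-means iteration can be sketched as two steps, assignment and update (toy 1D data, purely illustrative):

```python
# One iteration of k-means: assign points to centers, then move the centers.
def kmeans_step(points, centers):
    def sqdist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    # Assignment step: each point goes to its nearest center (squared Euclidean)
    assign = [min(range(len(centers)), key=lambda j: sqdist(pt, centers[j]))
              for pt in points]
    # Update step: each center moves to the mean of its assigned points
    new_centers = []
    for j in range(len(centers)):
        members = [pt for pt, a in zip(points, assign) if a == j]
        if members:
            new_centers.append(tuple(sum(vals) / len(members) for vals in zip(*members)))
        else:
            new_centers.append(centers[j])  # keep an empty cluster's center in place
    return assign, new_centers

pts = [(0.0,), (1.0,), (9.0,), (10.0,)]
assign, centers = kmeans_step(pts, [(0.0,), (10.0,)])
print(assign, centers)  # → [0, 0, 1, 1] [(0.5,), (9.5,)]
```

Repeating these two steps until the assignments stop changing is the whole algorithm, and both steps translate directly into spreadsheet columns.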
Day 5: GMM (Gaussian Mixture Model)
GMM is the natural extension of k-means.
K-means assigns each point to a single cluster using Euclidean distance. GMM assigns probabilities.
Clusters are no longer just centers, they are Gaussian distributions with variance (and even covariance).
In Excel, you clearly see how this makes clusters more flexible, and why formulas quickly become longer.
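Here is a hedged sketch of that soft assignment in 1D. The weights, means, and variances are illustrative, not fitted values; the point is only to show how a single observation gets a probability for each cluster instead of a hard label.

```python
import math

# Soft assignment in a 1D Gaussian mixture: each cluster is a Gaussian,
# and a point receives a "responsibility" (probability) for each cluster.
def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def responsibilities(x, weights, means, variances):
    likes = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(likes)
    return [l / total for l in likes]

r = responsibilities(2.0, weights=[0.5, 0.5], means=[0.0, 5.0], variances=[1.0, 1.0])
print([round(p, 3) for p in r])  # → [0.924, 0.076]: the point leans toward the cluster at 0
```

Replace the hard “nearest center” test of k-means with these responsibilities, and you have the E-step of the GMM.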
Day 9: LOF – Local Outlier Factor
LOF detects anomalies using local density.
A point is abnormal not because it is far away, but because it is much less dense than its neighbors.
In Excel, every step is visible: distances, neighbors, densities, and the final anomaly score.
Day 10: DBSCAN
DBSCAN clusters by density, not by distance to a center.
Dense areas become clusters. Sparse points become noise.
It naturally finds clusters of any shape and detects outliers at the same time.
Part B. Tree-based models
These models do not use distances or probabilities.
They use rules. They cut features into pieces. This family is all about splits.
By stacking these rules one after another, we build what we call a Decision Tree.
The model itself is nothing more than a sequence of rules. And these rules are learned one by one, using a simple criterion that selects the best split at each step.
The hyperparameters then decide how many rules the model is allowed to add. In other words, how deep the tree can grow.
Day 6: Decision Tree Regressor
A Decision Tree Regressor tests every possible split and computes the error (MSE).
In Excel, you will see a simpler way to express this MSE.
Everything is just loops: test, compute, compare.
When you build the synthetic table, every split and every decision becomes visible.
With multiple features, nothing changes.
The same loop is repeated, and Excel shows clearly how all splits can be combined into one table.
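The “test, compute, compare” loop can be sketched as follows, for a single feature (toy data; the threshold choice and variable names are illustrative). Every candidate threshold is tried, and the one with the smallest total squared error of the two leaf means wins.

```python
# Exhaustive search for the best split of one feature in a regression tree.
def best_split(x, y):
    # Sort the data by feature value, then try every midpoint as a threshold
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs, ys = [x[i] for i in order], [y[i] for i in order]
    best = None
    for i in range(1, len(xs)):
        thr = (xs[i - 1] + xs[i]) / 2
        left, right = ys[:i], ys[i:]
        # Total squared error when each side predicts its own mean
        err = 0.0
        for side in (left, right):
            m = sum(side) / len(side)
            err += sum((v - m) ** 2 for v in side)
        if best is None or err < best[1]:
            best = (thr, err)
    return best

x = [1.0, 2.0, 8.0, 9.0]
y = [1.0, 1.2, 5.0, 5.4]
print(best_split(x, y))  # splits near 5.0, separating the two flat groups
```

With several features, the same loop simply runs once per feature, and the overall best split is kept.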
Day 7: Decision Tree Classifier
Same structure as regression trees, but with a different objective.
Instead of minimizing MSE, we minimize impurity (Gini, or any valid alternative).
The model just tests all possible splits, measures impurity, and keeps the best one.
Excel makes this very clear: trees do not “think”, they try everything.
Multi-class classification is conceptually straightforward.
In Excel, it becomes simple as soon as the data is well structured and formulas can be dragged.
Day 8: Isolation Forest
Isolation Forest detects anomalies by isolating points instead of modeling normal behavior.
Anomalies are easier to separate, so they get isolated in fewer splits.
The model builds random trees and measures how quickly each point is isolated.
In Excel, this makes anomaly detection very concrete: random splits, path lengths, and anomaly scores become easy to visualize.
Part C. Weight-based models
Weight-based models all share the same core idea.
They combine the input features using weights to produce a score.
Each feature has a coefficient.
The model learns these coefficients by minimizing a loss function that measures how wrong the predictions are.
Training is simply an optimization problem.
Compute the loss, compute its gradients with respect to the weights, and update them step by step.
When you build these models in Excel, this becomes very clear.
There is no magic, just scores, losses, and gradients repeated in a loop.
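That loop can be sketched in a few lines of Python for the simplest case, one feature and a squared-error loss (plain linear regression; the learning rate, step count, and data are illustrative):

```python
# Core loop of a weight-based model: score -> loss -> gradient -> update.
def train(xs, ys, lr=0.05, steps=500):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Data generated by y = 2x + 1: gradient descent should recover w ≈ 2, b ≈ 1
w, b = train([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(round(w, 2), round(b, 2))  # → 2.0 1.0
```

In a spreadsheet, each pass of this loop becomes one row: current weights, current loss, gradients, updated weights. Dragging the row down runs gradient descent.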
Day 11: Linear Regression
The simplest weight-based model.
But it is essential to master the basics: gradient descent, handling categorical features, and the fact that linear regression can also be used as a classifier.
This is also where an important distinction appears:
parametric generative models versus parametric discriminative models.
Day 12: Logistic Regression
Same linear score, but passed through a sigmoid to produce probabilities.
The loss changes, the logic stays the same.
Excel makes the link between score, probability, loss, and gradient very clear.
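The score-to-probability-to-loss chain can be sketched like this (the weight, bias, and input are illustrative, not fitted values):

```python
import math

# Logistic regression forward pass: linear score -> sigmoid -> probability -> log loss.
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def log_loss(p, y):
    # y is 0 or 1; confident wrong probabilities are penalized heavily
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

score = 0.8 * 2.0 - 0.5        # w*x + b with illustrative weights
p = sigmoid(score)             # probability of class 1
print(round(p, 3), round(log_loss(p, 1), 3))  # → 0.75 0.287
```

Compare with linear regression: the score column is identical, only the two columns after it (sigmoid, log loss) change.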
Day 13: Ridge and Lasso Regression
Linear regression with a penalty on the coefficients.
Nothing changes in the model itself, only in the objective function.
In Excel, you clearly see how the penalty shrinks the weights.
Day 14: Softmax Regression
Logistic regression extended to multiple classes.
Several linear scores are computed in parallel and normalized into probabilities.
In Excel, the multiclass logic becomes surprisingly simple.
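The normalization step is short enough to sketch directly (the three scores are illustrative, standing in for three parallel linear scores):

```python
import math

# Softmax: turn several linear scores into probabilities that sum to 1.
def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # → [0.659, 0.242, 0.099]
```

Each class gets its own score column; one more column of exponentials and one shared sum are all it takes to normalize them into probabilities.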







