angela shi, Author at Towards Data Science
https://towardsdatascience.com
Publish AI, ML & data-science insights to a global community of data professionals.
Last updated: Mon, 15 Dec 2025 20:49:55 +0000

The Machine Learning “Advent Calendar” Day 15: SVM in Excel
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-15-svm-in-excel/
Mon, 15 Dec 2025 19:41:01 +0000

Instead of starting with margins and geometry, this article builds the Support Vector Machine step by step from familiar models. By changing the loss function and reusing regularization, the SVM appears naturally as a linear classifier trained by optimization. This perspective unifies logistic regression, SVM, and other linear models into a single, coherent framework.
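The loss-swap view is easy to sketch outside the spreadsheet as well. Below is a minimal Python sketch (a stand-in for the article's Excel implementation; the dataset, learning rate, and penalty strength are invented for illustration): the prediction is still a plain linear score, and only the hinge loss plus L2 regularization makes it an SVM.

```python
# Tiny 1-D dataset: feature x, label in {-1, +1}
data = [(-2.0, -1), (-1.5, -1), (-1.0, -1), (1.0, 1), (1.5, 1), (2.0, 1)]

w, b = 0.0, 0.0        # linear model: score = w*x + b
lr, lam = 0.1, 0.01    # learning rate and L2 penalty strength

for epoch in range(200):
    for x, y in data:
        margin = y * (w * x + b)
        # Subgradient step on hinge loss max(0, 1 - margin) + (lam/2) * w**2
        if margin < 1:
            w -= lr * (lam * w - y * x)
            b -= lr * (-y)
        else:
            w -= lr * (lam * w)  # only the regularizer acts on well-classified points

# Every point should end up on the correct side of the decision boundary
print(all(y * (w * x + b) > 0 for x, y in data))  # True
```

Swapping the `if margin < 1` branch for a log-loss gradient would turn the very same loop back into logistic regression, which is the unifying point the article makes.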

The post The Machine Learning “Advent Calendar” Day 15: SVM in Excel appeared first on Towards Data Science.

The Machine Learning “Advent Calendar” Day 14: Softmax Regression in Excel
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-14-softmax-regression-in-excel/
Sun, 14 Dec 2025 18:12:00 +0000

Softmax Regression is simply Logistic Regression extended to multiple classes.

By computing one linear score per class and normalizing them with Softmax, we obtain multiclass probabilities without changing the core logic.

The loss, the gradients, and the optimization remain the same.
Only the number of parallel scores increases.

Implemented in Excel, the model becomes transparent: you can see the scores, the probabilities, and how the coefficients evolve over time.
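The same "one linear score per class, then normalize" logic can be sketched in a few lines of Python (a companion to the spreadsheet; the features, weights, and biases below are invented for illustration):

```python
import math

def softmax(scores):
    # Subtract the max score for numerical stability before exponentiating
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical single sample with two features and three classes
x = [1.0, 2.0]
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]  # one weight row per class
b = [0.0, 0.1, -0.1]                         # one bias per class

# One linear score per class, exactly as in logistic regression
scores = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
probs = softmax(scores)

print([round(p, 3) for p in probs])  # multiclass probabilities, summing to 1
```

Nothing else changes relative to the binary case: the loss is still the negative log of the probability assigned to the true class, and each class's weight row gets the familiar (probability − indicator) gradient.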

The Machine Learning “Advent Calendar” Day 13: LASSO and Ridge Regression in Excel
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-13-lasso-and-ridge-regression-in-excel/
Sat, 13 Dec 2025 16:56:00 +0000

Ridge and Lasso regression are often perceived as more complex versions of linear regression. In reality, the prediction model remains exactly the same. What changes is the training objective. By adding a penalty on the coefficients, regularization forces the model to choose more stable solutions, especially when features are correlated. Implementing Ridge and Lasso step by step in Excel makes this idea explicit: regularization does not add complexity, it adds preference.
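The "same model, different objective" idea can be sketched in Python as well (a stand-in for the spreadsheet; the dataset, learning rate, and penalty strengths are made up): the prediction `w*x + b` and the squared-error gradient are identical in all three cases, and the penalty only adds one extra term.

```python
# 1-D data with a little noise around y = 2x
data = [(0.0, 0.1), (1.0, 2.0), (2.0, 3.9), (3.0, 6.1)]

def fit(lam, penalty, lr=0.01, epochs=5000):
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # Gradient of the mean squared error: the prediction model never changes
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
        gb = sum(2 * (w * x + b - y) for x, y in data) / n
        # Only the training objective changes: add the penalty's (sub)gradient
        if penalty == "ridge":
            gw += 2 * lam * w                                  # d/dw of lam * w**2
        elif penalty == "lasso":
            gw += lam * (1 if w > 0 else -1 if w < 0 else 0)   # d/dw of lam * |w|
        w, b = w - lr * gw, b - lr * gb
    return w, b

w_plain, _ = fit(0.0, "none")
w_ridge, _ = fit(1.0, "ridge")
w_lasso, _ = fit(1.0, "lasso")
print(w_plain, w_ridge, w_lasso)  # both penalties shrink the coefficient
```

With these hypothetical numbers the unpenalized slope lands near 2, while the Ridge and Lasso slopes come out smaller: the penalty expresses a preference for small coefficients, not a different model.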

The Machine Learning “Advent Calendar” Day 12: Logistic Regression in Excel
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-12-logistic-regression-in-excel/
Fri, 12 Dec 2025 17:15:00 +0000

In this article, we rebuild Logistic Regression step by step directly in Excel.
Starting from a binary dataset, we explore why linear regression struggles as a classifier, how the logistic function fixes these issues, and how log-loss naturally appears from the likelihood.
With a transparent gradient-descent table, you can watch the model learn at each iteration, making the whole process intuitive, visual, and surprisingly satisfying.
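The gradient-descent table described above translates almost line for line into Python (a sketch with an invented toy dataset and learning rate): the log-loss gradient reduces to the familiar (prediction − label) error, just passed through the sigmoid first.

```python
import math

# Tiny binary dataset: one feature, label 0 or 1
data = [(-2.0, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.0, 1)]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

w, b, lr = 0.0, 0.0, 0.5
for _ in range(1000):
    # Gradient of the mean log-loss: (p - y) plays the role of the error term
    gw = sum((sigmoid(w * x + b) - y) * x for x, y in data) / len(data)
    gb = sum((sigmoid(w * x + b) - y) for x, y in data) / len(data)
    w, b = w - lr * gw, b - lr * gb

# Predicted probabilities now follow the labels on both sides
print(sigmoid(w * -2.0 + b) < 0.5 < sigmoid(w * 2.0 + b))  # True
```

Each pass of the loop corresponds to one row of the spreadsheet's iteration table: compute probabilities, compute the error, nudge the coefficients.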

The Machine Learning “Advent Calendar” Day 11: Linear Regression in Excel
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-11-linear-regression-in-excel/
Thu, 11 Dec 2025 16:31:00 +0000

Linear Regression looks simple, but it introduces the core ideas of modern machine learning: loss functions, optimization, gradients, scaling, and interpretation.
In this article, we rebuild Linear Regression in Excel, compare the closed-form solution with Gradient Descent, and see how the coefficients evolve step by step.
This foundation naturally leads to regularization, kernels, classification, and the dual view.
Linear Regression is not just a straight line, but the starting point for many models we will explore next in the Advent Calendar.
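The closed-form-versus-Gradient-Descent comparison mentioned above can be sketched in a few lines of Python (a stand-in for the spreadsheet; the toy dataset, learning rate, and iteration count are arbitrary): both routes should land on the same coefficients.

```python
# Tiny dataset, roughly y = 2x + 1
data = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.0)]
n = len(data)
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n

# Closed form (normal equations, one feature): w = Sxy / Sxx
sxy = sum((x - mx) * (y - my) for x, y in data)
sxx = sum((x - mx) ** 2 for x, _ in data)
w_cf = sxy / sxx
b_cf = my - w_cf * mx

# Gradient descent on the mean squared error
w, b, lr = 0.0, 0.0, 0.05
for _ in range(5000):
    gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
    gb = sum(2 * (w * x + b - y) for x, y in data) / n
    w, b = w - lr * gw, b - lr * gb

print(abs(w - w_cf) < 1e-6 and abs(b - b_cf) < 1e-6)  # True: both routes agree
```

The closed form solves the problem in one shot; gradient descent walks there step by step, which is exactly what the iteration table in Excel makes visible.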

The Machine Learning “Advent Calendar” Day 10: DBSCAN in Excel
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-10-dbscan-in-excel/
Wed, 10 Dec 2025 16:30:00 +0000

DBSCAN shows how far we can go with a very simple idea: count how many neighbors live close to each point.
It finds clusters and marks anomalies without any probabilistic model, and it works beautifully in Excel.
But because DBSCAN relies on a single fixed radius, HDBSCAN is needed to make the method robust on real data.
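The neighbor-counting idea can be sketched as a simplified DBSCAN in Python (the 1-D points, radius `eps`, and density threshold `min_pts` below are invented for illustration):

```python
# 1-D points: two dense groups plus one isolated outlier
points = [1.0, 1.2, 1.4, 5.0, 5.1, 5.3, 9.0]
eps, min_pts = 0.5, 2  # hypothetical radius and density threshold

def neighbors(i):
    # All points (including i itself) within distance eps of point i
    return [j for j in range(len(points)) if abs(points[i] - points[j]) <= eps]

labels = [None] * len(points)  # None = unvisited, -1 = noise
cluster = 0
for i in range(len(points)):
    if labels[i] is not None:
        continue
    if len(neighbors(i)) < min_pts:
        labels[i] = -1          # too few close neighbors: mark as noise
        continue
    # Core point: start a new cluster and expand it through the neighborhood
    labels[i] = cluster
    queue = neighbors(i)
    while queue:
        j = queue.pop()
        if labels[j] in (None, -1):
            labels[j] = cluster
            if len(neighbors(j)) >= min_pts:   # j is also a core point
                queue.extend(neighbors(j))
    cluster += 1

print(labels)  # [0, 0, 0, 1, 1, 1, -1]: two clusters and one noise point
```

The fixed `eps` is the weak spot the summary points at: shrink it and the clusters dissolve into noise, grow it and they merge.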

The Machine Learning “Advent Calendar” Day 9: LOF in Excel
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-9-lof-in-excel/
Tue, 09 Dec 2025 17:45:00 +0000

In this article, we explore LOF through three simple steps: distances and neighbors, reachability distances, and the final LOF score. Using tiny datasets, we see how two anomalies can look obvious to us but completely different to different algorithms. This reveals the key idea of unsupervised learning: there is no single “true” outlier, only definitions. Understanding these definitions is the real skill.
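The three steps named above map directly onto three small functions. Here is a Python sketch (a stand-in for the spreadsheet; the 1-D points and the choice k = 2 are invented for illustration):

```python
# 1-D points: a tight group and one point sitting apart
points = [1.0, 1.1, 1.2, 1.3, 3.0]
k = 2  # hypothetical number of neighbors

def knn(i):
    # Step 1: indices of the k nearest neighbors of point i (excluding itself)
    others = sorted((j for j in range(len(points)) if j != i),
                    key=lambda j: abs(points[i] - points[j]))
    return others[:k]

def k_distance(i):
    return abs(points[i] - points[knn(i)[-1]])

def reach_dist(i, j):
    # Step 2: reachability distance of i from j
    return max(k_distance(j), abs(points[i] - points[j]))

def lrd(i):
    # Local reachability density: inverse of the mean reachability distance
    return k / sum(reach_dist(i, j) for j in knn(i))

def lof(i):
    # Step 3: LOF compares i's density with its neighbors' densities
    return sum(lrd(j) for j in knn(i)) / (k * lrd(i))

print([round(lof(i), 2) for i in range(len(points))])
```

Points inside the tight group score close to 1 (their density matches their neighbors'), while the isolated point scores well above 1, which is the definition of an outlier that LOF, and only LOF, uses.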

The Machine Learning “Advent Calendar” Day 8: Isolation Forest in Excel
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-8-isolation-forest-in-excel/
Mon, 08 Dec 2025 18:26:42 +0000

Isolation Forest may look technical, but its idea is simple: isolate points using random splits. If a point is isolated quickly, it is an anomaly; if it takes many splits, it is normal.

Using the tiny dataset 1, 2, 3, 9, we can see the logic clearly. We build several random trees, measure how many splits each point needs, average the depths, and convert them into anomaly scores. Short depths become scores close to 1, long depths close to 0.

The Excel implementation is painful, but the algorithm itself is elegant. It scales to many features, makes no assumptions about distributions, and even works with categorical data. Above all, Isolation Forest asks a different question: not “What is normal?”, but “How fast can I isolate this point?”
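The build-trees, average-depths, convert-to-scores pipeline can be sketched in Python on the same tiny dataset 1, 2, 3, 9 (the number of trees and the random seed are arbitrary choices, not from the article):

```python
import math
import random

random.seed(0)
data = [1, 2, 3, 9]

def isolation_depth(point, values, depth=0):
    # Split at a random value until the point is alone
    if len(values) <= 1:
        return depth
    lo, hi = min(values), max(values)
    cut = random.uniform(lo, hi)
    side = [v for v in values if (v < cut) == (point < cut)]  # keep the point's side
    return isolation_depth(point, side, depth + 1)

def c(n):
    # Average path length of an unsuccessful binary-search-tree search,
    # used to normalize depths: c(n) = 2 H(n-1) - 2(n-1)/n
    return 2 * (math.log(n - 1) + 0.5772156649) - 2 * (n - 1) / n

n_trees = 200
scores = {}
for p in data:
    avg = sum(isolation_depth(p, data) for _ in range(n_trees)) / n_trees
    scores[p] = 2 ** (-avg / c(len(data)))  # short average depth -> score near 1

print(max(scores, key=scores.get))  # 9: it isolates fastest, so it scores highest
```

Any random cut above 3 isolates 9 immediately, so its average depth stays short and its score climbs toward 1, exactly the short-depth-to-high-score conversion described above.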

The Machine Learning “Advent Calendar” Day 7: Decision Tree Classifier
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-7-decision-tree-classifier/
Sun, 07 Dec 2025 14:30:00 +0000

In Day 6, we saw how a Decision Tree Regressor finds its optimal split by minimizing the Mean Squared Error.
Today, for Day 7 of the Machine Learning “Advent Calendar”, we switch to classification. With just one numerical feature and two classes, we explore how a Decision Tree Classifier decides where to cut the data, using impurity measures like Gini and Entropy.
Even without doing the math, we can visually guess possible split points. But which one is best? And do impurity measures really make a difference? Let us build the first split step by step in Excel and see what happens.
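The search for the best cut can be sketched in Python with the Gini impurity (a stand-in for the spreadsheet; the one-feature, two-class dataset below is invented for illustration):

```python
# One numerical feature with two classes (0 and 1)
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]

def gini(rows):
    if not rows:
        return 0.0
    p = sum(y for _, y in rows) / len(rows)  # proportion of class 1
    return 2 * p * (1 - p)                   # binary Gini impurity

def split_score(threshold):
    left = [r for r in data if r[0] <= threshold]
    right = [r for r in data if r[0] > threshold]
    n = len(data)
    # Weighted Gini impurity of the two children
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Candidate thresholds: midpoints between consecutive feature values
xs = sorted(x for x, _ in data)
candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
best = min(candidates, key=split_score)

print(best, split_score(best))  # 3.5 0.0: this split separates the classes perfectly
```

Replacing `gini` with an entropy function changes the numbers but, on a dataset this clean, not the chosen split, which is the kind of comparison the article runs in Excel.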

The Machine Learning “Advent Calendar” Day 6: Decision Tree Regressor
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-6-decision-tree-regressor/
Sat, 06 Dec 2025 14:30:00 +0000

During the first days of this Machine Learning Advent Calendar, we explored models based on distances. Today, we switch to a completely different way of learning: Decision Trees.
With a simple one-feature dataset, we can see how a tree chooses its first split. The idea is always the same: if humans can guess the split visually, then we can rebuild the logic step by step in Excel.
By listing all possible split values and computing the MSE for each one, we identify the split that reduces the error the most. This gives us a clear intuition of how a Decision Tree grows, how it makes predictions, and why the first split is such a crucial step.
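The list-every-split, compute-every-MSE procedure described above looks like this in Python (a sketch mirroring the spreadsheet; the one-feature dataset with a jump in the target is invented for illustration):

```python
# One feature, continuous target: a clear jump after x = 3
data = [(1.0, 10.0), (2.0, 11.0), (3.0, 10.5), (4.0, 30.0), (5.0, 31.0), (6.0, 29.5)]

def mse(rows):
    if not rows:
        return 0.0
    mean = sum(y for _, y in rows) / len(rows)
    return sum((y - mean) ** 2 for _, y in rows) / len(rows)

def split_cost(threshold):
    left = [r for r in data if r[0] <= threshold]
    right = [r for r in data if r[0] > threshold]
    n = len(data)
    # Weighted MSE of the two leaves; each leaf predicts its own mean
    return len(left) / n * mse(left) + len(right) / n * mse(right)

# List all candidate splits (midpoints between consecutive x values), keep the best
xs = sorted(x for x, _ in data)
candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
best = min(candidates, key=split_cost)

print(best)  # 3.5: the split that reduces the error the most
```

A human eyeballing the data would cut at the jump too; the MSE table simply turns that visual guess into a number for every candidate.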

The Machine Learning “Advent Calendar” Day 5: Gaussian Mixture Model in Excel
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-5-gmm-in-excel/
Fri, 05 Dec 2025 17:00:00 +0000

This article introduces the Gaussian Mixture Model as a natural extension of k-Means, by improving how distance is measured through variances and the Mahalanobis distance. Instead of assigning points to clusters with hard boundaries, GMM uses probabilities learned through the Expectation–Maximization algorithm – the general form of Lloyd’s method.

Using simple Excel formulas, we implement EM step by step in 1D and 2D, and we visualise how the Gaussian curves or ellipses move during training. The means shift, the variances adjust, and the shapes gradually settle around the true structure of the data.

GMM provides a richer, more flexible way to model clusters, and becomes intuitive once the process is made visible in a spreadsheet.
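The 1-D EM loop described above can be sketched in Python (a stand-in for the spreadsheet; the two-group data and the deliberately rough starting values are invented for illustration):

```python
import math

# Two visible groups in 1-D
data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.8, 5.1]

def pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Deliberately rough starting values for two Gaussians
mu = [0.0, 6.0]
var = [1.0, 1.0]
pi = [0.5, 0.5]

for _ in range(50):
    # E-step: responsibility of each Gaussian for each point (soft assignment)
    r = []
    for x in data:
        w = [pi[k] * pdf(x, mu[k], var[k]) for k in range(2)]
        s = sum(w)
        r.append([wk / s for wk in w])
    # M-step: re-estimate weights, means, and variances from responsibilities
    for k in range(2):
        nk = sum(ri[k] for ri in r)
        pi[k] = nk / len(data)
        mu[k] = sum(ri[k] * x for ri, x in zip(r, data)) / nk
        var[k] = sum(ri[k] * (x - mu[k]) ** 2 for ri, x in zip(r, data)) / nk

print(round(mu[0], 2), round(mu[1], 2))  # the means settle near the two groups
```

Replacing the soft responsibilities with a hard 0/1 assignment and freezing the variances collapses this loop back into Lloyd's k-Means iteration, which is the sense in which EM generalizes it.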

The Machine Learning “Advent Calendar” Day 4: k-Means in Excel
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-4-k-means-in-excel/
Thu, 04 Dec 2025 16:30:00 +0000

How to implement a training algorithm that finally looks like “real” machine learning

The Machine Learning “Advent Calendar” Day 3: GNB, LDA and QDA in Excel
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-3-gnb-lda-and-qda-in-excel/
Wed, 03 Dec 2025 16:30:00 +0000

From local distance to global probability

The Machine Learning “Advent Calendar” Day 2: k-NN Classifier in Excel
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-2-k-nn-classifier-in-excel/
Tue, 02 Dec 2025 18:39:26 +0000

Exploring the k-NN classifier with its variants and improvements

The Machine Learning “Advent Calendar” Day 1: k-NN Regressor in Excel
https://towardsdatascience.com/day-1-k-nn-regressor-in-excel-how-distance-drives-prediction/
Mon, 01 Dec 2025 19:52:19 +0000

This first day of the Advent Calendar introduces the k-NN regressor, the simplest distance-based model. Using Excel, we explore how predictions rely entirely on the closest observations, why feature scaling matters, and how heterogeneous variables can make distances meaningless. Through examples with continuous and categorical features, including the California Housing and Diamonds datasets, we see the strengths and limitations of k-NN, and why defining the right distance is essential to reflect real-world structure.
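The prediction-by-closest-observations idea can be sketched in a few lines of Python (a stand-in for the spreadsheet; the one-feature training points and the choice k = 3 are invented for illustration):

```python
# Training points: (feature, target)
train = [(1.0, 10.0), (2.0, 12.0), (3.0, 14.0), (8.0, 40.0), (9.0, 42.0)]

def knn_predict(x, k=3):
    # Sort training points by distance to the query, average the k nearest targets
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

print(knn_predict(2.0))   # 12.0: the mean of the targets at x = 1, 2, 3
print(knn_predict(8.5))   # 32.0: two high-target neighbors pull the average up
```

Because everything runs through `abs(p[0] - x)`, a feature on a much larger scale would dominate this distance entirely, which is exactly why the article insists on scaling and on choosing the right distance.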

The Machine Learning and Deep Learning “Advent Calendar” Series: The Blueprint
https://towardsdatascience.com/machine-learning-and-deep-learning-in-excel-advent-calendar-announcement/
Sun, 30 Nov 2025 15:00:00 +0000

Opening the black box of ML models, step by step, directly in Excel

Understanding Convolutional Neural Networks (CNNs) Through Excel
https://towardsdatascience.com/understanding-convolutional-neural-networks-cnns-through-excel/
Mon, 17 Nov 2025 19:54:54 +0000

Deep learning is often seen as a black box. We know that it learns from data, but we rarely stop to ask how it truly learns.
What if we could open that box and watch each step happen right before our eyes?
With Excel, we can do exactly that: see how numbers turn into patterns, and how simple calculations become the foundation of what we call “deep learning.”
In this article, we will build a tiny Convolutional Neural Network (CNN) directly in Excel to understand, step by step, how machines detect shapes, patterns, and meaning in images.
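The core spreadsheet operation, sliding a small filter over an image and summing elementwise products, can be sketched in Python (the tiny image and the hand-picked vertical-edge kernel below are invented for illustration):

```python
# A tiny 5x5 "image" with a dark-to-bright vertical edge in the middle
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# A 3x3 vertical-edge filter (hypothetical hand-picked weights;
# a real CNN would learn these values during training)
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

def convolve(img, ker):
    # Slide the kernel over every valid position, summing elementwise products
    out_size = len(img) - len(ker) + 1
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            s = sum(img[i + a][j + b] * ker[a][b]
                    for a in range(len(ker)) for b in range(len(ker)))
            row.append(s)
        out.append(row)
    return out

feature_map = convolve(image, kernel)
for row in feature_map:
    print(row)  # each row prints [3, 3, 0]: strong responses sit on the edge
```

The feature map lights up only where the kernel's pattern matches the image, which is the whole mechanism by which a CNN "detects" a shape.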
