Algorithms | Towards Data Science

The Machine Learning “Advent Calendar” Day 15: SVM in Excel

angela shi — Mon, 15 Dec 2025 19:41:01 +0000

Instead of starting with margins and geometry, this article builds the Support Vector Machine step by step from familiar models. By changing the loss function and reusing regularization, SVM appears naturally as a linear classifier trained by optimization. This perspective unifies logistic regression, SVM, and other linear models into a single, coherent framework.
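
As a rough illustration of that framing (not the article's Excel workbook), the sketch below trains a linear classifier by subgradient descent on the hinge loss plus an L2 penalty. The function names, the toy data, and the values of lam, lr and epochs are all assumptions chosen for the example.

```python
def score(w, b, x):
    """Linear score w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def hinge_objective(w, b, X, y, lam):
    """Mean hinge loss plus L2 penalty. Labels must be -1 or +1."""
    data_term = sum(max(0.0, 1.0 - yi * score(w, b, xi)) for xi, yi in zip(X, y)) / len(X)
    return data_term + lam * sum(wj * wj for wj in w)

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the objective above (assumed settings)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [2.0 * lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * score(w, b, xi) < 1.0:     # only margin violators contribute
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return 1 if score(w, b, x) >= 0 else -1

# Hypothetical toy data: two linearly separable groups.
X = [[-2.0, -1.0], [-1.0, -2.0], [-1.5, -1.5], [1.0, 2.0], [2.0, 1.0], [1.5, 1.5]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print([predict(w, b, xi) for xi in X])
```

Replacing the hinge term with the logistic loss in hinge_objective (and its gradient) would turn the same loop into regularized logistic regression, which is the unification the excerpt describes.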

The post The Machine Learning “Advent Calendar” Day 15: SVM in Excel appeared first on Towards Data Science.

The Machine Learning “Advent Calendar” Day 10: DBSCAN in Excel

angela shi — Wed, 10 Dec 2025 16:30:00 +0000

DBSCAN shows how far we can go with a very simple idea: count how many neighbors live close to each point.
It finds clusters and marks anomalies without any probabilistic model, and it works beautifully in Excel.
But because DBSCAN relies on a single fixed radius, HDBSCAN is needed to make the method robust on real data.
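
The neighbor-counting idea can be sketched in a few lines of Python. This is a minimal version of the standard DBSCAN procedure, not the article's spreadsheet; the points and the eps and min_pts values are made up for illustration, and a point counts itself among its neighbors.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = NOISE          # may later be relabeled as a border point
            continue
        labels[i] = cluster            # i is a core point: start a new cluster
        queue = list(nb)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:   # j is also a core point: keep expanding
                queue.extend(nb_j)
        cluster += 1
    return labels

# Hypothetical data: two tight groups and one isolated point.
points = [(1, 1), (1.2, 1.1), (0.9, 1.0), (5, 5), (5.1, 5.2), (4.9, 5.0), (9, 1)]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The single eps value is applied to every point, which is the fixed-radius limitation mentioned above.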

The Machine Learning “Advent Calendar” Day 7: Decision Tree Classifier

angela shi — Sun, 07 Dec 2025 14:30:00 +0000

In Day 6, we saw how a Decision Tree Regressor finds its optimal split by minimizing the Mean Squared Error.
Today, for Day 7 of the Machine Learning “Advent Calendar”, we switch to classification. With just one numerical feature and two classes, we explore how a Decision Tree Classifier decides where to cut the data, using impurity measures like Gini and Entropy.
Even without doing the math, we can visually guess possible split points. But which one is best? And do impurity measures really make a difference? Let us build the first split step by step in Excel and see what happens.
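
A small sketch of that first split, assuming candidate thresholds are taken as midpoints between consecutive distinct feature values and that each split is scored by the size-weighted impurity of its two sides. The dataset is invented for the example and labels are 0 or 1.

```python
from math import log2

def gini(labels):
    """Gini impurity for binary labels: 1 - p^2 - (1-p)^2 = 2p(1-p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def entropy(labels):
    """Shannon entropy in bits for binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)

def best_split(x, y, impurity):
    """Return (threshold, weighted impurity) of the best single split."""
    values = sorted(set(x))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate: midpoint between neighbors
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        weighted = (len(left) * impurity(left) + len(right) * impurity(right)) / len(y)
        if best is None or weighted < best[1]:
            best = (t, weighted)
    return best

# Hypothetical one-feature, two-class data.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 0, 1, 1]
print(best_split(x, y, gini))     # (4.5, 0.1875)
print(best_split(x, y, entropy))
```

On this toy data both impurity measures pick the same threshold; that need not hold in general.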

The Machine Learning “Advent Calendar” Day 6: Decision Tree Regressor

angela shi — Sat, 06 Dec 2025 14:30:00 +0000

During the first days of this Machine Learning Advent Calendar, we explored models based on distances. Today, we switch to a completely different way of learning: Decision Trees.
With a simple one-feature dataset, we can see how a tree chooses its first split. The idea is always the same: if humans can guess the split visually, then we can rebuild the logic step by step in Excel.
By listing all possible split values and computing the MSE for each one, we identify the split that reduces the error the most. This gives us a clear intuition of how a Decision Tree grows, how it makes predictions, and why the first split is such a crucial step.
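
The same enumeration can be written as a short Python sketch. It assumes each side of a split predicts its own mean and that the split score is the size-weighted MSE of the two sides; the data below is invented for illustration.

```python
def mse(values):
    """Mean squared error of predicting the mean of `values`."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def best_mse_split(x, y):
    """Try every midpoint between distinct x values; return (threshold, weighted MSE)."""
    values = sorted(set(x))
    results = []
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        weighted = (len(left) * mse(left) + len(right) * mse(right)) / len(y)
        results.append((t, weighted))
    return min(results, key=lambda r: r[1])

def predict(x_new, threshold, x, y):
    """Predict with the one-split tree: the mean target on the matching side."""
    side = [yi for xi, yi in zip(x, y) if (xi <= threshold) == (x_new <= threshold)]
    return sum(side) / len(side)

# Hypothetical one-feature data with an obvious jump between x=3 and x=4.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9]
threshold, error = best_mse_split(x, y)
print(threshold, error, mse(y))  # the split error is far below the no-split MSE
```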

The Machine Learning “Advent Calendar” Day 4: k-Means in Excel

angela shi — Thu, 04 Dec 2025 16:30:00 +0000

How to implement a training algorithm that finally looks like “real” machine learning

The Machine Learning “Advent Calendar” Day 3: GNB, LDA and QDA in Excel

angela shi — Wed, 03 Dec 2025 16:30:00 +0000

From local distance to global probability

The Machine Learning “Advent Calendar” Day 1: k-NN Regressor in Excel

angela shi — Mon, 01 Dec 2025 19:52:19 +0000

This first day of the Advent Calendar introduces the k-NN regressor, the simplest distance-based model. Using Excel, we explore how predictions rely entirely on the closest observations, why feature scaling matters, and how heterogeneous variables can make distances meaningless. Through examples with continuous and categorical features, including the California Housing and Diamonds datasets, we see the strengths and limitations of k-NN, and why defining the right distance is essential to reflect real-world structure.
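
To make the scaling point concrete, here is a minimal k-NN regressor in plain Python. The two-feature dataset is invented (it is not the California Housing or Diamonds data), and min-max scaling is just one possible choice of rescaling.

```python
from math import dist

def knn_predict(X, y, query, k):
    """Average the targets of the k training points closest to `query`."""
    nearest = sorted(range(len(X)), key=lambda i: dist(X[i], query))[:k]
    return sum(y[i] for i in nearest) / k

def min_max_scaler(X):
    """Return a function mapping each feature to [0, 1] based on the ranges in X."""
    lows = [min(col) for col in zip(*X)]
    highs = [max(col) for col in zip(*X)]
    return lambda row: tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))

# Hypothetical data: feature 0 spans a few units, feature 1 spans thousands.
X = [(1.0, 1000.0), (1.1, 3000.0), (5.0, 1100.0), (5.2, 2900.0)]
y = [10.0, 12.0, 50.0, 52.0]
query = (1.0, 2920.0)

raw = knn_predict(X, y, query, k=1)

scale = min_max_scaler(X)
scaled = knn_predict([scale(r) for r in X], y, scale(query), k=1)

print(raw, scaled)  # 52.0 12.0
```

On the raw features the large-scale second column dominates the distance, so the nearest neighbor is a point whose first feature is very different; after rescaling, both features contribute and the neighbor changes.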

The Machine Learning and Deep Learning “Advent Calendar” Series: The Blueprint

angela shi — Sun, 30 Nov 2025 15:00:00 +0000

Opening the black box of ML models, step by step, directly in Excel

The Greedy Boruta Algorithm: Faster Feature Selection Without Sacrificing Recall

Nicolas Vana — Sun, 30 Nov 2025 13:00:00 +0000

A modification to the Boruta algorithm that dramatically reduces computation while maintaining high sensitivity

Implementing the Fourier Transform Numerically in Python: A Step-by-Step Guide

JUNIOR JUMBONG — Tue, 21 Oct 2025 12:30:00 +0000

What if the FFT functions in NumPy and SciPy don’t actually compute the Fourier transform you think they do?

Machine Learning Meets Panel Data: What Practitioners Need to Know

Marco Letta — Fri, 17 Oct 2025 17:15:40 +0000

How to avoid overestimating machine learning models’ performance, usefulness, and real-world applicability due to hidden data leakage

From Genes to Neural Networks: Understanding and Building NEAT (Neuro-Evolution of Augmenting Topologies) from Scratch

Carlos Redondo — Mon, 11 Aug 2025 18:03:44 +0000

Practical Neuroevolution: Reproducing NEAT’s Innovations and Code Walkthrough

The Five-Second Fingerprint: Inside Shazam’s Instant Song ID

Ashton Gribble — Tue, 08 Jul 2025 01:35:00 +0000

How Shazam recognizes songs in seconds

A Gentle Introduction to Backtracking

Chinmay Kakatkar — Mon, 30 Jun 2025 18:51:47 +0000

Conceptual overview and hands-on examples

Adding Training Noise To Improve Detections In Transformers

Uri Almog — Mon, 28 Apr 2025 17:56:52 +0000

Denoising, explained

Algorithm Protection in the Context of Federated Learning

Bartek Szubstarski — Fri, 21 Mar 2025 04:32:38 +0000

A pragmatic look into protecting algorithms and models deployed into real-world federated analysis and learning settings in healthcare.

Recursive Walks down User Referral Trees

Matthew Senick — Wed, 15 Jan 2025 15:02:13 +0000

Measuring the total influence of users in a user referral program by traversing indirect referrals
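
One simple way to picture this, assuming "total influence" means the count of all direct and indirect referrals and that the referral data forms a tree (no cycles): a recursive walk over a mapping from each user to the users they referred. The names and the dictionary layout are hypothetical, not taken from the article.

```python
def total_influence(referrals, user):
    """Count every user reachable from `user` through referral links."""
    return sum(1 + total_influence(referrals, child) for child in referrals.get(user, []))

# Hypothetical referral tree: ann referred bob and cat, bob referred dan, dan referred eve.
referrals = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "dan": ["eve"],
}

print(total_influence(referrals, "ann"))  # 4
print(total_influence(referrals, "bob"))  # 2
print(total_influence(referrals, "cat"))  # 0
```

For very deep trees, an explicit stack would avoid Python's recursion limit.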

Understanding the Optimization Process Pipeline in Linear Programming

Himalaya Bir Shrestha — Fri, 27 Dec 2024 11:01:35 +0000

An introduction to the backend and frontend processes in linear programming, including Mathematical Programming System (MPS) files

Core AI For Any Rummy Variant

Iheb Rachdi — Sat, 09 Nov 2024 14:38:00 +0000

Step-by-step guide to a Rummy AI

Cyclic Partition: An Up to 1.5x Faster Partitioning Algorithm

Tigran Hayrapetyan — Thu, 10 Oct 2024 17:28:03 +0000

A sequence partitioning algorithm that does minimal rearrangements of values
