Deep Dives | Towards Data Science

The Machine Learning “Advent Calendar” Day 15: SVM in Excel

angela shi — Mon, 15 Dec 2025 19:41:01 +0000

Instead of starting with margins and geometry, this article builds the Support Vector Machine step by step from familiar models. By changing the loss function and reusing regularization, SVM appears naturally as a linear classifier trained by optimization. This perspective unifies logistic regression, SVM, and other linear models into a single, coherent framework.
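
The "change the loss, keep the regularization" idea can be made concrete with a minimal Python sketch. This is not the article's Excel workbook. A single trainer minimizes (lam/2)*||w||^2 plus the average of a per-example loss of the margin m = y*(w.x + b). Passing the hinge loss max(0, 1 - m) gives a linear SVM, and passing the logistic loss log(1 + exp(-m)) gives L2-regularized logistic regression. The toy dataset, the function names, and the hyperparameters (lam, lr, epochs) are assumptions chosen for illustration. Plain full-batch subgradient descent stands in for whatever optimizer the article actually uses.

```python
import math

def hinge_grad(m):
    """Subgradient of the hinge loss max(0, 1 - m) with respect to the margin m."""
    return -1.0 if m < 1.0 else 0.0

def logistic_grad(m):
    """Derivative of the logistic loss log(1 + exp(-m)), written to avoid overflow."""
    if m >= 0:
        e = math.exp(-m)
        return -e / (1.0 + e)
    return -1.0 / (1.0 + math.exp(m))

def train_linear(X, y, loss_grad, lam=0.01, lr=0.1, epochs=10000):
    """Full-batch (sub)gradient descent on (lam/2)*||w||^2 + mean(loss(y_i * (w.x_i + b)))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 penalty (the bias is not penalized)
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            g = loss_grad(margin)  # d(loss)/d(margin); the only model-specific part
            for j in range(d):
                gw[j] += g * yi * xi[j] / n
            gb += g * yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    """Sign of the linear score, returned as a label in {-1, +1}."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable toy data with labels in {-1, +1}.
X = [[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]]
y = [-1, -1, -1, 1, 1, 1]

w_svm, b_svm = train_linear(X, y, hinge_grad)     # linear SVM
w_log, b_log = train_linear(X, y, logistic_grad)  # logistic regression
print("SVM:", w_svm, b_svm)
print("Logistic:", w_log, b_log)
```

Only the loss_grad argument differs between the two models. The regularization term and the update loop are shared, which is the single linear-model framework the summary describes.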

The post The Machine Learning “Advent Calendar” Day 15: SVM in Excel appeared first on Towards Data Science.

Spectral Community Detection in Clinical Knowledge Graphs

Silvia Onofrei — Fri, 12 Dec 2025 10:30:00 +0000

How do we identify latent groups of patients in a large cohort? How can we find similarities among patients that go beyond the well-known comorbidity clusters associated with specific diseases? And more importantly, how can we extract quantitative signals that can be analyzed, compared, and reused across different clinical scenarios? The information associated with […]

A Realistic Roadmap to Start an AI Career in 2026

Sabrine Bendimerad — Tue, 09 Dec 2025 12:00:00 +0000

How to learn AI in 2026 through real, usable projects

The Machine Learning “Advent Calendar” Day 8: Isolation Forest in Excel

angela shi — Mon, 08 Dec 2025 18:26:42 +0000

Isolation Forest may look technical, but its idea is simple: isolate points using random splits. If a point is isolated quickly, it is likely an anomaly; if it takes many splits, it is likely normal.

Using the tiny dataset 1, 2, 3, 9, we can see the logic clearly. We build several random trees, measure how many splits each point needs, average the depths, and convert them into anomaly scores. Short depths become scores close to 1, long depths close to 0.
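
The procedure above can be sketched in a few lines of Python for the same 1, 2, 3, 9 dataset. This is a simplified sketch, not the article's spreadsheet. It assumes a single feature and grows every tree until each point sits alone, so it uses no subsampling and no height limit, unlike the original Isolation Forest algorithm. It converts the average depth E[h(x)] into the usual score s = 2^(-E[h(x)] / c(n)), where c(n) = 2H(n-1) - 2(n-1)/n is the average path length used for normalization. The function names, the 200 trees, and the seed are illustrative choices.

```python
import math
import random

def c(n):
    """Normalizing constant c(n) = 2*H(n-1) - 2*(n-1)/n, with an exact harmonic number."""
    if n <= 1:
        return 0.0
    harmonic = sum(1.0 / i for i in range(1, n))  # H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(points, rng):
    """Grow a random isolation tree until every leaf holds a single distinct value."""
    if len(points) <= 1 or min(points) == max(points):
        return None  # leaf
    split = rng.uniform(min(points), max(points))  # random split inside the current range
    left = [p for p in points if p < split]
    right = [p for p in points if p >= split]
    return (split, build_tree(left, rng), build_tree(right, rng))

def depth(x, tree):
    """Number of splits x goes through before it reaches a leaf."""
    d = 0
    while tree is not None:
        split, left, right = tree
        tree = left if x < split else right
        d += 1
    return d

def anomaly_scores(data, n_trees=200, seed=0):
    """Average each point's depth over the forest and map it to a score in (0, 1)."""
    rng = random.Random(seed)
    trees = [build_tree(data, rng) for _ in range(n_trees)]
    norm = c(len(data))
    scores = {}
    for x in data:
        mean_depth = sum(depth(x, t) for t in trees) / n_trees
        scores[x] = 2.0 ** (-mean_depth / norm)  # short depth -> close to 1
    return scores

scores = anomaly_scores([1, 2, 3, 9])
for x, s in sorted(scores.items()):
    print(x, round(s, 3))
```

With this data, 9 is usually separated by the very first split: a uniform split between 1 and 9 lands above 3 with probability 6/8. It therefore gets the shortest average depth and the highest score.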

The Excel implementation is painful, but the algorithm itself is elegant. It scales to many features, makes no assumptions about distributions, and even works with categorical data. Above all, Isolation Forest asks a different question: not “What is normal?”, but “How fast can I isolate this point?”

Optimizing PyTorch Model Inference on CPU

Chaim Rand — Mon, 08 Dec 2025 12:00:00 +0000

Flyin’ Like a Lion on Intel Xeon

How to Climb the Hidden Career Ladder of Data Science

Greg Rafferty — Sun, 07 Dec 2025 16:00:00 +0000

The behaviors that get you promoted

YOLOv1 Paper Walkthrough: The Day YOLO First Saw the World

Muhammad Ardi — Fri, 05 Dec 2025 14:00:00 +0000

A detailed walkthrough of the YOLOv1 architecture and its PyTorch implementation from scratch

On the Challenge of Converting TensorFlow Models to PyTorch

Chaim Rand — Fri, 05 Dec 2025 12:30:00 +0000

How to upgrade and optimize legacy AI/ML models

Build and Deploy Your First Supply Chain App in 20 Minutes

Samir Saci — Thu, 04 Dec 2025 15:00:00 +0000

A factory operator who discovered happiness by switching from notebooks to Streamlit (image generated with GPT-5.1 by Samir Saci)

The Architecture Behind Web Search in AI Chatbots

Ida Silfverskiöld — Thu, 04 Dec 2025 06:19:55 +0000

And what this means for generative engine optimization (GEO)

JSON Parsing for Large Payloads: Balancing Speed, Memory, and Scalability

Subha Ganapathi — Tue, 02 Dec 2025 15:30:00 +0000

Benchmarking JSON libraries for large payloads

The Machine Learning “Advent Calendar” Day 1: k-NN Regressor in Excel

angela shi — Mon, 01 Dec 2025 19:52:19 +0000

This first day of the Advent Calendar introduces the k-NN regressor, the simplest distance-based model. Using Excel, we explore how predictions rely entirely on the closest observations, why feature scaling matters, and how heterogeneous variables can make distances meaningless. Through examples with continuous and categorical features, including the California Housing and Diamonds datasets, we see the strengths and limitations of k-NN, and why defining the right distance is essential to reflect real-world structure.
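
Why feature scaling matters can be illustrated with a small Python sketch of a k-NN regressor. The prediction is the average target of the k closest rows. Optional z-score standardization puts every feature on a comparable scale before Euclidean distances are computed. The two-feature toy data (a room count and a surface in square meters) and its targets are made up for illustration. They are not taken from the California Housing or Diamonds datasets, and the function names are hypothetical.

```python
import math

def fit_scaler(X):
    """Return a function that z-scores a row with the column means and standard deviations of X."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    # "or 1.0" guards against constant columns (standard deviation 0).
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0 for j in range(d)]

    def scale(row):
        return [(row[j] - means[j]) / stds[j] for j in range(d)]

    return scale

def knn_regress(X, y, query, k=3, scale_features=True):
    """Predict the mean target of the k training rows closest to query (Euclidean distance)."""
    if scale_features:
        scale = fit_scaler(X)
        X, query = [scale(row) for row in X], scale(query)
    nearest = sorted(zip((math.dist(row, query) for row in X), y))[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Hypothetical toy data: [number of rooms, surface in m^2] -> made-up target value.
X = [[1, 50], [5, 60], [2, 70], [6, 80]]
y = [120, 260, 150, 300]
query = [5, 52]

print(knn_regress(X, y, query, k=1, scale_features=False))
print(knn_regress(X, y, query, k=1, scale_features=True))
```

Without scaling, the surface column dominates the distance, so the nearest row to [5, 52] is [1, 50] even though it has four fewer rooms. After standardization, [5, 60], which has the same room count, becomes the nearest neighbor, and the prediction changes accordingly.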

Why AI Alignment Starts With Better Evaluation

Hailey Quach — Mon, 01 Dec 2025 13:00:00 +0000

You can’t align what you don’t evaluate

Neural Networks Are Blurry, Symbolic Systems Are Fragmented. Sparse Autoencoders Help Us Combine Them.

Xiaocong Yang — Thu, 27 Nov 2025 17:24:06 +0000

Neural and symbolic models compress the world in fundamentally different ways, and Sparse Autoencoders (SAEs) offer a bridge to connect them.

I Cleaned a Messy CSV File Using Pandas. Here’s the Exact Process I Follow Every Time.

Ibrahim Salami — Wed, 26 Nov 2025 19:13:17 +0000

Stop guessing at data cleaning. Use this repeatable 5-step Python workflow to diagnose and fix the most common data flaws.

Why CrewAI’s Manager-Worker Architecture Fails — and How to Fix It

Partha Sarkar — Tue, 25 Nov 2025 18:45:38 +0000

A real-world analysis of why CrewAI’s hierarchical orchestration misfires—and a practical fix you can implement today.

Ten Lessons of Building LLM Applications for Engineers

Shuai Guo — Tue, 25 Nov 2025 13:00:00 +0000

Practical field notes on workflows, structure, and evaluation from two years of building with engineering domain experts.

Your Next ‘Large’ Language Model Might Not Be Large After All

Moulik Gupta — Sun, 23 Nov 2025 14:00:00 +0000

A 27M-parameter model just outperformed giants like DeepSeek R1, o3-mini, and Claude 3.7 on reasoning tasks

Generative AI Will Redesign Cars, But Not the Way Automakers Think

Nishant Arora — Fri, 21 Nov 2025 12:30:00 +0000

Traditional manufacturers are using revolutionary technology for incremental optimization instead of fundamental re-imagination

How Relevance Models Foreshadowed Transformers for NLP

Sean Moran — Thu, 20 Nov 2025 14:00:00 +0000

Tracing the history of LLM attention: standing on the shoulders of giants
