Data Science | Towards Data Science
https://towardsdatascience.com/tag/data-science/ (updated Tue, 16 Dec 2025)
Publish AI, ML & data-science insights to a global community of data professionals.

6 Technical Skills That Make You a Senior Data Scientist
https://towardsdatascience.com/6-technical-skills-that-make-you-a-senior-data-scientist/ (Mon, 15 Dec 2025)
Beyond writing code, these are the design-level decisions, trade-offs, and habits that quietly separate senior data scientists from everyone else.

Lessons Learned from Upgrading to LangChain 1.0 in Production
https://towardsdatascience.com/lessons-learnt-from-upgrading-to-langchain-1-0-in-production/ (Mon, 15 Dec 2025)
What worked, what broke, and why I did it

The Machine Learning “Advent Calendar” Day 14: Softmax Regression in Excel
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-14-softmax-regression-in-excel/ (Sun, 14 Dec 2025)
Softmax Regression is simply Logistic Regression extended to multiple classes.

By computing one linear score per class and normalizing them with Softmax, we obtain multiclass probabilities without changing the core logic.

The loss, the gradients, and the optimization remain the same.
Only the number of parallel scores increases.

Implemented in Excel, the model becomes transparent: you can see the scores, the probabilities, and how the coefficients evolve over time.
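The excerpt's point (one linear score per class, normalized with Softmax) can also be sketched outside Excel. A minimal Python version, where the three class scores are made-up values for illustration:

```python
import numpy as np

def softmax(scores):
    # subtract the max score for numerical stability; this does not
    # change the result because softmax is shift-invariant
    z = scores - np.max(scores)
    e = np.exp(z)
    return e / e.sum()

# one hypothetical linear score per class, exactly as in logistic
# regression, just with three scores instead of one
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
```

The normalized outputs sum to one and preserve the ordering of the scores, which is all that changes relative to the binary case.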

The Skills That Bridge Technical Work and Business Impact
https://towardsdatascience.com/the-skills-that-bridge-technical-work-and-business-impact/ (Sun, 14 Dec 2025)
In the Author Spotlight series, TDS Editors chat with members of our community about their career path in data science and AI, their writing, and their sources of inspiration. Today, we’re thrilled to share our conversation with Maria Mouschoutzi. Maria is a Data Analyst and Project Manager with a strong background in Operations Research, Mechanical […]

The Machine Learning “Advent Calendar” Day 13: LASSO and Ridge Regression in Excel
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-13-lasso-and-ridge-regression-in-excel/ (Sat, 13 Dec 2025)
Ridge and Lasso regression are often perceived as more complex versions of linear regression. In reality, the prediction model remains exactly the same. What changes is the training objective. By adding a penalty on the coefficients, regularization forces the model to choose more stable solutions, especially when features are correlated. Implementing Ridge and Lasso step by step in Excel makes this idea explicit: regularization does not add complexity, it adds preference.
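The claim that regularization only changes the training objective is visible in the normal equations: Ridge adds a λI term and nothing else. A small NumPy sketch with two strongly correlated features (synthetic data, not the article's):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=20)   # nearly duplicate feature
y = X[:, 0] + rng.normal(scale=0.1, size=20)

lam = 1.0
# ordinary least squares: solve (X^T X) w = X^T y
w_ols = np.linalg.solve(X.T @ X, X.T @ y)
# Ridge: the identical system, plus a lam * I term on the diagonal
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
```

With the correlated features, the OLS coefficients blow up while the Ridge ones stay small and stable. Lasso has no closed form (the L1 penalty is not differentiable at zero), which is why it is usually fit iteratively, e.g. by coordinate descent.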

NeurIPS 2025 Best Paper Review: Qwen’s Systematic Exploration of Attention Gating
https://towardsdatascience.com/neurips-2025-best-paper-review-qwens-systematic-exploration-of-attention-gating/ (Sat, 13 Dec 2025)
This one small change can bring enhanced training stability, larger usable learning rates, and improved scaling properties.

The Machine Learning “Advent Calendar” Day 12: Logistic Regression in Excel
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-12-logistic-regression-in-excel/ (Fri, 12 Dec 2025)
In this article, we rebuild Logistic Regression step by step directly in Excel.
Starting from a binary dataset, we explore why linear regression struggles as a classifier, how the logistic function fixes these issues, and how log-loss naturally appears from the likelihood.
With a transparent gradient-descent table, you can watch the model learn at each iteration—making the whole process intuitive, visual, and surprisingly satisfying.
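The gradient-descent table described above can be mirrored in a few lines of Python; the tiny 1-D dataset below is invented for illustration, not taken from the article:

```python
import numpy as np

# toy binary dataset: small x values are class 0, large are class 1
x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    p = sigmoid(w * x + b)
    # gradient of the average log-loss: the (p - y) residual has the
    # same shape as in linear regression, which is part of the appeal
    w -= lr * np.mean((p - y) * x)
    b -= lr * np.mean(p - y)

preds = (sigmoid(w * x + b) > 0.5).astype(int)
```

Each loop iteration corresponds to one row of such a gradient-descent table: compute probabilities, compute the residuals, nudge the coefficients.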

EDA in Public (Part 1): Cleaning and Exploring Sales Data with Pandas
https://towardsdatascience.com/eda-in-public-part-1-cleaning-exploring-sales-data-with-pandas/ (Fri, 12 Dec 2025)
Hey everyone! Welcome to the start of a major data journey that I’m calling “EDA in Public.” For those who know me, I believe the best way to learn anything is to tackle a real-world problem and share the entire messy process — including mistakes, victories, and everything in between. If you’ve been looking to level up […]

The Machine Learning “Advent Calendar” Day 11: Linear Regression in Excel
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-11-linear-regression-in-excel/ (Thu, 11 Dec 2025)
Linear Regression looks simple, but it introduces the core ideas of modern machine learning: loss functions, optimization, gradients, scaling, and interpretation.
In this article, we rebuild Linear Regression in Excel, compare the closed-form solution with Gradient Descent, and see how the coefficients evolve step by step.
This foundation naturally leads to regularization, kernels, classification, and the dual view.
Linear Regression is not just a straight line, but the starting point for many models we will explore next in the Advent Calendar.
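The comparison between the closed-form solution and Gradient Descent is easy to replicate in Python; the five-point dataset here is illustrative (a noiseless line), not the article's:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0          # noiseless line: slope 2, intercept 1

# closed form: least squares on the design matrix [x, 1]
X = np.column_stack([x, np.ones_like(x)])
slope_cf, intercept_cf = np.linalg.lstsq(X, y, rcond=None)[0]

# gradient descent on the mean squared error, step by step
w, b, lr = 0.0, 0.0, 0.02
for _ in range(20000):
    err = (w * x + b) - y
    w -= lr * np.mean(err * x)
    b -= lr * np.mean(err)
```

Both routes recover the same slope and intercept; gradient descent just takes many small steps to get where the normal equations jump in one.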

7 Pandas Performance Tricks Every Data Scientist Should Know
https://towardsdatascience.com/7-pandas-performance-tricks-every-data-scientist-should-know/ (Thu, 11 Dec 2025)
What I've learned about making Pandas faster after too many slow notebooks and frozen sessions

The Machine Learning “Advent Calendar” Day 9: LOF in Excel
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-9-lof-in-excel/ (Tue, 09 Dec 2025)
In this article, we explore LOF through three simple steps: distances and neighbors, reachability distances, and the final LOF score. Using tiny datasets, we see how two anomalies that look obvious to us can be judged completely differently by different algorithms. This reveals the key idea of unsupervised learning: there is no single “true” outlier, only definitions. Understanding these definitions is the real skill.
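The three steps (neighbors, reachability distances, LOF score) fit in a short NumPy sketch. The 1-D points below are invented, with one deliberately far-away point:

```python
import numpy as np

pts = np.array([1.0, 1.1, 1.2, 1.3, 5.0])   # toy data, one outlier
k = 2

d = np.abs(pts[:, None] - pts[None, :])      # pairwise distances
np.fill_diagonal(d, np.inf)                  # ignore self-distance
nn = np.argsort(d, axis=1)[:, :k]            # k nearest neighbors
k_dist = np.sort(d, axis=1)[:, k - 1]        # distance to k-th neighbor

def lrd(i):
    # local reachability density: inverse of the mean reachability
    # distance, reach(i, j) = max(k-distance of j, actual distance)
    reach = [max(k_dist[j], d[i, j]) for j in nn[i]]
    return 1.0 / np.mean(reach)

lrds = np.array([lrd(i) for i in range(len(pts))])
# LOF: how much sparser is my neighborhood than my neighbors'?
lof = np.array([np.mean(lrds[nn[i]]) / lrds[i] for i in range(len(pts))])
```

Points inside the cluster score near 1 (their density matches their neighbors'), while the isolated point gets a much larger LOF score.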

How to Develop AI-Powered Solutions, Accelerated by AI
https://towardsdatascience.com/how-to-develop-ai-powered-solutions-accelerated-by-ai/ (Tue, 09 Dec 2025)
From idea to impact: using AI as your accelerating copilot

A Realistic Roadmap to Start an AI Career in 2026
https://towardsdatascience.com/a-realistic-roadmap-to-start-an-ai-career-in-2026/ (Tue, 09 Dec 2025)
How to learn AI in 2026 through real, usable projects

Bridging the Silence: How LEO Satellites and Edge AI Will Democratize Connectivity
https://towardsdatascience.com/bridging-the-silence-how-leo-satellites-and-edge-ai-will-democratize-connectivity/ (Mon, 08 Dec 2025)
Why on-device intelligence and low-orbit constellations are the only viable path to universal accessibility

The Machine Learning “Advent Calendar” Day 8: Isolation Forest in Excel
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-8-isolation-forest-in-excel/ (Mon, 08 Dec 2025)
Isolation Forest may look technical, but its idea is simple: isolate points using random splits. If a point is isolated quickly, it is an anomaly; if it takes many splits, it is normal.

Using the tiny dataset 1, 2, 3, 9, we can see the logic clearly. We build several random trees, measure how many splits each point needs, average the depths, and convert them into anomaly scores. Short depths become scores close to 1, long depths close to 0.

The Excel implementation is painful, but the algorithm itself is elegant. It scales to many features, makes no assumptions about distributions, and even works with categorical data. Above all, Isolation Forest asks a different question: not “What is normal?”, but “How fast can I isolate this point?”
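The average-depth logic is compact enough to sketch in plain Python on the same 1, 2, 3, 9 dataset: one random split per level, averaged over many random trees.

```python
import random

random.seed(0)
data = [1, 2, 3, 9]   # the tiny dataset from the article

def path_length(x, pts, depth=0):
    # a point counts as isolated once it is alone in its partition
    if len(pts) <= 1:
        return depth
    split = random.uniform(min(pts), max(pts))
    # keep only the side of the random split that x falls on
    pts = [v for v in pts if (v < split) == (x < split)]
    return path_length(x, pts, depth + 1)

def avg_depth(x, trees=1000):
    # shorter average depth  =>  easier to isolate  =>  more anomalous
    return sum(path_length(x, data) for _ in range(trees)) / trees

depths = {x: avg_depth(x) for x in data}
```

The point 9 needs the fewest splits on average. The full algorithm then normalizes these depths into an anomaly score in (0, 1), but the ordering of the raw depths already tells the story.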

The Machine Learning “Advent Calendar” Day 7: Decision Tree Classifier
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-7-decision-tree-classifier/ (Sun, 07 Dec 2025)
In Day 6, we saw how a Decision Tree Regressor finds its optimal split by minimizing the Mean Squared Error.
Today, for Day 7 of the Machine Learning "Advent Calendar", we switch to classification. With just one numerical feature and two classes, we explore how a Decision Tree Classifier decides where to cut the data, using impurity measures like Gini and Entropy.
Even without doing the math, we can visually guess possible split points. But which one is best? And do impurity measures really make a difference? Let us build the first split step by step in Excel and see what happens.
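With one numerical feature and two classes, the candidate-split search can be sketched in Python as well; the feature values and labels below are toy assumptions, not the article's data:

```python
xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]   # one numerical feature (toy)
ys = [0, 0, 0, 1, 1, 1]               # two classes

def gini(labels):
    # Gini impurity for binary labels: 1 - p0^2 - p1^2
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def weighted_gini(t):
    # impurity of the two children, weighted by their sizes
    left = [y for x, y in zip(xs, ys) if x <= t]
    right = [y for x, y in zip(xs, ys) if x > t]
    n = len(ys)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# candidate thresholds: midpoints between consecutive feature values
cands = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
best = min(cands, key=weighted_gini)
```

Here the midpoint between 3 and 7 produces two pure children (impurity zero), which is exactly the split a human would guess visually. Entropy would pick the same threshold on this data; the two measures mostly differ on harder, impure splits.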

Artificial Intelligence, Machine Learning, Deep Learning, and Generative AI — Clearly Explained
https://towardsdatascience.com/artificial-intelligence-machine-learning-deep-learning-and-generative-ai-clearly-explained/ (Sun, 07 Dec 2025)
Understanding AI in 2026 — from machine learning to generative models

The Machine Learning “Advent Calendar” Day 6: Decision Tree Regressor
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-6-decision-tree-regressor/ (Sat, 06 Dec 2025)
During the first days of this Machine Learning Advent Calendar, we explored models based on distances. Today, we switch to a completely different way of learning: Decision Trees.
With a simple one-feature dataset, we can see how a tree chooses its first split. The idea is always the same: if humans can guess the split visually, then we can rebuild the logic step by step in Excel.
By listing all possible split values and computing the MSE for each one, we identify the split that reduces the error the most. This gives us a clear intuition of how a Decision Tree grows, how it makes predictions, and why the first split is such a crucial step.
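The "list all splits, compute the MSE for each" loop translates directly to Python; the one-feature dataset below is invented, with an obvious jump in the target:

```python
# one-feature toy data: the target jumps between x = 3 and x = 4
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]

def mse(vals):
    # mean squared error around the mean (the leaf's prediction)
    if not vals:
        return 0.0
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

def split_cost(t):
    # weighted MSE of the two children produced by the split
    left = [y for x, y in zip(xs, ys) if x <= t]
    right = [y for x, y in zip(xs, ys) if x > t]
    n = len(ys)
    return len(left) / n * mse(left) + len(right) / n * mse(right)

# candidate splits: midpoints between consecutive feature values
cands = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
best = min(cands, key=split_cost)
```

The search lands on the midpoint at the jump, 3.5, because that split separates the two flat regions and leaves both children with a tiny MSE.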

The Machine Learning “Advent Calendar” Day 5: Gaussian Mixture Model in Excel
https://towardsdatascience.com/the-machine-learning-advent-calendar-day-5-gmm-in-excel/ (Fri, 05 Dec 2025)
This article introduces the Gaussian Mixture Model as a natural extension of k-Means, by improving how distance is measured through variances and the Mahalanobis distance. Instead of assigning points to clusters with hard boundaries, GMM uses probabilities learned through the Expectation–Maximization algorithm – the general form of Lloyd’s method.

Using simple Excel formulas, we implement EM step by step in 1D and 2D, and we visualise how the Gaussian curves or ellipses move during training. The means shift, the variances adjust, and the shapes gradually settle around the true structure of the data.

GMM provides a richer, more flexible way to model clusters, and becomes intuitive once the process is made visible in a spreadsheet.
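The E and M steps described above can be written directly in NumPy for the 1-D case; the two-cluster data below is synthetic, not the article's:

```python
import numpy as np

rng = np.random.default_rng(42)
# two well-separated 1-D Gaussian clusters, around 0 and 6
data = np.concatenate([rng.normal(0.0, 1.0, 200),
                       rng.normal(6.0, 1.0, 200)])

# crude initialization: means at the extremes, unit variances
mu = np.array([data.min(), data.max()])
var = np.array([1.0, 1.0])
weights = np.array([0.5, 0.5])

for _ in range(50):
    # E-step: responsibility of each component for each point
    dens = weights / np.sqrt(2 * np.pi * var) * np.exp(
        -(data[:, None] - mu) ** 2 / (2 * var))
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate mixing weights, means, and variances
    nk = resp.sum(axis=0)
    weights = nk / len(data)
    mu = (resp * data[:, None]).sum(axis=0) / nk
    var = (resp * (data[:, None] - mu) ** 2).sum(axis=0) / nk
```

Each loop pass is one EM iteration: soft assignments in the E-step, weighted re-estimation in the M-step. Watching `mu` and `var` across iterations shows the same drift toward the true cluster structure that the spreadsheet makes visible.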

TDS Newsletter: How to Design Evals, Metrics, and KPIs That Work
https://towardsdatascience.com/tds-newsletter-how-to-design-evals-metrics-and-kpis-that-work/ (Fri, 05 Dec 2025)
On the challenges of producing reliable insights and avoiding common mistakes
