Transformers | Towards Data Science

Scaling Recommender Transformers to a Billion Parameters

Machine Learning

How to implement a new generation of transformer recommenders

Kirill Khrylchenko

October 21, 2025

35 min read

Multi-head Attention is a Fancy Addition Machine

Machine Learning

“Attention Is All You Need” showed attention as a sequence of multiplicative and concat operations…

Kunj Mehta

July 24, 2025

8 min read

Your 1M+ Context Window LLM Is Less Powerful Than You Think

Large Language Models

Why working memory is a more important bottleneck than raw context window size

Tobias Schnabel

July 17, 2025

9 min read

Hands-On Attention Mechanism for Time Series Classification, with Python

Machine Learning

This is how to use the attention mechanism in a time series classification framework

Piero Paialunga

May 30, 2025

9 min read

Behind the Magic: How Tensors Drive Transformers

Large Language Models

The workflow of tensors inside Transformers

Ziad SALLOUM

April 25, 2025

4 min read

Fine-tuning Multimodal Embedding Models

Machine Learning

Adapting CLIP to YouTube Data (with Python Code)

Shaw Talebi

January 31, 2025

10 min read

Understanding Flash Attention: Writing the Algorithm from Scratch in Triton

Artificial Intelligence

Find out how Flash Attention works. Afterward, we’ll refine our understanding by writing a GPU…

Alex Dremov

January 15, 2025

7 min read

Contextual Topic Modelling in Chinese Corpora with KeyNMF

Natural Language Processing

A comprehensive guide on getting the most out of your Chinese topic models, from preprocessing…

Márton Kardos

January 13, 2025

8 min read

Customizing Your Fine-tuning Code Using HuggingFace’s Transformers Library

Examples of custom callbacks and custom fine-tuning code from different libraries

Maeda Hanafi, PhD

January 8, 2025

8 min read

Einstein Notation: A New Lens on Transformers

Machine Learning

Transforming the Math of the Transformer Model

Dr. Christoph Mittendorf

November 20, 2024

9 min read