
Visual Guides to understand the basics of Large Language Models

A compilation of tools and articles that intuitively break down the complicated AI concepts

Image by the author using free illustrations by unDraw.co

This is a living document and will be continually updated.

Last update: 10th August, 2024. Added Transformer Explainer

Today, the world is abuzz with LLMs, short for Large Language Models. Hardly a day passes without the announcement of a new language model, fueling a fear of missing out in the AI space. Yet many still struggle with the basic concepts behind LLMs, which makes it challenging to keep pace with the advancements. This article is aimed at those who would like to dive into the inner workings of such AI models and build a solid grasp of the subject. With this in mind, I present a few tools and articles that break down the core concepts of LLMs so they can be easily understood.


Table of Contents

· 1. The Illustrated Transformer by Jay Alammar · 2. The Illustrated GPT-2 by Jay Alammar · 3. Transformer Explainer: Interactive Learning of Text-Generative Models · 4. LLM Visualization by Brendan Bycroft · 5. Generative AI exists because of the transformer – Financial Times · 6. Tokenizer tool by OpenAI · 7. Understanding GPT tokenizers by Simon Willison · 8. Chunkviz by Greg Kamradt · 9. Do Machine Learning Models Memorize or Generalize? – An explorable by PAIR · 10. Color-Coded Text Generation · Conclusion


1. The Illustrated Transformer by Jay Alammar

GIF created by Author, based on The Illustrated Transformer by Jay Alammar

I’m sure many of you are already familiar with this iconic article. Jay was one of the earliest pioneers in writing technical articles with powerful visualizations, and a quick run through his blog will show you what I mean. Over the years, he has inspired many writers to follow suit, and the idea of tutorials has shifted from plain text and code to immersive visualizations. Anyway, back to The Illustrated Transformer. The transformer architecture is the fundamental building block of all Large Language Models (LLMs). Hence, it is essential to understand its basics, which is what Jay does beautifully. The blog covers crucial concepts like:

  1. A High-Level Look at The Transformer Model
  2. Exploring The Transformer’s Encoding and Decoding Components
  3. Self-Attention
  4. Matrix Calculation of Self-Attention
  5. The Concept of Multi-Headed Attention
  6. Positional Encoding
  7. The Residuals in The Transformer Architecture
  8. The Final Linear and Softmax Layer of The Decoder
  9. The Loss Function in Model Training
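To make the matrix-calculation section concrete, here is a minimal NumPy sketch of scaled dot-product attention, the operation at the heart of self-attention. The tiny Q, K, and V matrices are made-up illustrative values, not weights from a trained model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V, the core of self-attention."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Three tokens with d_k = 4: toy values only
Q = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 0.0]])
K = Q.copy()      # self-attention: keys come from the same tokens
V = np.eye(3, 4)  # toy value vectors

output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))  # each row of attention weights sums to 1
```

Each output row is a weighted mix of the value vectors, with the weights telling you how much each token "attends" to every other token, exactly the picture Jay draws step by step.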

He has also created a "Narrated Transformer" video, which is a gentler approach to the topic. Once you are done with this blog post, the Attention Is All You Need paper and the official Transformer blog post would be great add-ons.

Link: https://jalammar.github.io/illustrated-transformer/


2. The Illustrated GPT-2 by Jay Alammar

GIF created by Author, based on The Illustrated GPT-2 by Jay Alammar

Another great article from Jay Alammar – The Illustrated GPT-2. It is a supplement to the Illustrated Transformer blog, containing more visual elements to explain the inner workings of transformers and how they’ve evolved since the original paper. It also has a dedicated section on applications of transformers beyond language modeling.

πŸ”— : https://jalammar.github.io/illustrated-gpt2/


3. Transformer Explainer: Interactive Learning of Text-Generative Models

Transformer explainer by https://poloclub.github.io/transformer-explainer/ | Screenshot by Author

Transformer Explainer is an interactive visualization tool designed for non-experts to learn about Transformers through the GPT-2 model. It runs a live GPT-2 instance directly in the user’s browser, allowing them to experiment with their own input and see in real time how the internal components and parameters of the Transformer predict the next tokens. The tool requires no installation or special hardware, making modern generative AI techniques more accessible to everyone.

πŸ”— : https://poloclub.github.io/transformer-explainer/

4. LLM Visualization by Brendan Bycroft

GIF created by Author, based on LLM Visualization by Brendan Bycroft

The LLM visualization project provides a walkthrough of the LLM algorithm backing OpenAI’s ChatGPT. It’s a great resource for exploring the algorithm down to every step required to run a single token inference, seeing the whole process in action.

The project features a web page containing visualizations of a small LLM akin to what powers ChatGPT but in stunning 3D effects. This tool offers a step-by-step guide through a single token inference and features interactive elements for a hands-on experience. As of today, visualizations for the following architectures are available:

  • GPT-2 (small)
  • Nano GPT
  • GPT-2 (XL)
  • GPT-3

πŸ”— : https://bbycroft.net/llm


5. Generative AI exists because of the transformer – Financial Times

GIF created by the Author, based on Generative AI exists because of the transformer – Financial Times (FT) | This work is being distributed under FT’s sharing policy.

Great job by the Visual Storytelling Team and Madhumita Murgia at Financial Times for employing visuals to elucidate the functioning of LLMs, with a special emphasis on the self-attention mechanism and the Transformer architecture.

πŸ”— https://ig.ft.com/generative-ai/


6. Tokenizer tool by OpenAI

Screenshot by Author | Source: OpenAI's Tokenizer tool documentation

Large language models process text using tokens – sequences of numbers. Tokenizers convert text into tokens. OpenAI’s tokenizer tool provides a helpful way to test specific strings and see how they are translated into tokens. You can use the tool to understand how a piece of text might be tokenized by a language model and the total count of tokens in that piece of text.
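The text-to-numbers idea can be shown in miniature. Real GPT tokenizers use byte-pair encoding (BPE), so the toy whitespace tokenizer below is only an illustration of the round trip from text to integer IDs and back, not what OpenAI's tool actually does.

```python
# Toy tokenizer: real GPT models use byte-pair encoding (BPE) over bytes,
# but the text -> integer IDs -> text round trip looks the same in miniature.

class ToyTokenizer:
    def __init__(self, corpus):
        # Build a vocabulary from whitespace-separated words, in order seen
        self.id_of = {}
        for word in corpus.split():
            self.id_of.setdefault(word, len(self.id_of))
        self.word_of = {i: w for w, i in self.id_of.items()}

    def encode(self, text):
        """Convert text into a sequence of integer token IDs."""
        return [self.id_of[w] for w in text.split()]

    def decode(self, ids):
        """Convert token IDs back into text."""
        return " ".join(self.word_of[i] for i in ids)

tok = ToyTokenizer("the cat sat on the mat")
ids = tok.encode("the cat sat")
print(ids)              # [0, 1, 2]
print(tok.decode(ids))  # "the cat sat"
```

OpenAI's web tool does the same translation with a vocabulary of tens of thousands of learned sub-word pieces, which is why a single English word can map to one token or several.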

Link: https://platform.openai.com/tokenizer


7. Understanding GPT tokenizers by Simon Willison

GIF created by Author, based on Understanding GPT tokenizers by Simon Willison

While we have already mentioned that OpenAI offers a Tokenizer tool for exploring how tokens work, Simon Willison has built his own tokenizer tool, which is slightly more interesting. It is available as an Observable notebook. The notebook converts text to tokens, converts tokens back to text, and runs searches against the full token table.

Some of the key insights from Simon’s analysis are:

  • Most common English words are assigned a single token.
  • Some words have tokens with a leading space, enabling more efficient encoding of full sentences.
  • Non-English languages may have less efficient tokenization.
  • Glitch tokens can lead to unexpected behavior.

πŸ”— https://lnkd.in/eXTcia8Z


8. Chunkviz by Greg Kamradt

GIF by the Author based on the Chunkviz app available for sharing under the MIT License.

Chunking is a strategy for breaking large pieces of text into smaller segments when building LLM applications. This is important so that your document fits into your model’s context window – the maximum length of text a language model can handle at once. There are various strategies for chunking, and this is where this tool shines: you can choose from a variety of chunking strategies and see how each affects your text. Currently, you can visualize text splitting and chunking strategies from four different LangChain splitters.
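Chunkviz itself visualizes LangChain's splitters, but the basic idea can be sketched in a few lines of plain Python: a fixed-size character splitter where consecutive chunks share an overlap so context isn't cut mid-thought. The sizes below are arbitrary illustrative values.

```python
def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into fixed-size character chunks; consecutive chunks
    share `overlap` characters so context isn't cut mid-thought."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    # Stop before producing a trailing chunk made only of overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

doc = "Chunking breaks long documents into pieces that fit a model's context window."
for c in chunk_text(doc, chunk_size=30, overlap=5):
    print(repr(c))
```

Production splitters like LangChain's recursive character splitter are smarter – they prefer to break on paragraph and sentence boundaries before falling back to raw character counts – which is exactly the difference Chunkviz lets you see.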

πŸ”— https://chunkviz.up.railway.app/


9. Do Machine Learning Models Memorize or Generalize? – An explorable by PAIR

GIF by the Author based on the Do Machine Learning Models Memorize or Generalize? explorable, available for sharing under the MIT License.

Explorables are interactive essays by Google’s PAIR team that simplify complex AI-related topics through interactive mediums. This particular explorable delves deep into the concepts of generalization and memorization, exploring a vital question: do large language models (LLMs) truly understand the world, or are they just recalling information from their extensive training data?

In this interactive article, the authors take an investigative journey through the training dynamics of a tiny model. They reverse engineer the solution they find, providing a brilliant illustration of the exciting emerging field of Mechanistic Interpretability.

πŸ”— https://pair.withgoogle.com/explorables/grokking/


10. Color-Coded Text Generation

Color-Coded Text Generation | Image by Author

The color-coded text generation tool visualizes the probabilities of each generated token during the text generation process. The tool computes the transition scores of the generated sequences, which represent the likelihood or probability of each token being selected at each step of the generation. This information is then used to color-code the generated output, with different colors representing different probability ranges (e.g., green for high probability, yellow for medium, and red for low).

This approach provides a convenient way for users to quickly understand and analyze the model’s decision-making process during text generation. The tool relies on the [compute_transition_scores](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationMixin.compute_transition_scores) function, which was introduced in the HuggingFace Transformers library (version 4.26.0).
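The Hugging Face Space computes these probabilities from a real model via `compute_transition_scores`; the color-bucketing step itself can be sketched without a model. The thresholds and per-token probabilities below are made-up illustrative values, not the ones the actual tool uses.

```python
def color_for(prob):
    """Map a token's generation probability to a coarse color bucket.
    Thresholds are illustrative; the actual tool chooses its own bins."""
    if prob >= 0.7:
        return "green"   # model was confident in this token
    if prob >= 0.3:
        return "yellow"  # plausible but uncertain
    return "red"         # low-probability (surprising) token

# Pretend per-token probabilities from one generation run
tokens = [("The", 0.91), ("cat", 0.45), ("levitated", 0.04)]
for text, p in tokens:
    print(f"{text}: {color_for(p)}")
```

In the real tool, the probabilities come from exponentiating the log-scores that `compute_transition_scores` returns for each generated token, and the colored spans are rendered over the output text.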

πŸ”— https://huggingface.co/spaces/joaogante/color-coded-text-generation


Conclusion

We looked at a few invaluable tools and articles that break down complex technical jargon into easily understandable forms. I am a big proponent of writing and presenting technical concepts in interactive, visual formats. This reminds me of a previous article of mine that focused on tools offering intuitive explanations of standard machine learning concepts.

Learn Machine Learning Concepts Interactively

The articles and tools highlighted in this article aim to lower the barrier to entry for beginners and enthusiasts alike, making learning more engaging and accessible. I plan to continually update this article with more such resources as I discover them. Additionally, I welcome and look forward to incorporating suggestions from readers.


Visit my GitHub repository to access all my blogs and their accompanying code in one convenient location.

GitHub – parulnith/Data-Science-Articles: A collection of my blogs on Data Science and Machine…


Towards Data Science is a community publication. Submit your insights to reach our global audience and earn through the TDS Author Payment Program.

