Updated: April 7, 2026 at 7:45 PM

Hands-On Large Language Models (short summary)

medium

LLMs stop feeling like magic when you can see how tokenization, attention, and model architecture show up in system behavior.

The chapter connects internal model mechanics to applied use cases: classification, search, RAG, agent workflows, and task-specific fine-tuning.

That makes it especially useful for interviews: you can explain LLM mechanics without losing sight of product decisions, answer quality, and system boundaries.

Practical value of this chapter

LLM internals

The chapter helps you explain tokenization, embeddings, transformers, and attention without turning the model into a black box or a magic trick.

Bridge to applied use cases

Once the mechanics are clear, the book moves directly into classification, topic analysis, generation, search, and multimodal workloads.

Bridge to AI engineering

It is a strong transition from LLM mechanics to RAG, agent workflows, fine-tuning, and real product integration.

Interview material

It gives you a solid base for discussing both how LLMs work internally and which architectural choices grow around them.

Source

Telegram: Book Cube

Alexander Polomodov's short review of the book.

Hands-On Large Language Models

Authors: Jay Alammar, Maarten Grootendorst
Publisher: O'Reilly Media, Inc.
Length: 428 pages

A visual, practical guide to LLMs with about 300 illustrations covering tokenization, embeddings, transformers, RAG, and fine-tuning.

About the authors

The book is written by two ML and AI experts known for explaining difficult ideas through clear visuals and practical examples:

Jay Alammar

Engineering Fellow at Cohere and author of widely cited visual explainers on ML and NLP. His diagrams appear in NumPy and pandas documentation as well as deeplearning.ai courses.

jalammar.github.io

Maarten Grootendorst

Data Scientist and creator of BERTopic and KeyBERT. He focuses on topic modeling and text embeddings.

newsletter.maartengrootendorst.com

Philosophy of the book

The authors follow an "intuition first" approach: they build intuitive understanding through visuals and only then reinforce it with more formal explanations and code.

~300 original illustrations

Attention mechanisms, tokenizers, and high-dimensional embedding spaces are explained through visual diagrams, graphs, and drawings.

All code is available on GitHub.

Related chapter

AI Engineering (Chip Huyen)

A broader view of operational practice: RAG, agents, fine-tuning, and evaluation.

Book structure

The book has three parts and 12 chapters: it starts with how LLMs work, moves to practical applications, and finishes with training and fine-tuning.

I. Understanding language models

The first part builds the foundation and explains, step by step, how language models work.

Chapter 1

Introduction to Language Models

The path from bag-of-words and early embeddings to transformers, and why LLMs changed the scale and flexibility of language modeling.

Chapter 2

Tokenization and embeddings

How tokenization works in LLMs. Comparison of words, subwords, characters, and bytes, plus the path from word2vec to modern embeddings.
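
To make the comparison of granularities concrete, here is a minimal sketch that splits the same text into words, subwords, characters, and bytes. It is not taken from the book. The function names and the toy vocabulary are illustrative assumptions, and the subword splitter is a simplified greedy longest-match routine in the spirit of WordPiece, not a real trained tokenizer.

```python
import re


def word_tokens(text: str) -> list[str]:
    # Words plus standalone punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def char_tokens(text: str) -> list[str]:
    # One token per Unicode character.
    return list(text)


def byte_tokens(text: str) -> list[int]:
    # One token per UTF-8 byte; non-ASCII characters become several tokens.
    return list(text.encode("utf-8"))


def subword_tokens(word: str, vocab: set[str]) -> list[str]:
    # Greedy longest-match-first split over a toy vocabulary (assumption:
    # real tokenizers learn their vocabulary from data and mark word
    # boundaries, which this sketch does not do).
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No vocabulary entry covers this position.
            return ["[UNK]"]
        pieces.append(word[start:end])
        start = end
    return pieces


# Hypothetical vocabulary, chosen only for this example.
TOY_VOCAB = {"token", "ization", "un", "believ", "able"}

if __name__ == "__main__":
    print(word_tokens("Tokenization rocks!"))
    print(subword_tokens("tokenization", TOY_VOCAB))
    print(char_tokens("LLM"))
    print(byte_tokens("é"))
```

The trade-off the sketch hints at: coarser units give shorter sequences but need a larger vocabulary and fail on unseen words, while bytes never fail but make sequences longer.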

Chapter 3

Inside the transformer

What happens in a forward pass: input processing, attention computation, and next-token selection. Includes an intuitive view of self-attention and optimizations such as KV caching.
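
The idea behind KV caching can be sketched in a few lines of plain Python. This is an illustrative toy, not the book's code: it assumes a single attention head and uses the input vectors directly as queries, keys, and values (no learned projections), and the names `attend`, `causal_attention_full`, and `KVCache` are made up for this example. It shows that appending one key and value per step gives the same outputs as recomputing causal attention from scratch.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    # Scaled dot-product attention for one query over the given positions.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def causal_attention_full(xs):
    # Recompute everything: position t attends to positions 0..t.
    return [attend(xs[t], xs[: t + 1], xs[: t + 1]) for t in range(len(xs))]


class KVCache:
    # Keeps keys and values of earlier tokens so each decoding step
    # only processes the newest token.
    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, x):
        # Assumption: identity projections, so x serves as query, key, and value.
        self.keys.append(x)
        self.values.append(x)
        return attend(x, self.keys, self.values)


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    cache = KVCache()
    incremental = [cache.step(x) for x in tokens]
    print(incremental == causal_attention_full(tokens))
```

Without the cache, generating each new token repeats the attention work for the whole prefix. With it, each step reuses the stored keys and values at the cost of extra memory.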

II. Using pretrained models

Practical ways to use pretrained LLMs and embedding models for applied tasks.

Chapter 4

Text classification

Using LLMs for classification with minimal additional training

Chapter 5

Clustering and Topic Modeling

Topic analysis with BERTopic, a library created by Maarten Grootendorst

Chapter 6

Prompt engineering

Chain of Thought, ReAct, Tree of Thought, and other prompting and reasoning techniques

Chapter 7

Advanced text generation

LangChain, agents, memory, and tools for building more complex application flows

Chapter 8

Semantic Search and RAG

RAG extends model knowledge with document retrieval and external context
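
A rough sketch of the retrieval step is shown here, under loud assumptions: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the documents are invented, and the prompt template is hypothetical. The point is only the flow: score documents against the query, keep the top matches, and place them in the prompt as external context.

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy stand-in for an embedding model: lowercase bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=2):
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, docs, k=2):
    # Hypothetical template: retrieved passages become the model's context.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Invented documents, used only for this example.
DOCS = [
    "The KV cache stores keys and values from earlier tokens.",
    "BERTopic clusters document embeddings to find topics.",
    "CLIP maps images and text into a shared embedding space.",
]

if __name__ == "__main__":
    print(build_prompt("What does the KV cache store?", DOCS, k=1))
```

In a real system the resulting prompt would be sent to a generative model, and `embed` would be replaced by a dense embedding model with a vector index.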

Chapter 9

Multimodal models

Text and images: CLIP, BLIP-2, and vision-language approaches

III. Training and fine-tuning

How to train your own models and adapt existing LLMs for specific tasks.

Chapter 10

Creating embedding models

Training custom embedding models for specific domains and collections

Chapter 11

Fine-tuning for classification

Adapting representation models for classification tasks

Chapter 12

Fine-tuning generative models

Adjusting LLMs to generate text in a specific style, tone, or domain

Who is this book for?

  • Engineers and analysts who want a clear entry point into LLMs and modern NLP
  • Developers and data specialists bringing LLMs into working products, not just demos
  • Anyone who wants to speak confidently about systems like ChatGPT, Mistral, and Claude and understand what happens under the hood
