Primary source
O'Reilly Learning
Early Release page for An Illustrated Guide to AI Agents
An Illustrated Guide to AI Agents
Authors: Jay Alammar, Maarten Grootendorst
Publisher: O'Reilly Media, Inc. (Early Release)
Length: Early Release (in progress)
Jay Alammar and Maarten Grootendorst's practical guide to AI agents, covering memory, tools, planning, reflection, multi-agent coordination, and engineering risks.
Why this is the right follow-up to Hands-On Large Language Models
If Hands-On Large Language Models explains what happens inside the model, this book explains how to build a working agentic system around that model.
Step 1: understand the LLM engine
Transformer internals, tokens, embeddings, inference behavior, and core model limitations.
Step 2: design the agentic application
Memory, tools, planning, reflection, and multi-agent coordination as separate engineering work.
Related chapter
AI Engineering
Production practices for AI systems: evaluation, RAG, agents, and fine-tuning.
What is already available in the book
The book is in Early Release and already covers the core of agent architecture. The currently published chapters are enough to build a coherent mental model of AI agents.
Introduction
Why an agentic approach is needed, and where a simple LLM call ends and a real system begins.
Chapter components:
- Definition of an agentic system: the difference between a one-shot LLM call and a loop of planning, acting, and checking.
- Core components: model, state, orchestrator, tool layer, and observability.
- Typical scenarios where a chatbot is not enough: multi-step work, API integrations, long-running workflows.
- Architecture success criteria: reliability, controllability, latency/cost, and reproducibility.
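The difference between a one-shot LLM call and an agentic loop can be sketched in a few lines. This is a hypothetical illustration, not code from the book: `call_model` is a stub standing in for a real LLM API, and all names (`AgentState`, `run_agent`) are assumptions.

```python
from dataclasses import dataclass, field

def call_model(prompt: str) -> str:
    # Stub: a real system would call an LLM API here.
    return "done" if "step 3" in prompt else "continue"

@dataclass
class AgentState:
    goal: str
    history: list = field(default_factory=list)

def run_agent(goal: str, max_steps: int = 5) -> AgentState:
    """Plan-act-check loop instead of a single model call."""
    state = AgentState(goal=goal)
    for step in range(1, max_steps + 1):
        prompt = f"Goal: {goal}. This is step {step}."
        result = call_model(prompt)      # act
        state.history.append(result)     # record for observability
        if result == "done":             # check: explicit stop condition
            break
    return state

state = run_agent("summarize a report")
```

The loop, the state object, and the stop condition are exactly the pieces a plain chat call does not have; `max_steps` is the simplest form of the reliability and cost controls listed above.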
Reasoning LLMs
What changes when a model can execute test-time reasoning chains, and how this impacts pipeline design.
Chapter components:
- Difference between a fluent answer and true reasoning behavior on complex tasks.
- Test-time reasoning: how deeper reasoning affects quality, response time, and cost.
- When to choose reasoning models vs a standard generation pipeline.
- Reasoning quality control: step verification, fallback strategies, and compute limits.
Memory
Short-term vs long-term memory, context engineering, and practical state management between agent steps.
Chapter components:
- Working memory in the context window: what stays in prompt vs what moves outside.
- Long-term memory: episodic storage of facts, preferences, and execution artifacts.
- Retrieval policies: relevance, TTL, summarization, deduplication, and context hygiene.
- Memory risks: sensitive-data leakage, context drift, and quality degradation as state grows.
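Two of the retrieval policies above, TTL expiry and deduplication, fit in a small sketch. The relevance score here is naive keyword overlap purely for illustration; real systems typically use embeddings, and the `MemoryStore` name is an assumption.

```python
import time

class MemoryStore:
    """Long-term memory with TTL expiry and key-based deduplication."""

    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self.items: dict[str, float] = {}  # text -> insertion timestamp

    def add(self, text: str) -> None:
        self.items[text] = time.time()     # dedup: same text overwrites

    def retrieve(self, query: str, top_k: int = 3) -> list[str]:
        now = time.time()
        # Drop expired entries (TTL policy).
        self.items = {t: ts for t, ts in self.items.items()
                      if now - ts < self.ttl}
        q = set(query.lower().split())
        scored = [(len(q & set(t.lower().split())), t) for t in self.items]
        scored.sort(key=lambda s: -s[0])
        return [t for score, t in scored[:top_k] if score > 0]

mem = MemoryStore()
mem.add("user prefers short answers")
mem.add("user prefers short answers")      # duplicate, stored once
mem.add("project deadline is Friday")
hits = mem.retrieve("what answers does the user prefer")
```

Even this toy version shows why memory is separate engineering work: expiry, dedup, and ranking are policies you choose, not behaviors the model provides.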
Tool Usage, Learning, and Protocols
Function calling, external integrations, and interaction protocols (including MCP).
Chapter components:
- Tool contract design: argument schemas, validation, typing, and clear boundaries.
- Tool execution cycle: action selection, error handling, retry/idempotency, and post-processing.
- External system integration via protocols (MCP and related approaches) with explicit trade-offs.
- Moving from text generation to real-world actions: safety and audit requirements.
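A minimal version of the tool contract described above: a declared argument schema, validation before execution, and a retry wrapper for transient failures. The weather tool and every name here are hypothetical examples, not APIs from the book or from MCP.

```python
WEATHER_TOOL_SCHEMA = {
    "name": "get_weather",
    "args": {"city": str, "units": str},
}

def validate_args(schema: dict, args: dict) -> None:
    """Enforce the contract before any real-world action runs."""
    for name, typ in schema["args"].items():
        if name not in args:
            raise ValueError(f"missing argument: {name}")
        if not isinstance(args[name], typ):
            raise TypeError(f"{name} must be {typ.__name__}")

def get_weather(city: str, units: str) -> dict:
    # Stub: a real tool would call an external API here.
    return {"city": city, "temp": 21, "units": units}

def call_tool(schema: dict, impl, args: dict, retries: int = 2) -> dict:
    validate_args(schema, args)            # contract check first
    for attempt in range(retries + 1):
        try:
            return impl(**args)
        except ConnectionError:            # transient error: retry
            if attempt == retries:
                raise
    raise RuntimeError("unreachable")

result = call_tool(WEATHER_TOOL_SCHEMA, get_weather,
                   {"city": "Oslo", "units": "celsius"})
```

Validation sits in front of execution on purpose: once tools perform real actions, a malformed argument is a safety issue, not just a bad answer.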
Planning and Reflection
Task decomposition, plan reassembly, self-critique, and feedback loops for better output quality.
Chapter components:
- Breaking a complex goal into executable sub-tasks and staged plans.
- Dynamic plan revision when tools fail, new data arrives, or priorities change.
- Reflection and self-critique loops to improve reliability and reduce obvious errors.
- Operational guardrails: budget/timebox, stop conditions, and reflection cost control.
Multi-Agent Systems
Role separation, multi-agent coordination, and architecture trade-offs (including the A2A protocol).
Chapter components:
- Role patterns: planner, researcher, executor, reviewer, and handoff rules.
- Coordination topologies: hub-and-spoke, peer-to-peer, and hierarchical supervisors.
- State alignment across agents: shared context, messaging protocols, and conflict handling.
- Key risks: cascading failures, traceability complexity, and infrastructure cost growth.
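The hub-and-spoke topology above can be sketched with plain functions standing in for agents: a supervisor routes one shared context through planner, executor, and reviewer roles while keeping a trace for observability. All role names and the context shape are illustrative assumptions.

```python
def planner(ctx: dict) -> dict:
    ctx["plan"] = ["gather data", "write summary"]
    return ctx

def executor(ctx: dict) -> dict:
    ctx["output"] = [f"did: {step}" for step in ctx["plan"]]
    return ctx

def reviewer(ctx: dict) -> dict:
    # Crude check: every planned step produced an output.
    ctx["approved"] = len(ctx["output"]) == len(ctx["plan"])
    return ctx

def supervisor(task: str) -> dict:
    """Hub: routes the shared context through each spoke in order."""
    ctx = {"task": task, "trace": []}
    for role in (planner, executor, reviewer):
        ctx = role(ctx)
        ctx["trace"].append(role.__name__)  # cross-agent observability
    return ctx

ctx = supervisor("produce a report")
```

Even in this toy form, the trace list hints at the coordination risks listed above: once work is split across roles, you have to build the shared state and the audit trail yourself.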
Where engineering risks show up
Memory and context management
Without an explicit memory strategy, the system degrades quickly: relevance drops, latency grows, context drifts, and token costs become unpredictable.
Tools and safe integrations
When tools are connected, the system moves from text generation to action execution. Mistakes in permissions, validation, or idempotency become production risks.
Planning depth vs cost
Reflection loops can improve output quality, but they increase model calls. You need strict guardrails on budget, timeouts, and planning depth.
Coordination of multiple agents
A multi-agent setup can separate responsibilities better, but it also complicates observability, error tracing, and global consistency control.
Who should read it and how
Best fit for
- Engineers and tech leads who design AI features as product systems, not one-off demos.
- Teams that need a practical mental model of memory, tools, planning, and orchestration.
- Readers who already understand LLM fundamentals and want to move into agent and multi-agent architecture.
Suggested order
- Start with Hands-On Large Language Models to lock in the LLM foundation.
- Then read An Illustrated Guide to AI Agents as the architecture layer around the model.
- After that, reinforce production practices with AI Engineering and Prompt Engineering for LLMs.
What to study in parallel
In the tool-usage and multi-agent sections, it is useful to compare MCP vs A2A approaches, because protocol choice affects responsibility boundaries and observability of agent systems.
