Prompt engineering hits a ceiling quickly if you reduce it to wording instead of treating it as the design of the full context around the model.
The chapter shows how the LLM Loop, RAG, agents, and workflows shift the focus from wording to architecture: answer quality depends on the full system rather than on a single prompt.
That makes it especially useful in interviews and architecture discussions, because the conversation naturally moves from prompting to retrieval quality, state management, and evaluation.
Practical value of this chapter
Context design
The chapter makes it clear that answer quality depends not only on wording, but on how the surrounding context is assembled around the model.
RAG and workflows
Through the LLM Loop, RAG, and agent patterns, the book ties prompting to data, tools, and the sequence of steps inside the system.
Evaluation and quality
It gives you a practical way to discuss example suites, gold answers, A/B tests, and common model failures rather than only clever prompts.
Interview material
It is a strong frame for moving from prompt design to architecture: retrieval, agents, evaluation, and system boundaries.
Source
Book Cube
A short review of the book by Alexander Polomodov.
Prompt Engineering for LLMs
Authors: John Berryman, Albert Ziegler
Publisher: O'Reilly Media, Inc.
Length: 282 pages
John Berryman and Albert Ziegler on designing prompts for LLMs, assembling context, using RAG and agent patterns, and evaluating answer quality.
Key Idea: LLM Loop
The authors introduce the LLM Loop as a working cycle where model quality depends not on one clever prompt, but on how the system retrieves context, assembles the request, and processes the result:
Retrieval
Finding the right documents and data
Snippetizing
Preparing model-friendly context chunks
Scoring
Selecting the most useful fragments
Assembly
Packing instructions and context into the request
Post-process
Checking, formatting, and safely returning the answer
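The five stages above can be sketched as a small pipeline. This is a minimal illustration, not the authors' implementation: the function names, the word-overlap scoring, the sentence-level splitting, and the `call_model` callback are all assumptions made for the example. A real system would use a proper retriever and a real model client.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Snippet:
    text: str
    score: float = 0.0


def retrieve(query: str, corpus: list[str]) -> list[str]:
    """Retrieval: keep documents that share at least one word with the query."""
    terms = set(query.lower().split())
    return [doc for doc in corpus if terms & set(doc.lower().split())]


def snippetize(docs: list[str]) -> list[Snippet]:
    """Snippetizing: split documents into sentence-sized chunks (naive split on '.')."""
    return [
        Snippet(part.strip())
        for doc in docs
        for part in doc.split(".")
        if part.strip()
    ]


def score(query: str, snippets: list[Snippet]) -> list[Snippet]:
    """Scoring: rank snippets by word overlap with the query (toy heuristic)."""
    terms = set(query.lower().split())
    for snippet in snippets:
        snippet.score = float(len(terms & set(snippet.text.lower().split())))
    return sorted(snippets, key=lambda s: s.score, reverse=True)


def assemble(query: str, snippets: list[Snippet], top_k: int = 2) -> str:
    """Assembly: pack instructions, the best snippets, and the question into one prompt."""
    context = "\n".join(f"- {s.text}" for s in snippets[:top_k])
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )


def post_process(raw: str) -> str:
    """Post-process: trim whitespace and fall back to a safe default on empty output."""
    return raw.strip() or "No answer available."


def run_loop(query: str, corpus: list[str], call_model: Callable[[str], str]) -> str:
    """Run all five stages; call_model is a hypothetical stand-in for a model client."""
    ranked = score(query, snippetize(retrieve(query, corpus)))
    prompt = assemble(query, ranked)
    return post_process(call_model(prompt))
```

Passing the model call in as a plain function keeps the sketch testable without any network access: `run_loop(query, corpus, lambda prompt: "...")` exercises every stage with a stubbed model.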
Related chapter
AI Engineering (Chip Huyen)
A broader look at RAG, agents, fine-tuning, and AI operations.
Book structure: 3 parts, 11 chapters
Part I: LLM Basics
How the models work, how they evolved, how they are trained, and how they moved from pure completion to dialogue.
Introduction to Prompt Engineering
Why LLMs look like “magic,” how language models evolved, and why prompt engineering became its own engineering discipline.
Understanding LLMs
LLMs as continuation models: tokens, autoregression, hallucinations, temperature, and the basics of transformers.
Moving to Chat
From completion to chat: RLHF, instruct and chat models, the cost of alignment, and API evolution. Prompting as “staging a play” with scenes, roles, and cues.
Designing LLM Applications
The core LLM Loop frame: retrieval → snippetizing → scoring → prompt assembly → post-processing.
Part II: Key Techniques
Few-shot examples, RAG to reduce hallucinations, and careful prompt assembly.
Prompt Content
Static content such as instructions and few-shot examples versus dynamic content. RAG: lexical and neural retrieval, embeddings, vector storage, and hierarchical summarization.
Assembling the Prompt
Working within the context window: prompt anatomy, document formats, and elastic snippets. Valley of Meh: the middle of the prompt tends to sag, so important content belongs closer to the edges.
Taming the Model
Anatomy of the model response: preamble, start and end markers, stop sequences, and streaming. Logprobs for confidence. Model choice as a balance of quality, price, and latency.
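As a rough illustration of "logprobs for confidence", token log-probabilities can be turned into a single score. The sketch below shows one common reading, not necessarily the book's: the function names are made up for this example, and the input is assumed to be a plain list of per-token natural-log probabilities such as some model APIs can return.

```python
import math


def sequence_confidence(token_logprobs: list[float]) -> float:
    """Joint probability of the whole completion: exp of the summed logprobs."""
    return math.exp(sum(token_logprobs))


def mean_token_confidence(token_logprobs: list[float]) -> float:
    """Geometric mean of per-token probabilities; less sensitive to answer length."""
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    return math.exp(sum(token_logprobs) / len(token_logprobs))
```

The joint probability shrinks as answers get longer, so the per-token mean is usually the easier number to compare across responses of different lengths.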
Related chapter
Hands-On Large Language Models
A visual explanation of RAG, agents, and LangChain.
Part III: Advanced Topics
Agents with memory and tools, workflows, and quality evaluation.
Conversational Agency
Tool use: tool design, error handling, and dangerous actions. Reasoning patterns such as CoT and ReAct. Building the agent and its user experience.
LLM Workflows
When a workflow is better than an agent. Tasks as building blocks, reusable prompts, agent-driven workflows, stateful task agents, roles, and delegation.
Evaluating LLM Applications
Offline: example suites, gold answers, model-based judging, and SOMA. Online: A/B tests and product metrics.
Looking Ahead
Multimodality, user experience as part of quality, and the continuing rise in model capability and speed.
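The offline side of evaluation mentioned above (an example suite checked against gold answers) can be sketched in a few lines. This is a deliberately simple assumption-laden version: `run_suite` and `normalize` are hypothetical names, `app` stands in for the whole LLM application, and exact match after normalization replaces the richer judging methods the book covers.

```python
from typing import Callable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not fail a case."""
    return " ".join(text.lower().split())


def run_suite(examples: list[tuple[str, str]], app: Callable[[str], str]) -> float:
    """Return the share of (question, gold answer) pairs the app answers correctly."""
    if not examples:
        raise ValueError("examples must not be empty")
    passed = sum(normalize(app(q)) == normalize(gold) for q, gold in examples)
    return passed / len(examples)
```

Running the same suite before and after a prompt change gives a quick regression signal; online A/B tests then check whether the change helps real users.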
Practical insight: Valley of Meh
The middle of the prompt “sags”
Models usually pay more attention to the beginning and end of a prompt. Information in the middle is more likely to be ignored or weighted less.
Authors' recommendation:
- Put critical instructions first, especially in the system prompt
- Keep the most important context closer to the end
- Leave lower-priority material in the middle
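The recommendation above can be expressed as an ordering rule at assembly time. The sketch below is an illustration under stated assumptions: `assemble_prompt` is a hypothetical helper, and each snippet carries a numeric priority (higher means more important) that some earlier scoring step is assumed to have produced.

```python
def assemble_prompt(
    instructions: str,
    snippets: list[tuple[str, int]],
    question: str,
) -> str:
    """Order prompt parts so the most important content sits at the edges.

    snippets holds (text, priority) pairs; a higher priority means more important.
    """
    # Ascending sort places the highest-priority snippet last, nearest the end;
    # lower-priority snippets land earlier, in the middle of the prompt.
    ordered = sorted(snippets, key=lambda item: item[1])
    context = "\n".join(text for text, _ in ordered)
    # Critical instructions go first, the question goes last.
    return f"{instructions}\n\n{context}\n\n{question}"
```

With this ordering, whatever gets the least attention in the middle of the prompt is also the material that matters least.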
Relevance in 2026: Prompt → Context Engineering
Since the book was published, LLMs have improved significantly. Models understand users better even without complex prompts, and many strong practices are now built directly into tools and platforms.
Context Engineering
Andrej Karpathy's framing from 2025: focus on giving the model a complete environment, including data, history, tools, and constraints, rather than hunting for the perfect wording.
PromptOps
Prompt versioning, request-quality monitoring, and automated context preparation.
Conclusion: The core principles of the book still hold. RAG is now everywhere, and chain-of-thought has become a standard tool in many AI agents. The authors were upfront that the APIs would age but that the underlying ideas would last.
Related chapters
- AI Engineering (short summary) - A broader runtime view: LLM system architecture, evaluation, guardrails, and operations.
- Hands-On Large Language Models (short summary) - A visual LLM foundation: tokenization, embeddings, transformers, RAG, and applied patterns.
- An Illustrated Guide to AI Agents (short summary) - A continuation into agents: planning, memory, reflection, tools, and multi-agent coordination.
- Developing Apps with GPT-4 and ChatGPT (short summary) - An entry-level applied layer: API practice, baseline prompting, and early product use cases.
