Prompt Engineering for LLMs (short summary)

Prompt engineering hits a ceiling quickly if you reduce it to wording instead of treating it as the design of the full context around the model.

The chapter shows why the LLM Loop, RAG, agents, and workflows change the conversation entirely: answer quality depends on the full system rather than on a single prompt.

That makes it especially useful in interviews and architecture discussions, because the conversation naturally moves from prompting to retrieval quality, state management, and evaluation.

Practical value of this chapter

Context design

The chapter makes it clear that answer quality depends not only on wording, but on how the surrounding context is assembled around the model.

RAG and workflows

Through the LLM Loop, RAG, and agent patterns, the book ties prompting to data, tools, and the sequence of steps inside the system.

Evaluation and quality

It gives you a practical way to discuss example suites, gold answers, A/B tests, and common model failures rather than only clever prompts.

Interview material

It is a strong frame for moving from prompt design to architecture: retrieval, agents, evaluation, and system boundaries.

Source

Book Cube

A short review of the book by Alexander Polomodov.

Read post

Prompt Engineering for LLMs

Authors: John Berryman, Albert Ziegler
Publisher: O'Reilly Media, Inc.
Length: 282 pages

John Berryman and Albert Ziegler on designing prompts for LLMs, assembling context, using RAG and agent patterns, and evaluating answer quality.

Original

Key Idea: LLM Loop

The authors introduce the LLM Loop as the working cycle of a large language model. The key shift: answer quality rests not on one clever prompt, but on how the system finds context, assembles the request, and processes the result. Break any of these steps and good wording no longer saves you:

Retrieval

Finding the right documents and data

→

Snippetizing

Preparing model-friendly context chunks

→

Scoring

Selecting the most useful fragments

→

Assembly

Packing instructions and context into the request

→

Post-process

Checking, formatting, and safely returning the answer
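
The five steps above can be sketched as a small pipeline. This is a minimal illustration, not the authors' implementation: every function name, the word-overlap scoring, and the `call_model` callback are assumptions made for the example, and a real system would use a proper retriever and a real model client.

```python
import re
from dataclasses import dataclass


@dataclass
class Snippet:
    source: str
    text: str
    score: float = 0.0


def tokenize(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: dict[str, str]) -> dict[str, str]:
    """Retrieval: keep documents that share at least one word with the query."""
    q = tokenize(query)
    return {name: doc for name, doc in corpus.items() if q & tokenize(doc)}


def snippetize(docs: dict[str, str]) -> list[Snippet]:
    """Snippetizing: split each document into sentence-sized chunks."""
    return [
        Snippet(name, part.strip())
        for name, doc in docs.items()
        for part in doc.split(".")
        if part.strip()
    ]


def score(query: str, snippets: list[Snippet], top_k: int = 2) -> list[Snippet]:
    """Scoring: rank snippets by word overlap with the query, keep the best."""
    q = tokenize(query)
    for s in snippets:
        s.score = len(q & tokenize(s.text))
    return sorted(snippets, key=lambda s: s.score, reverse=True)[:top_k]


def assemble(query: str, snippets: list[Snippet]) -> str:
    """Assembly: pack instructions, context, and the question into one prompt."""
    context = "\n".join(f"[{s.source}] {s.text}" for s in snippets)
    return f"Answer using only the context.\n\nContext:\n{context}\n\nQuestion: {query}"


def post_process(raw: str) -> str:
    """Post-process: trim the answer and refuse to return an empty one."""
    answer = raw.strip()
    return answer if answer else "No answer available."


def llm_loop(query, corpus, call_model):
    """Run all five steps; call_model is a hypothetical prompt -> text callable."""
    docs = retrieve(query, corpus)
    best = score(query, snippetize(docs))
    prompt = assemble(query, best)
    return post_process(call_model(prompt))
```

Passing the model in as a plain callable keeps the sketch testable without any API: swap in a stub to inspect the assembled prompt, then replace it with a real client.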

Related chapter

AI Engineering (Chip Huyen)

A broader look at RAG, agents, fine-tuning, and AI operations.

Read review

Book structure: 3 parts, 11 chapters

Part I: LLM Basics

The basic intuition without which the later techniques turn into a cargo cult: how the models work, how they are trained, and why text completion became dialogue.

Introduction to Prompt Engineering

Why LLMs look like “magic,” how language models evolved, and why prompt engineering became its own engineering discipline.

Understanding LLMs

LLMs as continuation models: tokens, autoregression, hallucinations, temperature, and the basics of transformers.

Moving to Chat

From completion to chat: RLHF, instruct and chat models, the cost of alignment, and API evolution. Prompting as “staging a play” with scenes, roles, and cues.

Designing LLM Applications

The core LLM Loop frame: retrieval → snippetizing → scoring → prompt assembly → post-processing.
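
The autoregression and temperature ideas listed under Understanding LLMs can be made concrete with a toy sampler. This is a sketch under stated assumptions: `next_logits` is a hypothetical stand-in for a model that returns one raw score per vocabulary entry, and nothing here reflects a specific library's API.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate(next_logits, vocab, prompt, max_tokens, temperature, rng):
    """Autoregression: sample one token, append it, and feed the result back in."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        # next_logits is a hypothetical callable: token list -> one score per vocab entry
        probs = softmax_with_temperature(next_logits(tokens), temperature)
        tokens.append(rng.choices(vocab, weights=probs, k=1)[0])
    return tokens
```

With a low temperature the top-scoring token dominates and output becomes nearly deterministic; with a high one the distribution flattens and less likely continuations appear more often.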

Part II: Key Techniques

What actually moves the answer: few-shot examples, RAG to keep the model from making things up, and prompt assembly that holds up on long context.

Prompt Content

Static content such as instructions and few-shot examples versus dynamic content. RAG: lexical and neural retrieval, embeddings, vector storage, and hierarchical summarization.

Assembling the Prompt

Working within the context window: prompt anatomy, document formats, and elastic snippets. Valley of Meh: the middle of the prompt tends to sag, so important content belongs closer to the edges.

Taming the Model

Anatomy of the model response: preamble, start and end markers, stop sequences, and streaming. Logprobs for confidence. Model choice as a balance of quality, price, and latency.
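
The "logprobs for confidence" idea from Taming the Model can be illustrated with one common heuristic: the geometric mean of the token probabilities. This is an assumption chosen for the example rather than the book's stated method, and the `0.8` threshold is arbitrary. The input is a plain list of per-token log probabilities, however your API happens to return them.

```python
import math


def sequence_confidence(token_logprobs: list[float]) -> float:
    """Geometric-mean token probability: exp of the average log probability."""
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def needs_review(token_logprobs: list[float], threshold: float = 0.8) -> bool:
    """Flag an answer whose average token probability falls below the threshold."""
    return sequence_confidence(token_logprobs) < threshold
```

Averaging in log space keeps long answers from being penalized simply for having more tokens, which a raw product of probabilities would do.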

Related chapter

Hands-On Large Language Models

A visual explanation of RAG, agents, and LangChain.

Read review

Part III: Advanced Topics

This is where single prompts give way to systems: agents with memory and tools, workflows, and the quality evaluation without which you cannot catch a regression.

Conversational Agency

Tool use: tool design, error handling, and dangerous actions. Reasoning patterns such as CoT and ReAct. Building the agent and its user experience.

LLM Workflows

When a workflow is better than an agent. Tasks as building blocks, reusable prompts, agent-driven workflows, stateful task agents, roles, and delegation.

Evaluating LLM Applications

Offline: example suites, gold answers, model-based judging, and SOMA. Online: A/B tests and product metrics.

Looking Ahead

Multimodality, user experience as part of quality, and the continuing rise in model capability and speed.
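
The offline evaluation described under Evaluating LLM Applications, an example suite checked against gold answers, can be sketched in a few lines. This is a simplified illustration: exact match after normalization is an assumption made for the example, `answer_fn` is a hypothetical prompt-to-answer callable, and real suites often need fuzzier comparison or model-based judging.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    prompt: str
    gold: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def run_suite(answer_fn, suite: list[Example]) -> float:
    """Fraction of examples whose normalized answer equals the gold answer."""
    if not suite:
        raise ValueError("suite is empty")
    hits = sum(normalize(answer_fn(ex.prompt)) == normalize(ex.gold) for ex in suite)
    return hits / len(suite)


def is_regression(baseline: float, candidate: float, tolerance: float = 0.0) -> bool:
    """True when the candidate scores worse than the baseline beyond the tolerance."""
    return candidate < baseline - tolerance
```

Running the same suite before and after a prompt change gives the two scores that `is_regression` compares, which is the minimum needed to notice a quality drop before users do.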

Practical insight: Valley of Meh

The middle of the prompt “sags”

Models usually pay more attention to the beginning and end of a prompt. Anything in the middle is more likely to be ignored or weighted less — so a critical instruction parked there may simply not fire.

Authors' recommendation:

Put critical instructions first, especially in the system prompt
Keep the most important context closer to the end
Leave lower-priority material in the middle
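
The three recommendations above translate into a simple ordering rule. The sketch below assumes each context item already carries an importance score (how that score is produced is out of scope here), and the function name is hypothetical.

```python
def order_for_prompt(instructions: str, context: list[tuple[str, float]]) -> str:
    """Place instructions first, weakest context in the middle, strongest last."""
    # Sort ascending so the most important item ends up at the end of the prompt.
    ascending = sorted(context, key=lambda item: item[1])
    body = [text for text, _ in ascending]
    return "\n\n".join([instructions, *body])


prompt = order_for_prompt(
    "Answer briefly.",
    [("key fact", 0.9), ("background", 0.2), ("detail", 0.5)],
)
```

Here the instruction opens the prompt, "background" and "detail" sit in the middle, and "key fact" lands at the end, next to where the model starts generating.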

Relevance in 2026: Prompt → Context Engineering

Since the book was published, LLMs have improved significantly, and some of its tricks have lost their edge: models understand users better without complex prompts, and many strong practices are now built directly into tools and platforms. The scarce part moved from wording to what the model receives as input, and in what shape.

Context Engineering

Andrej Karpathy's framing from 2025: instead of hunting for the perfect wording, the engineer assembles the model's whole environment (data, history, tools, and constraints). The cost of failure moved there too: more often the model answers badly not because of the words in the request, but because the context it needed simply was not in it.

PromptOps

A prompt in production is code: you version it, monitor request quality, and automate context preparation so quiet degradation surfaces before the user hits it.
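
Treating a prompt as code can start with something as small as content-addressed versions plus a quality log per version. The registry below is a hypothetical in-memory sketch: the class, its methods, and the 12-character hash prefix are assumptions for illustration, and a production setup would persist this data and feed it into monitoring.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PromptRegistry:
    versions: dict[str, str] = field(default_factory=dict)  # version id -> template
    scores: dict[str, list[float]] = field(default_factory=dict)  # version id -> quality log

    def register(self, template: str) -> str:
        """Derive the version id from the template text, so any edit yields a new id."""
        version = hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]
        self.versions[version] = template
        self.scores.setdefault(version, [])
        return version

    def record(self, version: str, quality: float) -> None:
        """Attach one observed quality score to a registered prompt version."""
        self.scores[version].append(quality)

    def mean_quality(self, version: str) -> float:
        """Average quality for a version; raises if nothing was recorded yet."""
        values = self.scores[version]
        if not values:
            raise ValueError("no scores recorded")
        return sum(values) / len(values)
```

Logging the version id with every request makes it possible to compare average quality across prompt versions and to trace a degradation back to the edit that caused it.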

Conclusion: API syntax ages fast; a way of thinking does not, and that is what the book actually gives you. RAG is now everywhere and chain-of-thought has become a standard tool in many AI agents, but what still decides the outcome is the context you put into the model and how you check what it answered. The authors said as much themselves: APIs would age, the underlying ideas would last.

Related chapters

AI Engineering (short summary) - A broader runtime view: LLM system architecture, evaluation, guardrails, and operations.
Hands-On Large Language Models (short summary) - A visual LLM foundation: tokenization, embeddings, transformers, RAG, and applied patterns.
An Illustrated Guide to AI Agents (short summary) - A continuation into agents: planning, memory, reflection, tools, and multi-agent coordination.
Developing Apps with GPT-4 and ChatGPT (short summary) - An entry-level applied layer: API practice, baseline prompting, and early product use cases.

Where to find the book

Original: Prompt Engineering for LLMs (oreilly.com)