System Design Space
Knowledge graphSettings

Updated: April 7, 2026 at 7:45 PM

Prompt Engineering for LLMs (short summary)

medium

Prompt engineering hits a ceiling quickly if you reduce it to wording instead of treating it as the design of the full context around the model.

The chapter shows why the LLM Loop, RAG, agents, and workflows change the conversation entirely: answer quality depends on the full system rather than on a single prompt.

That makes it especially useful in interviews and architecture discussions, because the conversation naturally moves from prompting to retrieval quality, state management, and evaluation.

Practical value of this chapter

Context design

The chapter makes it clear that answer quality depends not only on wording, but on how the surrounding context is assembled around the model.

RAG and workflows

Through the LLM Loop, RAG, and agent patterns, the book ties prompting to data, tools, and the sequence of steps inside the system.

Evaluation and quality

It gives you a practical way to discuss example suites, gold answers, A/B tests, and common model failures rather than only clever prompts.

Interview material

It is a strong frame for moving from prompt design to architecture: retrieval, agents, evaluation, and system boundaries.

Source

Book Cube

A short review of the book by Alexander Polomodov.

Read post

Prompt Engineering for LLMs

Authors: John Berryman, Albert Ziegler
Publisher: O'Reilly Media, Inc.
Length: 282 pages

John Berryman and Albert Ziegler on designing prompts for LLMs, assembling context, using RAG and agent patterns, and evaluating answer quality.

Original

Key Idea: LLM Loop

The authors introduce the LLM Loop as a working cycle where model quality depends not on one clever prompt, but on how the system retrieves context, assembles the request, and processes the result:

1

Retrieval

Finding the right documents and data

2

Snippetizing

Preparing model-friendly context chunks

3

Scoring

Selecting the most useful fragments

4

Assembly

Packing instructions and context into the request

5

Post-process

Checking, formatting, and safely returning the answer

Related chapter

AI Engineering (Chip Huyen)

A broader look at RAG, agents, fine-tuning, and AI operations.

Читать обзор

Book structure: 3 parts, 11 chapters

Part I: LLM Basics

How the models work, how they evolved, how they are trained, and how they moved from pure completion to dialogue.

1

Introduction to Prompt Engineering

Why LLMs look like “magic,” how language models evolved, and why prompt engineering became its own engineering discipline.

2

Understanding LLMs

LLMs as continuation models: tokens, autoregression, hallucinations, temperature, and the basics of transformers.

3

Moving to Chat

From completion to chat: RLHF, instruct and chat models, the cost of alignment, and API evolution. Prompting as “staging a play” with scenes, roles, and cues.

4

Designing LLM Applications

The core LLM Loop frame: retrieval → snippetizing → scoring → prompt assembly → post-processing.

Part II: Key Techniques

Few-shot examples, RAG to reduce hallucinations, and careful prompt assembly.

5

Prompt Content

Static content such as instructions and few-shot examples versus dynamic content. RAG: lexical and neural retrieval, embeddings, vector storage, and hierarchical summarization.

6

Assembling the Prompt

Working within the context window: prompt anatomy, document formats, and elastic snippets. Valley of Meh: the middle of the prompt tends to sag, so important content belongs closer to the edges.

7

Taming the Model

Anatomy of the model response: preamble, start and end markers, stop sequences, and streaming. Logprobs for confidence. Model choice as a balance of quality, price, and latency.

Related chapter

Hands-On Large Language Models

Visual explanation of RAG, agents and LangChain

Читать обзор

Part III: Advanced Topics

Agents with memory and tools, workflows, and quality evaluation.

8

Conversational Agency

Tool use: tool design, error handling, and dangerous actions. Reasoning patterns such as CoT and ReAct. Building the agent and its user experience.

9

LLM Workflows

When a workflow is better than an agent. Tasks as building blocks, reusable prompts, agent-driven workflows, stateful task agents, roles, and delegation.

10

Evaluating LLM Applications

Offline: example suites, gold answers, model-based judging, and SOMA. Online: A/B tests and product metrics.

11

Looking Ahead

Multimodality, user experience as part of quality, and the continuing rise in model capability and speed.

Practical insight: Valley of Meh

The middle of the prompt “sags”

Models usually pay more attention to the beginning and end of a prompt. Information in the middle is more likely to be ignored or weighted less.

Authors' recommendation:

  • Put critical instructions first, especially in the system prompt
  • Keep the most important context closer to the end
  • Leave lower-priority material in the middle

Relevance in 2026: Prompt → Context Engineering

Since the book was published, LLMs have improved significantly. Models understand users better even without complex prompts, and many strong practices are now built directly into tools and platforms.

Context Engineering

Andrey Karpathy's framing from 2025: focus on giving the model a complete environment, including data, history, tools, and constraints, rather than hunting for the perfect wording.

PromptOps

Prompt versioning, request-quality monitoring, and automated context preparation.

Conclusion: The core principles of the book still hold. RAG is now everywhere, and chain-of-thought has become a standard tool in many AI agents. The authors were upfront that APIs would age, but the underlying ideas would last.

Related chapters

Where to find the book

Enable tracking in Settings