System Design Space

Updated: April 7, 2026 at 7:45 PM

AI Engineering (short summary)


AI engineering becomes a separate discipline when the model stops being an impressive demo and starts shaping product behavior, cost, and operations.

The chapter brings foundation models, prompting, RAG, agent workflows, and finetuning into one engineering loop where the real concern is not fashionable patterns, but answer quality, control, and risk boundaries.

For interviews and architecture discussions, it works as a map of decisions around the model: how to evaluate the system, where to place guardrails, when to increase complexity, and how to think about the runtime as a whole.

Practical value of this chapter

AI product loop

The book brings the model, context, knowledge, tools, and operations into one product loop instead of treating them as isolated techniques.

Evaluation and quality

It helps you discuss AI through validation, product metrics, human review, and degradation analysis rather than through the model alone.

From prompt to system

It is a strong guide for explaining how the path from prompting to RAG, agents, and finetuning becomes more complex as product expectations rise.

Interview material

The chapter gives you a strong frame for discussing quality, cost, latency, guardrails, and the operation of the AI system as a whole.

Source

Book Cube

Post with an overview of the book and the full episode series about AI engineering.


AI Engineering

Author: Chip Huyen
Publisher: O'Reilly Media, Inc.
Length: 534 pages

Chip Huyen on building AI applications around foundation models: prompting, RAG, agents, finetuning, evaluation, and operations.


AI engineering stack by Chip Huyen

Foundation Models

GPT-4, Claude, Gemini, Llama: model selection and a clear view of strengths and limitations

Prompting & Context

Prompt design, in-context examples, step-by-step reasoning, and system instructions

RAG & Knowledge

RAG architecture, vector stores, and embeddings

Agents & Tools

Agent workflows, function calling, and orchestration

Finetuning & Adaptation

SFT, RLHF, LoRA, and training-set design

Related chapter

Hands-On Large Language Models

Visual introduction to LLMs: tokenization, embeddings, and transformers

Read the review

Key ideas of the book

AI engineering is not the same as ML engineering

Model-as-a-service has lowered the entry barrier. AI engineering is about building applications around foundation models rather than training models from scratch.

Evaluation is the central theme

The deeper AI is embedded into a product, the more expensive errors become. That is why systematic validation, model-based judging, and product metrics are essential.

From simple to complex

The working progression is simple: start with prompt design, add RAG when needed, and move to finetuning only when it is truly justified.

Production concerns

Latency, the cost of inference, stability, and graceful degradation are not secondary details. They are part of delivering and operating AI features.
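One way to picture graceful degradation is a fallback chain: try the primary model, fall back to a cheaper one, and return a canned reply if both fail. The sketch below is illustrative only. `answer_with_fallback` and the stub models are hypothetical names, not an API from the book, and a real system would catch narrower exception types and enforce timeouts.

```python
from typing import Callable

ModelCall = Callable[[str], str]  # hypothetical signature: prompt -> completion


def answer_with_fallback(
    prompt: str,
    primary: ModelCall,
    fallback: ModelCall,
    canned: str = "Sorry, I can't answer right now.",
) -> tuple[str, str]:
    """Return (source, answer), degrading from primary to fallback to canned."""
    for source, call in (("primary", primary), ("fallback", fallback)):
        try:
            return source, call(prompt)
        except Exception:
            # Sketch simplification: any failure moves on to the next tier.
            continue
    return "canned", canned


def flaky_primary(prompt: str) -> str:
    """Stub that simulates a large model timing out."""
    raise TimeoutError("primary model timed out")


def small_model(prompt: str) -> str:
    """Stub that simulates a cheaper, always-available model."""
    return f"small: {prompt}"


if __name__ == "__main__":
    print(answer_with_fallback("hi", flaky_primary, small_model))
```

Returning the source alongside the answer lets the caller log how often each tier is used, which is one signal for the stability concerns mentioned above.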

Related chapter

ML System Design

A Practical Guide to Designing ML Interview Systems

Read the review

Book structure: 10 chapters

Part I: Foundation

1

Introduction to Building AI Applications

The transition from ML to GenAI, the advantages of foundation models, tokens, multimodality, use cases, and AI as a platform.

2

Understanding Foundation Models

Data and languages, transformers and the attention mechanism, parameters and context window, post-training, and hallucinations.

Part II: AI Application Development

3

Evaluation Methodology

Why AI evaluation is difficult, entropy and perplexity, and functional versus non-functional correctness.
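As a quick illustration of how perplexity relates to cross entropy, the sketch below computes both from per-token log-probabilities. It assumes natural logarithms, so perplexity is `exp` of the average negative log-likelihood. With base-2 logs it would be 2 raised to the entropy in bits. The function names are illustrative.

```python
import math


def cross_entropy(token_logprobs: list[float]) -> float:
    """Average negative log-likelihood per token, in nats."""
    return -sum(token_logprobs) / len(token_logprobs)


def perplexity(token_logprobs: list[float]) -> float:
    """Exponential of the cross entropy (natural-log convention)."""
    return math.exp(cross_entropy(token_logprobs))


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every observed token is as
    # uncertain as a uniform choice among 4 options, so perplexity is about 4.
    logprobs = [math.log(0.25)] * 5
    print(round(perplexity(logprobs), 6))
```

Lower perplexity means the model found the text less surprising. It says nothing by itself about whether an answer is correct or useful.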

4

Evaluate AI Systems

Model-based judging, pairwise comparisons, benchmarks and their limitations, the human baseline, and product validation.

5

Prompt Engineering

Prompt structure, in-context examples, step-by-step reasoning, system prompts, and the democratization of development.

6

RAG and Agents

Retrieval-Augmented Generation, vector stores, chunking strategies, agents, and function calling.
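The retrieval side of this chapter can be sketched in a few functions: split text into overlapping chunks, embed the chunks and the query, rank by similarity, and place the top results in the prompt. This is a toy under stated assumptions. The bag-of-words `embed` stands in for a real embedding model, a plain list stands in for a vector store, and all function names are hypothetical.

```python
import math
from collections import Counter


def chunk(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into word windows of `size` that share `overlap` words.

    Assumes size > overlap.
    """
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Put the retrieved chunks in front of the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "Refunds are issued within 14 days of purchase.",
        "Shipping takes 3 to 5 business days.",
        "Support is available by email.",
    ]
    question = "How long do refunds take?"
    print(build_prompt(question, retrieve(question, docs, k=1)))
```

Chunk size and overlap are the main knobs here: chunks that are too small lose context, and chunks that are too large dilute the similarity score.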

7

Finetuning

When finetuning is justified, SFT versus RLHF, training-set design, LoRA, and effective adaptation.
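To see why LoRA counts as an efficient form of adaptation, it helps to count trainable parameters. LoRA keeps a weight matrix of shape d_out × d_in frozen and trains two low-rank factors of shapes d_out × r and r × d_in. The sketch below compares the two counts. The 4096 × 4096 layer and rank 8 are illustrative values, not figures from the book.

```python
def full_trainable_params(d_out: int, d_in: int) -> int:
    """Parameters updated when the whole weight matrix is finetuned."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    d_out, d_in, rank = 4096, 4096, 8  # illustrative layer size and rank
    full = full_trainable_params(d_out, d_in)
    lora = lora_trainable_params(d_out, d_in, rank)
    print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

For this layer the low-rank update trains well under 1% of the parameters that full finetuning would touch, which is where the memory savings come from.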

Part III: AI Engineering in Production

8

Dataset Engineering

Data collection and preparation, annotation, synthetic data, and the data improvement loop.

9

Inference Optimization

Latency and throughput, quantization, batching, caching, and cost optimization.
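Of the techniques listed, caching is the easiest to sketch. The example below wraps a model call in an exact-match prompt cache so that repeated prompts skip inference entirely. `CachedModel` is a hypothetical helper, and the approach assumes deterministic decoding and answers that do not go stale. Serving stacks also cache at lower levels, which this sketch does not cover.

```python
import hashlib
from typing import Callable


class CachedModel:
    """Exact-match response cache around a prompt -> completion callable."""

    def __init__(self, call_model: Callable[[str], str]) -> None:
        self._call_model = call_model
        self._cache: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def __call__(self, prompt: str) -> str:
        # Hash the prompt so cache keys stay small for long inputs.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        result = self._call_model(prompt)
        self._cache[key] = result
        return result


if __name__ == "__main__":
    model = CachedModel(lambda p: p.upper())  # stub in place of a real model
    model("what is rag?")
    model("what is rag?")
    print(model.hits, model.misses)
```

The hit and miss counters give a direct estimate of how many inference calls, and therefore how much cost and latency, the cache saves.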

10

AI Engineering Architecture and User Feedback

AI application architecture, feedback collection, continuous improvement, and the operational loop for GenAI.

Evaluation: key theme of the book

Chip Huyen treats evaluation as a central concern in AI engineering. Two full chapters are devoted to its methodology and practice:

Metrics

Perplexity, BLEU, ROUGE, semantic similarity, and task-specific metrics

Model-based judging

One model evaluates another: judge prompts, bias, and calibration
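One bias that model judges can show is a preference for whichever answer appears first. A common mitigation is to judge each pair in both orders and accept a winner only when the two verdicts agree. The sketch below shows that logic with stub judges. The `Judge` signature and function names are assumptions for illustration, and a real judge would call a model with a judge prompt.

```python
from typing import Callable

# Hypothetical judge signature: (question, first, second) -> "first" or "second"
Judge = Callable[[str, str, str], str]


def pairwise_verdict(question: str, answer_a: str, answer_b: str, judge: Judge) -> str:
    """Return "A", "B", or "tie" after judging both presentation orders."""
    forward = judge(question, answer_a, answer_b)   # A shown first
    backward = judge(question, answer_b, answer_a)  # B shown first
    a_wins_forward = forward == "first"
    a_wins_backward = backward == "second"
    if a_wins_forward and a_wins_backward:
        return "A"
    if not a_wins_forward and not a_wins_backward:
        return "B"
    return "tie"  # verdict flipped with the order, so it is not trusted


def position_biased_judge(question: str, first: str, second: str) -> str:
    """Stub judge that always prefers the first answer."""
    return "first"


def length_judge(question: str, first: str, second: str) -> str:
    """Stub judge that prefers the longer answer, regardless of order."""
    return "first" if len(first) >= len(second) else "second"


if __name__ == "__main__":
    print(pairwise_verdict("q", "short", "a much longer answer", position_biased_judge))
    print(pairwise_verdict("q", "short", "a much longer answer", length_judge))
```

The length-based stub is itself an example of a biased judge: it is consistent across orderings but rewards verbosity, which is why calibration against human labels still matters.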

Product Validation

A/B tests, user feedback, and alignment with business metrics

Chapter Podcast Series

The book is discussed by Alexander Polomodov (CTO of T-Bank) and Evgeny Sergeev (Engineering Director, Flo).

Issue #1

Preface & Intro Chapter

  • Book overview and 10-chapter structure
  • Transition from ML to GenAI
  • Tokens and multimodality
  • Prompt design and democratization
  • Integration of MCP and Claude Desktop

Issue #2

Understanding Foundation Models

  • Model training stages
  • Transformers and the attention mechanism
  • Parameters and context window
  • Post-training: SFT and RLHF
  • Hallucinations and their causes

Issue #3

Evaluation (Ch. 3–4)

  • Why AI evaluation is difficult
  • Entropy and perplexity
  • Model-based judging
  • Pairwise model comparisons
  • Product validation

