AI Engineering (short summary) — System Design Space

AI engineering becomes a separate discipline when the model stops being an impressive demo and starts shaping product behavior, cost, and operations.

The chapter brings foundation models, prompting, RAG, agent workflows, and finetuning into one engineering loop, where the real concern is answer quality, control, and risk boundaries rather than fashionable patterns.

For interviews and architecture discussions, it works as a map of decisions around the model: how to evaluate the system, where to place guardrails, when to increase complexity, and how to think about the runtime as a whole.

Practical value of this chapter

AI product loop

The book brings the model, context, knowledge, tools, and operations into one product loop instead of treating them as isolated techniques.

Evaluation and quality

It helps you discuss AI through validation, product metrics, human review, and degradation analysis rather than through the model alone.

From prompt to system

It is a strong guide for explaining how the path from prompting to RAG, agents, and finetuning becomes more complex as product expectations rise.

Interview material

The chapter gives you a strong frame for discussing quality, cost, latency, guardrails, and the operation of the AI system as a whole.

Source

Book Cube

Post with an overview of the book and the full episode series about AI engineering.

AI Engineering

Author: Chip Huyen
Publisher: O'Reilly Media, Inc.
Length: 534 pages

Chip Huyen on building AI applications around foundation models: prompting, RAG, agents, finetuning, evaluation, and operations.

AI engineering stack by Chip Huyen

Foundation Models

GPT-4, Claude, Gemini, Llama: model selection and a clear view of strengths and limitations

Prompting & Context

Prompt design, in-context examples, step-by-step reasoning, and system instructions

RAG & Knowledge

RAG architecture, vector stores, and embeddings

Agents & Tools

Agent workflows, function calling, and orchestration

Finetuning & Adaptation

SFT, RLHF, LoRA, and training-set design
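
To make the RAG & Knowledge item concrete, here is a minimal sketch of the retrieval step. It splits documents into overlapping chunks, embeds them, pulls the top-k chunks by cosine similarity, and places them in the prompt. The `embed` function is a hypothetical stand-in (lowercase word counts) for a real embedding model. The chunk size, overlap, and prompt wording are illustrative assumptions, not values from the book.

```python
import math
from collections import Counter


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; neighbours share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """HYPOTHETICAL embedding stand-in: bag-of-words counts, not a real model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Ground the model in retrieved chunks (illustrative prompt wording)."""
    joined = "\n---\n".join(context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "refunds are issued within 14 days of purchase",
        "shipping takes 3 to 5 business days",
        "support is available by email around the clock",
    ]
    question = "how many days do refunds take"
    print(build_prompt(question, retrieve(question, docs, k=1)))
```

In a real system, the vector store replaces the linear `sorted` scan with an approximate nearest-neighbour index. The chunking strategy is one of the main quality levers.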

Related chapter

Hands-On Large Language Models

Visual introduction to LLMs: tokenization, embeddings, and transformers

Key ideas of the book

AI engineering is not the same as ML engineering

Model-as-a-service removes the most expensive step: training from scratch. The center of gravity shifts to building applications around foundation models: choosing a model, holding quality and cost in check, and getting the feature to the user.

Evaluation is the central theme

The deeper AI is embedded into a product, the more expensive errors become. That is why systematic validation, model-based judging, and product metrics are essential.

From simple to complex

The order of complexity follows the cost of change: start with prompt design, add RAG when the model lacks the knowledge, and reach for finetuning last, because it drags in data, infrastructure, and upkeep that you cannot cheaply roll back.

Production concerns

Latency, the cost of inference, stability, and graceful degradation decide whether a feature survives to the user. A prototype passes a demo without them; in production the first load spike or provider outage turns each one into an incident.
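
Graceful degradation from the Production concerns item can be sketched as an ordered fallback chain. The code tries the primary model under a time budget, falls back to the next provider on an error or timeout, and finally returns a degraded but safe answer instead of raising. The provider callables, the 2-second budget, and the fallback message are hypothetical assumptions for illustration.

```python
import concurrent.futures
from typing import Callable

# Hypothetical provider signature: takes a prompt, returns text, may raise or hang.
Provider = Callable[[str], str]

FALLBACK_MESSAGE = "The assistant is temporarily unavailable. Please try again later."

# Shared worker pool so a slow provider cannot block the caller past its timeout.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def answer_with_fallback(
    prompt: str, providers: list[Provider], timeout_s: float = 2.0
) -> tuple[str, str]:
    """Try providers in order; return (answer, source), degrading if all fail."""
    for i, call in enumerate(providers):
        future = _POOL.submit(call, prompt)
        try:
            return future.result(timeout=timeout_s), f"provider-{i}"
        except concurrent.futures.TimeoutError:
            # Over budget: move on. cancel() only helps if the call has not started;
            # a call that is already running keeps going in the background.
            future.cancel()
        except Exception:
            # Provider error (outage, rate limit, bad response): try the next one.
            continue
    # Every provider failed: degrade to a safe static answer instead of raising.
    return FALLBACK_MESSAGE, "degraded"
```

Returning the `source` tag alongside the answer lets monitoring count how often traffic lands on fallbacks. That count is an early signal of a provider incident.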

Related chapter

ML System Design

A practical guide to designing ML systems for interviews

Book structure: 10 chapters

Part I: Foundation

Introduction to Building AI Applications

The transition from ML to GenAI, the advantages of foundation models, tokens, multimodality, use cases, and AI as a platform.

Understanding Foundation Models

Data and languages, transformers and the attention mechanism, parameters and context window, post-training, and hallucinations.

Part II: AI Application Development

Evaluation Methodology

Why AI evaluation is difficult, entropy and perplexity, functional vs non-functional correctness.

Evaluate AI Systems

Model-based judging, pairwise comparisons, benchmarks and their limitations, the human baseline, and product validation.

Prompt Engineering

Prompt structure, in-context examples, step-by-step reasoning, system prompts, and the democratization of development.

RAG and Agents

Retrieval-Augmented Generation, vector stores, chunking strategies, agents and function calling.

Finetuning

When finetuning is justified, SFT versus RLHF, training-set design, LoRA, and effective adaptation.
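
The appeal of LoRA in the Finetuning chapter comes down to simple arithmetic. Instead of updating a full d × k weight matrix W, you train two low-rank factors, B (d × r) and A (r × k), and use W + BA. The trainable parameter count therefore drops from d·k to r·(d + k) per adapted matrix. The 4096 × 4096 shape and rank 8 below are illustrative assumptions, not figures from the book.

```python
def full_params(d: int, k: int) -> int:
    """Trainable parameters when finetuning the whole d x k matrix."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for LoRA factors B (d x r) and A (r x k)."""
    return r * (d + k)


if __name__ == "__main__":
    # Illustrative shape and rank (assumptions, not from the book).
    d = k = 4096
    r = 8
    full, lora = full_params(d, k), lora_params(d, k, r)
    print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
    # prints: full: 16,777,216  lora: 65,536  ratio: 0.39%
```

The same ratio applies to every matrix the adapter is attached to. This is why a LoRA adapter is small enough to store and swap per task.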

Part III: AI Engineering in Production

Dataset Engineering

Data collection and preparation, annotation, synthetic data, and the data improvement loop.

Inference Optimization

Latency and throughput, quantization, batching, caching, and cost optimization.

AI Engineering Architecture and User Feedback

AI application architecture, feedback collection, continuous improvement, and the operational loop for GenAI.
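
Caching, listed under Inference Optimization above, is among the cheapest optimizations to try. An exact-match cache keyed on the model name, a whitespace-normalized prompt, and the sampling temperature skips repeat calls entirely. Below is a minimal in-memory sketch with LRU eviction and a hypothetical `generate` callable. Caching at temperature above 0 freezes one random sample. A semantic cache, which matches similar prompts by embedding, is a different and riskier variant not shown here.

```python
import hashlib
import json
from collections import OrderedDict
from typing import Callable


class ResponseCache:
    """Exact-match LRU cache for model responses (in-memory sketch)."""

    def __init__(self, generate: Callable[[str, str, float], str], max_entries: int = 1000):
        self._generate = generate  # HYPOTHETICAL: (model, prompt, temperature) -> text
        self._store = OrderedDict()  # key -> response, oldest first
        self._max = max_entries
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str, temperature: float) -> str:
        # Collapse whitespace so trivially different prompts share one entry.
        payload = json.dumps(
            {"m": model, "p": " ".join(prompt.split()), "t": temperature}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, model: str, prompt: str, temperature: float = 0.0) -> str:
        key = self._key(model, prompt, temperature)
        if key in self._store:
            self.hits += 1
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        self.misses += 1
        text = self._generate(model, prompt, temperature)
        self._store[key] = text
        if len(self._store) > self._max:
            self._store.popitem(last=False)  # evict the least recently used entry
        return text
```

The hit rate (`hits / (hits + misses)`) tells you whether the cache pays for itself. In production, the store would usually live in a shared service rather than in process memory.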

Evaluation: the key theme of the book

A foundation model is non-deterministic, and its output cannot be checked against a reference line by line. Without a dedicated evaluation loop, users are the first to notice regressions. Chip Huyen devotes two full chapters to the topic: the methodology and the practice of validation.
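
A dedicated evaluation loop can start as small as a regression gate. Run a fixed evaluation set through the new version, score each output, and block the release if the pass rate falls more than a tolerance below the current baseline. The case format, the exact-match `score` function, and the 2-point tolerance are hypothetical assumptions. In practice, the scorer is often a task-specific metric or a model-based judge.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected: str


def score(output: str, expected: str) -> bool:
    """HYPOTHETICAL scorer: case-insensitive exact match after trimming."""
    return output.strip().lower() == expected.strip().lower()


def pass_rate(system: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Share of evaluation cases the system under test gets right."""
    passed = sum(score(system(c.prompt), c.expected) for c in cases)
    return passed / len(cases)


def release_gate(candidate: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Allow the release only if the candidate is not worse than baseline - tolerance."""
    return candidate >= baseline - tolerance
```

The tolerance absorbs run-to-run noise from non-deterministic sampling. Set it too loose, and small regressions accumulate release after release.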

Metrics

Perplexity, BLEU, ROUGE, semantic similarity, and task-specific metrics

Model-based judging

One model evaluates another: judge prompts, bias, and calibration

Product Validation

A/B tests, user feedback, and alignment with business metrics
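
One well-known bias of model-based judging is position bias: a judge may favor whichever answer it sees first. A common mitigation is to ask for the pairwise verdict twice with the answer order swapped. A win counts only when both orderings agree, and disagreement becomes a tie. The sketch below assumes a `judge` callable, a hypothetical wrapper around a judge prompt that returns "A" or "B".

```python
from collections import Counter
from typing import Callable, Literal

Verdict = Literal["A", "B"]
# HYPOTHETICAL judge wrapper: (question, first_answer, second_answer) -> "A" | "B"
Judge = Callable[[str, str, str], Verdict]


def debiased_compare(judge: Judge, question: str, answer_x: str, answer_y: str) -> str:
    """Return 'x', 'y', or 'tie' after judging both presentation orders."""
    first = judge(question, answer_x, answer_y)   # x shown in position A
    second = judge(question, answer_y, answer_x)  # y shown in position A
    x_wins_first = first == "A"
    x_wins_second = second == "B"
    if x_wins_first and x_wins_second:
        return "x"
    if not x_wins_first and not x_wins_second:
        return "y"
    return "tie"  # verdict flipped with the order: likely position bias


def win_rate(judge: Judge, pairs: list[tuple[str, str, str]]) -> float:
    """Share of (question, x, y) comparisons won by x, counting ties as half."""
    tally = Counter(debiased_compare(judge, q, x, y) for q, x, y in pairs)
    return (tally["x"] + 0.5 * tally["tie"]) / len(pairs)
```

A high tie rate is itself a calibration signal: it suggests the judge prompt is not discriminating between the answers.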

Podcast series on the book

The book is discussed by Alexander Polomodov (CTO of T-Bank) and Evgeny Sergeev (Engineering Director, Flo).

Issue #1

Preface & Intro Chapter

Book overview and 10-chapter structure
Transition from ML to GenAI
Tokens and multimodality
Prompt design and democratization
Integration of MCP and Claude Desktop

Available on YouTube, VK, and Podster

Issue #2

Understanding Foundation Models

Model training stages
Transformers and the attention mechanism
Parameters and context window
Post-training: SFT and RLHF
Hallucinations and their causes

Available on YouTube, VK, and Podster

Issue #3

Evaluation (Ch. 3–4)

Why AI evaluation is difficult
Entropy and perplexity
Model-based judging
Pairwise model comparisons
Product validation

Available on YouTube, VK, and Podster

Related chapters

ML Lifecycle: From Data and Training to Production and Feedback Loops - Provides the core map of an ML system lifecycle: data, release, runtime, and the next improvement loop.
Evaluation and Observability for AI Systems - Turns evaluation, human review, and degradation analysis into a dedicated engineering discipline.
Model Serving and Inference Architecture - Breaks down the runtime path in detail: latency budgets, hardware routing, fallback, and cost.
Hands-On Large Language Models (short summary) - Provides the LLM foundation needed to make architectural decisions more deliberately.
Prompt Engineering for LLMs (short summary) - Expands on prompt and context design as a core part of AI application architecture.
Developing Apps with GPT-4 and ChatGPT (short summary) - Shows the applied path from API prototype to a usable AI application with basic guardrails.
An Illustrated Guide to AI Agents (short summary) - Continues the topic through agent patterns: tools, planning, memory, and orchestration.
AI Engineering Interviews (short summary) - Reinforces the same ideas through interview-style questions, trade-offs, and architecture cases.
Generative AI System Design Interview (short summary) - Turns AI Engineering into a GenAI System Design Interview frame: requirements, data, model, evaluation, safety, and monitoring.

Where to find the book

Original

oreilly.com

AI Engineering

Translated

piter.com

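AI-инженерия. Построение приложений с использованием базовых моделей (Russian edition; the title translates as "AI Engineering: Building Applications with Foundation Models")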