ML Engineering

19 chapters

This page contains all chapters in this theme.

ML Engineering: Designing Models, Pipelines, and the Production Loop [easy]

Introductory map of ML Engineering: connecting model quality to error cost, release, serving, platform ownership, and production operations.

Machine Learning System Design (short summary) [hard]

Practical guide from Babushkin and Kravchenko: problem analysis, metrics, working with data, common mistakes, and preparation for ML interviews.

Precision and recall at your fingertips [easy]

A simple explanation of precision, recall, threshold choice, ROC AUC, and PR AUC built around the story of Vasya and the wolf.

ML Lifecycle: From Data and Training to Production and Feedback Loops [medium]

A practical map of the ML system lifecycle: data contracts, training, quality checks, model registry, release flow, monitoring, and retraining.

Model Release, Calibration, and Experiment Loops [medium]

How to release ML models safely: calibration, threshold tuning, shadow mode, canary release, A/B experiments, and rollback.

Model Serving and Inference Architecture [medium]

How to design the live inference path for ML and LLM systems: online, batch, and stream modes, autoscaling, CPU/GPU routing, degraded behavior, and latency-cost trade-offs.

LLM Inference Optimization [hard]

Inside the LLM inference engine: prefill vs decode, TTFT/TPOT and goodput, the KV-cache and PagedAttention, continuous batching, quantization (GPTQ/AWQ/FP8), speculative decoding, parallelism, and cost-per-token economics.

Vector Search and Approximate Nearest Neighbors (ANN) [medium]

The vector search layer in depth: ANN index families (IVF, HNSW), compression (PQ, IVF-PQ, ScaNN), distance metrics, recall/latency/memory trade-offs, hybrid search with BM25 and RRF, metadata filtering, scaling, and systems (FAISS, pgvector, Milvus, Qdrant, Weaviate).

LLM Post-Training: SFT, LoRA, and Alignment (DPO/RLHF) [medium]

How a pre-trained base model becomes a useful, aligned assistant: supervised fine-tuning (SFT), parameter-efficient fine-tuning (LoRA, QLoRA), and preference alignment (RLHF and DPO). Includes the pretraining → SFT → alignment pipeline, the fine-tune vs prompt/RAG fork, cost and evaluation, and common mistakes.

LLM Cost and Routing [medium]

The economics of LLM applications: what makes up the cost (tokens, model size, context, KV cache, hosted vs self-hosted) and how to route requests between models with cascades, caching, token reduction, and an LLM gateway.

Model Context Protocol (MCP): A Standard for Tool Integration [medium]

How MCP, Anthropic's open protocol (November 2024), standardizes connecting LLM applications to tools and data: the M×N problem, host/client/server architecture, JSON-RPC 2.0 with stdio and Streamable HTTP, the tools/resources/prompts primitives, security, and operations.

Human-in-the-Loop, Data Quality, and the Operational AI Loop [medium]

The operating loop of ML systems: feedback capture, annotation workflows, data quality, error analysis, drift investigation, and retraining triggers.

ML Ops Pipeline [hard]

Case study on the MLOps loop: data, features, training, model registry, rollout, live inference, and drift monitoring as one engineering system.

Feature Store & Model Serving [hard]

Case study on feature stores and model serving: preserving one meaning of features across training and runtime, keeping point-in-time correctness, and controlling training-serving skew.

The history of Google TPUs and their evolution [medium]

How Google moved from TPU v1 for inference to Ironwood: architectural trade-offs, compute economics, and what distinguishes the TPU approach from GPU-centric designs.

The History of NVIDIA AI Accelerators [medium]

How NVIDIA moved from programmable GPUs and CUDA to Tensor Cores, DGX, H100, Blackwell, and rack-scale AI infrastructure: architectural inflection points, ecosystem leverage, and compute economics.

ML platform at T-Bank: a common good, or better off without it? [medium]

Analysis of an interview about the evolution of the ML platform at T-Bank: how teams moved from manual SSH workflows to platform engineering, shared data flows, and mature model operations.

Fraud / Risk Scoring ML System [hard]

Practical ML case: realtime scoring, review operations, delayed labels, threshold tuning, drift analysis, and the next calibration cycle.

Ranking and Recommendation Architecture for ML Systems [medium]

How to design a recommendation loop: candidate generation, ranking, policy layers, freshness, feedback, and the next training cycle.