System Design Space

Updated: May 11, 2026 at 4:55 AM

Cloud Native (short summary)

hard

A Cloud Native book becomes valuable when it ties containers, functions, and data services into one operating picture instead of a pile of separate technologies.

In real design work, the chapter shows how to assemble application architecture for a concrete workload, choose a sensible level of abstraction, and evaluate the design through delivery speed, failure radius, and post-launch operating simplicity.

In interviews and engineering discussions, it helps frame cloud architecture through platform boundaries, SLA commitments, and cost of ownership rather than through a list of fashionable tools.

Practical value of this chapter

Design in practice

Connect containers, functions, and data services into one application architecture for real workloads.

Decision quality

Evaluate architecture through delivery speed, failure radius, and post-launch operating simplicity.

Interview articulation

Structure answers as platform decomposition: compute, data, messaging, observability, and security.

Trade-off framing

Explain how to choose abstraction level without losing SLA and cost control.

Related book

Building Microservices

Sam Newman on service boundaries, communication, and the cost of distribution.


Cloud Native

Authors: Boris Scholl, Trent Swanson, Peter Jausovec
Publisher: O'Reilly Media, 2019
Length: 229 pages

O'Reilly's practical guide to Cloud Native: containers, functions, data, resilience, GitOps, and observability.

This chapter treats Cloud Native as a practical contract between an application and its platform: container runtime, serverless model, backing services, Infrastructure as Code, GitOps, resilience, and observability need to work together rather than live as separate practices.

Related chapter

Kubernetes Fundamentals

A practical overview of Kubernetes architecture, objects, and baseline practices.


What cloud-native architecture means

Cloud Native does not mean “we placed the app in a cloud account.” It means the service is designed for automation, elasticity, managed services, partial failure, and portability across environments.

Key characteristics

  • The application is packaged as a container image and does not depend on manual setup on a specific machine.
  • Infrastructure is described declaratively, from APIs to deployment policy.
  • State moves into backing services, while application processes stay stateless.
  • The platform handles scaling, restart, routing, and observability signals.

Practical value

  • Ship changes faster without manual server operations.
  • Scale services around actual workload shape rather than pre-purchased hardware.
  • Isolate failures and reduce blast radius through platform boundaries.
  • Collect operational signals early: logs, metrics, traces, and readiness checks.

Book structure

Part I

Cloud-native context

The book defines the language: cloud-native architecture, distributed-system challenges, The Twelve-Factor App, and the difference between cloud-native and merely cloud-enabled applications.

Part II

Application and platform patterns

Containers, orchestration, service communication, resilience, and patterns for surviving network failures and partial outages.

Part III

Data in cloud architecture

Data ownership, events, stream processing, CQRS, and Event Sourcing: data becomes part of a distributed contract, not just tables behind a service.

Part IV

Delivery, security, and operations

The final chapters connect CI/CD, GitOps, observability, and security into one operating model.

Containers and Kubernetes

Deep dive

Kubernetes Patterns

A pattern catalog for Kubernetes: sidecars, health probes, configuration, and advanced patterns.


Containers

  • Application isolation through namespaces and cgroups.
  • Immutable images make execution reproducible.
  • A layered filesystem makes builds and image distribution more efficient.
  • Container registries store versions that the platform can deploy.

Core Kubernetes objects

  • Pod is the smallest execution unit.
  • Service gives a stable network address for a group of Pods.
  • Deployment handles declarative updates and rollbacks.
  • ConfigMap / Secret carry configuration and sensitive values.

# Kubernetes Deployment example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app  # must match spec.selector.matchLabels
    spec:
      containers:
      - name: my-app
        image: my-app:v1.2.0
        resources:
          limits:
            memory: "256Mi"
            cpu: "500m"

Serverless functions

The serverless model lets code run without server management. The platform scales execution and charges for actual usage, but in exchange it imposes limits on execution time, memory, and networking, and it dictates the event model.

AWS Lambda

Function as a Service, event triggers, AWS integrations, and bounded execution time.
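A bounded execution time shapes how a function is written. The sketch below assumes a hypothetical event shape with a "records" list and a 1000 ms safety margin; the handler signature and context.get_remaining_time_in_millis() follow the AWS Lambda Python runtime, everything else is illustrative.

```python
def handler(event, context):
    """Process records until the platform's time budget is nearly spent."""
    records = event.get("records", [])  # hypothetical event shape
    processed = 0
    for _record in records:
        # Stop early and report leftovers instead of being killed at the timeout.
        if context.get_remaining_time_in_millis() < 1000:  # assumed safety margin
            break
        processed += 1  # placeholder for the real per-record work
    return {"processed": processed, "remaining": len(records) - processed}
```

Returning the count of unprocessed records lets the caller or a queue redeliver the rest, which keeps each invocation inside the platform limit.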

Azure Functions

Functions, Durable Functions for long-running workflows, and bindings to platform events.

Google Cloud Functions

HTTP triggers, event triggers, and Cloud Run when a container-based option is a better fit.

When it fits

  • Event-driven processing for small independent tasks.
  • API handlers with variable load.
  • Scheduled jobs for background operations.
  • Data transformation pipelines without a dedicated processing server.

Constraints

  • Cold-start latency, most noticeable on rarely invoked or heavyweight functions.
  • Limits on execution time and resource size.
  • Stateless processes by default.
  • Vendor lock-in to a provider’s event model.

Data management

Deep dive

Designing Data-Intensive Applications, 2nd Edition

DDIA on replication, sharding, and consistency guarantees in distributed systems.


Database per service

A service owns its data instead of sharing one schema with every neighbor. This reduces coupling, but makes distributed transactions harder and requires an explicit consistency model.

Polyglot persistence, Saga, Eventual consistency
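When each service owns its data, a cross-service operation cannot rely on one database transaction. A saga replaces it with a sequence of local steps, each paired with a compensation. The following is a minimal in-process sketch; run_saga and the step names in the usage are hypothetical, and a real saga would persist its progress and call remote services.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action fails, run the compensations of the already completed
    steps in reverse order and report failure.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True
```

The result is eventual consistency rather than atomicity: between a step and its compensation, other services can observe the intermediate state.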

Event-driven architecture

Services publish events and react to them asynchronously. This helps scale processing, but requires idempotency, replay strategy, and careful event schemas.
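Idempotency matters because most brokers deliver at least once, so the same event can arrive twice. A minimal sketch of a deduplicating consumer follows; it assumes every event carries a unique "id" field, and it keeps seen IDs in memory where a real service would use a durable store.

```python
class IdempotentConsumer:
    """Wrap a handler so that redelivered events are applied only once."""

    def __init__(self, handler):
        self.handler = handler
        self.seen_ids = set()  # assumption: in production this is a durable store

    def consume(self, event):
        event_id = event["id"]  # assumed unique per event
        if event_id in self.seen_ids:
            return False  # duplicate delivery, skip it
        self.handler(event)
        self.seen_ids.add(event_id)  # record only after the handler succeeded
        return True
```

Recording the ID after the handler succeeds means a crash mid-handler leads to a retry rather than a lost event, so the handler itself should tolerate being re-run.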

Event Sourcing

Store the history of events, not only current state
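As a rough illustration, current state can be derived by folding over the stored events. The event names and the account example below are assumptions made for the sketch, not taken from the book.

```python
from functools import reduce


def apply_event(balance, event):
    """Return the new balance after one (kind, amount) event."""
    kind, amount = event
    if kind == "deposited":
        return balance + amount
    if kind == "withdrawn":
        return balance - amount
    raise ValueError(f"unknown event kind: {kind}")


def replay(events, initial=0):
    """Rebuild current state from the full event history."""
    return reduce(apply_event, events, initial)


history = [("deposited", 100), ("withdrawn", 30), ("deposited", 5)]
current_balance = replay(history)
```

Because the history is kept, the same events can be replayed into a different projection later, for example after fixing a bug in apply_event.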

CQRS

Separate write commands from read queries
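One way to picture the split: a write side validates commands and emits events, and a separate read side keeps a view shaped for queries. The class and field names here are hypothetical, and the direct callback stands in for what would usually be a message broker.

```python
class OrderCommands:
    """Write side: validates commands and publishes resulting events."""

    def __init__(self, publish):
        self._orders = {}
        self._publish = publish

    def place_order(self, order_id, customer, total):
        if order_id in self._orders:
            raise ValueError(f"order {order_id} already exists")
        self._orders[order_id] = {"customer": customer, "total": total}
        self._publish({"type": "order_placed", "customer": customer, "total": total})


class CustomerTotalsView:
    """Read side: a denormalized view updated from events."""

    def __init__(self):
        self.totals = {}

    def on_event(self, event):
        if event["type"] == "order_placed":
            customer = event["customer"]
            self.totals[customer] = self.totals.get(customer, 0) + event["total"]

    def total_for(self, customer):
        return self.totals.get(customer, 0)
```

With a real broker between the two sides, the view lags behind writes, so callers of total_for should expect slightly stale answers.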

Resilience patterns

Classic

Release It!

Michael Nygard introduced the Circuit Breaker and other stability patterns.


Retries with backoff

Retries help with transient failures, while exponential backoff and jitter reduce the risk of a thundering herd.
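A minimal sketch of this idea, assuming the "full jitter" variant where each wait is a random fraction of an exponentially growing cap. The function name, defaults, and the injectable sleep and rng parameters are illustrative choices; real code would also retry only on errors known to be transient.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep, rng=random.random):
    """Call operation(); on failure wait with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:  # sketch only: narrow this to transient error types
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            # The cap doubles each attempt and is bounded by max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Jitter: a random fraction of the cap spreads clients out in time.
            sleep(rng() * cap)
```

Without the random factor, clients that failed together would retry together, which is exactly the synchronized burst the jitter is meant to break up.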

Circuit breaker

A circuit breaker stops sending requests to a degraded service and protects the system from cascading failure.
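The state machine behind this can be sketched in a few lines. The version below is a simplified, single-threaded illustration with assumed thresholds: it counts consecutive failures, rejects calls while open, and lets one trial call through after a timeout.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("dependency is marked unhealthy")
            # Timeout elapsed: half-open, let this one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Rejecting calls immediately while open is what protects callers: they fail fast instead of tying up threads and connections waiting on a dependency that is unlikely to answer.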

Health checks

A liveness probe answers whether the process is alive; a readiness probe answers whether traffic can be sent to it.
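The distinction can be sketched as two endpoints backed by different conditions. The paths /healthz and /readyz and the AppState fields are assumptions for illustration; the function returns only the HTTP status code a probe would see.

```python
from dataclasses import dataclass


@dataclass
class AppState:
    started: bool = False          # startup work has finished
    dependencies_ok: bool = False  # e.g. the database connection is usable
    draining: bool = False         # shutdown in progress, stop taking traffic


def probe(path, state):
    """Return the HTTP status code for a probe request."""
    if path == "/healthz":  # liveness: the process can still answer at all
        return 200
    if path == "/readyz":   # readiness: is it safe to route traffic here?
        ready = state.started and state.dependencies_ok and not state.draining
        return 200 if ready else 503
    return 404
```

Keeping dependency checks out of the liveness path is deliberate in this sketch: a failing database should take the instance out of rotation, not trigger a restart loop.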

Bulkhead

Bulkheads contain failure propagation: one pool, queue, or dependency should not take down the whole system.
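One simple form of a bulkhead is a concurrency cap per dependency. The sketch below uses a semaphore and rejects work when all slots are taken; the class names and the fail-fast policy are assumptions, and a queue with a timeout would be an equally valid choice.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when every slot reserved for a dependency is in use."""


class Bulkhead:
    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, operation):
        # Non-blocking acquire: fail fast rather than queue behind a slow dependency.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("no free slot for this dependency")
        try:
            return operation()
        finally:
            self._slots.release()
```

Giving each downstream dependency its own Bulkhead instance means a slow dependency can exhaust only its own slots, not the whole worker pool.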

DevOps and observability

Delivery practices

GitOps

Git acts as the source of truth for infrastructure and platform changes.
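In practice an agent compares the state declared in Git with what is running and computes the changes needed to converge. The function below is a toy version of that comparison over plain dictionaries; the resource names and spec shape in the usage are hypothetical.

```python
def reconcile(desired, actual):
    """Return the actions that move `actual` toward `desired`.

    Both arguments map a resource name to its spec.
    """
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))  # drift: not declared in Git
    return actions


desired_state = {"web": {"replicas": 3}, "api": {"replicas": 2}}
running_state = {"web": {"replicas": 2}, "old-job": {"replicas": 1}}
plan = reconcile(desired_state, running_state)
```

Because the plan is derived from the repository on every pass, a manual change to the cluster shows up as drift and is reverted on the next reconciliation.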

Canary release

A canary release exposes the new version to a small slice of traffic and compares its metrics against the stable version before widening the rollout.
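The comparison step can be reduced to a small decision function. This sketch looks only at error rates, and every threshold in it (minimum sample size, allowed ratio, absolute floor) is an assumed value chosen for illustration.

```python
def canary_verdict(baseline_errors, baseline_total, canary_errors, canary_total,
                   max_ratio=1.5, min_requests=100, floor=0.001):
    """Decide whether to promote, roll back, or keep observing a canary."""
    if canary_total < min_requests:
        return "wait"  # too little traffic to judge
    baseline_rate = baseline_errors / baseline_total if baseline_total else 0.0
    canary_rate = canary_errors / canary_total
    # Tolerate up to max_ratio times the baseline, with a small absolute floor
    # so that a near-zero baseline does not make any single error fatal.
    allowed = max(baseline_rate * max_ratio, floor)
    return "promote" if canary_rate <= allowed else "rollback"
```

A real rollout controller would usually check latency and saturation as well, and repeat the check over several intervals before promoting.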

Blue-Green

Blue-green deployment keeps two environments and switches traffic between them.

Three pillars of observability

Logs

Structured logging, ELK or Loki, and correlation IDs for finding the full request chain.
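A minimal sketch of structured logging with a correlation ID, using only the Python standard library. The logger name "orders", the field names, and the ID value are assumptions; the ID is passed through the standard `extra` argument of the logging call.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            # Set via `extra=`; None when the caller did not provide one.
            "correlation_id": getattr(record, "correlation_id", None),
        })


def build_logger(stream):
    logger = logging.getLogger("orders")  # hypothetical service name
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


stream = io.StringIO()
logger = build_logger(stream)
logger.info("order accepted", extra={"correlation_id": "req-123"})
```

If every service copies the incoming correlation ID into its own log lines and outgoing requests, a single search for that ID in ELK or Loki returns the full request chain.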

Metrics

Prometheus, Grafana, and RED/USE methods for load, errors, and resource saturation.
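As a rough illustration of the RED method (rate, errors, duration), the function below summarizes a window of request records. The record shape and the simple index-based percentile are assumptions; a metrics system such as Prometheus would compute these from counters and histograms instead.

```python
def red_summary(requests, window_seconds):
    """Summarize (status_code, duration_seconds) records for one time window."""
    total = len(requests)
    errors = sum(1 for status, _ in requests if status >= 500)
    durations = sorted(duration for _, duration in requests)
    # Simple index-based approximation of the 95th percentile.
    p95 = durations[min(total - 1, int(0.95 * total))] if total else 0.0
    return {
        "rate": total / window_seconds,                  # requests per second
        "error_ratio": errors / total if total else 0.0,
        "p95_duration": p95,
    }
```

The USE method is the complementary view for resources rather than requests: utilization, saturation, and errors per CPU, disk, or connection pool.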

Traces

Distributed tracing through Jaeger, Zipkin, or OpenTelemetry shows the path of a request across services.

Using this in system design interviews

Useful concepts

  • Container orchestration with Kubernetes.
  • Serverless for event-driven processing.
  • Database per service and explicit data ownership.
  • Circuit breakers, retries, and health checks.
  • Graceful shutdown and stateless execution.
  • Observability: logs, metrics, and traces.

Where it helps

  • “How would you deploy and scale the service?”
  • “How would you survive partial failures and dependency degradation?”
  • “How would you observe a distributed system in production?”
  • “How would you choose storage and data boundaries for a microservice?”
  • “How would you implement event processing without retry chaos?”

Key takeaways

Cloud Native is not just running “in the cloud”; it is designing applications for automation and partial failure.
Containers and orchestration give teams portable packaging, controlled execution, and scaling.
Serverless is useful for event-driven tasks, but cold starts and platform dependency must be part of the design.
Resilience comes from a set of guardrails: retries, breakers, checks, and isolation.
Observability is needed from day one, otherwise a distributed system quickly turns into a black box.
GitOps and CI/CD turn infrastructure and releases into a repeatable, reviewable process.
