System Design Space

Updated: February 21, 2026 at 11:59 PM

Building Secure and Reliable Systems (short summary)

hard

Official website

Free version

The book is available for free on the Google SRE website.

Go to the website

Building Secure and Reliable Systems

Authors: Heather Adkins, Betsy Beyer, Paul Blankinship, Piotr Lewandowski, Ana Oprea, Adam Stubblefield
Publisher: O'Reilly Media, Inc.
Length: 555 pages

Google practices: Zero Trust, defense in depth, secure SDLC, incident response and security culture.


"Building Secure and Reliable Systems" is a book from a Google team that combines security and reliability practices into a single approach. The authors show that the two disciplines do not conflict: applied correctly, each strengthens the other.

Book structure

Part I

Introduction

The intersection of security and reliability, the role of culture, adversarial thinking.

Part II

Designing Systems

Design principles, least privilege, defense in depth, secure by default.

Part III

Implementing Systems

Secure code, testing, code review, dependency management.

Part IV

Maintaining Systems

Incident response, recovery, post-mortems, continuous improvement.

Part V

Organization and Culture

Team building, security culture, training and awareness.

Security + Reliability: A Common Approach

Connection

SRE Book

Basic SRE practices: SLO, error budgets, toil reduction.

Read the review

Why are Security and Reliability related?

Shared goals:

  • Protecting the system from failures (internal and external)
  • Minimizing the blast radius of incidents
  • Fast detection and response
  • Recovery from failures

Shared practices:

  • Defense in depth
  • Least privilege
  • Monitoring and alerting
  • Incident response playbooks

Key insight

"Security and reliability failures often look similar from a systems perspective: both result in unavailable or degraded systems, data loss, and loss of user trust."

Secure Design Principles

Least Privilege

Grant only the minimum permissions required to perform the task. This applies at every level: users, services, processes, and network policies.

Examples:

  • Service accounts with minimal IAM roles
  • Network policies: deny-all by default
  • Temporary credentials instead of long-lived keys
  • Just-in-time access for privileged operations
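A minimal sketch of two items from the list above: temporary credentials and minimal scopes. The names `ScopedToken`, `issueToken`, and `isAllowed` are hypothetical and not taken from the book; time is passed in as plain milliseconds so the example stays deterministic.

```typescript
// Hypothetical types for illustration; not an API from the book.
interface ScopedToken {
  subject: string;
  scopes: string[];     // only the permissions needed for the task
  expiresAtMs: number;  // short lifetime instead of a long-lived key
}

// Issue a token that carries only the requested scopes and expires after ttlMs.
function issueToken(subject: string, scopes: string[], nowMs: number, ttlMs: number): ScopedToken {
  return { subject: subject, scopes: scopes.slice(), expiresAtMs: nowMs + ttlMs };
}

// Allow an action only if the token is still valid and explicitly lists the scope.
function isAllowed(token: ScopedToken, requiredScope: string, nowMs: number): boolean {
  if (nowMs >= token.expiresAtMs) {
    return false; // expired credentials grant nothing
  }
  return token.scopes.indexOf(requiredScope) !== -1;
}

// A backup job gets read access for 15 minutes and nothing else.
const backupToken = issueToken("backup-job", ["storage:read"], 0, 15 * 60 * 1000);
console.log(isAllowed(backupToken, "storage:read", 1000));           // true
console.log(isAllowed(backupToken, "storage:write", 1000));          // false: scope never granted
console.log(isAllowed(backupToken, "storage:read", 16 * 60 * 1000)); // false: token expired
```

The check denies by default: anything not listed in the token, or requested after expiry, is refused.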

Defense in Depth

Multi-layered protection: if one layer is breached, the next one can still stop the attack.

Perimeter

WAF, DDoS protection, rate limiting

Application

Input validation, AuthN/AuthZ, encryption

Data

Encryption at rest, access logging, backups
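The layering idea can be sketched as a chain of independent checks, where a request must pass every layer. This is an illustrative sketch only: the layer names, the threshold of 100 requests per minute, and the validation pattern are assumptions, not values from the book.

```typescript
// Hypothetical request shape for illustration.
interface InboundRequest {
  sourceRequestsPerMinute: number;
  authenticated: boolean;
  payload: string;
}

interface LayerCheck {
  layer: string;
  passes: (req: InboundRequest) => boolean;
}

// Each layer is independent, so bypassing one does not disable the others.
const defenseLayers: LayerCheck[] = [
  { layer: "perimeter: rate limiting", passes: (req) => req.sourceRequestsPerMinute <= 100 },
  { layer: "application: authentication", passes: (req) => req.authenticated },
  { layer: "application: input validation", passes: (req) => /^[a-z0-9_-]{1,64}$/.test(req.payload) },
];

// Returns the name of the first layer that blocks the request, or null if all pass.
function firstBlockingLayer(req: InboundRequest): string | null {
  for (const check of defenseLayers) {
    if (!check.passes(req)) {
      return check.layer;
    }
  }
  return null;
}

// An attacker with stolen credentials passes the first two layers
// but is still stopped by input validation.
console.log(firstBlockingLayer({ sourceRequestsPerMinute: 10, authenticated: true, payload: "1; DROP TABLE users" }));
// application: input validation
console.log(firstBlockingLayer({ sourceRequestsPerMinute: 10, authenticated: true, payload: "report_2024" }));
// null
```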

Secure by Default

Secure configuration out of the box. Users must explicitly opt out of protection rather than opt in to it.

❌ Bad:

  • Public S3 buckets by default
  • Ports open to 0.0.0.0/0
  • Weak password policies

✓ Good:

  • Private buckets; public access must be granted explicitly
  • Deny-all network policies
  • MFA required, strong passwords
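One way to express this in code is a configuration factory whose defaults are the secure values, so weakening them requires an explicit override. The sketch below assumes a hypothetical `StorageBucketConfig` shape; it is not the configuration format of any real cloud provider.

```typescript
// Hypothetical configuration shape for illustration.
interface StorageBucketConfig {
  publicAccess: boolean;
  requireMfa: boolean;
  allowedIngressCidrs: string[]; // empty list means deny all
}

// Every field defaults to its secure value; callers must opt out explicitly.
function createBucketConfig(overrides: Partial<StorageBucketConfig> = {}): StorageBucketConfig {
  return {
    publicAccess: overrides.publicAccess ?? false,
    requireMfa: overrides.requireMfa ?? true,
    allowedIngressCidrs: overrides.allowedIngressCidrs ?? [],
  };
}

const defaultBucket = createBucketConfig();
const publicWebsiteBucket = createBucketConfig({ publicAccess: true });

console.log(defaultBucket.publicAccess);       // false
console.log(publicWebsiteBucket.publicAccess); // true, but only because it was requested
console.log(publicWebsiteBucket.requireMfa);   // true: untouched settings stay secure
```

The explicit override also leaves a visible trace in code review, which makes weakened settings easy to spot.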

Fail Securely

When failures occur, the system should go into a safe state, not an open state.

// ❌ Fail open (unsafe)
if (authService.isDown()) {
  return allowAccess();  // Lets everyone through when the auth service is down
}

// ✓ Fail closed (safe)
if (authService.isDown()) {
  // Alert on-call, then block requests until the auth service recovers
  return denyAccess();
}
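The pseudocode above can be turned into a small runnable sketch. It assumes the auth check signals an outage by throwing; `decideAccess`, `AuthCheck`, and the `alertOnCall` callback are hypothetical names introduced for illustration.

```typescript
type AccessDecision = "allow" | "deny";

// Hypothetical auth check: returns true or false, or throws when the auth service is unreachable.
type AuthCheck = (userId: string) => boolean;

// Any failure of the check itself results in "deny" plus an alert, never "allow".
function decideAccess(
  userId: string,
  check: AuthCheck,
  alertOnCall: (message: string) => void
): AccessDecision {
  try {
    return check(userId) ? "allow" : "deny";
  } catch (err) {
    alertOnCall("auth check failed, failing closed: " + String(err));
    return "deny";
  }
}

const sentAlerts: string[] = [];
const recordAlert = (message: string): void => { sentAlerts.push(message); };

const healthyCheck: AuthCheck = (userId) => userId === "alice";
const brokenCheck: AuthCheck = () => { throw new Error("auth service unreachable"); };

console.log(decideAccess("alice", healthyCheck, recordAlert)); // allow
console.log(decideAccess("alice", brokenCheck, recordAlert));  // deny
console.log(sentAlerts.length);                                // 1
```

Failing closed trades availability for safety, so the alert matters: on-call needs to know that legitimate users are being blocked.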

Zero Trust Architecture

Zero Trust Principles

"Never trust, always verify" - even internal traffic must be authenticated and authorized.

1. Verify explicitly

Authenticate and authorize every request using all available signals: identity, location, device, service, and data classification.

2. Use least privilege access

Just-in-time and just-enough access. Temporary credentials, risk-based adaptive policies.

3. Assume breach

Minimize the blast radius through segmentation, end-to-end encryption, and continuous monitoring.

Service-to-Service Authentication

[Diagram: Service A (client) calls Service B (resource) using mTLS + identity + policy. Identity is issued by SPIRE / CA; policy is evaluated by OPA / Cedar. If verification passes, access is granted; if the policy rejects the request, it is denied.]
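The flow of verified identity plus policy can be sketched as a default-deny decision function. This is a simplified stand-in: it does not use the SPIRE, OPA, or Cedar APIs, the SPIFFE-style ID and rule format are made up for illustration, and `callerId` is assumed to have already been extracted from a verified mTLS certificate.

```typescript
// callerId is the identity verified during the mTLS handshake, or null if none was presented.
interface ServiceCall {
  callerId: string | null;
  target: string;
  method: string;
}

// Hypothetical rule format; real deployments would use a policy engine.
interface PolicyRule {
  caller: string;
  target: string;
  methods: string[];
}

const policyRules: PolicyRule[] = [
  { caller: "spiffe://example.org/service-a", target: "service-b", methods: ["GET"] },
];

// Default deny: a call is allowed only if it has a verified identity and a rule matches.
function authorizeCall(call: ServiceCall, rules: PolicyRule[]): "allow" | "deny" {
  if (call.callerId === null) {
    return "deny"; // being on the internal network is not enough
  }
  for (const rule of rules) {
    const matches =
      rule.caller === call.callerId &&
      rule.target === call.target &&
      rule.methods.indexOf(call.method) !== -1;
    if (matches) {
      return "allow";
    }
  }
  return "deny";
}

console.log(authorizeCall({ callerId: "spiffe://example.org/service-a", target: "service-b", method: "GET" }, policyRules));
// allow
console.log(authorizeCall({ callerId: "spiffe://example.org/service-a", target: "service-b", method: "DELETE" }, policyRules));
// deny
console.log(authorizeCall({ callerId: null, target: "service-b", method: "GET" }, policyRules));
// deny
```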

Secure Development Lifecycle

Security at every stage

Stage | Security Activities | Tools
Design | Threat modeling, security review | STRIDE, attack trees
Code | Secure coding, SAST | Semgrep, CodeQL
Build | Dependency scanning, SBOM | Snyk, Dependabot
Test | DAST, fuzzing, pen testing | OWASP ZAP, Burp Suite
Deploy | Container scanning, IaC security | Trivy, Checkov
Operate | Monitoring, incident response | SIEM, SOAR

Threat Modeling

Systematic threat analysis at the design stage.

STRIDE Framework:

  • Spoofing - impersonating a user or service
  • Tampering - unauthorized modification of data
  • Repudiation - denying having performed an action
  • Information disclosure - exposing data to unauthorized parties
  • Denial of service - making the system unavailable
  • Elevation of privilege - gaining permissions beyond those granted
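A threat model built on the STRIDE categories can be kept as structured data, which makes gaps easy to find. The sketch below is one possible representation: `ThreatEntry`, the helper functions, and the login-endpoint threats are all illustrative assumptions.

```typescript
type StrideCategory =
  | "Spoofing" | "Tampering" | "Repudiation"
  | "Information disclosure" | "Denial of service" | "Elevation of privilege";

const STRIDE_CATEGORIES: StrideCategory[] = [
  "Spoofing", "Tampering", "Repudiation",
  "Information disclosure", "Denial of service", "Elevation of privilege",
];

interface ThreatEntry {
  category: StrideCategory;
  threat: string;
  mitigation: string | null; // null means no mitigation is planned yet
}

// Categories that have not been analyzed at all for this component.
function uncoveredCategories(entries: ThreatEntry[]): StrideCategory[] {
  return STRIDE_CATEGORIES.filter(
    (category) => !entries.some((entry) => entry.category === category)
  );
}

// Identified threats that still lack a mitigation.
function unmitigatedThreats(entries: ThreatEntry[]): ThreatEntry[] {
  return entries.filter((entry) => entry.mitigation === null);
}

// Made-up threat model for a login endpoint.
const loginEndpointModel: ThreatEntry[] = [
  { category: "Spoofing", threat: "Credential stuffing", mitigation: "MFA and rate limiting" },
  { category: "Tampering", threat: "Modified session cookie", mitigation: "Signed cookies" },
  { category: "Denial of service", threat: "Login flood", mitigation: null },
];

console.log(uncoveredCategories(loginEndpointModel).join(", "));
// Repudiation, Information disclosure, Elevation of privilege
console.log(unmitigatedThreats(loginEndpointModel).map((entry) => entry.threat).join(", "));
// Login flood
```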

Supply Chain Security

Protection against attacks that arrive through dependencies and the build pipeline.

Practices:

  • SBOM (Software Bill of Materials)
  • Signed artifacts and verified builds
  • Dependency pinning and lock files
  • Private artifact registries
  • SLSA framework compliance
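Dependency pinning from the list above can be illustrated with a check that compares what a build resolved against the committed lock file. This is a sketch under assumptions: `LockEntry` is a hypothetical shape rather than the format of any real package manager, and the integrity strings are placeholders, not real hashes.

```typescript
// Hypothetical lock-file entry; integrity values below are placeholders.
interface LockEntry {
  packageName: string;
  version: string;
  integrity: string;
}

function findEntry(entries: LockEntry[], packageName: string): LockEntry | null {
  for (const entry of entries) {
    if (entry.packageName === packageName) {
      return entry;
    }
  }
  return null;
}

// Returns a list of human-readable problems; an empty list means the build matches the lock file.
function verifyAgainstLock(resolved: LockEntry[], lock: LockEntry[]): string[] {
  const problems: string[] = [];
  for (const dep of resolved) {
    const pinned = findEntry(lock, dep.packageName);
    if (pinned === null) {
      problems.push(dep.packageName + ": not in lock file");
    } else if (pinned.version !== dep.version) {
      problems.push(dep.packageName + ": version " + dep.version + " differs from pinned " + pinned.version);
    } else if (pinned.integrity !== dep.integrity) {
      problems.push(dep.packageName + ": integrity hash mismatch");
    }
  }
  return problems;
}

const lockFile: LockEntry[] = [
  { packageName: "http-client", version: "2.1.0", integrity: "sha512-placeholder-a" },
  { packageName: "json-parser", version: "1.4.2", integrity: "sha512-placeholder-b" },
];

const resolvedByBuild: LockEntry[] = [
  { packageName: "http-client", version: "2.1.0", integrity: "sha512-placeholder-a" },
  { packageName: "json-parser", version: "1.4.2", integrity: "sha512-placeholder-x" },
  { packageName: "left-over", version: "0.0.1", integrity: "sha512-placeholder-c" },
];

console.log(verifyAgainstLock(resolvedByBuild, lockFile).join("\n"));
// json-parser: integrity hash mismatch
// left-over: not in lock file
```

A build pipeline would typically fail when this list is non-empty, so a tampered or unexpected dependency never reaches an artifact.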

Incident Response

Connection

Release It!

Resilience patterns: Circuit Breaker, Bulkhead, Timeouts.

Read the review

Security Incident Lifecycle

1. Detection

Monitoring, alerts, anomaly detection. Mean Time to Detect (MTTD) is a critical metric.

2. Triage

Assess severity, scope, and impact. Assemble the response team.

3. Containment

Isolate affected systems, block malicious traffic, and revoke compromised credentials.

4. Eradication

Eliminate the root cause, patch vulnerabilities, and remove malware.

5. Recovery

Restore service, verify integrity, and monitor for repeat attacks.

6. Post-Incident Review

Blameless postmortem, lessons learned, process improvement.
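The lifecycle is strictly ordered, and MTTD is an average over past incidents. Both can be sketched in a few lines; the function names and the sample timings below are assumptions made for illustration, with times given in minutes.

```typescript
const INCIDENT_PHASES: string[] = [
  "Detection", "Triage", "Containment", "Eradication", "Recovery", "Post-Incident Review",
];

// Returns the phase that follows `current`, or null after the final phase.
function nextPhase(current: string): string | null {
  const index = INCIDENT_PHASES.indexOf(current);
  if (index === -1) {
    throw new Error("unknown phase: " + current);
  }
  const next = INCIDENT_PHASES[index + 1];
  return next === undefined ? null : next;
}

interface IncidentTimes {
  startedAtMin: number;   // when the incident actually began
  detectedAtMin: number;  // when monitoring or a person noticed it
}

// MTTD: the average gap between the start of an incident and its detection.
function meanTimeToDetect(incidents: IncidentTimes[]): number {
  if (incidents.length === 0) {
    throw new Error("no incidents to average");
  }
  let total = 0;
  for (const incident of incidents) {
    total += incident.detectedAtMin - incident.startedAtMin;
  }
  return total / incidents.length;
}

console.log(nextPhase("Containment"));           // Eradication
console.log(nextPhase("Post-Incident Review"));  // null
console.log(meanTimeToDetect([
  { startedAtMin: 0, detectedAtMin: 10 },
  { startedAtMin: 100, detectedAtMin: 130 },
])); // 20
```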

Security Culture

Security Champions

Dedicated representatives on each team who promote security practices.

  • Conduct security reviews of the team's code
  • Participate in threat modeling
  • Train colleagues on best practices
  • Act as the liaison between their team and the security team

Blameless Culture

Focus on improving the system rather than punishing people for mistakes.

  • Encouraging the reporting of vulnerabilities
  • Postmortems without blame
  • Incident transparency
  • Continuous improvement mindset

Comparison with other books

Book | Focus | Connection
SRE Book | Reliability, SLO/SLI | Basic reliability practices
Release It! | Stability patterns | Patterns of resilience
DDIA | Distributed systems | Distributed systems theory
This Book | Security + Reliability | Integration of both practices

Applying This in a System Design Interview

Practice

API Gateway

Implementation of authentication and authorization at the gateway level.

Read the review

1. Authentication & Authorization

Mention Zero Trust, mTLS between services, JWT/OAuth for users, RBAC/ABAC for authorization.

2. Data Protection

Encryption at rest and in transit, key management (KMS), data classification, PII handling.

3. Blast Radius Reduction

Microservice isolation, network segmentation, failure domains, rate limiting.
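Rate limiting is a concrete way to bound blast radius: with one limiter per client, a single noisy or compromised client cannot exhaust capacity for the others. Below is a minimal token-bucket sketch; the class, the capacity of 2, and the refill rate of 1 token per second are illustrative assumptions, and time is passed in explicitly to keep the example deterministic.

```typescript
class TokenBucket {
  private capacity: number;
  private refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full
    this.lastRefillMs = nowMs;
  }

  // Refill based on elapsed time, then spend one token if available.
  tryConsume(nowMs: number): boolean {
    const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// One bucket per client isolates clients from each other.
const bucketsByClient: Record<string, TokenBucket> = {};

function allowRequest(clientId: string, nowMs: number): boolean {
  let bucket = bucketsByClient[clientId];
  if (bucket === undefined) {
    bucket = new TokenBucket(2, 1, nowMs);
    bucketsByClient[clientId] = bucket;
  }
  return bucket.tryConsume(nowMs);
}

console.log(allowRequest("client-a", 0));    // true
console.log(allowRequest("client-a", 0));    // true
console.log(allowRequest("client-a", 0));    // false: client-a used up its burst
console.log(allowRequest("client-b", 0));    // true: client-b is unaffected
console.log(allowRequest("client-a", 1000)); // true: one token refilled after a second
```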

4. Observability

Security logging, audit trails, anomaly detection, distributed tracing for forensics.

Key Findings

  • Security and Reliability reinforce each other: they share practices such as defense in depth, least privilege, and failing securely
  • Zero Trust: never trust, always verify, even for internal traffic
  • Secure by Default: secure configuration out of the box; any weakening must be explicit
  • Shift Left Security: integrate security into the early stages of development
  • Blameless Culture: focus on improving the system, not on punishment


© 2026 Alexander Polomodov