A social platform is hard to design because users see one product, while underneath it is a set of tightly connected systems: graph, feed, media, messaging, notifications, and moderation.
The case helps draw the boundary between shared platform capabilities and domain services, showing where a common infrastructure layer is justified and where separate product pipelines are healthier.
For interviews and architecture discussions, it is useful because it trains macro-level reasoning: how services evolve together, where coupling emerges, and which platform investments actually pay off.
Pipeline Thinking
Ingestion, partitioning, deduplication, and stage latency drive system behavior.
Serving Layer
Index and cache-locality decisions directly shape user-facing query latency.
Consistency Window
Explicitly define where eventual consistency is acceptable and where it is not.
Cost vs Freshness
Balance update frequency with compute/storage cost and operational complexity.
Acing SDI
Practice task from chapter 14
Infrastructure view of social media: from feature-level design to operational architecture.
Social Media Infrastructure View is not about one feature. It is about platform operability: SLOs, fault isolation, observability, and controlled rollout for a large consumer system.
Functional requirements
- Post publishing and interaction capture (like/comment/share).
- Timeline generation for different user segments.
- Moderation hooks and safe feature degradation.
- Operational tooling for incident response.
Non-functional requirements
- SLOs for key user journeys: open feed, publish, refresh.
- Horizontal scale under celebrity spikes.
- Controlled blast radius between services.
- Observability baseline: metrics, logs, traces, error budgets.
High-Level Architecture
Theory
Twitter/X
Practical feed-system case: fanout, cache topology, ranking, and scaling trade-offs.
feed platform + ranking + SLO control loop
This topology combines the publish path, the feed serving path, and an operational control loop for platform stability.
The architecture separates the publish and feed paths from a dedicated control loop for SLOs, observability, and degradation policies. This limits the blast radius during spikes and incidents.
Write/Read Paths
How publishing flows through infrastructure and how timelines are served under heavy read load.
Write path: the post request is validated, committed to durable storage, and propagated through async fanout into the timeline, notification, and moderation pipelines.
Client Post
Layer 1: create content
User publishes content from mobile/web client.
Gateway + Auth
Layer 2: validate request
Gateway checks auth/quota and routes request to post service.
Post Service
Layer 3: durable commit
Post is committed into durable storage and event is produced.
Async Fanout
Layer 4: timeline + moderation
Event fans out into timeline build, moderation, and notification pipelines.
User Signals
Layer 5: feed + notifications
Followers receive timeline updates/notifications without blocking publish ACK.
Write path checkpoints
- Durable post commit happens before downstream fanout.
- Moderation and notifications are typically asynchronous and isolated from core feed availability.
- Celebrity posts require controlled fanout to avoid consumer overload.
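The ordering in the write path (durable commit first, then an event, then asynchronous fanout) can be illustrated with a minimal in-memory sketch. Everything here is hypothetical: `PostService`, `drain_fanout`, the dict standing in for durable storage, and the deque standing in for an event queue are illustrative names, not part of any real system described in this chapter.

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import count


@dataclass
class PostService:
    """Hypothetical post service: commit durably first, fan out later."""
    store: dict = field(default_factory=dict)      # stand-in for durable storage
    events: deque = field(default_factory=deque)   # stand-in for an event queue
    _ids: count = field(default_factory=lambda: count(1))

    def publish(self, author: str, text: str) -> int:
        post_id = next(self._ids)
        # 1. Durable commit happens before anything downstream sees the post.
        self.store[post_id] = {"author": author, "text": text}
        # 2. Emit an event; timeline, moderation, and notification
        #    consumers would each read it independently.
        self.events.append({"type": "post_created", "post_id": post_id, "author": author})
        # 3. Return (ACK) without waiting for any consumer.
        return post_id


def drain_fanout(service: PostService, followers: dict, timelines: dict, batch_size: int = 2) -> int:
    """Timeline consumer: process at most batch_size events per call.

    The batch limit is a crude form of controlled fanout: a burst of
    events is worked off in bounded steps instead of all at once.
    """
    processed = 0
    while service.events and processed < batch_size:
        event = service.events.popleft()
        for follower in followers.get(event["author"], []):
            timelines.setdefault(follower, []).append(event["post_id"])
        processed += 1
    return processed
```

In this sketch `publish` returns as soon as the post and its event are recorded, so a slow or failing consumer delays timeline updates but not the publish ACK.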
Runtime strategies
- Bulkheads and circuit breakers between feed and dependent services.
- Graceful degradation with fallback ranking and feature gating.
- Canary/blue-green rollout for critical paths.
- Autoscaling by queue depth and latency saturation.
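As one possible illustration of the first two strategies, the sketch below wraps a ranking call in a simple circuit breaker and degrades to a chronological feed when the ranker fails. The class, the thresholds, and the `chronological_fallback` function are assumptions made for the example; production breakers usually add per-dependency metrics and a richer half-open state.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker; thresholds are illustrative assumptions."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def is_open(self):
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after_s:
            # Half-open: allow a trial call; one more failure reopens.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
            return False
        return True

    def call(self, primary, fallback, *args):
        if self.is_open():
            return fallback(*args)  # skip the unhealthy dependency entirely
        try:
            result = primary(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback(*args)
        self.failures = 0
        return result


def chronological_fallback(posts):
    """Fallback ranking: newest first, no personalization."""
    return sorted(posts, key=lambda p: p["ts"], reverse=True)
```

A feed handler could then call `breaker.call(personalized_rank, chronological_fallback, posts)`, where `personalized_rank` is whatever ranking client the service uses. Users still get a timeline while the ranker is down, and the ranker is not hit with more requests while it recovers.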
Observability
- SLOs for user journeys, not only internal APIs.
- Trace correlation: feed request -> graph fetch -> ranking -> fanout.
- Error budget policy for engineering prioritization.
- Runbook-driven incidents with postmortem feedback loop.
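To make the error budget item concrete, here is a small sketch of how the remaining budget for one user journey might be computed and mapped to a release policy. The journey name, the 99.9% target, and the policy thresholds are assumptions chosen for illustration, not values from the chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JourneySLO:
    """SLO for a user journey such as "open feed" (hypothetical shape)."""
    name: str
    target: float  # fraction of requests that must succeed, e.g. 0.999

    def __post_init__(self):
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")


def error_budget_remaining(slo: JourneySLO, total: int, failed: int) -> float:
    """Fraction of the window's error budget still unspent.

    1.0 means untouched, 0.0 means exhausted, negative means overspent.
    """
    if total == 0:
        return 1.0
    allowed_failures = (1.0 - slo.target) * total
    return 1.0 - failed / allowed_failures


def release_policy(remaining: float) -> str:
    """Map remaining budget to a rollout mode (thresholds are assumed)."""
    if remaining <= 0.0:
        return "freeze"        # reliability work only
    if remaining < 0.25:
        return "canary-only"   # slow, guarded rollouts
    return "normal"
```

For example, with a 99.9% target and 1,000,000 feed opens in the window, about 1,000 failures are allowed. 250 failures leave roughly 75% of the budget, while 1,500 failures overspend it and would trigger a freeze under this assumed policy.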
Critical trade-offs
- Fanout-on-write vs fanout-on-read for celebrity users.
- Latency vs ranking/personalization depth.
- Timeline consistency vs availability under partial outages.
- Release speed vs production risk.
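The first trade-off is often resolved with a hybrid: push posts into follower inboxes for ordinary accounts, and pull posts from high-follower accounts at read time. The sketch below shows that idea under stated assumptions: the `CELEBRITY_THRESHOLD` cutoff, the dict-based stores, and the use of increasing post IDs as a stand-in for recency are all hypothetical.

```python
CELEBRITY_THRESHOLD = 10_000  # assumed cutoff; real systems tune this


def on_publish(post_id, author, follower_counts, followers, inboxes, author_posts):
    """Route a new post to fanout-on-write or fanout-on-read."""
    author_posts.setdefault(author, []).append(post_id)
    if follower_counts.get(author, 0) >= CELEBRITY_THRESHOLD:
        # Too many followers to write into every inbox; readers pull instead.
        return "fanout-on-read"
    for follower in followers.get(author, []):
        inboxes.setdefault(follower, []).append(post_id)
    return "fanout-on-write"


def read_timeline(user, following, follower_counts, inboxes, author_posts, limit=50):
    """Merge the precomputed inbox with celebrity posts pulled at read time.

    Assumes post IDs increase over time, so sorting by ID descending
    approximates newest-first order.
    """
    merged = list(inboxes.get(user, []))
    for author in following.get(user, []):
        if follower_counts.get(author, 0) >= CELEBRITY_THRESHOLD:
            merged.extend(author_posts.get(author, []))
    return sorted(set(merged), reverse=True)[:limit]
```

The cost moves rather than disappears: writes for celebrity posts become cheap, while every read by one of their followers pays for an extra lookup and merge.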
Related chapters
- Twitter/X - Practical social feed case: fanout strategies, ranking depth, cache topology, and spike handling.
- Event-Driven Architecture - Asynchronous event pipelines for publish flow, timeline assembly, and decoupled scaling.
- Resilience patterns - Bulkheads, circuit breakers, and graceful degradation under partial outages.
- Observability and monitoring design - User-journey SLIs/SLOs, trace correlation, and operational decision loops.
- SRE Book - Error-budget governance, reliability-focused releases, and incident discipline.
- Notification System - Adjacent engagement channel with asynchronous delivery and controlled fallback modes.
