System Design Space

Updated: March 2, 2026 at 6:25 PM

Service Mesh Architecture

hard

Service mesh architecture: data plane/control plane, mTLS, traffic policy, observability and operational trade-offs.

Context

Inside Envoy

Service mesh grew out of the practice of running L7 proxies and centralizing service-to-service traffic.

Service mesh architecture is a way to move network and security policies into the platform layer. The main benefit: traffic control and security at scale. The main risk: growth in the operational complexity of the control plane.

Why adopt a mesh?

  • Unified mTLS and identity policies between services without duplicating security code in each service.
  • Traffic management at L7: retries, timeouts, traffic splitting, canary releases, and failover policy.
  • End-to-end telemetry (metrics/traces/access logs) with a consistent format and context.
  • Resilience patterns rolled out across the whole fleet quickly, without rewriting each service by hand.
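
To make the traffic-splitting bullet concrete, here is a minimal sketch of a weighted canary split. It is not how any particular proxy implements it: the hash-based sticky assignment, the function name `pick_backend`, and the backend names `checkout-v1` / `checkout-v2` are all assumptions made for illustration.

```python
import hashlib


def pick_backend(request_key: str, weights: dict[str, int]) -> str:
    """Deterministically map a request key to a backend in proportion to its weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Hash the key into a stable bucket in [0, total), so the same key
    # always lands on the same backend while the weights stay unchanged.
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    cumulative = 0
    for backend, weight in weights.items():
        cumulative += weight
        if bucket < cumulative:
            return backend
    raise AssertionError("unreachable: bucket is always below total")


# Hypothetical 90/10 canary split.
split = {"checkout-v1": 90, "checkout-v2": 10}
counts = {name: 0 for name in split}
for i in range(10_000):
    counts[pick_backend(f"user:{i}", split)] += 1
print(counts)  # roughly a 90/10 distribution
```

Keying the split on a user or tenant identifier keeps each caller on one version for the duration of the canary; a purely random per-request choice would give the same proportions without that stickiness.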

Architectural layers

Data plane

Intercepts east-west traffic and applies runtime policy: routing, retries, timeouts, and mTLS handshake.

Includes

Envoy/ztunnel proxy, L4/L7 filters, connection pools.

Operational risk

Incorrect timeout/retry settings quickly increase p99 latency and error rate.
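
A rough back-of-the-envelope sketch shows why this risk escalates so quickly. It assumes every hop in a call chain retries independently and ignores backoff and retry budgets; the function names and the numbers are illustrative only.

```python
def worst_case_attempts(retries_per_hop: list[int]) -> int:
    """Upper bound on calls hitting the deepest service for one user request."""
    attempts = 1
    for retries in retries_per_hop:
        # Each hop may issue the original try plus its retries,
        # and every one of those tries fans out again at the next hop.
        attempts *= 1 + retries
    return attempts


def worst_case_latency_s(per_try_timeout_s: float, retries: int) -> float:
    """Upper bound on time one hop spends on a call (backoff ignored)."""
    return per_try_timeout_s * (1 + retries)


# Three hops, each configured with 2 retries.
print(worst_case_attempts([2, 2, 2]))   # 27
# One hop with a 0.5 s per-try timeout and 2 retries.
print(worst_case_latency_s(0.5, 2))     # 1.5
```

A seemingly modest "2 retries" at every layer can multiply load on a struggling downstream service by 27 in the worst case, which is why retries are usually capped at one layer or bounded by a retry budget.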

Layer: Sidecar / node proxy runtime
Loop: intent -> distribution -> enforcement -> observation
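
The loop above can be sketched as a toy model. This is an assumption-laden illustration, not a real control plane API: the `ControlPlane` and `Proxy` classes, their methods, and the service names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Proxy:
    """Data plane element that enforces whatever policy it last received."""
    name: str
    applied_rev: int = 0
    timeout_ms: int = 1000

    def apply(self, rev: int, policy: dict) -> None:
        # Enforcement: the proxy starts using the new settings.
        self.applied_rev = rev
        self.timeout_ms = policy["timeout_ms"]


@dataclass
class ControlPlane:
    proxies: list[Proxy]
    rev: int = 0
    policy: dict = field(default_factory=lambda: {"timeout_ms": 1000})

    def submit_intent(self, change: dict) -> int:
        # Intent: an operator change produces a new policy revision.
        self.policy = {**self.policy, **change}
        self.rev += 1
        return self.rev

    def distribute(self) -> None:
        # Distribution: push the current revision to every proxy.
        for proxy in self.proxies:
            proxy.apply(self.rev, dict(self.policy))

    def observe(self) -> list[str]:
        # Observation: report proxies still running a stale revision.
        return [p.name for p in self.proxies if p.applied_rev != self.rev]


mesh = ControlPlane([Proxy("payments-svc"), Proxy("profile-svc")])
mesh.submit_intent({"timeout_ms": 300})
stale_before = mesh.observe()   # both proxies are behind the new revision
mesh.distribute()
stale_after = mesh.observe()    # nothing is stale once distribution finishes
print(stale_before, stale_after)
```

Tracking the applied revision per proxy is what lets operators see whether a policy change has actually reached the data plane before judging its effect.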
Interactive simulation: Mesh Control Loop (policy rev 42). Sample intents in the mesh queue:

  • MESH-201 checkout-api: canary 10% (tenant:acme)
  • MESH-202 mobile-gateway: retry budget tune (user:42)
  • MESH-203 orders-api: timeout tighten (order:7712)
  • MESH-204 checkout-api: mTLS strict mode (tenant:globex)

Simulated services: payments-svc (cluster-a), profile-svc (cluster-b), inventory-svc (cluster-c), each at policy rev 42 with mTLS on.

Security

Zero Trust

A mesh provides the transport foundation, but access policy and identity governance must be designed separately.

Rollout strategy

Start with a limited blast radius (1-2 namespaces) and measure explicit SLOs before and after enabling the mesh.

First implement observability and traffic policy, then mTLS everywhere and authZ policy.

Track resource costs: sidecar CPU and memory overhead and the impact on p99 latency.
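
One way to make the p99 impact measurable is a simple before/after check against an agreed latency budget. The sketch below is a minimal illustration: the nearest-rank percentile method, the function names, and the sample data are assumptions, not part of any mesh tooling.

```python
def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Ceiling division in integers avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def within_latency_budget(
    before_ms: list[float], after_ms: list[float], max_added_p99_ms: float
) -> bool:
    """True if enabling the mesh added no more than the budget to p99 latency."""
    added = percentile(after_ms, 99) - percentile(before_ms, 99)
    return added <= max_added_p99_ms


# Synthetic latencies: the mesh adds a flat 3 ms to every request.
before = [float(ms) for ms in range(1, 101)]
after = [ms + 3.0 for ms in before]
print(within_latency_budget(before, after, max_added_p99_ms=5.0))  # True
print(within_latency_budget(before, after, max_added_p99_ms=2.0))  # False
```

Running a check like this per namespace during the pilot turns "impact on p99" from an impression into a gate for the next rollout phase.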

Keep a fallback plan: the ability to quickly disable policy or bypass mesh for critical services.

Common mistakes

Mesh as a silver bullet

Mesh does not replace poor service design or fix implicit contracts between services.

Full rollout too early

Enabling the mesh fleet-wide without phased adoption usually leads to complex incidents and rollback pressure.

Ignoring operational complexity

The control plane is a critical platform component. It needs versioning, SLOs, runbooks, and an ownership model.

Insufficient security policy

mTLS without an authorization policy encrypts the channel and authenticates peers, but it does not control which actions one service may perform on another.
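
The gap can be illustrated with a small default-deny check layered on top of an already authenticated peer identity. This is a sketch under stated assumptions: the `Rule` type, the `is_allowed` function, and the SPIFFE-style identity strings are hypothetical and do not mirror any specific mesh's policy format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source: str       # peer identity already verified by mTLS
    method: str       # HTTP method the peer may use
    path_prefix: str  # path prefix the peer may call


def is_allowed(rules: list[Rule], source: str, method: str, path: str) -> bool:
    """Default deny: a request passes only if some rule matches it."""
    return any(
        rule.source == source
        and rule.method == method
        and path.startswith(rule.path_prefix)
        for rule in rules
    )


# Hypothetical identities and policy for a payments service.
CHECKOUT = "spiffe://example.org/ns/shop/sa/checkout"
REPORTING = "spiffe://example.org/ns/shop/sa/reporting"
rules = [
    Rule(CHECKOUT, "POST", "/payments"),
    Rule(REPORTING, "GET", "/payments/summary"),
]

print(is_allowed(rules, CHECKOUT, "POST", "/payments/charge"))   # True
print(is_allowed(rules, REPORTING, "POST", "/payments/charge"))  # False
```

Both callers here would complete the mTLS handshake successfully; only the authorization rule stops the reporting service from creating charges.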

© 2026 Alexander Polomodov