Context
Service mesh grew out of the practice of running L7 proxies and centralizing service-to-service traffic.
A service mesh moves network and security policy out of application code and into the platform layer. The main benefit is consistent traffic control and security at scale; the main risk is the growing operational complexity of the control plane.
Why adopt a mesh?
- Unified mTLS and identity policies between services without duplicating security code in each service.
- L7 traffic management: retries, timeouts, traffic splitting, canary releases, and failover policies.
- End-to-end telemetry (metrics, traces, access logs) with a consistent format and context.
- Fast rollout of resilience patterns across the fleet without manually rewriting each service.
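Traffic splitting for a canary can be illustrated with a short Python sketch. It assumes the split is keyed on a stable request attribute such as a user ID, hashed into 10,000 buckets so that a given user consistently lands on the same version; the function names and the `stable`/`canary` labels are hypothetical. In a real mesh the proxy applies weights from route configuration (for example, weighted clusters in Envoy), and its selection algorithm may differ from this one.

```python
import hashlib

BUCKETS = 10_000  # split resolution: one bucket = 0.01 % of traffic

def bucket(key: str) -> int:
    """Map a request key (e.g. a user ID) to a stable bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS

def pick_route(key: str, canary_percent: float) -> str:
    """Route roughly canary_percent % of keys to the canary, the rest to stable."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be in [0, 100]")
    threshold = round(canary_percent * BUCKETS / 100)
    return "canary" if bucket(key) < threshold else "stable"

# Rough check: about 5 % of 10,000 synthetic users should land on the canary.
hits = sum(pick_route(f"user-{i}", 5) == "canary" for i in range(10_000))
print(hits)
```

Raising `canary_percent` step by step (for example 1, 5, 25, 100) widens exposure while keeping the same users on the canary, because their buckets stay below the growing threshold.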
Architectural layers
Data plane
Intercepts east-west traffic and applies runtime policy: routing, retries, timeouts, and mTLS handshakes.
Includes
Envoy or ztunnel proxies, L4/L7 filters, connection pools.
Operational risk
Misconfigured timeouts and retries can quickly inflate p99 latency and error rates, for example when retries at several hops multiply the load on an already struggling service.
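The arithmetic behind this risk is simple enough to check by hand. The sketch below assumes that every attempt hits its per-try timeout, that there is a fixed backoff between attempts, and that each hop in a call chain retries independently; real proxies usually add exponential backoff with jitter and retry budgets, so treat the results as upper-bound estimates. The function names are illustrative.

```python
def worst_case_latency_ms(per_try_timeout_ms: float, retries: int, backoff_ms: float) -> float:
    """Upper bound for one call: every attempt times out, fixed backoff before each retry."""
    attempts = retries + 1
    return attempts * per_try_timeout_ms + retries * backoff_ms

def retries_fit_deadline(per_try_timeout_ms: float, retries: int,
                         backoff_ms: float, caller_deadline_ms: float) -> bool:
    """True if all attempts can finish before the caller's own timeout expires."""
    return worst_case_latency_ms(per_try_timeout_ms, retries, backoff_ms) <= caller_deadline_ms

def retry_amplification(retries_per_hop: int, hops: int) -> int:
    """Max attempts reaching the deepest service when each of `hops` call links retries."""
    return (retries_per_hop + 1) ** hops

print(worst_case_latency_ms(500, 2, 25))       # 1550
print(retries_fit_deadline(500, 2, 25, 1000))  # False
print(retry_amplification(3, 3))               # 64
```

With a 500 ms per-try timeout, 2 retries, and 25 ms backoff, one call can take up to 1550 ms, so a caller with a 1 s deadline gives up before the last retry runs. Three retries on each of three hops can turn a single user request into up to 64 attempts on the deepest service.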
Security
Zero Trust
The mesh provides the transport layer (encrypted, mutually authenticated channels), but access policy and identity governance must be designed separately.
Rollout strategy
Start with a limited blast radius (1-2 namespaces) and explicit SLOs measured before and after enabling the mesh.
Introduce observability and traffic policy first, then mTLS everywhere, then authorization (authZ) policy.
Track resource costs: sidecar CPU and memory overhead and the added p99 latency.
Keep a fallback plan: the ability to quickly disable a policy or bypass the mesh for critical services.
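The before/after SLO comparison and the resource-cost check can be combined into a simple promotion gate. This is a minimal sketch: the `Snapshot` fields, the threshold defaults, and the choice to return a list of violations are assumptions for illustration, not values any mesh prescribes. In practice the inputs would come from your metrics system for the pilot namespace.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Snapshot:
    """Hypothetical metrics for a pilot namespace over a comparable time window."""
    p99_ms: float          # p99 request latency
    error_rate: float      # failed requests / total requests (0.0 to 1.0)
    cpu_millicores: float  # workload CPU; after enabling the mesh this includes sidecars

def rollout_violations(before: Snapshot, after: Snapshot,
                       max_p99_increase_ms: float = 10.0,
                       max_error_rate: float = 0.001,
                       max_cpu_overhead: float = 0.15) -> list:
    """Return human-readable SLO/cost violations; an empty list means the gate passes."""
    violations = []
    p99_delta = after.p99_ms - before.p99_ms
    if p99_delta > max_p99_increase_ms:
        violations.append(f"p99 latency grew by {p99_delta:.1f} ms")
    if after.error_rate > max_error_rate:
        violations.append(f"error rate {after.error_rate:.2%} exceeds the SLO")
    cpu_overhead = (after.cpu_millicores - before.cpu_millicores) / before.cpu_millicores
    if cpu_overhead > max_cpu_overhead:
        violations.append(f"CPU overhead {cpu_overhead:.0%} exceeds the budget")
    return violations

baseline = Snapshot(p99_ms=120, error_rate=0.0005, cpu_millicores=2000)
print(rollout_violations(baseline, Snapshot(126, 0.0006, 2200)))  # []
print(rollout_violations(baseline, Snapshot(150, 0.002, 2600)))   # a list of three violations
```

An empty list means the pilot can expand to the next namespace; any violation is a signal to pause or to trigger the fallback plan.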
Common mistakes
Treating the mesh as a silver bullet
A mesh does not fix poor service design or implicit contracts between services.
Full rollout too early
Enabling the mesh everywhere at once, without phased adoption, usually leads to hard-to-diagnose incidents and pressure to roll back.
Ignoring operational complexity
The control plane is critical platform infrastructure. It needs versioning, SLOs, runbooks, and a clear ownership model.
Insufficient security policy
mTLS without an authorization policy provides channel encryption and peer authentication, but no control over which service may call which operation.
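The difference between authentication and authorization can be made concrete with a default-deny check over the peer identity that mTLS has already verified. The sketch assumes SPIFFE-style identities of the form `spiffe://cluster.local/ns/<namespace>/sa/<service-account>` (the format Istio uses); the `Rule` type, the service names, and the prefix-matching semantics are hypothetical and only loosely resemble real policy resources such as Istio's `AuthorizationPolicy`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """Hypothetical allow rule: who may call which methods under which path prefix."""
    principal: str      # peer identity from the verified mTLS client certificate
    methods: frozenset  # allowed HTTP methods, e.g. frozenset({"POST"})
    path_prefix: str    # simplified prefix match on the request path

def is_allowed(rules, peer_id: str, method: str, path: str) -> bool:
    """Default deny: allow only if some rule matches identity, method, and path."""
    return any(
        peer_id == rule.principal
        and method in rule.methods
        and path.startswith(rule.path_prefix)
        for rule in rules
    )

# Example policy for a hypothetical payments-svc: only the orders service
# account may create charges; every other call is denied.
ORDERS = "spiffe://cluster.local/ns/shop/sa/orders"
PROFILE = "spiffe://cluster.local/ns/shop/sa/profile"
PAYMENTS_RULES = [Rule(principal=ORDERS, methods=frozenset({"POST"}), path_prefix="/v1/charges")]

print(is_allowed(PAYMENTS_RULES, ORDERS, "POST", "/v1/charges"))      # True
print(is_allowed(PAYMENTS_RULES, PROFILE, "POST", "/v1/charges"))     # False
print(is_allowed(PAYMENTS_RULES, ORDERS, "DELETE", "/v1/charges/42")) # False
```

With an empty rule list `is_allowed` rejects every request, whereas mTLS on its own would let any authenticated peer call any operation.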
