System Design Space

Updated: March 2, 2026 at 6:10 PM

Service Discovery

mid

Service discovery patterns in microservice architecture: registry, DNS-based discovery, health checking, load balancing and failure handling.

Context

Interservice communication patterns

Communication between services becomes fragile without a sound discovery and routing layer.

Service discovery solves a basic problem of distributed environments: how services find each other when instances come and go during deployments, failover and scaling. A reliable discovery layer shortens time to recovery and reduces the number of cascading incidents.

Discovery models

Client-side discovery

The client fetches the list of instances from the registry and selects an endpoint itself using local load balancing.

Server-side discovery

The client calls a stable entry point (a load balancer or proxy), and routing to individual service instances is hidden inside the infrastructure.
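
A minimal sketch of the server-side model, assuming a hypothetical in-process `Proxy` class that stands in for a real load balancer. The service name, the addresses and the round-robin policy are illustrative assumptions; the point is only that the caller names a service and never sees the instance pool.

```python
import itertools


class Proxy:
    """Stand-in for a load balancer: callers only know the service name."""

    def __init__(self, backends):
        # service name -> endless round-robin iterator over instance addresses
        self._pools = {
            name: itertools.cycle(addresses) for name, addresses in backends.items()
        }

    def route(self, service):
        """Pick the next backend for the service; the pool stays hidden."""
        return next(self._pools[service])


# Hypothetical backend table; in practice the infrastructure maintains it.
proxy = Proxy({"billing": ["10.0.0.1:8080", "10.0.0.2:8080"]})
picks = [proxy.route("billing") for _ in range(3)]
```

Because the pool lives behind the entry point, instances can be added or removed without any change on the client side.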

DNS-based discovery

Services are published as DNS names; clients use standard DNS resolvers and TTL policies.
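
As a rough illustration, a client can resolve a service name with the standard library resolver. The helper name `resolve` and the injectable `resolver` parameter are assumptions added for this sketch (the parameter makes the function easy to exercise without a network). Note that `socket.getaddrinfo` does not return record TTLs, so TTL handling stays with the resolver layer underneath.

```python
import socket


def resolve(hostname, port, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses behind a DNS name."""
    infos = resolver(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

A call such as `resolve("billing.internal", 8080)` (a hypothetical name) would return every address currently published for that name; the client then picks one of them.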

Client-side discovery

The SDK performs a registry lookup, keeps a local pool of instances, and routes requests on the client side.

Strengths

  • Maximum control over routing and retry logic on the client side.
  • Fast reaction to local latency and error-rate metrics.
  • No dependency on a central proxy in the data path.

Limitations

  • Discovery SDK must be supported across all services and languages.
  • Harder to enforce uniform rules across the whole platform.

Best fit: high-load internal services with a unified SDK and a mature observability platform.

Pipeline: registration -> health -> lookup -> routing
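
The pipeline above can be sketched end to end with an in-memory registry. `Instance`, `Registry` and `Client` are hypothetical names for this illustration, not a real SDK, and least-loaded selection is just one possible routing policy; instance names mirror the ones used in this chapter.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    zone: str
    load: int = 0        # in-flight requests as tracked by this sketch
    healthy: bool = True


class Registry:
    def __init__(self):
        self._instances = {}  # service name -> {instance name -> Instance}

    def register(self, service, instance):
        self._instances.setdefault(service, {})[instance.name] = instance

    def set_health(self, service, instance_name, healthy):
        self._instances[service][instance_name].healthy = healthy

    def lookup(self, service):
        # Only healthy instances are handed out to clients.
        return [i for i in self._instances.get(service, {}).values() if i.healthy]


class Client:
    def __init__(self, registry):
        self._registry = registry

    def pick(self, service):
        pool = self._registry.lookup(service)
        if not pool:
            raise LookupError(f"no healthy instance for {service}")
        # Least-loaded routing; ties go to the instance registered first.
        chosen = min(pool, key=lambda instance: instance.load)
        chosen.load += 1
        return chosen


registry = Registry()
registry.register("service-a", Instance("service-a-01", "cluster-a", load=1))
registry.register("service-a", Instance("service-a-02", "cluster-b", load=2))
registry.register("service-a", Instance("service-a-03", "cluster-c", load=1))

client = Client(registry)
first = client.pick("service-a").name
registry.set_health("service-a", "service-a-01", healthy=False)
second = client.pick("service-a").name
```

After `service-a-01` is marked unhealthy, the next lookup no longer returns it, so the client routes to the least-loaded of the remaining instances.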

Interactive simulation (client-side discovery): requests from Web, Mobile and Partner clients for the billing, profile and orders services pass through the discovery plane, where the client performs a lookup and selects one of three healthy instances (service-a-01 in cluster-a, service-a-02 in cluster-b, service-a-03 in cluster-c) locally using a routing policy.

Key Components

  • Service registry: stores current endpoints and instance metadata.
  • Health checks: readiness and liveness probes signal whether traffic can be sent to an instance.
  • Heartbeat/session TTL: inactive nodes are removed from the discovery pool automatically.
  • Load balancing policy: round-robin, least-loaded, locality-aware routing.
  • Retry/timeout policy: protection against transient failures and network instability.
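
To make the heartbeat/TTL component concrete, here is a small sketch of a registry that forgets instances whose heartbeats stop. The class name `HeartbeatRegistry`, the 10-second TTL and the injectable clock are assumptions made for illustration; production registries usually implement this with sessions or leases.

```python
import time


class HeartbeatRegistry:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_seen = {}  # instance id -> time of the last heartbeat

    def heartbeat(self, instance_id):
        """Register the instance or refresh its lease."""
        self._last_seen[instance_id] = self._clock()

    def alive(self):
        """Drop instances whose lease expired and return the remaining ids."""
        now = self._clock()
        self._last_seen = {
            instance_id: seen
            for instance_id, seen in self._last_seen.items()
            if now - seen <= self._ttl
        }
        return sorted(self._last_seen)


# A fake clock keeps the example deterministic.
now = [0.0]
registry = HeartbeatRegistry(ttl_seconds=10, clock=lambda: now[0])
registry.heartbeat("service-a-01")
registry.heartbeat("service-a-02")
now[0] = 8.0
registry.heartbeat("service-a-01")  # only this instance keeps reporting
now[0] = 15.0
survivors = registry.alive()
```

At time 15 the second instance has been silent for longer than the TTL, so it is removed without any explicit deregistration call.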

Foundation

DNS

DNS is the basic building block for many service discovery implementations.

Trade-offs

Registry consistency vs availability

Overly strict consistency can make discovery itself unavailable during network partitions.
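
One way to lean toward availability is to serve the last known-good answer when the registry cannot be reached. The sketch below assumes a hypothetical `fetch` callable and a `RegistryUnavailable` exception; it returns a flag so that callers can tell fresh data from possibly stale data.

```python
class RegistryUnavailable(Exception):
    """Raised by the fetch function when the registry cannot be reached."""


class ResilientLookup:
    def __init__(self, fetch):
        self._fetch = fetch    # callable: service name -> list of endpoints
        self._last_good = {}   # service name -> last successful answer

    def lookup(self, service):
        """Return (endpoints, is_stale)."""
        try:
            endpoints = list(self._fetch(service))
        except RegistryUnavailable:
            if service in self._last_good:
                # Availability over consistency: the answer may be outdated.
                return list(self._last_good[service]), True
            raise
        self._last_good[service] = endpoints
        return list(endpoints), False


# Simulated registry that can be switched off.
state = {"up": True}


def fetch(service):
    if not state["up"]:
        raise RegistryUnavailable(service)
    return ["10.0.0.1:8080"]


lookup = ResilientLookup(fetch)
fresh = lookup.lookup("billing")
state["up"] = False
stale = lookup.lookup("billing")
```

A strictly consistent client would fail the second call instead; which behavior is right depends on how harmful a stale endpoint is for the service in question.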

TTL freshness vs DNS/query overhead

A short TTL propagates route updates faster, but increases the query load on DNS and the control plane.
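
The trade-off can be seen by counting upstream queries for two TTL values. This is a toy model under stated assumptions: one lookup per second for 60 seconds, a hypothetical `TtlCache` class, and a resolver that always returns the same address.

```python
class TtlCache:
    def __init__(self, resolve, ttl, clock):
        self._resolve = resolve
        self._ttl = ttl
        self._clock = clock
        self._entries = {}        # name -> (addresses, expires_at)
        self.upstream_queries = 0

    def get(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is None or now >= entry[1]:
            # Cache miss or expired entry: ask the upstream resolver again.
            self.upstream_queries += 1
            entry = (self._resolve(name), now + self._ttl)
            self._entries[name] = entry
        return entry[0]


def queries_over(ttl, seconds=60):
    """Count upstream queries when the name is looked up once per second."""
    now = [0]
    cache = TtlCache(lambda name: ["10.0.0.1"], ttl, lambda: now[0])
    for second in range(seconds):
        now[0] = second
        cache.get("billing.internal")
    return cache.upstream_queries


short_ttl_queries = queries_over(ttl=5)   # 12 upstream queries
long_ttl_queries = queries_over(ttl=30)   # 2 upstream queries
```

In this model the 5-second TTL costs six times more upstream queries than the 30-second TTL, but a changed address is picked up within 5 seconds instead of 30.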

Centralized control vs local autonomy

A centralized control plane is convenient, but it increases the blast radius in case of configuration errors.

Dynamic endpoints vs cache staleness

Caches speed up lookups, but can keep stale addresses during failover.
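
A common mitigation is to evict a cached address as soon as a call to it fails, rather than waiting for the cache entry to expire. The function below is a sketch of that idea; `call_with_eviction` and the `send` callable are hypothetical names, and only connection-level errors are treated as a sign of staleness.

```python
def call_with_eviction(pool, send):
    """Try cached endpoints in order and drop the ones that cannot be reached."""
    for endpoint in list(pool):  # iterate over a copy because the pool is mutated
        try:
            return send(endpoint)
        except ConnectionError:
            pool.remove(endpoint)  # stale entry: stop offering it to later requests
    raise ConnectionError("all cached endpoints failed; refresh from the registry")


cached = ["10.0.0.1:8080", "10.0.0.2:8080"]


def send(endpoint):
    # Simulate a failover: the first address no longer accepts connections.
    if endpoint == "10.0.0.1:8080":
        raise ConnectionRefusedError(endpoint)
    return f"ok from {endpoint}"


reply = call_with_eviction(cached, send)
```

After the call the dead address is gone from the local cache, so later requests skip it until the next registry refresh repopulates the pool.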

Practical checklist

  • Automatic deregistration is implemented for nodes that fail or become isolated.
  • Discovery behavior has been tested in partition scenarios and under controller failures.
  • Retries and timeouts are configured with jitter and a cap on the number of attempts.
  • Stale endpoints and lookup latency in the discovery path are monitored.
  • Service names and ownership are standardized at the platform level.
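
For the retry item in the checklist, the following sketch shows capped retries with exponential backoff and full jitter. The function name, the default limits and the choice to retry only on `ConnectionError` are assumptions for illustration; per-request timeouts would be configured separately in the transport layer.

```python
import random
import time


def call_with_retries(operation, max_attempts=3, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep, rng=random.random):
    """Run operation, retrying connection errors a limited number of times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # retry budget exhausted: surface the error
            # Full jitter: wait a random time up to the exponential ceiling.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(rng() * ceiling)


# An operation that fails twice and then succeeds.
attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays = []  # recorded instead of actually sleeping
result = call_with_retries(flaky, sleep=delays.append)
```

Jitter spreads retries from many clients over time, and the attempt cap keeps a failing dependency from being hit with unbounded retry traffic.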


© 2026 Alexander Polomodov