Inter-service communication does not fail on the happy path. It fails in timeouts, retries, and the assumptions each side forgets to make explicit.
For real design work, the chapter shows how to choose between synchronous and asynchronous patterns based on SLA, acceptable business-flow latency, and coupling, with explicit treatment of timeouts, retries, backoff, and idempotency.
In interviews and engineering discussions, it helps frame backpressure, queue build-up, and partial failure as contract-level design properties rather than accidental implementation details.
Practical value of this chapter
Design in practice
Choose sync vs async communication by SLA, coupling, and acceptable business-flow latency.
Decision quality
Encode timeout, retry, backoff, and idempotency in contracts rather than ad-hoc handlers.
Interview articulation
Tie pattern choice to latency, reliability, and developer productivity outcomes.
Failure framing
Model backpressure and queue build-up before they become production incidents.
Context
Decomposition Strategies
The method of system decomposition determines the types and number of interservice interactions.
Interservice communication patterns should be chosen not by fashion but by latency budget, criticality of the operation, and operational constraints. The main goal is predictable system behavior under load and during failures.
Synchronous patterns
HTTP/gRPC request-response
Suitable for low-latency requests when the client needs an immediate response. In production you usually need timeout budgets, a retry budget, and an explicit degradation policy.
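As a sketch of what those budgets can look like in code (the name `callWithBudget` and its parameters are illustrative, not a specific library's API), a per-attempt timeout plus a small retry budget might be wrapped like this:

```javascript
// Wrap a promise-returning call with a per-attempt timeout and a bounded retry budget.
async function callWithBudget(callFn, { timeoutMs = 300, maxAttempts = 3, backoffMs = 50 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      // Fail the attempt if it exceeds the per-hop timeout budget.
      return await Promise.race([
        callFn(),
        new Promise((_, reject) =>
          setTimeout(() => reject(new Error(`timeout after ${timeoutMs}ms`)), timeoutMs)
        ),
      ]);
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Linear backoff for brevity; production code usually adds jitter.
        await new Promise((resolve) => setTimeout(resolve, backoffMs * attempt));
      }
    }
  }
  throw lastError; // retry budget exhausted -> caller applies its degradation policy
}
```

Usage is e.g. `callWithBudget(() => fetch(url), { timeoutMs: 300, maxAttempts: 2 })`; the point is that both limits are explicit parameters rather than hidden defaults.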
Aggregator/BFF composition
A separate service collects data from several upstreams. Convenient for UI, but fan-out without caching and parallel requests quickly becomes a latency bottleneck.
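A minimal sketch of a parallel fan-out with a cache in front of it (`fetchUser`/`fetchOrders`/`fetchRecommendations` are hypothetical upstream clients; the `Map` stands in for a shared cache with a TTL):

```javascript
// Hypothetical BFF aggregator: fan out to upstreams in parallel, cache the result.
const dashboardCache = new Map(); // in production: Redis or similar, with a TTL

async function getDashboard(userId, upstreams) {
  const cached = dashboardCache.get(userId);
  if (cached) return cached; // avoid repeating the fan-out for hot users

  // Parallel fan-out: total latency is roughly the slowest upstream,
  // not the sum of three sequential calls.
  const [user, orders, recommendations] = await Promise.all([
    upstreams.fetchUser(userId),
    upstreams.fetchOrders(userId),
    upstreams.fetchRecommendations(userId),
  ]);

  const dashboard = { user, orders, recommendations };
  dashboardCache.set(userId, dashboard);
  return dashboard;
}
```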
Asynchronous patterns
Queue-based async
Producer and consumer are decoupled in time; useful for smoothing spikes and background jobs. Fits command processing where controlled retries and DLQ are needed.
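The retry-then-DLQ control flow can be sketched in a few lines (plain arrays stand in for a real broker's queue and dead-letter queue, and the attempt limit is an illustrative constant):

```javascript
// Consumer loop with a bounded retry count and a dead-letter queue.
// In production, redelivery and the DLQ are broker features (e.g. SQS, RabbitMQ).
const MAX_ATTEMPTS = 3;

async function consume(message, handler, deadLetterQueue) {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt += 1) {
    try {
      await handler(message);
      return 'acked';
    } catch (err) {
      // A broker would normally delay before redelivering the message here.
    }
  }
  // Poison message: park it for inspection instead of retrying forever.
  deadLetterQueue.push(message);
  return 'dead-lettered';
}
```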
Pub/Sub events
One emitter publishes an event, multiple subscribers react independently. Good for extensibility and reducing coupling across teams.
Event-carried state transfer
The event carries enough context to reduce synchronous callback requests between services. The cost is stricter versioning discipline and larger payloads.
gRPC vs REST vs GraphQL: configurations and mini benchmark
Single region, internal VPC, TLS enabled, payload around 1 KiB.
Service node: 4 vCPU / 8 GB RAM, 300 concurrent virtual users.
Reads from in-memory cache, no external DB and no heavy business logic.
Numbers below are a lab baseline example, not universal truth.
REST (HTTP/1.1 + JSON)
Simple integration and interoperability with external clients
# NGINX upstream + keep-alive
upstream user_api {
    server user-api:8080;
    keepalive 256;
}

server {
    listen 443 ssl http2;

    location /v1/ {
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header X-Request-Id $request_id;
        proxy_read_timeout 300ms;
        proxy_connect_timeout 80ms;
        proxy_pass http://user_api;
    }
}

gRPC (HTTP/2 + Protobuf)
Lower protocol overhead and strict IDL contract
// service.proto
syntax = "proto3";
package catalog.v1;

service CatalogService {
  rpc GetItem(GetItemRequest) returns (GetItemResponse);
}
# envoy cluster (fragment)
clusters:
  - name: catalog_grpc
    connect_timeout: 0.08s
    type: STRICT_DNS
    http2_protocol_options:
      max_concurrent_streams: 512
    load_assignment:
      cluster_name: catalog_grpc
      endpoints: ...

GraphQL (BFF/Gateway)
Client-driven contract and composition across multiple domains
const server = new ApolloServer({
  schema,
  persistedQueries: {
    cache: redisCache,
  },
  plugins: [responseCachePlugin()],
});

// resolver guardrails
const resolvers = {
  Query: {
    dashboard: async (_, args, ctx) =>
      ctx.loaders.dashboardByUser.load(args.userId),
  },
};

| Approach | p50 latency | p95 latency | Throughput | Comment |
|---|---|---|---|---|
| REST (JSON, HTTP/1.1) | 12 ms | 41 ms | ~6.1k req/s | JSON serialization overhead and larger wire payload. |
| gRPC unary (Protobuf, HTTP/2) | 7 ms | 24 ms | ~9.8k req/s | Better CPU/network efficiency with similar business logic. |
| GraphQL gateway (persisted queries + DataLoader) | 15 ms | 53 ms | ~4.3k req/s | Great for UI flexibility, but resolver overhead and fan-out risks remain. |
Where the numbers come from
Protobuf schema evolution without pain
Never reuse field numbers after deletion.
Mark removed fields as `reserved` (both by number and by name).
Add new fields only as optional/nullable with safe default behavior.
For enums, always keep `*_UNSPECIFIED = 0` and handle unknown values.
Breaking changes (type change, moving into `oneof`, removing required behavior) require a new contract version.
Before (v1)
syntax = "proto3";

message UserProfile {
  string user_id = 1;
  string email = 2;
  string phone = 3;
}

After (v2, safe evolution)
syntax = "proto3";

message UserProfile {
  string user_id = 1;
  string email = 2;
  reserved 3;
  reserved "phone";
  optional string telegram = 4;
}

| Change | Backward | Forward | Comment |
|---|---|---|---|
| Added a new field | Yes | Yes | Older consumers ignore unknown fields. |
| Removed field + reserved | Conditional | No | If old producers still send it, the new consumer loses that value. |
| Changed field type (int32 -> string) | No | No | Wire format changes and decoding becomes unsafe. |
| Added enum value | Yes | Conditional | Old code needs fallback handling for unknown enum values. |
Performance
Performance Engineering
Latency and throughput should be measured on your own workloads with realistic payloads.
Latency/throughput comparison table
| Approach | Typical latency | Typical throughput | Common fit | Key trade-off |
|---|---|---|---|---|
| REST sync | 15-60 ms (p95) | 3k-8k req/s per node | External/public APIs, simple integrations | Heavier payloads and usually higher CPU serialization cost. |
| gRPC sync | 8-30 ms (p95) | 6k-15k req/s per node | Internal low-latency RPC, streaming | Needs IDL governance/tooling and HTTP/2 readiness. |
| GraphQL (BFF/Gateway) | 25-90 ms (p95) | 1k-5k req/s on gateway | UI aggregation, product-driven contracts | Resolver fan-out, harder profiling and caching. |
| Queue-based async | 40 ms - 2 s | 10k-120k msg/s | Background commands, smoothing traffic spikes | Eventual consistency and queue operations overhead. |
| Pub/Sub events | 20-300 ms | 50k-500k msg/s (cluster) | Domain events with multiple independent subscribers | Harder ordering/duplication control and contract evolution. |
Real event contracts: CloudEvents and AsyncAPI
CloudEvents (domain event example)
{
  "specversion": "1.0",
  "type": "com.shop.order.paid.v1",
  "source": "urn:shop:payments",
  "id": "evt-01HQ7V0R4Z6A0G3T95S1ZQ6B9N",
  "time": "2026-03-03T14:23:44Z",
  "subject": "order/938475",
  "datacontenttype": "application/json",
  "dataschema": "https://events.shop.dev/schemas/order-paid-v1.json",
  "data": {
    "orderId": "938475",
    "userId": "u-1821",
    "amount": 149.90,
    "currency": "USD",
    "paymentMethod": "card"
  }
}

AsyncAPI (channel + payload contract)
asyncapi: 3.0.0
info:
  title: Order Events API
  version: 1.4.0
channels:
  order.paid.v1:
    address: order.paid.v1
    messages:
      orderPaid:
        $ref: '#/components/messages/OrderPaid'
operations:
  onOrderPaid:
    action: receive
    channel:
      $ref: '#/channels/order.paid.v1'
    messages:
      - $ref: '#/channels/order.paid.v1/messages/orderPaid'
components:
  messages:
    OrderPaid:
      payload:
        type: object
        required: [orderId, userId, amount, currency]
        properties:
          orderId: { type: string }
          userId: { type: string }
          amount: { type: number }
          currency: { type: string }

Each event should include a business key (`orderId`) and a technical id (`id`) for deduplication.
Use explicit versioning in `type`/topic (`...v1`) and keep schema in a registry.
Document delivery SLA: at-least-once/exactly-once expectations and TTL.
Assign an owning team and deprecation policy for each event version.
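With at-least-once delivery, the technical `id` is what makes a consumer idempotent. A minimal sketch (the in-memory `Set` and the function names are illustrative; production code would use a persistent dedup store such as a database table):

```javascript
// Deduplicate at-least-once delivery by the event's technical id.
const processedIds = new Set(); // in production: a persistent store

function handleOrderPaid(event, applyEffect) {
  if (processedIds.has(event.id)) {
    return 'duplicate-skipped'; // redelivery of an already-processed event
  }
  applyEffect(event.data); // business effect keyed by orderId
  processedIds.add(event.id);
  return 'processed';
}
```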
Reliability
Fault Tolerance Patterns
Communication without resilience policies in a distributed environment is usually unstable.
How to choose a pattern
A response to the user is needed within a single HTTP request -> usually the sync path.
Resistance to traffic spikes and loose coupling are needed -> async via a queue/topic.
If the operation is money- or order-critical, verify idempotency and ordering guarantees before choosing a pattern.
If there are many cross-service hops, reduce the depth of synchronous chains and add caches or materialized views.
Timeout budgets for each hop and an overall end-to-end deadline policy.
Retries with jitter and a retry budget, so degradation of a dependency does not turn into a retry storm.
Circuit breaker/bulkhead for fault isolation and concurrency control.
Idempotency keys for commands and deduplication for event-consumers.
DLQ/parking lot for invalid or problematic messages.
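A circuit breaker from the list above can be sketched in a few lines (thresholds and the `createBreaker` name are illustrative, not a specific library's API; real breakers also track half-open probing and rolling windows):

```javascript
// Minimal circuit breaker: open after N consecutive failures,
// fail fast while open, probe again after a cooldown.
function createBreaker({ failureThreshold = 5, cooldownMs = 10000 } = {}) {
  let failures = 0;
  let openedAt = 0;

  return async function call(fn) {
    const open = failures >= failureThreshold && Date.now() - openedAt < cooldownMs;
    if (open) throw new Error('circuit open: failing fast'); // protects the degraded dependency
    try {
      const result = await fn();
      failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      failures += 1;
      if (failures >= failureThreshold) openedAt = Date.now();
      throw err;
    }
  };
}
```

Failing fast while the circuit is open is what frees the caller's concurrency slots and stops retries from piling onto an already degraded dependency.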
Practical checklist
- Each integration channel has a named owner, an SLO, and an error budget.
- Contracts are versioned and verified by contract tests in CI.
- There is a degradation strategy when the downstream service is unavailable.
- The trace covers the end-to-end path through sync and async segments.
- Critical commands and events are processed idempotently.
References
Related chapters
- Event-Driven Architecture - Learn more about event models and designing asynchronous flows.
- Fault Tolerance Patterns - Timeout/retry/circuit breaker as a mandatory layer of inter-service communication.
- Consistency and idempotency - How to ensure data correctness during retries and re-delivery of events.
- Service Discovery - Communication between services relies on correct discovery of endpoints.
- Decomposition Strategies - Service boundaries directly affect the intensity and complexity of communications.
