Inter-service communication rarely fails on the happy path. It fails in timeouts, retries, and assumptions each side forgot to make explicit.
In real design work, the chapter shows how to choose between synchronous and asynchronous patterns based on SLA, acceptable business latency, and coupling, with explicit rules for timeouts, retries, backoff, and idempotency.
In interviews and engineering discussions, it helps frame backpressure, queue build-up, and partial failure as contract-level design properties rather than accidental implementation details.
Practical value of this chapter
Design in practice
Choose the interaction style by SLA, service coupling, and acceptable business-flow latency.
Decision quality
Encode timeout, retry, backoff, and idempotency in contracts rather than improvised handlers.
Interview articulation
Tie pattern choice to latency, reliability, and delivery-speed outcomes.
Failure framing
Model backpressure and queue growth before they become incidents in production.
Context
Decomposition Strategies
Service boundaries shape both the number and type of interactions between services.
Inter-service communication patterns should be chosen by latency budget, operation criticality, and operational constraints, not by fashion. The goal is predictable system behavior under load and during failures.
Synchronous interaction patterns
HTTP/gRPC request-response
Works well when the caller needs an immediate answer and the latency budget is tight. In production it needs clear timeouts, a retry budget, and an explicit degradation policy.
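Here is a minimal JavaScript sketch of those three policies. It assumes a caller-supplied async `callOnce(signal)`, which is hypothetical and stands for any HTTP or gRPC client call that honors an `AbortSignal`. The sketch:

- enforces one overall deadline;
- caps attempts per call;
- sleeps with exponential backoff and full jitter between attempts;
- returns a configured fallback instead of failing hard.

The default numbers are illustrative, not recommendations.

```javascript
// Exponential backoff with "full jitter": a random delay in [0, min(cap, base * 2^attempt)).
function backoffMs(attempt, baseMs = 20, capMs = 200, random = Math.random) {
  return Math.floor(random() * Math.min(capMs, baseMs * 2 ** attempt));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// callOnce(signal) is a hypothetical caller-supplied call; it must reject when `signal` aborts.
async function callWithPolicy(callOnce, { deadlineMs = 300, maxAttempts = 3, fallback } = {}) {
  const deadline = Date.now() + deadlineMs;
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const remaining = deadline - Date.now();
    if (remaining <= 0) break; // end-to-end deadline exhausted: stop retrying
    const controller = new AbortController();
    // Each attempt gets only the time left in the overall budget.
    const timer = setTimeout(() => controller.abort(), remaining);
    try {
      return await callOnce(controller.signal);
    } catch (err) {
      lastError = err;
    } finally {
      clearTimeout(timer);
    }
    if (attempt < maxAttempts - 1) {
      // Back off, but never sleep past the deadline.
      await sleep(Math.min(backoffMs(attempt), Math.max(0, deadline - Date.now())));
    }
  }
  // Degradation policy: serve a fallback (e.g. cached or default data) if one is configured.
  if (fallback !== undefined) return fallback;
  throw lastError ?? new Error("deadline exceeded");
}
```

A per-call attempt cap is only part of a retry budget. In production you would also cap retries as a fraction of total traffic, and retry only errors known to be transient.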
Aggregator/BFF composition
A dedicated service composes data from several upstream services. It is convenient for UI, but fan-out without caching and parallelism quickly becomes a latency bottleneck.
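Below is a minimal sketch of parallel fan-out with a per-upstream timeout and partial results. It assumes each upstream is a hypothetical zero-argument async fetcher. Total latency tracks the slowest upstream, bounded by the timeout, instead of the sum of all calls. Caching is left out.

```javascript
// Rejects if `promise` does not settle within `ms` (the underlying call is not cancelled).
function withTimeout(promise, ms, name) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${name} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// upstreams: { name: () => Promise<data> }, hypothetical fetchers for each domain service.
async function composeDashboard(upstreams, timeoutMs = 150) {
  const names = Object.keys(upstreams);
  // Parallel fan-out: all upstream calls start at once.
  const results = await Promise.allSettled(
    names.map((name) => withTimeout(Promise.resolve().then(() => upstreams[name]()), timeoutMs, name)),
  );
  const data = {};
  const errors = [];
  results.forEach((result, i) => {
    if (result.status === "fulfilled") {
      data[names[i]] = result.value;
    } else {
      data[names[i]] = null; // degrade: return a partial response instead of failing the page
      errors.push(names[i]);
    }
  });
  return { data, errors };
}
```

`withTimeout` stops waiting but does not cancel the slow request. If the client supports it, pass an `AbortSignal` through as well.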
Asynchronous interaction patterns
Queue-based async
Producer and consumer are decoupled in time, which helps absorb traffic spikes and run background work. It fits command processing where retries and a dead-letter queue must be controlled explicitly.
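The following in-memory model shows the queue semantics described here; it is not a broker client. It covers three behaviors:

- a bounded buffer that pushes back on producers when full;
- redelivery when the handler fails;
- a dead-letter list after `maxAttempts`.

`WorkQueue` and its parameters are hypothetical. Brokers such as RabbitMQ or Amazon SQS provide durable storage, redelivery, and dead-lettering.

```javascript
// In-memory model of a work queue with backpressure, retries, and a dead-letter list.
class WorkQueue {
  constructor({ capacity = 1000, maxAttempts = 3 } = {}) {
    this.capacity = capacity;
    this.maxAttempts = maxAttempts;
    this.ready = [];
    this.deadLetters = [];
  }

  // Backpressure: refuse new work when full instead of letting the queue grow without bound.
  enqueue(payload) {
    if (this.ready.length >= this.capacity) return false;
    this.ready.push({ payload, attempts: 0 });
    return true;
  }

  // Processes one message; the handler throws to signal failure. Returns false when empty.
  processNext(handler) {
    const msg = this.ready.shift();
    if (!msg) return false;
    try {
      handler(msg.payload);
    } catch (err) {
      msg.attempts += 1;
      msg.lastError = String(err);
      if (msg.attempts >= this.maxAttempts) {
        this.deadLetters.push(msg); // park poison messages for inspection
      } else {
        this.ready.push(msg); // redeliver later (a real broker would add a backoff delay)
      }
    }
    return true;
  }
}
```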
Pub/Sub events
One service publishes an event and multiple subscribers react independently. It improves extensibility and reduces coupling across teams.
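Here is an in-process sketch of the pub/sub contract. The publisher knows only the topic, and each subscriber is isolated, so one failing consumer does not block the others. `Topic` is a hypothetical class. A real broker (Kafka, NATS, SNS, and similar) would also persist the event and retry each subscription independently.

```javascript
// In-process model of a topic: subscribers are registered by name and react independently.
class Topic {
  constructor() {
    this.subscribers = new Map(); // subscriber name -> handler
  }

  subscribe(name, handler) {
    this.subscribers.set(name, handler);
  }

  // Delivers the event to every subscriber; failures are collected, not propagated.
  publish(event) {
    const failures = [];
    for (const [name, handler] of this.subscribers) {
      try {
        handler(event);
      } catch (err) {
        failures.push({ name, error: String(err) }); // a broker would retry only this subscriber
      }
    }
    return failures;
  }
}
```

Adding a new subscriber needs no change on the producer side. That is where the extensibility and reduced cross-team coupling come from.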
Event-carried state transfer
The event carries enough context to reduce synchronous callback requests between services. The cost is stricter versioning discipline and careful payload sizing.
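This sketch shows the consumer side. It assumes events carry the full entity state plus a per-entity `version` that increases monotonically; the field names are hypothetical. The consumer keeps a local replica and answers reads from it. It drops stale or duplicate events by comparing versions.

```javascript
// Local read model maintained from events that carry full entity state.
class UserReplica {
  constructor() {
    this.byId = new Map(); // userId -> { version, state }
  }

  // event: { userId, version, state }; assumes version grows monotonically per user.
  apply(event) {
    const current = this.byId.get(event.userId);
    if (current && current.version >= event.version) return false; // stale or duplicate: ignore
    this.byId.set(event.userId, { version: event.version, state: event.state });
    return true;
  }

  // Reads are served locally, with no synchronous callback to the owning service.
  get(userId) {
    return this.byId.get(userId)?.state;
  }
}
```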
gRPC, REST, and GraphQL: configuration examples and mini benchmark
Benchmark setup:
- Single region, internal VPC, TLS enabled, payload around 1 KiB.
- Service node: 4 vCPU / 8 GB RAM, 300 concurrent virtual users.
- Reads from an in-memory cache; no external DB and no heavy business logic.

The numbers below are a lab comparison point, not a universal law.
REST (HTTP/1.1 + JSON)
Simple integration and interoperability with external clients
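```nginx
# NGINX upstream + keep-alive
upstream user_api {
    server user-api:8080;
    keepalive 256;            # idle upstream connections cached per worker process
}

server {
    # On nginx >= 1.25.1 the preferred form is `listen 443 ssl;` plus `http2 on;`.
    listen 443 ssl http2;
    # ssl_certificate / ssl_certificate_key omitted for brevity

    location /v1/ {
        # Upstream keep-alive requires HTTP/1.1 and an empty Connection header.
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header X-Request-Id $request_id;   # correlation id for tracing
        proxy_read_timeout 300ms;
        proxy_connect_timeout 80ms;
        proxy_pass http://user_api;
    }
}
```

gRPC (HTTP/2 + Protobuf)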
Lower protocol overhead and a strict IDL contract
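```protobuf
// service.proto
syntax = "proto3";

package catalog.v1;

service CatalogService {
  rpc GetItem(GetItemRequest) returns (GetItemResponse);
}
// GetItemRequest / GetItemResponse message definitions omitted.
```

```yaml
# envoy cluster (fragment)
clusters:
- name: catalog_grpc
  connect_timeout: 0.08s
  type: STRICT_DNS
  # Newer Envoy releases deprecate this field in favor of
  # typed_extension_protocol_options (HttpProtocolOptions).
  http2_protocol_options:
    max_concurrent_streams: 512
  load_assignment:
    cluster_name: catalog_grpc
    endpoints: ...
```

GraphQL (BFF/Gateway)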
Client-driven contract and composition across multiple domains
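```javascript
import { ApolloServer } from "@apollo/server";
import responseCachePlugin from "@apollo/server-plugin-response-cache";

// `schema` and `redisCache` (a KeyValueCache backed by Redis) are assumed to be defined elsewhere.
const server = new ApolloServer({
  schema,
  // Automatic persisted queries: clients send a query hash instead of the full query text.
  persistedQueries: {
    cache: redisCache,
  },
  // Whole-response caching, driven by cache-control hints in the schema.
  plugins: [responseCachePlugin()],
});

// Resolver guardrails: a per-request DataLoader (assumed to be created in the context) batches lookups.
const resolvers = {
  Query: {
    dashboard: async (_, args, ctx) =>
      ctx.loaders.dashboardByUser.load(args.userId),
  },
};
```

| Approach | p50 | p95 | Throughput | Comment |
|---|---|---|---|---|
| REST (JSON, HTTP/1.1) | 12 ms | 41 ms | ~6.1k req/s | JSON serialization overhead and more bytes on the wire. |
| gRPC unary (Protobuf, HTTP/2) | 7 ms | 24 ms | ~9.8k req/s | Better CPU and network efficiency with similar business logic. |
| GraphQL gateway (persisted queries + DataLoader) | 15 ms | 53 ms | ~4.3k req/s | Great for UI flexibility, but resolver overhead and fan-out risks remain. |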
Protobuf schema evolution without surprises
- Never reuse field numbers after deletion.
- Mark removed fields as `reserved` (both by number and by name).
- Add new fields only as optional (nullable), with default behavior that stays safe when the field is absent.
- For enums, always keep `*_UNSPECIFIED = 0` and handle unknown values.
- Breaking changes (changing a field's type, moving existing fields into a `oneof`, dropping behavior consumers rely on) require a new contract version.
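This sketch covers the enum rule on the consumer side. It uses a hypothetical `PaymentStatus` enum written as a plain JavaScript map, not generated code. Values a newer producer may add fall back to `*_UNSPECIFIED`, and business logic decides explicitly what that means.

```javascript
// Hypothetical enum mirroring a proto3 definition:
// enum PaymentStatus { PAYMENT_STATUS_UNSPECIFIED = 0; PAYMENT_STATUS_PAID = 1; PAYMENT_STATUS_REFUNDED = 2; }
const PaymentStatus = Object.freeze({
  0: "PAYMENT_STATUS_UNSPECIFIED",
  1: "PAYMENT_STATUS_PAID",
  2: "PAYMENT_STATUS_REFUNDED",
});

// Maps a wire value to a known name; values this build does not know fall back to UNSPECIFIED.
function decodePaymentStatus(value) {
  return PaymentStatus[value] ?? "PAYMENT_STATUS_UNSPECIFIED";
}

// Business rule for unknown values is explicit: only a known PAID status allows shipping.
function canShip(value) {
  return decodePaymentStatus(value) === "PAYMENT_STATUS_PAID";
}
```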
Before (v1)
syntax = "proto3";
message UserProfile {
string user_id = 1;
string email = 2;
string phone = 3;
}After (v2, safe evolution)
syntax = "proto3";
message UserProfile {
string user_id = 1;
string email = 2;
reserved 3;
reserved "phone";
optional string telegram = 4;
}| Change | Backward | Forward | Comment |
|---|---|---|---|
| Added a new field | Yes | Yes | Older consumers ignore unknown fields. |
| Removed field + reserved | Conditional | No | If old producers still send it, the new consumer loses that value. |
| Changed field type (int32 -> string) | No | No | Wire format changes and decoding becomes unsafe. |
| Added enum value | Yes | Conditional | Old code needs fallback handling for unknown enum values. |
Performance
Performance Engineering
Latency and throughput should be measured on your own workloads with realistic payloads.
Latency and throughput comparison
| Approach | Typical latency | Typical throughput | Common fit | Key trade-off |
|---|---|---|---|---|
| REST sync | 15-60 ms (p95) | 3k-8k req/s per node | External/public APIs, simple integrations | Heavier payloads and usually higher CPU serialization cost. |
| gRPC sync | 8-30 ms (p95) | 6k-15k req/s per node | Internal low-latency RPC, streaming | Needs IDL governance/tooling and HTTP/2 readiness. |
| GraphQL (BFF/Gateway) | 25-90 ms (p95) | 1k-5k req/s on the gateway | UI aggregation, product-driven contracts | Resolver fan-out, harder profiling and caching. |
| Queue-based async | 40 ms - 2 s | 10k-120k msg/s | Background commands, smoothing traffic spikes | Eventual consistency and a separate operational loop for queues. |
| Pub/Sub events | 20-300 ms | 50k-500k msg/s (cluster) | Domain events with multiple independent subscribers | Harder ordering/duplication control and contract evolution. |
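The tables report p50/p95 rather than averages. To compute the same figures from your own raw latency samples, here is a small sketch using the nearest-rank method. That is one common definition; load-testing tools may interpolate differently. `percentile` and `summarize` are hypothetical helpers.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samplesMs, p) {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(0, rank - 1)];
}

// Summarizes one load-test run: latency percentiles plus average throughput.
function summarize(samplesMs, durationSec) {
  return {
    p50: percentile(samplesMs, 50),
    p95: percentile(samplesMs, 95),
    throughputRps: samplesMs.length / durationSec,
  };
}
```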
Real event contracts: CloudEvents and AsyncAPI
CloudEvents (domain event example)
```json
{
  "specversion": "1.0",
  "type": "com.shop.order.paid.v1",
  "source": "urn:shop:payments",
  "id": "evt-01HQ7V0R4Z6A0G3T95S1ZQ6B9N",
  "time": "2026-03-03T14:23:44Z",
  "subject": "order/938475",
  "datacontenttype": "application/json",
  "dataschema": "https://events.shop.dev/schemas/order-paid-v1.json",
  "data": {
    "orderId": "938475",
    "userId": "u-1821",
    "amount": 149.90,
    "currency": "USD",
    "paymentMethod": "card"
  }
}
```

AsyncAPI (channel + payload contract)

```yaml
asyncapi: 3.0.0
info:
  title: Order Events API
  version: 1.4.0
channels:
  order.paid.v1:
    address: order.paid.v1
    messages:
      orderPaid:
        $ref: '#/components/messages/OrderPaid'
operations:
  onOrderPaid:
    action: receive
    channel:
      $ref: '#/channels/order.paid.v1'
    messages:
      - $ref: '#/channels/order.paid.v1/messages/orderPaid'
components:
  messages:
    OrderPaid:
      payload:
        type: object
        required: [orderId, userId, amount, currency]
        properties:
          orderId: { type: string }
          userId: { type: string }
          amount: { type: number }
          currency: { type: string }
```

- Each event should include a business key (`orderId`) and a technical id (`id`) for deduplication.
- Use explicit versioning in the `type`/topic name (`...v1`) and keep the schema in a registry.
- Document the delivery SLA, at-least-once/exactly-once expectations, and TTL.
- Assign an owning team and a deprecation policy to each event version.
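Here is a sketch of consumer-side deduplication keyed on the CloudEvents `id`, assuming at-least-once delivery. The in-memory map with a TTL is for illustration only. A shared store with expiring keys (for example Redis `SET` with `NX` and `PX`) would be the usual production choice. `DedupingConsumer` is a hypothetical class.

```javascript
// Idempotent consumer: remembers processed event ids for a bounded time window.
class DedupingConsumer {
  constructor(handler, { ttlMs = 24 * 60 * 60 * 1000, now = Date.now } = {}) {
    this.handler = handler;
    this.ttlMs = ttlMs;
    this.now = now;
    this.seen = new Map(); // event id -> time it was processed
  }

  handle(event) {
    const t = this.now();
    // Forget ids older than the TTL so memory stays bounded.
    for (const [id, at] of this.seen) {
      if (t - at > this.ttlMs) this.seen.delete(id);
    }
    if (this.seen.has(event.id)) return false; // duplicate redelivery: skip side effects
    this.handler(event);
    this.seen.set(event.id, t); // recorded only after the handler succeeds
    return true;
  }
}
```

The id is recorded only after the handler succeeds. A crash between the two steps therefore leads to reprocessing, so the handler's side effects should themselves be idempotent.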
Reliability
Fault Tolerance Patterns
Distributed communication is fragile unless resilience policies are part of the contract.
How to choose an interaction pattern
- Need a response to the user within one HTTP request -> prefer a synchronous path.
- Need resilience to traffic spikes and loose coupling -> use async communication through a queue or topic.
- If the operation is money- or order-critical, check idempotency and ordering requirements before choosing a pattern.
- If a request crosses many services, reduce the depth of synchronous call chains and serve reads from caches or materialized views.
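For money- or order-critical commands, a common approach is an idempotency key supplied by the client. The sketch below uses a hypothetical `PaymentService` with an in-memory map. A retry with the same key returns the stored result, and the same key with a different payload is rejected.

```javascript
// Hypothetical payment command handler keyed by a client-supplied idempotency key.
class PaymentService {
  constructor() {
    this.results = new Map(); // idempotency key -> { fingerprint, response }
    this.charges = 0;
  }

  charge(idempotencyKey, { orderId, amount }) {
    const fingerprint = `${orderId}:${amount}`;
    const previous = this.results.get(idempotencyKey);
    if (previous) {
      // Same key with a different payload is a client bug, not a retry.
      if (previous.fingerprint !== fingerprint) {
        throw new Error("idempotency key reused with a different request");
      }
      return previous.response; // retry: return the stored result, do not charge again
    }
    this.charges += 1; // the side effect happens once per key
    const response = { orderId, amount, chargeId: `ch-${this.charges}` };
    this.results.set(idempotencyKey, { fingerprint, response });
    return response;
  }
}
```

In a real service, the key must be stored atomically with the side effect, for example through a unique constraint in the same transaction. Otherwise two concurrent retries could both charge.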
Resilience policies that belong in every integration contract:
- A timeout budget for each service hop and one end-to-end deadline policy.
- Retries with jitter and a retry budget, so that dependency degradation does not turn into a retry storm.
- Circuit breakers and bulkheads for fault isolation and concurrency control.
- Idempotency keys for commands and deduplication for event consumers.
- Dead-letter and parking-lot queues for invalid or problematic messages.
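Here is a minimal circuit breaker sketch: a hypothetical class with synchronous calls and an injectable clock for testing. After `failureThreshold` consecutive failures it fails fast for `cooldownMs`. It then lets one trial call through and closes again if that call succeeds. Bulkheads (separate concurrency limits per dependency) are not shown.

```javascript
// CLOSED -> OPEN after N consecutive failures; OPEN -> HALF_OPEN after a cooldown;
// HALF_OPEN -> CLOSED on a successful trial call, or back to OPEN on failure.
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 10_000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
    this.state = "CLOSED";
    this.failures = 0;
    this.openedAt = 0;
  }

  call(fn) {
    if (this.state === "OPEN") {
      if (this.now() - this.openedAt < this.cooldownMs) {
        throw new Error("circuit open: fail fast"); // protect the struggling dependency
      }
      this.state = "HALF_OPEN"; // cooldown elapsed: allow a trial request
    }
    try {
      const result = fn();
      this.state = "CLOSED";
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
        this.state = "OPEN";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```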
Practical checklist
- For each integration channel, an owner, SLO and error budget are specified.
- Contracts are versioned and verified by contract tests in CI.
- There is a degradation strategy when the downstream service is unavailable.
- Tracing covers the end-to-end path through synchronous and asynchronous segments.
- Critical commands and events are processed idempotently.
References
Related chapters
- Event-Driven Architecture - Learn more about event models and designing asynchronous flows.
- Fault Tolerance Patterns - Timeouts, retries, and circuit breakers as a mandatory layer of inter-service communication.
- Consistency and idempotency - How to ensure data correctness during retries and re-delivery of events.
- Service Discovery - Communication between services relies on correct discovery of endpoints.
- Decomposition Strategies - Service boundaries directly affect the intensity and complexity of communications.
