Interservice communication patterns

Inter-service communication rarely fails on the happy path. It fails in timeouts, retries, and assumptions each side forgot to make explicit.

In real design work, the chapter shows how to choose synchronous and asynchronous patterns by SLA, acceptable business latency, coupling, and explicit rules for timeout, retry, backoff, and idempotency.

In interviews and engineering discussions, it helps frame backpressure, queue build-up, and partial failure as contract-level design properties rather than accidental implementation details.

Practical value of this chapter

Design in practice

Choose the interaction style by SLA, service coupling, and acceptable business-flow latency.

Decision quality

Encode timeout, retry, backoff, and idempotency in contracts rather than improvised handlers.

Interview articulation

Tie pattern choice to latency, reliability, and delivery-speed outcomes.

Failure framing

Model backpressure and queue growth before they become incidents in production.

Context

Decomposition Strategies

Service boundaries shape both the number and type of interactions between services.

Interservice communication patterns are chosen not by fashion but by three things: latency budget, operation criticality, and operational constraints. That choice decides whether system behavior stays predictable under load and during failures. This chapter spans the whole spectrum, from synchronous requests to asynchronous queues and events; a focused comparison of the synchronous API styles (REST, gRPC, GraphQL) lives in "Remote API Calls".

Synchronous interaction patterns

HTTP/gRPC request-response

Works well when the caller needs an immediate answer and the latency budget is tight. In production it needs clear timeouts, a retry budget, and an explicit degradation policy.
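A minimal sketch of the timeout-plus-degradation idea, assuming an asyncio-based caller. The names `call_with_deadline`, `fetch_profile_slow`, and `fetch_profile_fast` are hypothetical and only illustrate the shape of the policy: wait for a bounded time, then return a fallback instead of failing the whole request.

```python
import asyncio


async def call_with_deadline(make_call, timeout_s, fallback):
    """Await make_call() for at most timeout_s seconds, else return fallback."""
    try:
        # wait_for cancels the pending call when the timeout expires.
        return await asyncio.wait_for(make_call(), timeout=timeout_s)
    except (asyncio.TimeoutError, ConnectionError):
        # Degradation policy: serve a stale or reduced answer instead of failing.
        return fallback


async def fetch_profile_slow():
    await asyncio.sleep(0.5)  # simulates a dependency that is too slow
    return {"name": "Ada", "source": "live"}


async def fetch_profile_fast():
    return {"name": "Ada", "source": "live"}
```

In a real service the fallback would typically come from a cache or a reduced response, and the timeout would be derived from the remaining end-to-end deadline rather than hard-coded.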

Aggregator/BFF composition

A dedicated service composes data from several upstream services. It is convenient for UI, but fan-out without caching and parallelism quickly becomes a latency bottleneck.
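To illustrate why parallelism matters for fan-out, here is a small sketch in which the upstream calls are simulated with `asyncio.sleep`. The service names and delays are assumptions. When the calls run concurrently, the composed latency is close to the slowest call rather than the sum of all calls.

```python
import asyncio
import time


async def fetch(name, delay_s):
    await asyncio.sleep(delay_s)  # stands in for a network call to an upstream
    return name, {"ok": True}


async def compose_dashboard():
    # All three upstream calls are started together and awaited as a group.
    calls = [
        fetch("profile", 0.10),
        fetch("orders", 0.20),
        fetch("recommendations", 0.15),
    ]
    results = await asyncio.gather(*calls)
    return dict(results)


def timed_compose():
    start = time.perf_counter()
    page = asyncio.run(compose_dashboard())
    return page, time.perf_counter() - start
```

Run sequentially, the same three calls would take roughly 0.45 s; with `gather` the total stays near the 0.20 s of the slowest upstream.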

Asynchronous interaction patterns

Queue-based async

Producer and consumer are decoupled in time, which helps smooth traffic spikes and background work. It fits command processing when retries and a dead-letter queue must be controlled.
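The retry and dead-letter control mentioned above can be sketched with an in-memory queue. This is an illustration under stated assumptions, not a broker client: `MAX_ATTEMPTS`, `drain`, and `handle` are hypothetical names, and a real broker would track the attempt count in message headers or delivery metadata.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed limit; tune per command type


def drain(queue, handler, dead_letters):
    """Process messages; requeue failures until MAX_ATTEMPTS, then dead-letter."""
    processed = []
    while queue:
        message = queue.popleft()
        try:
            handler(message["body"])
            processed.append(message["body"])
        except Exception as error:
            message["attempts"] += 1
            if message["attempts"] >= MAX_ATTEMPTS:
                # Give up: park the message with the failure reason for inspection.
                dead_letters.append({**message, "error": str(error)})
            else:
                queue.append(message)  # retry later, behind the other messages
    return processed


def handle(body):
    if body == "bad":
        raise ValueError("cannot parse command")
```

A poison message is retried a bounded number of times and then leaves the main queue, so it cannot block healthy messages indefinitely.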

Pub/Sub events

One service publishes an event and multiple subscribers react independently. The publisher knows nothing about its consumers - you can add a new one without touching it, and that reduces coupling across teams.
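The decoupling can be shown with a tiny in-process bus. `EventBus` is a hypothetical stand-in for a real broker; the point is only that `publish` never names its consumers, so adding a subscriber does not change publisher code.

```python
from collections import defaultdict


class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        """Deliver the event to every subscriber; return how many succeeded."""
        delivered = 0
        for callback in self._subscribers[topic]:
            try:
                callback(event)
                delivered += 1
            except Exception:
                # One failing subscriber must not affect the others.
                pass
        return delivered
```

A real broker adds durability, per-subscriber offsets, and retries; the independence between subscribers is the property this sketch keeps.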

Event-carried state transfer

The event carries enough context to reduce synchronous callback requests between services. The cost is stricter versioning discipline and careful payload sizing.
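As an illustration, a consumer can keep a local read model that is fed only by events, so it does not need to call the owning service back. The event fields (`customerId`, `version`, `address`) and the class name are assumptions made for this sketch; the version check is one simple way to ignore stale or duplicate deliveries.

```python
class ShippingReadModel:
    """Local copy of the customer data that shipping needs, fed by events."""

    def __init__(self):
        self.addresses = {}
        self.versions = {}

    def apply(self, event):
        customer_id = event["customerId"]
        version = event["version"]
        if version <= self.versions.get(customer_id, 0):
            return False  # stale or duplicate event: keep the newer state
        self.addresses[customer_id] = event["address"]
        self.versions[customer_id] = version
        return True
```

The trade-off named above shows up directly: every field this model reads becomes part of the event contract and has to be versioned with it.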

gRPC, REST, and GraphQL: configuration examples and overhead comparison

Absolute p50/p95 and req/s figures depend on payload size and shape, the network, TLS, the language runtime, and connection pool settings - someone else's synthetic benchmark does not transfer to your system.

It is more honest to compare the approaches by their overhead structure: serialization cost, connection count, and extra request-processing phases.

If you need quantitative anchors, measure on your own workload profile: real messages, a real network, and your target percentiles.

Public benchmarks (for example, TechEmpower rounds) compare specific frameworks and configurations, not 'REST vs gRPC' in the abstract: carry over the conclusions, not the numbers.
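To make "overhead structure" concrete, the sketch below compares the encoded size of one small record as compact JSON and as a fixed binary layout. Note the assumption: Python's `struct` is used as a stand-in for a binary encoding and is not Protobuf (Protobuf uses field tags and varints, so its sizes differ). The record and its field names are hypothetical.

```python
import json
import struct

item = {"item_id": 938475, "price_cents": 14990, "in_stock": True}

# Text encoding: field names travel with every message.
json_bytes = json.dumps(item, separators=(",", ":")).encode("utf-8")

# Binary stand-in: little-endian unsigned 64-bit id, unsigned 32-bit price,
# 1-byte bool. The schema lives outside the message.
binary_bytes = struct.pack(
    "<QI?", item["item_id"], item["price_cents"], item["in_stock"]
)


def sizes():
    """Return (json_size, binary_size) in bytes for the sample record."""
    return len(json_bytes), len(binary_bytes)


def decode_binary(payload):
    item_id, price_cents, in_stock = struct.unpack("<QI?", payload)
    return {"item_id": item_id, "price_cents": price_cents, "in_stock": in_stock}
```

The same comparison on your own messages, with your real serializer, is the measurement that actually transfers to your system.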

REST (HTTP/1.1 + JSON)

Simple integration and interoperability with external clients

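# NGINX upstream + keep-alive
upstream user_api {
  server user-api:8080;
  keepalive 256;  # idle keep-alive connections cached per worker process
}

server {
  # Fragment: ssl_certificate / ssl_certificate_key directives are omitted.
  # Since nginx 1.25.1 the "http2" listen parameter is deprecated in favor of "http2 on;".
  listen 443 ssl http2;
  location /v1/ {
    proxy_http_version 1.1;          # upstream keep-alive requires HTTP/1.1
    proxy_set_header Connection "";  # clear the Connection header so upstream connections are reused
    proxy_set_header X-Request-Id $request_id;
    proxy_read_timeout 300ms;        # applies between two successive reads, not to the whole response
    proxy_connect_timeout 80ms;
    proxy_pass http://user_api;
  }
}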

gRPC (HTTP/2 + Protobuf)

Lower protocol overhead and a strict IDL contract

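// service.proto
syntax = "proto3";
package catalog.v1;

// GetItemRequest and GetItemResponse message definitions are omitted in this fragment.
service CatalogService {
  rpc GetItem(GetItemRequest) returns (GetItemResponse);
}

# envoy cluster (fragment, YAML)
# Note: recent Envoy releases deprecate cluster-level http2_protocol_options in favor of
# typed_extension_protocol_options; check the version you run.
clusters:
  - name: catalog_grpc
    connect_timeout: 0.08s
    type: STRICT_DNS
    http2_protocol_options:
      max_concurrent_streams: 512
    load_assignment:
      cluster_name: catalog_grpc
      endpoints: ...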

GraphQL (BFF/Gateway)

Client-driven contract and composition across multiple domains

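// Assumes Apollo Server 4 package names; `schema` and `redisCache` are defined elsewhere.
import { ApolloServer } from '@apollo/server';
import responseCachePlugin from '@apollo/server-plugin-response-cache';

const server = new ApolloServer({
  schema,
  persistedQueries: {
    cache: redisCache, // storage for automatic persisted queries
  },
  plugins: [responseCachePlugin()],
});

// resolver guardrails: load through a per-request DataLoader so repeated keys are batched
// (these resolvers must be part of the `schema` passed above)
const resolvers = {
  Query: {
    dashboard: async (_, args, ctx) =>
      ctx.loaders.dashboardByUser.load(args.userId),
  },
};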

Approach	Serialization and data size	Transport and parallelism	Where it usually wins
REST (JSON, HTTP/1.1)	Text-based JSON: more bytes on the wire and noticeable CPU parsing cost.	One in-flight request per connection; parallelism comes from a keep-alive connection pool, and queuing inside a connection runs into head-of-line blocking.	External APIs and integrations where compatibility, readability, and easy debugging matter most.
gRPC unary (Protobuf, HTTP/2)	Binary Protobuf is more compact and cheaper to parse than JSON; the gain is most visible on small structured messages.	HTTP/2 multiplexes concurrent calls over a single connection and supports streaming.	Internal calls with frequent small messages, tight latency budgets, and streaming.
GraphQL gateway (persisted queries + DataLoader)	The response is still JSON; on top of it come query parsing and validation plus resolver execution.	Runs over plain HTTP; the bottleneck is not the transport but resolver calls to downstream services - up to the N+1 query problem.	UI aggregation: fewer client round-trips and precise field selection at the cost of gateway overhead.

Where to find real measurements

gRPC Benchmarking Guide - methodology behind gRPC's continuous cross-language benchmarks; no REST comparison there
Protocol Buffers Overview - why the binary format is smaller and faster than JSON, and where it is not a good fit
GraphQL Performance - the N+1 problem, batching with DataLoader, and query demand control
TechEmpower Web Framework Benchmarks - framework comparison rounds; the numbers are tied to a specific round and hardware

Protobuf schema evolution without surprises

Never reuse field numbers after deletion.

Mark removed fields as `reserved` (both by number and by name).

Add new fields only as optional/nullable with safe default behavior.

For enums, always keep `*_UNSPECIFIED = 0` and handle unknown values.

Breaking changes (a wire-incompatible type change, moving existing fields into a `oneof`, removing behavior that consumers rely on) require a new contract version.

Before (v1)

syntax = "proto3";
message UserProfile {
  string user_id = 1;
  string email = 2;
  string phone = 3;
}

After (v2, safe evolution)

syntax = "proto3";
message UserProfile {
  string user_id = 1;
  string email = 2;
  reserved 3;
  reserved "phone";
  optional string telegram = 4;
}

Change	Backward compatible (new code reads old data)	Forward compatible (old code reads new data)	Comment
Added a new field	Yes	Yes	Older consumers ignore unknown fields.
Removed field + reserved	Yes	Yes	Wire-compatible both ways: values from old producers land in unknown fields, old consumers read the default value. Retire any logic that depends on the field first.
Changed field type (int32 -> string)	No	No	Wire format changes and decoding becomes unsafe.
Added enum value	Yes	Conditional	Old code needs fallback handling for unknown enum values.
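The "old code needs fallback handling" rows can be illustrated without a Protobuf runtime. The sketch below simulates an older consumer with plain Python types; this is an assumption-heavy model, not generated Protobuf code. `PaymentMethod`, `parse_payment_method`, and `decode_profile` are hypothetical names, and mapping an unknown enum number to the zero value is one application-level policy among several.

```python
from enum import IntEnum


class PaymentMethod(IntEnum):
    PAYMENT_METHOD_UNSPECIFIED = 0
    CARD = 1
    BANK_TRANSFER = 2


def parse_payment_method(raw):
    """Map a wire value to a known enum member, falling back to UNSPECIFIED."""
    try:
        return PaymentMethod(raw)
    except ValueError:
        # A newer producer sent a value this consumer was not built with.
        return PaymentMethod.PAYMENT_METHOD_UNSPECIFIED


KNOWN_FIELDS = {"user_id", "email"}  # what this (older) consumer understands


def decode_profile(wire):
    """Split a decoded message into known fields and preserved unknown fields."""
    known = {k: v for k, v in wire.items() if k in KNOWN_FIELDS}
    unknown = {k: v for k, v in wire.items() if k not in KNOWN_FIELDS}
    return known, unknown
```

The consumer keeps working when a newer producer adds `telegram` or a new enum value, which is the behavior the table describes as forward compatibility.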

Performance

Performance Engineering

Latency and throughput should be measured on your own workloads with realistic payloads.

Latency and throughput comparison

Approach	Latency profile	Throughput profile	Common fit	Key trade-off
REST sync	Low inside a data center; grows with JSON size and call-chain depth	Bounded by serialization cost and connection pool size	External/public APIs, simple integrations	Heavier payloads and usually higher CPU serialization cost.
gRPC sync	Usually lower than REST on the same path: compact serialization plus HTTP/2	Higher for frequent small calls: multiplexing removes the connection-count limit	Internal low-latency RPC, streaming	Needs IDL governance/tooling and HTTP/2 readiness.
GraphQL (BFF/Gateway)	Driven by the slowest resolver and the query depth	Bounded by query execution on the gateway, not by the transport	UI aggregation, product-driven contracts	Resolver fan-out, harder profiling and caching.
Queue-based async	Driven by queue depth and consumer pace, not by the network	Scales horizontally with partitions and consumer count	Background commands, smoothing traffic spikes	Eventual consistency and a separate operational loop for queues.
Pub/Sub events	Depends on broker batching, durability settings, and subscriber speed	High: log-oriented brokers are optimized for sequential writes and batching	Domain events with multiple independent subscribers	Harder ordering/duplication control and contract evolution.

Backpressure and flow control

Backpressure is the receiver's ability to slow the sender down when work arrives faster than it can be processed. Without explicit flow control, every asynchronous integration degrades the same way: queues grow, latency creeps up, and the system runs out of memory at the worst possible moment.

Overload signal: consumer lag

The first sign that producers outpace consumers is growing consumer lag and queue depth. Watch the growth rate, not just the absolute value: a steady upward trend means the system is living on borrowed time.
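Watching the growth rate can be as simple as fitting a line through recent lag samples. The sketch below uses an ordinary least-squares slope; the function names and the sample format (timestamp in seconds, lag in messages) are assumptions for illustration.

```python
def lag_growth_per_second(samples):
    """Least-squares slope of lag over time for (timestamp_s, lag) samples."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_lag = sum(lag for _, lag in samples) / n
    numerator = sum((t - mean_t) * (lag - mean_lag) for t, lag in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    return numerator / denominator if denominator else 0.0


def seconds_until_limit(samples, limit):
    """Estimate time until lag reaches limit; None if lag is not growing."""
    slope = lag_growth_per_second(samples)
    current = samples[-1][1]
    if current >= limit:
        return 0.0
    if slope <= 0:
        return None
    return (limit - current) / slope
```

For samples `[(0, 100), (10, 200), (20, 300)]` the slope is 10 messages per second, so a limit of 1000 is about 70 seconds away. An alert on this estimate fires earlier than an alert on the absolute lag value.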

Bounded concurrency and buffers

Bounded queues and explicit concurrency limits - worker pool size, semaphores, bulkheads - turn overload into a controlled slowdown instead of memory exhaustion and cascading failure.
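A bounded buffer is easy to sketch with the standard library. `BoundedInbox` is a hypothetical wrapper around `queue.Queue`; the point is that a full buffer produces an explicit rejection the caller can act on, instead of silently growing.

```python
import queue


class BoundedInbox:
    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.rejected = 0

    def offer(self, job):
        """Try to enqueue without blocking; report rejection when full."""
        try:
            self._queue.put_nowait(job)
            return True
        except queue.Full:
            self.rejected += 1  # overload becomes a visible, countable signal
            return False

    def take(self):
        return self._queue.get_nowait()
```

The `rejected` counter is the kind of metric that turns overload into a controlled slowdown: producers see `False` and can back off, and operators see the rate.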

Prefetch and credit-based schemes

The consumer states how much it is ready to take: prefetch limits in RabbitMQ (`basic.qos`), pause/resume and `max.poll.records` in Kafka, the credit-based HTTP/2 flow-control windows that gRPC relies on, and `request(n)` in Reactive Streams.
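The common idea behind these mechanisms is demand signalled by the consumer. The sketch below is a simplified, synchronous model of `request(n)`; it is not the Reactive Streams API or any broker client, and `CreditedSubscription` is a hypothetical name.

```python
class CreditedSubscription:
    """Delivers items only while the consumer has granted credits."""

    def __init__(self, source):
        self._source = iter(source)
        self._credits = 0
        self.delivered = []

    def request(self, n):
        """Grant n more credits and deliver at most that many pending items."""
        if n <= 0:
            raise ValueError("n must be positive")
        self._credits += n
        while self._credits > 0:
            try:
                item = next(self._source)
            except StopIteration:
                break  # source exhausted; unused credits simply remain
            self._credits -= 1
            self.delivered.append(item)
        return len(self.delivered)
```

The producer side never pushes more than the outstanding credit, which is what keeps the consumer's buffer bounded by construction.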

Reacting to overflow

When the buffer does fill up, you need an explicit policy: slow the producer down, answer 429 with `Retry-After`, shed non-critical traffic, or divert messages to a dead-letter queue. The worst option is an unbounded buffer that just postpones the problem.
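One possible overflow policy, sketched as an admission function. The 80% shedding threshold, the `priority` values, and the way `Retry-After` is estimated from the drain rate are all assumptions chosen for illustration; the structure (admit, shed non-critical first, answer 429 with a hint) follows the options listed above.

```python
import math


def admit(queue_depth, capacity, priority, drain_rate_per_s):
    """Return (status, headers) for an incoming request given current load."""
    shed_threshold = capacity * 8 // 10  # start shedding non-critical at 80%
    if queue_depth < shed_threshold:
        return 202, {}
    if priority == "critical" and queue_depth < capacity:
        return 202, {}  # critical traffic may use the remaining headroom
    # Estimate how long it takes to drain back below the shedding threshold.
    excess = queue_depth - shed_threshold
    retry_after_s = max(1, math.ceil(excess / drain_rate_per_s))
    return 429, {"Retry-After": str(retry_after_s)}
```

With a capacity of 100 and a drain rate of 5 per second, a normal request arriving at depth 90 gets `429` with `Retry-After: 2`, while a critical one is still accepted.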

Real event contracts: CloudEvents and AsyncAPI

CloudEvents (domain event example)

{
  "specversion": "1.0",
  "type": "com.shop.order.paid.v1",
  "source": "urn:shop:payments",
  "id": "evt-01HQ7V0R4Z6A0G3T95S1ZQ6B9N",
  "time": "2026-03-03T14:23:44Z",
  "subject": "order/938475",
  "datacontenttype": "application/json",
  "dataschema": "https://events.shop.dev/schemas/order-paid-v1.json",
  "data": {
    "orderId": "938475",
    "userId": "u-1821",
    "amount": 149.90,
    "currency": "USD",
    "paymentMethod": "card"
  }
}

AsyncAPI (channel + payload contract)

asyncapi: 3.0.0
info:
  title: Order Events API
  version: 1.4.0
channels:
  order.paid.v1:
    address: order.paid.v1
    messages:
      orderPaid:
        $ref: '#/components/messages/OrderPaid'
operations:
  onOrderPaid:
    action: receive
    channel:
      $ref: '#/channels/order.paid.v1'
    messages:
      - $ref: '#/channels/order.paid.v1/messages/orderPaid'
components:
  messages:
    OrderPaid:
      payload:
        type: object
        required: [orderId, userId, amount, currency]
        properties:
          orderId: { type: string }
          userId: { type: string }
          amount: { type: number }
          currency: { type: string }

Each event should include a business key (`orderId`) and a technical id (`id`) for deduplication.

Use explicit versioning in `type`/topic (`...v1`) and keep the schema in a registry.

Document delivery SLA, at-least-once/exactly-once expectations, and TTL.

Assign an owning team and deprecation policy for each event version.
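The deduplication rule can be sketched for a CloudEvents consumer. The CloudEvents 1.0 specification requires `specversion`, `id`, `source`, and `type`, and expects `source` plus `id` to identify an event uniquely. The `DedupingConsumer` class and its in-memory set are assumptions for illustration; a production consumer would need a durable store with expiry.

```python
REQUIRED_ATTRIBUTES = ("specversion", "id", "source", "type")


class DedupingConsumer:
    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # in-memory only; assumed for the sketch

    def on_event(self, event):
        """Handle an event once; return False for an already processed one."""
        missing = [a for a in REQUIRED_ATTRIBUTES if a not in event]
        if missing:
            raise ValueError(f"missing CloudEvents attributes: {missing}")
        key = (event["source"], event["id"])
        if key in self._seen:
            return False  # redelivery under at-least-once: skip side effects
        self._handler(event)
        self._seen.add(key)  # mark only after the handler succeeded
        return True
```

Marking the event as seen after the handler succeeds means a crash in between leads to a reprocess rather than a lost event, so the handler itself should still be idempotent.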

Reliability

Fault Tolerance Patterns

Distributed communication is fragile unless resilience policies are part of the contract.

How to choose an interaction pattern

If the user needs a response within a single HTTP request, prefer a synchronous path.

When resistance to spikes and loose coupling come first, use async communication through a queue or topic.

If the operation is money/order critical, check idempotency and ordering before selecting a pattern.

The more cross-service hops there are in the path, the costlier synchronous chains get: reduce their depth and put a cache or materialized views in front.

Timeout budget for each service hop and one end-to-end deadline policy.

Retries with jitter and a retry budget, so dependency degradation does not turn into a retry storm.

Circuit breaker/bulkhead for fault isolation and concurrency control.

Idempotency keys for commands and deduplication for event consumers.

Dead-letter and parking-lot queues for invalid or problematic messages.
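The retry rule above can be sketched as exponential backoff with full jitter plus a simple retry budget. All names and defaults here (`RetryBudget`, the 20% ratio, the 50 ms base, the 2 s cap) are assumptions for illustration, and only `ConnectionError` is treated as retryable.

```python
import random
import time


class RetryBudget:
    """Allow retries only while they stay under a fraction of all calls."""

    def __init__(self, ratio=0.2):
        self.ratio = ratio
        self.calls = 0
        self.retries = 0

    def record_call(self):
        self.calls += 1

    def try_spend(self):
        if self.retries + 1 > self.calls * self.ratio:
            return False  # budget exhausted: fail fast instead of piling on
        self.retries += 1
        return True


def backoff_delay(attempt, base_s=0.05, cap_s=2.0, rng=random):
    """Full jitter: uniform delay in [0, min(cap, base * 2**attempt)]."""
    return rng.uniform(0, min(cap_s, base_s * 2 ** attempt))


def call_with_retries(operation, budget, max_attempts=3, sleep=time.sleep):
    budget.record_call()
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            is_last = attempt == max_attempts - 1
            if is_last or not budget.try_spend():
                raise
            sleep(backoff_delay(attempt))
```

The budget is shared across calls to one dependency: when that dependency degrades, retries stop once they exceed the configured share of traffic, which is what keeps a partial outage from turning into a retry storm.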

Practical checklist

For each integration channel, an owner, SLO and error budget are specified.
Contracts are versioned and verified by contract tests in CI.
There is a degradation strategy when the downstream service is unavailable.
Tracing covers the end-to-end path through synchronous and asynchronous segments.
Critical commands and events are processed idempotently.
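The last checklist item can be illustrated with an idempotency key on a command. `PaymentService` and its in-memory result store are hypothetical; a real implementation would persist the key together with the side effect in one transaction.

```python
class PaymentService:
    def __init__(self):
        self._results = {}  # idempotency key -> stored result (in-memory sketch)
        self.charges = []

    def charge(self, idempotency_key, order_id, amount_cents):
        """Charge once per key; repeated calls return the stored result."""
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges.append((order_id, amount_cents))
        result = {
            "order_id": order_id,
            "status": "charged",
            "charge_no": len(self.charges),
        }
        self._results[idempotency_key] = result
        return result
```

A client that times out and retries with the same key gets the original result back, so the retry is safe even though the first attempt actually succeeded.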

References

Related chapters

Event-Driven Architecture - Once asynchronous flows multiply, events need deliberate design: event models and flow design are covered here.
Fault Tolerance Patterns - Timeouts, retries, and circuit breakers as a mandatory layer of inter-service communication.
Consistency and idempotency - How to ensure data correctness during retries and re-delivery of events.
Service Discovery - Communication between services relies on correct discovery of endpoints.
Decomposition Strategies - Service boundaries directly affect the intensity and complexity of communications.