Inter-Service Communication Patterns

Module 26: Advanced Backend & API Development

Introduction to Service Communication

In a microservices architecture, services need to communicate with each other to fulfill business operations. The choice of communication patterns significantly impacts system characteristics like performance, reliability, and scalability.

Analogy: Communication in an Organization

Microservices communication is similar to how people communicate in a large organization:

  • Synchronous communication is like calling someone on the phone – you wait for their immediate response before continuing.
  • Asynchronous communication is like sending an email – you can continue with other tasks while waiting for a response.
  • Event-based communication is like a company announcement system – departments respond to relevant announcements without being directly contacted.
  • Broadcasting is like an all-hands meeting – a message is sent to everyone, regardless of relevance.

Service communication patterns:

  • Synchronous
    • Request/Response: REST, GraphQL
    • RPC: gRPC, Apache Thrift
  • Asynchronous
    • Message-Based: Message Queues, Service Bus
    • Event-Based: Event Streaming, Pub/Sub

Synchronous Communication Patterns

Synchronous communication involves a service directly calling another service and waiting for a response before proceeding.

Request-Response over HTTP/REST

The most common pattern for service-to-service communication, using HTTP as the transport protocol.


// Example of service-to-service REST call in Node.js
const axios = require('axios');

async function getOrderDetails(orderId) {
  try {
    // Call Order Service first: the other calls need its customer and item IDs
    const orderResponse = await axios.get(`http://order-service/orders/${orderId}`);
    const order = orderResponse.data;

    // The User Service and Product Service calls do not depend on each other,
    // so run them concurrently instead of one after another
    const [userResponse, productResponses] = await Promise.all([
      axios.get(`http://user-service/users/${order.customerId}`),
      Promise.all(
        order.items.map(item =>
          axios.get(`http://product-service/products/${item.productId}`)
        )
      )
    ]);

    return {
      order,
      customer: userResponse.data,
      products: productResponses.map(response => response.data)
    };
  } catch (error) {
    console.error('Error fetching order details:', error.message);
    // Keep the original error as the cause for easier debugging (Node.js 16.9+)
    throw new Error('Failed to retrieve order details', { cause: error });
  }
}
      

REST API Design Best Practices for Microservices

  • Use resource-oriented endpoints: Design APIs around business resources, not operations
  • Consistent error handling: Use standard HTTP status codes and error formats
  • API versioning: Use URL or header-based versioning to maintain compatibility
  • Throttling and rate limiting: Protect services from excessive requests
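  • Idempotent operations: Ensure that repeating a request (for example, a client retry after a timeout) has the same effect as sending it once; GET, PUT, and DELETE are idempotent by definition, while POST needs extra handling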
  • HATEOAS: Include links to related resources to reduce coupling
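The idempotency guideline matters most for operations that are not naturally idempotent, such as POST. One common approach, sketched here under our own assumptions, is for the client to attach a unique idempotency key to each logical request (often in a request header) and for the server to return the stored result when it sees the same key again. The `createPayment` handler and the in-memory `Map` are hypothetical; a real service would keep keys in a shared, persistent store with an expiry.

```javascript
// Sketch: idempotent handling of a non-idempotent operation (e.g., POST /payments).
// The in-memory Map stands in for a shared, persistent store (an assumption).
const { randomUUID } = require('node:crypto');

const processedRequests = new Map(); // idempotency key -> stored response

function createPayment(idempotencyKey, payload) {
  if (!idempotencyKey) {
    throw new Error('Idempotency key is required');
  }

  // A retry with the same key returns the original result instead of charging twice
  if (processedRequests.has(idempotencyKey)) {
    return processedRequests.get(idempotencyKey);
  }

  // Hypothetical side effect: create the payment exactly once
  const response = {
    paymentId: randomUUID(),
    amount: payload.amount,
    status: 'CREATED',
  };
  processedRequests.set(idempotencyKey, response);
  return response;
}

// Usage: the client generates one key per logical operation and reuses it on retries
const key = randomUUID();
const first = createPayment(key, { amount: 25.98 });
const retry = createPayment(key, { amount: 25.98 });
console.log(first.paymentId === retry.paymentId); // true
```

A production version would also reject a reused key that arrives with a different payload, and guard against two concurrent requests carrying the same key.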

gRPC (Remote Procedure Call)

A high-performance RPC framework that serializes messages with Protocol Buffers (a compact binary format) and transports them over HTTP/2.


// Example gRPC Proto definition
syntax = "proto3";

package ecommerce;

service OrderService {
  rpc GetOrder(OrderRequest) returns (Order);
  rpc CreateOrder(CreateOrderRequest) returns (Order);
  rpc UpdateOrder(UpdateOrderRequest) returns (Order);
  rpc DeleteOrder(OrderRequest) returns (DeleteResponse);
}

message OrderRequest {
  string order_id = 1;
}

message Order {
  string order_id = 1;
  string customer_id = 2;
  string status = 3;
  repeated OrderItem items = 4;
  double total_amount = 5;
  string created_at = 6;
}

message OrderItem {
  string product_id = 1;
  int32 quantity = 2;
  double price = 3;
}

message CreateOrderRequest {
  string customer_id = 1;
  repeated OrderItem items = 2;
}

message UpdateOrderRequest {
  string order_id = 1;
  string status = 2;
}

message DeleteResponse {
  bool success = 1;
}
      

gRPC vs REST Comparison

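| Feature | gRPC | REST |
|---|---|---|
| Protocol | HTTP/2 with binary Protocol Buffers | Usually HTTP/1.1 (or HTTP/2) with text formats such as JSON |
| Contract | Strong (Protocol Buffers) | Loose (OpenAPI optional) |
| Performance | Higher (binary payloads, multiplexed connections) | Generally lower (text payloads; no multiplexing over HTTP/1.1) |
| Browser Support | Limited (requires a proxy, e.g., for gRPC-Web) | Native |
| Streaming | Bidirectional streaming | Limited (needs SSE or WebSockets) |
| Code Generation | Built in, multi-language | Third-party tools (e.g., OpenAPI generators) |
| Learning Curve | Steeper | Gentler |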

GraphQL for Service Aggregation

GraphQL can be used for service-to-service communication, especially when aggregating data from multiple services.


// GraphQL query to fetch related data
const query = `
  query GetOrderDetails($orderId: ID!) {
    order(id: $orderId) {
      id
      status
      createdAt
      totalAmount
      customer {
        id
        name
        email
      }
      items {
        quantity
        product {
          id
          name
          price
          imageUrl
        }
      }
    }
  }
`;

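async function getOrderDetails(orderId) {
  // Standard GraphQL-over-HTTP request (global fetch, Node.js 18+);
  // the gateway URL is a placeholder
  const response = await fetch('http://graphql-gateway/graphql', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, variables: { orderId } })
  });
  if (!response.ok) {
    throw new Error(`GraphQL request failed with HTTP ${response.status}`);
  }

  const { data, errors } = await response.json();
  // GraphQL reports resolver errors in the response body, often with HTTP 200
  if (errors && errors.length > 0) {
    throw new Error(errors.map(e => e.message).join('; '));
  }
  return data.order;
}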
      

Real-World Example: Netflix API Gateway

Netflix uses GraphQL in their API Gateway to aggregate data from multiple backend microservices:

  1. Mobile and web clients make a single GraphQL request to the API Gateway
  2. The Gateway's resolvers call multiple backend services (user profiles, recommendations, content, viewing history)
  3. Results are combined and transformed into the exact shape requested by the client
  4. This pattern reduces network overhead and simplifies client implementation

Synchronous Communication Challenges

While straightforward, synchronous communication introduces several challenges in microservices architectures:

Temporal Coupling

Services become dependent on each other's availability and response times.

Availability Mathematics

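If each service on a request's synchronous call path has 99.9% availability (a typical SLA), the request succeeds only when every one of those services is up. Assuming failures are independent, the availabilities multiply:

  • 2 services: 0.999^2 ≈ 99.8% availability
  • 5 services: 0.999^5 ≈ 99.5% availability
  • 10 services: 0.999^10 ≈ 99.0% availability
  • 20 services: 0.999^20 ≈ 98.0% availability

The more synchronous dependencies on a request path, the lower its overall availability. At 98.0%, that path is down for more than seven days a year, compared with under nine hours at 99.9%.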
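These figures come from multiplying the availability of every service on the synchronous path. The short calculation below reproduces them and converts the result into expected downtime per year. Like the figures, it assumes independent failures and that every call must succeed, which real systems only approximate; the function names are our own.

```javascript
// Composite availability of a synchronous call chain: every call must succeed,
// so availabilities multiply (assumes failures are independent).
function compositeAvailability(availabilities) {
  return availabilities.reduce((product, a) => product * a, 1);
}

// Expected downtime per year for a given availability (365-day year)
function annualDowntimeHours(availability) {
  return (1 - availability) * 365 * 24;
}

for (const services of [1, 2, 5, 10, 20]) {
  const availability = compositeAvailability(Array(services).fill(0.999));
  console.log(
    `${services} service(s): ${(availability * 100).toFixed(2)}% available, ` +
      `~${annualDowntimeHours(availability).toFixed(1)} hours of downtime per year`
  );
}
```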

Latency Accumulation

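Each call in a synchronous chain adds its own network and processing time, and the client waits for all of them, so end-to-end latency is the sum of every hop. For example, for a request that flows from the client through an API Gateway and the Order Service to the Payment Service and back:

  • Client → API Gateway: request (20ms)
  • API Gateway → Order Service: get order (50ms)
  • Order Service → Payment Service: validate payment (100ms)
  • Payment Service → Order Service: response (30ms)
  • Order Service → API Gateway: response (40ms)
  • API Gateway → Client: response (20ms)

Total latency: 20 + 50 + 100 + 30 + 40 + 20 = 260ms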
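To see the cumulative effect in running code, the sketch below simulates three downstream calls with fixed delays using timers. Dependent calls must run one after another, so their latencies add up; calls that do not depend on each other can be issued concurrently (as the `Promise.all` in the earlier REST example does), in which case the total is roughly the slowest call. The service names and delays are made up for illustration.

```javascript
// Simulated remote call: resolves with `name` after `delayMs` milliseconds
function simulatedCall(name, delayMs) {
  return new Promise((resolve) => setTimeout(() => resolve(name), delayMs));
}

// Hypothetical downstream calls and their latencies
const calls = [
  ['order-service', 60],
  ['user-service', 80],
  ['payment-service', 100],
];

// Sequential: each call waits for the previous one, so latencies add up (~240ms)
async function sequential() {
  const start = Date.now();
  for (const [name, delay] of calls) {
    await simulatedCall(name, delay);
  }
  return Date.now() - start;
}

// Concurrent: independent calls overlap, so the total is about the slowest call (~100ms)
async function concurrent() {
  const start = Date.now();
  await Promise.all(calls.map(([name, delay]) => simulatedCall(name, delay)));
  return Date.now() - start;
}
```

Running `sequential().then(console.log)` and `concurrent().then(console.log)` prints the measured milliseconds; exact values vary slightly with timer precision.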

Network Unreliability

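Every remote call crosses a network that can drop packets, time out, or partition. In a monolith the same interaction is an in-process function call, so distributed systems face failure modes that monoliths largely avoid.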

Resource Exhaustion

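Each in-flight synchronous call holds a thread, connection, or socket while it waits. If a downstream service slows down, waiting requests pile up in the caller until its thread or connection pool is exhausted, and the slowdown spreads to every service upstream.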

Mitigating Synchronous Communication Challenges

Several patterns can help address the challenges of synchronous communication:

Circuit Breaker Pattern

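Prevents cascading failures when a service is unresponsive. Once failures cross a threshold, the breaker opens and rejects calls immediately (typically returning a fallback) instead of waiting on the unhealthy service. After a cool-down period it moves to a half-open state and lets trial requests through, closing again if they succeed.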


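// Example using Resilience4j in Java (Try comes from the Vavr library)
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import io.vavr.control.Try;

// Circuit breaker with default settings (by default it opens once at least
// 50% of the calls in its sliding window fail)
CircuitBreaker circuitBreaker = CircuitBreakerRegistry.ofDefaults()
    .circuitBreaker("paymentService");

public PaymentResponse processPayment(PaymentRequest request) {
    return Try.ofSupplier(
        CircuitBreaker.decorateSupplier(
            circuitBreaker,
            () -> paymentServiceClient.processPayment(request)
        )
    ).recover(throwable -> {
        // Fallback when the call fails or the circuit is open
        // (an open circuit throws CallNotPermittedException)
        return new PaymentResponse(
            PaymentStatus.PENDING,
            "Payment processing delayed"
        );
    }).get();
}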
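To show roughly what a library like Resilience4j does internally, here is a minimal circuit breaker in JavaScript. It is a simplified sketch with assumed parameters (`failureThreshold`, `resetTimeoutMs`), not a substitute for a production library: it counts consecutive failures, opens after the threshold, fails fast while open, and after the reset timeout lets trial calls through in the half-open state.

```javascript
// Minimal circuit breaker sketch: CLOSED -> OPEN -> HALF_OPEN -> CLOSED
class CircuitBreaker {
  constructor(action, { failureThreshold = 3, resetTimeoutMs = 5000, now = Date.now } = {}) {
    this.action = action;                 // async function performing the remote call
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;                       // injectable clock, handy for testing
    this.state = 'CLOSED';
    this.failures = 0;
    this.openedAt = 0;
  }

  async call(...args) {
    if (this.state === 'OPEN') {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        // Fail fast instead of waiting on a service known to be unhealthy
        throw new Error('Circuit is open');
      }
      // Cool-down elapsed: let trial requests through (this sketch does not limit how many)
      this.state = 'HALF_OPEN';
    }

    try {
      const result = await this.action(...args);
      this.failures = 0;                  // a success closes the circuit
      this.state = 'CLOSED';
      return result;
    } catch (error) {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, opens the circuit
      if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
        this.state = 'OPEN';
        this.openedAt = this.now();
      }
      throw error;
    }
  }
}
```

Callers would typically wrap `call` in a `try`/`catch` and return a fallback, much as the Java example's `recover` does.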
      

Timeout Patterns

Ensures services don't wait indefinitely for responses.


// Example of timeout pattern in Node.js with Axios
const axios = require('axios');

async function getProductDetails(productId) {
  try {
    // Set timeout to 2 seconds
    const response = await axios.get(
      `http://product-service/products/${productId}`, 
      { timeout: 2000 }
    );
    return response.data;
  } catch (error) {
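    // Axios reports a timeout as ECONNABORTED by default
    // (or ETIMEDOUT when transitional.clarifyTimeoutError is enabled)
    if (error.code === 'ECONNABORTED' || error.code === 'ETIMEDOUT') {
      console.warn(`Request for product ${productId} timed out`);
      // Fall back to cached or default data
      // (getCachedProductDetails is an application helper, not shown)
      return getCachedProductDetails(productId);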
    }
    throw error;
  }
}
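Axios has a built-in `timeout` option, but not every client does. A generic deadline can be applied to any promise with `Promise.race`; the helper below is a minimal sketch, and the names `withTimeout` and `TimeoutError` are our own. Note that it only stops *waiting*: the underlying request keeps running unless it is also cancelled, for example with an `AbortController`.

```javascript
class TimeoutError extends Error {
  constructor(ms) {
    super(`Operation timed out after ${ms}ms`);
    this.name = 'TimeoutError';
  }
}

// Rejects with TimeoutError if `promise` has not settled within `ms` milliseconds
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(ms)), ms);
  });
  // Clear the timer either way so it does not keep the process alive
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```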
      

Bulkhead Pattern

Isolates failures to prevent them from taking down the entire system.


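// Example using a Resilience4j thread pool bulkhead in Java
import io.github.resilience4j.bulkhead.ThreadPoolBulkhead;
import io.github.resilience4j.bulkhead.ThreadPoolBulkheadConfig;
import io.github.resilience4j.bulkhead.ThreadPoolBulkheadRegistry;
import java.util.concurrent.CompletableFuture;

// Payment calls get their own pool (at most 10 threads, 100 queued calls),
// so a slow payment service cannot exhaust threads needed by other work
ThreadPoolBulkhead bulkhead = ThreadPoolBulkheadRegistry
    .ofDefaults()
    .bulkhead("paymentService", ThreadPoolBulkheadConfig.custom()
        .maxThreadPoolSize(10)
        .coreThreadPoolSize(5)
        .queueCapacity(100)
        .build());

public CompletableFuture<PaymentResponse> processPayment(PaymentRequest request) {
    // Runs the call on the bulkhead's thread pool and returns its future result;
    // calls beyond the pool and queue capacity are rejected
    return ThreadPoolBulkhead.decorateSupplier(
        bulkhead,
        () -> paymentServiceClient.processPayment(request)
    ).get().toCompletableFuture();
}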
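The same idea can be sketched without a library: cap how many calls to one dependency may be in flight at once, let a limited number more wait, and reject the rest immediately. The `Bulkhead` class and its limits below are illustrative assumptions that loosely mirror the pool size and queue capacity in the Java configuration.

```javascript
// Minimal bulkhead sketch: at most `maxConcurrent` calls run at once, at most
// `maxQueue` wait for a slot, and anything beyond that is rejected immediately.
class Bulkhead {
  constructor(maxConcurrent, maxQueue) {
    this.maxConcurrent = maxConcurrent;
    this.maxQueue = maxQueue;
    this.active = 0;
    this.queue = []; // resolvers for callers waiting for a slot
  }

  async run(task) {
    if (this.active >= this.maxConcurrent) {
      if (this.queue.length >= this.maxQueue) {
        throw new Error('Bulkhead full');
      }
      // Wait until a finishing task hands its slot over to us
      await new Promise((resolve) => this.queue.push(resolve));
    } else {
      this.active += 1;
    }

    try {
      return await task();
    } finally {
      const next = this.queue.shift();
      if (next) {
        next(); // transfer the slot directly; `active` stays the same
      } else {
        this.active -= 1;
      }
    }
  }
}
```

Giving each downstream dependency its own `Bulkhead` instance is what isolates them: a slow payment service can fill only its own slots.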
      

Retry Patterns

Automatically retries failed requests with appropriate backoff strategies.


// Example of retry pattern with exponential backoff in Python
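import random
import time

import requests
from requests.exceptions import RequestException

def get_with_retry(url, max_retries=3, base_delay=1):
    """GET `url`, making up to `max_retries` attempts in total.

    Note: raise_for_status() raises HTTPError (a RequestException) for 4xx
    responses too, so client errors are retried here as well; production code
    usually retries only transient failures such as timeouts and 5xx responses.
    """
    retries = 0
    while retries < max_retries:
        try:
            response = requests.get(url, timeout=2)
            response.raise_for_status()
            return response.json()
        except RequestException:
            retries += 1
            if retries == max_retries:
                raise  # re-raise the last error with its original traceback

            # Exponential backoff with jitter
            delay = base_delay * (2 ** (retries - 1)) + random.uniform(0, 0.5)
            print(f"Request failed, retrying in {delay:.2f} seconds")
            time.sleep(delay)

    # Only reached when max_retries < 1
    raise ValueError("max_retries must be at least 1")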
      

Retry Backoff Strategies

  • Constant backoff: Wait the same amount of time between retries
  • Linear backoff: Increase wait time linearly (e.g., 1s, 2s, 3s)
  • Exponential backoff: Double wait time for each retry (e.g., 1s, 2s, 4s, 8s)
  • Exponential backoff with jitter: Add randomness to avoid thundering herd (e.g., 1.2s, 2.1s, 3.9s)
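The strategies differ only in how the delay before retry number *n* is computed. The function below sketches those formulas. The `maxDelayMs` cap and the jitter variant (scaling the exponential delay by a random factor between 0.5 and 1.5) are our own illustrative choices; other jitter schemes, such as picking a random delay between 0 and the exponential value, are also common. The `random` parameter is injectable so results can be made deterministic in tests.

```javascript
// Delay (ms) before retry number `attempt` (1-based) for each backoff strategy
function backoffDelay(strategy, attempt, { baseMs = 1000, maxDelayMs = 30000, random = Math.random } = {}) {
  let delay;
  switch (strategy) {
    case 'constant':
      delay = baseMs;                          // 1s, 1s, 1s, ...
      break;
    case 'linear':
      delay = baseMs * attempt;                // 1s, 2s, 3s, ...
      break;
    case 'exponential':
      delay = baseMs * 2 ** (attempt - 1);     // 1s, 2s, 4s, 8s, ...
      break;
    case 'exponential-jitter':
      // Randomize around the exponential value so clients that failed together
      // do not all retry at the same instant (thundering herd)
      delay = baseMs * 2 ** (attempt - 1) * (0.5 + random());
      break;
    default:
      throw new Error(`Unknown strategy: ${strategy}`);
  }
  return Math.min(delay, maxDelayMs);          // never wait longer than the cap
}
```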

API Gateways and BFF Pattern

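A gateway or BFF sits between clients and backend services and aggregates several backend calls into a single client request, so the client makes one round trip instead of many.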

Diagram: the Mobile Client sends requests to a Mobile BFF, which calls the backend microservices (User Service, Order Service, Payment Service, Notification Service).

Backend for Frontend (BFF) Pattern

The BFF pattern involves creating separate API gateways for different client types:

  • Mobile BFF: Optimized for mobile clients (data-efficient, battery-aware)
  • Web BFF: Optimized for web clients (more data, rich interactions)
  • 3rd Party BFF: Restricted data access for external integrations

Each BFF knows exactly what its client needs and can aggregate data efficiently from backend services.
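As a rough illustration, the sketch below fans out to three backend services concurrently and then shapes the result differently for mobile and web clients. The service objects are hypothetical in-memory stubs standing in for HTTP or gRPC calls, and the chosen fields are only an example of how a mobile BFF might trim a payload.

```javascript
// Hypothetical backend service stubs (in a real BFF these would be HTTP/gRPC calls)
const userService = {
  getUser: async (id) => ({ id, name: 'Ada', email: 'ada@example.com', preferences: { theme: 'dark' } }),
};
const orderService = {
  getRecentOrders: async (userId) => [
    { id: 'ORD-1', userId, status: 'SHIPPED', total: 42.5, items: [{ sku: 'A1', qty: 2 }] },
  ],
};
const notificationService = {
  getUnreadCount: async (userId) => 3,
};

// Fetch everything the dashboard needs concurrently (one round trip for the client)
async function fetchDashboardData(userId) {
  const [user, orders, unreadCount] = await Promise.all([
    userService.getUser(userId),
    orderService.getRecentOrders(userId),
    notificationService.getUnreadCount(userId),
  ]);
  return { user, orders, unreadCount };
}

// Mobile BFF: small payload with only the fields the mobile screen shows
async function mobileDashboard(userId) {
  const { user, orders, unreadCount } = await fetchDashboardData(userId);
  return {
    name: user.name,
    unreadCount,
    orders: orders.map((o) => ({ id: o.id, status: o.status })),
  };
}

// Web BFF: richer payload for a larger screen
async function webDashboard(userId) {
  const { user, orders, unreadCount } = await fetchDashboardData(userId);
  return { user, orders, unreadCount };
}
```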

Asynchronous Communication Patterns

Asynchronous communication decouples services, allowing them to communicate without waiting for immediate responses.

Message-Based Communication

Services communicate by sending messages through message brokers or queues.

Diagram: the Order Service sends a message to a message queue; the Payment Service processes it and sends a message to a different queue, which the Shipping Service processes.

// Example of publishing a message with RabbitMQ in Node.js
const amqp = require('amqplib');

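async function publishOrderCreatedEvent(order) {
  // Opening a connection per event keeps the example short; a real service
  // would reuse one long-lived connection and channel
  const connection = await amqp.connect('amqp://localhost');
  try {
    // A confirm channel lets us wait until the broker has accepted the message
    const channel = await connection.createConfirmChannel();

    const exchange = 'order_events';
    const routingKey = 'order.created';
    const message = Buffer.from(JSON.stringify({
      eventType: 'OrderCreated',
      timestamp: new Date().toISOString(),
      data: order
    }));

    await channel.assertExchange(exchange, 'topic', { durable: true });
    channel.publish(exchange, routingKey, message, {
      persistent: true, // survives a broker restart if the queue is durable too
      contentType: 'application/json'
    });
    await channel.waitForConfirms();

    console.log(`Published OrderCreated event for order ${order.id}`);
  } finally {
    // Closing the connection also closes its channels
    await connection.close();
  }
}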
      

// Example of consuming a message with RabbitMQ in Node.js
const amqp = require('amqplib');

async function startPaymentProcessor() {
  const connection = await amqp.connect('amqp://localhost');
  const channel = await connection.createChannel();
  
  const exchange = 'order_events';
  const queue = 'payment_service_orders';
  const routingKey = 'order.created';
  
  await channel.assertExchange(exchange, 'topic', { durable: true });
  await channel.assertQueue(queue, { durable: true });
  await channel.bindQueue(queue, exchange, routingKey);
  
  console.log('Payment processor waiting for messages...');
  
  channel.consume(queue, async (msg) => {
    if (!msg) return;
    
    try {
      const event = JSON.parse(msg.content.toString());
      console.log(`Processing payment for order ${event.data.id}`);
      
      // Process payment logic here
      await processPayment(event.data);
      
      // Acknowledge the message
      channel.ack(msg);
    } catch (error) {
      console.error('Error processing payment:', error);
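      // Reject and requeue the message. Caution: a message that always fails
      // (a "poison" message) is redelivered indefinitely; nack(msg, false, false)
      // with a dead-letter exchange configured on the queue avoids this loop
      channel.nack(msg, false, true);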
    }
  });
}
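Because RabbitMQ redelivers a message that is requeued or never acknowledged (for example, when the consumer crashes mid-processing), a consumer may see the same message more than once. A common safeguard is to make the handler idempotent by recording the IDs of messages it has already processed. The sketch below assumes each message carries a unique `id` field and uses an in-memory `Set` where a real service would use a durable store, ideally updated in the same transaction as the business change; it does not handle two copies arriving concurrently.

```javascript
// Idempotent consumer sketch: skip messages whose ID has already been processed.
// The Set stands in for a durable store (an assumption for this example).
function createIdempotentHandler(handle, processedIds = new Set()) {
  return async function onMessage(message) {
    if (processedIds.has(message.id)) {
      return 'duplicate-skipped'; // safe to ack again without repeating side effects
    }
    await handle(message);
    processedIds.add(message.id); // record only after successful processing
    return 'processed';
  };
}
```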
      

Popular Message Brokers

| Message Broker | Key Features | Best For |
|---|---|---|
| RabbitMQ | Mature, flexible routing, multiple protocols (AMQP, MQTT, STOMP) | Complex routing, multiple messaging patterns |
| Apache Kafka | High throughput, durable and replayable log, event streaming | Big data, event sourcing, analytics |
| Amazon SQS | Fully managed, simple, serverless-friendly | AWS workloads, simple queuing |
| Google Cloud Pub/Sub | Fully managed, global, at-least-once delivery | GCP workloads, global distribution |
| Azure Service Bus | Fully managed, AMQP support, enterprise integrations | Azure workloads, enterprise integration |
| Redis Pub/Sub | Simple, fast, in-memory; fire-and-forget (messages are not persisted) | Simple low-latency use cases where occasional message loss is acceptable |

Event-Driven Architecture

Services publish events when their state changes, and other services react to those events.

Diagram: the Order Service publishes an Order Created event, which the Payment, Inventory, Notification, and Analytics services each consume independently.

// Example of publishing an event with Kafka in Java
public void createOrder(Order order) {
    // Save order to database
    orderRepository.save(order);
    
    // Publish event
    OrderCreatedEvent event = new OrderCreatedEvent(
        order.getId(),
        order.getCustomerId(),
        order.getItems(),
        order.getTotalAmount(),
        LocalDateTime.now()
    );
    
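    // Keying by order ID sends every event for an order to the same partition,
    // so consumers receive them in order. The database save above and this send
    // are not atomic: if the send fails, the order exists without its event
    // (the transactional outbox pattern is a common remedy).
    kafkaTemplate.send(
        "order-events",
        order.getId(),
        event
    );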
}
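The one-to-many fan-out in the diagram can be modelled in-process with Node's built-in `EventEmitter`. This is only a sketch of the publish/subscribe shape: unlike Kafka, an `EventEmitter` is synchronous, in-memory, and drops events when no listener is attached. The event name and subscriber functions are hypothetical.

```javascript
const { EventEmitter } = require('node:events');

// In-process stand-in for a broker topic
const eventBus = new EventEmitter();
const received = [];

// Each subscriber reacts independently; the publisher knows none of them
eventBus.on('OrderCreated', (event) => received.push(`payment: charge ${event.orderId}`));
eventBus.on('OrderCreated', (event) => received.push(`inventory: reserve items for ${event.orderId}`));
eventBus.on('OrderCreated', (event) => received.push(`notification: email ${event.customerId}`));

function createOrder(order) {
  // ...persist the order, then announce the state change
  eventBus.emit('OrderCreated', { orderId: order.id, customerId: order.customerId });
}

createOrder({ id: 'ORD-1', customerId: 'CUST-1' });
console.log(received);
```

Adding another consumer, such as an analytics subscriber, means registering one more listener; `createOrder` does not change.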
      

Real-World Example: Uber's Event-Driven Architecture

Uber uses event-driven microservices for their ride-sharing platform:

  1. When a rider requests a ride, a "RideRequested" event is published
  2. The driver matching service subscribes to this event and finds nearby drivers
  3. When a driver accepts, a "RideAccepted" event is published
  4. Multiple services react to this event:
    • Payment service pre-authorizes the rider's payment method
    • Notification service alerts the rider their driver is coming
    • Mapping service starts tracking the trip and calculating ETAs
    • Analytics service records the match for future optimization

This architecture allows Uber to scale each service independently and add new functionality without modifying existing services.

Event Sourcing and CQRS

Advanced patterns that leverage event-driven architecture for complex scenarios.

Event Sourcing

Storing all changes to application state as a sequence of events.

Diagram: Command → Command Handler → Apply Event → Event Store → Event Stream → Projections → Query Models → Query API.

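// Example of event sourcing in a bank account service
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Event, FundsDepositedEvent, FundsWithdrawnEvent, and InsufficientFundsException
// are assumed to be defined elsewhere
public class BankAccount {
    private String accountId;
    // double keeps the example short; real monetary amounts should use
    // BigDecimal or integer minor units (e.g., cents) to avoid rounding errors
    private double balance;
    private List<Event> uncommittedEvents = new ArrayList<>();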
    
    // Apply a deposit
    public void deposit(double amount) {
        if (amount <= 0) {
            throw new IllegalArgumentException("Amount must be positive");
        }
        
        // Create and apply the event
        FundsDepositedEvent event = new FundsDepositedEvent(
            accountId, 
            amount, 
            LocalDateTime.now()
        );
        apply(event);
        uncommittedEvents.add(event);
    }
    
    // Apply a withdrawal
    public void withdraw(double amount) {
        if (amount <= 0) {
            throw new IllegalArgumentException("Amount must be positive");
        }
        
        if (balance < amount) {
            throw new InsufficientFundsException("Insufficient funds");
        }
        
        // Create and apply the event
        FundsWithdrawnEvent event = new FundsWithdrawnEvent(
            accountId, 
            amount, 
            LocalDateTime.now()
        );
        apply(event);
        uncommittedEvents.add(event);
    }
    
    // Apply an event to update the state
    private void apply(Event event) {
        if (event instanceof FundsDepositedEvent) {
            this.balance += ((FundsDepositedEvent) event).getAmount();
        } else if (event instanceof FundsWithdrawnEvent) {
            this.balance -= ((FundsWithdrawnEvent) event).getAmount();
        }
        // Handle other event types
    }
    
    // Reconstruct account state from events
    public static BankAccount fromEvents(String accountId, List<Event> events) {
        BankAccount account = new BankAccount();
        account.accountId = accountId;
        
        for (Event event : events) {
            account.apply(event);
        }
        
        return account;
    }
    
    // Get uncommitted events for persistence
    public List<Event> getUncommittedEvents() {
        return new ArrayList<>(uncommittedEvents);
    }
    
    // Clear uncommitted events after persistence
    public void clearUncommittedEvents() {
        uncommittedEvents.clear();
    }
}
      

Event Sourcing Benefits

  • Complete audit trail: Every change is captured as an event
  • Temporal queries: Ability to determine system state at any point in time
  • Event replay: Can rebuild state by replaying events
  • Easier debugging: Can understand exactly how the system reached its current state
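Temporal queries fall out of replay: to answer "what was the balance at time *t*?", fold only the events recorded up to *t*. The sketch below mirrors the `BankAccount` example in JavaScript; the event shape (`type`, `amount`, `at`) is an assumption, the log is assumed to be in chronological order, and amounts are integer cents to avoid floating-point rounding.

```javascript
// Hypothetical event log for one account (amounts in cents, timestamps as ISO strings)
const events = [
  { type: 'FundsDeposited', amount: 10000, at: '2025-03-01T09:00:00Z' },
  { type: 'FundsWithdrawn', amount: 2500, at: '2025-03-05T12:00:00Z' },
  { type: 'FundsDeposited', amount: 4000, at: '2025-03-10T08:30:00Z' },
];

// Apply a single event to the current balance
function apply(balance, event) {
  switch (event.type) {
    case 'FundsDeposited':
      return balance + event.amount;
    case 'FundsWithdrawn':
      return balance - event.amount;
    default:
      return balance; // ignore event types this projection does not care about
  }
}

// Rebuild state as of `asOf` by replaying only the events up to that moment
function balanceAt(eventLog, asOf) {
  const cutoff = Date.parse(asOf);
  return eventLog
    .filter((event) => Date.parse(event.at) <= cutoff)
    .reduce(apply, 0);
}

console.log(balanceAt(events, '2025-03-06T00:00:00Z')); // 7500 cents (100.00 - 25.00)
console.log(balanceAt(events, '2025-03-31T00:00:00Z')); // 11500 cents
```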

Command Query Responsibility Segregation (CQRS)

Separating read and write operations into different models.

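Diagram (CQRS pattern): on the command side, a Write Client sends a Command to a Command Handler, which updates the Write Model and publishes changes to an Event Bus. On the query side, a Read Client sends a Query to a Query Handler, which reads from a Read Model kept up to date from the Event Bus.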

CQRS Example: E-commerce Product Catalog

  • Write model (Commands):
    • CreateProduct
    • UpdateProductDetails
    • UpdateInventory
    • ChangeProductPrice
  • Read models (Queries):
    • ProductSearchResults (optimized for search)
    • ProductDetailsPage (with reviews, related products)
    • ProductInventorySummary (for internal use)
    • ProductPricingReport (for analytics)

Each read model is purpose-built for specific use cases and optimized for query performance.
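A compressed sketch of the catalog example: commands change the write model and emit events, and each read model is a projection that consumes those events and keeps its own query-friendly shape. The command names follow the list above, but the handler, event, and projection code are our own illustration. Everything runs in memory and updates synchronously; in a real system the projections would consume events from a broker and lag slightly behind the write side.

```javascript
// Write model: the authoritative product state, changed only through commands
const products = new Map();
const subscribers = [];
const publish = (event) => subscribers.forEach((handle) => handle(event));

function handleCommand(command) {
  switch (command.type) {
    case 'CreateProduct':
      products.set(command.id, { id: command.id, name: command.name, price: command.price, stock: 0 });
      publish({ ...command, type: 'ProductCreated' });
      break;
    case 'UpdateInventory':
      if (!products.has(command.id)) throw new Error(`Unknown product ${command.id}`);
      products.get(command.id).stock = command.stock;
      publish({ type: 'InventoryUpdated', id: command.id, stock: command.stock });
      break;
    default:
      throw new Error(`Unsupported command ${command.type}`);
  }
}

// Read model 1 (ProductSearchResults): lowercase keyword -> Set of product IDs
const searchIndex = new Map();
subscribers.push((event) => {
  if (event.type !== 'ProductCreated') return;
  for (const word of event.name.toLowerCase().split(/\s+/)) {
    if (!searchIndex.has(word)) searchIndex.set(word, new Set());
    searchIndex.get(word).add(event.id);
  }
});

// Read model 2 (ProductInventorySummary): IDs of products that are out of stock
const outOfStock = new Set();
subscribers.push((event) => {
  if (event.type === 'ProductCreated') outOfStock.add(event.id);
  if (event.type === 'InventoryUpdated') {
    if (event.stock > 0) outOfStock.delete(event.id);
    else outOfStock.add(event.id);
  }
});

// Queries read only from the projections, never from the write model
const searchProducts = (keyword) => [...(searchIndex.get(keyword.toLowerCase()) ?? [])];
const listOutOfStock = () => [...outOfStock];
```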

Choosing the Right Communication Pattern

The choice between synchronous and asynchronous communication depends on several factors:

| Scenario | Recommended Pattern | Rationale |
|---|---|---|
| User waiting for immediate response | Synchronous | Provides immediate feedback |
| Background processing | Asynchronous | No need for immediate response |
| Need for guaranteed delivery | Asynchronous with queues | Persistent messaging ensures delivery |
| Operation that affects multiple services | Event-driven | Loose coupling, better scalability |
| Simple data retrieval | Synchronous | Simpler implementation for straightforward queries |
| High throughput requirements | Asynchronous | Better handling of traffic spikes |
| Complex workflows | Orchestration or Choreography | Coordinates multiple services |

Analogy: Business Communication Types

Synchronous, asynchronous, and event-driven communication resemble different ways colleagues communicate at work:

  • Synchronous = In-person meetings: Immediate feedback, real-time collaboration, but requires everyone to be available at the same time and can waste time waiting.
  • Asynchronous = Email communication: Participants respond when available, allows for parallel work, better documentation, but slower feedback cycle.
  • Event-driven = Company announcement board: Information is published centrally, interested parties subscribe to updates, no direct coordination needed.

Hybrid Approach: Uber Ride Service

Many real-world systems use a combination of patterns:

  • Synchronous (REST/gRPC): Ride requests, driver acceptance, payment processing
  • Asynchronous (Message Queue): Driver location updates, receipt generation
  • Event-Driven: Ride status changes, notification triggers
  • Event Sourcing: Complete ride history for auditing and analytics

Practical Exercise

Designing Communication Patterns for a Food Delivery Service

For this exercise, you'll design the communication patterns for a food delivery service with the following microservices:

Complete the following tasks:

  1. Identify which interactions should use synchronous communication and which should use asynchronous
  2. Design the event schema for at least three key events in the system
  3. Draw a diagram showing the communication flow for the "Place Order" process
  4. Implement error handling strategies for at least two failure scenarios

Example Event Schema: OrderCreated


{
  "eventType": "OrderCreated",
  "version": "1.0",
  "id": "ede39fde-1d32-4231-a1f5-1a63f7c3ef77",
  "source": "order-service",
  "timestamp": "2025-03-15T14:30:45.123Z",
  "correlationId": "cust-order-5678",
  "data": {
    "orderId": "ORD-12345",
    "customerId": "CUST-6789",
    "restaurantId": "REST-1234",
    "items": [
      {
        "itemId": "ITEM-001",
        "name": "Margherita Pizza",
        "quantity": 2,
        "price": 12.99,
        "specialInstructions": "Extra cheese"
      }
    ],
    "totalAmount": 25.98,
    "deliveryAddress": {
      "street": "123 Main St",
      "city": "San Francisco",
      "state": "CA",
      "zipCode": "94105"
    },
    "orderTime": "2025-03-15T14:30:40.000Z",
    "estimatedDeliveryTime": "2025-03-15T15:15:00.000Z"
  }
}
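For task 2, it can help to generate the common envelope fields in one place so every event carries the same metadata. The helper below is a sketch that produces events in the shape of the `OrderCreated` example above; the function names and validation rules are our own assumptions, not a standard (specifications such as CloudEvents define similar envelope attributes).

```javascript
const { randomUUID } = require('node:crypto');

// Build an event envelope matching the example schema (eventType, version, id, ...)
function createEvent({ eventType, source, data, correlationId, version = '1.0' }) {
  return {
    eventType,
    version,
    id: randomUUID(),                    // unique per event; consumers can use it to deduplicate
    source,
    timestamp: new Date().toISOString(),
    correlationId,                       // ties together all events from one business flow
    data,
  };
}

// Minimal structural check before publishing (illustrative rules only)
function validateEvent(event) {
  const errors = [];
  for (const field of ['eventType', 'version', 'id', 'source', 'timestamp', 'data']) {
    if (event[field] === undefined || event[field] === null) {
      errors.push(`missing ${field}`);
    }
  }
  if (event.timestamp && Number.isNaN(Date.parse(event.timestamp))) {
    errors.push('timestamp is not a valid date');
  }
  return errors;
}

const orderCreated = createEvent({
  eventType: 'OrderCreated',
  source: 'order-service',
  correlationId: 'cust-order-5678',
  data: { orderId: 'ORD-12345', customerId: 'CUST-6789', restaurantId: 'REST-1234' },
});
console.log(validateEvent(orderCreated)); // []
```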
        

Conclusion and Key Takeaways

In the next lecture, we'll explore API Gateway implementation, examining how to create a unified entry point for microservices that handles cross-cutting concerns.

Additional Resources