Dynamo Request Planes User Guide

View as Markdown

Overview

Dynamo supports multiple transport mechanisms for its request plane (the communication layer between services). You can choose from three different request plane modes based on your deployment requirements:

  • TCP (default): Direct TCP connection for optimal performance
  • NATS: Message broker-based request plane
  • HTTP: HTTP/2-based request plane

This guide explains how to configure and use request plane in your Dynamo deployment.

What is a Request Plane?

The request plane is the transport layer that handles communication between Dynamo services (e.g., frontend to backend, worker to worker). Different request planes offer different trade-offs:

Request PlaneSuitable ForCharacteristics
NATSProduction deployments with KV routingRequires NATS infrastructure, provides pub/sub patterns, highest flexibility
TCPLow-latency direct communicationDirect connections, minimal overhead
HTTPStandard deployments, debuggingHTTP/2 protocol, easier observability with standard tools, widely compatible

Request Plane vs KV Event Plane

Dynamo has two independent communication planes:

  • Request plane (DYN_REQUEST_PLANE): how RPC requests flow between components (frontend → router → worker), via tcp, http, or nats.
  • KV event plane (currently only NATS is supported): how KV cache events (and optional router replica sync) are distributed/persisted for KV-aware routing.

Note: If you are using tcp or http request plane with KV events enabled (default), NATS is automatically initialized. You can optionally configure NATS_SERVER environment variable (e.g., NATS_SERVER=nats://nats-hostname:port) to specify a custom NATS server; otherwise, it defaults to localhost:4222. To completely disable NATS, use --no-kv-events on the frontend.

Because they are independent, you can mix them.

For example, a deployment with TCP request plane can use different KV event planes:

  • JetStream KV events: requests use TCP, KV routing still uses NATS JetStream + object store for persistence.
  • NATS Core KV events (local indexer): requests use TCP, KV events use NATS Core pub/sub and persistence lives on workers.
  • no KV events: requests use TCP and KV routing runs without events (no NATS required, but no event-backed persistence).

Configuration

Environment Variable

Set the request plane mode using the DYN_REQUEST_PLANE environment variable:

$export DYN_REQUEST_PLANE=<mode>

Where <mode> is one of:

  • tcp (default)
  • nats
  • http

The value is case-insensitive.

Default Behavior

If DYN_REQUEST_PLANE is not set or contains an invalid value, Dynamo defaults to tcp.

Usage Examples

Using TCP (Default)

TCP is the default request plane and provides direct, low-latency communication between services.

Configuration:

$# TCP is the default, so no need to set DYN_REQUEST_PLANE explicitly
$# But you can explicitly set it if desired:
$export DYN_REQUEST_PLANE=tcp
$
$# Optional: Configure TCP server host and port
$export DYN_TCP_RPC_HOST=0.0.0.0 # Default host
$# export DYN_TCP_RPC_PORT=9999 # Optional: specify a fixed port
$
$# Run your Dynamo service
$DYN_REQUEST_PLANE=tcp python -m dynamo.frontend --http-port=8000 &
$DYN_REQUEST_PLANE=tcp python -m dynamo.vllm --model Qwen/Qwen3-0.6B

Note: By default, TCP uses an OS-assigned free port (port 0). This is ideal for environments where multiple services may run on the same machine or when you want to avoid port conflicts. If you need a specific port (e.g., for firewall rules), set DYN_TCP_RPC_PORT explicitly.

When to use TCP:

  • Simple deployments with direct service-to-service communication (e.g. frontend to backend)
  • Minimal infrastructure requirements (NATS is initialized by default for KV events but can be disabled with --no-kv-events)
  • Low-latency requirements

TCP Configuration Options:

Additional TCP-specific environment variables:

  • DYN_TCP_RPC_HOST: Server host address (default: auto-detected)
  • DYN_TCP_RPC_PORT: Server port. If not set, the OS assigns a free port automatically (recommended for most deployments). Set explicitly only if you need a specific port for firewall rules.
  • DYN_TCP_MAX_MESSAGE_SIZE: Maximum message size for TCP client (default: 32MB)
  • DYN_TCP_REQUEST_TIMEOUT: Request timeout for TCP client (default: 10 seconds)
  • DYN_TCP_POOL_SIZE: Connection pool size for TCP client (default: 50)
  • DYN_TCP_CONNECT_TIMEOUT: Connect timeout for TCP client (default: 3 seconds)
  • DYN_TCP_CHANNEL_BUFFER: Request channel buffer size for TCP client (default: 100)

Using HTTP

HTTP/2 provides a standards-based request plane that’s easy to debug and widely compatible.

Configuration:

$# Optional: Configure HTTP server host and port
$export DYN_HTTP_RPC_HOST=0.0.0.0 # Default host
$export DYN_HTTP_RPC_PORT=8888 # Default port
$export DYN_HTTP_RPC_ROOT_PATH=/v1/rpc # Default path
$
$# Run your Dynamo service
$DYN_REQUEST_PLANE=http python -m dynamo.frontend --http-port=8000 &
$DYN_REQUEST_PLANE=http python -m dynamo.vllm --model Qwen/Qwen3-0.6B

When to use HTTP:

  • Standard deployments requiring HTTP compatibility
  • Debugging scenarios (use curl, browser tools, etc.)
  • Integration with HTTP-based infrastructure
  • Load balancers and proxies that work with HTTP

HTTP Configuration Options:

Additional HTTP-specific environment variables:

  • DYN_HTTP_RPC_HOST: Server host address (default: auto-detected)
  • DYN_HTTP_RPC_PORT: Server port (default: 8888)
  • DYN_HTTP_RPC_ROOT_PATH: Root path for RPC endpoints (default: /v1/rpc)

DYN_HTTP2_*: Various HTTP/2 client configuration options

  • DYN_HTTP2_MAX_FRAME_SIZE: Maximum frame size for HTTP client (default: 1MB)
  • DYN_HTTP2_MAX_CONCURRENT_STREAMS: Maximum concurrent streams for HTTP client (default: 1000)
  • DYN_HTTP2_POOL_MAX_IDLE_PER_HOST: Maximum idle connections per host for HTTP client (default: 100)
  • DYN_HTTP2_POOL_IDLE_TIMEOUT_SECS: Idle timeout for HTTP client (default: 90 seconds)
  • DYN_HTTP2_KEEP_ALIVE_INTERVAL_SECS: Keep-alive interval for HTTP client (default: 30 seconds)
  • DYN_HTTP2_KEEP_ALIVE_TIMEOUT_SECS: Keep-alive timeout for HTTP client (default: 10 seconds)
  • DYN_HTTP2_ADAPTIVE_WINDOW: Enable adaptive flow control (default: true)

Using NATS

NATS provides durable jetstream messaging for request plane and can be used for KV events (and router replica sync).

Prerequisites:

  • NATS server must be running and accessible
  • Configure NATS connection via standard Dynamo NATS environment variables
$# Explicitly set to NATS
$export DYN_REQUEST_PLANE=nats
$
$# Run your Dynamo service
$DYN_REQUEST_PLANE=nats python -m dynamo.frontend --http-port=8000 &
$DYN_REQUEST_PLANE=nats python -m dynamo.vllm --model Qwen/Qwen3-0.6B

When to use NATS:

  • Production deployments with service discovery
  • KV-aware routing with accurate cache state tracking (requires NATS for event transport). Note: approximate mode (--no-kv-events) provides KV routing without NATS but with reduced accuracy.
  • Need for message replay and persistence features

Limitations:

  • NATS does not support payloads beyond 16MB (use TCP for larger payloads)

Complete Example

Here’s a complete example showing how to launch a Dynamo deployment with different request planes:

See examples/backends/vllm/launch/agg_request_planes.sh for a complete working example that demonstrates launching Dynamo with TCP, HTTP, or NATS request planes.

Real-World Example

The Dynamo repository includes a complete example demonstrating all three request planes:

Location: examples/backends/vllm/launch/agg_request_planes.sh

$cd examples/backends/vllm/launch
$
$# Run with TCP
$./agg_request_planes.sh --tcp
$
$# Run with HTTP
$./agg_request_planes.sh --http
$
$# Run with NATS
$./agg_request_planes.sh --nats

Architecture Details

Network Manager

The request plane implementation is centralized in the Network Manager (lib/runtime/src/pipeline/network/manager.rs), which:

  1. Reads the DYN_REQUEST_PLANE environment variable at startup
  2. Creates the appropriate server and client implementations
  3. Provides a transport-agnostic interface to the rest of the codebase
  4. Manages all network configuration and lifecycle

Transport Abstraction

All request plane implementations conform to common trait interfaces:

  • RequestPlaneServer: Server-side interface for receiving requests
  • RequestPlaneClient: Client-side interface for sending requests

This abstraction means your application code doesn’t need to change when switching request planes.

Configuration Loading

Request plane configuration is loaded from environment variables at startup and cached globally. The configuration hierarchy is:

  1. Mode Selection: DYN_REQUEST_PLANE (defaults to tcp)
  2. Transport-Specific Config: Mode-specific environment variables (e.g., DYN_TCP_*, DYN_HTTP2_*)

Migration Guide

From NATS to TCP

  1. Stop your Dynamo services
  2. Set environment variable DYN_REQUEST_PLANE=tcp
  3. Optionally configure TCP-specific settings (e.g., DYN_TCP_RPC_HOST). Note: DYN_TCP_RPC_PORT is optional; if not set, an OS-assigned free port is used automatically.
  4. Restart your services

From NATS to HTTP

  1. Stop your Dynamo services
  2. Set environment variable DYN_REQUEST_PLANE=http
  3. Optionally configure HTTP-specific settings (DYN_HTTP_RPC_PORT, etc.)
  4. Restart your services

Testing the Migration

After switching request planes, verify your deployment:

$# Test with a simple request
$curl http://localhost:8000/v1/chat/completions \
> -H "Content-Type: application/json" \
> -d '{
> "model": "Qwen/Qwen3-0.6B",
> "messages": [{"role": "user", "content": "Hello!"}]
> }'

Troubleshooting

Issue: Services Can’t Communicate

Symptoms: Requests timeout or fail to reach the backend

Solutions:

  • Verify all services use the same DYN_REQUEST_PLANE setting
  • Check that server ports are not blocked by k8s network policies or firewalls
  • For TCP/HTTP: Ensure host/port configurations are correct and accessible
  • For NATS: Verify NATS server is running and accessible

Issue: “Invalid request plane mode” Error

Symptoms: Service fails to start with configuration error

Solutions:

  • Check DYN_REQUEST_PLANE spelling (valid values: nats, tcp, http)
  • Value is case-insensitive but must be one of the three options
  • If not set, defaults to tcp

Issue: Port Conflicts

Symptoms: Server fails to start due to “address already in use”

Solutions:

  • TCP: By default, TCP uses an OS-assigned free port, so port conflicts should be rare. If you explicitly set DYN_TCP_RPC_PORT to a specific port and get conflicts, either change the port or remove the setting to use automatic port assignment.
  • HTTP default port: 8888 (adjust environment variable DYN_HTTP_RPC_PORT)

Performance Considerations

Latency

  • TCP: Lowest latency due to direct connections and binary serialization
  • HTTP: Moderate latency with HTTP/2 overhead
  • NATS: Moderate latency due to nats jet stream persistence

Resource Usage

  • TCP: Minimal infrastructure (NATS required only if using KV events, can disable with --no-kv-events)
  • HTTP: Minimal infrastructure (NATS required only if using KV events, can disable with --no-kv-events)
  • NATS: Requires running NATS server (additional memory/CPU)