Dynamo Event Plane

The event plane provides Dynamo with a pub/sub layer for near real-time event exchange between components. It delivers KV cache updates, worker load metrics, and sequence tracking events, enabling features like KV-aware routing and disaggregated serving.

When Is the Event Plane Used?

Key use cases:

  • KV cache events: Workers publish cache state so the router can make cache-aware scheduling decisions.
  • Worker load metrics: Workers report utilization so the router can balance load.
  • Sequence tracking: Coordinates active sequences across router replicas for fault-tolerant routing.

Figure: Event plane architecture showing NATS and ZMQ transport options connecting Frontend, Planner, and Worker.

Choosing a Transport

The event plane supports two transports:

Transport               | NATS (default)                 | ZMQ
External infrastructure | Requires a NATS server         | None (peer-to-peer)
Setup complexity        | Simple: point at a NATS server | Automatic: workers bind sockets and register via discovery
Best for                | Large-scale deployments        | Low operational overhead

Configuration

Transport Selection

Set the DYN_EVENT_PLANE environment variable to choose a transport:

# Use NATS (default -- no need to set explicitly)
$ export DYN_EVENT_PLANE=nats

# Use ZMQ
$ export DYN_EVENT_PLANE=zmq

Python components also accept this as a CLI flag:

# vLLM backend
$ python3 -m dynamo.vllm --event-plane zmq --model Qwen/Qwen3-0.6B

# SGLang backend
$ python3 -m dynamo.sglang --event-plane zmq --model Qwen/Qwen3-0.6B

Environment Variables

Variable        | Description                           | Default
DYN_EVENT_PLANE | Transport: nats or zmq                | nats
NATS_SERVER     | NATS server URL (NATS transport only) | nats://localhost:4222
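
The defaults in the table above can be reproduced in a launch script, which is handy for logging the effective configuration before starting any component. The following is a minimal sketch, not part of Dynamo: the function names `effective_event_plane` and `effective_nats_server` are hypothetical, and it assumes that an empty variable behaves like an unset one and that only the lowercase values `nats` and `zmq` are valid.

```shell
# Hypothetical helper, not part of Dynamo: print the transport that the
# documented defaults imply for the current environment.
effective_event_plane() {
  # Assumption: an empty value is treated like an unset one.
  transport="${DYN_EVENT_PLANE:-nats}"
  case "$transport" in
    nats|zmq)
      printf '%s\n' "$transport"
      ;;
    *)
      # Anything other than the two documented values is rejected.
      printf 'unsupported DYN_EVENT_PLANE value: %s\n' "$transport" >&2
      return 1
      ;;
  esac
}

# Hypothetical helper: print the NATS URL, falling back to the documented default.
effective_nats_server() {
  printf '%s\n' "${NATS_SERVER:-nats://localhost:4222}"
}
```

Calling `effective_event_plane` at the top of a wrapper script catches a mistyped transport name early instead of leaving it to the component to interpret.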

NATS Transport

When using NATS (DYN_EVENT_PLANE=nats or unset):

  • Requires a running NATS server. Set NATS_SERVER if it is not on localhost:4222.
  • Events are published to NATS subjects scoped by namespace and component.
  • Built-in reconnection and message buffering during brief disconnections.

Example setup:

$ export NATS_SERVER=nats://nats-server:4222
$ export DYN_EVENT_PLANE=nats

# Start workers -- they publish events to NATS automatically
$ python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B

# Start frontend -- it subscribes to events from NATS automatically
$ python3 -m dynamo.frontend --router-mode kv
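
Because the NATS transport depends on a reachable server, a preflight check before starting workers can save debugging time. The sketch below is an illustration rather than a Dynamo feature: `nats_host_port` and `nats_reachable` are hypothetical helpers, the parser handles only a single `nats://host[:port]` URL (optionally with credentials), and the reachability check assumes an `nc` binary that supports `-z` and `-w`.

```shell
# Hypothetical helper, not part of Dynamo: print "host port" for NATS_SERVER.
# Handles a single nats://[user:pass@]host[:port] URL; IPv6 literals and
# comma-separated server lists are out of scope for this sketch.
nats_host_port() {
  url="${NATS_SERVER:-nats://localhost:4222}"
  rest="${url#*://}"      # drop the scheme
  rest="${rest##*@}"      # drop credentials, if any
  rest="${rest%%/*}"      # drop any trailing path
  host="${rest%%:*}"      # everything before the first colon
  case "$rest" in
    *:*) port="${rest##*:}" ;;
    *)   port=4222 ;;     # standard NATS client port
  esac
  printf '%s %s\n' "$host" "$port"
}

# Returns 0 if a TCP connection to the NATS server succeeds within 2 seconds.
# Assumes an nc that supports -z and -w (for example OpenBSD netcat or Ncat).
nats_reachable() {
  set -- $(nats_host_port)   # $1 = host, $2 = port (local to this function)
  nc -z -w 2 "$1" "$2"
}
```

A wrapper could run `nats_reachable || echo "NATS server not reachable" >&2` before launching workers. A successful TCP connection only shows that something is listening on the port, not that it is a healthy NATS server.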

ZMQ Transport

When using ZMQ (DYN_EVENT_PLANE=zmq):

  • No external server required. Each worker binds a ZMQ PUB socket and advertises its address through the discovery system.
  • Subscribers automatically discover and connect to all active publishers.
  • When publishers come and go (e.g., workers scaling up/down), subscribers dynamically adjust their connections.

Example setup:

$ export DYN_EVENT_PLANE=zmq

# Start workers -- each binds a ZMQ socket, registers with discovery
$ python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B

# Start frontend -- discovers workers and connects directly
$ python3 -m dynamo.frontend --router-mode kv

Disabling the Event Plane

If you do not need the router to consume live KV cache events, you can disable the event plane entirely:

$ python3 -m dynamo.frontend --router-mode kv --no-kv-events

With --no-kv-events:

  • The router falls back to prediction-based cache-aware routing: it estimates each worker's cache state from its own routing decisions.
  • TTL-based expiration and LRU pruning keep the predicted state bounded and prevent it from going stale.
  • No NATS server or ZMQ sockets are needed.

Deployment Modes

Bare Metal / Local

Both transports work out of the box:

# NATS (requires nats-server running)
$ export NATS_SERVER=nats://localhost:4222

# OR ZMQ (no extra infrastructure)
$ export DYN_EVENT_PLANE=zmq

Kubernetes (with Dynamo Operator)

The operator can inject DYN_EVENT_PLANE into pods. The same transport options apply. If using NATS, deploy a NATS server in the cluster and set NATS_SERVER accordingly.