Feature Benchmarks
Feature Benchmarks evaluate Dynamo features, topologies, and feature stacks under controlled traffic. Each page states the question it answers, compares deployable configurations, shows how to reproduce the run, and links to the matching Recipe target when one of the configurations is meant to be deployed directly.
Features Under Test
Serving techniques and topology changes benchmarked across the comparisons below.
KV-aware routing: routes traffic to workers that already hold reusable KV cache, so TTFT, ITL, and goodput can improve on prefix-heavy workloads.
Prefill/decode (P/D) split: separates prompt prefill and token decode into specialized worker pools for long-context latency and throughput tests.
WideEP: spreads MoE experts across a wider GPU set so expert-heavy requests get more parallel capacity.
Multimodal embedding cache: reuses multimodal embeddings, especially for repeated images, instead of recomputing them for every request.
Speculative decoding: drafts candidate tokens and verifies them with the target model; Eagle3 is the speculative path used here.
KV offload: moves colder KV blocks to a host-memory tier so longer contexts fit without keeping all KV on GPU.
Frontend decoding: moves decode coordination into the Dynamo frontend so routing and cache policy can act before backend execution.
Multi-node serving: runs serving workers across node boundaries to compare aggregate, single-node P/D, and multi-node P/D shapes.
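Several of the features above are judged by TTFT, ITL, and goodput. As a rough sketch of how these metrics could be derived from per-token timestamps, the snippet below uses a hypothetical `RequestTrace` record and assumes goodput means "requests per second that meet both a TTFT and a mean-ITL target". The individual benchmark pages may define their SLOs and goodput differently, so treat the names and thresholds here as illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Hypothetical per-request record; not a Dynamo or vLLM type."""
    sent_at: float            # seconds, when the request was issued
    token_times: List[float]  # seconds, arrival time of each output token


def ttft(trace: RequestTrace) -> float:
    """Time to first token: first token arrival minus send time."""
    return trace.token_times[0] - trace.sent_at


def mean_itl(trace: RequestTrace) -> float:
    """Mean inter-token latency: average gap between consecutive tokens."""
    gaps = [b - a for a, b in zip(trace.token_times, trace.token_times[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0


def goodput(traces: List[RequestTrace], duration_s: float,
            ttft_slo_s: float, itl_slo_s: float) -> float:
    """Assumed definition: SLO-compliant requests per second of wall time."""
    ok = sum(
        1 for t in traces
        if ttft(t) <= ttft_slo_s and mean_itl(t) <= itl_slo_s
    )
    return ok / duration_s


# Made-up traces for illustration; these are not benchmark results.
traces = [
    RequestTrace(sent_at=0.0, token_times=[0.2, 0.25, 0.30]),
    RequestTrace(sent_at=0.0, token_times=[0.9, 1.0, 1.1]),
]
```

With a 0.5 s TTFT target and a 0.1 s ITL target over a 2 s window, only the first trace qualifies, so `goodput(traces, 2.0, 0.5, 0.1)` counts one request over two seconds.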
Agentic coding throughput stack
How much does throughput improve when KV routing, speculative decoding, P/D split, and KV offload are composed?

Frontend decoding plus embedding cache
How do Dynamo frontend decoding and embedding cache change a single-GPU multimodal benchmark versus vanilla vLLM serve?

Multimodal embedding cache
How much does enabling the vLLM multimodal embedding cache improve repeated-image traffic on one GB200 worker?

KV-aware routing + WideEP + P/D split
Does disaggregated KV-aware routing with WideEP improve latency and goodput against a GB200 control?

KV-aware routing + prefill/decode split
Does disaggregated KV-aware routing reduce TTFT and ITL compared with aggregated round-robin routing?

Aggregate vs single-node P/D vs multi-node P/D
How do vLLM topologies compare when results are normalized per GPU?
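Per-GPU normalization matters because the three shapes do not necessarily use the same number of GPUs, so raw totals can favor the larger deployment. A minimal sketch of the arithmetic follows; the run labels, throughput figures, and GPU counts are placeholders invented for illustration, not measurements from this benchmark.

```python
def per_gpu(total_tokens_per_s: float, num_gpus: int) -> float:
    """Divide aggregate throughput by the number of GPUs in the deployment."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return total_tokens_per_s / num_gpus


# Placeholder inputs: (aggregate output tokens/s, GPU count). Not real data.
runs = {
    "shape_a": (8000.0, 4),
    "shape_b": (20000.0, 8),
}

# Normalize each run, then rank by per-GPU throughput, highest first.
normalized = {name: per_gpu(tps, gpus) for name, (tps, gpus) in runs.items()}
ranking = sorted(normalized, key=normalized.get, reverse=True)
```

In this toy input, `shape_b` has 2.5 times the raw throughput of `shape_a` but only 1.25 times the per-GPU throughput, which is the comparison the normalized view is meant to surface.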
