---
title: Dynamo Feature Compatibility Matrices
---

This document provides compatibility matrices for key Dynamo features across the supported backends.

*Updated for Dynamo v0.8.0*

**Legend:**

* ✅ : Supported
* 🚧 : Work in Progress / Experimental / Limited

## Quick Comparison

| Feature                   | vLLM  | TensorRT-LLM | SGLang | Source                     |
| :------------------------ | :---: | :----------: | :----: | :------------------------- |
| **Disaggregated Serving** |  ✅   |      ✅      |   ✅   | [Design Doc][disagg]       |
| **KV-Aware Routing**      |  ✅   |      ✅      |   ✅   | [Router Doc][kv-routing]   |
| **SLA-Based Planner**     |  ✅   |      ✅      |   ✅   | [Planner Doc][planner]     |
| **KV Block Manager**      |  ✅   |      ✅      |   🚧   | [KVBM Doc][kvbm]           |
| **Multimodal (Image)**    |  ✅   |      ✅      |   ✅   | [Multimodal Doc][mm]       |
| **Multimodal (Video)**    |  ✅   |              |        | [Multimodal Doc][mm]       |
| **Multimodal (Audio)**    |  🚧   |              |        | [Multimodal Doc][mm]       |
| **Request Migration**     |  ✅   |      🚧      |   ✅   | [Migration Doc][migration] |
| **Request Cancellation**  |  ✅   |      ✅      |   🚧   | Backend READMEs            |
| **LoRA**                  |  ✅   |              |        | [K8s Guide][lora]          |
| **Tool Calling**          |  ✅   |      ✅      |   ✅   | [Tool Calling Doc][tools]  |
| **Speculative Decoding**  |  ✅   |      ✅      |   🚧   | Backend READMEs            |

## 1. vLLM Backend

vLLM offers the broadest feature coverage in Dynamo, with support for disaggregated serving, KV-aware routing, KV block management, LoRA adapters, and multimodal inference, including video input and experimental audio input.
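
To make the coverage comparison concrete, the Quick Comparison table can be encoded as a small lookup. This is an illustrative sketch only: `QUICK_COMPARISON`, `coverage`, and `backends_supporting` are hypothetical names and not part of Dynamo. The statuses are transcribed from the table above, and blank cells are assumed here to mean "not listed as supported".

```python
from __future__ import annotations

from collections import Counter

SUPPORTED = "supported"  # ✅ in the table
WIP = "wip"              # 🚧: work in progress / experimental / limited
UNLISTED = "unlisted"    # blank cell; assumed to mean "not listed as supported"

# Column order of the Quick Comparison table.
BACKENDS = ("vLLM", "TensorRT-LLM", "SGLang")

# One row per feature, statuses in BACKENDS order (transcribed from the table).
QUICK_COMPARISON = {
    "Disaggregated Serving": (SUPPORTED, SUPPORTED, SUPPORTED),
    "KV-Aware Routing":      (SUPPORTED, SUPPORTED, SUPPORTED),
    "SLA-Based Planner":     (SUPPORTED, SUPPORTED, SUPPORTED),
    "KV Block Manager":      (SUPPORTED, SUPPORTED, WIP),
    "Multimodal (Image)":    (SUPPORTED, SUPPORTED, SUPPORTED),
    "Multimodal (Video)":    (SUPPORTED, UNLISTED, UNLISTED),
    "Multimodal (Audio)":    (WIP, UNLISTED, UNLISTED),
    "Request Migration":     (SUPPORTED, WIP, SUPPORTED),
    "Request Cancellation":  (SUPPORTED, SUPPORTED, WIP),
    "LoRA":                  (SUPPORTED, UNLISTED, UNLISTED),
    "Tool Calling":          (SUPPORTED, SUPPORTED, SUPPORTED),
    "Speculative Decoding":  (SUPPORTED, SUPPORTED, WIP),
}


def coverage(backend: str) -> Counter:
    """Count how many features fall into each status for one backend."""
    idx = BACKENDS.index(backend)
    return Counter(row[idx] for row in QUICK_COMPARISON.values())


def backends_supporting(*features: str) -> list[str]:
    """Return the backends that mark every requested feature as supported (✅)."""
    return [
        backend
        for idx, backend in enumerate(BACKENDS)
        if all(QUICK_COMPARISON[feature][idx] == SUPPORTED for feature in features)
    ]


if __name__ == "__main__":
    for backend in BACKENDS:
        print(backend, dict(coverage(backend)))
    print(backends_supporting("KV Block Manager", "Request Migration"))
```

A lookup like this only reflects the summary table; the per-backend matrices and their notes still need to be consulted for feature combinations.
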

*Source: [vLLM Backend][vllm-readme]*

| Feature                   | Disaggregated Serving | KV-Aware Routing | SLA-Based Planner | KV Block Manager | Multimodal | Request Migration | Request Cancellation | LoRA  | Tool Calling | Speculative Decoding |
| :------------------------ | :-------------------: | :--------------: | :---------------: | :--------------: | :--------: | :---------------: | :------------------: | :---: | :----------: | :------------------: |
| **Disaggregated Serving** | — | | | | | | | | | |
| **KV-Aware Routing**      | ✅ | — | | | | | | | | |
| **SLA-Based Planner**     | ✅ | ✅ | — | | | | | | | |
| **KV Block Manager**      | ✅ | ✅ | ✅ | — | | | | | | |
| **Multimodal**            | ✅ | <sup>1</sup> | — | ✅ | — | | | | | |
| **Request Migration**     | ✅ | ✅ | ✅ | ✅ | ✅ | — | | | | |
| **Request Cancellation**  | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — | | | |
| **LoRA**                  | ✅ | ✅<sup>2</sup> | — | ✅ | — | ✅ | ✅ | — | | |
| **Tool Calling**          | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — | |
| **Speculative Decoding**  | ✅ | ✅ | — | ✅ | — | ✅ | ✅ | — | ✅ | — |

> **Notes:**
>
> 1. **Multimodal + KV-Aware Routing**: The KV router uses token-based hashing and does not yet support image/video hashes, so it falls back to random/round-robin routing. ([Source][kv-routing])
> 2. **KV-Aware LoRA Routing**: vLLM supports routing requests based on LoRA adapter affinity.
> 3. **Audio Support**: vLLM supports audio models like Qwen2-Audio (experimental). ([Source][mm-vllm])
> 4. **Video Support**: vLLM supports video input with frame sampling. ([Source][mm-vllm])
> 5. **Speculative Decoding**: Eagle3 support documented. ([Source][vllm-spec])

## 2. SGLang Backend

SGLang is optimized for high-throughput serving with fast primitives, providing robust support for disaggregated serving, KV-aware routing, and request migration.

*Source: [SGLang Backend][sglang-readme]*

| Feature                   | Disaggregated Serving | KV-Aware Routing | SLA-Based Planner | KV Block Manager | Multimodal | Request Migration | Request Cancellation | LoRA  | Tool Calling | Speculative Decoding |
| :------------------------ | :-------------------: | :--------------: | :---------------: | :--------------: | :--------: | :---------------: | :------------------: | :---: | :----------: | :------------------: |
| **Disaggregated Serving** | — | | | | | | | | | |
| **KV-Aware Routing**      | ✅ | — | | | | | | | | |
| **SLA-Based Planner**     | ✅ | ✅ | — | | | | | | | |
| **KV Block Manager**      | 🚧 | 🚧 | 🚧 | — | | | | | | |
| **Multimodal**            | ✅<sup>2</sup> | <sup>1</sup> | — | 🚧 | — | | | | | |
| **Request Migration**     | ✅ | ✅ | ✅ | 🚧 | ✅ | — | | | | |
| **Request Cancellation**  | 🚧<sup>3</sup> | ✅ | ✅ | 🚧 | 🚧 | ✅ | — | | | |
| **LoRA**                  | | | | 🚧 | | | | — | | |
| **Tool Calling**          | ✅ | ✅ | ✅ | 🚧 | ✅ | ✅ | ✅ | | — | |
| **Speculative Decoding**  | 🚧 | 🚧 | — | 🚧 | — | 🚧 | — | | 🚧 | — |

> **Notes:**
>
> 1. **Multimodal + KV-Aware Routing**: Not supported. ([Source][kv-routing])
> 2. **Multimodal Patterns**: Supports **E/PD** and **E/P/D** only (requires separate vision encoder). Does **not** support simple Aggregated (EPD) or Traditional Disagg (EP/D). ([Source][mm-sglang])
> 3. **Request Cancellation**: Cancellation during the remote prefill phase is not supported in disaggregated mode. ([Source][sglang-readme])
> 4. **Speculative Decoding**: Code hooks exist (`spec_decode_stats` in publisher), but no examples or documentation yet.

## 3. TensorRT-LLM Backend

TensorRT-LLM delivers maximum inference performance and optimization, with full KVBM integration and robust disaggregated serving support.

*Source: [TensorRT-LLM Backend][trtllm-readme]*

| Feature                   | Disaggregated Serving | KV-Aware Routing | SLA-Based Planner | KV Block Manager | Multimodal | Request Migration | Request Cancellation | LoRA  | Tool Calling | Speculative Decoding |
| :------------------------ | :-------------------: | :--------------: | :---------------: | :--------------: | :--------: | :---------------: | :------------------: | :---: | :----------: | :------------------: |
| **Disaggregated Serving** | — | | | | | | | | | |
| **KV-Aware Routing**      | ✅ | — | | | | | | | | |
| **SLA-Based Planner**     | ✅ | ✅ | — | | | | | | | |
| **KV Block Manager**      | ✅ | ✅ | ✅ | — | | | | | | |
| **Multimodal**            | ✅<sup>1</sup> | <sup>2</sup> | — | ✅ | — | | | | | |
| **Request Migration**     | 🚧<sup>3</sup> | ✅ | ✅ | ✅ | 🚧 | — | | | | |
| **Request Cancellation**  | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — | | | |
| **LoRA**                  | | | | | | | | — | | |
| **Tool Calling**          | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | — | |
| **Speculative Decoding**  | ✅ | ✅ | — | ✅ | — | ✅ | ✅ | | ✅ | — |

> **Notes:**
>
> 1. **Multimodal Disaggregation**: Fully supports **EP/D** (Traditional) pattern. **E/P/D** (Full Disaggregation) is WIP and currently supports pre-computed embeddings only. ([Source][mm-trtllm])
> 2. **Multimodal + KV-Aware Routing**: Not supported. The KV router currently tracks token-based blocks only. ([Source][kv-routing])
> 3. **Request Migration**: Supported on **Decode/Aggregated** workers only. **Prefill** workers do not support migration. ([Source][trtllm-readme])
> 4. **Speculative Decoding**: Llama 4 + Eagle support documented.
([Source][trtllm-eagle])

---

## Source References

### Backends

[vllm-readme]: /dynamo/v-0-8-0/components/backends/v-llm
[sglang-readme]: /dynamo/v-0-8-0/components/backends/sg-lang
[trtllm-readme]: /dynamo/v-0-8-0/components/backends/tensor-rt-llm

### Design Docs

[disagg]: /dynamo/v-0-8-0/design-docs/disaggregated-serving
[kv-routing]: /dynamo/v-0-8-0/additional-resources/router-details/kv-cache-routing
[planner]: /dynamo/v-0-8-0/components/planner/overview
[kvbm]: /dynamo/v-0-8-0/components/kvbm/overview
[migration]: /dynamo/v-0-8-0/additional-resources/fault-tolerance/request-migration
[tools]: /dynamo/v-0-8-0/user-guides/tool-calling

### Multimodal

[mm]: /dynamo/v-0-8-0/user-guides/multimodality-support
[mm-vllm]: /dynamo/v-0-8-0/additional-resources/multimodal-details/v-llm
[mm-trtllm]: /dynamo/v-0-8-0/additional-resources/multimodal-details/tensor-rt-llm
[mm-sglang]: /dynamo/v-0-8-0/additional-resources/multimodal-details/sg-lang

### Feature-specific

[lora]: /dynamo/v-0-8-0/kubernetes-deployment/deployment-guide/managing-models-with-dynamo-model
[vllm-spec]: /dynamo/v-0-8-0/additional-resources/backend-details/v-llm/speculative-decoding
[trtllm-eagle]: /dynamo/v-0-8-0/additional-resources/backend-details/tensor-rt-llm/llama-4-eagle