SGLang
Running SGLang with Dynamo
Use the Latest Release
We recommend using the latest stable release of Dynamo to avoid breaking changes:
You can find the latest release here and check out the corresponding branch with:
Dynamo SGLang integrates SGLang engines into Dynamo’s distributed runtime, enabling disaggregated serving, KV-aware routing, and request cancellation while maintaining full compatibility with SGLang’s native engine arguments. It supports LLM inference, embedding models, multimodal vision models, and diffusion-based generation (LLM, image, video).
Installation
Install Latest Release
We recommend using uv to install:
This installs Dynamo with the compatible SGLang version.
Install for Development
Development installation
Requires Rust and the CUDA toolkit (nvcc).
This is the ideal way for agents to also develop. You can provide the path to both repos and the virtual environment and have it rerun these commands as it makes changes
Docker
Build and run container
Feature Support Matrix
Quick Start
Python / CLI Deployment
Start infrastructure services for local development:
Launch an aggregated serving deployment:
Verify the deployment:
Kubernetes Deployment
You can deploy SGLang with Dynamo on Kubernetes using a DynamoGraphDeployment. For more details, see the SGLang Kubernetes Deployment Guide.
Next Steps
- Reference Guide: Worker types, architecture, and configuration
- Examples: All deployment patterns with launch scripts
- Disaggregation: P/D architecture and KV transfer details
- Diffusion: LLM, image, and video diffusion models
- Observability: Metrics, tracing, and Grafana dashboards
- Deploying SGLang with Dynamo on Kubernetes: Kubernetes deployment guide