SGLang

Use the Latest Release

We recommend using the latest stable release of Dynamo to avoid breaking changes.


Dynamo SGLang integrates SGLang engines into Dynamo’s distributed runtime, enabling disaggregated serving, KV-aware routing, and request cancellation while maintaining full compatibility with SGLang’s native engine arguments. It supports LLM inference, embedding models, multimodal vision models, and diffusion-based generation (LLM, image, video).

Installation

Install Latest Release

We recommend using uv to install:

$ uv venv --python 3.12 --seed
$ uv pip install --prerelease=allow "ai-dynamo[sglang]"

This installs the latest stable release of Dynamo with the compatible SGLang version.

Install for Development

Docker

Pull and launch the SGLang runtime image:

$ docker run --gpus all -it --rm \
    --network host --shm-size=10G \
    --ulimit memlock=-1 --ulimit stack=67108864 \
    --ulimit nofile=65536:65536 \
    --ipc host \
    lmsysorg/sglang:v{sglang_version}
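The {sglang_version} placeholder has to be replaced with a real SGLang release before the command will run. As a minimal sketch, the helper below builds the image reference from a version string. The function name sglang_image is hypothetical and not part of Dynamo or SGLang; only the lmsysorg/sglang:v prefix comes from the command above.

```shell
# Hypothetical helper (not shipped with Dynamo or SGLang): expand the
# {sglang_version} placeholder into a full image reference.
sglang_image() {
  sgl_version="$1"
  if [ -z "$sgl_version" ]; then
    echo "usage: sglang_image <sglang-version>" >&2
    return 1
  fi
  # Accept the version with or without a leading "v".
  sgl_version="${sgl_version#v}"
  printf 'lmsysorg/sglang:v%s\n' "$sgl_version"
}
```

With a version stored in a variable of your own, such as SGLANG_VERSION, the last line of the docker run command could then be written as "$(sglang_image "$SGLANG_VERSION")".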

Inside the container, install build dependencies and Rust:

$ apt-get update -qq && apt-get install -y -qq \
    build-essential libclang-dev curl git > /dev/null 2>&1

$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
$ source "$HOME/.cargo/env"

$ pip install "maturin[patchelf]"
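Before moving on to the build, it can help to confirm that the tools installed in this step are actually on PATH. The following is a sketch, not part of the Dynamo tooling: check_build_tools is a hypothetical helper, and the tool list simply mirrors the commands used in this section.

```shell
# Hypothetical helper: report any required build tool that is not on PATH.
check_build_tools() {
  missing=""
  for tool in "$@"; do
    # command -v succeeds only if the shell can resolve the name.
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing build tools:$missing" >&2
    return 1
  fi
  echo "all build tools found"
}

# Tools used by the clone and build steps; adjust the list to your setup.
check_build_tools git curl cargo maturin \
  || echo "install the missing tools before building" >&2
```

If cargo is reported missing right after the Rust install, the usual cause is that "$HOME/.cargo/env" has not been sourced in the current shell.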

Clone and build Dynamo:

$ cd /sgl-workspace/
$ git clone https://github.com/ai-dynamo/dynamo.git
$ cd dynamo

$ cd lib/bindings/python/
$ maturin build -o /tmp
$ pip install /tmp/ai_dynamo_runtime*.whl

$ cd /sgl-workspace/dynamo/
$ pip install -e .

Feature Support Matrix

| Feature               | Status | Notes                                            |
|-----------------------|--------|--------------------------------------------------|
| Disaggregated Serving |        | Prefill/decode separation with NIXL KV transfer  |
| KV-Aware Routing      |        |                                                  |
| SLA-Based Planner     |        |                                                  |
| Multimodal Support    |        | Image via EPD, E/PD, E/P/D patterns              |
| Diffusion Models      |        | LLM diffusion, image, and video generation       |
| Request Cancellation  |        | Aggregated full; disaggregated decode-only       |
| Graceful Shutdown     |        | Discovery unregister + grace period              |
| Observability         |        | Metrics, tracing, and Grafana dashboards         |
| KVBM                  |        | Planned                                          |

Quick Start

Python / CLI Deployment

From the root of the Dynamo repository, start the infrastructure services for local development:

$ docker compose -f deploy/docker-compose.yml up -d

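Launch an aggregated serving deployment. DYNAMO_HOME is the root of your Dynamo checkout, for example /sgl-workspace/dynamo if you followed the Docker steps above:

$ cd $DYNAMO_HOME/examples/backends/sglang
$ ./launch/agg.sh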

Verify the deployment:

$ curl localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "Qwen/Qwen3-0.6B",
      "messages": [{"role": "user", "content": "Explain why Roger Federer is considered one of the greatest tennis players of all time"}],
      "stream": true,
      "max_tokens": 30
    }'
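The request above fails with a connection error if it is sent before the frontend has finished starting. One way to handle this, sketched below, is to poll an HTTP endpoint until it responds and only then send the request. The wait_for_endpoint function is a hypothetical helper, and the /v1/models path in the usage comment is an assumption based on the OpenAI-style API shape; substitute any URL you know the frontend serves.

```shell
# Hypothetical helper: poll a URL until it returns a successful HTTP status.
# Arguments: URL, max attempts (default 30), delay in seconds (default 2).
wait_for_endpoint() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP errors; -s: silent; -o /dev/null: discard the body.
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      echo "ready after $i attempt(s): $url"
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "not ready after $attempts attempt(s): $url" >&2
  return 1
}

# Example (assumed endpoint, adjust as needed):
#   wait_for_endpoint http://localhost:8000/v1/models && echo "send the request"
```

The function returns a non-zero status on timeout, so it can gate the verification request in a script with && or an if statement.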

Kubernetes Deployment

You can deploy SGLang with Dynamo on Kubernetes using a DynamoGraphDeployment. For more details, see the SGLang Kubernetes Deployment Guide.

Next Steps