Quickstart

Get a Dynamo OpenAI-compatible endpoint running in a container in about 5 minutes.

Choose Your Path

Dynamo is backend-agnostic — every install path works with SGLang, TensorRT-LLM, and vLLM. Pick the install path that fits your environment, then choose your backend.

Pull a Container

Containers have all dependencies pre-installed. Pick your backend; the example below pulls the SGLang runtime, with the other backends sketched after it:

$ docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.2
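Dynamo publishes equivalent runtime containers for the other backends. The vLLM and TensorRT-LLM image names below are assumptions based on the SGLang naming pattern above; confirm the exact image names and tags in Release Artifacts:

$ docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.2   # image name assumed from the pattern above
$ docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.2   # image name assumed from the pattern above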

Hugging Face token required for gated models. Llama, Kimi, Qwen-VL, and other gated models require HF_TOKEN in your environment and acceptance of the model card’s license on huggingface.co. Set export HF_TOKEN=hf_… before launching.
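For example, a minimal way to pass the token through to the container (the token value is a placeholder) is to export it on the host and forward it with Docker’s -e flag:

$ export HF_TOKEN=hf_xxxxxxxx   # placeholder; use your own token
$ docker run --gpus all --network host --rm -it -e HF_TOKEN nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.2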

For container versions and tags, see Release Artifacts.

Start the Frontend

In your container, start the OpenAI-compatible frontend on port 8000:

$ python3 -m dynamo.frontend --discovery-backend file

--discovery-backend file avoids needing etcd. To run frontend and worker in the same terminal, background each command with > logfile.log 2>&1 &.
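For example, a single-terminal sketch that backgrounds both processes (the log file names are arbitrary, and the worker command is the one from the next step):

$ python3 -m dynamo.frontend --discovery-backend file > frontend.log 2>&1 &
$ python3 -m dynamo.sglang --model-path Qwen/Qwen3-0.6B --discovery-backend file > worker.log 2>&1 &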

Start a Worker

In another terminal, launch a worker for your backend:

$ python3 -m dynamo.sglang --model-path Qwen/Qwen3-0.6B --discovery-backend file
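Workers for the other backends launch from their own modules. The module names and flags below are assumptions rather than verified commands; check the backend guides for the exact arguments:

$ python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --discovery-backend file   # assumed module and flags
$ python3 -m dynamo.trtllm --model-path Qwen/Qwen3-0.6B --discovery-backend file   # assumed module and flags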

Verify and Test

Check the endpoint is up:

$ curl -sf http://localhost:8000/health && echo OK

If you see OK, send a chat completion:

$ curl localhost:8000/v1/chat/completions \
> -H "Content-Type: application/json" \
> -d '{"model": "Qwen/Qwen3-0.6B",
> "messages": [{"role": "user", "content": "Hello!"}],
> "max_tokens": 50}'

Connection refused? The frontend takes a few seconds to start — retry. For production liveness and readiness probes, see Health Checks.
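If you prefer to wait for readiness in a script instead of retrying by hand, one option is curl’s built-in retry flags (a sketch; tune the counts to taste):

$ curl -sf --retry 10 --retry-connrefused --retry-delay 2 http://localhost:8000/health && echo OK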

Dive Deeper

Pick a full install path from the options above, or explore how Dynamo works under the hood.