Quickstart
Get a Dynamo OpenAI-compatible endpoint running in a container in about 5 minutes.
Choose Your Path
- Containers (this page): the fast path, with everything pre-installed.
- Local install: the full walkthrough, from PyPI packages to configuration.
- Kubernetes: production multi-node clusters.
- Source build: for contributors working against main.
Dynamo is backend-agnostic — every install path works with SGLang, TensorRT-LLM, and vLLM. Pick the install path that fits your environment, then choose your backend.
Pull a Container
Containers have all dependencies pre-installed. Pick your backend:
SGLang
TensorRT-LLM
vLLM
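A minimal pull-and-run sketch for the vLLM tab; the registry path and tag here are assumptions, so check Release Artifacts for the exact image name and version for your backend:

```bash
# Pull a backend runtime image (image path and tag are illustrative;
# see Release Artifacts for the exact names).
docker pull nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.4.0

# Run with GPU access and host networking so port 8000 is reachable,
# forwarding HF_TOKEN from the host for gated models.
docker run --gpus all -it --rm --network host \
  -e HF_TOKEN \
  nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.4.0
```

Swap the image for the SGLang or TensorRT-LLM runtime to match your backend choice.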
Hugging Face token required for gated models. Llama, Kimi, Qwen-VL, and other gated models require HF_TOKEN in your environment and acceptance of the model card’s license on huggingface.co. Set export HF_TOKEN=hf_… before launching.
For container versions and tags, see Release Artifacts.
Start the Frontend
In your container, start the OpenAI-compatible frontend on port 8000:
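A sketch of the launch command, assuming the python -m dynamo.frontend entry point and its --http-port flag:

```bash
# Serve the OpenAI-compatible API on port 8000;
# file-based discovery avoids running a separate etcd instance.
python -m dynamo.frontend --http-port 8000 --discovery-backend file
```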
The --discovery-backend file flag avoids the need for a running etcd instance. To run the frontend and worker in the same terminal, background each command with > logfile.log 2>&1 &.
Start a Worker
In another terminal, launch a worker for your backend:
SGLang
TensorRT-LLM
vLLM
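Illustrative launch commands, one per tab. The module names follow Dynamo’s python -m dynamo.&lt;backend&gt; convention, and the model and flags shown are assumptions, so substitute your own:

```bash
# SGLang worker (model and flags are illustrative)
python -m dynamo.sglang --model-path Qwen/Qwen3-0.6B

# TensorRT-LLM worker
python -m dynamo.trtllm --model-path Qwen/Qwen3-0.6B

# vLLM worker
python -m dynamo.vllm --model Qwen/Qwen3-0.6B
```

If your frontend uses file-based discovery, the worker likely needs the matching discovery setting; check your backend’s worker documentation.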
Verify and Test
Check the endpoint is up:
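For example, assuming the frontend exposes a /health route (the exact probe paths are listed in Health Checks):

```bash
# Returns OK once the frontend is ready to accept requests
curl -s http://localhost:8000/health
```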
If you see OK, send a chat completion:
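A standard OpenAI-style request; the model name must match what your worker is serving (Qwen/Qwen3-0.6B is the illustrative model from the worker step above):

```bash
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64
  }'
```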
Connection refused? The frontend takes a few seconds to start — retry. For production liveness and readiness probes, see Health Checks.
From the Blog
How Dynamo optimizes for agentic workloads at three layers: the frontend API, the router, and KV cache management.
How Dynamo’s concurrent global index evolved through six iterations to sustain over 100M ops/sec.
Dive Deeper
Pick a full install path from the four options above, or explore how Dynamo works under the hood: