This guide covers running Dynamo **using the CLI on your local machine or VM**.
**Looking to deploy on Kubernetes instead?**
See the [Kubernetes Installation Guide](/dynamo/v-0-9-0/kubernetes-deployment/deployment-guide/detailed-installation-guide)
and [Kubernetes Quickstart](/dynamo/v-0-9-0/kubernetes-deployment/deployment-guide) for cluster deployments.
## Install Dynamo
**Option A: Containers (Recommended)**
Containers have all dependencies pre-installed. No setup required.
```bash
# SGLang
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.9.0
# TensorRT-LLM
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.9.0
# vLLM
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0
```
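If you want downloaded model weights to persist across container runs, you can mount your local Hugging Face cache into the container. This is optional, and the in-container path below assumes the container runs as root (adjust otherwise):
```bash
# vLLM shown as an example; the same mount works for the other images.
# Persist model downloads across container restarts
docker run --gpus all --network host --rm -it \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0
```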
To run the frontend and a worker in the same container, either:
- Run processes in the background with `&` (see the Run Dynamo section below), or
- Open a second terminal and attach to the running container with `docker exec -it <container-id> bash` (example below)
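For example, to open a second shell into a container started from Option A (the container ID is whatever `docker ps` reports):
```bash
# Find the running container's ID or name
docker ps
# Attach a second shell to it
docker exec -it <container-id> bash
```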
See [Release Artifacts](/dynamo/v-0-9-0/getting-started/release-artifacts#container-images) for available
versions and backend guides for run instructions: [SGLang](/dynamo/v-0-9-0/components/backends/sg-lang) |
[TensorRT-LLM](/dynamo/v-0-9-0/components/backends/tensor-rt-llm) | [vLLM](/dynamo/v-0-9-0/components/backends/v-llm)
**Option B: Install from PyPI**
```bash
# Install uv (recommended Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment
uv venv venv
source venv/bin/activate
uv pip install pip
```
Install system dependencies and the Dynamo wheel for your chosen backend:
**SGLang**
```bash
sudo apt install python3-dev
uv pip install --prerelease=allow "ai-dynamo[sglang]"
```
For CUDA 13 (B300/GB300), the container is recommended. See
[SGLang install docs](https://docs.sglang.io/get_started/install.html) for details.
**TensorRT-LLM**
```bash
sudo apt install python3-dev
pip install torch==2.9.0 torchvision --index-url https://download.pytorch.org/whl/cu130
pip install --pre --extra-index-url https://pypi.nvidia.com "ai-dynamo[trtllm]"
```
TensorRT-LLM requires `pip` due to a transitive Git URL dependency that
`uv` doesn't resolve. We recommend using the TensorRT-LLM container for
broader compatibility. See the [TRT-LLM backend guide](/dynamo/v-0-9-0/components/backends/tensor-rt-llm)
for details.
**vLLM**
```bash
sudo apt install python3-dev libxcb1
uv pip install --prerelease=allow "ai-dynamo[vllm]"
```
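As a quick optional check that the install succeeded (assuming the virtual environment from above is still active):
```bash
# Confirm the wheel is present in the active environment
uv pip show ai-dynamo
# Confirm the dynamo package imports cleanly
python3 -c "import dynamo"
```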
## Run Dynamo
**(Optional)** Before running Dynamo, verify your system configuration by running
`python3 deploy/sanity_check.py` from the root of the Dynamo repository.
Start the frontend, then start a worker for your chosen backend.
To run everything in a single terminal (useful in containers), append `> logfile.log 2>&1 &`
to each command so it runs in the background. Example: `python3 -m dynamo.frontend --store-kv file > dynamo.frontend.log 2>&1 &`
```bash
# Start the OpenAI compatible frontend (default port is 8000)
# --store-kv file avoids needing etcd (frontend and workers must share a disk)
python3 -m dynamo.frontend --store-kv file
```
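Before starting a worker, you can confirm the frontend is listening; this assumes it exposes the standard OpenAI-compatible `/v1/models` route (the model list stays empty until a worker registers):
```bash
# Liveness check against the OpenAI-compatible API
curl localhost:8000/v1/models
```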
In another terminal (or same terminal if using background mode), start a worker:
**SGLang**
```bash
python3 -m dynamo.sglang --model-path Qwen/Qwen3-0.6B --store-kv file
```
**TensorRT-LLM**
```bash
python3 -m dynamo.trtllm --model-path Qwen/Qwen3-0.6B --store-kv file
```
**vLLM**
```bash
python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --store-kv file \
--kv-events-config '{"enable_kv_cache_events": false}'
```
For dependency-free local development, disable KV event publishing (avoids NATS):
- **vLLM:** Add `--kv-events-config '{"enable_kv_cache_events": false}'`
- **SGLang:** No flag needed (KV events disabled by default)
- **TensorRT-LLM:** No flag needed (KV events disabled by default)
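Putting this together, a dependency-free single-terminal session might look like the sketch below (SGLang shown; swap in your backend's worker command):
```bash
# Frontend and worker in one terminal, logs in per-process files.
# With --store-kv file and SGLang's default KV-event settings,
# neither etcd nor NATS is required.
python3 -m dynamo.frontend --store-kv file > dynamo.frontend.log 2>&1 &
python3 -m dynamo.sglang --model-path Qwen/Qwen3-0.6B --store-kv file > dynamo.sglang.log 2>&1 &
# Follow the worker log until the model finishes loading
tail -f dynamo.sglang.log
```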
**TensorRT-LLM only:** The warning `Cannot connect to ModelExpress server/transport error. Using direct download.`
is expected and can be safely ignored.
## Test Your Deployment
```bash
curl localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-0.6B",
       "messages": [{"role": "user", "content": "Hello!"}],
       "max_tokens": 50}'
```
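To pull out just the generated text, pipe the response through `jq` (assuming `jq` is installed; the response follows the standard OpenAI chat-completions shape):
```bash
curl -s localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-0.6B",
       "messages": [{"role": "user", "content": "Hello!"}],
       "max_tokens": 50}' | jq -r '.choices[0].message.content'
```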