# Deploying Dynamo on Kubernetes

High-level guide to Dynamo Kubernetes deployments. Start here, then dive into the specific guides.

## Important Terminology

**Kubernetes Namespace**: The K8s namespace where your DynamoGraphDeployment resource is created.

- Used for: Resource isolation, RBAC, organizing deployments
- Example: `dynamo-system`, `team-a-namespace`

**Dynamo Namespace**: The logical namespace used by Dynamo components for [service discovery](/dynamo/v-0-9-0/kubernetes-deployment/deployment-guide/service-discovery).

- Used for: Runtime component communication, service discovery
- Specified in: the `.spec.services.<ServiceName>.dynamoNamespace` field
- Example: `my-llm`, `production-model`, `dynamo-dev`

These are independent: a single Kubernetes namespace can host multiple Dynamo namespaces, and a single Dynamo namespace can span multiple Kubernetes namespaces.

## Prerequisites

Before you begin, ensure you have the following tools installed:

| Tool | Minimum Version | Installation Guide |
|------|-----------------|--------------------|
| **kubectl** | v1.24+ | [Install kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl) |
| **Helm** | v3.0+ | [Install Helm](https://helm.sh/docs/intro/install/) |

Verify your installation:

```bash
kubectl version --client  # Should show v1.24+
helm version              # Should show v3.0+
```

For detailed installation instructions, see the [Prerequisites section](/dynamo/v-0-9-0/kubernetes-deployment/deployment-guide/detailed-installation-guide#prerequisites) in the Installation Guide.

## Pre-deployment Checks

Before deploying the platform, run the pre-deployment checks to ensure the cluster is ready:

```bash
./deploy/pre-deployment/pre-deployment-check.sh
```

This validates kubectl connectivity, StorageClass configuration, and GPU availability. See [pre-deployment checks](https://github.com/ai-dynamo/dynamo/tree/main/deploy/pre-deployment/README) for more details.

## 1. Install Platform First

```bash
# 1. Set environment
export NAMESPACE=dynamo-system
export RELEASE_VERSION=0.x.x  # any Dynamo version 0.3.2+ listed at https://github.com/ai-dynamo/dynamo/releases

# 2. Install CRDs (skip if on a shared cluster where CRDs already exist)
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz --namespace default

# 3. Install Platform
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE} --create-namespace
```

**For Shared/Multi-Tenant Clusters:** If your cluster has namespace-restricted Dynamo operators, add this flag to step 3:

```bash
--set dynamo-operator.namespaceRestriction.enabled=true
```

For more details and customization options (including multinode deployments), see the **[Installation Guide for Dynamo Kubernetes Platform](/dynamo/v-0-9-0/kubernetes-deployment/deployment-guide/detailed-installation-guide)**.
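Before moving on, it can be worth a quick sanity check that the install succeeded. A minimal sketch using standard `kubectl` commands (exact resource and pod names vary by release, so match loosely):

```bash
# The CRD list should include the Dynamo resource definitions
kubectl get crd | grep -i dynamo

# Platform pods (the operator and its dependencies) should reach Running
kubectl get pods -n ${NAMESPACE}
```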
## 2. Choose Your Backend

Each backend has deployment examples and configuration options:

| Backend | Aggregated | Aggregated + Router | Disaggregated | Disaggregated + Router | Disaggregated + Planner | Disaggregated Multi-node |
|--------------|:----------:|:-------------------:|:-------------:|:----------------------:|:-----------------------:|:------------------------:|
| **[SGLang](https://github.com/ai-dynamo/dynamo/tree/main/examples/backends/sglang/deploy/README)** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| **[TensorRT-LLM](https://github.com/ai-dynamo/dynamo/tree/main/examples/backends/trtllm/deploy/README)** | ✅ | ✅ | ✅ | ✅ | 🚧 | ✅ |
| **[vLLM](https://github.com/ai-dynamo/dynamo/tree/main/examples/backends/vllm/deploy/README)** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |

## 3. Deploy Your First Model

```bash
export NAMESPACE=dynamo-system
kubectl create namespace ${NAMESPACE}

# To pull models from Hugging Face
export HF_TOKEN=
kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN="$HF_TOKEN" \
  -n ${NAMESPACE}

# Deploy an example (this one uses vLLM with a Qwen model and aggregated serving)
kubectl apply -f examples/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}

# Check status
kubectl get dynamoGraphDeployment -n ${NAMESPACE}

# Test it
kubectl port-forward svc/vllm-agg-frontend 8000:8000 -n ${NAMESPACE}
curl http://localhost:8000/v1/models
```

(A full chat request against this endpoint is sketched after the custom-resources overview below.)

For SLA-based autoscaling, see the [SLA Planner Guide](/dynamo/v-0-9-0/components/planner/planner-guide).

## Understanding Dynamo's Custom Resources

Dynamo provides two main Kubernetes Custom Resources for deploying models:

### DynamoGraphDeploymentRequest (DGDR) - Simplified SLA-Driven Configuration

The **recommended approach** for generating optimal configurations. DGDR provides a high-level interface where you specify:

- Model name and backend framework
- SLA targets (latency requirements)
- GPU type (optional)

Dynamo automatically handles profiling and writes an optimized DGD spec to the resource's status. Perfect for:

- SLA-driven configuration generation
- Automated resource optimization
- Users who want simplicity over control

**Note**: DGDR generates a DGD spec, which you can then use to deploy.

### DynamoGraphDeployment (DGD) - Direct Configuration

A lower-level interface that defines your complete inference pipeline:

- Model configuration
- Resource allocation (GPUs, memory)
- Scaling policies
- Frontend/backend connections

Use this when you need fine-grained control or have already completed profiling.

Refer to the [API Reference and Documentation](/dynamo/v-0-9-0/additional-resources/api-reference-k-8-s) for more details.
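With the `kubectl port-forward` from step 3 still running, you can exercise the frontend's OpenAI-compatible endpoint end to end. A minimal sketch, assuming the aggregated vLLM example serves `Qwen/Qwen3-0.6B` (substitute whatever `/v1/models` reports for your deployment):

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen3-0.6B",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64
      }'
```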
## 📖 API Reference & Documentation

For detailed technical specifications of Dynamo's Kubernetes resources:

- **[API Reference](/dynamo/v-0-9-0/additional-resources/api-reference-k-8-s)** - Complete CRD field specifications for all Dynamo resources
- **[Create Deployment](/dynamo/v-0-9-0/additional-resources/creating-deployments)** - Step-by-step deployment creation with DynamoGraphDeployment
- **[Operator Guide](/dynamo/v-0-9-0/kubernetes-deployment/deployment-guide/dynamo-operator)** - Dynamo operator configuration and management

### Choosing Your Architecture Pattern

When creating a deployment, select the architecture pattern that best fits your use case:

- **Development / Testing** - Use `agg.yaml` as the base configuration
- **Production with Load Balancing** - Use `agg_router.yaml` to enable scalable, load-balanced inference
- **High Performance / Disaggregated** - Use `disagg_router.yaml` for maximum throughput and modular scalability

### Frontend and Worker Components

You can run the Frontend on one machine (e.g., a CPU node) and workers on different machines (GPU nodes). The Frontend serves as a framework-agnostic HTTP entry point that:

- Provides an OpenAI-compatible `/v1/chat/completions` endpoint
- Auto-discovers backend workers via [service discovery](/dynamo/v-0-9-0/kubernetes-deployment/deployment-guide/service-discovery) (Kubernetes-native by default)
- Routes requests and handles load balancing
- Validates and preprocesses requests

### Customizing Your Deployment

Example structure:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-llm
spec:
  services:
    Frontend:
      dynamoNamespace: my-llm
      componentType: frontend
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: your-image
    VllmDecodeWorker:  # or SGLangDecodeWorker, TrtllmDecodeWorker
      dynamoNamespace: my-llm  # must match the Frontend's Dynamo namespace for discovery
      componentType: worker
      replicas: 1
      envFromSecret: hf-token-secret  # for HuggingFace models
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: your-image
          command: ["/bin/sh", "-c"]
          args:
            - python3 -m dynamo.vllm --model YOUR_MODEL [--your-flags]
```

Worker command examples per backend:

```yaml
# vLLM worker
args:
  - python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B

# SGLang worker
args:
  - >-
    python3 -m dynamo.sglang
    --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B
    --tp 1
    --trust-remote-code

# TensorRT-LLM worker
args:
  - >-
    python3 -m dynamo.trtllm
    --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B
    --served-model-name deepseek-ai/DeepSeek-R1-Distill-Llama-8B
    --extra-engine-args /workspace/examples/backends/trtllm/engine_configs/deepseek-r1-distill-llama-8b/agg.yaml
```

Key customization points include:

- **Model Configuration**: Specify the model in the worker `args` command
- **Resource Allocation**: Configure GPU requirements under `resources.limits`
- **Scaling**: Set `replicas` to control the number of worker instances
- **Routing Mode**: Enable KV-cache routing by setting `DYN_ROUTER_MODE=kv` in the Frontend environment (see the sketch after this list)
- **Worker Specialization**: Add the `--is-prefill-worker` flag for disaggregated prefill workers (see the sketch after this list)
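To make the last two points concrete, the snippet below enables KV-cache routing on the Frontend and adds a dedicated prefill worker. This is a sketch only: the `envs` field name and the `VllmPrefillWorker` service name are assumptions for illustration; check the [API Reference](/dynamo/v-0-9-0/additional-resources/api-reference-k-8-s) for the exact field names.

```yaml
spec:
  services:
    Frontend:
      dynamoNamespace: my-llm
      componentType: frontend
      replicas: 1
      envs:                        # field name assumed; verify against the API Reference
        - name: DYN_ROUTER_MODE
          value: kv                # enable KV-cache-aware routing
    VllmPrefillWorker:             # hypothetical service name for illustration
      dynamoNamespace: my-llm
      componentType: worker
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: your-image
          command: ["/bin/sh", "-c"]
          args:
            - python3 -m dynamo.vllm --model YOUR_MODEL --is-prefill-worker
```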
## Additional Resources

- **[Examples](https://github.com/ai-dynamo/dynamo/tree/main/examples/README.md)** - Complete working examples
- **[Create Custom Deployments](/dynamo/v-0-9-0/additional-resources/creating-deployments)** - Build your own CRDs
- **[Managing Models with DynamoModel](/dynamo/v-0-9-0/kubernetes-deployment/deployment-guide/managing-models-with-dynamo-model)** - Deploy LoRA adapters and manage models
- **[Operator Documentation](/dynamo/v-0-9-0/kubernetes-deployment/deployment-guide/dynamo-operator)** - How the platform works
- **[Service Discovery](/dynamo/v-0-9-0/kubernetes-deployment/deployment-guide/service-discovery)** - Discovery backends and configuration
- **[Helm Charts](https://github.com/ai-dynamo/dynamo/tree/main/deploy/helm/README)** - For advanced users
- **[GitOps Deployment with FluxCD](/dynamo/v-0-9-0/additional-resources/flux-cd)** - For advanced users
- **[Logging](/dynamo/v-0-9-0/kubernetes-deployment/observability-k-8-s/logging)** - Logging setup
- **[Multinode Deployment](/dynamo/v-0-9-0/kubernetes-deployment/multinode/multinode-deployments)** - Deploying across multiple nodes
- **[Grove](/dynamo/v-0-9-0/kubernetes-deployment/multinode/grove)** - Grove details and custom installation
- **[Monitoring](/dynamo/v-0-9-0/kubernetes-deployment/observability-k-8-s/metrics)** - Monitoring setup
- **[Model Caching with Fluid](/dynamo/v-0-9-0/additional-resources/model-caching-with-fluid)** - Caching models with Fluid