Installation Guide for Dynamo Kubernetes Platform
Deploy and manage Dynamo inference graphs on Kubernetes with automated orchestration and scaling, using the Dynamo Kubernetes Platform.
Before You Start
Determine your cluster environment:
Shared/Multi-Tenant Cluster (K8s cluster with existing Dynamo artifacts):
- CRDs already installed cluster-wide - skip CRD installation step
- A cluster-wide Dynamo operator is likely already running
- Do NOT install another operator - use the existing cluster-wide operator
- Only install a namespace-restricted operator if you specifically need to prevent the cluster-wide operator from managing your namespace (e.g., testing operator features you’re developing)
Dedicated Cluster (full cluster admin access):
- You install CRDs yourself
- Can use cluster-wide operator (default)
Local Development (Minikube, testing):
- See Minikube Setup first, then follow installation steps below
To check if CRDs already exist:
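One quick way to check, assuming you have `kubectl` access to the cluster:

```bash
# List any Dynamo CRDs already installed cluster-wide
kubectl get crd | grep -i dynamo
```

If this returns results, the CRDs are already installed and you can skip the CRD installation step.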
To check if a cluster-wide operator already exists:
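One way to check, assuming the operator's deployment name contains `dynamo-operator`:

```bash
# Look for an existing Dynamo operator deployment in any namespace
kubectl get deployments --all-namespaces | grep -i dynamo-operator
```

If a deployment shows up outside your own namespace, a cluster-wide operator is likely already managing the cluster.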
Installation Paths
The platform is installed with the Dynamo Kubernetes Platform Helm chart.
Path A: Pre-built Artifacts
- Use case: Production deployment, shared or dedicated clusters
- Source: NGC published Helm charts
- Time: ~10 minutes
- Jump to: Path A
Path B: Custom Build from Source
- Use case: Contributing to Dynamo, using latest features from main branch, customization
- Requirements: Docker build environment
- Time: ~30 minutes
- Jump to: Path B
Any helm install command below can be customized by passing your own values.yaml file, and/or by setting individual values as flags:
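For example (release name, chart reference, and values file are placeholders):

```bash
# Override chart defaults with your own values file...
helm install dynamo-platform <chart> -f my-values.yaml

# ...and/or set individual values inline
helm install dynamo-platform <chart> \
  --set dynamo-operator.namespaceRestriction.enabled=true
```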
Prerequisites
Before installing the Dynamo Kubernetes Platform, ensure you have the following tools and access:
Required Tools
Cluster and Access Requirements
- Kubernetes cluster v1.24+ with admin or namespace-scoped access
- Cluster type determined (shared vs dedicated) — see Before You Start
- CRD status checked if on a shared cluster
- NGC credentials (optional) — required only if pulling NVIDIA images from NGC
Verify Installation
Run the following to confirm your tools are correctly installed:
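For example, assuming kubectl, Helm, and Docker are the required tools:

```bash
kubectl version --client
helm version
docker --version
```

Each command should print a version string without errors.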
Pre-Deployment Checks
Before proceeding, run the pre-deployment check script to verify your cluster meets all requirements:
This script validates kubectl connectivity, default StorageClass configuration, and GPU node availability. See Pre-Deployment Checks for details.
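If you prefer to run the checks by hand, the same validations can be approximated as follows (a sketch, not the script itself; the GPU resource name assumes NVIDIA GPUs):

```bash
# kubectl connectivity
kubectl cluster-info

# Default StorageClass (look for "(default)" in the output)
kubectl get storageclass

# GPU node availability
kubectl get nodes -o custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu
```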
No cluster? See Minikube Setup for local development.
Estimated installation time: 5-30 minutes depending on path
Path A: Production Install
Install from NGC published artifacts.
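A typical install looks like the following; the chart URL and version are placeholders, so substitute the actual chart reference published on NGC:

```bash
# Replace <version> and the namespace with your own values
helm install dynamo-platform \
  https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-<version>.tgz \
  --namespace dynamo-system --create-namespace
```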
For Shared/Multi-Tenant Clusters:
If your cluster has namespace-restricted Dynamo operators, you MUST add namespace restriction to your installation:
Note: Use the full path dynamo-operator.namespaceRestriction.enabled=true (not just namespaceRestriction.enabled=true).
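A restricted install then looks like this (release name, chart, and namespace are placeholders):

```bash
helm install dynamo-platform <chart> \
  --namespace my-namespace --create-namespace \
  --set dynamo-operator.namespaceRestriction.enabled=true
```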
If you see this validation error, you need namespace restriction:
[!TIP] For multinode deployments, you need to install multinode orchestration components:
Option 1 (Recommended): Grove + KAI Scheduler
- Grove and KAI Scheduler can be installed manually or through the dynamo-platform helm install command.
- When using the dynamo-platform helm install command, Grove and KAI Scheduler are NOT installed by default. You can enable their installation by setting the following flags:
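Assuming the platform chart exposes `grove.enabled` and `kai-scheduler.enabled` as subchart toggles (verify the exact keys against the chart's values.yaml), the flags look like:

```bash
helm install dynamo-platform <chart> \
  --set grove.enabled=true \
  --set kai-scheduler.enabled=true
```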
Option 2: LeaderWorkerSet (LWS) + Volcano
- If using LWS for multinode deployments, you must also install Volcano (required dependency):
- LWS Installation
- Volcano Installation (required for gang scheduling with LWS)
- These must be installed manually before deploying multinode workloads with LWS.
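The manual installs typically look like the following; check the LWS and Volcano release pages for current versions before running:

```bash
# LWS, from its upstream release manifests
kubectl apply --server-side -f \
  https://github.com/kubernetes-sigs/lws/releases/latest/download/manifests.yaml

# Volcano, via its official Helm chart
helm repo add volcano-sh https://volcano-sh.github.io/helm-charts
helm repo update
helm install volcano volcano-sh/volcano -n volcano-system --create-namespace
```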
See the Multinode Deployment Guide for details on orchestrator selection.
[!TIP] By default, Model Express Server is not used. If you wish to use an existing Model Express Server, you can set the modelExpressURL to the existing server’s URL in the helm install command:
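For example, assuming the value is exposed under the operator subchart as `dynamo-operator.modelExpressURL` (check the chart's values.yaml for the exact key; the URL below is a placeholder):

```bash
helm install dynamo-platform <chart> \
  --set dynamo-operator.modelExpressURL=http://<model-express-host>:<port>
```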
[!TIP] By default, Dynamo Operator is installed cluster-wide and will monitor all namespaces. If you wish to restrict the operator to monitor only a specific namespace (the helm release namespace by default), you can set the namespaceRestriction.enabled to true. You can also change the restricted namespace by setting the targetNamespace property.
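For example (the nesting of `targetNamespace` under `namespaceRestriction` is assumed; verify against the chart's values.yaml):

```bash
helm install dynamo-platform <chart> \
  --set dynamo-operator.namespaceRestriction.enabled=true \
  --set dynamo-operator.namespaceRestriction.targetNamespace=my-namespace
```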
Path B: Custom Build from Source
Build and deploy from source for customization, contributing to Dynamo, or using the latest features from the main branch.
Note: This gives you access to the latest unreleased features and fixes on the main branch.
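A source install typically starts like this; the chart path within the repository is illustrative, so consult the repository's deploy docs for the actual location:

```bash
# Clone the repository (main branch)
git clone https://github.com/ai-dynamo/dynamo.git
cd dynamo

# Install the platform chart from the local checkout (path is illustrative)
helm install dynamo-platform ./deploy/helm/platform \
  --namespace dynamo-system --create-namespace
```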
Verify Installation
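A basic verification, assuming the platform was installed into `dynamo-system`:

```bash
# Operator and platform pods should reach Running
kubectl get pods -n dynamo-system

# Dynamo CRDs should be present
kubectl get crd | grep -i dynamo
```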
Next Steps
- Deploy Model/Workflow
- Explore Backend Guides
- Optional:
- Set up Prometheus & Grafana
- SLA Planner Guide (for SLA-aware scheduling and autoscaling)
Troubleshooting
“VALIDATION ERROR: Cannot install cluster-wide Dynamo operator”
Cause: Attempting cluster-wide install on a shared cluster with existing namespace-restricted operators.
Solution: Add namespace restriction to your installation:
Note: Use the full path dynamo-operator.namespaceRestriction.enabled=true (not just namespaceRestriction.enabled=true).
CRDs already exist
Cause: Installing CRDs on a cluster where they’re already present (common on shared clusters).
Solution: Skip step 2 (CRD installation), proceed directly to platform installation.
To check if CRDs exist:
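```bash
kubectl get crd | grep -i dynamo
```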
Pods not starting?
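Start by inspecting pod status, events, and logs (namespace and pod name are placeholders):

```bash
kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace>
```

`describe` surfaces scheduling and image-pull failures; `logs` surfaces crashes inside the container.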
HuggingFace model access?
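Gated models require a HuggingFace token available to the pods. One common pattern is a Kubernetes secret; the secret and key names below are illustrative, so match whatever your deployment references:

```bash
kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN=<your-huggingface-token> \
  -n <namespace>
```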
Bitnami etcd “unrecognized” image?
You may encounter this error during helm install because Bitnami migrated its Docker images to a secured repository. Add the following to the helm install command:
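These flags assume the platform chart bundles the Bitnami etcd subchart; verify the exact keys against the chart's values.yaml:

```bash
helm install dynamo-platform <chart> \
  --set etcd.image.repository=bitnamilegacy/etcd \
  --set global.security.allowInsecureImages=true
```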
Clean uninstall?
To uninstall the platform, you can run the following command:
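Assuming the release name and namespace used at install time:

```bash
helm uninstall dynamo-platform -n <namespace>
```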
To uninstall the CRDs, follow these steps:
Get all of the dynamo CRDs installed in your cluster:
You should see something like this:
Delete each CRD one by one:
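The listing and deletion steps look like this; substitute the CRD names actually returned by the first command:

```bash
# List the installed Dynamo CRDs
kubectl get crd | grep -i dynamo

# Delete each CRD returned above, one by one
kubectl delete crd <crd-name>
```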