Kubernetes Deployment | NVIDIA Dynamo Documentation

Use the Kubernetes guides when you are ready to move beyond a local Dynamo process and deploy on a GPU cluster. Dynamo’s Kubernetes path is native to the platform: inference graphs are expressed as Dynamo CRDs, reconciled by the Dynamo operator, installed with Helm, and integrated with Kubernetes service discovery, Gateway API Inference Extension, scheduling, observability, and model-loading workflows.

This does not make Kubernetes the only way to use Dynamo. Local containers, PyPI installs, and standalone components remain the right path for evaluation, development, and incremental adoption.
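
For readers new to the operator model, the sketch below illustrates the general Kubernetes reconciliation idea behind "expressed as CRDs, reconciled by the operator": a controller compares the declared state with what is running and derives the steps needed to converge. It is a toy model, not the Dynamo operator's code or its CRD schema. The service names, the `Action` type, and the `reconcile` and `apply_actions` helpers are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One step a controller would take to converge on the desired state."""
    verb: str       # "create", "scale", or "delete"
    service: str    # hypothetical service name, not a real Dynamo component
    replicas: int


def reconcile(desired: dict[str, int], observed: dict[str, int]) -> list[Action]:
    """Compare declared replica counts with observed ones and plan the difference."""
    actions = []
    for name, want in sorted(desired.items()):
        have = observed.get(name)
        if have is None:
            actions.append(Action("create", name, want))
        elif have != want:
            actions.append(Action("scale", name, want))
    # Anything running that is no longer declared gets removed.
    for name in sorted(observed.keys() - desired.keys()):
        actions.append(Action("delete", name, 0))
    return actions


def apply_actions(observed: dict[str, int], actions: list[Action]) -> dict[str, int]:
    """Return the state that results from carrying out the planned actions."""
    result = dict(observed)
    for action in actions:
        if action.verb == "delete":
            result.pop(action.service, None)
        else:
            result[action.service] = action.replicas
    return result


if __name__ == "__main__":
    # Hypothetical declared graph and hypothetical cluster state.
    desired = {"frontend": 1, "prefill-worker": 2, "decode-worker": 4}
    observed = {"frontend": 1, "decode-worker": 2, "legacy-router": 1}

    plan = reconcile(desired, observed)
    for step in plan:
        print(step)

    converged = apply_actions(observed, plan)
    assert converged == desired
    # A second pass finds nothing to do: reconciliation is idempotent.
    assert reconcile(desired, converged) == []
```

The property worth noticing is idempotence: once the observed state matches the declaration, another reconcile pass produces no actions. That is why a declarative resource can be re-applied safely.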

Start with the Kubernetes Quickstart to run one model end to end. Then use the rest of the Kubernetes Deployment section based on what you need next:

Goal	Guide
Install the operator and prerequisites	Installation Guide
Deploy and manage models	Deployment Overview
Load models faster across pods	Model Caching and ModelExpress
Operate a cluster deployment	Autoscaling, Rolling Update, Disagg Communication, and Observability Metrics
Scale disaggregated serving	Multinode Deployments, Grove, and Topology Aware Scheduling
Integrate with Kubernetes serving APIs	Gateway API Inference Extension (GAIE) and LWS

If you are still evaluating Dynamo locally, start with the Quickstart and Local Installation guides.