
Copyright © 2026, NVIDIA Corporation.

Kubernetes Deployment

Use Dynamo’s Kubernetes-native path when you are ready to deploy on a GPU cluster.

Use the Kubernetes guides when you are ready to move beyond a local Dynamo process and deploy on a GPU cluster. Dynamo’s Kubernetes path is native to the platform: inference graphs are expressed as Dynamo CRDs, reconciled by the Dynamo operator, installed with Helm, and integrated with Kubernetes service discovery, Gateway API Inference Extension, scheduling, observability, and model-loading workflows.
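
To make "reconciled by the Dynamo operator" concrete, here is a toy sketch of the general Kubernetes reconcile pattern: compare the declared state with the observed state and emit the actions that close the gap. It is not Dynamo's operator code and does not use any Dynamo or Kubernetes API. `GraphSpec`, `reconcile`, and the service names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GraphSpec:
    """Hypothetical desired state: service name -> replica count."""
    replicas: Dict[str, int]


def reconcile(desired: GraphSpec, observed: Dict[str, int]) -> List[str]:
    """Return the actions needed to move observed state toward desired state."""
    actions = []
    for name, want in sorted(desired.replicas.items()):
        have = observed.get(name, 0)
        if have < want:
            actions.append(f"scale up {name}: {have} -> {want}")
        elif have > want:
            actions.append(f"scale down {name}: {have} -> {want}")
    # Anything running that the spec no longer declares is removed.
    for name in sorted(set(observed) - set(desired.replicas)):
        actions.append(f"delete {name}")
    return actions


# Hypothetical inference graph with one frontend and two workers.
spec = GraphSpec(replicas={"frontend": 1, "worker": 2})
plan = reconcile(spec, {"frontend": 1, "worker": 1, "stale": 1})
print(plan)
```

A real operator runs this kind of comparison continuously against the cluster API, so the declared resource stays the single source of truth and drift is corrected without manual steps.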

This does not make Kubernetes the only way to use Dynamo. Local containers, PyPI installs, and standalone components remain the right path for evaluation, development, and incremental adoption.

Start with the Kubernetes Quickstart to run one model end to end. Then use the rest of the Kubernetes Deployment section based on what you need next:

| Goal | Guide |
| --- | --- |
| Install the operator and prerequisites | Installation Guide |
| Deploy and manage models | Deployment Overview |
| Load models faster across pods | Model Caching and ModelExpress |
| Operate a cluster deployment | Autoscaling, Rolling Update, Disagg Communication, and Observability Metrics |
| Scale disaggregated serving | Multinode Deployments, Grove, and Topology Aware Scheduling |
| Integrate with Kubernetes serving APIs | Gateway API Inference Extension (GAIE) and LWS |

If you are still evaluating Dynamo locally, start with the Quickstart and Local Installation first.