API Reference (K8s)
⚠️ Important: This documentation is automatically generated from source code. Do not edit this file directly.
Packages
nvidia.com/v1alpha1
Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.
This package defines the DynamoGraphDeploymentRequest (DGDR) custom resource, which provides a high-level, SLA-driven interface for deploying machine learning models on Dynamo.
Resource Types
- DynamoCheckpoint
- DynamoComponentDeployment
- DynamoGraphDeployment
- DynamoGraphDeploymentRequest
- DynamoGraphDeploymentScalingAdapter
- DynamoModel
Autoscaling
Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner for autoscaling instead. See docs/pages/kubernetes/autoscaling.md for migration guidance. This field will be removed in a future API version.
Appears in:
CheckpointMode
Underlying type: string
CheckpointMode defines how checkpoint creation is handled
Validation:
- Enum: [Auto Manual]
Appears in:
ComponentKind
Underlying type: string
ComponentKind represents the type of underlying Kubernetes resource.
Validation:
- Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet]
Appears in:
ConfigMapKeySelector
ConfigMapKeySelector selects a specific key from a ConfigMap. Used to reference external configuration data stored in ConfigMaps.
Appears in:
DGDRState
Underlying type: string
Validation:
- Enum: [Initializing Pending Profiling Deploying Ready DeploymentDeleted Failed]
Appears in:
DGDState
Underlying type: string
Validation:
- Enum: [initializing pending successful failed]
Appears in:
DeploymentOverridesSpec
DeploymentOverridesSpec allows users to customize metadata for auto-created DynamoGraphDeployments. When autoApply is enabled, these overrides are applied to the generated DGD resource.
Appears in:
DeploymentStatus
DeploymentStatus tracks the state of an auto-created DynamoGraphDeployment. This status is populated when autoApply is enabled and a DGD is created.
Appears in:
DynamoCheckpoint
DynamoCheckpoint is the Schema for the dynamocheckpoints API. It represents a container checkpoint that can be used to restore pods to a warm state.
DynamoCheckpointIdentity
DynamoCheckpointIdentity defines the inputs that determine checkpoint equivalence. Two checkpoints with the same identity hash are considered equivalent.
Appears in:
DynamoCheckpointJobConfig
DynamoCheckpointJobConfig defines the configuration for the checkpoint creation Job
Appears in:
DynamoCheckpointPhase
Underlying type: string
DynamoCheckpointPhase represents the current phase of the checkpoint lifecycle
Validation:
- Enum: [Pending Creating Ready Failed]
Appears in:
DynamoCheckpointSpec
DynamoCheckpointSpec defines the desired state of DynamoCheckpoint
Appears in:
DynamoCheckpointStatus
DynamoCheckpointStatus defines the observed state of DynamoCheckpoint
Appears in:
DynamoCheckpointStorageType
Underlying type: string
DynamoCheckpointStorageType defines the supported storage backends for checkpoints
Validation:
- Enum: [pvc s3 oci]
Appears in:
DynamoComponentDeployment
DynamoComponentDeployment is the Schema for the dynamocomponentdeployments API
DynamoComponentDeploymentSharedSpec
Appears in:
DynamoComponentDeploymentSpec
DynamoComponentDeploymentSpec defines the desired state of DynamoComponentDeployment
Appears in:
DynamoGraphDeployment
DynamoGraphDeployment is the Schema for the dynamographdeployments API.
DynamoGraphDeploymentRequest
DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API. It serves as the primary interface for users to request model deployments with specific performance and resource constraints, enabling SLA-driven deployments.
Lifecycle:
- Initializing → Pending: Validates spec and prepares for profiling
- Pending → Profiling: Creates and runs profiling job (online or AIC)
- Profiling → Ready/Deploying: Generates DGD spec after profiling completes
- Deploying → Ready: When autoApply=true, monitors DGD until Ready
- Ready: Terminal state when DGD is operational or spec is available
- DeploymentDeleted: Terminal state when auto-created DGD is manually deleted
The spec becomes immutable once profiling starts. Users must delete and recreate the DGDR to modify configuration after this point.
DEPRECATION NOTICE: v1alpha1 DynamoGraphDeploymentRequest is deprecated. Please migrate to nvidia.com/v1beta1 DynamoGraphDeploymentRequest. v1alpha1 will be removed in a future release.
DynamoGraphDeploymentRequestSpec
DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest. This CRD serves as the primary interface for users to request model deployments with specific performance constraints and resource requirements, enabling SLA-driven deployments.
Appears in:
DynamoGraphDeploymentRequestStatus
DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest. The controller updates this status as the DGDR progresses through its lifecycle.
Appears in:
DynamoGraphDeploymentScalingAdapter
DynamoGraphDeploymentScalingAdapter provides a scaling interface for individual services within a DynamoGraphDeployment. It implements the Kubernetes scale subresource, enabling integration with HPA, KEDA, and custom autoscalers.
The adapter acts as an intermediary between autoscalers and the DGD, ensuring that only the adapter controller modifies the DGD’s service replicas. This prevents conflicts when multiple autoscaling mechanisms are in play.
DynamoGraphDeploymentScalingAdapterSpec
DynamoGraphDeploymentScalingAdapterSpec defines the desired state of DynamoGraphDeploymentScalingAdapter
Appears in:
DynamoGraphDeploymentScalingAdapterStatus
DynamoGraphDeploymentScalingAdapterStatus defines the observed state of DynamoGraphDeploymentScalingAdapter
Appears in:
DynamoGraphDeploymentServiceRef
DynamoGraphDeploymentServiceRef identifies a specific service within a DynamoGraphDeployment
Appears in:
DynamoGraphDeploymentSpec
DynamoGraphDeploymentSpec defines the desired state of DynamoGraphDeployment.
Appears in:
DynamoGraphDeploymentStatus
DynamoGraphDeploymentStatus defines the observed state of DynamoGraphDeployment.
Appears in:
DynamoModel
DynamoModel is the Schema for the dynamomodels API
DynamoModelSpec
DynamoModelSpec defines the desired state of DynamoModel
Appears in:
DynamoModelStatus
DynamoModelStatus defines the observed state of DynamoModel
Appears in:
EPPConfig
EPPConfig contains configuration for EPP (Endpoint Picker Plugin) components. EPP is responsible for intelligent endpoint selection and KV-aware routing.
Appears in:
EndpointInfo
EndpointInfo represents a single endpoint (pod) serving the model
Appears in:
ExtraPodMetadata
Appears in:
ExtraPodSpec
Appears in:
IngressSpec
Appears in:
IngressTLSSpec
Appears in:
ModelReference
ModelReference identifies a model served by this component
Appears in:
ModelSource
ModelSource defines the source location of a model
Appears in:
MultinodeSpec
Appears in:
PVC
Appears in:
ProfilingConfigSpec
ProfilingConfigSpec defines configuration for the profiling process. This structure maps directly to the profile_sla.py config format. See dynamo/profiler/utils/profiler_argparse.py for the complete schema.
Appears in:
ResourceItem
Appears in:
Resources
Resources defines requests and limits for a component, including CPU, memory, GPUs/devices, and any runtime-specific resources.
Appears in:
Restart
Appears in:
RestartPhase
Underlying type: string
Appears in:
RestartStatus
RestartStatus contains the status of the restart of the graph deployment.
Appears in:
RestartStrategy
Appears in:
RestartStrategyType
Underlying type: string
Appears in:
RollingUpdatePhase
Underlying type: string
RollingUpdatePhase represents the current phase of a rolling update.
Validation:
- Enum: [Pending InProgress Completed Failed]
Appears in:
RollingUpdateStatus
RollingUpdateStatus tracks the progress of a rolling update.
Appears in:
ScalingAdapter
ScalingAdapter configures whether a service uses the DynamoGraphDeploymentScalingAdapter for replica management. When enabled, the DGDSA owns the replicas field and external autoscalers (HPA, KEDA, Planner) can control scaling via the Scale subresource.
Appears in:
ServiceCheckpointConfig
ServiceCheckpointConfig configures checkpointing for a DGD service
Appears in:
ServiceCheckpointStatus
ServiceCheckpointStatus contains checkpoint information for a single service.
Appears in:
ServiceReplicaStatus
ServiceReplicaStatus contains replica information for a single service.
Appears in:
SharedMemorySpec
Appears in:
VolumeMount
VolumeMount references a PVC defined at the top level for volumes to be mounted by the component
Appears in:
nvidia.com/v1beta1
Package v1beta1 contains API Schema definitions for the nvidia.com v1beta1 API group.
Resource Types
BackendType
Underlying type: string
BackendType specifies the inference backend.
Validation:
- Enum: [auto sglang trtllm vllm]
Appears in:
DGDRPhase
Underlying type: string
DGDRPhase represents the lifecycle phase of a DynamoGraphDeploymentRequest.
Validation:
- Enum: [Pending Profiling Ready Deploying Deployed Failed]
Appears in:
DeploymentInfoStatus
DeploymentInfoStatus tracks the state of the deployed DynamoGraphDeployment.
Appears in:
v1beta1 DynamoGraphDeploymentRequest
DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API. It provides a simplified, SLA-driven interface for deploying inference models on Dynamo. Users specify a model and optional performance targets; the controller handles profiling, configuration selection, and deployment.
Lifecycle:
- Pending: Spec validated, preparing for profiling
- Profiling: Profiling job is running to discover optimal configurations
- Ready: Profiling complete, generated DGD spec available in status
- Deploying: DGD is being created and rolled out (when autoApply=true)
- Deployed: DGD is running and healthy
- Failed: An unrecoverable error occurred
v1beta1 DynamoGraphDeploymentRequestSpec
DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest. Only the Model field is required; all other fields are optional and have sensible defaults.
Appears in:
v1beta1 DynamoGraphDeploymentRequestStatus
DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest.
Appears in:
FeaturesSpec
FeaturesSpec controls optional Dynamo platform features in the generated deployment.
Appears in:
HardwareSpec
HardwareSpec describes the hardware resources available for profiling and deployment. These fields are typically auto-filled by the operator from cluster discovery.
Appears in:
MockerSpec
MockerSpec configures the simulated (mocker) backend.
Appears in:
ModelCacheSpec
ModelCacheSpec references a PVC containing pre-downloaded model weights.
Appears in:
OptimizationType
Underlying type: string
OptimizationType specifies the profiling optimization strategy.
Validation:
- Enum: [latency throughput]
Appears in:
OverridesSpec
OverridesSpec allows customizing the profiling job and the generated DynamoGraphDeployment.
Appears in:
ParetoConfig
ParetoConfig represents a single Pareto-optimal deployment configuration discovered during profiling.
Appears in:
ProfilingPhase
Underlying type: string
ProfilingPhase represents a sub-phase within the profiling pipeline. When the DGDR Phase is “Profiling”, this value indicates which step of the profiling pipeline is currently executing.
Validation:
- Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done]
Appears in:
ProfilingResultsStatus
ProfilingResultsStatus contains the output of the profiling process.
Appears in:
SLASpec
SLASpec defines the service-level agreement targets for profiling optimization. Exactly one mode should be active: ttft+itl (default), e2eLatency, or optimizationType.
Appears in:
SearchStrategy
Underlying type: string
SearchStrategy controls the profiling search depth.
Validation:
- Enum: [rapid thorough]
Appears in:
WorkloadSpec
WorkloadSpec defines the workload characteristics for SLA-based profiling.
Appears in:
operator.config.dynamo.nvidia.com/v1alpha1
Resource Types
CheckpointConfiguration
CheckpointConfiguration holds checkpoint/restore settings.
Appears in:
CheckpointOCIConfig
CheckpointOCIConfig holds OCI registry storage configuration.
Appears in:
CheckpointPVCConfig
CheckpointPVCConfig holds PVC storage configuration.
Appears in:
CheckpointS3Config
CheckpointS3Config holds S3 storage configuration.
Appears in:
CheckpointStorageConfiguration
CheckpointStorageConfiguration holds storage backend configuration for checkpoints.
Appears in:
DiscoveryBackend
Underlying type: string
DiscoveryBackend is the type for the discovery backend.
Appears in:
DiscoveryConfiguration
DiscoveryConfiguration holds discovery backend settings.
Appears in:
GPUConfiguration
GPUConfiguration holds GPU discovery settings.
Appears in:
GroveConfiguration
GroveConfiguration holds Grove orchestrator settings.
Appears in:
InfrastructureConfiguration
InfrastructureConfiguration holds service mesh and backend addresses.
Appears in:
IngressConfiguration
IngressConfiguration holds ingress settings.
Appears in:
KaiSchedulerConfiguration
KaiSchedulerConfiguration holds Kai-scheduler settings.
Appears in:
LWSConfiguration
LWSConfiguration holds LWS orchestrator settings.
Appears in:
LeaderElectionConfiguration
LeaderElectionConfiguration holds leader election settings.
Appears in:
LoggingConfiguration
LoggingConfiguration holds logging settings.
Appears in:
MPIConfiguration
MPIConfiguration holds MPI SSH secret settings.
Appears in:
MetricsServer
MetricsServer extends Server with secure serving option.
Appears in:
NamespaceConfiguration
NamespaceConfiguration determines operator namespace mode.
Appears in:
NamespaceScopeConfiguration
NamespaceScopeConfiguration holds lease settings for namespace-restricted mode.
Appears in:
OperatorConfiguration
OperatorConfiguration is the Schema for the operator configuration.
OrchestratorConfiguration
OrchestratorConfiguration holds orchestrator override settings.
Appears in:
RBACConfiguration
RBACConfiguration holds RBAC settings for cluster-wide mode.
Appears in:
SecurityConfiguration
SecurityConfiguration holds HTTP/2 and TLS settings.
Appears in:
Server
Server holds a bind address and port.
Appears in:
ServerConfiguration
ServerConfiguration holds server bind addresses and ports.
Appears in:
WebhookServer
WebhookServer extends Server with host and certificate directory.
Appears in:
Operator Default Values Injection
The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include:
- Health Probes: Startup, liveness, and readiness probes are configured differently for frontend, worker, and planner components. For example, worker components receive a startup probe with a 2-hour timeout (720 failures × 10 seconds) to accommodate long model loading times.
- Security Context: All components receive `fsGroup: 1000` by default to ensure proper file permissions for mounted volumes. This can be overridden via the `extraPodSpec.securityContext` field.
- Shared Memory: All components receive an 8Gi shared memory volume mounted at `/dev/shm` by default (can be disabled or resized via the `sharedMemory` field).
- Environment Variables: Components automatically receive environment variables like `DYN_NAMESPACE`, `DYN_PARENT_DGD_K8S_NAME`, `DYNAMO_PORT`, and backend-specific variables.
- Pod Configuration: Default `terminationGracePeriodSeconds` of 60 seconds and `restartPolicy: Always`.
- Autoscaling: When enabled without explicit metrics, defaults to CPU-based autoscaling with 80% target utilization.
- Backend-Specific Behavior: For multinode deployments, probes are automatically modified or removed for worker nodes depending on the backend framework (VLLM, SGLang, or TensorRT-LLM).
Pod Specification Defaults
All components receive the following pod-level defaults unless overridden:
- `terminationGracePeriodSeconds`: 60 seconds
- `restartPolicy`: `Always`
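For example, either default can be overridden per component through `extraPodSpec`. A minimal sketch; the resource and service names are hypothetical, and the exact `services` layout is assumed:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-deployment          # hypothetical name
spec:
  services:
    MyWorker:                  # hypothetical service name
      extraPodSpec:
        # Raises the 60-second default for workers that drain slowly
        terminationGracePeriodSeconds: 120
```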
Security Context
The operator automatically applies default security context settings to all components to ensure proper file permissions, particularly for mounted volumes:
- `fsGroup: 1000` - Sets the group ownership of mounted volumes and any files created in those volumes
This default ensures that non-root containers can write to mounted volumes (like model caches or persistent storage) without permission issues. The fsGroup setting is particularly important for:
- Model downloads and caching
- Compilation cache directories
- Persistent volume claims (PVCs)
- SSH key generation in multinode deployments
Overriding Security Context
To override the default security context, specify your own securityContext in the extraPodSpec of your component:
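A minimal sketch; the service name and the `services` layout are assumptions:

```yaml
spec:
  services:
    MyWorker:                  # hypothetical service name
      extraPodSpec:
        securityContext:
          runAsNonRoot: true
          runAsUser: 1000
          fsGroup: 2000        # replaces the default fsGroup: 1000
```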
Important: When you provide any securityContext object in extraPodSpec, the operator will not inject any defaults. This gives you complete control over the security context, including the ability to run as root (by omitting runAsNonRoot or setting it to false).
OpenShift and Security Context Constraints
In OpenShift environments with Security Context Constraints (SCCs), you may need to omit explicit UID/GID values to allow OpenShift’s admission controllers to assign them dynamically:
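A minimal sketch under the same assumptions as above:

```yaml
spec:
  services:
    MyWorker:                  # hypothetical service name
      extraPodSpec:
        securityContext:
          runAsNonRoot: true
          # No runAsUser/runAsGroup/fsGroup here: OpenShift's SCC
          # admission assigns them from the namespace's allowed range
```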
Alternatively, if you want to keep the default fsGroup: 1000 behavior and are certain your cluster allows it, you don’t need to specify anything - the operator defaults will work.
Shared Memory Configuration
Shared memory is enabled by default for all components:
- Enabled: `true` (unless explicitly disabled via `sharedMemory.disabled`)
- Size: `8Gi`
- Mount Path: `/dev/shm`
- Volume Type: `emptyDir` with `memory` medium
To disable shared memory or customize the size, use the sharedMemory field in your component specification.
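A minimal sketch; only `sharedMemory.disabled` is confirmed above, so the size field name is an assumption:

```yaml
spec:
  services:
    MyWorker:                  # hypothetical service name
      sharedMemory:
        disabled: false
        size: 16Gi             # field name assumed; the default size is 8Gi
```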
Health Probes by Component Type
The operator applies different default health probes based on the component type.
Frontend Components
Frontend components receive the following probe configurations:
Liveness Probe:
- Type: HTTP GET
- Path: `/health`
- Port: `http` (8000)
- Initial Delay: 60 seconds
- Period: 60 seconds
- Timeout: 30 seconds
- Failure Threshold: 10
Readiness Probe:
- Type: Exec command
- Command: `curl -s http://localhost:${DYNAMO_PORT}/health | jq -e ".status == \"healthy\""`
- Initial Delay: 60 seconds
- Period: 60 seconds
- Timeout: 30 seconds
- Failure Threshold: 10
Worker Components
Worker components receive the following probe configurations:
Liveness Probe:
- Type: HTTP GET
- Path: `/live`
- Port: `system` (9090)
- Period: 5 seconds
- Timeout: 30 seconds
- Failure Threshold: 1
Readiness Probe:
- Type: HTTP GET
- Path: `/health`
- Port: `system` (9090)
- Period: 10 seconds
- Timeout: 30 seconds
- Failure Threshold: 60
Startup Probe:
- Type: HTTP GET
- Path: `/live`
- Port: `system` (9090)
- Period: 10 seconds
- Timeout: 5 seconds
- Failure Threshold: 720 (allows up to 2 hours for startup: 10s × 720 = 7200s)
:::{note}
For larger models (typically >70B parameters) or slower storage systems, you may need to increase the failureThreshold to allow more time for model loading. Calculate the required threshold based on your expected startup time: failureThreshold = (expected_startup_seconds / period). Override the startup probe in your component specification if the default 2-hour window is insufficient.
:::
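For example, to allow roughly four hours (10s period × 1440 failures = 14400s), a minimal sketch assuming a `startupProbe` field on the component spec and a hypothetical service name:

```yaml
spec:
  services:
    MyWorker:                  # hypothetical service name
      startupProbe:
        httpGet:
          path: /live
          port: 9090
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 1440   # 10s × 1440 = 14400s (~4 hours)
```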
Multinode Deployment Probe Modifications
For multinode deployments, the operator modifies probes based on the backend framework and node role:
VLLM Backend
The operator automatically selects between two deployment modes based on parallelism configuration:
Tensor/Pipeline Parallel Mode (when world_size > GPUs_per_node):
- Uses Ray for distributed execution (`--distributed-executor-backend ray`)
- Leader nodes: Start the Ray head and run vLLM; all probes remain active
- Worker nodes: Run Ray agents only; all probes (liveness, readiness, startup) are removed
Data Parallel Mode (when world_size × data_parallel_size > GPUs_per_node):
- Worker nodes: All probes (liveness, readiness, startup) are removed
- Leader nodes: All probes remain active
SGLang Backend
- Worker nodes: All probes (liveness, readiness, startup) are removed
TensorRT-LLM Backend
- Leader nodes: All probes remain unchanged
- Worker nodes:
- Liveness and startup probes are removed
- Readiness probe is replaced with a TCP socket check on SSH port (2222):
- Initial Delay: 20 seconds
- Period: 20 seconds
- Timeout: 5 seconds
- Failure Threshold: 10
Environment Variables
The operator automatically injects environment variables into component containers based on component type, backend framework, and operator configuration. User-provided envs values always take precedence over operator defaults.
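For instance, a user-supplied entry in `envs` overrides the operator-injected value for the same variable. A minimal sketch; the service name and values are illustrative:

```yaml
spec:
  services:
    MyWorker:                  # hypothetical service name
      envs:
        - name: DYNAMO_PORT    # overrides the operator-injected value
          value: "8080"
        - name: MY_CUSTOM_VAR  # hypothetical application variable
          value: example
```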
All Components
These environment variables are injected into every component container regardless of type.
Infrastructure (Conditional)
These are injected into all components when the corresponding infrastructure service is configured in the operator’s OperatorConfiguration.
Frontend Components
Worker Components
Planner Components
EPP (Endpoint Picker Plugin) Components
VLLM Backend
TensorRT-LLM Backend
Checkpoint / Restore
These environment variables are injected when checkpoint/restore is enabled for a component.
Service Accounts
The following component types automatically receive dedicated service accounts:
- Planner: `planner-serviceaccount`
- EPP: `epp-serviceaccount`
Image Pull Secrets
The operator automatically discovers and injects image pull secrets for container images. When a component specifies a container image, the operator:
- Scans all Kubernetes secrets of type `kubernetes.io/dockerconfigjson` in the component’s namespace
- Extracts the docker registry server URLs from each secret’s authentication configuration
- Matches the container image’s registry host against the discovered registry URLs
- Automatically injects matching secrets as `imagePullSecrets` in the pod specification
This eliminates the need to manually specify image pull secrets for each component. The operator maintains an internal index of docker secrets and their associated registries, refreshing this index periodically.
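For reference, such secrets follow the standard Kubernetes registry-credential format; a component image hosted on `registry.example.com` would match the (illustrative) secret below:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-registry-creds      # picked up automatically; no manual reference needed
type: kubernetes.io/dockerconfigjson
data:
  # Base64-encoded JSON of the form {"auths": {"registry.example.com": {...}}}
  .dockerconfigjson: <base64-encoded docker config>
```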
To disable automatic image pull secret discovery for a specific component, add the following annotation:
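The key shown below is a placeholder, not the confirmed annotation; the actual key is defined in `internal/consts/consts.go` (see the Implementation Reference section):

```yaml
metadata:
  annotations:
    # Placeholder key; look up the exact annotation in internal/consts/consts.go
    nvidia.com/disable-image-pull-secret-discovery: "true"
```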
Autoscaling Defaults
When autoscaling is enabled but no metrics are specified, the operator applies:
- Default Metric: CPU utilization
- Target Average Utilization: 80%
Port Configurations
Default container ports are configured based on component type:
Frontend Components
- Port: 8000
- Protocol: TCP
- Name: `http`
Worker Components
- Port: 9090 (system)
- Protocol: TCP
- Name: `system`
- Port: 19090 (NIXL)
- Protocol: TCP
- Name: `nixl`
Planner Components
- Port: 9085
- Protocol: TCP
- Name: `metrics`
EPP Components
- Port: 9002 (gRPC)
- Protocol: TCP
- Name: `grpc`
- Port: 9003 (gRPC health)
- Protocol: TCP
- Name: `grpc-health`
- Port: 9090 (metrics)
- Protocol: TCP
- Name: `metrics`
Backend-Specific Configurations
VLLM
- Ray Head Port: 6379 (for Ray cluster coordination in multinode TP/PP deployments)
- Data Parallel RPC Port: 13445 (for data parallel multinode deployments)
SGLang
- Distribution Init Port: 29500 (for multinode deployments)
TensorRT-LLM
- SSH Port: 2222 (for multinode MPI communication)
- OpenMPI Environment: `OMPI_MCA_orte_keep_fqdn_hostnames=1`
Implementation Reference
For users who want to understand the implementation details or contribute to the operator, the default values described in this document are set in the following source files:
- Health Probes, Security Context & Pod Specifications: `internal/dynamo/graph.go` - Contains the main logic for applying default probes, security context, environment variables, shared memory, and pod configurations
- Component-Specific Defaults:
  - `internal/dynamo/component_common.go` - Base container and pod spec shared by all component types
  - `internal/dynamo/component_frontend.go`
  - `internal/dynamo/component_worker.go`
  - `internal/dynamo/component_planner.go`
  - `internal/dynamo/component_epp.go`
- Image Pull Secrets: `internal/secrets/docker.go` - Implements the docker secret indexer and automatic discovery
- Backend-Specific Behavior:
- Checkpoint / Restore: `internal/checkpoint/dgd_integration.go` - Checkpoint env var injection and volume setup
- Constants & Annotations: `internal/consts/consts.go` - Defines annotation keys and other constants
Notes
- All these defaults can be overridden by explicitly specifying values in your DynamoComponentDeployment or DynamoGraphDeployment resources
- User-specified probes (via the `livenessProbe`, `readinessProbe`, or `startupProbe` fields) take precedence over operator defaults
- For security context, if you provide any `securityContext` in `extraPodSpec`, no defaults will be injected, giving you full control
- For multinode deployments, some defaults are modified or removed as described above to accommodate distributed execution patterns
- The `extraPodSpec.mainContainer` field can be used to override probe configurations set by the operator