API Reference

⚠️ Important: This documentation is automatically generated from source code. Do not edit this file directly.

Packages

nvidia.com/v1alpha1

Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.

This package defines the DynamoGraphDeploymentRequest (DGDR) custom resource, which provides a high-level, SLA-driven interface for deploying machine learning models on Dynamo.


Resource Types

Autoscaling

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ |  |  |  |
| `minReplicas` _integer_ |  |  |  |
| `maxReplicas` _integer_ |  |  |  |
| `behavior` _HorizontalPodAutoscalerBehavior_ |  |  |  |
| `metrics` _MetricSpec array_ |  |  |  |

ConfigMapKeySelector

ConfigMapKeySelector selects a specific key from a ConfigMap. Used to reference external configuration data stored in ConfigMaps.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name of the ConfigMap containing the desired data. |  | Required: {} |
| `key` _string_ | Key in the ConfigMap to select. If not specified, defaults to `disagg.yaml`. | `disagg.yaml` |  |

DeploymentOverridesSpec

DeploymentOverridesSpec allows users to customize metadata for auto-created DynamoGraphDeployments. When autoApply is enabled, these overrides are applied to the generated DGD resource.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name is the desired name for the created DynamoGraphDeployment. If not specified, defaults to the DGDR name. |  | Optional: {} |
| `namespace` _string_ | Namespace is the desired namespace for the created DynamoGraphDeployment. If not specified, defaults to the DGDR namespace. |  | Optional: {} |
| `labels` _object (keys: string, values: string)_ | Labels are additional labels to add to the DynamoGraphDeployment metadata. These are merged with auto-generated labels from the profiling process. |  | Optional: {} |
| `annotations` _object (keys: string, values: string)_ | Annotations are additional annotations to add to the DynamoGraphDeployment metadata. |  | Optional: {} |
| `workersImage` _string_ | WorkersImage specifies the container image to use for DynamoGraphDeployment worker components. This image is used for both temporary DGDs created during online profiling and the final DGD. If omitted, the image from the base config file (e.g., disagg.yaml) is used. Example: `nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1` |  | Optional: {} |

DeploymentStatus

DeploymentStatus tracks the state of an auto-created DynamoGraphDeployment. This status is populated when autoApply is enabled and a DGD is created.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name is the name of the created DynamoGraphDeployment. |  |  |
| `namespace` _string_ | Namespace is the namespace of the created DynamoGraphDeployment. |  |  |
| `state` _string_ | State is the current state of the DynamoGraphDeployment. This value is mirrored from the DGD's status.state field. |  |  |
| `created` _boolean_ | Created indicates whether the DGD has been successfully created. Used to prevent recreation if the DGD is manually deleted by users. |  |  |

DynamoComponentDeployment

DynamoComponentDeployment is the Schema for the dynamocomponentdeployments API.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1alpha1` |  |  |
| `kind` _string_ | `DynamoComponentDeployment` |  |  |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
| `spec` _DynamoComponentDeploymentSpec_ | Spec defines the desired state for this Dynamo component deployment. |  |  |

DynamoComponentDeploymentSharedSpec

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `annotations` _object (keys: string, values: string)_ | Annotations to add to generated Kubernetes resources for this component (such as Pod, Service, and Ingress when applicable). |  |  |
| `labels` _object (keys: string, values: string)_ | Labels to add to generated Kubernetes resources for this component. |  |  |
| `serviceName` _string_ | The name of the component. |  |  |
| `componentType` _string_ | ComponentType indicates the role of this component (for example, "main"). |  |  |
| `subComponentType` _string_ | SubComponentType indicates the sub-role of this component (for example, "prefill"). |  |  |
| `dynamoNamespace` _string_ | DynamoNamespace is deprecated and will be removed in a future version. The DGD Kubernetes namespace and DynamoGraphDeployment name are used to construct the Dynamo namespace for each component. |  | Optional: {} |
| `globalDynamoNamespace` _boolean_ | GlobalDynamoNamespace indicates that the component will be placed in the global Dynamo namespace. |  |  |
| `resources` _Resources_ | Resource requests and limits for this component, including CPU, memory, GPUs/devices, and any runtime-specific resources. |  |  |
| `autoscaling` _Autoscaling_ | Autoscaling config for this component (replica range, target utilization, etc.). |  |  |
| `envs` _EnvVar array_ | Envs defines additional environment variables to inject into the component containers. |  |  |
| `envFromSecret` _string_ | EnvFromSecret references a Secret whose key/value pairs will be exposed as environment variables in the component containers. |  |  |
| `volumeMounts` _VolumeMount array_ | VolumeMounts references PVCs defined at the top level for volumes to be mounted by the component. |  |  |
| `ingress` _IngressSpec_ | Ingress config to expose the component outside the cluster (or through a service mesh). |  |  |
| `modelRef` _ModelReference_ | ModelRef references a model that this component serves. When specified, a headless service will be created for endpoint discovery. |  |  |
| `sharedMemory` _SharedMemorySpec_ | SharedMemory controls the tmpfs mounted at /dev/shm (enable/disable and size). |  |  |
| `extraPodMetadata` _ExtraPodMetadata_ | ExtraPodMetadata adds labels/annotations to the created Pods. |  |  |
| `extraPodSpec` _ExtraPodSpec_ | ExtraPodSpec allows overriding the main pod spec configuration. It is a standard Kubernetes PodSpec and also contains a MainContainer (standard Kubernetes Container) field for overriding the main container configuration. |  |  |
| `livenessProbe` _Probe_ | LivenessProbe to detect and restart unhealthy containers. |  |  |
| `readinessProbe` _Probe_ | ReadinessProbe to signal when the container is ready to receive traffic. |  |  |
| `replicas` _integer_ | Replicas is the desired number of Pods for this component when autoscaling is not used. |  |  |
| `multinode` _MultinodeSpec_ | Multinode is the configuration for multinode components. |  |  |
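Taken together, a typical service entry in a DynamoGraphDeployment combines a handful of these shared fields. A minimal sketch (the service name and the environment variable are placeholders chosen for illustration):

```yaml
services:
  VllmWorker:
    componentType: main
    replicas: 2
    envs:
      - name: LOG_LEVEL        # placeholder variable for illustration
        value: info
    sharedMemory:
      size: 16Gi               # override the default tmpfs size at /dev/shm
```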

DynamoComponentDeploymentSpec

DynamoComponentDeploymentSpec defines the desired state of DynamoComponentDeployment.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `backendFramework` _string_ | BackendFramework specifies the backend framework (e.g., "sglang", "vllm", "trtllm"). |  | Enum: [sglang vllm trtllm] |
| `annotations` _object (keys: string, values: string)_ | Annotations to add to generated Kubernetes resources for this component (such as Pod, Service, and Ingress when applicable). |  |  |
| `labels` _object (keys: string, values: string)_ | Labels to add to generated Kubernetes resources for this component. |  |  |
| `serviceName` _string_ | The name of the component. |  |  |
| `componentType` _string_ | ComponentType indicates the role of this component (for example, "main"). |  |  |
| `subComponentType` _string_ | SubComponentType indicates the sub-role of this component (for example, "prefill"). |  |  |
| `dynamoNamespace` _string_ | DynamoNamespace is deprecated and will be removed in a future version. The DGD Kubernetes namespace and DynamoGraphDeployment name are used to construct the Dynamo namespace for each component. |  | Optional: {} |
| `globalDynamoNamespace` _boolean_ | GlobalDynamoNamespace indicates that the component will be placed in the global Dynamo namespace. |  |  |
| `resources` _Resources_ | Resource requests and limits for this component, including CPU, memory, GPUs/devices, and any runtime-specific resources. |  |  |
| `autoscaling` _Autoscaling_ | Autoscaling config for this component (replica range, target utilization, etc.). |  |  |
| `envs` _EnvVar array_ | Envs defines additional environment variables to inject into the component containers. |  |  |
| `envFromSecret` _string_ | EnvFromSecret references a Secret whose key/value pairs will be exposed as environment variables in the component containers. |  |  |
| `volumeMounts` _VolumeMount array_ | VolumeMounts references PVCs defined at the top level for volumes to be mounted by the component. |  |  |
| `ingress` _IngressSpec_ | Ingress config to expose the component outside the cluster (or through a service mesh). |  |  |
| `modelRef` _ModelReference_ | ModelRef references a model that this component serves. When specified, a headless service will be created for endpoint discovery. |  |  |
| `sharedMemory` _SharedMemorySpec_ | SharedMemory controls the tmpfs mounted at /dev/shm (enable/disable and size). |  |  |
| `extraPodMetadata` _ExtraPodMetadata_ | ExtraPodMetadata adds labels/annotations to the created Pods. |  |  |
| `extraPodSpec` _ExtraPodSpec_ | ExtraPodSpec allows overriding the main pod spec configuration. It is a standard Kubernetes PodSpec and also contains a MainContainer (standard Kubernetes Container) field for overriding the main container configuration. |  |  |
| `livenessProbe` _Probe_ | LivenessProbe to detect and restart unhealthy containers. |  |  |
| `readinessProbe` _Probe_ | ReadinessProbe to signal when the container is ready to receive traffic. |  |  |
| `replicas` _integer_ | Replicas is the desired number of Pods for this component when autoscaling is not used. |  |  |
| `multinode` _MultinodeSpec_ | Multinode is the configuration for multinode components. |  |  |

DynamoGraphDeployment

DynamoGraphDeployment is the Schema for the dynamographdeployments API.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1alpha1` |  |  |
| `kind` _string_ | `DynamoGraphDeployment` |  |  |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
| `spec` _DynamoGraphDeploymentSpec_ | Spec defines the desired state for this graph deployment. |  |  |
| `status` _DynamoGraphDeploymentStatus_ | Status reflects the current observed state of this graph deployment. |  |  |

DynamoGraphDeploymentRequest

DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API. It serves as the primary interface for users to request model deployments with specific performance and resource constraints, enabling SLA-driven deployments.

Lifecycle:

  1. Initial → Pending: Validates spec and prepares for profiling
  2. Pending → Profiling: Creates and runs profiling job (online or AIC)
  3. Profiling → Ready/Deploying: Generates DGD spec after profiling completes
  4. Deploying → Ready: When autoApply=true, monitors DGD until Ready
  5. Ready: Terminal state when DGD is operational or spec is available
  6. DeploymentDeleted: Terminal state when auto-created DGD is manually deleted

The spec becomes immutable once profiling starts. Users must delete and recreate the DGDR to modify configuration after this point.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1alpha1` |  |  |
| `kind` _string_ | `DynamoGraphDeploymentRequest` |  |  |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
| `spec` _DynamoGraphDeploymentRequestSpec_ | Spec defines the desired state for this deployment request. |  |  |
| `status` _DynamoGraphDeploymentRequestStatus_ | Status reflects the current observed state of this deployment request. |  |  |

DynamoGraphDeploymentRequestSpec

DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest. This CRD serves as the primary interface for users to request model deployments with specific performance constraints and resource requirements, enabling SLA-driven deployments.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `model` _string_ | Model specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b"). This is a high-level identifier for easy reference in kubectl output and logs. The controller automatically sets this value in profilingConfig.config.deployment.model. |  | Required: {} |
| `backend` _string_ | Backend specifies the inference backend to use. The controller automatically sets this value in profilingConfig.config.engine.backend. |  | Enum: [vllm sglang trtllm] Required: {} |
| `enableGpuDiscovery` _boolean_ | EnableGpuDiscovery controls whether the profiler should automatically discover GPU resources from the Kubernetes cluster nodes. When enabled, the profiler will override any manually specified hardware configuration (min_num_gpus_per_engine, max_num_gpus_per_engine, num_gpus_per_node) with values detected from the cluster. Requires cluster-wide node access permissions; only available with cluster-scoped operators. | false | Optional: {} |
| `profilingConfig` _ProfilingConfigSpec_ | ProfilingConfig provides the complete configuration for the profiling job. This configuration is passed directly to the profiler. The structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema). Note: deployment.model and engine.backend are automatically set from the high-level model and backend fields and should not be specified in this config. |  | Required: {} |
| `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment after profiling completes. If false, only the spec is generated and stored in status. Users can then manually create a DGD using the generated spec. | false |  |
| `deploymentOverrides` _DeploymentOverridesSpec_ | DeploymentOverrides allows customizing metadata for the auto-created DGD. Only applicable when AutoApply is true. |  | Optional: {} |
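A minimal DynamoGraphDeploymentRequest that exercises these fields might look like the following sketch (the resource name, model, image tag, ConfigMap name, and DGD name are placeholders):

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeploymentRequest
metadata:
  name: qwen-dgdr
spec:
  model: Qwen/Qwen3-0.6B
  backend: vllm
  autoApply: true                # create the DGD automatically after profiling
  profilingConfig:
    profilerImage: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1
    configMapRef:
      name: dynamo-base-config   # placeholder ConfigMap holding disagg.yaml
      key: disagg.yaml
    config: {}                   # profile_sla options; validated by the profiler
  deploymentOverrides:
    name: qwen-dgd               # name for the auto-created DGD
```

Note that deployment.model and engine.backend are filled in by the controller from the high-level fields, so they are left out of `config` here.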

DynamoGraphDeploymentRequestStatus

DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest. The controller updates this status as the DGDR progresses through its lifecycle.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `state` _string_ | State is a high-level textual status of the deployment request lifecycle. Possible values: "", "Pending", "Profiling", "Deploying", "Ready", "DeploymentDeleted", "Failed". The empty string ("") represents the initial state before initialization. |  |  |
| `backend` _string_ | Backend is extracted from profilingConfig.config.engine.backend for display purposes. This field is populated by the controller and shown in kubectl output. |  | Optional: {} |
| `observedGeneration` _integer_ | ObservedGeneration reflects the generation of the most recently observed spec. Used to detect spec changes and enforce immutability after profiling starts. |  |  |
| `conditions` _Condition array_ | Conditions contains the latest observed conditions of the deployment request. Standard condition types include: Validation, Profiling, SpecGenerated, DeploymentReady. Conditions are merged by type on patch updates. |  |  |
| `profilingResults` _string_ | ProfilingResults contains a reference to the ConfigMap holding profiling data. Format: `configmap/<name>` |  | Optional: {} |
| `generatedDeployment` _RawExtension_ | GeneratedDeployment contains the full generated DynamoGraphDeployment specification including metadata, based on profiling results. Users can extract this to create a DGD manually, or it is used automatically when autoApply is true. Stored as RawExtension to preserve all fields including metadata. |  | EmbeddedResource: {} Optional: {} |
| `deployment` _DeploymentStatus_ | Deployment tracks the auto-created DGD when AutoApply is true. Contains name, namespace, state, and creation status of the managed DGD. |  | Optional: {} |

DynamoGraphDeploymentSpec

DynamoGraphDeploymentSpec defines the desired state of DynamoGraphDeployment.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `pvcs` _PVC array_ | PVCs defines a list of persistent volume claims that can be referenced by components. Each PVC must have a unique name that can be referenced in component specifications. |  | MaxItems: 100 Optional: {} |
| `services` _object (keys: string, values: DynamoComponentDeploymentSharedSpec)_ | Services are the services to deploy as part of this deployment. |  | MaxProperties: 25 Optional: {} |
| `envs` _EnvVar array_ | Envs are environment variables applied to all services in the deployment unless overridden by service-specific configuration. |  | Optional: {} |
| `backendFramework` _string_ | BackendFramework specifies the backend framework (e.g., "sglang", "vllm", "trtllm"). |  | Enum: [sglang vllm trtllm] |
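Assembled into a manifest, these fields produce a deployment skeleton like the sketch below (the resource name, service names, PVC values, and environment variable are placeholders; the componentType values assume the frontend/worker roles described later in this document):

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: example-dgd
spec:
  backendFramework: vllm
  envs:
    - name: HF_HOME              # applied to every service unless overridden
      value: /models
  pvcs:
    - name: model-cache
      create: true
      storageClass: standard
      size: 100Gi
      volumeAccessMode: ReadWriteMany
  services:
    Frontend:
      componentType: frontend
    VllmWorker:
      componentType: main
      replicas: 1
```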

DynamoGraphDeploymentStatus

DynamoGraphDeploymentStatus defines the observed state of DynamoGraphDeployment.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `state` _string_ | State is a high-level textual status of the graph deployment lifecycle. |  |  |
| `conditions` _Condition array_ | Conditions contains the latest observed conditions of the graph deployment. The slice is merged by type on patch updates. |  |  |

DynamoModel

DynamoModel is the Schema for the dynamomodels API.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1alpha1` |  |  |
| `kind` _string_ | `DynamoModel` |  |  |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
| `spec` _DynamoModelSpec_ |  |  |  |
| `status` _DynamoModelStatus_ |  |  |  |

DynamoModelSpec

DynamoModelSpec defines the desired state of DynamoModel.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `modelName` _string_ | ModelName is the full model identifier (e.g., "meta-llama/Llama-3.3-70B-Instruct-lora"). |  | Required: {} |
| `baseModelName` _string_ | BaseModelName is the base model identifier that matches the service label. This is used to discover endpoints via headless services. |  | Required: {} |
| `modelType` _string_ | ModelType specifies the type of model (e.g., "base", "lora", "adapter"). | base | Enum: [base lora adapter] |
| `source` _ModelSource_ | Source specifies the model source location (only applicable for the lora model type). |  |  |
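For example, registering a LoRA adapter against a base model could look like this sketch (the resource name, identifiers, and S3 path are placeholders):

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: llama-lora
spec:
  modelName: meta-llama/Llama-3.3-70B-Instruct-lora
  baseModelName: llama-3-70b-instruct-v1      # must match the serving component's label
  modelType: lora
  source:
    uri: s3://my-bucket/loras/llama-3.3-70b   # placeholder S3 path
```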

DynamoModelStatus

DynamoModelStatus defines the observed state of DynamoModel.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `endpoints` _EndpointInfo array_ | Endpoints is the current list of all endpoints for this model. |  |  |
| `readyEndpoints` _integer_ | ReadyEndpoints is the count of endpoints that are ready. |  |  |
| `totalEndpoints` _integer_ | TotalEndpoints is the total count of endpoints. |  |  |
| `conditions` _Condition array_ | Conditions represents the latest available observations of the model's state. |  |  |

EndpointInfo

EndpointInfo represents a single endpoint (pod) serving the model.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `address` _string_ | Address is the full address of the endpoint (e.g., "http://10.0.1.5:9090"). |  |  |
| `podName` _string_ | PodName is the name of the pod serving this endpoint. |  |  |
| `ready` _boolean_ | Ready indicates whether the endpoint is ready to serve traffic. For LoRA models: true if the POST /loras request succeeded with a 2xx status code. For base models: always false (no probing performed). |  |  |

IngressSpec

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled exposes the component through an ingress or virtual service when true. |  |  |
| `host` _string_ | Host is the base host name to route external traffic to this component. |  |  |
| `useVirtualService` _boolean_ | UseVirtualService indicates whether to configure a service-mesh VirtualService instead of a standard Ingress. |  |  |
| `virtualServiceGateway` _string_ | VirtualServiceGateway optionally specifies the gateway name to attach the VirtualService to. |  |  |
| `hostPrefix` _string_ | HostPrefix is an optional prefix added before the host. |  |  |
| `annotations` _object (keys: string, values: string)_ | Annotations to set on the generated Ingress/VirtualService resources. |  |  |
| `labels` _object (keys: string, values: string)_ | Labels to set on the generated Ingress/VirtualService resources. |  |  |
| `tls` _IngressTLSSpec_ | TLS holds the TLS configuration used by the Ingress/VirtualService. |  |  |
| `hostSuffix` _string_ | HostSuffix is an optional suffix appended after the host. |  |  |
| `ingressControllerClassName` _string_ | IngressControllerClassName selects the ingress controller class (e.g., "nginx"). |  |  |
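As an illustration, exposing a component through a plain Ingress could be configured as follows (the service name, host, and Secret name are placeholders):

```yaml
services:
  Frontend:
    ingress:
      enabled: true
      host: dynamo.example.com
      ingressControllerClassName: nginx
      tls:
        secretName: dynamo-tls   # Secret containing the TLS certificate and key
```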

IngressTLSSpec

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `secretName` _string_ | SecretName is the name of a Kubernetes Secret containing the TLS certificate and key. |  |  |

ModelReference

ModelReference identifies a model served by this component.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name is the base model identifier (e.g., "llama-3-70b-instruct-v1"). |  | Required: {} |
| `revision` _string_ | Revision is the model revision/version (optional). |  |  |

ModelSource

ModelSource defines the source location of a model.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `uri` _string_ | URI is the model source URI. Supported formats: S3 (`s3://bucket/path/to/model`) and HuggingFace (`hf://org/model@revision_sha`). |  | Required: {} |

MultinodeSpec

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `nodeCount` _integer_ | Indicates the number of nodes to deploy for multinode components. The total number of GPUs is NumberOfNodes × GPU limit. Must be greater than 1. | 2 | Minimum: 2 |

PVC

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `create` _boolean_ | Create indicates whether to create a new PVC. |  |  |
| `name` _string_ | Name is the name of the PVC. |  | Required: {} |
| `storageClass` _string_ | StorageClass to be used for PVC creation. Required when create is true. |  |  |
| `size` _Quantity_ | Size of the volume in Gi, used during PVC creation. Required when create is true. |  |  |
| `volumeAccessMode` _PersistentVolumeAccessMode_ | VolumeAccessMode is the volume access mode of the PVC. Required when create is true. |  |  |

ProfilingConfigSpec

ProfilingConfigSpec defines configuration for the profiling process. This structure maps directly to the profile_sla.py config format. See benchmarks/profiler/utils/profiler_argparse.py for the complete schema.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `config` _JSON_ | Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler. The profiler will validate the configuration and report any errors. |  | Optional: {} Type: object |
| `configMapRef` _ConfigMapKeySelector_ | ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment base config file (disagg.yaml). This is separate from the profiling config above. The path to this config will be set as engine.config in the profiling config. |  | Optional: {} |
| `profilerImage` _string_ | ProfilerImage specifies the container image to use for profiling jobs. This image contains the profiler code and dependencies needed for SLA-based profiling. Example: `nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1` |  | Required: {} |

SharedMemorySpec

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `disabled` _boolean_ |  |  |  |
| `size` _Quantity_ |  |  |  |

VolumeMount

VolumeMount references a PVC defined at the top level for volumes to be mounted by the component.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name references a PVC name defined in the top-level PVCs map. |  | Required: {} |
| `mountPoint` _string_ | MountPoint specifies where to mount the volume. If useAsCompilationCache is true and mountPoint is not specified, a backend-specific default will be used. |  |  |
| `useAsCompilationCache` _boolean_ | UseAsCompilationCache indicates this volume should be used as a compilation cache. When true, backend-specific environment variables will be set and default mount points may be used. | false |  |
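For instance, pairing a top-level PVC with a compilation-cache mount might look like this sketch (the PVC name, storage class, size, and service name are placeholders):

```yaml
pvcs:
  - name: compile-cache
    create: true
    storageClass: standard
    size: 50Gi
    volumeAccessMode: ReadWriteOnce
services:
  VllmWorker:
    volumeMounts:
      - name: compile-cache          # references the top-level PVC by name
        useAsCompilationCache: true  # mountPoint omitted: backend default applies
```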

Operator Default Values Injection

The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include:

  • Health Probes: Startup, liveness, and readiness probes are configured differently for frontend, worker, and planner components. For example, worker components receive a startup probe with a 2-hour timeout (720 failures × 10 seconds) to accommodate long model loading times.

  • Security Context: All components receive fsGroup: 1000 by default to ensure proper file permissions for mounted volumes. This can be overridden via the extraPodSpec.securityContext field.

  • Shared Memory: All components receive an 8Gi shared memory volume mounted at /dev/shm by default (can be disabled or resized via the sharedMemory field).

  • Environment Variables: Components automatically receive environment variables like DYN_NAMESPACE, DYN_PARENT_DGD_K8S_NAME, DYNAMO_PORT, and backend-specific variables.

  • Pod Configuration: Default terminationGracePeriodSeconds of 60 seconds and restartPolicy: Always.

  • Autoscaling: When enabled without explicit metrics, defaults to CPU-based autoscaling with 80% target utilization.

  • Backend-Specific Behavior: For multinode deployments, probes are automatically modified or removed for worker nodes depending on the backend framework (VLLM, SGLang, or TensorRT-LLM).

Pod Specification Defaults

All components receive the following pod-level defaults unless overridden:

  • terminationGracePeriodSeconds: 60 seconds
  • restartPolicy: Always

Security Context

The operator automatically applies default security context settings to all components to ensure proper file permissions, particularly for mounted volumes:

  • fsGroup: 1000 - Sets the group ownership of mounted volumes and any files created in those volumes

This default ensures that non-root containers can write to mounted volumes (like model caches or persistent storage) without permission issues. The fsGroup setting is particularly important for:

  • Model downloads and caching
  • Compilation cache directories
  • Persistent volume claims (PVCs)
  • SSH key generation in multinode deployments

Overriding Security Context

To override the default security context, specify your own securityContext in the extraPodSpec of your component:

```yaml
services:
  YourWorker:
    extraPodSpec:
      securityContext:
        fsGroup: 2000        # Custom group ID
        runAsUser: 1000
        runAsGroup: 1000
        runAsNonRoot: true
```

Important: When you provide any securityContext object in extraPodSpec, the operator will not inject any defaults. This gives you complete control over the security context, including the ability to run as root (by omitting runAsNonRoot or setting it to false).

OpenShift and Security Context Constraints

In OpenShift environments with Security Context Constraints (SCCs), you may need to omit explicit UID/GID values to allow OpenShift’s admission controllers to assign them dynamically:

```yaml
services:
  YourWorker:
    extraPodSpec:
      securityContext: {}   # an empty object still suppresses operator defaults
      # Omit fsGroup to let OpenShift assign it based on SCC;
      # OpenShift will inject the appropriate UID range.
```

Alternatively, if you want to keep the default fsGroup: 1000 behavior and are certain your cluster allows it, you don’t need to specify anything; the operator defaults will work.

Shared Memory Configuration

Shared memory is enabled by default for all components:

  • Enabled: true (unless explicitly disabled via sharedMemory.disabled)
  • Size: 8Gi
  • Mount Path: /dev/shm
  • Volume Type: emptyDir with memory medium

To disable shared memory or customize the size, use the sharedMemory field in your component specification.
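For example (the service name is a placeholder):

```yaml
services:
  VllmWorker:
    sharedMemory:
      disabled: false
      size: 16Gi   # raise the default 8Gi /dev/shm allocation
```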

Health Probes by Component Type

The operator applies different default health probes based on the component type.

Frontend Components

Frontend components receive the following probe configurations:

Liveness Probe:

  • Type: HTTP GET
  • Path: /health
  • Port: http (8000)
  • Initial Delay: 60 seconds
  • Period: 60 seconds
  • Timeout: 30 seconds
  • Failure Threshold: 10

Readiness Probe:

  • Type: Exec command
  • Command: curl -s http://localhost:${DYNAMO_PORT}/health | jq -e ".status == \"healthy\""
  • Initial Delay: 60 seconds
  • Period: 60 seconds
  • Timeout: 30 seconds
  • Failure Threshold: 10

Worker Components

Worker components receive the following probe configurations:

Liveness Probe:

  • Type: HTTP GET
  • Path: /live
  • Port: system (9090)
  • Period: 5 seconds
  • Timeout: 30 seconds
  • Failure Threshold: 1

Readiness Probe:

  • Type: HTTP GET
  • Path: /health
  • Port: system (9090)
  • Period: 10 seconds
  • Timeout: 30 seconds
  • Failure Threshold: 60

Startup Probe:

  • Type: HTTP GET
  • Path: /live
  • Port: system (9090)
  • Period: 10 seconds
  • Timeout: 5 seconds
  • Failure Threshold: 720 (allows up to 2 hours for startup: 10s × 720 = 7200s)

:::{note}
For larger models (typically >70B parameters) or slower storage systems, you may need to increase the failureThreshold to allow more time for model loading. Calculate the required threshold based on your expected startup time: failureThreshold = (expected_startup_seconds / period). Override the startup probe in your component specification if the default 2-hour window is insufficient.
:::
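Applying that formula, a model expected to need up to four hours to load would require failureThreshold = 14400 / 10 = 1440. A sketch of such an override via extraPodSpec.mainContainer (the service name is a placeholder; the path and port mirror the worker defaults listed above):

```yaml
services:
  VllmWorker:
    extraPodSpec:
      mainContainer:
        startupProbe:
          httpGet:
            path: /live
            port: 9090
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 1440   # 10s × 1440 = 14400s = 4 hours
```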

Multinode Deployment Probe Modifications

For multinode deployments, the operator modifies probes based on the backend framework and node role:

VLLM Backend

The operator automatically selects between two deployment modes based on parallelism configuration:

Ray-Based Mode (when world_size > GPUs_per_node):

  • Worker nodes: All probes (liveness, readiness, startup) are removed
  • Leader nodes: All probes remain active

Data Parallel Mode (when world_size × data_parallel_size > GPUs_per_node):

  • Worker nodes: All probes (liveness, readiness, startup) are removed
  • Leader nodes: All probes remain active

SGLang Backend

  • Worker nodes: All probes (liveness, readiness, startup) are removed

TensorRT-LLM Backend

  • Leader nodes: All probes remain unchanged
  • Worker nodes:
    • Liveness and startup probes are removed
    • Readiness probe is replaced with a TCP socket check on SSH port (2222):
      • Initial Delay: 20 seconds
      • Period: 20 seconds
      • Timeout: 5 seconds
      • Failure Threshold: 10

Environment Variables

The operator automatically injects environment variables based on component type and configuration:

All Components

  • DYN_NAMESPACE: The Dynamo namespace for the component
  • DYN_PARENT_DGD_K8S_NAME: The parent DynamoGraphDeployment Kubernetes resource name
  • DYN_PARENT_DGD_K8S_NAMESPACE: The parent DynamoGraphDeployment Kubernetes namespace

Frontend Components

  • DYNAMO_PORT: 8000
  • DYN_HTTP_PORT: 8000

Worker Components

  • DYN_SYSTEM_PORT: 9090 (automatically enables the system metrics server)
  • DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS: ["generate"]
  • DYN_SYSTEM_ENABLED: true (needed for runtime images 0.6.1 and older)

Planner Components

  • PLANNER_PROMETHEUS_PORT: 9085

VLLM Backend (with compilation cache)

When a volume mount is configured with useAsCompilationCache: true:

  • VLLM_CACHE_ROOT: Set to the mount point of the cache volume

Service Account

Planner components automatically receive the following service account:

  • serviceAccountName: planner-serviceaccount

Image Pull Secrets

The operator automatically discovers and injects image pull secrets for container images. When a component specifies a container image, the operator:

  1. Scans all Kubernetes secrets of type kubernetes.io/dockerconfigjson in the component’s namespace
  2. Extracts the docker registry server URLs from each secret’s authentication configuration
  3. Matches the container image’s registry host against the discovered registry URLs
  4. Automatically injects matching secrets as imagePullSecrets in the pod specification

This eliminates the need to manually specify image pull secrets for each component. The operator maintains an internal index of docker secrets and their associated registries, refreshing this index periodically.

To disable automatic image pull secret discovery for a specific component, add the following annotation:

```yaml
annotations:
  nvidia.com/disable-image-pull-secret-discovery: "true"
```

Autoscaling Defaults

When autoscaling is enabled but no metrics are specified, the operator applies:

  • Default Metric: CPU utilization
  • Target Average Utilization: 80%

Port Configurations

Default container ports are configured based on component type:

Frontend Components

  • Port: 8000
  • Protocol: TCP
  • Name: http

Worker Components

  • Port: 9090
  • Protocol: TCP
  • Name: system

Planner Components

  • Port: 9085
  • Protocol: TCP
  • Name: metrics

Backend-Specific Configurations

VLLM

  • Ray Head Port: 6379 (for Ray-based multinode deployments)
  • Data Parallel RPC Port: 13445 (for data parallel multinode deployments)

SGLang

  • Distribution Init Port: 29500 (for multinode deployments)

TensorRT-LLM

  • SSH Port: 2222 (for multinode MPI communication)
  • OpenMPI Environment: OMPI_MCA_orte_keep_fqdn_hostnames=1

Implementation Reference

For users who want to understand the implementation details or contribute to the operator, the default values described in this document are set in the operator’s controller source code.

Notes

  • All these defaults can be overridden by explicitly specifying values in your DynamoComponentDeployment or DynamoGraphDeployment resources
  • User-specified probes (via livenessProbe, readinessProbe, or startupProbe fields) take precedence over operator defaults
  • For security context, if you provide any securityContext in extraPodSpec, no defaults will be injected, giving you full control
  • For multinode deployments, some defaults are modified or removed as described above to accommodate distributed execution patterns
  • The extraPodSpec.mainContainer field can be used to override probe configurations set by the operator