API Reference (K8s)

⚠️ Important: This documentation is automatically generated from source code. Do not edit this file directly.

API Reference

Packages

nvidia.com/v1alpha1

Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.

This package defines the DynamoGraphDeploymentRequest (DGDR) custom resource, which provides a high-level, SLA-driven interface for deploying machine learning models on Dynamo.

Resource Types

Autoscaling

Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner for autoscaling instead. See docs/pages/kubernetes/autoscaling.md for migration guidance. This field will be removed in a future API version.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Deprecated: This field is ignored. | | |
| `minReplicas` _integer_ | Deprecated: This field is ignored. | | |
| `maxReplicas` _integer_ | Deprecated: This field is ignored. | | |
| `behavior` _HorizontalPodAutoscalerBehavior_ | Deprecated: This field is ignored. | | |
| `metrics` _MetricSpec array_ | Deprecated: This field is ignored. | | |

CheckpointMode

Underlying type: string

CheckpointMode defines how checkpoint creation is handled

Validation:

  • Enum: [Auto Manual]

Appears in:

| Field | Description |
| --- | --- |
| `Auto` | CheckpointModeAuto means the DGD controller will automatically create a Checkpoint CR. |
| `Manual` | CheckpointModeManual means the user must create the Checkpoint CR themselves. |

ComponentKind

Underlying type: string

ComponentKind represents the type of underlying Kubernetes resource.

Validation:

  • Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet]

Appears in:

| Field | Description |
| --- | --- |
| `PodClique` | ComponentKindPodClique represents a PodClique resource. |
| `PodCliqueScalingGroup` | ComponentKindPodCliqueScalingGroup represents a PodCliqueScalingGroup resource. |
| `Deployment` | ComponentKindDeployment represents a Deployment resource. |
| `LeaderWorkerSet` | ComponentKindLeaderWorkerSet represents a LeaderWorkerSet resource. |

ConfigMapKeySelector

ConfigMapKeySelector selects a specific key from a ConfigMap. Used to reference external configuration data stored in ConfigMaps.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name of the ConfigMap containing the desired data. | | Required: {} |
| `key` _string_ | Key in the ConfigMap to select. If not specified, defaults to "disagg.yaml". | disagg.yaml | |

DGDRState

Underlying type: string

Validation:

  • Enum: [Initializing Pending Profiling Deploying Ready DeploymentDeleted Failed]

Appears in:

| Field | Description |
| --- | --- |
| `Initializing` | |
| `Pending` | |
| `Profiling` | |
| `Deploying` | |
| `Ready` | |
| `DeploymentDeleted` | |
| `Failed` | |

DGDState

Underlying type: string

Validation:

  • Enum: [initializing pending successful failed]

Appears in:

| Field | Description |
| --- | --- |
| `initializing` | |
| `pending` | |
| `successful` | |
| `failed` | |

DeploymentOverridesSpec

DeploymentOverridesSpec allows users to customize metadata for auto-created DynamoGraphDeployments. When autoApply is enabled, these overrides are applied to the generated DGD resource.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name is the desired name for the created DynamoGraphDeployment. If not specified, defaults to the DGDR name. | | Optional: {} |
| `namespace` _string_ | Namespace is the desired namespace for the created DynamoGraphDeployment. If not specified, defaults to the DGDR namespace. | | Optional: {} |
| `labels` _object (keys:string, values:string)_ | Labels are additional labels to add to the DynamoGraphDeployment metadata. These are merged with auto-generated labels from the profiling process. | | Optional: {} |
| `annotations` _object (keys:string, values:string)_ | Annotations are additional annotations to add to the DynamoGraphDeployment metadata. | | Optional: {} |
| `workersImage` _string_ | WorkersImage specifies the container image to use for DynamoGraphDeployment worker components. This image is used for both temporary DGDs created during online profiling and the final DGD. If omitted, the image from the base config file (e.g., disagg.yaml) is used. Example: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1" | | Optional: {} |

DeploymentStatus

DeploymentStatus tracks the state of an auto-created DynamoGraphDeployment. This status is populated when autoApply is enabled and a DGD is created.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name is the name of the created DynamoGraphDeployment. | | |
| `namespace` _string_ | Namespace is the namespace of the created DynamoGraphDeployment. | | |
| `state` _DGDState_ | State is the current state of the DynamoGraphDeployment. This value is mirrored from the DGD's status.state field. | initializing | Enum: [initializing pending successful failed] |
| `created` _boolean_ | Created indicates whether the DGD has been successfully created. Used to prevent recreation if the DGD is manually deleted by users. | | |

DynamoCheckpoint

DynamoCheckpoint is the Schema for the dynamocheckpoints API. It represents a container checkpoint that can be used to restore pods to a warm state.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1alpha1` | | |
| `kind` _string_ | `DynamoCheckpoint` | | |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of metadata. | | |
| `spec` _DynamoCheckpointSpec_ | | | |
| `status` _DynamoCheckpointStatus_ | | | |

DynamoCheckpointIdentity

DynamoCheckpointIdentity defines the inputs that determine checkpoint equivalence. Two checkpoints with the same identity hash are considered equivalent.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `model` _string_ | Model is the model identifier (e.g., "meta-llama/Llama-3-70B"). | | Required: {} |
| `backendFramework` _string_ | BackendFramework is the runtime framework (vllm, sglang, trtllm). | | Enum: [vllm sglang trtllm]; Required: {} |
| `dynamoVersion` _string_ | DynamoVersion is the Dynamo platform version (optional). If not specified, the version is not included in the identity hash. This ensures checkpoint compatibility across Dynamo releases. | | Optional: {} |
| `tensorParallelSize` _integer_ | TensorParallelSize is the tensor parallel configuration. | 1 | Minimum: 1; Optional: {} |
| `pipelineParallelSize` _integer_ | PipelineParallelSize is the pipeline parallel configuration. | 1 | Minimum: 1; Optional: {} |
| `dtype` _string_ | Dtype is the data type (fp16, bf16, fp8, etc.). | | Optional: {} |
| `maxModelLen` _integer_ | MaxModelLen is the maximum sequence length. | | Minimum: 1; Optional: {} |
| `extraParameters` _object (keys:string, values:string)_ | ExtraParameters are additional parameters that affect the checkpoint hash. Use for any framework-specific or custom parameters not covered above. | | Optional: {} |

DynamoCheckpointJobConfig

DynamoCheckpointJobConfig defines the configuration for the checkpoint creation Job

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `podTemplateSpec` _PodTemplateSpec_ | PodTemplateSpec allows customizing the checkpoint Job pod. This should include the container that runs the workload to be checkpointed. | | Required: {} |
| `activeDeadlineSeconds` _integer_ | ActiveDeadlineSeconds specifies the maximum time the Job can run. | 3600 | Optional: {} |
| `backoffLimit` _integer_ | BackoffLimit specifies the number of retries before marking the Job failed. | 3 | Optional: {} |
| `ttlSecondsAfterFinished` _integer_ | TTLSecondsAfterFinished specifies how long to keep the Job after completion. | 300 | Optional: {} |

DynamoCheckpointPhase

Underlying type: string

DynamoCheckpointPhase represents the current phase of the checkpoint lifecycle

Validation:

  • Enum: [Pending Creating Ready Failed]

Appears in:

| Field | Description |
| --- | --- |
| `Pending` | DynamoCheckpointPhasePending indicates the checkpoint CR has been created but the Job has not started. |
| `Creating` | DynamoCheckpointPhaseCreating indicates the checkpoint Job is running. |
| `Ready` | DynamoCheckpointPhaseReady indicates the checkpoint tar file is available on the PVC. |
| `Failed` | DynamoCheckpointPhaseFailed indicates the checkpoint creation failed. |

DynamoCheckpointSpec

DynamoCheckpointSpec defines the desired state of DynamoCheckpoint

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `identity` _DynamoCheckpointIdentity_ | Identity defines the inputs that determine checkpoint equivalence. | | Required: {} |
| `job` _DynamoCheckpointJobConfig_ | Job defines the configuration for the checkpoint creation Job. | | Required: {} |
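
Combining the identity and job configuration, a DynamoCheckpoint manifest might look like the following sketch. The resource name, container name, image, and model values are illustrative, not taken from a real deployment:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoCheckpoint
metadata:
  name: llama3-70b-checkpoint        # illustrative name
spec:
  identity:
    model: "meta-llama/Llama-3-70B"
    backendFramework: vllm
    tensorParallelSize: 1            # defaults to 1 if omitted
    pipelineParallelSize: 1          # defaults to 1 if omitted
  job:
    activeDeadlineSeconds: 3600      # default
    backoffLimit: 3                  # default
    ttlSecondsAfterFinished: 300     # default
    podTemplateSpec:                 # required: pod running the workload to checkpoint
      spec:
        containers:
          - name: worker             # illustrative container
            image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1
```

Two checkpoints whose `identity` fields hash to the same value are treated as equivalent, so the controller can reuse an existing checkpoint rather than run the Job again.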

DynamoCheckpointStatus

DynamoCheckpointStatus defines the observed state of DynamoCheckpoint

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `phase` _DynamoCheckpointPhase_ | Phase represents the current phase of the checkpoint lifecycle. | | Enum: [Pending Creating Ready Failed]; Optional: {} |
| `identityHash` _string_ | IdentityHash is the computed hash of the checkpoint identity. This hash is used to identify equivalent checkpoints. | | Optional: {} |
| `location` _string_ | Location is the full URI/path to the checkpoint in the storage backend. For PVC: same as TarPath (e.g., /checkpoints/{hash}.tar). For S3: s3://bucket/prefix/{hash}.tar. For OCI: oci://registry/repo:{hash}. | | Optional: {} |
| `storageType` _DynamoCheckpointStorageType_ | StorageType indicates the storage backend type used for this checkpoint. | | Enum: [pvc s3 oci]; Optional: {} |
| `jobName` _string_ | JobName is the name of the checkpoint creation Job. | | Optional: {} |
| `createdAt` _Time_ | CreatedAt is the timestamp when the checkpoint tar was created. | | Optional: {} |
| `message` _string_ | Message provides additional information about the current state. | | Optional: {} |
| `conditions` _Condition array_ | Conditions represent the latest available observations of the checkpoint's state. | | Optional: {} |

DynamoCheckpointStorageType

Underlying type: string

DynamoCheckpointStorageType defines the supported storage backends for checkpoints

Validation:

  • Enum: [pvc s3 oci]

Appears in:

DynamoComponentDeployment

DynamoComponentDeployment is the Schema for the dynamocomponentdeployments API

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1alpha1` | | |
| `kind` _string_ | `DynamoComponentDeployment` | | |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of metadata. | | |
| `spec` _DynamoComponentDeploymentSpec_ | Spec defines the desired state for this Dynamo component deployment. | | |

DynamoComponentDeploymentSharedSpec

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `annotations` _object (keys:string, values:string)_ | Annotations to add to generated Kubernetes resources for this component (such as Pod, Service, and Ingress when applicable). | | |
| `labels` _object (keys:string, values:string)_ | Labels to add to generated Kubernetes resources for this component. | | |
| `serviceName` _string_ | The name of the component. | | |
| `componentType` _string_ | ComponentType indicates the role of this component (for example, "main"). | | |
| `subComponentType` _string_ | SubComponentType indicates the sub-role of this component (for example, "prefill"). | | |
| `dynamoNamespace` _string_ | DynamoNamespace is deprecated and will be removed in a future version. The DGD Kubernetes namespace and DynamoGraphDeployment name are used to construct the Dynamo namespace for each component. | | Optional: {} |
| `globalDynamoNamespace` _boolean_ | GlobalDynamoNamespace indicates that the component will be placed in the global Dynamo namespace. | | |
| `resources` _Resources_ | Resource requests and limits for this component, including CPU, memory, GPUs/devices, and any runtime-specific resources. | | |
| `autoscaling` _Autoscaling_ | Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner for autoscaling instead. See docs/pages/kubernetes/autoscaling.md for migration guidance. This field will be removed in a future API version. | | |
| `envs` _EnvVar array_ | Envs defines additional environment variables to inject into the component containers. | | |
| `envFromSecret` _string_ | EnvFromSecret references a Secret whose key/value pairs will be exposed as environment variables in the component containers. | | |
| `volumeMounts` _VolumeMount array_ | VolumeMounts references PVCs defined at the top level for volumes to be mounted by the component. | | |
| `ingress` _IngressSpec_ | Ingress config to expose the component outside the cluster (or through a service mesh). | | |
| `modelRef` _ModelReference_ | ModelRef references a model that this component serves. When specified, a headless service will be created for endpoint discovery. | | Optional: {} |
| `sharedMemory` _SharedMemorySpec_ | SharedMemory controls the tmpfs mounted at /dev/shm (enable/disable and size). | | |
| `extraPodMetadata` _ExtraPodMetadata_ | ExtraPodMetadata adds labels/annotations to the created Pods. | | Optional: {} |
| `extraPodSpec` _ExtraPodSpec_ | ExtraPodSpec allows overriding the main pod spec configuration. It is a standard Kubernetes PodSpec and also contains a MainContainer (standard Kubernetes Container) field that allows overriding the main container configuration. | | Optional: {} |
| `livenessProbe` _Probe_ | LivenessProbe to detect and restart unhealthy containers. | | |
| `readinessProbe` _Probe_ | ReadinessProbe to signal when the container is ready to receive traffic. | | |
| `replicas` _integer_ | Replicas is the desired number of Pods for this component. When scalingAdapter is enabled, this field is managed by the DynamoGraphDeploymentScalingAdapter and should not be modified directly. | | Minimum: 0 |
| `multinode` _MultinodeSpec_ | Multinode is the configuration for multinode components. | | |
| `scalingAdapter` _ScalingAdapter_ | ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter. When enabled, replicas are managed via DGDSA and external autoscalers can scale the service using the Scale subresource. When disabled, replicas can be modified directly. | | Optional: {} |
| `eppConfig` _EPPConfig_ | EPPConfig defines EPP-specific configuration options for Endpoint Picker Plugin components. Only applicable when ComponentType is "epp". | | Optional: {} |
| `checkpoint` _ServiceCheckpointConfig_ | Checkpoint configures container checkpointing for this service. When enabled, pods can be restored from a checkpoint file for faster cold start. | | Optional: {} |

DynamoComponentDeploymentSpec

DynamoComponentDeploymentSpec defines the desired state of DynamoComponentDeployment

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `backendFramework` _string_ | BackendFramework specifies the backend framework (e.g., "sglang", "vllm", "trtllm"). | | Enum: [sglang vllm trtllm] |
| `annotations` _object (keys:string, values:string)_ | Annotations to add to generated Kubernetes resources for this component (such as Pod, Service, and Ingress when applicable). | | |
| `labels` _object (keys:string, values:string)_ | Labels to add to generated Kubernetes resources for this component. | | |
| `serviceName` _string_ | The name of the component. | | |
| `componentType` _string_ | ComponentType indicates the role of this component (for example, "main"). | | |
| `subComponentType` _string_ | SubComponentType indicates the sub-role of this component (for example, "prefill"). | | |
| `dynamoNamespace` _string_ | DynamoNamespace is deprecated and will be removed in a future version. The DGD Kubernetes namespace and DynamoGraphDeployment name are used to construct the Dynamo namespace for each component. | | Optional: {} |
| `globalDynamoNamespace` _boolean_ | GlobalDynamoNamespace indicates that the component will be placed in the global Dynamo namespace. | | |
| `resources` _Resources_ | Resource requests and limits for this component, including CPU, memory, GPUs/devices, and any runtime-specific resources. | | |
| `autoscaling` _Autoscaling_ | Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner for autoscaling instead. See docs/pages/kubernetes/autoscaling.md for migration guidance. This field will be removed in a future API version. | | |
| `envs` _EnvVar array_ | Envs defines additional environment variables to inject into the component containers. | | |
| `envFromSecret` _string_ | EnvFromSecret references a Secret whose key/value pairs will be exposed as environment variables in the component containers. | | |
| `volumeMounts` _VolumeMount array_ | VolumeMounts references PVCs defined at the top level for volumes to be mounted by the component. | | |
| `ingress` _IngressSpec_ | Ingress config to expose the component outside the cluster (or through a service mesh). | | |
| `modelRef` _ModelReference_ | ModelRef references a model that this component serves. When specified, a headless service will be created for endpoint discovery. | | Optional: {} |
| `sharedMemory` _SharedMemorySpec_ | SharedMemory controls the tmpfs mounted at /dev/shm (enable/disable and size). | | |
| `extraPodMetadata` _ExtraPodMetadata_ | ExtraPodMetadata adds labels/annotations to the created Pods. | | Optional: {} |
| `extraPodSpec` _ExtraPodSpec_ | ExtraPodSpec allows overriding the main pod spec configuration. It is a standard Kubernetes PodSpec and also contains a MainContainer (standard Kubernetes Container) field that allows overriding the main container configuration. | | Optional: {} |
| `livenessProbe` _Probe_ | LivenessProbe to detect and restart unhealthy containers. | | |
| `readinessProbe` _Probe_ | ReadinessProbe to signal when the container is ready to receive traffic. | | |
| `replicas` _integer_ | Replicas is the desired number of Pods for this component. When scalingAdapter is enabled, this field is managed by the DynamoGraphDeploymentScalingAdapter and should not be modified directly. | | Minimum: 0 |
| `multinode` _MultinodeSpec_ | Multinode is the configuration for multinode components. | | |
| `scalingAdapter` _ScalingAdapter_ | ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter. When enabled, replicas are managed via DGDSA and external autoscalers can scale the service using the Scale subresource. When disabled, replicas can be modified directly. | | Optional: {} |
| `eppConfig` _EPPConfig_ | EPPConfig defines EPP-specific configuration options for Endpoint Picker Plugin components. Only applicable when ComponentType is "epp". | | Optional: {} |
| `checkpoint` _ServiceCheckpointConfig_ | Checkpoint configures container checkpointing for this service. When enabled, pods can be restored from a checkpoint file for faster cold start. | | Optional: {} |

DynamoGraphDeployment

DynamoGraphDeployment is the Schema for the dynamographdeployments API.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1alpha1` | | |
| `kind` _string_ | `DynamoGraphDeployment` | | |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of metadata. | | |
| `spec` _DynamoGraphDeploymentSpec_ | Spec defines the desired state for this graph deployment. | | |
| `status` _DynamoGraphDeploymentStatus_ | Status reflects the current observed state of this graph deployment. | | |

DynamoGraphDeploymentRequest

DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API. It serves as the primary interface for users to request model deployments with specific performance and resource constraints, enabling SLA-driven deployments.

Lifecycle:

  1. Initializing → Pending: Validates spec and prepares for profiling
  2. Pending → Profiling: Creates and runs profiling job (online or AIC)
  3. Profiling → Ready/Deploying: Generates DGD spec after profiling completes
  4. Deploying → Ready: When autoApply=true, monitors DGD until Ready
  5. Ready: Terminal state when DGD is operational or spec is available
  6. DeploymentDeleted: Terminal state when auto-created DGD is manually deleted

The spec becomes immutable once profiling starts. Users must delete and recreate the DGDR to modify configuration after this point.

DEPRECATION NOTICE: v1alpha1 DynamoGraphDeploymentRequest is deprecated. Please migrate to nvidia.com/v1beta1 DynamoGraphDeploymentRequest. v1alpha1 will be removed in a future release.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1alpha1` | | |
| `kind` _string_ | `DynamoGraphDeploymentRequest` | | |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of metadata. | | |
| `spec` _DynamoGraphDeploymentRequestSpec_ | Spec defines the desired state for this deployment request. | | |
| `status` _DynamoGraphDeploymentRequestStatus_ | Status reflects the current observed state of this deployment request. | | |

DynamoGraphDeploymentRequestSpec

DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest. This CRD serves as the primary interface for users to request model deployments with specific performance constraints and resource requirements, enabling SLA-driven deployments.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `model` _string_ | Model specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b"). This is a high-level identifier for easy reference in kubectl output and logs. The controller automatically sets this value in profilingConfig.config.deployment.model. | | Required: {} |
| `backend` _string_ | Backend specifies the inference backend for profiling. The controller automatically sets this value in profilingConfig.config.engine.backend. Profiling runs on real GPUs or via AIC simulation to collect performance data. | | Enum: [vllm sglang trtllm]; Required: {} |
| `useMocker` _boolean_ | UseMocker indicates whether to deploy a mocker DynamoGraphDeployment instead of a real backend deployment. When true, the deployment uses simulated engines that don't require GPUs, using the profiling data to simulate realistic timing behavior. Mocker is available in all backend images and useful for large-scale experiments. Profiling still runs against the real backend (specified above) to collect performance data. | false | |
| `profilingConfig` _ProfilingConfigSpec_ | ProfilingConfig provides the complete configuration for the profiling job. Note: GPU discovery is automatically attempted to detect GPU resources from Kubernetes cluster nodes. If the operator has node read permissions (cluster-wide or explicitly granted), discovered GPU configuration is used as defaults when hardware configuration is not manually specified (minNumGpusPerEngine, maxNumGpusPerEngine, numGpusPerNode). User-specified values always take precedence over auto-discovered values. If GPU discovery fails (e.g., namespace-restricted operator without node permissions), manual hardware config is required. This configuration is passed directly to the profiler. The structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema). Note: deployment.model and engine.backend are automatically set from the high-level model and backend fields and should not be specified in this config. | | Required: {} |
| `enableGpuDiscovery` _boolean_ | EnableGPUDiscovery controls whether the operator attempts to discover GPU hardware from cluster nodes. DEPRECATED: This field is deprecated and will be removed in v1beta1. GPU discovery is now always attempted automatically. Setting this field has no effect: the operator will always try to discover GPU hardware when node read permissions are available. If discovery is unavailable (e.g., namespace-scoped operator without permissions), manual hardware configuration is required regardless of this setting. | true | Optional: {} |
| `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment after profiling completes. If false, only the spec is generated and stored in status. Users can then manually create a DGD using the generated spec. | false | |
| `deploymentOverrides` _DeploymentOverridesSpec_ | DeploymentOverrides allows customizing metadata for the auto-created DGD. Only applicable when AutoApply is true. | | Optional: {} |
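
As an illustrative sketch, the manifest below requests an SLA-driven deployment of a small model with automatic DGD creation. All names and label values are hypothetical, and the profilingConfig body is elided because its full profile_sla schema lives under ProfilingConfigSpec rather than in this section:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeploymentRequest
metadata:
  name: qwen3-dgdr                # illustrative name
spec:
  model: "Qwen/Qwen3-0.6B"
  backend: vllm
  autoApply: true                 # create the DGD automatically after profiling
  deploymentOverrides:            # only honored when autoApply is true
    name: qwen3-dgd               # illustrative name for the generated DGD
    labels:
      team: inference             # merged with auto-generated labels
  profilingConfig:
    config: {}                    # elided: follows the profile_sla schema
                                  # (see ProfilingConfigSpec); deployment.model
                                  # and engine.backend are filled in
                                  # automatically from the fields above
```

Once profiling starts, the spec is immutable; to change any of these fields, delete the DGDR and recreate it.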

DynamoGraphDeploymentRequestStatus

DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest. The controller updates this status as the DGDR progresses through its lifecycle.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `state` _DGDRState_ | State is a high-level textual status of the deployment request lifecycle. | Initializing | Enum: [Initializing Pending Profiling Deploying Ready DeploymentDeleted Failed] |
| `backend` _string_ | Backend is extracted from profilingConfig.config.engine.backend for display purposes. This field is populated by the controller and shown in kubectl output. | | Optional: {} |
| `observedGeneration` _integer_ | ObservedGeneration reflects the generation of the most recently observed spec. Used to detect spec changes and enforce immutability after profiling starts. | | |
| `conditions` _Condition array_ | Conditions contains the latest observed conditions of the deployment request. Standard condition types include: Validation, Profiling, SpecGenerated, DeploymentReady. Conditions are merged by type on patch updates. | | |
| `profilingResults` _string_ | ProfilingResults contains a reference to the ConfigMap holding profiling data. Format: "configmap/<name>" | | Optional: {} |
| `generatedDeployment` _RawExtension_ | GeneratedDeployment contains the full generated DynamoGraphDeployment specification including metadata, based on profiling results. Users can extract this to create a DGD manually, or it is used automatically when autoApply is true. Stored as RawExtension to preserve all fields including metadata. For mocker backends, this contains the mocker DGD spec. | | EmbeddedResource: {}; Optional: {} |
| `deployment` _DeploymentStatus_ | Deployment tracks the auto-created DGD when AutoApply is true. Contains name, namespace, state, and creation status of the managed DGD. | | Optional: {} |

DynamoGraphDeploymentScalingAdapter

DynamoGraphDeploymentScalingAdapter provides a scaling interface for individual services within a DynamoGraphDeployment. It implements the Kubernetes scale subresource, enabling integration with HPA, KEDA, and custom autoscalers.

The adapter acts as an intermediary between autoscalers and the DGD, ensuring that only the adapter controller modifies the DGD’s service replicas. This prevents conflicts when multiple autoscaling mechanisms are in play.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1alpha1` | | |
| `kind` _string_ | `DynamoGraphDeploymentScalingAdapter` | | |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of metadata. | | |
| `spec` _DynamoGraphDeploymentScalingAdapterSpec_ | | | |
| `status` _DynamoGraphDeploymentScalingAdapterStatus_ | | | |

DynamoGraphDeploymentScalingAdapterSpec

DynamoGraphDeploymentScalingAdapterSpec defines the desired state of DynamoGraphDeploymentScalingAdapter

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `replicas` _integer_ | Replicas is the desired number of replicas for the target service. This field is modified by external autoscalers (HPA/KEDA/Planner) or manually by users. | | Minimum: 0; Required: {} |
| `dgdRef` _DynamoGraphDeploymentServiceRef_ | DGDRef references the DynamoGraphDeployment and the specific service to scale. | | Required: {} |

DynamoGraphDeploymentScalingAdapterStatus

DynamoGraphDeploymentScalingAdapterStatus defines the observed state of DynamoGraphDeploymentScalingAdapter

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `replicas` _integer_ | Replicas is the current number of replicas for the target service. This is synced from the DGD's service replicas and is required for the scale subresource. | | Optional: {} |
| `selector` _string_ | Selector is a label selector string for the pods managed by this adapter. Required for HPA compatibility via the scale subresource. | | Optional: {} |
| `lastScaleTime` _Time_ | LastScaleTime is the last time the adapter scaled the target service. | | Optional: {} |

DynamoGraphDeploymentServiceRef

DynamoGraphDeploymentServiceRef identifies a specific service within a DynamoGraphDeployment

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name of the DynamoGraphDeployment. | | MinLength: 1; Required: {} |
| `serviceName` _string_ | ServiceName is the key name of the service within the DGD's spec.services map to scale. | | MinLength: 1; Required: {} |
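
As a sketch of how these pieces fit together, the adapter below targets a service keyed `VllmWorker` in a DGD named `my-graph` (both names illustrative), and a standard autoscaling/v2 HorizontalPodAutoscaler scales it through the scale subresource:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeploymentScalingAdapter
metadata:
  name: my-graph-vllmworker         # illustrative name
spec:
  replicas: 1                       # managed by the HPA below once it is active
  dgdRef:
    name: my-graph                  # the DynamoGraphDeployment
    serviceName: VllmWorker         # key in the DGD's spec.services map
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-graph-vllmworker-hpa     # illustrative name
spec:
  scaleTargetRef:                   # HPA talks to the adapter, never the DGD
    apiVersion: nvidia.com/v1alpha1
    kind: DynamoGraphDeploymentScalingAdapter
    name: my-graph-vllmworker
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```

Because only the adapter controller writes the DGD's service replicas, pointing the HPA at the adapter avoids conflicts between autoscalers and manual edits.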

DynamoGraphDeploymentSpec

DynamoGraphDeploymentSpec defines the desired state of DynamoGraphDeployment.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `pvcs` _PVC array_ | PVCs defines a list of persistent volume claims that can be referenced by components. Each PVC must have a unique name that can be referenced in component specifications. | | MaxItems: 100; Optional: {} |
| `services` _object (keys:string, values:DynamoComponentDeploymentSharedSpec)_ | Services are the services to deploy as part of this deployment. | | MaxProperties: 25; Optional: {} |
| `envs` _EnvVar array_ | Envs are environment variables applied to all services in the deployment unless overridden by service-specific configuration. | | Optional: {} |
| `backendFramework` _string_ | BackendFramework specifies the backend framework (e.g., "sglang", "vllm", "trtllm"). | | Enum: [sglang vllm trtllm] |
| `restart` _Restart_ | Restart specifies the restart policy for the graph deployment. | | Optional: {} |
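
A minimal DynamoGraphDeployment sketch, assuming hypothetical service names (`Frontend`, `VllmWorker`) and an illustrative image; each entry in the `services` map is a DynamoComponentDeploymentSharedSpec:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-graph                    # illustrative name
spec:
  backendFramework: vllm
  envs:
    - name: LOG_LEVEL               # applied to all services unless overridden
      value: info
  services:
    Frontend:                       # map keys are the service names
      componentType: main
      replicas: 1
    VllmWorker:
      replicas: 2
      extraPodSpec:
        mainContainer:              # standard Kubernetes Container override
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1
```

Note that if a service enables `scalingAdapter`, its `replicas` value is owned by the DynamoGraphDeploymentScalingAdapter and should not be edited here.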

DynamoGraphDeploymentStatus

DynamoGraphDeploymentStatus defines the observed state of DynamoGraphDeployment.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `observedGeneration` _integer_ | ObservedGeneration is the most recent generation observed by the controller. | | Optional: {} |
| `state` _DGDState_ | State is a high-level textual status of the graph deployment lifecycle. | initializing | Enum: [initializing pending successful failed] |
| `conditions` _Condition array_ | Conditions contains the latest observed conditions of the graph deployment. The slice is merged by type on patch updates. | | |
| `services` _object (keys:string, values:ServiceReplicaStatus)_ | Services contains per-service replica status information. The map key is the service name from spec.services. | | Optional: {} |
| `restart` _RestartStatus_ | Restart contains the status of the restart of the graph deployment. | | Optional: {} |
| `checkpoints` _object (keys:string, values:ServiceCheckpointStatus)_ | Checkpoints contains per-service checkpoint status information. The map key is the service name from spec.services. | | Optional: {} |
| `rollingUpdate` _RollingUpdateStatus_ | RollingUpdate tracks the progress of operator-managed rolling updates. Currently only supported for single-node, non-Grove deployments (DCD/Deployment). | | Optional: {} |

DynamoModel

DynamoModel is the Schema for the dynamomodels API.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1alpha1` | | |
| `kind` _string_ | `DynamoModel` | | |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of metadata. | | |
| `spec` _DynamoModelSpec_ | | | |
| `status` _DynamoModelStatus_ | | | |

DynamoModelSpec

DynamoModelSpec defines the desired state of DynamoModel

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `modelName` _string_ | ModelName is the full model identifier (e.g., "meta-llama/Llama-3.3-70B-Instruct-lora"). | | Required: {} |
| `baseModelName` _string_ | BaseModelName is the base model identifier that matches the service label. This is used to discover endpoints via headless services. | | Required: {} |
| `modelType` _string_ | ModelType specifies the type of model (e.g., "base", "lora", "adapter"). | base | Enum: [base lora adapter]; Optional: {} |
| `source` _ModelSource_ | Source specifies the model source location (only applicable for the lora model type). | | Optional: {} |
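
A sketch of a LoRA DynamoModel tying these fields together. The resource name and baseModelName are hypothetical, and the source URI is a placeholder in the documented `hf://org/model@revision_sha` format:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: llama3-lora                              # illustrative name
spec:
  modelName: "meta-llama/Llama-3.3-70B-Instruct-lora"
  baseModelName: "llama-3-70b-instruct-v1"       # must match the service label
  modelType: lora                                # defaults to "base" if omitted
  source:
    uri: "hf://org/model@revision_sha"           # placeholder; S3 form is
                                                 # s3://bucket/path/to/model
```

Endpoint discovery then proceeds through the headless services labeled with the base model identifier, and per-endpoint readiness appears in status.endpoints.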

DynamoModelStatus

DynamoModelStatus defines the observed state of DynamoModel

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `endpoints` _EndpointInfo array_ | Endpoints is the current list of all endpoints for this model. | | Optional: {} |
| `readyEndpoints` _integer_ | ReadyEndpoints is the count of endpoints that are ready. | | |
| `totalEndpoints` _integer_ | TotalEndpoints is the total count of endpoints. | | |
| `conditions` _Condition array_ | Conditions represents the latest available observations of the model's state. | | Optional: {} |

EPPConfig

EPPConfig contains configuration for EPP (Endpoint Picker Plugin) components. EPP is responsible for intelligent endpoint selection and KV-aware routing.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `configMapRef` _ConfigMapKeySelector_ | ConfigMapRef references a user-provided ConfigMap containing EPP configuration. The ConfigMap should contain EndpointPickerConfig YAML. Mutually exclusive with Config. | | Optional: {} |
| `config` _EndpointPickerConfig_ | Config allows specifying the EPP EndpointPickerConfig directly as a structured object. The operator will marshal this to YAML and create a ConfigMap automatically. Mutually exclusive with ConfigMapRef. One of ConfigMapRef or Config must be specified (no default configuration). Uses the upstream type from github.com/kubernetes-sigs/gateway-api-inference-extension. | | Type: object; Optional: {} |
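
For the ConfigMapRef route, the fragment below (part of an EPP component's entry in a DGD's services map) shows a hypothetical ConfigMap name and key; per ConfigMapKeySelector, the key defaults to "disagg.yaml" when omitted:

```yaml
componentType: epp                 # eppConfig is only honored for EPP components
eppConfig:
  configMapRef:
    name: my-epp-config            # ConfigMap holding EndpointPickerConfig YAML
    key: epp-config.yaml           # optional; defaults to "disagg.yaml"
```

Alternatively, the same configuration can be inlined under `config:` as a structured EndpointPickerConfig object, and the operator creates the ConfigMap itself; the two forms are mutually exclusive.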

EndpointInfo

EndpointInfo represents a single endpoint (pod) serving the model

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `address` _string_ | Address is the full address of the endpoint (e.g., "http://10.0.1.5:9090"). | | |
| `podName` _string_ | PodName is the name of the pod serving this endpoint. | | Optional: {} |
| `ready` _boolean_ | Ready indicates whether the endpoint is ready to serve traffic. For LoRA models: true if the POST /loras request succeeded with a 2xx status code. For base models: always false (no probing performed). | | |

ExtraPodMetadata

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `annotations` _object (keys:string, values:string)_ | | | |
| `labels` _object (keys:string, values:string)_ | | | |

ExtraPodSpec

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `mainContainer` _Container_ | | | |

IngressSpec

Appears in:

FieldDescriptionDefaultValidation
enabled booleanEnabled exposes the component through an ingress or virtual service when true.
host stringHost is the base host name to route external traffic to this component.
useVirtualService booleanUseVirtualService indicates whether to configure a service-mesh VirtualService instead of a standard Ingress.
virtualServiceGateway stringVirtualServiceGateway optionally specifies the gateway name to attach the VirtualService to.
hostPrefix stringHostPrefix is an optional prefix added before the host.
annotations object (keys:string, values:string)Annotations to set on the generated Ingress/VirtualService resources.
labels object (keys:string, values:string)Labels to set on the generated Ingress/VirtualService resources.
tls IngressTLSSpecTLS holds the TLS configuration used by the Ingress/VirtualService.
hostSuffix stringHostSuffix is an optional suffix appended after the host.
ingressControllerClassName stringIngressControllerClassName selects the ingress controller class (e.g., “nginx”).
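
Taken together, these fields might be set as in the following sketch, which exposes a component at `my-model.example.com` via an nginx Ingress with TLS. All values, including the annotation, are illustrative:

```yaml
ingress:
  enabled: true
  host: my-model
  hostSuffix: .example.com             # appended after the host
  ingressControllerClassName: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt   # illustrative annotation
  tls:
    secretName: my-model-tls           # Secret holding the TLS cert and key
```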

IngressTLSSpec

Appears in:

FieldDescriptionDefaultValidation
secretName stringSecretName is the name of a Kubernetes Secret containing the TLS certificate and key.

ModelReference

ModelReference identifies a model served by this component

Appears in:

FieldDescriptionDefaultValidation
name stringName is the base model identifier (e.g., “llama-3-70b-instruct-v1”)Required: {}
revision stringRevision is the model revision/version (optional)Optional: {}

ModelSource

ModelSource defines the source location of a model

Appears in:

FieldDescriptionDefaultValidation
uri stringURI is the model source URI
Supported formats:
- S3: s3://bucket/path/to/model
- HuggingFace: hf://org/model@revision_sha
Required: {}

MultinodeSpec

Appears in:

FieldDescriptionDefaultValidation
nodeCount integerIndicates the number of nodes to deploy for multinode components.
Total number of GPUs is NumberOfNodes * GPU limit.
Must be greater than 1.
2Minimum: 2

PVC

Appears in:

FieldDescriptionDefaultValidation
create booleanCreate indicates to create a new PVC
name stringName is the name of the PVCRequired: {}
storageClass stringStorageClass to be used for PVC creation. Required when create is true.
size QuantitySize of the volume in Gi, used during PVC creation. Required when create is true.
volumeAccessMode PersistentVolumeAccessModeVolumeAccessMode is the volume access mode of the PVC. Required when create is true.
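
A sketch of asking the operator to create a new PVC with these fields. The surrounding `pvcs` map and its key are assumptions based on the top-level PVCs map mentioned under VolumeMount:

```yaml
pvcs:
  model-storage:                 # hypothetical map key, referenced by VolumeMount.name
    create: true
    name: model-storage
    storageClass: standard       # required when create is true
    size: 100Gi                  # required when create is true
    volumeAccessMode: ReadWriteMany   # required when create is true
```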

ProfilingConfigSpec

ProfilingConfigSpec defines configuration for the profiling process. This structure maps directly to the profile_sla.py config format. See dynamo/profiler/utils/profiler_argparse.py for the complete schema.

Appears in:

FieldDescriptionDefaultValidation
config JSONConfig is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler.
The profiler will validate the configuration and report any errors.
Optional: {}
Type: object
configMapRef ConfigMapKeySelectorConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment
base config file (disagg.yaml). This is separate from the profiling config above.
The path to this config will be set as engine.config in the profiling config.
Optional: {}
profilerImage stringProfilerImage specifies the container image to use for profiling jobs.
This image contains the profiler code and dependencies needed for SLA-based profiling.
Example: “nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1”
Required: {}
outputPVC stringOutputPVC is an optional PersistentVolumeClaim name for storing profiling output.
If specified, all profiling artifacts (logs, plots, configs, raw data) will be written
to this PVC instead of an ephemeral emptyDir volume. This allows users to access
complete profiling results after the job completes by mounting the PVC.
The PVC must exist in the same namespace as the DGDR.
If not specified, profiling uses emptyDir and only essential data is saved to ConfigMaps.
Note: ConfigMaps are still created regardless of this setting for planner integration.
Optional: {}
resources ResourceRequirementsResources specifies the compute resource requirements for the profiling job container.
If not specified, no resource requests or limits are set.
Optional: {}
tolerations Toleration arrayTolerations allows the profiling job to be scheduled on nodes with matching taints.
For example, to schedule on GPU nodes, add a toleration for the nvidia.com/gpu taint.
Optional: {}
nodeSelector object (keys:string, values:string)NodeSelector is a selector which must match a node’s labels for the profiling pod to be scheduled on that node.
For example, to schedule on ARM64 nodes, use {“kubernetes.io/arch”: “arm64”}.
Optional: {}
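
A hedged example combining these fields is sketched below. The keys inside `config` are illustrative only — the actual schema is defined by `profile_sla.py` (see `dynamo/profiler/utils/profiler_argparse.py`), and the ConfigMap name is an assumption:

```yaml
profilingConfig:
  profilerImage: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1"
  configMapRef:
    name: disagg-base-config     # hypothetical ConfigMap holding disagg.yaml
    key: disagg.yaml
  config: {}                     # passed through to the profiler, which validates it
  outputPVC: profiling-results   # must exist in the same namespace as the DGDR
  tolerations:
    - key: nvidia.com/gpu        # allow scheduling on tainted GPU nodes
      operator: Exists
      effect: NoSchedule
  nodeSelector:
    kubernetes.io/arch: arm64
```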

ResourceItem

Appears in:

FieldDescriptionDefaultValidation
cpu stringCPU specifies the CPU resource request/limit (e.g., “1000m”, “2”)
memory stringMemory specifies the memory resource request/limit (e.g., “4Gi”, “8Gi”)
gpu stringGPU indicates the number of GPUs to request.
Total number of GPUs is NumberOfNodes * GPU in case of multinode deployment.
gpuType stringGPUType can specify a custom GPU type, e.g. “gpu.intel.com/xe”
By default if not specified, the GPU type is “nvidia.com/gpu”
custom object (keys:string, values:string)Custom specifies additional custom resource requests/limits

Resources

Resources defines requested and limits for a component, including CPU, memory, GPUs/devices, and any runtime-specific resources.

Appears in:

FieldDescriptionDefaultValidation
requests ResourceItemRequests specifies the minimum resources required by the component
limits ResourceItemLimits specifies the maximum resources allowed for the component
claims ResourceClaim arrayClaims specifies resource claims for dynamic resource allocation
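
For example, a component requesting one GPU with CPU and memory bounds might be declared as follows (quantities are illustrative; `gpuType` defaults to `nvidia.com/gpu` when omitted):

```yaml
resources:
  requests:
    cpu: "8"
    memory: 32Gi
    gpu: "1"
  limits:
    cpu: "16"
    memory: 64Gi
    gpu: "1"
    # gpuType: gpu.intel.com/xe   # override the default nvidia.com/gpu device type
```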

Restart

Appears in:

FieldDescriptionDefaultValidation
id stringID is an arbitrary string that triggers a restart when changed.
Any modification to this value will initiate a restart of the graph deployment according to the strategy.
MinLength: 1
Required: {}
strategy RestartStrategyStrategy specifies the restart strategy for the graph deployment.Optional: {}

RestartPhase

Underlying type: string

Appears in:

FieldDescription
Pending
Restarting
Completed
Failed
Superseded

RestartStatus

RestartStatus contains the status of the restart of the graph deployment.

Appears in:

FieldDescriptionDefaultValidation
observedID stringObservedID is the restart ID that has been observed and is being processed.
Matches the Restart.ID field in the spec.
phase RestartPhasePhase is the phase of the restart.
inProgress string arrayInProgress contains the names of the services that are currently being restarted.Optional: {}

RestartStrategy

Appears in:

FieldDescriptionDefaultValidation
type RestartStrategyTypeType specifies the restart strategy type.SequentialEnum: [Sequential Parallel]
order string arrayOrder specifies the order in which the services should be restarted.Optional: {}
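
Putting Restart and RestartStrategy together, a sketch of triggering an ordered restart might look like this. The service names under `order` are hypothetical:

```yaml
restart:
  id: "rollout-2024-06-01"       # change this value to trigger a new restart
  strategy:
    type: Sequential             # default; Parallel is also accepted
    order:                       # optional; hypothetical service names
      - Frontend
      - decode
      - prefill
```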

RestartStrategyType

Underlying type: string

Appears in:

FieldDescription
Sequential
Parallel

RollingUpdatePhase

Underlying type: string

RollingUpdatePhase represents the current phase of a rolling update.

Validation:

  • Enum: [Pending InProgress Completed Failed]

Appears in:

FieldDescription
Pending
InProgress
Completed
Failed

RollingUpdateStatus

RollingUpdateStatus tracks the progress of a rolling update.

Appears in:

FieldDescriptionDefaultValidation
phase RollingUpdatePhasePhase indicates the current phase of the rolling update.Enum: [Pending InProgress Completed Failed]
Optional: {}
startTime TimeStartTime is when the rolling update began.Optional: {}
endTime TimeEndTime is when the rolling update completed (successfully or failed).Optional: {}
updatedServices string arrayUpdatedServices is the list of services that have completed the rolling update.
A service is considered updated when its new replicas are all ready and old replicas are fully scaled down.
Only services of componentType Worker (or Prefill/Decode) are considered.
Optional: {}

ScalingAdapter

ScalingAdapter configures whether a service uses the DynamoGraphDeploymentScalingAdapter for replica management. When enabled, the DGDSA owns the replicas field and external autoscalers (HPA, KEDA, Planner) can control scaling via the Scale subresource.

Appears in:

FieldDescriptionDefaultValidation
enabled booleanEnabled indicates whether the ScalingAdapter should be enabled for this service.
When true, a DGDSA is created and owns the replicas field.
When false (default), no DGDSA is created and replicas can be modified directly in the DGD.
falseOptional: {}

ServiceCheckpointConfig

ServiceCheckpointConfig configures checkpointing for a DGD service

Appears in:

FieldDescriptionDefaultValidation
enabled booleanEnabled indicates whether checkpointing is enabled for this servicefalseOptional: {}
mode CheckpointModeMode defines how checkpoint creation is handled
- Auto: DGD controller creates Checkpoint CR automatically
- Manual: User must create Checkpoint CR
AutoEnum: [Auto Manual]
Optional: {}
checkpointRef stringCheckpointRef references an existing Checkpoint CR to use
If specified, Identity is ignored and this checkpoint is used directly
Optional: {}
identity DynamoCheckpointIdentityIdentity defines the checkpoint identity for hash computation
Used when Mode is Auto or when looking up existing checkpoints
Required when checkpointRef is not specified
Optional: {}
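
Two sketches of the variants described above. The placement of this block under a service spec is an assumption, and the identity body is elided since DynamoCheckpointIdentity is documented elsewhere:

```yaml
# Automatic creation: the DGD controller creates the Checkpoint CR from the identity
checkpoint:
  enabled: true
  mode: Auto
  identity: {}                   # see DynamoCheckpointIdentity for the fields

# Or reuse an existing Checkpoint CR (identity is then ignored)
# checkpoint:
#   enabled: true
#   checkpointRef: my-existing-checkpoint   # hypothetical Checkpoint CR name
```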

ServiceCheckpointStatus

ServiceCheckpointStatus contains checkpoint information for a single service.

Appears in:

FieldDescriptionDefaultValidation
checkpointName stringCheckpointName is the name of the associated Checkpoint CROptional: {}
identityHash stringIdentityHash is the computed hash of the checkpoint identityOptional: {}
ready booleanReady indicates if the checkpoint is ready for useOptional: {}

ServiceReplicaStatus

ServiceReplicaStatus contains replica information for a single service.

Appears in:

FieldDescriptionDefaultValidation
componentKind ComponentKindComponentKind is the underlying resource kind (e.g., “PodClique”, “PodCliqueScalingGroup”, “Deployment”, “LeaderWorkerSet”).Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet]
componentName stringComponentName is the name of the primary underlying resource.
DEPRECATED: Use ComponentNames instead. This field will be removed in a future release.
During rolling updates, this reflects the new (target) component name.
componentNames string arrayComponentNames is the list of underlying resource names for this service.
During normal operation, this contains a single name.
During rolling updates, this contains both old and new component names.
Optional: {}
replicas integerReplicas is the total number of non-terminated replicas.
Required for all component kinds.
Minimum: 0
updatedReplicas integerUpdatedReplicas is the number of replicas at the current/desired revision.
Required for all component kinds.
Minimum: 0
readyReplicas integerReadyReplicas is the number of ready replicas.
Populated for PodClique, Deployment, and LeaderWorkerSet.
Not available for PodCliqueScalingGroup.
When nil, the field is omitted from the API response.
Minimum: 0
Optional: {}
availableReplicas integerAvailableReplicas is the number of available replicas.
For Deployment: replicas ready for >= minReadySeconds.
For PodCliqueScalingGroup: replicas where all constituent PodCliques have >= MinAvailable ready pods.
Not available for PodClique or LeaderWorkerSet.
When nil, the field is omitted from the API response.
Minimum: 0
Optional: {}

SharedMemorySpec

Appears in:

FieldDescriptionDefaultValidation
disabled boolean
size Quantity

VolumeMount

VolumeMount references a PVC defined at the top level for volumes to be mounted by the component

Appears in:

FieldDescriptionDefaultValidation
name stringName references a PVC name defined in the top-level PVCs mapRequired: {}
mountPoint stringMountPoint specifies where to mount the volume.
If useAsCompilationCache is true and mountPoint is not specified,
a backend-specific default will be used.
useAsCompilationCache booleanUseAsCompilationCache indicates this volume should be used as a compilation cache.
When true, backend-specific environment variables will be set and default mount points may be used.
false
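
A sketch of mounting PVC-backed volumes with these fields (the surrounding `volumeMounts` list placement in the component spec is assumed; names are hypothetical):

```yaml
volumeMounts:
  - name: model-storage          # must match a key in the top-level PVCs map
    mountPoint: /models
  - name: compile-cache
    useAsCompilationCache: true  # mountPoint omitted; a backend-specific default is used
```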

nvidia.com/v1beta1

Package v1beta1 contains API Schema definitions for the nvidia.com v1beta1 API group.

Resource Types

BackendType

Underlying type: string

BackendType specifies the inference backend.

Validation:

  • Enum: [auto sglang trtllm vllm]

Appears in:

FieldDescription
auto
sglang
trtllm
vllm

DGDRPhase

Underlying type: string

DGDRPhase represents the lifecycle phase of a DynamoGraphDeploymentRequest.

Validation:

  • Enum: [Pending Profiling Ready Deploying Deployed Failed]

Appears in:

FieldDescription
Pending
Profiling
Ready
Deploying
Deployed
Failed

DeploymentInfoStatus

DeploymentInfoStatus tracks the state of the deployed DynamoGraphDeployment.

Appears in:

FieldDescriptionDefaultValidation
replicas integerReplicas is the desired number of replicas.Optional: {}
availableReplicas integerAvailableReplicas is the number of replicas that are available and ready.Optional: {}

v1beta1 DynamoGraphDeploymentRequest

DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API. It provides a simplified, SLA-driven interface for deploying inference models on Dynamo. Users specify a model and optional performance targets; the controller handles profiling, configuration selection, and deployment.

Lifecycle:

  1. Pending: Spec validated, preparing for profiling
  2. Profiling: Profiling job is running to discover optimal configurations
  3. Ready: Profiling complete, generated DGD spec available in status
  4. Deploying: DGD is being created and rolled out (when autoApply=true)
  5. Deployed: DGD is running and healthy
  6. Failed: An unrecoverable error occurred
FieldDescriptionDefaultValidation
apiVersion stringnvidia.com/v1beta1
kind stringDynamoGraphDeploymentRequest
metadata ObjectMetaRefer to Kubernetes API documentation for fields of metadata.
spec DynamoGraphDeploymentRequestSpecSpec defines the desired state for this deployment request.
status DynamoGraphDeploymentRequestStatusStatus reflects the current observed state of this deployment request.

v1beta1 DynamoGraphDeploymentRequestSpec

DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest. Only the Model field is required; all other fields are optional and have sensible defaults.

Appears in:

FieldDescriptionDefaultValidation
model stringModel specifies the model to deploy (e.g., “Qwen/Qwen3-0.6B”, “meta-llama/Llama-3-70b”).
Can be a HuggingFace ID or a private model name.
MinLength: 1
Required: {}
backend BackendTypeBackend specifies the inference backend to use for profiling and deployment.autoEnum: [auto sglang trtllm vllm]
Optional: {}
image stringImage is the container image reference for the profiling job (frontend image).
Example: “nvcr.io/nvidia/dynamo-runtime:latest”
If not specified, an image is selected automatically for the backend type; backend images can be overridden via overrides.dgd.
Optional: {}
modelCache ModelCacheSpecModelCache provides optional PVC configuration for pre-downloaded model weights.
When provided, weights are loaded from the PVC instead of downloading from HuggingFace.
Optional: {}
hardware HardwareSpecHardware describes the hardware resources available for profiling and deployment.
Typically auto-filled by the operator from cluster discovery.
Optional: {}
workload WorkloadSpecWorkload defines the expected workload characteristics for SLA-based profiling.Optional: {}
sla SLASpecSLA defines service-level agreement targets that drive profiling optimization.Optional: {}
overrides OverridesSpecOverrides allows customizing the profiling job and the generated DynamoGraphDeployment.Optional: {}
features FeaturesSpecFeatures controls optional Dynamo platform features in the generated deployment.Optional: {}
searchStrategy SearchStrategySearchStrategy controls the profiling search depth.
”rapid” performs a fast sweep; “thorough” explores more configurations.
rapidEnum: [rapid thorough]
Optional: {}
autoApply booleanAutoApply indicates whether to automatically create a DynamoGraphDeployment
after profiling completes. If false, the generated spec is stored in status
for manual review and application.
trueOptional: {}
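
A minimal DGDR manifest using these fields might look like the following. Only `model` is required; the SLA and workload values shown are illustrative:

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: qwen-sla-deploy
spec:
  model: "Qwen/Qwen3-0.6B"       # only required field
  backend: vllm
  sla:
    ttft: 200                    # ms to first token
    itl: 20                      # ms between tokens
  workload:
    isl: 4000                    # input sequence length (default)
    osl: 1000                    # output sequence length (default)
  searchStrategy: rapid          # default; "thorough" explores more configurations
  autoApply: true                # default; create the DGD automatically after profiling
```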

v1beta1 DynamoGraphDeploymentRequestStatus

DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest.

Appears in:

FieldDescriptionDefaultValidation
phase DGDRPhasePhase is the high-level lifecycle phase of the deployment request.Enum: [Pending Profiling Ready Deploying Deployed Failed]
Optional: {}
profilingPhase ProfilingPhaseProfilingPhase indicates the current sub-phase of the profiling pipeline.
Only meaningful when Phase is “Profiling”. Cleared when profiling completes or fails.
Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done]
Optional: {}
dgdName stringDGDName is the name of the generated or created DynamoGraphDeployment.Optional: {}
profilingJobName stringProfilingJobName is the name of the Kubernetes Job running the profiler.Optional: {}
conditions Condition arrayConditions contains the latest observed conditions of the deployment request.
Standard condition types include: Validated, ProfilingComplete, DeploymentReady.
Optional: {}
profilingResults ProfilingResultsStatusProfilingResults contains the output of the profiling process including
Pareto-optimal configurations and the selected deployment configuration.
Optional: {}
deploymentInfo DeploymentInfoStatusDeploymentInfo tracks the state of the deployed DynamoGraphDeployment.
Populated when a DGD has been created (either via autoApply or manually).
Optional: {}
observedGeneration integerObservedGeneration is the most recent generation observed by the controller.Optional: {}

FeaturesSpec

FeaturesSpec controls optional Dynamo platform features in the generated deployment.

Appears in:

FieldDescriptionDefaultValidation
planner RawExtensionPlanner is the raw SLA planner configuration passed to the planner service.
Its schema is defined by dynamo.planner.utils.planner_config.PlannerConfig.
Go treats this as opaque bytes; the Planner service validates it at startup.
The presence of this field (non-null) enables the planner in the generated DGD.
Type: object
Optional: {}
mocker MockerSpecMocker configures the simulated (mocker) backend for testing without GPUs.Optional: {}

HardwareSpec

HardwareSpec describes the hardware resources available for profiling and deployment. These fields are typically auto-filled by the operator from cluster discovery.

Appears in:

FieldDescriptionDefaultValidation
gpuSku stringGPUSKU is the GPU SKU identifier (e.g., “H100_SXM”, “A100_80GB”).Optional: {}
vramMb floatVRAMMB is the VRAM per GPU in MiB.Optional: {}
totalGpus integerTotalGPUs is the total number of GPUs available in the cluster.Optional: {}
numGpusPerNode integerNumGPUsPerNode is the number of GPUs per node.Optional: {}

MockerSpec

MockerSpec configures the simulated (mocker) backend.

Appears in:

FieldDescriptionDefaultValidation
enabled booleanEnabled indicates whether to deploy mocker workers instead of real inference workers.
Useful for large-scale testing without GPUs.
Optional: {}

ModelCacheSpec

ModelCacheSpec references a PVC containing pre-downloaded model weights.

Appears in:

FieldDescriptionDefaultValidation
pvcName stringPVCName is the name of the PersistentVolumeClaim containing model weights.
The PVC must exist in the same namespace as the DGDR.
Optional: {}
pvcModelPath stringPVCModelPath is the path to the model checkpoint directory within the PVC
(e.g. “deepseek-r1” or “models/Llama-3.1-405B-FP8”).
Optional: {}
pvcMountPath stringPVCMountPath is the mount path for the PVC inside the container./opt/model-cacheOptional: {}
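
For example, pointing the profiler and deployment at pre-downloaded weights on a PVC (the PVC name and model path are illustrative):

```yaml
modelCache:
  pvcName: model-weights         # PVC must exist in the DGDR's namespace
  pvcModelPath: "models/Llama-3.1-405B-FP8"
  pvcMountPath: /opt/model-cache # default
```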

OptimizationType

Underlying type: string

OptimizationType specifies the profiling optimization strategy.

Validation:

  • Enum: [latency throughput]

Appears in:

FieldDescription
latency
throughput

OverridesSpec

OverridesSpec allows customizing the profiling job and the generated DynamoGraphDeployment.

Appears in:

FieldDescriptionDefaultValidation
profilingJob JobSpecProfilingJob allows overriding the profiling Job specification.
Fields set here are merged into the controller-generated Job spec.
Optional: {}
dgd RawExtensionDGD allows providing a full or partial nvidia.com/v1alpha1 DynamoGraphDeployment
to use as the base for the generated deployment. Fields from profiling results
are merged on top. Use this to override backend worker images.
The field is stored as a raw embedded resource rather than a typed
*v1alpha1.DynamoGraphDeployment to avoid a circular import: v1alpha1 already
imports v1beta1 as the conversion hub and Go does not allow import cycles.
The EmbeddedResource marker tells the API server to validate that the value is a
well-formed Kubernetes object (has apiVersion/kind), but does not enforce that it
is specifically a DynamoGraphDeployment. Full type validation (correct apiVersion,
kind, and field schema) is performed by the controller during reconciliation.
EmbeddedResource: {}
Optional: {}

ParetoConfig

ParetoConfig represents a single Pareto-optimal deployment configuration discovered during profiling.

Appears in:

FieldDescriptionDefaultValidation
config RawExtensionConfig is the full deployment configuration for this Pareto point.Type: object

ProfilingPhase

Underlying type: string

ProfilingPhase represents a sub-phase within the profiling pipeline. When the DGDR Phase is “Profiling”, this value indicates which step of the profiling pipeline is currently executing.

Validation:

  • Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done]

Appears in:

FieldDescription
InitializingProfiler is loading the DGD template, detecting GPU hardware,
and resolving the model architecture from HuggingFace.
SweepingPrefillSweeping parallelization strategies (TP/TEP/DEP) across GPU counts
for prefill, measuring TTFT at each configuration.
SweepingDecodeSweeping parallelization strategies and concurrency levels
for decode, measuring ITL at each configuration.
SelectingConfigFiltering results against SLA targets and selecting the most
cost-efficient configuration that meets TTFT/ITL requirements.
BuildingCurvesBuilding detailed interpolation curves (ISL→TTFT for prefill,
KV-usage×context-length→ITL for decode) using the selected configs.
GeneratingDGDPackaging profiling data into a ConfigMap and generating
the final DGD YAML with planner integration.
DoneProfiling pipeline finished successfully.

ProfilingResultsStatus

ProfilingResultsStatus contains the output of the profiling process.

Appears in:

FieldDescriptionDefaultValidation
pareto ParetoConfig arrayPareto is the list of Pareto-optimal deployment configurations discovered during profiling.
Each entry represents a different cost/performance trade-off.
Optional: {}
selectedConfig RawExtensionSelectedConfig is the recommended configuration chosen by the profiler
based on the SLA targets. This is the configuration used for deployment
when autoApply is true.
Type: object
Optional: {}

SLASpec

SLASpec defines the service-level agreement targets for profiling optimization. Exactly one mode should be active: ttft+itl (default), e2eLatency, or optimizationType.

Appears in:

FieldDescriptionDefaultValidation
optimizationType OptimizationTypeOptimizationType controls the profiling optimization strategy.
Use when explicit SLA targets (ttft+itl or e2eLatency) are not known.
Enum: [latency throughput]
Optional: {}
ttft floatTTFT is the Time To First Token target in milliseconds.Optional: {}
itl floatITL is the Inter-Token Latency target in milliseconds.Optional: {}
e2eLatency floatE2ELatency is the target end-to-end request latency in milliseconds.
Alternative to specifying TTFT + ITL.
Optional: {}
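
The three mutually exclusive modes can be sketched as follows; numeric targets are illustrative:

```yaml
# Mode 1 (default): explicit token-level targets
sla:
  ttft: 200          # ms to first token
  itl: 20            # ms between tokens

# Mode 2: end-to-end request latency target
# sla:
#   e2eLatency: 4000

# Mode 3: no explicit targets, optimize for a strategy
# sla:
#   optimizationType: throughput
```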

SearchStrategy

Underlying type: string

SearchStrategy controls the profiling search depth.

Validation:

  • Enum: [rapid thorough]

Appears in:

FieldDescription
rapid
thorough

WorkloadSpec

WorkloadSpec defines the workload characteristics for SLA-based profiling.

Appears in:

FieldDescriptionDefaultValidation
isl integerISL is the Input Sequence Length (number of tokens).4000Optional: {}
osl integerOSL is the Output Sequence Length (number of tokens).1000Optional: {}
concurrency floatConcurrency is the target concurrency level.
Required (or RequestRate) when the planner is disabled.
Optional: {}
requestRate floatRequestRate is the target request rate (req/s).
Required (or Concurrency) when the planner is disabled.
Optional: {}

operator.config.dynamo.nvidia.com/v1alpha1

Resource Types

CheckpointConfiguration

CheckpointConfiguration holds checkpoint/restore settings.

Appears in:

FieldDescriptionDefaultValidation
enabled booleanEnabled indicates if checkpoint functionality is enabled
readyForCheckpointFilePath stringReadyForCheckpointFilePath signals model readiness for checkpoint jobs (default: /tmp/ready-for-checkpoint)
storage CheckpointStorageConfigurationStorage holds storage backend configuration

CheckpointOCIConfig

CheckpointOCIConfig holds OCI registry storage configuration.

Appears in:

FieldDescriptionDefaultValidation
uri stringURI is the OCI URI (oci://registry/repository)
credentialsSecretRef stringCredentialsSecretRef is the name of the docker config secret

CheckpointPVCConfig

CheckpointPVCConfig holds PVC storage configuration.

Appears in:

FieldDescriptionDefaultValidation
pvcName stringPVCName is the name of the PVC (default: chrek-pvc)
basePath stringBasePath is the base directory within the PVC (default: /checkpoints)

CheckpointS3Config

CheckpointS3Config holds S3 storage configuration.

Appears in:

FieldDescriptionDefaultValidation
uri stringURI is the S3 URI (s3://[endpoint/]bucket/prefix)
credentialsSecretRef stringCredentialsSecretRef is the name of the credentials secret

CheckpointStorageConfiguration

CheckpointStorageConfiguration holds storage backend configuration for checkpoints.

Appears in:

FieldDescriptionDefaultValidation
type stringType is the storage backend type: pvc, s3, or oci (default: pvc)
pvc CheckpointPVCConfigPVC configuration (used when Type=pvc)
s3 CheckpointS3ConfigS3 configuration (used when Type=s3)
oci CheckpointOCIConfigOCI configuration (used when Type=oci)

DiscoveryBackend

Underlying type: string

DiscoveryBackend is the type for the discovery backend.

Appears in:

FieldDescription
kubernetesDiscoveryBackendKubernetes is the Kubernetes discovery backend
etcdDiscoveryBackendEtcd is the etcd discovery backend

DiscoveryConfiguration

DiscoveryConfiguration holds discovery backend settings.

Appears in:

FieldDescriptionDefaultValidation
backend DiscoveryBackendBackend is the discovery backend: “kubernetes” or “etcd” (default: kubernetes)

GPUConfiguration

GPUConfiguration holds GPU discovery settings.

Appears in:

FieldDescriptionDefaultValidation
discoveryEnabled booleanDiscoveryEnabled indicates whether GPU discovery is enabled (default: true)

GroveConfiguration

GroveConfiguration holds Grove orchestrator settings.

Appears in:

FieldDescriptionDefaultValidation
enabled booleanEnabled overrides auto-detection. nil = auto-detect.
terminationDelay DurationTerminationDelay configures the termination delay for Grove PodCliqueSets (default: 15m)

InfrastructureConfiguration

InfrastructureConfiguration holds service mesh and backend addresses.

Appears in:

FieldDescriptionDefaultValidation
natsAddress stringNATSAddress is the address of the NATS server
etcdAddress stringETCDAddress is the address of the etcd server
modelExpressURL stringModelExpressURL is the URL of the Model Express server to inject into all pods
prometheusEndpoint stringPrometheusEndpoint is the URL of the Prometheus endpoint to use for metrics

IngressConfiguration

IngressConfiguration holds ingress settings.

Appears in:

FieldDescriptionDefaultValidation
virtualServiceGateway stringVirtualServiceGateway is the name of the Istio virtual service gateway
controllerClassName stringControllerClassName is the ingress controller class name
controllerTLSSecretName stringControllerTLSSecretName is the TLS secret for the ingress controller
hostSuffix stringHostSuffix is the suffix for ingress hostnames

KaiSchedulerConfiguration

KaiSchedulerConfiguration holds Kai-scheduler settings.

Appears in:

FieldDescriptionDefaultValidation
enabled booleanEnabled overrides auto-detection. nil = auto-detect.

LWSConfiguration

LWSConfiguration holds LWS orchestrator settings.

Appears in:

FieldDescriptionDefaultValidation
enabled booleanEnabled overrides auto-detection. nil = auto-detect.

LeaderElectionConfiguration

LeaderElectionConfiguration holds leader election settings.

Appears in:

FieldDescriptionDefaultValidation
enabled booleanEnabled enables leader election for the controller manager (default: false)
id stringID is the leader election resource identity
namespace stringNamespace is the namespace for the leader election resource

LoggingConfiguration

LoggingConfiguration holds logging settings.

Appears in:

FieldDescriptionDefaultValidation
level stringLevel is the log level (e.g., “info”, “debug”; default: info)
format stringFormat is the log format (e.g., “json”, “text”; default: json)

MPIConfiguration

MPIConfiguration holds MPI SSH secret settings.

Appears in:

FieldDescriptionDefaultValidation
sshSecretName stringSSHSecretName is the name of the secret containing the SSH key for MPI
sshSecretNamespace stringSSHSecretNamespace is the namespace where the MPI SSH secret is located

MetricsServer

MetricsServer extends Server with secure serving option.

Appears in:

FieldDescriptionDefaultValidation
bindAddress stringBindAddress is the address the server binds to
port integerPort is the port the server listens on
secure booleanSecure enables secure serving for the metrics endpoint

NamespaceConfiguration

NamespaceConfiguration determines operator namespace mode.

Appears in:

FieldDescriptionDefaultValidation
restricted stringRestricted is the namespace to restrict to. Empty = cluster-wide mode.
scope NamespaceScopeConfigurationScope holds namespace scope lease settings (namespace-restricted mode only)

NamespaceScopeConfiguration

NamespaceScopeConfiguration holds lease settings for namespace-restricted mode.

Appears in:

FieldDescriptionDefaultValidation
leaseDuration DurationLeaseDuration is the duration of the namespace scope marker lease before expiration (default: 30s)
leaseRenewInterval DurationLeaseRenewInterval is the interval for renewing the namespace scope marker lease (default: 10s)

OperatorConfiguration

OperatorConfiguration is the Schema for the operator configuration.

FieldDescriptionDefaultValidation
apiVersion stringoperator.config.dynamo.nvidia.com/v1alpha1
kind stringOperatorConfiguration
server ServerConfigurationServer configuration (metrics, health probes, webhooks)
leaderElection LeaderElectionConfigurationLeader election configuration
namespace NamespaceConfigurationNamespace configuration (restricted vs cluster-wide)
orchestrators OrchestratorConfigurationOrchestrator configuration with optional overrides
infrastructure InfrastructureConfigurationService mesh and infrastructure addresses
ingress IngressConfigurationIngress configuration
rbac RBACConfigurationRBAC configuration for cross-namespace resource management (cluster-wide mode)
mpi MPIConfigurationMPI SSH secret configuration
checkpoint CheckpointConfigurationCheckpoint/restore configuration
discovery DiscoveryConfigurationDiscovery backend configuration
gpu GPUConfigurationGPU discovery configuration
logging LoggingConfigurationLogging configuration
security SecurityConfigurationHTTP/2 and TLS settings
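
A partial OperatorConfiguration sketch combining several of the sections above. Values such as the leader-election ID, S3 URI, and secret name are illustrative:

```yaml
apiVersion: operator.config.dynamo.nvidia.com/v1alpha1
kind: OperatorConfiguration
leaderElection:
  enabled: true
  id: dynamo-operator-leader     # hypothetical identity
logging:
  level: debug                   # default is info
  format: json                   # default
discovery:
  backend: kubernetes            # default; "etcd" is also accepted
checkpoint:
  enabled: true
  storage:
    type: s3
    s3:
      uri: "s3://my-bucket/checkpoints"
      credentialsSecretRef: s3-credentials   # hypothetical secret name
```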

OrchestratorConfiguration

OrchestratorConfiguration holds orchestrator override settings.

Appears in:

FieldDescriptionDefaultValidation
grove GroveConfigurationGrove orchestrator configuration
lws LWSConfigurationLWS orchestrator configuration
kaiScheduler KaiSchedulerConfigurationKaiScheduler configuration

RBACConfiguration

RBACConfiguration holds RBAC settings for cluster-wide mode.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `plannerClusterRoleName` _string_ | PlannerClusterRoleName is the ClusterRole for planner | | |
| `dgdrProfilingClusterRoleName` _string_ | DGDRProfilingClusterRoleName is the ClusterRole for DGDR profiling jobs | | |
| `eppClusterRoleName` _string_ | EPPClusterRoleName is the ClusterRole for EPP | | |

SecurityConfiguration

SecurityConfiguration holds HTTP/2 and TLS settings.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enableHTTP2` _boolean_ | EnableHTTP2 enables HTTP/2 for metrics and webhook servers | false | |

Server

Server holds a bind address and port.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `bindAddress` _string_ | BindAddress is the address the server binds to | | |
| `port` _integer_ | Port is the port the server listens on | | |

ServerConfiguration

ServerConfiguration holds server bind addresses and ports.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `metrics` _MetricsServer_ | Metrics server configuration | { bindAddress: 127.0.0.1, port: 8080 } | |
| `healthProbe` _Server_ | Health probe server configuration | { bindAddress: 0.0.0.0, port: 8081 } | |
| `webhook` _WebhookServer_ | Webhook server configuration | { certDir: /tmp/k8s-webhook-server/serving-certs, host: 0.0.0.0, port: 9443 } | |

WebhookServer

WebhookServer extends Server with host and certificate directory.

Appears in:

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `bindAddress` _string_ | BindAddress is the address the server binds to | | |
| `port` _integer_ | Port is the port the server listens on | | |
| `host` _string_ | Host is the address the webhook server binds to | | |
| `certDir` _string_ | CertDir is the directory containing TLS certificates | | |

Operator Default Values Injection

The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include:

  • Health Probes: Startup, liveness, and readiness probes are configured differently for frontend, worker, and planner components. For example, worker components receive a startup probe with a 2-hour timeout (720 failures × 10 seconds) to accommodate long model loading times.

  • Security Context: All components receive fsGroup: 1000 by default to ensure proper file permissions for mounted volumes. This can be overridden via the extraPodSpec.securityContext field.

  • Shared Memory: All components receive an 8Gi shared memory volume mounted at /dev/shm by default (can be disabled or resized via the sharedMemory field).

  • Environment Variables: Components automatically receive environment variables like DYN_NAMESPACE, DYN_PARENT_DGD_K8S_NAME, DYNAMO_PORT, and backend-specific variables.

  • Pod Configuration: Default terminationGracePeriodSeconds of 60 seconds and restartPolicy: Always.

  • Autoscaling: When enabled without explicit metrics, defaults to CPU-based autoscaling with 80% target utilization.

  • Backend-Specific Behavior: For multinode deployments, probes are automatically modified or removed for worker nodes depending on the backend framework (VLLM, SGLang, or TensorRT-LLM).

Pod Specification Defaults

All components receive the following pod-level defaults unless overridden:

  • terminationGracePeriodSeconds: 60 seconds
  • restartPolicy: Always

Security Context

The operator automatically applies default security context settings to all components to ensure proper file permissions, particularly for mounted volumes:

  • fsGroup: 1000 - Sets the group ownership of mounted volumes and any files created in those volumes

This default ensures that non-root containers can write to mounted volumes (like model caches or persistent storage) without permission issues. The fsGroup setting is particularly important for:

  • Model downloads and caching
  • Compilation cache directories
  • Persistent volume claims (PVCs)
  • SSH key generation in multinode deployments

Overriding Security Context

To override the default security context, specify your own securityContext in the extraPodSpec of your component:

```yaml
services:
  YourWorker:
    extraPodSpec:
      securityContext:
        fsGroup: 2000  # Custom group ID
        runAsUser: 1000
        runAsGroup: 1000
        runAsNonRoot: true
```

Important: When you provide any securityContext object in extraPodSpec, the operator will not inject any defaults. This gives you complete control over the security context, including the ability to run as root (by omitting runAsNonRoot or setting it to false).

OpenShift and Security Context Constraints

In OpenShift environments with Security Context Constraints (SCCs), you may need to omit explicit UID/GID values to allow OpenShift’s admission controllers to assign them dynamically:

```yaml
services:
  YourWorker:
    extraPodSpec:
      securityContext:
        # Omit fsGroup to let OpenShift assign it based on SCC
        # OpenShift will inject the appropriate UID range
```

Alternatively, if you want to keep the default fsGroup: 1000 behavior and are certain your cluster allows it, you don't need to specify anything; the operator defaults will work.

Shared Memory Configuration

Shared memory is enabled by default for all components:

  • Enabled: true (unless explicitly disabled via sharedMemory.disabled)
  • Size: 8Gi
  • Mount Path: /dev/shm
  • Volume Type: emptyDir with memory medium

To disable shared memory or customize the size, use the sharedMemory field in your component specification.
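A sketch of such an override (the `disabled` field is named in the list above; the `size` field name is an assumption inferred from the documented 8Gi default):

```yaml
services:
  YourWorker:
    sharedMemory:
      disabled: false
      size: 16Gi  # assumed field name; doubles the 8Gi default
```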

Health Probes by Component Type

The operator applies different default health probes based on the component type.

Frontend Components

Frontend components receive the following probe configurations:

Liveness Probe:

  • Type: HTTP GET
  • Path: /health
  • Port: http (8000)
  • Initial Delay: 60 seconds
  • Period: 60 seconds
  • Timeout: 30 seconds
  • Failure Threshold: 10

Readiness Probe:

  • Type: Exec command
  • Command: curl -s http://localhost:${DYNAMO_PORT}/health | jq -e ".status == \"healthy\""
  • Initial Delay: 60 seconds
  • Period: 60 seconds
  • Timeout: 30 seconds
  • Failure Threshold: 10
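Expressed as Kubernetes probe specs, the frontend defaults above correspond roughly to the following (a sketch, not operator-generated output):

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: http  # 8000
  initialDelaySeconds: 60
  periodSeconds: 60
  timeoutSeconds: 30
  failureThreshold: 10
readinessProbe:
  exec:
    command:
      - sh
      - -c
      - curl -s http://localhost:${DYNAMO_PORT}/health | jq -e '.status == "healthy"'
  initialDelaySeconds: 60
  periodSeconds: 60
  timeoutSeconds: 30
  failureThreshold: 10
```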

Worker Components

Worker components receive the following probe configurations:

Liveness Probe:

  • Type: HTTP GET
  • Path: /live
  • Port: system (9090)
  • Period: 5 seconds
  • Timeout: 30 seconds
  • Failure Threshold: 1

Readiness Probe:

  • Type: HTTP GET
  • Path: /health
  • Port: system (9090)
  • Period: 10 seconds
  • Timeout: 30 seconds
  • Failure Threshold: 60

Startup Probe:

  • Type: HTTP GET
  • Path: /live
  • Port: system (9090)
  • Period: 10 seconds
  • Timeout: 5 seconds
  • Failure Threshold: 720 (allows up to 2 hours for startup: 10s × 720 = 7200s)

:::{note}
For larger models (typically >70B parameters) or slower storage systems, you may need to increase the `failureThreshold` to allow more time for model loading. Calculate the required threshold based on your expected startup time: `failureThreshold = (expected_startup_seconds / period)`. Override the startup probe in your component specification if the default 2-hour window is insufficient.
:::
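The threshold arithmetic from the note can be sketched as a small helper (illustrative only; the operator does not expose such a function):

```python
def startup_failure_threshold(expected_startup_seconds: int, period_seconds: int = 10) -> int:
    """Smallest failureThreshold whose probe window (threshold x period)
    covers the expected startup time, using ceiling division."""
    return -(-expected_startup_seconds // period_seconds)

# Default worker window: 720 failures x 10s period = 7200s (2 hours)
print(startup_failure_threshold(7200))   # → 720
# A 4-hour model load with the default 10s period:
print(startup_failure_threshold(14400))  # → 1440
```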

Multinode Deployment Probe Modifications

For multinode deployments, the operator modifies probes based on the backend framework and node role:

VLLM Backend

The operator automatically selects between two deployment modes based on parallelism configuration:

Tensor/Pipeline Parallel Mode (when world_size > GPUs_per_node):

  • Uses Ray for distributed execution (--distributed-executor-backend ray)
  • Leader nodes: Starts Ray head and runs vLLM; all probes remain active
  • Worker nodes: Run Ray agents only; all probes (liveness, readiness, startup) are removed

Data Parallel Mode (when world_size × data_parallel_size > GPUs_per_node):

  • Worker nodes: All probes (liveness, readiness, startup) are removed
  • Leader nodes: All probes remain active

SGLang Backend

  • Worker nodes: All probes (liveness, readiness, startup) are removed

TensorRT-LLM Backend

  • Leader nodes: All probes remain unchanged
  • Worker nodes:
    • Liveness and startup probes are removed
    • Readiness probe is replaced with a TCP socket check on SSH port (2222):
      • Initial Delay: 20 seconds
      • Period: 20 seconds
      • Timeout: 5 seconds
      • Failure Threshold: 10
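The replacement readiness probe described above corresponds roughly to this probe spec (a sketch, not operator-generated output):

```yaml
readinessProbe:
  tcpSocket:
    port: 2222  # SSH port used for multinode MPI communication
  initialDelaySeconds: 20
  periodSeconds: 20
  timeoutSeconds: 5
  failureThreshold: 10
```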

Environment Variables

The operator automatically injects environment variables into component containers based on component type, backend framework, and operator configuration. User-provided envs values always take precedence over operator defaults.
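For example (a sketch; `YourWorker` is a placeholder service name), setting a variable in `envs` overrides the operator's injected default:

```yaml
services:
  YourWorker:
    envs:
      - name: DYN_SYSTEM_PORT
        value: "9091"  # takes precedence over the injected default of 9090
```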

All Components

These environment variables are injected into every component container regardless of type.

| Variable | Purpose | Default | Type | Source |
| --- | --- | --- | --- | --- |
| `DYN_NAMESPACE` | Dynamo service namespace used for service discovery and routing | Derived from DGD spec | string | Downward API annotation on checkpoint-restored pods |
| `DYN_COMPONENT` | Identifies the component type for runtime behavior | One of: frontend, worker, prefill, decode, planner, epp | string | Set from component spec |
| `DYN_PARENT_DGD_K8S_NAME` | Kubernetes name of the parent DynamoGraphDeployment resource | | string | Set from DGD metadata |
| `DYN_PARENT_DGD_K8S_NAMESPACE` | Kubernetes namespace of the parent DynamoGraphDeployment resource | | string | Set from DGD metadata |
| `POD_NAME` | Current pod name | | string | Downward API (metadata.name) |
| `POD_NAMESPACE` | Current pod namespace | | string | Downward API (metadata.namespace) |
| `POD_UID` | Current pod UID | | string | Downward API (metadata.uid) |
| `DYN_DISCOVERY_BACKEND` | Service discovery backend for inter-component communication | kubernetes | string | Options: kubernetes, etcd |

Infrastructure (Conditional)

These are injected into all components when the corresponding infrastructure service is configured in the operator’s OperatorConfiguration.

| Variable | Purpose | Default | Type | Condition |
| --- | --- | --- | --- | --- |
| `NATS_SERVER` | NATS messaging server address | | string | Set when infrastructure.natsAddress is configured |
| `ETCD_ENDPOINTS` | etcd endpoint addresses for distributed state | | string | Set when infrastructure.etcdAddress is configured |
| `MODEL_EXPRESS_URL` | Model Express service URL for model management | | string | Set when infrastructure.modelExpressURL is configured |
| `PROMETHEUS_ENDPOINT` | Prometheus endpoint for metrics collection | | string | Set when infrastructure.prometheusEndpoint is configured |

Frontend Components

| Variable | Purpose | Default | Type |
| --- | --- | --- | --- |
| `DYNAMO_PORT` | HTTP port the frontend listens on | 8000 | int |
| `DYN_HTTP_PORT` | HTTP port for the frontend service (alias) | 8000 | int |
| `DYN_NAMESPACE_PREFIX` | Namespace prefix used for frontend request routing | Same as DYN_NAMESPACE | string |

Worker Components

| Variable | Purpose | Default | Type |
| --- | --- | --- | --- |
| `DYN_SYSTEM_ENABLED` | Enables the system HTTP server for health checks and metrics | true | string (boolean) |
| `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` | Endpoints whose health status is used for readiness | ["generate"] | string (JSON array) |
| `DYN_SYSTEM_PORT` | Port for the system HTTP server (health, metrics) | 9090 | int |
| `DYN_HEALTH_CHECK_ENABLED` | Disables the legacy health check mechanism in favor of the system server | false | string (boolean) |
| `NIXL_TELEMETRY_ENABLE` | Enables or disables NIXL telemetry collection | n | string |
| `NIXL_TELEMETRY_EXPORTER` | Telemetry exporter format for NIXL metrics | prometheus | string |
| `NIXL_TELEMETRY_PROMETHEUS_PORT` | Port for NIXL Prometheus metrics endpoint | 19090 | int |
| `DYN_NAMESPACE_WORKER_SUFFIX` | Hash suffix appended to worker namespace for rolling updates | | string |

Planner Components

| Variable | Purpose | Default | Type |
| --- | --- | --- | --- |
| `PLANNER_PROMETHEUS_PORT` | Port for the planner's Prometheus metrics endpoint | 9085 | int |

EPP (Endpoint Picker Plugin) Components

| Variable | Purpose | Default | Type |
| --- | --- | --- | --- |
| `USE_STREAMING` | Enables streaming mode for inference request proxying | true | string (boolean) |
| `RUST_LOG` | Rust log level and filter configuration | debug,dynamo_llm::kv_router=trace | string |

VLLM Backend

| Variable | Purpose | Default | Type | Condition |
| --- | --- | --- | --- | --- |
| `VLLM_CACHE_ROOT` | Directory for vLLM compilation cache artifacts | | string | Set when a volume mount has useAsCompilationCache: true |
| `VLLM_NIXL_SIDE_CHANNEL_HOST` | Host IP for the NIXL side channel in multiprocessing mode | Pod IP | string | Multinode mp backend only (Downward API: status.podIP) |

TensorRT-LLM Backend

| Variable | Purpose | Default | Type | Condition |
| --- | --- | --- | --- | --- |
| `OMPI_MCA_orte_keep_fqdn_hostnames` | Instructs OpenMPI to preserve FQDN hostnames for inter-node communication | 1 | string | Multinode deployments only |

Checkpoint / Restore

These environment variables are injected when checkpoint/restore is enabled for a component.

| Variable | Purpose | Default | Type | Condition |
| --- | --- | --- | --- | --- |
| `DYN_CHECKPOINT_PATH` | Base directory where checkpoint data is stored | From operator checkpoint config storage.pvc.basePath | string | PVC storage type |
| `DYN_CHECKPOINT_LOCATION` | Full checkpoint URI (for non-PVC backends) | | string | S3 or OCI storage type |
| `DYN_CHECKPOINT_HASH` | Identity hash that uniquely identifies the checkpoint | | string | Always set when checkpoint is enabled |
| `SKIP_WAIT_FOR_CHECKPOINT` | Skips the checkpoint readiness polling loop; checks once and proceeds | | string | Set on restored and DGD pods |

Service Accounts

The following component types automatically receive dedicated service accounts:

  • Planner: planner-serviceaccount
  • EPP: epp-serviceaccount

Image Pull Secrets

The operator automatically discovers and injects image pull secrets for container images. When a component specifies a container image, the operator:

  1. Scans all Kubernetes secrets of type kubernetes.io/dockerconfigjson in the component’s namespace
  2. Extracts the docker registry server URLs from each secret’s authentication configuration
  3. Matches the container image’s registry host against the discovered registry URLs
  4. Automatically injects matching secrets as imagePullSecrets in the pod specification

This eliminates the need to manually specify image pull secrets for each component. The operator maintains an internal index of docker secrets and their associated registries, refreshing this index periodically.
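For illustration, a secret that would participate in this discovery might look like the following (a sketch; the name and namespace are placeholders, and the data payload is elided):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-registry-creds      # hypothetical name
  namespace: your-namespace    # must match the component's namespace
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64-encoded Docker config JSON>
```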

To disable automatic image pull secret discovery for a specific component, add the following annotation:

```yaml
annotations:
  nvidia.com/disable-image-pull-secret-discovery: "true"
```

Autoscaling Defaults

When autoscaling is enabled but no metrics are specified, the operator applies:

  • Default Metric: CPU utilization
  • Target Average Utilization: 80%

Port Configurations

Default container ports are configured based on component type:

Frontend Components

  • Port: 8000
  • Protocol: TCP
  • Name: http

Worker Components

  • Port: 9090 (system)
  • Protocol: TCP
  • Name: system
  • Port: 19090 (NIXL)
  • Protocol: TCP
  • Name: nixl

Planner Components

  • Port: 9085
  • Protocol: TCP
  • Name: metrics

EPP Components

  • Port: 9002 (gRPC)
  • Protocol: TCP
  • Name: grpc
  • Port: 9003 (gRPC health)
  • Protocol: TCP
  • Name: grpc-health
  • Port: 9090 (metrics)
  • Protocol: TCP
  • Name: metrics

Backend-Specific Configurations

VLLM

  • Ray Head Port: 6379 (for Ray cluster coordination in multinode TP/PP deployments)
  • Data Parallel RPC Port: 13445 (for data parallel multinode deployments)

SGLang

  • Distribution Init Port: 29500 (for multinode deployments)

TensorRT-LLM

  • SSH Port: 2222 (for multinode MPI communication)
  • OpenMPI Environment: OMPI_MCA_orte_keep_fqdn_hostnames=1

Implementation Reference

For users who want to understand the implementation details or contribute to the operator, the default values described in this document are set in the operator's source code.

Notes

  • All these defaults can be overridden by explicitly specifying values in your DynamoComponentDeployment or DynamoGraphDeployment resources
  • User-specified probes (via livenessProbe, readinessProbe, or startupProbe fields) take precedence over operator defaults
  • For security context, if you provide any securityContext in extraPodSpec, no defaults will be injected, giving you full control
  • For multinode deployments, some defaults are modified or removed as described above to accommodate distributed execution patterns
  • The extraPodSpec.mainContainer field can be used to override probe configurations set by the operator
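For instance (a sketch; `YourWorker` is a placeholder service name), a longer startup window can be set via `extraPodSpec.mainContainer`:

```yaml
services:
  YourWorker:
    extraPodSpec:
      mainContainer:
        startupProbe:
          httpGet:
            path: /live
            port: 9090
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 1440  # 1440 x 10s = 4 hours
```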