# DGDR Reference
A DynamoGraphDeploymentRequest (DGDR) is Dynamo’s deploy-by-intent API.
You describe what you want to run and your performance targets; the profiler
determines the optimal configuration and creates the live deployment.
For a step-by-step walkthrough of deploying your model — including strategy selection, model caching, planner setup, and common pitfalls — see the Model Deployment Guide.
## DGDR vs DGD
Dynamo provides two Custom Resources for deploying inference graphs:

- **DynamoGraphDeploymentRequest (DGDR)**: deploy by intent. You state the model and performance targets; the profiler selects the configuration.
- **DynamoGraphDeployment (DGD)**: full manual control. You specify the services, parallelism, and resources yourself.
When to use DGD instead: use DGD when you have a hand-crafted configuration
for a specific model/hardware combination (e.g., from `recipes/`). Such configs
can outperform the profiler's choice for known setups, but they require you to
understand which parallelism parameters (TP, PP, EP) are appropriate, and they
do not generalize across different hardware.
For DGD deployment details, see Creating Deployments.
## Spec Reference
### Minimal Example
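The sketch below is illustrative only. Apart from `envs` and `overrides`, which appear later on this page, the spec field names (`model`, `slaTargets`, and their children) are assumptions; consult the API Reference for the authoritative schema.

```yaml
# Illustrative sketch -- field names under spec are assumptions,
# not the authoritative schema. See the API Reference.
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeploymentRequest
metadata:
  name: my-model-dgdr
  namespace: dynamo
spec:
  model: meta-llama/Llama-3.1-8B-Instruct   # hypothetical model reference
  slaTargets:                               # hypothetical SLA block
    ttft: 200ms                             # time to first token target
    itl: 20ms                               # inter-token latency target
```

From this intent alone, the profiler determines parallelism and replica counts and creates the live deployment; no TP/PP/EP parameters appear in the request.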
### Field Reference
For the complete CRD spec, see the API Reference.
## Generated DGD Overrides
Use `spec.overrides.dgd` when the generated DynamoGraphDeployment needs a
field that DGDR does not expose directly. The value is a partial
`nvidia.com/v1alpha1` DGD object that is merged into the profiler-generated
deployment after Dynamo selects a configuration.
For example, to inject an environment variable into every generated service:
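A minimal sketch of the all-services form, assuming a standard Kubernetes env-var list under `spec.envs` (the variable shown is only an example):

```yaml
spec:
  envs:
    - name: HF_HUB_OFFLINE   # example variable; any env var works here
      value: "1"
```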
Use `spec.envs` for variables that should apply to all generated services. To
target a single service, override that service's `envs` entry instead:
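A hedged sketch of a per-service override. The service name `VllmDecodeWorker` and the `services` map layout are assumptions about the generated DGD; check the generated deployment for the actual service names:

```yaml
spec:
  overrides:
    dgd:
      spec:
        services:
          VllmDecodeWorker:   # assumed name -- match your generated DGD
            envs:
              - name: NCCL_DEBUG
                value: "INFO"
```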
`overrides.profilingJob` only customizes the profiling Job. Use
`overrides.dgd` for settings that must appear on the deployed worker pods.
## SKU Format
When providing hardware configuration manually, use the lowercase underscore format.
All supported values: `gb200_sxm`, `b200_sxm`, `h200_sxm`, `h100_sxm`,
`h100_pcie`, `a100_sxm`, `a100_pcie`, `a30`, `l40s`, `l40`, `l4`,
`v100_sxm`, `v100_pcie`, `t4`, `mi200`, `mi300`.
Not all SKUs are supported by the AIC profiler for rapid mode. See
AIC Support Matrix for details.
**PCIe variants are not yet supported by the profiler.** The CRD admits PCIe SKUs
(`h100_pcie`, `a100_pcie`, `v100_pcie`), but the profiler does not currently
ship training data for them. You can submit a DGDR with a PCIe value; the
operator will accept it, but profiler-assisted sizing will fall back to
defaults. Profiler support for PCIe SKUs is tracked as an engineering
follow-up.
## Lifecycle
When you create a DGDR, it progresses through these phases:
### Conditions
The operator maintains these conditions on the DGDR status:
## Monitoring
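The phase and conditions above can be inspected with standard `kubectl` commands. This is an illustrative sketch: the full resource names are shown because short names may vary by install, and `<namespace>`/`<name>` are placeholders.

```shell
# Watch the DGDR's phase as it progresses through the lifecycle
kubectl get dynamographdeploymentrequests -n <namespace> -w

# Inspect conditions and events for a specific DGDR
kubectl describe dynamographdeploymentrequest <name> -n <namespace>
```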
## Resource Ownership
- The DGDR does not set an owner reference on the DGD it creates. Deleting a DGDR does not delete the DGD; it persists independently so it can continue serving traffic.
- The relationship is tracked via labels: `dgdr.nvidia.com/name` and `dgdr.nvidia.com/namespace`.
- Additional resources (planner ConfigMaps) are created in the same namespace and labeled with `dgdr.nvidia.com/name`.
## Known Issues
- `pareto_analysis.py` produces NaN for some configurations. Tracked as an engineering follow-up. Workaround: re-run with a narrower sweep; narrow sweeps bypass the NaN path in practice.
- PCIe profiler data is not yet available. See the PCIe callout under SKU Format.
## Further Reading
- Model Deployment Guide — How to deploy your model, strategy selection, pitfalls, examples
- Profiler Guide — Profiling algorithms, picking modes, gate checks
- Profiler Examples — Ready-to-use YAML for SLA targets, private models, MoE, overrides
- Planner Guide — Scaling modes, PlannerConfig reference
- API Reference — Complete CRD field specifications
- Creating Deployments — DGD spec for full manual control