Managing Models with DynamoModel

Overview

DynamoModel is a Kubernetes Custom Resource that represents a machine learning model deployed on Dynamo. It enables you to:

  • Deploy LoRA adapters on top of running base models
  • Track model endpoints and their readiness across your cluster
  • Manage model lifecycle declaratively with Kubernetes

DynamoModel works alongside DynamoGraphDeployment (DGD) or DynamoComponentDeployment (DCD) resources. While DGD/DCD deploy the inference infrastructure (pods, services), DynamoModel handles model-specific operations like loading LoRA adapters.

Quick Start

Prerequisites

Before creating a DynamoModel, you need:

  1. A running DynamoGraphDeployment or DynamoComponentDeployment
  2. Components configured with modelRef pointing to your base model
  3. Pods that are ready and serving your base model

For complete setup including DGD configuration, see Integration with DynamoGraphDeployment.
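
A quick preflight check, as a sketch (assuming the worker label and DGD name used in examples later in this guide):

# 1. Confirm the deployment exists
kubectl get dynamographdeployment my-deployment

# 2. Confirm worker pods are Running and Ready
kubectl get pods -l nvidia.com/dynamo-component-type=worker

# 3. Inspect the modelRef your DynamoModel must match
kubectl get dynamographdeployment my-deployment -o yaml | grep -A2 modelRef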

Deploy a LoRA Adapter

1. Create your DynamoModel:

apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: my-lora
  namespace: dynamo-system
spec:
  modelName: my-custom-lora
  baseModelName: Qwen/Qwen3-0.6B # Must match modelRef.name in your DGD
  modelType: lora
  source:
    uri: s3://my-bucket/loras/my-lora

2. Apply and verify:

# Apply the DynamoModel
kubectl apply -f my-lora.yaml

# Check status
kubectl get dynamomodel my-lora

Expected output:

NAME      TOTAL   READY   AGE
my-lora   2       2       30s

That’s it! The operator automatically discovers endpoints and loads the LoRA.

For detailed status monitoring, see Monitoring & Operations.

Understanding DynamoModel

Model Types

DynamoModel supports three model types:

| Type | Description | Use Case |
|------|-------------|----------|
| base | Reference to an existing base model | Tracking endpoints for a base model (default) |
| lora | LoRA adapter that extends a base model | Deploy fine-tuned adapters on existing models |
| adapter | Generic model adapter | Future extensibility for other adapter types |

Most users will use lora to deploy fine-tuned models on top of their base model deployments.

How It Works

When you create a DynamoModel, the operator:

  1. Discovers endpoints: Finds all pods running your baseModelName (by matching modelRef.name in DGD/DCD)
  2. Creates service: Automatically creates a Kubernetes Service to track these pods
  3. Loads LoRA: Calls the LoRA load API on each endpoint (for lora type)
  4. Updates status: Reports which endpoints are ready

Key linkage:

# DGD modelRef.name ↔ DynamoModel baseModelName must match
Worker:
  modelRef:
    name: Qwen/Qwen3-0.6B
---
spec:
  baseModelName: Qwen/Qwen3-0.6B
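
One way to verify this linkage from the command line is to compare the two fields directly (a sketch; my-deployment and the Worker service name are placeholders for your own resources):

# Both commands should print the same model name
kubectl get dynamographdeployment my-deployment -o jsonpath='{.spec.services.Worker.modelRef.name}'; echo
kubectl get dynamomodel my-lora -o jsonpath='{.spec.baseModelName}'; echo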

Configuration Overview

DynamoModel requires just a few key fields to deploy a model or adapter:

| Field | Required | Purpose | Example |
|-------|----------|---------|---------|
| modelName | Yes | Model identifier | my-custom-lora |
| baseModelName | Yes | Links to DGD modelRef | Qwen/Qwen3-0.6B |
| modelType | No | Type: base/lora/adapter | lora (default: base) |
| source.uri | For LoRA | Model location | s3://bucket/path or hf://org/model |

Example minimal LoRA configuration:

apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: my-lora
spec:
  modelName: my-custom-lora
  baseModelName: Qwen/Qwen3-0.6B
  modelType: lora
  source:
    uri: s3://my-bucket/my-lora

For complete field specifications, validation rules, and all options, see: šŸ“– DynamoModel API Reference

Status Summary

The status shows discovered endpoints and their readiness:

kubectl get dynamomodel my-lora

Key status fields:

  • totalEndpoints / readyEndpoints: Counts of discovered vs ready endpoints
  • endpoints[]: List with addresses, pod names, and ready status
  • conditions: Standard Kubernetes conditions (EndpointsReady, ServicesFound)

For detailed status usage, see the Monitoring & Operations section below.
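
To read a single condition programmatically, a jsonpath filter works, for example:

# Prints "True" once all endpoints are ready
kubectl get dynamomodel my-lora -o jsonpath='{.status.conditions[?(@.type=="EndpointsReady")].status}'; echo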

Common Use Cases

Use Case 1: S3-Hosted LoRA Adapter

Deploy a LoRA adapter stored in an S3 bucket.

apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: customer-support-lora
  namespace: production
spec:
  modelName: customer-support-adapter-v1
  baseModelName: meta-llama/Llama-3.3-70B-Instruct
  modelType: lora
  source:
    uri: s3://my-models-bucket/loras/customer-support/v1

Prerequisites:

  • S3 bucket accessible from your pods (IAM role or credentials)
  • Base model meta-llama/Llama-3.3-70B-Instruct running via DGD/DCD

Verification:

# Check LoRA is loaded
kubectl get dynamomodel customer-support-lora -o jsonpath='{.status.readyEndpoints}'
# Should output: 2 (or your number of replicas)

# View which pods are serving
kubectl get dynamomodel customer-support-lora -o jsonpath='{.status.endpoints[*].podName}'

Use Case 2: HuggingFace-Hosted LoRA

Deploy a LoRA adapter from HuggingFace Hub.

apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: multilingual-lora
  namespace: dynamo-system
spec:
  modelName: multilingual-adapter
  baseModelName: Qwen/Qwen3-0.6B
  modelType: lora
  source:
    uri: hf://myorg/qwen-multilingual-lora@v1.0.0 # Optional: @revision

Prerequisites:

  • HuggingFace Hub accessible from your pods
  • If private repo: HF token configured as secret and mounted in pods
  • Base model Qwen/Qwen3-0.6B running via DGD/DCD

With HuggingFace token:

# In your DGD/DCD
spec:
  services:
    worker:
      envFromSecret: hf-token-secret # Provides HF_TOKEN env var
      modelRef:
        name: Qwen/Qwen3-0.6B
      # ... rest of config
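
If you don't already have the secret, it can be created from a token (a sketch; replace the placeholder value):

# Create a secret whose HF_TOKEN key becomes the pod's HF_TOKEN env var
kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN=<your-hf-token> \
  -n dynamo-system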

Use Case 3: Multiple LoRAs on Same Base Model

Deploy multiple LoRA adapters on the same base model deployment.

---
# LoRA for customer support
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: support-lora
spec:
  modelName: support-adapter
  baseModelName: Qwen/Qwen3-0.6B
  modelType: lora
  source:
    uri: s3://models/support-lora

---
# LoRA for code generation
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: code-lora
spec:
  modelName: code-adapter
  baseModelName: Qwen/Qwen3-0.6B # Same base model
  modelType: lora
  source:
    uri: s3://models/code-lora

Both LoRAs will be loaded on all pods serving Qwen/Qwen3-0.6B. Your application can then route requests to the appropriate adapter.
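
For example, if your frontend exposes an OpenAI-compatible API, a request selects an adapter by its modelName. This is a sketch with a placeholder frontend address and port:

# Route a request to the code-generation adapter by model name
curl http://<frontend-address>:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "code-adapter", "messages": [{"role": "user", "content": "Write a Kubernetes haiku"}]}'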

Monitoring & Operations

Checking Status

Quick status check:

$kubectl get dynamomodel

Example output:

NAME            TOTAL   READY   AGE
my-lora         2       2       5m
customer-lora   4       3       2h

Detailed status:

kubectl describe dynamomodel my-lora

Example output:

Name:         my-lora
Namespace:    dynamo-system
Spec:
  Model Name:       my-custom-lora
  Base Model Name:  Qwen/Qwen3-0.6B
  Model Type:       lora
  Source:
    Uri:  s3://my-bucket/my-lora
Status:
  Ready Endpoints:  2
  Total Endpoints:  2
  Endpoints:
    Address:   http://10.0.1.5:9090
    Pod Name:  worker-0
    Ready:     true
    Address:   http://10.0.1.6:9090
    Pod Name:  worker-1
    Ready:     true
  Conditions:
    Type:    EndpointsReady
    Status:  True
    Reason:  EndpointsDiscovered
Events:
  Type    Reason          Message
  ----    ------          -------
  Normal  EndpointsReady  Discovered 2 ready endpoints for base model Qwen/Qwen3-0.6B

Understanding Readiness

An endpoint is ready when:

  1. The pod is running and healthy
  2. The LoRA load API call succeeded

Condition states:

  • EndpointsReady=True: All endpoints are ready (full availability)
  • EndpointsReady=False, Reason=NotReady: Not all endpoints ready (check message for counts)
  • EndpointsReady=False, Reason=NoEndpoints: No endpoints found

When readyEndpoints < totalEndpoints, the operator automatically retries loading every 30 seconds.
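
You can watch the counts converge while the operator retries:

# Watch TOTAL/READY until they match
kubectl get dynamomodel my-lora -w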

Viewing Endpoints

Get endpoint addresses:

kubectl get dynamomodel my-lora -o jsonpath='{.status.endpoints[*].address}' | tr ' ' '\n'

Output:

http://10.0.1.5:9090
http://10.0.1.6:9090

Get endpoint pod names:

kubectl get dynamomodel my-lora -o jsonpath='{.status.endpoints[*].podName}' | tr ' ' '\n'

Check readiness of each endpoint:

kubectl get dynamomodel my-lora -o json | jq '.status.endpoints[] | {podName, ready}'

Output:

{
  "podName": "worker-0",
  "ready": true
}
{
  "podName": "worker-1",
  "ready": true
}

Updating a Model

To update a LoRA (e.g., deploy a new version):

# Edit the source URI
kubectl edit dynamomodel my-lora

# Or apply an updated YAML
kubectl apply -f my-lora-v2.yaml

The operator will detect the change and reload the LoRA on all endpoints.
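
A targeted patch also works and avoids editing the full manifest (a sketch with a hypothetical v2 URI):

# Point the LoRA at a new artifact version
kubectl patch dynamomodel my-lora --type=merge \
  -p '{"spec":{"source":{"uri":"s3://my-bucket/my-lora-v2"}}}'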

Deleting a Model

kubectl delete dynamomodel my-lora

For LoRA models, the operator will:

  1. Unload the LoRA from all endpoints
  2. Clean up associated resources
  3. Remove the DynamoModel CR

The base model deployment (DGD/DCD) continues running normally.
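
To confirm that deletion only removed the adapter:

# The DynamoModel is gone...
kubectl get dynamomodel my-lora
# ...but the worker pods keep serving the base model
kubectl get pods -l nvidia.com/dynamo-component-type=worker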

Troubleshooting

No Endpoints Found

Symptom:

status:
  totalEndpoints: 0
  readyEndpoints: 0
  conditions:
  - type: EndpointsReady
    status: "False"
    reason: NoEndpoints
    message: "No endpoint slices found for base model Qwen/Qwen3-0.6B"

Common Causes:

  1. Base model deployment not running

    # Check if pods exist
    kubectl get pods -l nvidia.com/dynamo-component-type=worker

    Solution: Deploy your DGD/DCD first, then wait for the pods to be ready.

  2. baseModelName mismatch

    # Check modelRef in your DGD
    kubectl get dynamographdeployment my-deployment -o yaml | grep -A2 modelRef

    Solution: Ensure baseModelName in DynamoModel exactly matches modelRef.name in DGD.

  3. Pods not ready

    # Check pod status
    kubectl get pods -l nvidia.com/dynamo-component-type=worker

    Solution: Wait for pods to reach Running and Ready state.

  4. Wrong namespace

    Solution: Ensure DynamoModel is in the same namespace as your DGD/DCD.

LoRA Load Failures

Symptom:

status:
  totalEndpoints: 2
  readyEndpoints: 0 # ← No endpoints ready despite pods existing
  conditions:
  - type: EndpointsReady
    status: "False"
    reason: NoReadyEndpoints

Common Causes:

  1. Source URI not accessible

    # Check operator logs
    kubectl logs -n dynamo-system deployment/dynamo-operator-controller-manager -f | grep "Failed to load"

    Solution:

    • For S3: Verify bucket permissions, IAM role, credentials
    • For HuggingFace: Verify token is valid, repo exists and is accessible
  2. Invalid LoRA format

    Solution: Ensure your LoRA weights are in the format expected by your backend framework (vLLM, SGLang, etc.)

  3. Endpoint API errors

    # Check operator logs for HTTP errors
    kubectl logs -n dynamo-system deployment/dynamo-operator-controller-manager | grep "error"

    Solution: Check the backend framework’s logs in the worker pods:

    kubectl logs worker-0
  4. Out of memory

    Solution: LoRA adapters require additional memory. Increase memory limits in your DGD (an OOM check follows this list):

    resources:
      limits:
        memory: "32Gi" # Increase if needed
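
To confirm a memory-limit kill, one option is to inspect the container's last termination reason (a sketch; worker-0 is a placeholder pod name):

# "OOMKilled" here confirms the container was killed at its memory limit
kubectl get pod worker-0 -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'; echo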

Status Shows Not Ready

Symptom: Some endpoints remain not ready for extended periods.

Diagnosis:

# Check which endpoints are not ready
kubectl get dynamomodel my-lora -o json | jq '.status.endpoints[] | select(.ready == false)'

# View operator logs for that specific pod
kubectl logs -n dynamo-system deployment/dynamo-operator-controller-manager | grep "worker-0"

# Check the worker pod logs
kubectl logs worker-0 | tail -50

Common Causes:

  1. Network issues: Pod can’t reach S3/HuggingFace
  2. Resource constraints: Pod is OOMing or being throttled
  3. API endpoint not responding: Backend framework isn’t serving the LoRA API

When to wait vs investigate:

  • Wait: If readyEndpoints is increasing over time (LoRAs loading progressively)
  • Investigate: If readyEndpoints stays stuck at the same value for more than 5 minutes (a minimal poll loop follows this list)
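
A sketch of such a poll loop, printing ready/total once a minute:

# Investigate if the first number stops increasing
while true; do
  kubectl get dynamomodel my-lora -o jsonpath='{.status.readyEndpoints}/{.status.totalEndpoints}'
  echo
  sleep 60
done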

Viewing Events and Logs

Check events:

kubectl describe dynamomodel my-lora | tail -20

View operator logs:

# Follow logs
kubectl logs -n dynamo-system deployment/dynamo-operator-controller-manager -f

# Filter for specific model
kubectl logs -n dynamo-system deployment/dynamo-operator-controller-manager | grep "my-lora"

Common events and messages:

| Event/Message | Meaning | Action |
|---------------|---------|--------|
| EndpointsReady | All endpoints are ready | āœ… Good: full service availability |
| NotReady | Not all endpoints ready | āš ļø Check readyEndpoints count; operator will retry |
| PartialEndpointFailure | Some endpoints failed to load | Check logs for errors |
| NoEndpointsFound | No pods discovered | Verify DGD running and modelRef matches |
| EndpointDiscoveryFailed | Can't query endpoints | Check operator RBAC permissions |
| Successfully reconciled | Reconciliation complete | āœ… Good |

Integration with DynamoGraphDeployment

This section shows the complete end-to-end workflow for deploying base models and LoRA adapters together.

DynamoModel and DynamoGraphDeployment work together to provide complete model deployment:

  • DGD: Deploys the infrastructure (pods, services, resources)
  • DynamoModel: Manages model-specific operations (LoRA loading)

Linking Models to Components

The connection is established through the modelRef field in your DGD:

Complete example:

---
# 1. Deploy the base model infrastructure
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-deployment
spec:
  backendFramework: vllm
  services:
    Frontend:
      componentType: frontend
      replicas: 1
      dynamoNamespace: my-app
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:latest

    Worker:
      # This modelRef creates the link to DynamoModel
      modelRef:
        name: Qwen/Qwen3-0.6B # ← Key linking field

      componentType: worker
      replicas: 2
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:latest
          args:
            - --model
            - Qwen/Qwen3-0.6B
            - --tensor-parallel-size
            - "1"

---
# 2. Deploy LoRA adapters on top
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: my-lora
spec:
  modelName: my-custom-lora
  baseModelName: Qwen/Qwen3-0.6B # ← Must match modelRef.name above
  modelType: lora
  source:
    uri: s3://my-bucket/loras/my-lora

Deployment Workflow

Recommended order:

# 1. Deploy base model infrastructure
kubectl apply -f my-deployment.yaml

# 2. Wait for pods to be ready
kubectl wait --for=condition=ready pod -l nvidia.com/dynamo-component-type=worker --timeout=5m

# 3. Deploy LoRA adapters
kubectl apply -f my-lora.yaml

# 4. Verify LoRA is loaded
kubectl get dynamomodel my-lora
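
If you script this workflow, kubectl wait can also gate on the DynamoModel condition itself (assuming the EndpointsReady condition type reported in status):

# Block until the operator reports all endpoints ready
kubectl wait --for=condition=EndpointsReady dynamomodel/my-lora --timeout=5m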

What happens behind the scenes:

StepDGDDynamoModel
1Creates pods with modelRef-
2Pods become running and ready-
3-CR created, discovers endpoints via auto-created Service
4-Calls LoRA load API on each endpoint
5-All endpoints ready āœ“

The operator handles all service discovery automatically; you don't need to configure services, labels, or selectors manually.
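
If you want to see what discovery produced, the underlying objects are ordinary Kubernetes Services and EndpointSlices (a sketch; the exact names and labels the operator applies may vary):

# Inspect the auto-created Service and its endpoint slices
kubectl get svc,endpointslices -n dynamo-system | grep -i lora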

API Reference

For complete field specifications, validation rules, and detailed type definitions, see:

šŸ“– Dynamo CRD API Reference

Summary

DynamoModel provides declarative model management for Dynamo deployments:

āœ… Simple: 2-step deployment of LoRA adapters
āœ… Automatic: Endpoint discovery and loading handled by the operator
āœ… Observable: Rich status reporting and conditions
āœ… Integrated: Works seamlessly with DynamoGraphDeployment

Next Steps: