# Dynamo Health Checks

## Overview

Dynamo provides health check and liveness HTTP endpoints for each component. You can use them to configure startup, liveness, and readiness probes in orchestration frameworks such as Kubernetes.

## Environment Variables

| Variable | Description | Default | Example |
|----------|-------------|---------|---------|
| `DYN_SYSTEM_PORT` | System status server port | `8081` | `9090` |
| `DYN_SYSTEM_STARTING_HEALTH_STATUS` | Initial health status | `notready` | `ready`, `notready` |
| `DYN_SYSTEM_HEALTH_PATH` | Custom health endpoint path | `/health` | `/custom/health` |
| `DYN_SYSTEM_LIVE_PATH` | Custom liveness endpoint path | `/live` | `/custom/live` |
| `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` | JSON list of endpoints that must be served before the component reports `ready` | none | `["generate"]` |
| `DYN_HEALTH_CHECK_ENABLED` | Enable canary health checks | `false` (K8s: `true`) | `true`, `false` |
| `DYN_CANARY_WAIT_TIME` | Seconds of idle time before sending a canary health check | `10` | `5`, `30` |
| `DYN_HEALTH_CHECK_REQUEST_TIMEOUT` | Health check request timeout in seconds | `3` | `5`, `10` |

## Getting Started Quickly

Enable health checks and query endpoints:

```bash
# Start the frontend (default port 8000; override with --http-port or the DYN_HTTP_PORT env var)
python -m dynamo.frontend &

# Start a worker with the system status server on port 8081
DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager &
```

Check health status:

```bash
# Frontend health (port 8000)
curl -s localhost:8000/health | jq

# Worker health (port 8081)
curl -s localhost:8081/health | jq
```

## Frontend Liveness Check

The frontend liveness endpoint reports a status of `live` as long as the service is running. Frontend liveness does not depend on worker health or liveness. It depends only on the frontend service itself.
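
If you script your own probe instead of using curl, the liveness rule above reduces to a small check. The sketch below is a minimal, hypothetical client; `is_live` and `probe_live` are not part of Dynamo. It assumes the frontend listens on the default port 8000 and returns the `{"status": "live"}` response shown in the example. It uses only the Python standard library.

```python
import json
import urllib.error
import urllib.request


def is_live(status_code: int, body: str) -> bool:
    """Interpret a frontend /live response: HTTP 200 with "status": "live"."""
    if status_code != 200:
        return False
    try:
        data = json.loads(body)
    except ValueError:
        return False
    return isinstance(data, dict) and data.get("status") == "live"


def probe_live(base_url: str = "http://localhost:8000", path: str = "/live",
               timeout: float = 2.0) -> bool:
    """Fetch the liveness endpoint; errors and non-200 responses count as not live."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return is_live(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urllib raises for non-2xx statuses; the response body is still readable.
        return is_live(err.code, err.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: the frontend is not reachable.
        return False
```

The parsing step (`is_live`) is kept separate from the HTTP call (`probe_live`), so the decision logic can be tested without a running frontend. With a local frontend running, `probe_live()` should return `True`.
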
### Example Request

```bash
curl -s localhost:8000/live | jq
```

### Example Response

```json
{
  "message": "Service is live",
  "status": "live"
}
```

## Frontend Health Check

The frontend health endpoint reports `healthy` once workers have registered endpoints with the frontend. Before that, it reports `unhealthy` with the message `No endpoints available`. After workers have been registered, the response also lists the registered endpoints and instances. In the example responses below, both states return `HTTP/1.1 200 OK`, so check the `status` field rather than relying on the HTTP status code alone.

### Example Request

```bash
curl -v localhost:8000/health | jq
```

### Example Response

Before workers are registered:

```
HTTP/1.1 200 OK
content-type: application/json
content-length: 72
date: Wed, 03 Sep 2025 13:31:44 GMT

{
  "instances": [],
  "message": "No endpoints available",
  "status": "unhealthy"
}
```

After workers are registered:

```
HTTP/1.1 200 OK
content-type: application/json
content-length: 609
date: Wed, 03 Sep 2025 13:32:03 GMT

{
  "endpoints": [
    "dyn://dynamo.backend.generate"
  ],
  "instances": [
    {
      "component": "backend",
      "endpoint": "clear_kv_blocks",
      "instance_id": 7587888160958628000,
      "namespace": "dynamo",
      "transport": {
        "nats_tcp": "dynamo_backend.clear_kv_blocks-694d98147d54be25"
      }
    },
    {
      "component": "backend",
      "endpoint": "generate",
      "instance_id": 7587888160958628000,
      "namespace": "dynamo",
      "transport": {
        "nats_tcp": "dynamo_backend.generate-694d98147d54be25"
      }
    },
    {
      "component": "backend",
      "endpoint": "load_metrics",
      "instance_id": 7587888160958628000,
      "namespace": "dynamo",
      "transport": {
        "nats_tcp": "dynamo_backend.load_metrics-694d98147d54be25"
      }
    }
  ],
  "status": "healthy"
}
```

## Worker Liveness and Health Check

Health checks for components other than the frontend are enabled selectively through environment variables. When a component's health check is enabled, you can set its starting status and the set of endpoints that must be served before the component is declared `ready`.
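
The readiness rule in this section can also be applied on the client side, for example to block a deployment script until a worker is ready. The sketch below is a hypothetical helper, not a Dynamo API. It assumes the system status server listens on port 9090, as in the example environment setting. It also assumes `/health` returns the JSON shape shown in the example responses, with a 503 status while the component is initializing.

```python
from __future__ import annotations

import json
import time
import urllib.error
import urllib.request


def worker_readiness(status_code: int, body: str) -> tuple[bool, list[str]]:
    """Interpret a worker /health response.

    Returns (ready, pending). ready is True only for HTTP 200 with a top-level
    "status" of "ready" and every entry under "endpoints" reporting "ready".
    pending lists the endpoint names that are not ready yet.
    """
    try:
        data = json.loads(body)
    except ValueError:
        return False, []
    if not isinstance(data, dict):
        return False, []
    endpoints = data.get("endpoints")
    if not isinstance(endpoints, dict):
        endpoints = {}
    pending = sorted(name for name, state in endpoints.items() if state != "ready")
    ready = status_code == 200 and data.get("status") == "ready" and not pending
    return ready, pending


def wait_until_ready(url: str = "http://localhost:9090/health",
                     timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll url until the worker reports ready or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                ready, pending = worker_readiness(resp.status, resp.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            # 503 while initializing: urllib raises, but the body still lists endpoint states.
            ready, pending = worker_readiness(err.code, err.read().decode("utf-8", "replace"))
        except (urllib.error.URLError, OSError):
            # Status server not reachable yet (connection refused, timeout, ...).
            ready, pending = False, []
        if ready:
            return True
        if pending:
            print(f"not ready yet: {', '.join(pending)}")
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The helper inspects the per-endpoint map as well as the status code. This lets the script report which endpoint (for example `generate`) is still pending.
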
Once all endpoints declared in `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` are served, the component transitions to the `ready` state and stays there until it shuts down. The endpoints return `HTTP/1.1 503 Service Unavailable` while the component is initializing and `HTTP/1.1 200 OK` once it is ready. Both `/health` and `/live` return the same information.

### Example Environment Setting

```bash
export DYN_SYSTEM_PORT=9090
export DYN_SYSTEM_STARTING_HEALTH_STATUS="notready"
export DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS="[\"generate\"]"
```

#### Example Request

```bash
curl -v localhost:9090/health | jq
```

#### Example Response

Before the endpoints are served:

```
HTTP/1.1 503 Service Unavailable
content-type: text/plain; charset=utf-8
content-length: 96
date: Wed, 03 Sep 2025 13:42:39 GMT

{
  "endpoints": {
    "generate": "notready"
  },
  "status": "notready",
  "uptime": {
    "nanos": 313803539,
    "secs": 12
  }
}
```

After the endpoints are served:

```
HTTP/1.1 200 OK
content-type: text/plain; charset=utf-8
content-length: 139
date: Wed, 03 Sep 2025 13:42:45 GMT

{
  "endpoints": {
    "clear_kv_blocks": "ready",
    "generate": "ready",
    "load_metrics": "ready"
  },
  "status": "ready",
  "uptime": {
    "nanos": 356504530,
    "secs": 18
  }
}
```

## Canary Health Checks (Active Monitoring)

In addition to the HTTP endpoints described above, Dynamo includes a **canary health check** system that actively monitors worker endpoints.

### Overview

The canary health check system:

- **Monitors endpoint health** by sending periodic test requests to worker endpoints
- **Only activates during idle periods**: if there is ongoing traffic, health checks are skipped to avoid overhead
- **Is automatically enabled in Kubernetes** deployments via the operator
- **Is disabled by default** in local/development environments

### How It Works

1. **Idle Detection**: After an endpoint has had no activity for a configurable wait time (`DYN_CANARY_WAIT_TIME`, default: 10 seconds), a canary health check is triggered.
2. **Health Check Request**: A lightweight test request with a minimal payload (generating 1 token) is sent to the endpoint.
3. **Activity Resets Timer**: If normal requests arrive, the canary timer resets and no health check is sent.
4. **Timeout Handling**: If a health check does not respond within the timeout (`DYN_HEALTH_CHECK_REQUEST_TIMEOUT`, default: 3 seconds), the endpoint is marked as unhealthy.

### Configuration

#### In Kubernetes (Enabled by Default)

The Dynamo operator enables health checks automatically. No additional configuration is required.

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-deployment
spec:
  services:
    VllmWorker:
      componentType: worker
      replicas: 2
      # Health checks automatically enabled by operator
```

#### In Local/Development Environments (Disabled by Default)

To enable health checks locally:

```bash
# Enable health checks
export DYN_HEALTH_CHECK_ENABLED=true

# Optional: customize timing
export DYN_CANARY_WAIT_TIME=5              # Wait 5 idle seconds before sending a health check
export DYN_HEALTH_CHECK_REQUEST_TIMEOUT=5  # 5-second timeout

# Start worker
python -m dynamo.vllm --model Qwen/Qwen3-0.6B
```

#### Configuration Options

| Environment Variable | Description | Default | Notes |
|---------------------|-------------|---------|-------|
| `DYN_HEALTH_CHECK_ENABLED` | Enable/disable canary health checks | `false` (K8s: `true`) | Automatically set to `true` in K8s |
| `DYN_CANARY_WAIT_TIME` | Seconds to wait (during idle) before sending a health check | `10` | Lower values = more frequent checks |
| `DYN_HEALTH_CHECK_REQUEST_TIMEOUT` | Max seconds to wait for a health check response | `3` | Higher values = more tolerance for slow responses |

### Health Check Payloads

Each backend defines its own minimal health check payload:

- **vLLM**: Single token generation with minimal sampling options
- **TensorRT-LLM**: Single token with BOS token ID
- **SGLang**: Single token generation request

These payloads are designed to:

- Complete quickly (typically under 100 ms)
- Minimize GPU overhead
- Verify that the full inference stack is working

### Observing Health Checks

When health checks are enabled, you'll see logs like:

```
INFO Health check manager started (canary_wait_time: 10s, request_timeout: 3s)
INFO Spawned health check task for endpoint: generate
INFO Canary timer expired for generate, sending health check
INFO Health check successful for generate
```

If an endpoint fails:

```
WARN Health check timeout for generate
ERROR Health check request failed for generate: connection refused
```

### When to Use Canary Health Checks

**Enable in production (Kubernetes):**

- ✅ Detect unhealthy workers before they affect user traffic
- ✅ Enable faster failure detection and recovery
- ✅ Monitor worker availability continuously

**Disable in development:**

- ✅ Reduce log noise during debugging
- ✅ Avoid overhead when not needed
- ✅ Simplify local testing

### Troubleshooting

**Health checks timing out:**

- Increase `DYN_HEALTH_CHECK_REQUEST_TIMEOUT`
- Check worker logs for errors
- Verify network connectivity

**Too many health check logs:**

- Increase `DYN_CANARY_WAIT_TIME` to reduce frequency
- Alternatively, disable them with `DYN_HEALTH_CHECK_ENABLED=false` in development

**Health checks not running:**

- Verify that `DYN_HEALTH_CHECK_ENABLED=true` is set
- Check that `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` includes the endpoint
- Ensure that the worker is serving the endpoint

## Related Documentation

- [Distributed Runtime Architecture](/dynamo/v-0-9-0/design-docs/distributed-runtime)
- [Dynamo Architecture Overview](/dynamo/v-0-9-0/design-docs/overall-architecture)
- [Backend Guide](/dynamo/v-0-9-0/user-guides/writing-python-workers-in-dynamo)