# Dynamo Health Checks
## Overview
Dynamo provides health check and liveness HTTP endpoints for each component which
can be used to configure startup, liveness and readiness probes in
orchestration frameworks such as Kubernetes.
## Environment Variables
| Variable | Description | Default | Example |
|----------|-------------|---------|---------|
| `DYN_SYSTEM_PORT` | System status server port | `8081` | `9090` |
| `DYN_SYSTEM_STARTING_HEALTH_STATUS` | Initial health status | `notready` | `ready`, `notready` |
| `DYN_SYSTEM_HEALTH_PATH` | Custom health endpoint path | `/health` | `/custom/health` |
| `DYN_SYSTEM_LIVE_PATH` | Custom liveness endpoint path | `/live` | `/custom/live` |
| `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` | Endpoints required for ready state | none | `["generate"]` |
| `DYN_HEALTH_CHECK_ENABLED` | Enable canary health checks | `false` (K8s: `true`) | `true`, `false` |
| `DYN_CANARY_WAIT_TIME` | Seconds before sending canary health check | `10` | `5`, `30` |
| `DYN_HEALTH_CHECK_REQUEST_TIMEOUT` | Health check request timeout in seconds | `3` | `5`, `10` |
## Getting Started Quickly
Enable health checks and query endpoints:
```bash
# Start your Dynamo components (default port 8000, override with --http-port or DYN_HTTP_PORT env var)
python -m dynamo.frontend &
# Enable system status server on port 8081
DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager &
```
Check health status:
```bash
# Frontend health (port 8000)
curl -s localhost:8000/health | jq
# Worker health (port 8081)
curl -s localhost:8081/health | jq
```
## Frontend Liveness Check
The frontend liveness endpoint reports a status of `live` as long as
the service is running.
Frontend liveness doesn't depend on worker health or liveness only on the Frontend service itself.
### Example Request
```
curl -s localhost:8080/live -q | jq
```
### Example Response
```
{
"message": "Service is live",
"status": "live"
}
```
## Frontend Health Check
The frontend health endpoint reports a status of `healthy` as long as
the service is running. Once workers have been registered, the
`health` endpoint will also list registered endpoints and instances.
Frontend liveness doesn't depend on worker health or liveness only on the Frontend service itself.
### Example Request
```
curl -v localhost:8080/health -q | jq
```
### Example Response
Before workers are registered:
```
HTTP/1.1 200 OK
content-type: application/json
content-length: 72
date: Wed, 03 Sep 2025 13:31:44 GMT
{
"instances": [],
"message": "No endpoints available",
"status": "unhealthy"
}
```
After workers are registered:
```
HTTP/1.1 200 OK
content-type: application/json
content-length: 609
date: Wed, 03 Sep 2025 13:32:03 GMT
{
"endpoints": [
"dyn://dynamo.backend.generate"
],
"instances": [
{
"component": "backend",
"endpoint": "clear_kv_blocks",
"instance_id": 7587888160958628000,
"namespace": "dynamo",
"transport": {
"nats_tcp": "dynamo_backend.clear_kv_blocks-694d98147d54be25"
}
},
{
"component": "backend",
"endpoint": "generate",
"instance_id": 7587888160958628000,
"namespace": "dynamo",
"transport": {
"nats_tcp": "dynamo_backend.generate-694d98147d54be25"
}
},
{
"component": "backend",
"endpoint": "load_metrics",
"instance_id": 7587888160958628000,
"namespace": "dynamo",
"transport": {
"nats_tcp": "dynamo_backend.load_metrics-694d98147d54be25"
}
}
],
"status": "healthy"
}
```
## Worker Liveness and Health Check
Health checks for components other than the frontend are enabled
selectively based on environment variables. If a health check for a
component is enabled the starting status can be set along with the set
of endpoints that are required to be served before the component is
declared `ready`.
Once all endpoints declared in `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS`
are served the component transitions to a `ready` state until the
component is shutdown. The endpoints return HTTP status code of `HTTP/1.1 503 Service Unavailable`
when initializing and HTTP status code `HTTP/1.1 200 OK` once ready.
Both /live and /ready return the same information
### Example Environment Setting
```
export DYN_SYSTEM_PORT=9090
export DYN_SYSTEM_STARTING_HEALTH_STATUS="notready"
export DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS="[\"generate\"]"
```
#### Example Request
```
curl -v localhost:9090/health | jq
```
#### Example Response
Before endpoints are being served:
```
HTTP/1.1 503 Service Unavailable
content-type: text/plain; charset=utf-8
content-length: 96
date: Wed, 03 Sep 2025 13:42:39 GMT
{
"endpoints": {
"generate": "notready"
},
"status": "notready",
"uptime": {
"nanos": 313803539,
"secs": 12
}
}
```
After endpoints are being served:
```
HTTP/1.1 200 OK
content-type: text/plain; charset=utf-8
content-length: 139
date: Wed, 03 Sep 2025 13:42:45 GMT
{
"endpoints": {
"clear_kv_blocks": "ready",
"generate": "ready",
"load_metrics": "ready"
},
"status": "ready",
"uptime": {
"nanos": 356504530,
"secs": 18
}
}
```
## Canary Health Checks (Active Monitoring)
In addition to the HTTP endpoints described above, Dynamo includes a **canary health check** system that actively monitors worker endpoints.
### Overview
The canary health check system:
- **Monitors endpoint health** by sending periodic test requests to worker endpoints
- **Only activates during idle periods** - if there's ongoing traffic, health checks are skipped to avoid overhead
- **Automatically enabled in Kubernetes** deployments via the operator
- **Disabled by default** in local/development environments
### How It Works
1. **Idle Detection**: After no activity on an endpoint for a configurable wait time (default: 10 seconds), a canary health check is triggered
2. **Health Check Request**: A lightweight test request is sent to the endpoint with a minimal payload (generates 1 token)
3. **Activity Resets Timer**: If normal requests arrive, the canary timer resets and no health check is sent
4. **Timeout Handling**: If a health check doesn't respond within the timeout (default: 3 seconds), the endpoint is marked as unhealthy
### Configuration
#### In Kubernetes (Enabled by Default)
Health checks are automatically enabled by the Dynamo operator. No additional configuration is required.
```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
name: my-deployment
spec:
services:
VllmWorker:
componentType: worker
replicas: 2
# Health checks automatically enabled by operator
```
#### In Local/Development Environments (Disabled by Default)
To enable health checks locally:
```bash
# Enable health checks
export DYN_HEALTH_CHECK_ENABLED=true
# Optional: Customize timing
export DYN_CANARY_WAIT_TIME=5 # Wait 5 seconds before sending health check
export DYN_HEALTH_CHECK_REQUEST_TIMEOUT=5 # 5 second timeout
# Start worker
python -m dynamo.vllm --model Qwen/Qwen3-0.6B
```
#### Configuration Options
| Environment Variable | Description | Default | Notes |
|---------------------|-------------|---------|-------|
| `DYN_HEALTH_CHECK_ENABLED` | Enable/disable canary health checks | `false` (K8s: `true`) | Automatically set to `true` in K8s |
| `DYN_CANARY_WAIT_TIME` | Seconds to wait (during idle) before sending health check | `10` | Lower values = more frequent checks |
| `DYN_HEALTH_CHECK_REQUEST_TIMEOUT` | Max seconds to wait for health check response | `3` | Higher values = more tolerance for slow responses |
### Health Check Payloads
Each backend defines its own minimal health check payload:
- **vLLM**: Single token generation with minimal sampling options
- **TensorRT-LLM**: Single token with BOS token ID
- **SGLang**: Single token generation request
These payloads are designed to:
- Complete quickly (\< 100ms typically)
- Minimize GPU overhead
- Verify the full inference stack is working
### Observing Health Checks
When health checks are enabled, you'll see logs like:
```
INFO Health check manager started (canary_wait_time: 10s, request_timeout: 3s)
INFO Spawned health check task for endpoint: generate
INFO Canary timer expired for generate, sending health check
INFO Health check successful for generate
```
If an endpoint fails:
```
WARN Health check timeout for generate
ERROR Health check request failed for generate: connection refused
```
### When to Use Canary Health Checks
**Enable in production (Kubernetes):**
- ✅ Detect unhealthy workers before they affect user traffic
- ✅ Enable faster failure detection and recovery
- ✅ Monitor worker availability continuously
**Disable in development:**
- ✅ Reduce log noise during debugging
- ✅ Avoid overhead when not needed
- ✅ Simplify local testing
### Troubleshooting
**Health checks timing out:**
- Increase `DYN_HEALTH_CHECK_REQUEST_TIMEOUT`
- Check worker logs for errors
- Verify network connectivity
**Too many health check logs:**
- Increase `DYN_CANARY_WAIT_TIME` to reduce frequency
- Or disable with `DYN_HEALTH_CHECK_ENABLED=false` in dev
**Health checks not running:**
- Verify `DYN_HEALTH_CHECK_ENABLED=true` is set
- Check that `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` includes the endpoint
- Ensure the worker is serving the endpoint
## Related Documentation
- [Distributed Runtime Architecture](/dynamo/v-0-9-0/design-docs/distributed-runtime)
- [Dynamo Architecture Overview](/dynamo/v-0-9-0/design-docs/overall-architecture)
- [Backend Guide](/dynamo/v-0-9-0/user-guides/writing-python-workers-in-dynamo)