Frontend Configuration Reference | NVIDIA Dynamo Documentation

This page documents all configuration options for the Dynamo Frontend (python -m dynamo.frontend).

Every CLI argument has a corresponding environment variable. CLI arguments take precedence over environment variables.

HTTP & Networking

CLI Argument	Env Var	Default	Description
`--http-host`	`DYN_HTTP_HOST`	`0.0.0.0`	HTTP listen address
`--http-port`	`DYN_HTTP_PORT`	`8000`	HTTP listen port
`--tls-cert-path`	`DYN_TLS_CERT_PATH`	—	TLS certificate path (PEM). Must be paired with `--tls-key-path`
`--tls-key-path`	`DYN_TLS_KEY_PATH`	—	TLS private key path (PEM). Must be paired with `--tls-cert-path`

The Rust HTTP server also reads these environment variables (not exposed as CLI args):

Env Var	Default	Description
`DYN_HTTP_BODY_LIMIT_MB`	`192`	Maximum request body size in MB
`DYN_HTTP_GRACEFUL_SHUTDOWN_TIMEOUT_SECS`	`5`	Graceful shutdown timeout in seconds

Router

CLI Argument	Env Var	Default	Description
`--router-mode`	`DYN_ROUTER_MODE`	`round-robin`	Routing strategy: `round-robin`, `random`, `kv`, `direct`, `least-loaded`, `device-aware-weighted`
`--load-aware` / `--no-load-aware`	`DYN_ROUTER_LOAD_AWARE`	`false`	Preset for KV load-aware routing without cache-reuse signals; implies `--router-mode kv`
`--router-kv-overlap-score-credit`	`DYN_ROUTER_KV_OVERLAP_SCORE_CREDIT`	`1.0`	Credit multiplier for device-local prefix overlap, from 0.0 to 1.0
`--router-prefill-load-scale`	`DYN_ROUTER_PREFILL_LOAD_SCALE`	`1.0`	Scale adjusted prompt-side prefill load before adding decode blocks
`--router-temperature`	`DYN_ROUTER_TEMPERATURE`	`0.0`	Softmax temperature for normalized worker sampling. 0 = deterministic
`--router-kv-events` / `--no-router-kv-events`	`DYN_ROUTER_USE_KV_EVENTS`	`true`	Enable KV cache state events from workers. Disable for prediction-based routing
`--router-ttl-secs`	`DYN_ROUTER_TTL_SECS`	`120.0`	Block TTL when KV events are disabled
`--router-replica-sync` / `--no-router-replica-sync`	`DYN_ROUTER_REPLICA_SYNC`	`false`	Sync state across multiple router instances
`--router-snapshot-threshold`	`DYN_ROUTER_SNAPSHOT_THRESHOLD`	`1000000`	Messages before triggering a snapshot
`--router-reset-states` / `--no-router-reset-states`	`DYN_ROUTER_RESET_STATES`	`false`	Reset router state on startup. Warning: affects existing replicas
`--router-track-active-blocks` / `--no-router-track-active-blocks`	`DYN_ROUTER_TRACK_ACTIVE_BLOCKS`	`true`	Track blocks used by in-progress requests for load balancing
`--router-assume-kv-reuse` / `--no-router-assume-kv-reuse`	`DYN_ROUTER_ASSUME_KV_REUSE`	`true`	Assume KV cache reuse when tracking active blocks
`--router-track-output-blocks` / `--no-router-track-output-blocks`	`DYN_ROUTER_TRACK_OUTPUT_BLOCKS`	`false`	Track output blocks with fractional decay during generation
`--router-track-prefill-tokens` / `--no-router-track-prefill-tokens`	`DYN_ROUTER_TRACK_PREFILL_TOKENS`	`true`	Track prompt-side prefill load in worker load accounting
`--router-prefill-load-model`	`DYN_ROUTER_PREFILL_LOAD_MODEL`	`none`	Prompt-side load model: `none` for static load, `aic` for oldest-prefill decay using an AIC prediction
`--router-event-threads`	`DYN_ROUTER_EVENT_THREADS`	`4`	KV indexer worker threads. >1 enables the concurrent radix tree, including with `--no-router-kv-events`
`--router-queue-threshold`	`DYN_ROUTER_QUEUE_THRESHOLD`	`16.0`	Queue threshold fraction of prefill capacity. Priority hints only affect requests waiting in this queue
`--router-queue-policy`	`DYN_ROUTER_QUEUE_POLICY`	`fcfs`	Queue scheduling policy: `fcfs` (tail TTFT), `wspt` (avg TTFT), or `lcfs` (comparison-only reverse ordering)
`--decode-fallback` / `--no-decode-fallback`	`DYN_DECODE_FALLBACK`	`false`	Fall back to aggregated mode when prefill workers unavailable

AIC Prefill Load Model

These options are used only when --router-mode kv is combined with --router-prefill-load-model aic.

CLI Argument	Env Var	Default	Description
`--aic-backend`	`DYN_AIC_BACKEND`	—	Backend family to model in AIC, for example `vllm` or `sglang`
`--aic-system`	`DYN_AIC_SYSTEM`	—	AIC hardware/system identifier, for example `h200_sxm`
`--aic-model-path`	`DYN_AIC_MODEL_PATH`	—	Model path or model identifier used for AIC perf lookup
`--aic-backend-version`	`DYN_AIC_BACKEND_VERSION`	backend-specific	Pinned AIC database version. If omitted, Dynamo uses the backend default
`--aic-tp-size`	`DYN_AIC_TP_SIZE`	`1`	Tensor-parallel size to model in AIC
`--aic-moe-tp-size`	`DYN_AIC_MOE_TP_SIZE`	—	MoE tensor-parallel size for models that require AIC MoE parallelism
`--aic-moe-ep-size`	`DYN_AIC_MOE_EP_SIZE`	—	MoE expert-parallel size for models that require AIC MoE parallelism
`--aic-attention-dp-size`	`DYN_AIC_ATTENTION_DP_SIZE`	—	Attention data-parallel size for models that require AIC MoE parallelism

When enabled, the frontend’s embedded KV router predicts one expected prefill duration per admitted request, using the selected worker’s overlap-derived cached prefix. The router then decays only the oldest active prefill request on each worker for prompt-side load accounting.

For MoE models, AIC requires aic_tp_size * aic_attention_dp_size == aic_moe_tp_size * aic_moe_ep_size. For Kimi-style TP-only MoE runs, set --aic-moe-tp-size to the same value as --aic-tp-size, with --aic-moe-ep-size 1 and --aic-attention-dp-size 1.

Fault Tolerance

CLI Argument	Env Var	Default	Description
`--migration-limit`	`DYN_MIGRATION_LIMIT`	`0`	Max request migrations per worker disconnect. 0 = disabled
`--active-decode-blocks-threshold`	`DYN_ACTIVE_DECODE_BLOCKS_THRESHOLD`	`1.0`	KV cache utilization fraction (0.0–1.0) for busy detection. Pass `None` to disable
`--active-prefill-tokens-threshold`	`DYN_ACTIVE_PREFILL_TOKENS_THRESHOLD`	`10000000`	Absolute token count for prefill busy detection. Pass `None` to disable
`--active-prefill-tokens-threshold-frac`	`DYN_ACTIVE_PREFILL_TOKENS_THRESHOLD_FRAC`	`64.0`	Fraction of `max_num_batched_tokens` for prefill busy detection. OR logic with absolute threshold. Pass `None` to disable
`--admission-control`	`DYN_ADMISSION_CONTROL`	`none`	Admission control mode. `token-capacity` applies the busy thresholds above; `none` clears them. Router queueing remains controlled by `--router-queue-threshold`

Model Discovery

CLI Argument	Env Var	Default	Description
`--namespace`	`DYN_NAMESPACE`	—	Exact namespace for model discovery scoping
`--namespace-prefix`	`DYN_NAMESPACE_PREFIX`	—	Namespace prefix for discovery (e.g., `ns` matches `ns`, `ns-abc123`). Takes precedence over `--namespace`
`--model-name`	`DYN_MODEL_NAME`	—	Override model name string
`--model-path`	`DYN_MODEL_PATH`	—	Path to local model directory (for private/custom models)
`--kv-cache-block-size`	`DYN_KV_CACHE_BLOCK_SIZE`	—	KV cache block size override

Infrastructure

CLI Argument	Env Var	Default	Description
`--discovery-backend`	`DYN_DISCOVERY_BACKEND`	`etcd`	Service discovery: `kubernetes`, `etcd`, `file`, `mem`
`--request-plane`	`DYN_REQUEST_PLANE`	`tcp`	Request distribution: `tcp` (fastest), `nats`
`--event-plane`	`DYN_EVENT_PLANE`	auto	Event publishing: `nats`, `zmq`; defaults to `zmq` for `file`/`mem` discovery and `nats` for `etcd`/`kubernetes`

KServe gRPC

CLI Argument	Env Var	Default	Description
`--kserve-grpc-server` / `--no-kserve-grpc-server`	`DYN_KSERVE_GRPC_SERVER`	`false`	Start KServe gRPC v2 server
`--grpc-metrics-port`	`DYN_GRPC_METRICS_PORT`	`8788`	HTTP metrics port for gRPC service

See the Frontend Guide for KServe message formats and integration details.

Monitoring

CLI Argument	Env Var	Default	Description
`--metrics-prefix`	`DYN_METRICS_PREFIX`	`dynamo_frontend`	Prefix for frontend Prometheus metrics
`--dump-config-to`	`DYN_DUMP_CONFIG_TO`	—	Dump resolved config to file path

Tokenizer

CLI Argument	Env Var	Default	Description
`--tokenizer`	`DYN_TOKENIZER`	`default`	Tokenizer: `default` (HuggingFace) or `fastokens` (high-performance Rust tokenizer). See Tokenizer

Experimental

CLI Argument	Env Var	Default	Description
`--enable-anthropic-api`	`DYN_ENABLE_ANTHROPIC_API`	`false`	Enable `/v1/messages` (Anthropic Messages API)
`--dyn-chat-processor`	`DYN_CHAT_PROCESSOR`	`dynamo`	Chat processor: `dynamo` (default), `vllm`, or `sglang`. See Parser Configuration for how this combines with the parser flags.
`--dyn-debug-perf`	`DYN_DEBUG_PERF`	`false`	Log per-function timing for preprocessing (vllm processor only)
`--dyn-preprocess-workers`	`DYN_PREPROCESS_WORKERS`	`0`	Worker processes for CPU-bound preprocessing. 0 = main event loop (vllm processor only)
`-i` / `--interactive`	`DYN_INTERACTIVE`	`false`	Interactive text chat mode

HTTP Endpoints

The frontend exposes the following HTTP endpoints:

OpenAI-Compatible

Method	Path	Description
`POST`	`/v1/chat/completions`	Chat completions (streaming and non-streaming)
`POST`	`/v1/completions`	Text completions
`POST`	`/v1/embeddings`	Text embeddings
`POST`	`/v1/responses`	Responses API
`POST`	`/v1/images/generations`	Image generation
`POST`	`/v1/videos/generations`	Video generation
`POST`	`/v1/videos/generations/stream`	Video generation (streaming)
`GET`	`/v1/models`	List available models

Anthropic (Experimental)

Method	Path	Description
`POST`	`/v1/messages`	Anthropic Messages API (requires `--enable-anthropic-api`)
`POST`	`/v1/messages/count_tokens`	Token counting for Anthropic API

Infrastructure

Method	Path	Description
`GET`	`/health`	Health check
`GET`	`/live`	Liveness check
`GET`	`/metrics`	Prometheus metrics
`GET`	`/openapi.json`	OpenAPI specification
`GET`	`/docs`	Swagger UI
`POST`	`/busy_threshold`	Set busy thresholds (gated by `DYN_ENABLE_FRONTEND_ADMIN_API`, see below)
`GET`	`/busy_threshold`	Get current busy thresholds (gated by `DYN_ENABLE_FRONTEND_ADMIN_API`, see below)

Frontend feature switches

Environment variables controlling frontend extensions. Extensions are enabled by default. When deploying, consider whether each is needed for your use case; if not, disable it to prevent accidental abuse.

Set an env value of 0 / false / no / off (case-insensitive) to disable.

Env Var	Default	Behavior when `false`
`DYN_ENABLE_FRONTEND_NVEXT`	`true`	Frontend drops `request.nvext` at handler entry on `/v1/chat/completions`, `/v1/completions`, `/v1/responses`, and `/v1/embeddings`; ignores routing-override headers (`x-worker-instance-id`, `x-prefill-instance-id`, `x-dp-rank`, `x-data-parallel-rank`, `x-prefill-dp-rank`); silently ignores the response-side `nvext.extra_fields` opt-in. Note: disabling this breaks EPP / GAIE serving, Prime-RL-style training that uses `nvext.cache_salt`, multi-tenant agent platforms that forward `nvext.agent_hints` / `nvext.agent_context`, and clients that opt into response disclosure via `nvext.extra_fields`.
`DYN_ENABLE_FRONTEND_ADMIN_API`	`true`	`GET /busy_threshold` and `POST /busy_threshold` are not registered (404 instead of 503). Inference, metrics, models, health, and liveness routes are unaffected.

Endpoint Path Customization

All endpoint paths can be overridden via environment variables:

Env Var	Default Path
`DYN_HTTP_SVC_CHAT_PATH_ENV`	`/v1/chat/completions`
`DYN_HTTP_SVC_CMP_PATH_ENV`	`/v1/completions`
`DYN_HTTP_SVC_EMB_PATH_ENV`	`/v1/embeddings`
`DYN_HTTP_SVC_RESPONSES_PATH_ENV`	`/v1/responses`
`DYN_HTTP_SVC_MODELS_PATH_ENV`	`/v1/models`
`DYN_HTTP_SVC_ANTHROPIC_PATH_ENV`	`/v1/messages`
`DYN_HTTP_SVC_HEALTH_PATH_ENV`	`/health`
`DYN_HTTP_SVC_LIVE_PATH_ENV`	`/live`
`DYN_HTTP_SVC_METRICS_PATH_ENV`	`/metrics`

Deprecated

CLI Argument	Env Var	Description
`--router-durable-kv-events`	`DYN_ROUTER_DURABLE_KV_EVENTS`	Use event-plane local indexer instead

HTTP & Networking

Router

AIC Prefill Load Model

Fault Tolerance

Model Discovery

Infrastructure

KServe gRPC

Monitoring

Tokenizer

Experimental

HTTP Endpoints

OpenAI-Compatible

Anthropic (Experimental)

Infrastructure

Frontend feature switches

Endpoint Path Customization

Deprecated

See Also