Agent Hints | NVIDIA Dynamo Documentation

Agent hints are optional per-request metadata that a harness sends under `nvext.agent_hints`. Dynamo parses these hints in the frontend and passes them to the router and, where supported, to backend runtimes.

Use hints only for serving-relevant intent. Use session IDs for passive trace identity.

Request Schema

{
    "model": "my-model",
    "messages": [
        { "role": "user", "content": "Continue the report." }
    ],
    "nvext": {
        "agent_hints": {
            "priority": 5,
            "strict_priority": 1,
            "osl": 1024,
            "speculative_prefill": true
        }
    }
}
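
A harness might attach hints with a small helper like the one below. This is a minimal sketch: `build_request` is a hypothetical name, not part of any Dynamo client library, and only the field names from the schema above are taken from this page. The helper leaves out any hint that was not set, since every hint is optional.

```python
import json
from typing import Any, Optional


def build_request(
    model: str,
    user_content: str,
    priority: Optional[int] = None,
    strict_priority: Optional[int] = None,
    osl: Optional[int] = None,
    speculative_prefill: Optional[bool] = None,
) -> dict[str, Any]:
    """Build a chat request body, attaching only the hints that were set.

    Hypothetical helper for illustration; not a Dynamo API.
    """
    candidates = {
        "priority": priority,
        "strict_priority": strict_priority,
        "osl": osl,
        "speculative_prefill": speculative_prefill,
    }
    # Drop hints the caller did not provide.
    hints = {name: value for name, value in candidates.items() if value is not None}

    body: dict[str, Any] = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    if hints:
        body["nvext"] = {"agent_hints": hints}
    return body


body = build_request("my-model", "Continue the report.", priority=5, osl=1024)
print(json.dumps(body, indent=4))
```

Calling `build_request` with no hint arguments yields a body without an `nvext` key, which keeps requests from hint-unaware callers unchanged.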

Hint	Description
`priority`	Unified request priority. Higher values mean higher priority at the Dynamo API layer; see Priority Scheduling for router and backend requirements.
`strict_priority`	Router pending-queue tier. Higher values always precede lower values before the configured queue policy is applied.
`osl`	Expected output sequence length in tokens. Used by the router for output block tracking and load-balancing accuracy when `--router-track-output-blocks` is enabled.
`speculative_prefill`	When true, Dynamo can prefill the predicted next-turn prefix after the current turn completes to warm the KV cache for the next request.
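
The `strict_priority` rule can be illustrated with a sort key. The sketch below is not Dynamo's router code. It assumes a first-come, first-served queue policy (this page does not list the available policies), an assumed default tier of 0, and a hypothetical `Pending` record. Requests are ordered by tier first, highest value first, and the assumed policy only breaks ties within a tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pending:
    """Hypothetical pending-queue entry, for illustration only."""
    request_id: str
    strict_priority: int  # tier from nvext.agent_hints; 0 is an assumed default
    arrival: int          # arrival order, used by the assumed FCFS policy


def dequeue_order(pending: list[Pending]) -> list[str]:
    """Order by tier (higher first), then by the assumed FCFS policy."""
    ranked = sorted(pending, key=lambda p: (-p.strict_priority, p.arrival))
    return [p.request_id for p in ranked]


queue = [
    Pending("a", strict_priority=0, arrival=1),
    Pending("b", strict_priority=1, arrival=2),
    Pending("c", strict_priority=0, arrival=3),
    Pending("d", strict_priority=1, arrival=4),
]
order = dequeue_order(queue)
print(order)  # ['b', 'd', 'a', 'c']
```

Requests `b` and `d` leave the queue before `a` and `c` even though `a` arrived first, because the tier comparison happens before the queue policy is consulted.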

Request Flow

The frontend parses `nvext.agent_hints`. The router uses the hints for queueing and worker selection, and supported backends use the forwarded hints for engine-level scheduling and cache policy. For priority-specific semantics, see Priority Scheduling.
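
Because the hints are optional, any component that reads the request body has to tolerate a missing `nvext` or `agent_hints` object. A minimal sketch of that extraction step follows. `extract_agent_hints` is a hypothetical helper, not Dynamo's frontend parser; returning an empty dict when no hints are present and dropping unrecognized keys are both assumptions made for the example.

```python
from typing import Any

# Hint names documented on this page.
KNOWN_HINTS = ("priority", "strict_priority", "osl", "speculative_prefill")


def extract_agent_hints(body: dict[str, Any]) -> dict[str, Any]:
    """Return the documented hints found in a request body, or {} if none.

    Hypothetical helper for illustration; not a Dynamo API.
    """
    nvext = body.get("nvext") or {}
    hints = nvext.get("agent_hints") or {}
    # Keep only the documented hint names (an assumption of this sketch).
    return {name: hints[name] for name in KNOWN_HINTS if name in hints}


request = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Continue the report."}],
    "nvext": {"agent_hints": {"priority": 5, "osl": 1024}},
}
hints = extract_agent_hints(request)
print(hints)
```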

Backend Support

Backend support is runtime-specific. For SGLang flags and behavior, see SGLang for Agentic Workloads.

Feature	vLLM	SGLang	TensorRT-LLM
Priority-aware routing	Yes	Yes	Yes
Priority-based cache eviction	Planned	Yes	Planned
Speculative prefill	Yes	Yes	Yes

Related Request Extensions

`agent_hints` is separate from session identity:

Session IDs are passive identity for traces and joins.
`agent_hints` is active serving intent for routing, scheduling, and cache behavior.

Neither a session ID nor `agent_hints` enables sticky sessions. Configure any session-aware routing policy separately.
