
Copyright © 2026, NVIDIA Corporation.

Agent Tracing

Attach trajectory identity and export Dynamo request and tool-event telemetry

Agent tracing records who called (nvext.agent_context), what Dynamo measured on each LLM request (request_end), and optional harness tool spans (tool_*). Context is passive: it does not steer routing or caching. Output is best-effort profiling data, not an audit log.

Flow: Harness sends chat completions with agent_context → Dynamo emits request_end to trace sinks. Harness sends tool events over ZMQ → same sinks.

Adding trace context to each LLM call

Direct LLM call

Inject agent_context into each LLM request

{
  "model": "my-model",
  "messages": [{ "role": "user", "content": "..." }],
  "nvext": {
    "agent_context": {
      "session_type_id": "deep_research",
      "session_id": "research-run-42",
      "trajectory_id": "research-run-42:researcher",
      "parent_trajectory_id": "research-run-42:planner"
    }
  }
}

| Field | Required | Meaning |
| --- | --- | --- |
| session_type_id | Yes | Workload class (e.g. deep_research). |
| session_id | Yes | Whole agent run. |
| trajectory_id | Yes | One reasoning/tool chain inside the run. |
| parent_trajectory_id | No | Parent trajectory when using subagents. |

OpenAI client: merge into extra_body / extra_headers:

import uuid

def instrument_llm_request(kwargs, agent_context):
    """Return a copy of kwargs with agent_context and an x-request-id attached."""
    # Merge into any existing extra_body / nvext instead of overwriting them.
    body = dict(kwargs.get("extra_body") or {})
    nvext = dict(body.get("nvext") or {})
    nvext["agent_context"] = dict(agent_context)
    body["nvext"] = nvext

    # Keep a caller-supplied x-request-id; otherwise generate one per call.
    headers = dict(kwargs.get("extra_headers") or {})
    headers.setdefault("x-request-id", str(uuid.uuid4()))

    out = dict(kwargs)
    out["extra_body"] = body
    out["extra_headers"] = headers
    return out

x-request-id is your logical per-call id; Dynamo stores it as request.x_request_id (distinct from Dynamo’s internal request_id). No Dynamo imports are required in the harness. Keep context in a contextvar, attach before each completion, and propagate across threads/processes when those paths call the model or emit tools.
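
The contextvar advice above can be sketched as follows. This is a minimal illustration, not part of Dynamo: `_AGENT_CONTEXT`, `set_agent_context`, `with_agent_context`, and `submit_with_context` are hypothetical names. It relies only on the standard-library `contextvars` module; worker threads in a pool do not reliably see the caller's context, so the sketch copies it explicitly.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical module-level holder for the current trajectory's identity.
_AGENT_CONTEXT = contextvars.ContextVar("agent_context", default=None)


def set_agent_context(agent_context):
    """Bind agent_context for the current task or thread; returns a reset token."""
    return _AGENT_CONTEXT.set(dict(agent_context))


def with_agent_context(kwargs):
    """Return a copy of kwargs with the bound agent_context merged into nvext."""
    ctx = _AGENT_CONTEXT.get()
    if ctx is None:
        return dict(kwargs)  # nothing bound: leave the request untouched
    body = dict(kwargs.get("extra_body") or {})
    nvext = dict(body.get("nvext") or {})
    nvext["agent_context"] = dict(ctx)
    body["nvext"] = nvext
    return {**kwargs, "extra_body": body}


def submit_with_context(executor, fn, *args, **kwargs):
    """Run fn on the executor inside a copy of the caller's context."""
    ctx = contextvars.copy_context()
    return executor.submit(ctx.run, fn, *args, **kwargs)


if __name__ == "__main__":
    set_agent_context({
        "session_type_id": "deep_research",
        "session_id": "research-run-42",
        "trajectory_id": "research-run-42:researcher",
    })
    with ThreadPoolExecutor(max_workers=1) as pool:
        request = submit_with_context(pool, with_agent_context, {"model": "my-model"}).result()
    print(request["extra_body"]["nvext"]["agent_context"]["trajectory_id"])
```

Context variables do not cross process boundaries, so for subprocess workers one option is to pass the agent_context dict as an ordinary argument and call `set_agent_context` on the other side.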

Enable output

The fast path is one environment variable:

export DYN_AGENT_TRACE=1

That picks jsonl_gz output at /tmp/dynamo-agent-trace.*.jsonl.gz and binds the harness tool-event ZMQ endpoint at tcp://127.0.0.1:20390. Any of the per-knob variables below still wins when set explicitly, so you only need to reach for them to relocate output, add stderr, or tune buffers.

To relocate captures only:

export DYN_AGENT_TRACE=1
export DYN_AGENT_TRACE_OUTPUT_PATH=/mnt/captures/run-42

All agent trace environment variables

| Variable | Required | Default (when DYN_AGENT_TRACE=1) | Notes |
| --- | --- | --- | --- |
| DYN_AGENT_TRACE | Master switch | unset | Truthy (1, true, on, yes) enables tracing with all defaults below. |
| DYN_AGENT_TRACE_SINKS | No | jsonl_gz | jsonl, jsonl_gz, stderr, or comma-separated (e.g. jsonl_gz,stderr). |
| DYN_AGENT_TRACE_OUTPUT_PATH | No | /tmp/dynamo-agent-trace | File path for jsonl; segment prefix for jsonl_gz → prefix.NNNNNN.jsonl.gz. |
| DYN_AGENT_TRACE_CAPACITY | No | 1024 | Trace bus capacity. |
| DYN_AGENT_TRACE_JSONL_BUFFER_BYTES | No | 1048576 | Buffer / gzip batch threshold. |
| DYN_AGENT_TRACE_JSONL_FLUSH_INTERVAL_MS | No | 1000 | Flush interval. |
| DYN_AGENT_TRACE_JSONL_GZ_ROLL_BYTES | No | 268435456 | Roll gzip segment by uncompressed bytes. |
| DYN_AGENT_TRACE_JSONL_GZ_ROLL_LINES | No | unset | Optional roll by line count. |
| DYN_AGENT_TRACE_REPLAY_HASHES | No | on | Falsey (0, no, …) disables replay hashes on requests. |
| DYN_AGENT_TRACE_TOOL_EVENTS_ZMQ_ENDPOINT | No | tcp://127.0.0.1:20390 | PULL bind address for tool records. |
| DYN_AGENT_TRACE_TOOL_EVENTS_ZMQ_TOPIC | No | unset | If set, first ZMQ frame must match. |

Without DYN_AGENT_TRACE=1, tracing is off; the other variables only take effect once the master switch is on.

Tool events (ZMQ)

Wire format: [topic, seq_be_u64, msgpack(AgentTraceRecord)]. To publish to Dynamo, use a background publisher, bounded queue, monotonic sequence, and PUSH with HWM. Terminal tool_end / tool_error rows should carry timing (started_at_unix_ms, ended_at_unix_ms, duration_ms) even if tool_start was dropped.

Same agent_context as the surrounding LLM calls; tool_call_id unique per trajectory. Join offline on session_id, trajectory_id, tool_call_id.
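
One way to structure the publisher described above, as a minimal sketch rather than Dynamo's reference client. `ToolEventPublisher`, `send_frames`, `encode`, and `max_pending` are hypothetical names; starting the sequence at 0 and defaulting to an empty topic frame are assumptions. The transport and encoder are injected so the sketch runs with the standard library alone. In a real harness, `send_frames` would typically be the `send_multipart` method of a pyzmq PUSH socket (with `zmq.SNDHWM` set) connected to the tool-event endpoint, and `encode` would be `msgpack.packb`.

```python
import itertools
import queue
import struct
import threading


class ToolEventPublisher:
    """Background publisher: bounded queue -> [topic, seq_be_u64, payload] frames."""

    def __init__(self, send_frames, encode, topic=b"", max_pending=1024):
        self._send_frames = send_frames   # e.g. push_socket.send_multipart (assumption)
        self._encode = encode             # e.g. msgpack.packb (assumption)
        self._topic = topic
        self._seq = itertools.count()     # monotonic; starting at 0 is an assumption
        self._queue = queue.Queue(maxsize=max_pending)
        self.dropped = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def publish(self, record):
        """Enqueue without blocking the agent; drop the record if the queue is full."""
        try:
            self._queue.put_nowait(record)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def close(self):
        """Send everything still queued, then stop the worker thread."""
        self._queue.put(None)  # sentinel
        self._thread.join()

    def _run(self):
        while True:
            record = self._queue.get()
            if record is None:
                return
            frames = [
                self._topic,
                struct.pack(">Q", next(self._seq)),  # big-endian unsigned 64-bit
                self._encode(record),
            ]
            try:
                self._send_frames(frames)
            except Exception:
                # Tracing is best-effort: count the loss, never fail the agent.
                self.dropped += 1


if __name__ == "__main__":
    sent = []
    pub = ToolEventPublisher(sent.append, lambda r: repr(r).encode())
    pub.publish({"event_type": "tool_end"})
    pub.close()
    print(len(sent), sent[0][1].hex())
```

Because a sequence number is consumed even when a send fails, a gap in the sequence on the receiving side indicates a lost record.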

Example tool_end:

{
  "schema": "dynamo.agent.trace.v1",
  "event_type": "tool_end",
  "event_time_unix_ms": 1777312801500,
  "event_source": "harness",
  "agent_context": {
    "session_type_id": "deep_research",
    "session_id": "research-run-42",
    "trajectory_id": "research-run-42:researcher"
  },
  "tool": {
    "tool_call_id": "call-abc",
    "tool_class": "web_search",
    "status": "succeeded",
    "started_at_unix_ms": 1777312801080,
    "ended_at_unix_ms": 1777312801500,
    "duration_ms": 420.5
  }
}

Optional tool keys: output_tokens, output_bytes, tool_name_hash, error_type (useful on tool_error). Status values: running, succeeded, error, cancelled; synonyms ok/success, failed, timeout/canceled also deserialize.
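
A terminal record like the example above could be assembled with a small helper. This is a sketch only: `make_terminal_tool_record` is a hypothetical function, not a Dynamo API, and deriving `duration_ms` from the two wall-clock timestamps and setting `event_time_unix_ms` to the end time are assumptions (a harness with a monotonic timer may prefer to pass its own duration).

```python
def make_terminal_tool_record(event_type, agent_context, tool_call_id, tool_class,
                              status, started_at_unix_ms, ended_at_unix_ms,
                              error_type=None):
    """Build a tool_end / tool_error record that carries its own timing."""
    if event_type not in ("tool_end", "tool_error"):
        raise ValueError("terminal records are tool_end or tool_error")
    tool = {
        "tool_call_id": tool_call_id,   # unique per trajectory
        "tool_class": tool_class,
        "status": status,
        # Timing is carried on the terminal row so it survives a dropped tool_start.
        "started_at_unix_ms": started_at_unix_ms,
        "ended_at_unix_ms": ended_at_unix_ms,
        "duration_ms": float(ended_at_unix_ms - started_at_unix_ms),
    }
    if error_type is not None:
        tool["error_type"] = error_type
    return {
        "schema": "dynamo.agent.trace.v1",
        "event_type": event_type,
        "event_time_unix_ms": ended_at_unix_ms,  # assumption: event time = end time
        "event_source": "harness",
        "agent_context": dict(agent_context),    # same context as the LLM calls
        "tool": tool,
    }


if __name__ == "__main__":
    record = make_terminal_tool_record(
        "tool_end",
        {"session_type_id": "deep_research",
         "session_id": "research-run-42",
         "trajectory_id": "research-run-42:researcher"},
        tool_call_id="call-abc",
        tool_class="web_search",
        status="succeeded",
        started_at_unix_ms=1777312801080,
        ended_at_unix_ms=1777312801500,
    )
    print(record["tool"]["duration_ms"])
```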

Dynamo request_end record

Emitted after the response stream finishes or is dropped. Omitted keys were not recorded on that path; see AgentTraceRecord / AgentRequestMetrics in lib/llm/src/agents/trace/types.rs for the full Rust schema.

{
  "schema": "dynamo.agent.trace.v1",
  "event_type": "request_end",
  "event_time_unix_ms": 1777312801000,
  "event_source": "dynamo",
  "agent_context": {
    "session_type_id": "deep_research",
    "session_id": "research-run-42",
    "trajectory_id": "research-run-42:researcher",
    "parent_trajectory_id": "research-run-42:planner"
  },
  "request": {
    "request_id": "dynamo-request-id",
    "x_request_id": "llm-call-42",
    "model": "my-model",
    "output_tokens": 16,
    "replay": {
      "trace_block_size": 64,
      "input_length": 128,
      "input_sequence_hashes": [14879255164371896291, 274632075616497421]
    }
  }
}

By default, Dynamo does not save the input/output payloads. To view them, use the built-in Dynamo audit_sink functionality.

Audit side-by-side (same gzip/jsonl machinery):

# enable agent trace sinks (the master switch is required for the other variables to apply)
export DYN_AGENT_TRACE=1
export DYN_AGENT_TRACE_SINKS=jsonl_gz
export DYN_AGENT_TRACE_OUTPUT_PATH=/tmp/dynamo-trace
# enable audit sinks
export DYN_AUDIT_SINKS=jsonl_gz
export DYN_AUDIT_OUTPUT_PATH=/tmp/dynamo-audit
export DYN_AUDIT_FORCE_LOGGING=true

After the run, correlate by id:

gzip -cd /tmp/dynamo-audit.*.jsonl.gz | jq -c '.event' > /tmp/audit.jsonl
gzip -cd /tmp/dynamo-trace.*.jsonl.gz | jq -c '.event' > /tmp/trace.jsonl
jq -s 'group_by(.request_id // .request.request_id)' /tmp/audit.jsonl /tmp/trace.jsonl

Each sink file is JSONL; every line wraps the record in an envelope, which is why the commands above extract .event:

{
  "timestamp": 1234,
  "event": { "schema": "dynamo.agent.trace.v1", "...": "..." }
}

timestamp is sink-relative elapsed ms; use event.event_time_unix_ms for wall-clock ordering.
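
For offline analysis in Python, the envelope can be unwrapped and ordered in a few lines. The following is a minimal sketch using only the standard library; `load_events` and `group_by_trajectory` are hypothetical helper names, and the glob pattern assumes the jsonl_gz segment naming (prefix.NNNNNN.jsonl.gz) described earlier.

```python
import glob
import gzip
import json
from collections import defaultdict


def load_events(prefix):
    """Read every gzip segment under prefix, unwrap .event, sort by wall clock."""
    events = []
    for path in sorted(glob.glob(f"{prefix}.*.jsonl.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:
                    events.append(json.loads(line)["event"])
    # The envelope timestamp is sink-relative, so order on the record's own clock.
    events.sort(key=lambda e: e.get("event_time_unix_ms", 0))
    return events


def group_by_trajectory(events):
    """Group records by (session_id, trajectory_id) for per-trajectory joins."""
    groups = defaultdict(list)
    for event in events:
        ctx = event.get("agent_context") or {}
        groups[(ctx.get("session_id"), ctx.get("trajectory_id"))].append(event)
    return dict(groups)


if __name__ == "__main__":
    trace = load_events("/tmp/dynamo-agent-trace")
    for key, rows in group_by_trajectory(trace).items():
        print(key, [row["event_type"] for row in rows])
```

Python integers are arbitrary precision, so large input_sequence_hashes values are read back without loss.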

Viewing traces in Perfetto

To visualize and optimize your agentic graph, convert the agent trace JSONL files into a Perfetto trace file with the bundled utility. Our team has found this useful for pipelining the agents we write.

uv run --no-project python benchmarks/agent_trace/convert_to_perfetto.py \
  "${DYN_AGENT_TRACE_OUTPUT_PATH}".*.jsonl.gz \
  --output "${DYN_AGENT_TRACE_OUTPUT_PATH}.perfetto.json"

Open in Perfetto UI. Flags: --include-markers, --no-stages, --separate-stage-tracks.

[Experimental] Replaying agent traces using agentic Mooncake replay

You can convert a collected agent trace into an agentic Mooncake trace and replay it with python -m dynamo.replay. The converter uses Dynamo request_end rows for request timing, token lengths, worker placement, and replay hashes. It also uses terminal harness tool rows (tool_end / tool_error) to preserve tool-wait time between dependent LLM requests.

cargo run -p dynamo-bench --bin agent_trace_to_mooncake -- \
  --agentic \
  --input-path "${DYN_AGENT_TRACE_OUTPUT_PATH}".*.jsonl.gz \
  --output-file /tmp/dynamo-agent-trace.agentic-mooncake.jsonl

The binary prints trace_block_size. Use that exact value for replay so hash segmentation matches what Dynamo recorded. Align the mock engine block size with the same number in --extra-engine-args.

# Example value; substitute the trace_block_size printed by agent_trace_to_mooncake.
TRACE_BLOCK_SIZE=128
uv run --no-sync python -m dynamo.replay /tmp/dynamo-agent-trace.agentic-mooncake.jsonl \
  --trace-format agentic_mooncake \
  --trace-block-size "${TRACE_BLOCK_SIZE}" \
  --replay-mode offline \
  --router-mode kv_router \
  --num-workers 4 \
  --extra-engine-args "{\"block_size\":${TRACE_BLOCK_SIZE}}" \
  --report-json /tmp/dynamo-agent-trace.replay-report.json

kv_router needs at least two mock workers; for a single-worker smoke test use --router-mode round_robin --num-workers 1.

Agentic Mooncake rows preserve:

  • request_id: the LLM request row identity.
  • session_id: the Dynamo trajectory_id.
  • wait_for: request ids that must complete before this row becomes eligible.
  • branches: child request ids spawned from this row.
  • prefix_reset: first request in a trajectory.
  • delay: non-tool delay after dependencies finish.
  • tool_wait_ms: tool time after dependencies finish, parallel-aware (the union of overlapping spans rather than their sum).
  • tool_events: per-tool spans attributed to this LLM request, each carrying tool_call_id, tool_class, status, started_at_unix_ms, ended_at_unix_ms, duration_ms, and optional output_bytes / output_tokens / error_type.
  • hash_ids, input_length, and output_length: prompt-prefix and length data for mocker replay.

Rows with no wait_for use their timestamp as the replay start time. Rows with dependencies wait for all listed requests to complete, then wait delay + tool_wait_ms before dispatch. For more flags and engine settings, see Mocker trace replay.
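
The parallel-aware tool_wait_ms and the dispatch rule above can be illustrated with a short calculation. This is a sketch of the arithmetic only, not the converter's code: `union_ms` and `dispatch_time_ms` are hypothetical names, and any clipping of spans to the period after dependencies finish is left out.

```python
def union_ms(spans):
    """Total time covered by (start_ms, end_ms) spans, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(spans):
        if cur_end is None or start > cur_end:
            # Disjoint from the running interval: close it and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the running interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def dispatch_time_ms(dependency_completion_ms, delay_ms, tool_wait_ms):
    """A dependent row dispatches after all dependencies, then delay + tool wait."""
    return max(dependency_completion_ms) + delay_ms + tool_wait_ms


if __name__ == "__main__":
    # Two overlapping tool calls and one later call: 150 + 50 = 200 ms, not 250.
    tool_wait = union_ms([(0, 100), (50, 150), (200, 250)])
    print(tool_wait)
    print(dispatch_time_ms([1000, 1200], delay_ms=30, tool_wait_ms=tool_wait))
```

Summing the three spans would give 250 ms; the union gives 200 ms because the first two tools ran concurrently for 50 ms.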

ATIF alignment

Dynamo emits dynamo.agent.trace.v1, not full ATIF logs, but its identifiers match ATIF / Harbor, so you can join harness trajectories to Dynamo rows on session_id + trajectory_id. Dynamo omits conversational payload by design.

| Dynamo | Role |
| --- | --- |
| session_id | Shared run id |
| trajectory_id | Branch within run |
| parent_trajectory_id | Subagent link |
| session_type_id | Profile / workload type |