Agent Tracing
Attach trajectory identity and export Dynamo request and tool-event telemetry
Agent tracing records who called (nvext.agent_context), what Dynamo measured on each LLM request (request_end), and optional harness tool spans (tool_*). Context is passive—it does not steer routing or caching. Output is best-effort profiling data, not an audit log.
Flow: Harness sends chat completions with agent_context → Dynamo emits request_end to trace sinks. Harness sends tool events over ZMQ → same sinks.
Direct LLM call
Inject agent_context into each LLM request
OpenAI client: merge into extra_body / extra_headers:
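A minimal sketch of the per-call options, assuming agent_context carries at least session_id and trajectory_id (the offline join keys named on this page). The helper name agent_request_kwargs and the exact agent_context keys are hypothetical; check Dynamo's schema for the full field set.

```python
import uuid
from typing import Any, Dict, Optional


def agent_request_kwargs(session_id: str, trajectory_id: str,
                         x_request_id: Optional[str] = None) -> Dict[str, Any]:
    """Per-call extra_body / extra_headers for one chat completion (hypothetical helper)."""
    # Assumed agent_context keys; consult Dynamo's schema for the full set.
    agent_context = {"session_id": session_id, "trajectory_id": trajectory_id}
    return {
        # agent_context travels inside the nvext extension block of the request body.
        "extra_body": {"nvext": {"agent_context": agent_context}},
        # Logical per-call id; Dynamo records it as request.x_request_id.
        "extra_headers": {"x-request-id": x_request_id or str(uuid.uuid4())},
    }


# Usage with the openai SDK, where extra_body / extra_headers are per-request options:
#   client.chat.completions.create(model=model, messages=messages,
#                                  **agent_request_kwargs("session-1", "traj-1"))
```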
x-request-id is your logical per-call id; Dynamo stores it as request.x_request_id (distinct from Dynamo’s internal request_id). No Dynamo imports are required in the harness. Keep context in a contextvar, attach before each completion, and propagate across threads/processes when those paths call the model or emit tools.
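One way to keep and propagate the context, sketched with the standard contextvars module. asyncio tasks copy the current context when they are created, but thread-pool workers generally do not, so the snapshot has to be taken explicitly at submit time. For child processes, pass the dict explicitly (for example as an argument) and set it again in the child. All helper names here are hypothetical.

```python
import contextvars
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, Optional

# The current trajectory's agent_context, scoped per thread / asyncio task.
_agent_context = contextvars.ContextVar("agent_context", default=None)


def set_agent_context(ctx: Dict[str, Any]) -> contextvars.Token:
    """Install ctx for the current scope; keep the token to reset() it later."""
    return _agent_context.set(dict(ctx))


def current_agent_context() -> Optional[Dict[str, Any]]:
    """Read the context to attach just before each completion or tool event."""
    return _agent_context.get()


def submit_with_context(pool: ThreadPoolExecutor, fn: Callable[..., Any],
                        *args: Any, **kwargs: Any) -> Future:
    """Run fn on a pool thread inside a snapshot of the caller's contextvars."""
    # Pool worker threads generally do not see the submitting thread's contextvars,
    # so snapshot them here and run fn inside that copy.
    snapshot = contextvars.copy_context()
    return pool.submit(snapshot.run, fn, *args, **kwargs)
```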
The fast path is one environment variable, DYN_AGENT_TRACE=1.
That picks jsonl_gz output at /tmp/dynamo-agent-trace.*.jsonl.gz and binds
the harness tool-event ZMQ endpoint at tcp://127.0.0.1:20390. Any of the
per-knob variables below still wins when set explicitly, so you only need to
reach for them to relocate output, add stderr, or tune buffers.
To relocate captures only:
Without DYN_AGENT_TRACE=1, tracing is off; the other variables only
take effect once the master switch is on.
Wire format: [topic, seq_be_u64, msgpack(AgentTraceRecord)]. To publish to Dynamo, use a background publisher, bounded queue, monotonic sequence, and PUSH with HWM. Terminal tool_end / tool_error rows should carry timing (started_at_unix_ms, ended_at_unix_ms, duration_ms) even if tool_start was dropped.
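A minimal sketch of such a publisher, assuming the pyzmq and msgpack packages. The topic string Dynamo expects is not given on this page, so the caller must supply it; the class name, queue size, and HWM defaults are hypothetical. Records are dropped (and counted) rather than blocking the agent, and the sequence advances even for dropped sends so gaps stay visible downstream.

```python
import queue
import struct
import threading
from typing import Any, Dict, List


def encode_frames(topic: str, seq: int, payload: bytes) -> List[bytes]:
    """Frames for one record: [topic, seq as big-endian u64, msgpack(record)]."""
    return [topic.encode("utf-8"), struct.pack(">Q", seq), payload]


class ToolEventPublisher:
    """Best-effort background publisher (hypothetical helper, not a Dynamo API)."""

    def __init__(self, topic: str, endpoint: str = "tcp://127.0.0.1:20390",
                 max_queue: int = 10_000, hwm: int = 10_000) -> None:
        self._topic = topic
        self._endpoint = endpoint
        self._hwm = hwm
        self._queue = queue.Queue(maxsize=max_queue)  # bounded: never grows without limit
        self._lock = threading.Lock()
        self.dropped = 0
        self._thread = threading.Thread(target=self._run, name="tool-events", daemon=True)

    def _count_drop(self) -> None:
        with self._lock:
            self.dropped += 1

    def start(self) -> None:
        self._thread.start()

    def publish(self, record: Dict[str, Any]) -> bool:
        """Enqueue without blocking the caller; drop and count when the queue is full."""
        try:
            self._queue.put_nowait(record)
            return True
        except queue.Full:
            self._count_drop()
            return False

    def stop(self, timeout: float = 2.0) -> None:
        """Call after start(): the sentinel lets the worker drain the queue, then exit."""
        self._queue.put(None)
        self._thread.join(timeout)

    def _run(self) -> None:
        import msgpack  # third-party: pip install msgpack
        import zmq      # third-party: pip install pyzmq

        sock = zmq.Context.instance().socket(zmq.PUSH)
        sock.setsockopt(zmq.SNDHWM, self._hwm)  # cap messages buffered inside ZMQ
        sock.setsockopt(zmq.LINGER, 500)        # wait at most 0.5 s for unsent frames on close
        sock.connect(self._endpoint)            # Dynamo binds the endpoint; the harness connects
        seq = 0
        try:
            while True:
                record = self._queue.get()
                if record is None:
                    break
                frames = encode_frames(self._topic, seq, msgpack.packb(record))
                seq += 1  # monotonic, also across drops
                try:
                    sock.send_multipart(frames, flags=zmq.NOBLOCK)
                except zmq.Again:  # HWM reached: drop instead of stalling
                    self._count_drop()
        finally:
            sock.close()
```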
Same agent_context as the surrounding LLM calls; tool_call_id unique per trajectory. Join offline on session_id, trajectory_id, tool_call_id.
Example tool_end:
Optional tool keys: output_tokens, output_bytes, tool_name_hash, error_type (useful on tool_error). Status values: running, succeeded, error, cancelled; synonyms ok/success, failed, timeout/canceled also deserialize.
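A sketch of building a self-contained terminal row from the fields listed above. The "event_type" key name, the helper name, and mapping status "error" to tool_error (and other terminal statuses to tool_end) are assumptions; tool_class is included because the replay converter carries it through. duration_ms is computed from wall-clock milliseconds here, which a monotonic clock would make more robust.

```python
import time
from typing import Any, Dict, Mapping, Optional

TERMINAL_STATUSES = {"succeeded", "error", "cancelled"}
OPTIONAL_TOOL_KEYS = {"output_tokens", "output_bytes", "tool_name_hash", "error_type"}


def now_unix_ms() -> int:
    return time.time_ns() // 1_000_000


def terminal_tool_record(agent_context: Mapping[str, Any], tool_call_id: str,
                         tool_class: str, status: str, started_at_unix_ms: int,
                         ended_at_unix_ms: Optional[int] = None,
                         **optional: Any) -> Dict[str, Any]:
    """Build a tool_end / tool_error row that stands alone (hypothetical helper)."""
    if status not in TERMINAL_STATUSES:
        raise ValueError(f"not a terminal status: {status!r}")
    ended = now_unix_ms() if ended_at_unix_ms is None else ended_at_unix_ms
    record: Dict[str, Any] = {
        "event_type": "tool_error" if status == "error" else "tool_end",  # assumed key name
        "agent_context": dict(agent_context),  # same context as the surrounding LLM calls
        "tool_call_id": tool_call_id,          # unique per trajectory
        "tool_class": tool_class,
        "status": status,
        # Full timing on the terminal row, in case tool_start was dropped.
        "started_at_unix_ms": started_at_unix_ms,
        "ended_at_unix_ms": ended,
        "duration_ms": ended - started_at_unix_ms,
    }
    # Keep only the documented optional keys that were actually provided.
    record.update({k: v for k, v in optional.items()
                   if k in OPTIONAL_TOOL_KEYS and v is not None})
    return record
```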
request_end record
Emitted after the response stream finishes or is dropped. Omitted keys were not recorded on that path; see AgentTraceRecord / AgentRequestMetrics in lib/llm/src/agents/trace/types.rs for the full Rust schema.
By default we do not save the input/output payloads. To view them, use the built-in Dynamo audit_sink functionality.
Audit side-by-side (same gzip/jsonl machinery):
After the run, correlate by id:
The result is a JSONL file where each line wraps the record:
timestamp is sink-relative elapsed ms; use event.event_time_unix_ms for wall-clock ordering.
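A sketch of loading the agent-trace side for correlation, using only the standard library. The wrapper shape ({"timestamp": ..., "event": {...}}) and the request.x_request_id path are inferred from the field names on this page; the audit rows' schema is not shown here, so this only builds the agent-trace index you would look audit rows up against. Function names are hypothetical.

```python
import glob
import gzip
import json
from typing import Any, Dict, Iterable, List


def load_trace_events(pattern: str = "/tmp/dynamo-agent-trace.*.jsonl.gz") -> List[Dict[str, Any]]:
    """Unwrap each line's "event" and sort by wall-clock event_time_unix_ms."""
    events: List[Dict[str, Any]] = []
    for path in sorted(glob.glob(pattern)):
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    events.append(json.loads(line)["event"])
    # timestamp is sink-relative, so order by the event's own clock; rows lacking it sort first.
    events.sort(key=lambda e: e.get("event_time_unix_ms", 0))
    return events


def index_by_x_request_id(events: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group rows by request.x_request_id, the harness-supplied x-request-id."""
    index: Dict[str, List[Dict[str, Any]]] = {}
    for event in events:
        x_request_id = (event.get("request") or {}).get("x_request_id")
        if x_request_id is not None:
            index.setdefault(x_request_id, []).append(event)
    return index
```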
To visualize and optimize your agentic graph, we provide a utility that converts agent trace JSONL files into a Perfetto trace file. We have found this extremely useful for the pipeline agents our team writes!
Open in Perfetto UI. Flags: --include-markers, --no-stages, --separate-stage-tracks.
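The Dynamo utility is the supported path. To illustrate the kind of mapping involved, this sketch turns terminal tool rows into Chrome Trace Event Format "complete" events (ph "X", microsecond ts/dur), a JSON format the Perfetto UI can also open. Using one process per trajectory_id and one thread per tool_class is an assumption of this sketch, not the utility's actual track layout.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple


def tool_rows_to_chrome_trace(rows: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Map timed tool rows to Trace Event Format spans plus process/thread name metadata."""
    pids: Dict[str, int] = {}
    tids: Dict[Tuple[int, str], int] = {}
    events: List[Dict[str, Any]] = []
    for row in rows:
        if "started_at_unix_ms" not in row or "duration_ms" not in row:
            continue  # only rows with full timing (tool_end / tool_error) become spans
        trajectory = str((row.get("agent_context") or {}).get("trajectory_id", "unknown"))
        track = str(row.get("tool_class", "tool"))
        if trajectory not in pids:
            pids[trajectory] = len(pids) + 1
            events.append({"ph": "M", "name": "process_name", "pid": pids[trajectory],
                           "tid": 0, "args": {"name": trajectory}})
        pid = pids[trajectory]
        if (pid, track) not in tids:
            tids[(pid, track)] = len(tids) + 1
            events.append({"ph": "M", "name": "thread_name", "pid": pid,
                           "tid": tids[(pid, track)], "args": {"name": track}})
        events.append({
            "name": track, "ph": "X", "pid": pid, "tid": tids[(pid, track)],
            "ts": row["started_at_unix_ms"] * 1000,  # Trace Event Format uses microseconds
            "dur": row["duration_ms"] * 1000,
            "args": {"tool_call_id": row.get("tool_call_id"), "status": row.get("status")},
        })
    return {"traceEvents": events}


def write_chrome_trace(rows: Iterable[Dict[str, Any]], path: str) -> None:
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(tool_rows_to_chrome_trace(rows), fh)
```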
You can convert a collected agent trace into an agentic Mooncake trace and replay it with
python -m dynamo.replay. The converter uses Dynamo request_end rows for request timing, token
lengths, worker placement, and replay hashes. It also uses terminal harness tool rows
(tool_end / tool_error) to preserve tool-wait time between dependent LLM requests.
The binary prints trace_block_size. Use that exact value for replay so hash segmentation
matches what Dynamo recorded. Align the mock engine block size with the same number in
--extra-engine-args.
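Why the block size has to match: replay hashes are computed per fixed-size block of prompt tokens, so the same prompt segmented at a different size yields different hash_ids. This sketch illustrates prefix-chained block hashing in general; the SHA-256 chaining, the 16-hex-digit truncation, skipping the trailing partial block, and the helper name are all assumptions, not Dynamo's exact scheme.

```python
import hashlib
import struct
from typing import List, Sequence


def block_hashes(token_ids: Sequence[int], block_size: int) -> List[str]:
    """Hash each full block, chaining the previous digest so an id encodes its whole prefix."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    hashes: List[str] = []
    prev = b""
    full = len(token_ids) - len(token_ids) % block_size  # trailing partial block is skipped
    for start in range(0, full, block_size):
        block = token_ids[start:start + block_size]
        digest = hashlib.sha256(prev + struct.pack(f"<{len(block)}I", *block)).digest()
        hashes.append(digest.hex()[:16])
        prev = digest
    return hashes
```

With a shared prefix, two prompts share their leading hash ids, which is what prefix-cache replay relies on; change block_size and every id changes.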
kv_router needs at least two mock workers; for a single-worker smoke test use
--router-mode round_robin --num-workers 1.
Agentic Mooncake rows preserve:
- request_id: the LLM request row identity.
- session_id: the Dynamo trajectory_id.
- wait_for: request ids that must complete before this row becomes eligible.
- branches: child request ids spawned from this row.
- prefix_reset: first request in a trajectory.
- delay: non-tool delay after dependencies finish.
- tool_wait_ms: tool time after dependencies finish, parallel-aware (the union of overlapping spans rather than their sum).
- tool_events: per-tool spans attributed to this LLM request, each carrying tool_call_id, tool_class, status, started_at_unix_ms, ended_at_unix_ms, duration_ms, and optional output_bytes / output_tokens / error_type.
- hash_ids, input_length, and output_length: prompt-prefix and length data for mocker replay.

Rows with no wait_for use their timestamp as the replay start time. Rows with dependencies wait
for all listed requests to complete, then wait delay + tool_wait_ms before dispatch. For more
flags and engine settings, see Mocker trace replay.
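The two timing rules above can be sketched as follows: a parallel-aware union of tool spans, and the dispatch time of a replay row. This assumes timestamp, delay, tool_wait_ms, and completion times all share milliseconds; clipping spans to start no earlier than the dependencies' finish is our reading of "tool time after dependencies finish". Both function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def union_ms(spans: Iterable[Tuple[int, int]], not_before: Optional[int] = None) -> int:
    """Covered time of possibly overlapping [start, end) spans, optionally clipped at not_before."""
    clipped = []
    for start, end in spans:
        if not_before is not None:
            start = max(start, not_before)
        if end > start:
            clipped.append((start, end))
    total = 0
    cur_start: Optional[int] = None
    cur_end = 0
    for start, end in sorted(clipped):
        if cur_start is None or start > cur_end:
            if cur_start is not None:
                total += cur_end - cur_start  # close the previous merged run
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlap: extend the run instead of adding
    if cur_start is not None:
        total += cur_end - cur_start
    return total


def dispatch_time_ms(row: Dict, completed_at_ms: Dict[str, int]) -> int:
    """When a replay row becomes eligible under the wait_for / delay / tool_wait_ms rule."""
    wait_for: List[str] = row.get("wait_for") or []
    if not wait_for:
        return row["timestamp"]
    deps_done = max(completed_at_ms[r] for r in wait_for)
    return deps_done + row.get("delay", 0) + row.get("tool_wait_ms", 0)
```

For spans (0, 100), (50, 150), and (200, 250), the union is 200 ms while the plain sum would be 250 ms.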