Metrics Developer Guide | NVIDIA Dynamo Documentation

This guide explains how to create and use custom metrics in Dynamo components using the Dynamo metrics API.

Metrics Exposure

All metrics created via the Dynamo metrics API are automatically exposed on the /metrics HTTP endpoint, in Prometheus text exposition format, when the following environment variable is set:

DYN_SYSTEM_PORT=<port> - Port for the metrics endpoint (set to a positive value to enable; default: -1, disabled)

Example:

$ DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model <model>

With this setting, the metrics are available in Prometheus text exposition format at: http://localhost:8081/metrics
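To check what a scrape returns, the endpoint can be read with any HTTP client. The sketch below uses only the Python standard library to parse simple exposition lines into (name, labels, value) tuples. The helper names parse_exposition and fetch_metrics are hypothetical, and the parser is deliberately simplified: it skips comment lines, does not handle escaped characters in label values, and ignores samples that carry a timestamp.

```python
import re
import urllib.request

# Matches: name{label="value",...} value   (the label block is optional)
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Parse simple Prometheus text exposition lines into (name, labels, value)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # blank, # HELP, or # TYPE line
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample this simplified parser understands
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url="http://localhost:8081/metrics"):
    """Fetch and parse the endpoint from the example above (needs a running component)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_exposition(response.read().decode("utf-8"))
```

Calling fetch_metrics() against a running component returns the parsed samples; parse_exposition can also be applied to saved scrape output.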

Metric Name Constants

The prometheus_names.rs module provides centralized metric name constants and sanitization functions to ensure consistency across all Dynamo components.
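Sanitization matters because Prometheus metric names must match the pattern [a-zA-Z_:][a-zA-Z0-9_:]*. As a rough illustration of the idea only (this is not the logic in prometheus_names.rs, whose exact rules are not reproduced here), a sanitizer could replace invalid characters with underscores. The function name sanitize_metric_name is hypothetical.

```python
import re

# Characters that are not allowed anywhere in a Prometheus metric name.
_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_:]")
# Full pattern a valid Prometheus metric name must match.
_VALID_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")


def sanitize_metric_name(raw):
    """Illustrative sanitizer: map an arbitrary string to a valid metric name."""
    if not raw:
        raise ValueError("metric name must not be empty")
    name = _INVALID_CHARS.sub("_", raw)
    if name[0].isdigit():  # names may not start with a digit
        name = "_" + name
    return name


def is_valid_metric_name(name):
    """Check a name against the Prometheus metric-name pattern."""
    return _VALID_NAME.match(name) is not None
```

Using shared constants and one sanitizer, rather than ad hoc strings in each component, keeps the same metric from appearing under slightly different names.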

Metrics API in Rust

The metrics API is accessible through the .metrics() method on runtime, namespace, component, and endpoint objects. See Runtime Hierarchy for details on the hierarchical structure.

Available Methods

.metrics().create_counter(): Create a counter metric
.metrics().create_gauge(): Create a gauge metric
.metrics().create_histogram(): Create a histogram metric
.metrics().create_countervec(): Create a counter with labels
.metrics().create_gaugevec(): Create a gauge with labels
.metrics().create_histogramvec(): Create a histogram with labels

Creating Metrics

use dynamo_runtime::DistributedRuntime;

let runtime = DistributedRuntime::new()?;
let endpoint = runtime.namespace("my_namespace").component("my_component").endpoint("my_endpoint");

// Simple metrics
let requests_total = endpoint.metrics().create_counter(
    "requests_total",
    "Total requests",
    &[]
)?;

let active_connections = endpoint.metrics().create_gauge(
    "active_connections",
    "Active connections",
    &[]
)?;

let latency = endpoint.metrics().create_histogram(
    "latency_seconds",
    "Request latency",
    &[],
    Some(vec![0.001, 0.01, 0.1, 1.0, 10.0])
)?;

Using Metrics

// Counters
requests_total.inc();

// Gauges
active_connections.set(42.0);
active_connections.inc();
active_connections.dec();

// Histograms
latency.observe(0.023);  // 23ms

Vector Metrics with Labels

// Create vector metrics with label names.
// The third argument lists the variable label names; the fourth holds constant labels (none here).
let requests_by_model = endpoint.metrics().create_countervec(
    "requests_by_model",
    "Requests by model",
    &["model_type", "model_size"],
    &[]
)?;

let memory_by_gpu = endpoint.metrics().create_gaugevec(
    "gpu_memory_bytes",
    "GPU memory by device",
    &["gpu_id", "memory_type"],
    &[]
)?;

// Use with specific label values, given in the same order as the label names
requests_by_model.with_label_values(&["llama", "7b"]).inc();
memory_by_gpu.with_label_values(&["0", "allocated"]).set(8192.0);

Advanced Features

Custom histogram buckets:

let latency = endpoint.metrics().create_histogram(
    "latency_seconds",
    "Request latency",
    &[],
    Some(vec![0.001, 0.01, 0.1, 1.0, 10.0])
)?;

Constant labels:

let counter = endpoint.metrics().create_counter(
    "requests_total",
    "Total requests",
    &[("region", "us-west"), ("env", "prod")]
)?;

Metrics API in Python

Python components can create and manage Prometheus metrics using the same metrics API through Python bindings.

Available Methods

endpoint.metrics.create_counter() / create_intcounter(): Create a counter metric
endpoint.metrics.create_gauge() / create_intgauge(): Create a gauge metric
endpoint.metrics.create_histogram(): Create a histogram metric
endpoint.metrics.create_countervec() / create_intcountervec(): Create a counter with labels
endpoint.metrics.create_gaugevec() / create_intgaugevec(): Create a gauge with labels
endpoint.metrics.create_histogramvec(): Create a histogram with labels

All metrics are imported from dynamo.prometheus_metrics.

Creating Metrics

from dynamo.runtime import DistributedRuntime

drt = DistributedRuntime()
endpoint = drt.namespace("my_namespace").component("my_component").endpoint("my_endpoint")

# Simple metrics
requests_total = endpoint.metrics.create_intcounter(
    "requests_total",
    "Total requests"
)

active_connections = endpoint.metrics.create_intgauge(
    "active_connections",
    "Active connections"
)

latency = endpoint.metrics.create_histogram(
    "latency_seconds",
    "Request latency",
    buckets=[0.001, 0.01, 0.1, 1.0, 10.0]
)

Using Metrics

# Counters
requests_total.inc()
requests_total.inc_by(5)

# Gauges
active_connections.set(42)
active_connections.inc()
active_connections.dec()

# Histograms
latency.observe(0.023)  # 23ms
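
When choosing bucket boundaries, it helps to recall how Prometheus histograms count: buckets are cumulative, so an observation increments every bucket whose upper bound (le) is greater than or equal to the value, plus the implicit +Inf bucket. The sketch below reproduces that bookkeeping in plain Python with the boundaries from the example above. cumulative_bucket_counts is a hypothetical helper for illustration, not part of the Dynamo API.

```python
import math


def cumulative_bucket_counts(observations, buckets):
    """Count observations per cumulative bucket, as a Prometheus histogram does."""
    bounds = sorted(buckets) + [math.inf]  # +Inf bucket is always present
    counts = {bound: 0 for bound in bounds}
    for value in observations:
        for bound in bounds:
            if value <= bound:  # upper bounds are inclusive
                counts[bound] += 1
    return counts


# Three sample latencies in seconds: 23 ms, 500 ms, and 2 s.
counts = cumulative_bucket_counts(
    [0.023, 0.5, 2.0],
    [0.001, 0.01, 0.1, 1.0, 10.0],
)
for bound, count in counts.items():
    print(f"le={bound}: {count}")
```

With these boundaries the 23 ms observation first lands in the 0.1 bucket, so the two smallest buckets stay empty; boundaries are most useful when they bracket the latencies the component actually produces.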

Vector Metrics with Labels

# Create vector metrics with label names
requests_by_model = endpoint.metrics.create_intcountervec(
    "requests_by_model",
    "Requests by model",
    ["model_type", "model_size"]
)

memory_by_gpu = endpoint.metrics.create_intgaugevec(
    "gpu_memory_bytes",
    "GPU memory by device",
    ["gpu_id", "memory_type"]
)

# Use with specific label values
requests_by_model.inc({"model_type": "llama", "model_size": "7b"})
memory_by_gpu.set(8192, {"gpu_id": "0", "memory_type": "allocated"})
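
Each distinct combination of label values becomes its own time series. The stand-in class below (plain Python, not the Dynamo bindings) mimics a labeled counter to show roughly how such series look in text exposition format. LabeledCounter and render are hypothetical names, and real Dynamo output also carries the automatically added name prefix and constant labels, which this sketch omits.

```python
class LabeledCounter:
    """Stand-in for a counter vector: one value per label-value combination."""

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = list(label_names)
        self._values = {}

    def inc(self, labels, amount=1):
        # Every declared label must be supplied, and no extra ones.
        if set(labels) != set(self.label_names):
            raise ValueError(f"expected labels {self.label_names}, got {sorted(labels)}")
        key = tuple(labels[name] for name in self.label_names)
        self._values[key] = self._values.get(key, 0) + amount

    def render(self):
        """Return one exposition-style line per series, sorted by label values."""
        lines = []
        for key, value in sorted(self._values.items()):
            pairs = ",".join(f'{n}="{v}"' for n, v in zip(self.label_names, key))
            lines.append(f"{self.name}{{{pairs}}} {value}")
        return lines


counter = LabeledCounter("requests_by_model", ["model_type", "model_size"])
counter.inc({"model_type": "llama", "model_size": "7b"})
counter.inc({"model_type": "llama", "model_size": "7b"})
counter.inc({"model_type": "llama", "model_size": "70b"})
print("\n".join(counter.render()))
```

Because every new label value creates another series, labels are best kept to values with a small, bounded set of possibilities.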

Advanced Features

Constant labels:

counter = endpoint.metrics.create_intcounter(
    "requests_total",
    "Total requests",
    [("region", "us-west"), ("env", "prod")]
)

Metric introspection:

print(counter.name())            # "my_namespace_my_component_my_endpoint_requests_total"
print(counter.const_labels())    # {"dynamo_namespace": "my_namespace", ...}
print(requests_by_model.variable_labels())  # ["model_type", "model_size"]

Update patterns:

Background thread updates:

import threading
import time

def update_loop():
    while True:
        # compute_current_connections() is a placeholder for application-specific logic
        active_connections.set(compute_current_connections())
        time.sleep(2)

threading.Thread(target=update_loop, daemon=True).start()
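
The loop above runs until the process exits. If the updater needs to stop cleanly, for example during shutdown, one option is to wait on a threading.Event instead of calling time.sleep. This is a sketch using only the standard library: start_updater and its parameters are hypothetical names, and set_value stands in for a gauge's set method.

```python
import threading


def start_updater(set_value, compute_value, interval_seconds, stop_event):
    """Call set_value(compute_value()) periodically until stop_event is set."""

    def loop():
        while not stop_event.is_set():
            set_value(compute_value())
            # Returns early if stop_event is set while waiting.
            stop_event.wait(interval_seconds)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

A caller would pass something like active_connections.set as set_value, keep the returned thread, and on shutdown call stop_event.set() followed by thread.join().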

Callback-based updates (called before each /metrics scrape):

def update_metrics():
    active_connections.set(compute_current_connections())

endpoint.metrics.register_callback(update_metrics)
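
The callback pattern avoids a background thread and keeps values fresh at scrape time, at the cost of running the callback on every scrape. The stand-in classes below (plain Python, not the Dynamo bindings) imitate that ordering: registered callbacks run first, then the current value is read. StandInGauge, StandInRegistry, and scrape are hypothetical names used only for this illustration.

```python
class StandInGauge:
    """Minimal gauge that just remembers the last value set."""

    def __init__(self):
        self.value = 0

    def set(self, value):
        self.value = value


class StandInRegistry:
    """Runs registered callbacks before each simulated scrape."""

    def __init__(self):
        self._callbacks = []

    def register_callback(self, callback):
        self._callbacks.append(callback)

    def scrape(self, gauge):
        for callback in self._callbacks:
            callback()  # refresh metric values first
        return gauge.value  # then read what would be exported


open_connections = ["a", "b", "c"]  # stand-in for application state
gauge = StandInGauge()
registry = StandInRegistry()
registry.register_callback(lambda: gauge.set(len(open_connections)))

first = registry.scrape(gauge)
open_connections.pop()
second = registry.scrape(gauge)
print(first, second)
```

Since the callback runs as part of serving each scrape, it should stay cheap; expensive computations are usually a better fit for the background-thread pattern.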

Examples

Example scripts: lib/bindings/python/examples/metrics/

$ cd ~/dynamo/lib/bindings/python/examples/metrics
$ DYN_SYSTEM_PORT=8081 ./server_with_loop.py
$ DYN_SYSTEM_PORT=8081 ./server_with_callback.py
