Metrics Developer Guide


This guide explains how to create and use custom metrics in Dynamo components using the Dynamo metrics API.

Metrics Exposure

All metrics created through the Dynamo metrics API are automatically exposed on the /metrics HTTP endpoint, in the Prometheus text exposition format, when the following environment variable is set:

  • DYN_SYSTEM_PORT=<port>: port for the HTTP server that serves /metrics. Set it to a positive port number to enable the endpoint; the default, -1, disables it.

Example:

```bash
DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model <model>
```

The metrics are then available in Prometheus text format at http://localhost:8081/metrics.
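You can also inspect the endpoint from a script instead of a browser. The minimal sketch below fetches the page with the standard library and splits each sample line into a name, labels, and a value.

It assumes a server started as above on port 8081. `fetch_metrics` and `parse_samples` are hypothetical helpers, not part of Dynamo. The parser is deliberately simplified: it skips comment lines, ignores timestamps, and does not unescape label values.

```python
import re
import urllib.request

# Simple sample line: name{label="value",...} value [timestamp]
_SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\S+)?$")
# One label pair; the quoted value may contain escapes such as \"
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(url="http://localhost:8081/metrics", timeout=5.0):
    """Download the Prometheus text exposition from a running component."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping # HELP / # TYPE lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # not a simple sample line; this sketch ignores it
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

With a server running, `parse_samples(fetch_metrics())` yields every sample currently exported.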

Metric Name Constants

The prometheus_names.rs module provides centralized metric name constants and sanitization functions to ensure consistency across all Dynamo components.
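Classic Prometheus metric names must match `[a-zA-Z_:][a-zA-Z0-9_:]*`. Strings such as component or model identifiers therefore need sanitizing before they become part of a name.

The function below is a hypothetical illustration of that idea. It is not the code in prometheus_names.rs, and in Rust components the module's own constants and functions are the ones to use.

```python
import re

# Characters outside the classic Prometheus metric-name alphabet
_INVALID = re.compile(r"[^a-zA-Z0-9_:]")


def sanitize_metric_name(name: str) -> str:
    """Hypothetical sanitizer: coerce an arbitrary string into a valid metric name."""
    cleaned = _INVALID.sub("_", name)
    # A metric name may not start with a digit (or be empty)
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned
```

For example, `sanitize_metric_name("my-namespace.requests")` returns `"my_namespace_requests"`.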


Metrics API in Rust

The metrics API is accessible through the .metrics() method on runtime, namespace, component, and endpoint objects. See Runtime Hierarchy for details on the hierarchical structure.

Available Methods

  • .metrics().create_counter(): Create a counter metric
  • .metrics().create_gauge(): Create a gauge metric
  • .metrics().create_histogram(): Create a histogram metric
  • .metrics().create_countervec(): Create a counter with labels
  • .metrics().create_gaugevec(): Create a gauge with labels
  • .metrics().create_histogramvec(): Create a histogram with labels

Creating Metrics

```rust
use dynamo_runtime::DistributedRuntime;

// Runtime setup is abbreviated here; see Runtime Hierarchy for how to construct the runtime.
let runtime = DistributedRuntime::new()?;
let endpoint = runtime.namespace("my_namespace").component("my_component").endpoint("my_endpoint");

// Simple metrics (the third argument holds constant labels; empty here)
let requests_total = endpoint.metrics().create_counter(
    "requests_total",
    "Total requests",
    &[]
)?;

let active_connections = endpoint.metrics().create_gauge(
    "active_connections",
    "Active connections",
    &[]
)?;

// Histograms also take optional bucket upper bounds (in seconds here)
let latency = endpoint.metrics().create_histogram(
    "latency_seconds",
    "Request latency",
    &[],
    Some(vec![0.001, 0.01, 0.1, 1.0, 10.0])
)?;
```

Using Metrics

```rust
// Counters only go up
requests_total.inc();

// Gauges can be set, incremented, or decremented
active_connections.set(42.0);
active_connections.inc();
active_connections.dec();

// Histograms record each observation into its bucket
latency.observe(0.023); // 23 ms
```

Vector Metrics with Labels

```rust
// Create vector metrics: the third argument lists the label names,
// the fourth holds constant labels (none here)
let requests_by_model = endpoint.metrics().create_countervec(
    "requests_by_model",
    "Requests by model",
    &["model_type", "model_size"],
    &[]
)?;

let memory_by_gpu = endpoint.metrics().create_gaugevec(
    "gpu_memory_bytes",
    "GPU memory by device",
    &["gpu_id", "memory_type"],
    &[]
)?;

// Supply label values in the same order as the label names
requests_by_model.with_label_values(&["llama", "7b"]).inc();
memory_by_gpu.with_label_values(&["0", "allocated"]).set(8192.0);
```

Advanced Features

Custom histogram buckets:

```rust
// Bucket upper bounds, in seconds
let latency = endpoint.metrics().create_histogram(
    "latency_seconds",
    "Request latency",
    &[],
    Some(vec![0.001, 0.01, 0.1, 1.0, 10.0])
)?;
```
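Prometheus histogram buckets are cumulative. Each bucket counts every observation less than or equal to its upper bound (`le`), and an implicit `+Inf` bucket counts everything.

The language-neutral Python sketch below shows how the bounds above would tally a few observations, which can help when choosing boundaries. `cumulative_buckets` is a hypothetical helper, not part of the Dynamo API.

```python
import bisect
import math


def cumulative_buckets(observations, bounds):
    """Tally observations into Prometheus-style cumulative buckets keyed by upper bound."""
    bounds = sorted(bounds)
    per_bucket = [0] * (len(bounds) + 1)  # final slot is the implicit +Inf bucket
    for value in observations:
        # First bound >= value, because a bucket counts values <= its "le" bound
        per_bucket[bisect.bisect_left(bounds, value)] += 1
    cumulative, running = {}, 0
    for bound, count in zip(bounds + [math.inf], per_bucket):
        running += count
        cumulative[bound] = running
    return cumulative


print(cumulative_buckets([0.023, 0.5, 0.01, 42.0], [0.001, 0.01, 0.1, 1.0, 10.0]))
# -> {0.001: 0, 0.01: 1, 0.1: 2, 1.0: 3, 10.0: 3, inf: 4}
```

The 42-second observation lands only in `+Inf`. If values like that are common, a larger top bound would make the histogram more informative.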

Constant labels:

```rust
// Constant labels are attached to every sample of this metric
let counter = endpoint.metrics().create_counter(
    "requests_total",
    "Total requests",
    &[("region", "us-west"), ("env", "prod")]
)?;
```

Metrics API in Python

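Python components can create and manage Prometheus metrics through the Python bindings, which expose the same metrics API. In Python, metrics is accessed as an attribute (endpoint.metrics) rather than a method call.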

Available Methods

  • endpoint.metrics.create_counter() / create_intcounter(): Create a counter metric
  • endpoint.metrics.create_gauge() / create_intgauge(): Create a gauge metric
  • endpoint.metrics.create_histogram(): Create a histogram metric
  • endpoint.metrics.create_countervec() / create_intcountervec(): Create a counter with labels
  • endpoint.metrics.create_gaugevec() / create_intgaugevec(): Create a gauge with labels
  • endpoint.metrics.create_histogramvec(): Create a histogram with labels

All metrics are imported from dynamo.prometheus_metrics.

Creating Metrics

```python
from dynamo.runtime import DistributedRuntime

# Runtime setup is abbreviated here; see Runtime Hierarchy for how to construct the runtime.
drt = DistributedRuntime()
endpoint = drt.namespace("my_namespace").component("my_component").endpoint("my_endpoint")

# Simple metrics
requests_total = endpoint.metrics.create_intcounter(
    "requests_total",
    "Total requests"
)

active_connections = endpoint.metrics.create_intgauge(
    "active_connections",
    "Active connections"
)

# Bucket upper bounds, in seconds
latency = endpoint.metrics.create_histogram(
    "latency_seconds",
    "Request latency",
    buckets=[0.001, 0.01, 0.1, 1.0, 10.0]
)
```

Using Metrics

```python
# Counters
requests_total.inc()
requests_total.inc_by(5)

# Gauges
active_connections.set(42)
active_connections.inc()
active_connections.dec()

# Histograms
latency.observe(0.023)  # 23 ms
```
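Timing code by hand is error-prone, so a small context manager can wrap `observe()`. This sketch relies only on the `observe(seconds)` call shown above. `observe_duration` is a hypothetical helper name, and it records the duration even when the wrapped block raises.

```python
import time
from contextlib import contextmanager


@contextmanager
def observe_duration(histogram):
    """Record the wall-clock duration of the with-block on `histogram`, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Observe even if the block raised, so failed requests are timed too
        histogram.observe(time.perf_counter() - start)
```

To use it, wrap the work in `with observe_duration(latency):`.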

Vector Metrics with Labels

```python
# Create vector metrics with label names
requests_by_model = endpoint.metrics.create_intcountervec(
    "requests_by_model",
    "Requests by model",
    ["model_type", "model_size"]
)

memory_by_gpu = endpoint.metrics.create_intgaugevec(
    "gpu_memory_bytes",
    "GPU memory by device",
    ["gpu_id", "memory_type"]
)

# Label values are passed as a dict keyed by label name
requests_by_model.inc({"model_type": "llama", "model_size": "7b"})
memory_by_gpu.set(8192, {"gpu_id": "0", "memory_type": "allocated"})
```
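Every distinct combination of label values becomes its own time series. Labels should therefore take values from a small, bounded set (model names, GPU indices) rather than unbounded ones such as request IDs.

The hypothetical in-memory model below is not Dynamo's implementation; it simply makes that cardinality visible. As an assumption of this sketch, it also rejects label dicts whose keys don't match the declared names.

```python
from collections import Counter


class LabeledCounterModel:
    """Hypothetical in-memory model of a labeled counter; not Dynamo's class."""

    def __init__(self, label_names):
        self.label_names = tuple(label_names)
        self.series = Counter()  # one entry per distinct label-value combination

    def inc(self, labels, amount=1):
        # This sketch insists on exactly the declared label names
        if set(labels) != set(self.label_names):
            raise ValueError(f"expected labels {self.label_names}, got {sorted(labels)}")
        key = tuple(labels[name] for name in self.label_names)
        self.series[key] += amount
```

Three `inc` calls with two distinct `model_size` values produce two series, not three.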

Advanced Features

Constant labels:

```python
# Constant labels are attached to every sample of this metric
counter = endpoint.metrics.create_intcounter(
    "requests_total",
    "Total requests",
    [("region", "us-west"), ("env", "prod")]
)
```
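Constant labels are attached to every sample a metric emits. The hypothetical `render_sample` helper below shows how they appear in the /metrics output by writing one line of the Prometheus text format. It includes the escaping the format requires for label values (backslash, double quote, newline).

Real output may differ in label order, and Dynamo adds labels of its own, such as the `dynamo_namespace` label shown under metric introspection.

```python
def escape_label_value(value: str) -> str:
    # The text format escapes backslash, double quote, and newline in label values
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name, const_labels, labels, value):
    """Render one exposition line: constant labels first, then per-sample labels."""
    pairs = list(const_labels) + sorted(labels.items())
    if not pairs:
        return f"{name} {value}"
    body = ",".join(f'{key}="{escape_label_value(str(val))}"' for key, val in pairs)
    return f"{name}{{{body}}} {value}"
```

For example, `render_sample("requests_total", [("region", "us-west"), ("env", "prod")], {}, 3)` returns `requests_total{region="us-west",env="prod"} 3`.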

Metric introspection:

```python
print(counter.name())  # "my_namespace_my_component_my_endpoint_requests_total"
print(counter.const_labels())  # {"dynamo_namespace": "my_namespace", ...}
print(requests_by_model.variable_labels())  # ["model_type", "model_size"]
```

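Update patterns: metrics that mirror state held elsewhere can be refreshed either from a background thread or from a callback that runs before each /metrics scrape.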

Background thread updates:

```python
import threading
import time

def update_loop():
    while True:
        # compute_current_connections() stands for your own application logic
        active_connections.set(compute_current_connections())
        time.sleep(2)  # refresh every 2 seconds

threading.Thread(target=update_loop, daemon=True).start()
```
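The loop above runs until the process exits. If the updates should stop cleanly at shutdown, or keep running after an occasional exception, a `threading.Event` can double as an interruptible sleep.

`start_periodic_update` is a hypothetical helper. The update function you pass in is your own code.

```python
import logging
import threading


def start_periodic_update(update, interval_s=2.0):
    """Run update() every interval_s seconds on a daemon thread.

    Returns a threading.Event; call .set() on it to stop the loop.
    """
    stop = threading.Event()

    def loop():
        while True:
            try:
                update()
            except Exception:
                # Log and keep going so a single failure does not end the update thread
                logging.exception("metrics update failed")
            # Event.wait() sleeps up to interval_s but returns True as soon as stop is set
            if stop.wait(interval_s):
                return

    threading.Thread(target=loop, name="metrics-updater", daemon=True).start()
    return stop
```

Start it with `stop = start_periodic_update(lambda: active_connections.set(compute_current_connections()))`, then call `stop.set()` during shutdown.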

Callback-based updates (called before each /metrics scrape):

```python
def update_metrics():
    # compute_current_connections() stands for your own application logic
    active_connections.set(compute_current_connections())

endpoint.metrics.register_callback(update_metrics)
```
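Because the callback runs on the scrape path, it is worth keeping it fast and guarding it against failures in your own code. The sketch below wraps a callback so that an exception is logged instead of propagated.

`guarded` is a hypothetical helper. How Dynamo itself handles a callback that raises is not covered here.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def guarded(callback):
    """Wrap a metrics callback so exceptions are logged instead of propagated."""
    @functools.wraps(callback)
    def wrapper():
        try:
            callback()
        except Exception:
            logger.exception("metrics callback %s failed", callback.__name__)
    return wrapper
```

Register the wrapped function as before: `endpoint.metrics.register_callback(guarded(update_metrics))`.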

Examples

Example scripts: lib/bindings/python/examples/metrics/

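```bash
cd ~/dynamo/lib/bindings/python/examples/metrics
# Background-thread update pattern
DYN_SYSTEM_PORT=8081 ./server_with_loop.py
# Callback update pattern (run one server at a time; both use port 8081)
DYN_SYSTEM_PORT=8081 ./server_with_callback.py
```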