vLLM Prometheus Metrics
📚 Official Documentation: vLLM Metrics Design
This document describes how vLLM Prometheus metrics are exposed in Dynamo.
Overview
When running vLLM through Dynamo, vLLM engine metrics are automatically passed through and exposed on Dynamo’s /metrics endpoint (default port 8081). This allows you to access both vLLM engine metrics (prefixed with vllm:) and Dynamo runtime metrics (prefixed with dynamo_*) from a single worker backend endpoint.
For the complete and authoritative list of all vLLM metrics, always refer to the official documentation linked above.
Dynamo runtime metrics are documented in docs/observability/metrics.md.
Metric Reference
The official documentation includes:
- Complete metric definitions with detailed explanations
- Counter, Gauge, and Histogram metrics
- Metric labels (e.g., `model_name`, `finished_reason`, `scheduling_event`)
- Design rationale and implementation details
- Information about v1 metrics migration
- Future work and deprecated metrics
Metric Categories
vLLM provides metrics in the following categories (all prefixed with vllm:):
- Request metrics
- Performance metrics
- Resource usage
- Scheduler metrics
- Disaggregation metrics (when enabled)
Note: Specific metrics are subject to change between vLLM versions. Always refer to the official documentation or inspect the /metrics endpoint for your vLLM version.
Enabling Metrics in Dynamo
vLLM metrics are automatically exposed when running vLLM through Dynamo with metrics enabled.
Inspecting Metrics
To see the actual metrics available in your vLLM version:
1. Launch vLLM with Metrics Enabled
Metrics will be available at: http://localhost:8081/metrics
2. Fetch Metrics via curl
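For example, assuming a worker listening on the default port 8081 (adjust host and port for your deployment), you can pull the exposition and filter it by prefix:

```shell
# Fetch the full exposition from the Dynamo worker's /metrics endpoint
# (default port 8081) and keep only vLLM engine metrics.
curl -s http://localhost:8081/metrics | grep '^vllm:'

# Dynamo runtime metrics are served from the same endpoint:
curl -s http://localhost:8081/metrics | grep '^dynamo_'
```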
3. Example Output
Note: The specific metrics shown below are examples and may vary depending on your vLLM version. Always inspect your actual /metrics endpoint for the current list.
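As an illustration of what the endpoint returns (the metric names and values below are made-up examples, not guaranteed to match any vLLM version), a few lines of Python can split an exposition payload into the two metric families by prefix:

```python
# Minimal sketch: group Prometheus exposition lines by metric prefix.
# The sample payload is illustrative only; real names vary by vLLM version.
SAMPLE = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="example-model"} 2.0
dynamo_component_requests_total{component="worker"} 17.0
"""

def split_by_prefix(payload: str) -> dict[str, list[str]]:
    groups: dict[str, list[str]] = {"vllm": [], "dynamo": [], "other": []}
    for line in payload.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip HELP/TYPE comments and blank lines
        if line.startswith("vllm:"):
            groups["vllm"].append(line)
        elif line.startswith("dynamo_"):
            groups["dynamo"].append(line)
        else:
            groups["other"].append(line)
    return groups

groups = split_by_prefix(SAMPLE)
print(len(groups["vllm"]), len(groups["dynamo"]))  # → 1 1
```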
Implementation Details
- vLLM v1 uses multiprocess metrics collection via `prometheus_client.multiprocess`
- `PROMETHEUS_MULTIPROC_DIR` (optional): By default, Dynamo automatically manages this environment variable, setting it to a temporary directory where multiprocess metrics are stored as memory-mapped files. Each worker process writes its metrics to separate files in this directory, which are aggregated when `/metrics` is scraped. Users only need to set this explicitly when complete control over the metrics directory is required.
- Dynamo uses `MultiProcessCollector` to aggregate metrics from all worker processes
- Metrics are filtered by the `vllm:` and `lmcache:` prefixes before being exposed (the `lmcache:` prefix applies when LMCache is enabled)
- The integration uses Dynamo's `register_engine_metrics_callback()` function with the global `REGISTRY`
- Metrics appear after vLLM engine initialization completes
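The default directory management described above can be sketched as follows. This is a hypothetical helper written for illustration (`ensure_multiproc_dir` is a name invented here, not Dynamo's actual code):

```python
import os
import tempfile

def ensure_multiproc_dir() -> str:
    """Hypothetical sketch of the behavior described above: respect an
    explicitly set PROMETHEUS_MULTIPROC_DIR, otherwise create a temporary
    directory for prometheus_client's memory-mapped multiprocess files."""
    path = os.environ.get("PROMETHEUS_MULTIPROC_DIR")
    if not path:
        path = tempfile.mkdtemp(prefix="dynamo-prometheus-")
        os.environ["PROMETHEUS_MULTIPROC_DIR"] = path
    return path

# Worker processes that inherit this environment variable write their metric
# files under the same directory, which a MultiProcessCollector aggregates
# at scrape time.
d = ensure_multiproc_dir()
print(os.path.isdir(d))  # → True
```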
- vLLM v1 metrics are different from v0 - see the official documentation for migration details
See Also
vLLM Metrics
- Official vLLM Metrics Design Documentation
- vLLM Production Metrics User Guide
- vLLM GitHub - Metrics Implementation
Dynamo Metrics
- Dynamo Metrics Guide: See docs/observability/metrics.md for complete documentation on Dynamo runtime metrics
- Dynamo Runtime Metrics: Metrics prefixed with `dynamo_*` for runtime, components, endpoints, and namespaces
  - Implementation: `lib/runtime/src/metrics.rs` (Rust runtime metrics)
  - Metric names: `lib/runtime/src/metrics/prometheus_names.rs` (metric name constants)
  - Available at the same `/metrics` endpoint alongside vLLM metrics
- Integration Code: `components/src/dynamo/common/utils/prometheus.py` - Prometheus utilities and callback registration