vLLM Prometheus Metrics
Overview
When running vLLM through Dynamo, vLLM engine metrics are automatically passed through and exposed on Dynamo’s /metrics endpoint (default port 8081). This allows you to access both vLLM engine metrics (prefixed with vllm:) and Dynamo runtime metrics (prefixed with dynamo_*) from a single worker backend endpoint.
For the complete and authoritative list of all vLLM metrics, always refer to the official vLLM Metrics Design documentation.
For LMCache metrics and integration, see the LMCache Integration Guide.
For Dynamo runtime metrics, see the Dynamo Metrics Guide.
For visualization setup instructions, see the Prometheus and Grafana Setup Guide.
Environment Variables and Flags
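The variables and flags most relevant to metrics (DYN_SYSTEM_PORT and, optionally, PROMETHEUS_MULTIPROC_DIR, plus the --connector lmcache flag) are described in the sections below.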
Getting Started Quickly
This is a single-machine example.
Start Observability Stack
For visualizing metrics with Prometheus and Grafana, start the observability stack. See Observability Getting Started for instructions.
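For example, from the Dynamo repository root (the compose file path is an assumption and may differ between releases):

```bash
# Start Prometheus and Grafana (file path assumed; check your release)
docker compose -f deploy/docker-compose.yml up -d
```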
Launch Dynamo Components
Launch a frontend and vLLM backend to test metrics:
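A minimal sketch, assuming the Dynamo Python packages are installed and using a small placeholder model:

```bash
# Terminal 1: start the OpenAI-compatible frontend
python -m dynamo.frontend --http-port 8000

# Terminal 2: start a vLLM worker with Dynamo's metrics endpoint enabled
DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model Qwen/Qwen3-0.6B
```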
Wait for the vLLM worker to start, then send requests and check metrics:
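For example, using the ports from the launch sketch above:

```bash
# Send a test request through the frontend
curl localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-0.6B",
       "messages": [{"role": "user", "content": "Hello"}],
       "max_tokens": 32}'

# Inspect vLLM engine metrics on the worker's metrics endpoint
curl -s localhost:8081/metrics | grep "^vllm:"
```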
Exposed Metrics
vLLM exposes metrics in Prometheus Exposition Format text at the /metrics HTTP endpoint. All vLLM engine metrics use the vllm: prefix and include labels (e.g., model_name, finished_reason, scheduling_event) to identify the source.
Example Prometheus Exposition Format text:
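A representative sample, assuming a worker serving a placeholder model:

```
# HELP vllm:num_requests_running Number of requests currently running on GPU.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="Qwen/Qwen3-0.6B"} 1.0
# HELP vllm:request_success_total Count of successfully processed requests.
# TYPE vllm:request_success_total counter
vllm:request_success_total{finished_reason="stop",model_name="Qwen/Qwen3-0.6B"} 42.0
```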
Note: The specific metrics shown above are examples and may vary depending on your vLLM version. Always inspect your actual /metrics endpoint or refer to the official documentation for the current list.
Metric Categories
vLLM provides metrics in the following categories (all prefixed with vllm:):
- Request metrics - Request success, failure, and completion tracking
- Performance metrics - Latency, throughput, and timing measurements
- Resource usage - System resource consumption
- Scheduler metrics - Scheduling and queue management
- Disaggregation metrics - Metrics specific to disaggregated deployments (when enabled)
Note: Specific metrics are subject to change between vLLM versions. Always refer to the official documentation or inspect the /metrics endpoint for your vLLM version.
Available Metrics
The official vLLM documentation includes complete metric definitions with:
- Detailed explanations and design rationale
- Counter, Gauge, and Histogram metric types
- Metric labels (e.g., model_name, finished_reason, scheduling_event)
- Information about v1 metrics migration
- Future work and deprecated metrics
For the complete and authoritative list of all vLLM metrics, see the official vLLM Metrics Design documentation.
LMCache Metrics
When LMCache is enabled with --connector lmcache and DYN_SYSTEM_PORT is set, LMCache metrics (prefixed with lmcache:) are automatically exposed via Dynamo’s /metrics endpoint alongside vLLM and Dynamo metrics.
Minimum Requirements
To access LMCache metrics, both of these are required:
- --connector lmcache - Enables LMCache in vLLM
- DYN_SYSTEM_PORT=8081 - Enables Dynamo's metrics HTTP endpoint
Example:
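```bash
# Launch a vLLM worker with LMCache enabled and metrics exposed
# (model name is a placeholder)
DYN_SYSTEM_PORT=8081 python -m dynamo.vllm \
  --model Qwen/Qwen3-0.6B \
  --connector lmcache
```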
Viewing LMCache Metrics
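Once the worker is up, the lmcache: series appear on the same /metrics endpoint (port 8081 assumed from the example above):

```bash
curl -s localhost:8081/metrics | grep "^lmcache:"
```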
Troubleshooting
Troubleshooting LMCache-related metrics and logs (including "PrometheusLogger instance already created with different metadata" and PROMETHEUS_MULTIPROC_DIR warnings), along with complete LMCache configuration and metric details, is covered in:
- LMCache Integration Guide - Setup and configuration
- LMCache Observability Documentation - Complete metrics reference
Implementation Details
- vLLM v1 uses multiprocess metrics collection via prometheus_client.multiprocess
- PROMETHEUS_MULTIPROC_DIR (optional): By default, Dynamo manages this environment variable automatically, setting it to a temporary directory where multiprocess metrics are stored as memory-mapped files. Each worker process writes its metrics to separate files in this directory, and the files are aggregated when /metrics is scraped. Set it explicitly only when you need complete control over the metrics directory.
- Dynamo uses MultiProcessCollector to aggregate metrics from all worker processes
- Metrics are filtered by the vllm: and lmcache: prefixes before being exposed (the lmcache: prefix applies only when LMCache is enabled)
- The integration uses Dynamo's register_engine_metrics_callback() function with the global REGISTRY
- Metrics appear after vLLM engine initialization completes
- vLLM v1 metrics differ from v0 - see the official documentation for migration details
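As a rough illustration of the mechanism, here is a minimal standalone sketch using prometheus_client directly. This is not Dynamo's actual integration code (which lives in components/src/dynamo/common/utils/prometheus.py); the directory path and the prefix filter are assumptions for the example.

```python
import os

# The shared directory must be set before any metrics are created.
# Dynamo manages this automatically; this path is only for illustration.
os.makedirs("/tmp/prom_multiproc", exist_ok=True)
os.environ.setdefault("PROMETHEUS_MULTIPROC_DIR", "/tmp/prom_multiproc")

from prometheus_client import CollectorRegistry, generate_latest, multiprocess

def scrape_engine_metrics() -> str:
    """Aggregate per-process metric files and keep only vllm:/lmcache: series."""
    registry = CollectorRegistry()
    # Reads the memory-mapped .db files written by each worker process.
    multiprocess.MultiProcessCollector(registry)
    exposition = generate_latest(registry).decode()

    def keep(line: str) -> bool:
        if line.startswith(("# HELP ", "# TYPE ")):
            name = line.split(" ", 3)[2]  # "# HELP <name> <help text>"
            return name.startswith(("vllm:", "lmcache:"))
        return line.startswith(("vllm:", "lmcache:"))

    return "\n".join(line for line in exposition.splitlines() if keep(line))
```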
Related Documentation
vLLM Metrics
- Official vLLM Metrics Design Documentation
- vLLM Production Metrics User Guide
- vLLM GitHub - Metrics Implementation
Dynamo Metrics
- Dynamo Metrics Guide - Complete documentation on Dynamo runtime metrics
- Prometheus and Grafana Setup - Visualization setup instructions
- Dynamo runtime metrics (prefixed with dynamo_*) are available at the same /metrics endpoint alongside vLLM metrics
- Implementation: lib/runtime/src/metrics.rs (Rust runtime metrics)
- Metric names: lib/runtime/src/metrics/prometheus_names.rs (metric name constants)
- Integration code: components/src/dynamo/common/utils/prometheus.py (Prometheus utilities and callback registration)