SGLang Prometheus Metrics
📚 Official Documentation: SGLang Production Metrics
This document describes how SGLang Prometheus metrics are exposed in Dynamo.
Overview
When running SGLang through Dynamo, SGLang engine metrics are automatically passed through and exposed on Dynamo’s /metrics endpoint (default port 8081). This allows you to access both SGLang engine metrics (prefixed with sglang:) and Dynamo runtime metrics (prefixed with dynamo_*) from a single worker backend endpoint.
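To illustrate the single combined endpoint, here is a minimal sketch that groups the sample lines of one scrape by prefix. The scrape text and the metric names in it are placeholders made up for this example, not real output from SGLang or Dynamo.

```python
from collections import defaultdict

# Prefixes described above: SGLang engine metrics and Dynamo runtime metrics.
PREFIXES = ("sglang:", "dynamo_")


def group_by_prefix(exposition_text: str) -> dict[str, list[str]]:
    """Group the sample lines of a Prometheus text scrape by metric prefix."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        for prefix in PREFIXES:
            if line.startswith(prefix):
                groups[prefix].append(line)
                break
        else:
            # Anything that matches neither prefix.
            groups["other"].append(line)
    return dict(groups)


# Placeholder scrape text; the metric names are invented for illustration.
SAMPLE = """\
# TYPE sglang:example_gauge gauge
sglang:example_gauge{model_name="demo"} 1.0
# TYPE dynamo_example_total counter
dynamo_example_total 7
"""

grouped = group_by_prefix(SAMPLE)
```

With this sample, `grouped` holds one line under `sglang:` and one under `dynamo_`.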
For the complete and authoritative list of all SGLang metrics, always refer to the official documentation linked above.
Dynamo runtime metrics are documented in docs/observability/metrics.md.
Metric Reference
The official documentation includes:
- Complete metric definitions with HELP and TYPE descriptions
- Example metric output in Prometheus exposition format
- Counter, Gauge, and Histogram metrics
- Metric labels (e.g., model_name, engine_type, tp_rank, pp_rank)
- Setup guide for Prometheus + Grafana monitoring
- Troubleshooting tips and configuration examples
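To show how labels such as model_name and tp_rank appear on a sample line, the following sketch parses one line of Prometheus text exposition into a name, a label dictionary, and a value. The metric name and label values are placeholders. The regular expressions are a simplification that handles common lines, not a complete parser for the format.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
)
# One label pair: key="value" (escaped characters are left as written).
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str) -> tuple[str, dict[str, str], float]:
    """Split a sample line into (metric name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


# Placeholder line; the metric name and label values are invented.
name, labels, value = parse_sample(
    'sglang:example_gauge{model_name="demo",tp_rank="0",pp_rank="0"} 3.0'
)
```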
Metric Categories
SGLang provides metrics in the following categories (all prefixed with sglang:):
- Throughput metrics
- Resource usage
- Latency metrics
- Disaggregation metrics (when enabled)
Note: Specific metrics are subject to change between SGLang versions. Always refer to the official documentation or inspect the /metrics endpoint for your SGLang version.
Enabling Metrics in Dynamo
SGLang metrics are automatically exposed when running SGLang through Dynamo with metrics enabled.
Inspecting Metrics
To see the actual metrics available in your SGLang version:
1. Launch SGLang with Metrics Enabled
Metrics will be available at: http://localhost:8081/metrics
2. Fetch Metrics via curl
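A Python equivalent of this step, as a minimal sketch using only the standard library. It assumes the worker is reachable at the default address given above. The helper names `fetch_metrics` and `sglang_lines` are chosen for this example.

```python
from urllib.request import urlopen

# Default endpoint mentioned above; adjust host and port for your deployment.
METRICS_URL = "http://localhost:8081/metrics"


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Return the raw Prometheus text exposition served at `url`."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def sglang_lines(exposition_text: str) -> list[str]:
    """Keep lines that mention an sglang: metric (samples and HELP/TYPE)."""
    return [line for line in exposition_text.splitlines() if "sglang:" in line]


# Example usage (requires a running worker):
#   for line in sglang_lines(fetch_metrics()):
#       print(line)
```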
3. Example Output
Note: The specific metrics shown below are examples and may vary depending on your SGLang version. Always inspect your actual /metrics endpoint for the current list.
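One way to see which metrics your version exposes is to read the `# TYPE` comments from the scrape. The sketch below builds a name-to-type map for metrics with a given prefix. The sample text is a placeholder with invented metric names, so run the function against your own endpoint output to get the real list.

```python
def metric_types(exposition_text: str, prefix: str = "sglang:") -> dict[str, str]:
    """Map metric family name to its type using '# TYPE <name> <type>' lines."""
    types: dict[str, str] = {}
    for line in exposition_text.splitlines():
        parts = line.split()
        # A TYPE comment has exactly four tokens: "#", "TYPE", name, type.
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            name, kind = parts[2], parts[3]
            if name.startswith(prefix):
                types[name] = kind
    return types


# Placeholder scrape text; the metric names are invented for illustration.
SAMPLE = """\
# HELP sglang:example_requests_total Placeholder counter.
# TYPE sglang:example_requests_total counter
sglang:example_requests_total 12
# TYPE sglang:example_latency_seconds histogram
sglang:example_latency_seconds_bucket{le="+Inf"} 12
# TYPE dynamo_example_gauge gauge
dynamo_example_gauge 1
"""

sglang_types = metric_types(SAMPLE)
```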
Implementation Details
- SGLang uses multiprocess metrics collection via prometheus_client.multiprocess.MultiProcessCollector
- Metrics are filtered by the sglang: prefix before being exposed
- The integration uses Dynamo’s register_engine_metrics_callback() function
- Metrics appear after SGLang engine initialization completes
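The filter-then-register flow described above can be sketched conceptually as follows. This is not Dynamo's code: `MetricsEndpoint`, `register_callback`, and `keep_prefix` are hypothetical names invented for illustration, the actual signature of register_engine_metrics_callback() is not shown here, and the sketch uses plain strings rather than prometheus_client.

```python
from typing import Callable


class MetricsEndpoint:
    """Hypothetical stand-in for a /metrics handler that merges engine output."""

    def __init__(self) -> None:
        self._callbacks: list[Callable[[], str]] = []

    def register_callback(self, callback: Callable[[], str]) -> None:
        # Each callback returns engine metrics as Prometheus text.
        self._callbacks.append(callback)

    def render(self, runtime_text: str) -> str:
        # Runtime metrics first, then whatever each engine callback returns.
        parts = [runtime_text.rstrip("\n")]
        parts.extend(callback().rstrip("\n") for callback in self._callbacks)
        return "\n".join(part for part in parts if part) + "\n"


def keep_prefix(exposition_text: str, prefix: str) -> str:
    """Keep samples and HELP/TYPE comments whose metric name starts with prefix."""
    kept = []
    for line in exposition_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "#":
            # "# HELP <name> ..." or "# TYPE <name> ...": name is the third token.
            name = tokens[2] if len(tokens) > 2 else ""
        else:
            name = tokens[0]
        if name.startswith(prefix):
            kept.append(line)
    return "\n".join(kept)


def engine_callback() -> str:
    # Placeholder engine output; the metric names are invented.
    raw = (
        "# TYPE sglang:example_gauge gauge\n"
        "sglang:example_gauge 2\n"
        "python_gc_example 5\n"
    )
    return keep_prefix(raw, "sglang:")


endpoint = MetricsEndpoint()
endpoint.register_callback(engine_callback)
combined = endpoint.render("dynamo_example_total 7\n")
```

In this sketch the non-sglang line is dropped by the filter, and `combined` contains the runtime line followed by the two sglang: lines.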
See Also
SGLang Metrics
Dynamo Metrics
- Dynamo Metrics Guide: See docs/observability/metrics.md for complete documentation on Dynamo runtime metrics
- Dynamo Runtime Metrics: Metrics prefixed with dynamo_* for runtime, components, endpoints, and namespaces
  - Implementation: lib/runtime/src/metrics.rs (Rust runtime metrics)
  - Metric names: lib/runtime/src/metrics/prometheus_names.rs (metric name constants)
  - Available at the same /metrics endpoint alongside SGLang metrics
- Integration Code: components/src/dynamo/common/utils/prometheus.py (Prometheus utilities and callback registration)