SGLang Prometheus Metrics
Overview
When running SGLang through Dynamo, SGLang engine metrics are automatically passed through and exposed on Dynamo’s /metrics endpoint (default port 8081). This allows you to access both SGLang engine metrics (prefixed with sglang:) and Dynamo runtime metrics (prefixed with dynamo_*) from a single worker backend endpoint.
For the complete and authoritative list of all SGLang metrics, always refer to the official SGLang Production Metrics documentation.
For Dynamo runtime metrics, see the Dynamo Metrics Guide.
For visualization setup instructions, see the Prometheus and Grafana Setup Guide.
Environment Variables
Getting Started Quickly
This is a single-machine example.
Start Observability Stack
For visualizing metrics with Prometheus and Grafana, start the observability stack. See Observability Getting Started for instructions.
Launch Dynamo Components
Launch a frontend and SGLang backend to test metrics:
Wait for the SGLang worker to start, then send requests and check metrics:
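As a quick check once the worker is up, the following minimal Python sketch fetches the metrics text and counts sample lines per prefix. It assumes the worker is reachable on localhost at the default port 8081. The helper names (count_by_prefix, fetch_metrics) are hypothetical and not part of Dynamo or SGLang.

```python
import urllib.request
from collections import Counter

# Assumption: the worker runs locally and uses the default metrics port.
METRICS_URL = "http://localhost:8081/metrics"


def count_by_prefix(text, prefixes=("sglang:", "dynamo_")):
    """Count sample lines (skipping '#' comment lines) per metric prefix."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or # HELP / # TYPE comment
        for prefix in prefixes:
            if line.startswith(prefix):
                counts[prefix] += 1
                break
        else:
            counts["other"] += 1  # sample that matches none of the prefixes
    return counts


def fetch_metrics(url=METRICS_URL, timeout=5.0):
    """Fetch the raw exposition text from the worker's /metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Against a running worker, print(dict(count_by_prefix(fetch_metrics()))) should report non-zero counts for both sglang: and dynamo_ once engine initialization has completed.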
Exposed Metrics
SGLang exposes metrics in Prometheus Exposition Format text at the /metrics HTTP endpoint. All SGLang engine metrics use the sglang: prefix and include labels (e.g., model_name, engine_type, tp_rank, pp_rank) to identify the source.
Example Prometheus Exposition Format text:
Note: The specific metrics shown above are examples and may vary depending on your SGLang version. Always inspect your actual /metrics endpoint or refer to the official documentation for the current list.
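To illustrate how the labels identify the source of each sample, here is a small Python sketch that splits exposition-format lines into name, labels, and value. The metric name sglang:example_gauge and all label values in SAMPLE are made up for illustration and are not real SGLang output. The regular expressions cover only the common case, not the full exposition grammar.

```python
import re

# Hypothetical sample: the metric name and label values are illustrative only.
SAMPLE = '''\
# HELP sglang:example_gauge Illustrative gauge, not a real SGLang metric.
# TYPE sglang:example_gauge gauge
sglang:example_gauge{model_name="my-model",engine_type="unified",tp_rank="0",pp_rank="0"} 3.0
'''

# name, optional {labels}, then the sample value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# key="value" pairs, allowing backslash-escaped characters inside the value
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this simple parser understands
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples
```

For anything beyond inspection, a maintained parser such as the one shipped with prometheus_client is a safer choice than hand-written regular expressions.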
Metric Categories
SGLang provides metrics in the following categories (all prefixed with sglang:):
- Throughput metrics - Token processing rates
- Resource usage - System resource consumption
- Latency metrics - Request and token latency measurements
- Disaggregation metrics - Metrics specific to disaggregated deployments (when enabled)
Note: Specific metrics are subject to change between SGLang versions. Always refer to the official documentation or inspect the /metrics endpoint for your SGLang version.
Available Metrics
The official SGLang documentation includes complete metric definitions with:
- HELP and TYPE descriptions
- Counter, Gauge, and Histogram metric types
- Metric labels (e.g., model_name, engine_type, tp_rank, pp_rank)
- Setup guide for Prometheus + Grafana monitoring
- Troubleshooting tips and configuration examples
For the complete and authoritative list of all SGLang metrics, see the official SGLang Production Metrics documentation.
Implementation Details
- SGLang uses multiprocess metrics collection via prometheus_client.multiprocess.MultiProcessCollector
- Metrics are filtered by the sglang: prefix before being exposed
- The integration uses Dynamo’s register_engine_metrics_callback() function
- Metrics appear after SGLang engine initialization completes
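The collection and prefix-filtering steps can be sketched roughly as follows. This is an illustration under stated assumptions, not Dynamo's actual code: filter_by_prefix and collect_multiprocess_text are hypothetical helpers, and the registration call through register_engine_metrics_callback() is left out because its signature is not described here. The collection helper needs prometheus_client installed and the PROMETHEUS_MULTIPROC_DIR environment variable pointing at an existing directory.

```python
def filter_by_prefix(exposition_text, prefix="sglang:"):
    """Keep sample lines and # HELP / # TYPE lines whose metric name starts with prefix."""
    kept = []
    for line in exposition_text.splitlines():
        if line.startswith("# HELP ") or line.startswith("# TYPE "):
            # Comment lines look like "# HELP <name> <doc>" or "# TYPE <name> <type>".
            parts = line.split(" ", 3)
            name = parts[2] if len(parts) > 2 else ""
        else:
            name = line  # a sample line begins with the metric name
        if name.startswith(prefix):
            kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


def collect_multiprocess_text():
    """Aggregate metrics written by all processes into exposition text.

    Assumption: prometheus_client is installed and PROMETHEUS_MULTIPROC_DIR
    is set; the import is local so the filter above works without the library.
    """
    from prometheus_client import CollectorRegistry, generate_latest, multiprocess

    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)  # registers itself on the registry
    return generate_latest(registry).decode("utf-8")
```

In this sketch, filter_by_prefix(collect_multiprocess_text()) yields only the sglang: metrics, which is the text a callback would then hand to the runtime for exposure.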
Related Documentation
SGLang Metrics
Dynamo Metrics
- Dynamo Metrics Guide - Complete documentation on Dynamo runtime metrics
- Prometheus and Grafana Setup - Visualization setup instructions
- Dynamo runtime metrics (prefixed with dynamo_*) are available at the same /metrics endpoint alongside SGLang metrics
  - Implementation: lib/runtime/src/metrics.rs (Rust runtime metrics)
  - Metric names: lib/runtime/src/metrics/prometheus_names.rs (metric name constants)
- Integration code: components/src/dynamo/common/utils/prometheus.py - Prometheus utilities and callback registration