Profiler Examples
Complete examples for profiling with DynamoGraphDeploymentRequests (DGDRs), the interactive WebUI, and direct script usage.
DGDR Examples
Dense Model: AIPerf on Real Engines
Standard online profiling with real GPU measurements:
Dense Model: AI Configurator Simulation
Fast offline profiling (~30 seconds, TensorRT-LLM only):
MoE Model
Multi-node MoE profiling with SGLang:
Using Existing DGD Config (ConfigMap)
Reference a custom DGD configuration via ConfigMap:
Interactive WebUI
Launch an interactive configuration selection interface:
The WebUI launches on port 8000 by default (configurable with --webui-port).
Features
- Interactive Charts: Visualize prefill TTFT, decode ITL, and GPU hours analysis with hover-to-highlight synchronization between charts and tables
- Pareto-Optimal Analysis: The GPU Hours table shows Pareto-optimal configurations that balance latency and throughput
- DGD Config Preview: Click “Show Config” on any row to view the corresponding DynamoGraphDeployment YAML
- GPU Cost Estimation: Toggle GPU cost display to convert GPU hours to cost ($/1000 requests)
- SLA Visualization: Red dashed lines indicate your TTFT and ITL targets
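To make the Pareto-optimal and cost features concrete, here is a minimal Python sketch of the underlying arithmetic. It is not the profiler's implementation: the `Candidate` record, the function names, the sample numbers, and the assumption that GPU hours are reported per 1000 requests are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One profiled configuration (hypothetical record, not the profiler's schema)."""
    name: str
    latency_ms: float  # e.g. TTFT or ITL; lower is better
    gpu_hours: float   # assumed unit: GPU hours per 1000 requests; lower is better


def pareto_front(candidates):
    """Return the candidates that no other candidate beats on both axes."""
    front = []
    best_gpu_hours = float("inf")
    # Walk from fastest to slowest; a slower point is worth keeping only if
    # it is strictly cheaper than every faster point seen so far.
    for c in sorted(candidates, key=lambda c: (c.latency_ms, c.gpu_hours)):
        if c.gpu_hours < best_gpu_hours:
            front.append(c)
            best_gpu_hours = c.gpu_hours
    return front


def cost_per_1000_requests(gpu_hours_per_1000, usd_per_gpu_hour):
    """Convert GPU hours per 1000 requests into dollars per 1000 requests."""
    return gpu_hours_per_1000 * usd_per_gpu_hour


# Made-up sample data for illustration only.
samples = [
    Candidate("A", 100.0, 4.0),
    Candidate("B", 150.0, 2.5),
    Candidate("C", 160.0, 3.0),  # slower and costlier than B, so dominated
    Candidate("D", 250.0, 1.5),
    Candidate("E", 300.0, 1.5),  # slower than D at the same cost, so dominated
]

for c in pareto_front(samples):
    print(c.name, c.latency_ms, cost_per_1000_requests(c.gpu_hours, 2.0))
```

With this sample data the front is A, B, and D. Sorting by latency first keeps the scan to a single pass, which is enough for a two-objective trade-off like latency versus GPU hours.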
Selection Methods
- GPU Hours Table (recommended): Click any row to select the prefill and decode configurations together, based on the Pareto-optimal combination
- Individual Selection: Click one row in the Prefill table AND one row in the Decode table to manually choose each
Example DGD Config Output
When you click “Show Config”, you see a DynamoGraphDeployment configuration:
Once you select a configuration, the full DGD CRD is saved as config_with_planner.yaml.
Direct Script Examples
Basic Profiling
With GPU Constraints
AI Configurator (Offline)
SGLang Runtime Profiling
Profile SGLang workers at runtime via HTTP endpoints:
A test script is provided at examples/backends/sglang/test_sglang_profile.py:
View traces using Chrome’s chrome://tracing, Perfetto UI, or TensorBoard.
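For a quick look without opening a viewer, a trace can also be summarized in a few lines of Python. This is a rough sketch that assumes the trace files use the Chrome Trace Event Format (which the viewers listed above can read, though the exact output format is not confirmed here). The helper names are hypothetical.

```python
import gzip
import json
from collections import defaultdict


def load_trace_events(path):
    """Load events from a Chrome Trace Event Format file (.json or .json.gz)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    # The format allows either an object with "traceEvents" or a bare array.
    return data["traceEvents"] if isinstance(data, dict) else data


def total_duration_by_name(events, top=10):
    """Sum 'dur' (microseconds) of complete events (ph == 'X') per event name."""
    totals = defaultdict(float)
    for ev in events:
        if ev.get("ph") == "X" and "dur" in ev:
            totals[ev.get("name", "<unnamed>")] += ev["dur"]
    # Largest total duration first, limited to the requested number of rows.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

Calling `total_duration_by_name(load_trace_events(path))` on a trace file returns the event names with the largest accumulated duration, which can help decide where to zoom in once the trace is open in a viewer.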