Enable SGLang Hierarchical Cache (HiCache)
This guide shows how to enable SGLang’s Hierarchical Cache (HiCache) inside Dynamo.
1) Start the SGLang worker with HiCache enabled
- —enable-hierarchical-cache: Enables hierarchical KV cache/offload
- —hicache-ratio: The ratio of the size of host KV cache memory pool to the size of device pool. Lower this number if your machine has less CPU memory.
- —hicache-write-policy: Write policy (e.g.,
write_throughfor synchronous host writes) - —hicache-storage-backend: Host storage backend for HiCache (e.g.,
nixl). NIXL selects the concrete store automatically; see PR #8488
Then, start the frontend:
2) Send a single request
3) (Optional) Benchmarking
Run the perf script: