Multinode Examples
Multi-node sized models
SGLang allows you to deploy multi-node sized models by adding in the dist-init-addr, nnodes, and node-rank arguments. Below we demonstrate and example of deploying DeepSeek R1 for disaggregated serving across 4 nodes. This example requires 4 nodes of 8xH100 GPUs.
Prerequisite: Building the Dynamo container.
You can use a specific tag from the lmsys dockerhub by adding --build-arg SGLANG_IMAGE_TAG=<tag> to the build command.
Step 1: Ensure that your configuration file has the required arguments. Here’s an example configuration that runs prefill and the model in TP16:
Node 1: Run HTTP ingress, processor, and 8 shards of the prefill worker
Node 2: Run the remaining 8 shards of the prefill worker
Node 3: Run the first 8 shards of the decode worker
Node 4: Run the remaining 8 shards of the decode worker
Step 2: Run inference SGLang typically requires a warmup period to ensure the DeepGEMM kernels are loaded. We recommend running a few warmup requests and ensuring that the DeepGEMM kernels load in.