Add new LHS datapoint for SGL-GB200-FP8-1k1k #148
Caution: Review failed. The pull request is closed.
📝 Walkthrough
Adds a new YAML config
Changes
Sequence Diagram(s)
sequenceDiagram
participant Client
participant DynamoFE as Dynamo Frontends (3)
participant Prefill as Prefill Worker
participant Decode as Decode Worker
participant Nixl as Disaggregation Backend (nixl)
participant Model as Model Replica
participant Bench as sa-bench
Client->>DynamoFE: request (inference)
DynamoFE->>Prefill: route prefill stage (sglang_config.prefill)
Prefill->>Nixl: dispatch prefill ops (disaggregation: prefill)
Nixl->>Model: fetch/compute prefill tensors
Model-->>Nixl: prefill results (KV cache)
Nixl-->>Prefill: return populated cache
Prefill-->>DynamoFE: prefill complete
DynamoFE->>Decode: start decode stage (sglang_config.decode)
Decode->>Nixl: dispatch decode ops (disaggregation: decode)
Nixl->>Model: decode token steps (CUDA-graph / DeepEP / MoE)
Model-->>Nixl: decode token outputs
Nixl-->>Decode: token results
Decode-->>DynamoFE: final response
Bench->>DynamoFE: benchmark traffic (req_rate / concurrencies)
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@recipes/gb200-fp8/1k1k/ultra-tpt.yaml`:
- Line 163: The YAML uses the deprecated key prefill-round-robin-balance; remove
that key and instead add router-based load‑balancing settings under the Model
Gateway / Router section—for example set prefill_policy: round_robin and specify
decode_policy (or other documented router policies) in the router configuration
for the model gateway/router entry; update any related routing blocks that
reference prefill-round-robin-balance to use the new prefill_policy and
decode_policy keys so the gateway/router enforces round‑robin prefill behavior.
- Around line 6-18: The YAML uses a third-party SGLang image in model.container
which may mismatch the declared dynamo.version (0.8.1); update model.container
to the official NVIDIA Dynamo SGLang runtime that matches Dynamo v0.8.1 (for
example nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.8.1) so model.container aligns
with dynamo.version and the SGLang runtime bundled with that Dynamo release,
keeping model.path and precision unchanged.
🧹 Nitpick comments (1)
recipes/gb200-fp8/1k1k/ultra-tpt.yaml (1)
77-77: Minor: Trailing whitespace.
Line 77 has trailing whitespace after ep-size: 8.
🧹 Proposed fix
- ep-size: 8
+ ep-size: 8
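A nitpick like this one can be caught before review with a small check. The following is a minimal sketch, not part of the PR or of CodeRabbit: the helper names `find_trailing_whitespace` and `strip_trailing_whitespace` are hypothetical, and the sample text only reuses keys quoted in this review.

```python
def find_trailing_whitespace(text: str) -> list[int]:
    """Return 1-based line numbers whose content ends in a space or tab."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line != line.rstrip(" \t"):
            flagged.append(number)
    return flagged


def strip_trailing_whitespace(text: str) -> str:
    """Remove trailing spaces and tabs from every line, keeping line breaks."""
    cleaned = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")      # line content without its line break
        ending = line[len(body):]       # the original line break, if any
        cleaned.append(body.rstrip(" \t") + ending)
    return "".join(cleaned)


# Illustrative input: the second line carries one trailing space.
sample = "moe-dense-tp-size: 1\nep-size: 8 \n"
print(find_trailing_whitespace(sample))
print(repr(strip_trailing_whitespace(sample)))
```

Running the check over a recipe file before committing would report the offending line numbers; the strip helper is optional and only touches spaces and tabs at line ends.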
dynamo:
  version: 0.8.1

frontend:
  type: dynamo
  enable_multiple_frontends: true
  num_additional_frontends: 3
  nginx_container: nginx

model:
  path: "dsr1-fp8"
  container: "lmsysorg/sglang:v0.5.8-cu130"
  precision: "fp8"
🧩 Analysis chain
🏁 Script executed:
# First, check if the file exists and read the relevant lines
head -20 recipes/gb200-fp8/1k1k/ultra-tpt.yaml
Repository: ishandhanani/srt-slurm
Length of output: 442
🏁 Script executed:
# Check Docker Hub for the sglang container image tag
curl -s "https://hub.docker.com/v2/repositories/lmsysorg/sglang/tags/?page_size=100&name=v0.5.8-cu130" | jq '.results[] | select(.name=="v0.5.8-cu130") | {name, tag_status}'
Repository: ishandhanani/srt-slurm
Length of output: 120
🌐 Web query:
dynamo version 0.8.1 release
💡 Result:
“Dynamo 0.8.1” most commonly refers to Autodesk Dynamo (for Revit):
- Release date: July 3, 2015. [1]
- Highlights: new hardware-accelerated graphics pipeline + colored geometry (e.g., Display.ByGeometryColor), Groups for graph organization, more robust Excel read/write, list/function updates (e.g., List.Clean, improved IF lacing), UI search/browse improvements, localization, and Revit node additions (e.g., FamilyInstance.SetRotation). [1][2]
If instead you meant NVIDIA Dynamo (AI inference platform), there is also a v0.8.1 line, but it’s a different product. [3][4]
Sources: [1] [2] [3] [4]
🏁 Script executed:
# Search for dynamo references in the repository
rg "dynamo" --type yaml --type markdown --type txt -A 2 -B 2 | head -100
Repository: ishandhanani/srt-slurm
Length of output: 5985
🌐 Web query:
sglang dynamo 0.8.1 GB200
💡 Result:
NVIDIA Dynamo v0.8.1 includes an SGLang backend/runtime that explicitly lists GB200 support = ✅ (and WideEP full support on H100s/GB200). [1]
For Dynamo 0.8.1 (release date Jan 23, 2026), NVIDIA publishes these relevant artifacts: [2]
- Container (recommended): nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.8.1 (CUDA 12.9), which bundles SGLang v0.5.6.post2. [2]
- Experimental CUDA 13 container: nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.8.1-cuda13 (CUDA 13.0), also SGLang v0.5.6.post2. [2]
- PyPI wheel: ai-dynamo==0.8.1. [2]
If you meant SGLang (pip) directly on B200/GB200 without Dynamo, there have been compatibility issues discussed (e.g., PyTorch CUDA capability support for B200, sm_100). [3]
🏁 Script executed:
# Check if there's documentation or README about dynamo
fd -e md -e rst -e txt | xargs grep -l "dynamo" 2>/dev/null | head -10
Repository: ishandhanani/srt-slurm
Length of output: 210
Use official NVIDIA Dynamo container for version alignment.
The container image lmsysorg/sglang:v0.5.8-cu130 exists and is active on Docker Hub, and Dynamo v0.8.1 (released Jan 23, 2026) includes explicit GB200 support. However, consider using the official NVIDIA container nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.8.1 instead, which aligns with Dynamo v0.8.1 and bundles the tested SGLang v0.5.6.post2. Using a third-party SGLang image may introduce version skew between your Dynamo and SGLang installations.
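The alignment concern above can be expressed as a simple tag comparison. This is a rough sketch under stated assumptions: the helper names `image_tag` and `tag_matches_version` are hypothetical, and the rule "tag equals the Dynamo version, optionally followed by a suffix such as `-cuda13`" is inferred only from the two image names quoted in this review. It does not handle digest references (`@sha256:...`), and a mismatch only means the tag does not follow the Dynamo release naming, not that the image is incompatible.

```python
def image_tag(image: str) -> str:
    """Return the tag portion of a container reference, or '' if untagged."""
    name = image.rsplit("/", 1)[-1]  # drop registry and namespace
    return name.split(":", 1)[1] if ":" in name else ""


def tag_matches_version(image: str, version: str) -> bool:
    """True when the tag is the version itself or the version plus a '-suffix'."""
    tag = image_tag(image)
    return tag == version or tag.startswith(version + "-")


dynamo_version = "0.8.1"
for image in (
    "lmsysorg/sglang:v0.5.8-cu130",
    "nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.8.1",
    "nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.8.1-cuda13",
):
    print(image, tag_matches_version(image, dynamo_version))
```

A check of this kind could flag recipes where `model.container` and `dynamo.version` drift apart, leaving the decision about whether the skew matters to the recipe author.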
🤖 Prompt for AI Agents
In `@recipes/gb200-fp8/1k1k/ultra-tpt.yaml` around lines 6 - 18, The YAML uses a
third-party SGLang image in model.container which may mismatch the declared
dynamo.version (0.8.1); update model.container to the official NVIDIA Dynamo
SGLang runtime that matches Dynamo v0.8.1 (for example
nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.8.1) so model.container aligns with
dynamo.version and the SGLang runtime bundled with that Dynamo release, keeping
model.path and precision unchanged.
ep-dispatch-algorithm: "static"
moe-dense-tp-size: 1
enable-dp-lm-head: true
prefill-round-robin-balance: true
🧩 Analysis chain
🏁 Script executed:
# First, locate and examine the file to see the context around line 163
find . -name "ultra-tpt.yaml" -type f
Repository: ishandhanani/srt-slurm
Length of output: 105
🌐 Web query:
SGLang prefill-round-robin-balance configuration flag documentation
💡 Result:
--prefill-round-robin-balance is documented in SGLang’s Server Arguments page under Deprecated arguments, with the note that it “is deprecated now” (no further behavior/details are provided in the docs). [1]
If your goal is round-robin balancing for prefill in Prefill/Decode (PD) disaggregation, the current documented way is to set the router’s load-balancing policy (e.g., prefill_policy: round_robin, decode_policy: ...) via the Model Gateway / Router configuration and its round_robin policy. [2]
Sources:
[1] SGLang docs — Server Arguments (Deprecated arguments list) https://docs.sglang.io/advanced_features/server_arguments.html
[2] SGLang docs — Model Gateway / Router (Load Balancing Policies, PD config knobs) https://docs.sglang.io/advanced_features/router.html
🏁 Script executed:
cat -n ./recipes/gb200-fp8/1k1k/ultra-tpt.yaml | sed -n '150,175p'
Repository: ishandhanani/srt-slurm
Length of output: 1163
Replace deprecated prefill-round-robin-balance with router-based load-balancing configuration.
The flag prefill-round-robin-balance is deprecated in SGLang. The modern approach is to configure load-balancing policies via the router configuration (e.g., prefill_policy: round_robin, decode_policy: ...) in the Model Gateway / Router section instead. Remove this flag and migrate to the documented router-based policy configuration.
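To find where the deprecated flag appears across recipes, a line-based scan is usually enough. The sketch below is an assumption-laden illustration: `find_deprecated_keys` is a hypothetical helper, the deprecated set contains only the one key named in this comment, and the scan matches plain `key: value` lines rather than parsing YAML properly.

```python
# Only the key named in this review; extend as needed.
DEPRECATED_KEYS = {"prefill-round-robin-balance"}


def find_deprecated_keys(yaml_text, deprecated=DEPRECATED_KEYS):
    """Return (line_number, key) pairs for deprecated keys in 'key: value' lines."""
    hits = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines, comments, and lines without a key separator.
        if not stripped or stripped.startswith("#") or ":" not in stripped:
            continue
        key = stripped.split(":", 1)[0].strip().strip("\"'")
        if key in deprecated:
            hits.append((number, key))
    return hits


snippet = (
    'ep-dispatch-algorithm: "static"\n'
    "moe-dense-tp-size: 1\n"
    "enable-dp-lm-head: true\n"
    "prefill-round-robin-balance: true\n"
)
print(find_deprecated_keys(snippet))
```

The reported line numbers are relative to the text passed in, so scanning a whole recipe file would give positions comparable to the ones cited in the review.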
🤖 Prompt for AI Agents
In `@recipes/gb200-fp8/1k1k/ultra-tpt.yaml` at line 163, The YAML uses the
deprecated key prefill-round-robin-balance; remove that key and instead add
router-based load‑balancing settings under the Model Gateway / Router
section—for example set prefill_policy: round_robin and specify decode_policy
(or other documented router policies) in the router configuration for the model
gateway/router entry; update any related routing blocks that reference
prefill-round-robin-balance to use the new prefill_policy and decode_policy keys
so the gateway/router enforces round‑robin prefill behavior.
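For readers unfamiliar with the policy being requested, round-robin selection simply hands each new request to the next worker in a fixed rotation. The following is a generic illustration and not SGLang's router code: the class `RoundRobinPolicy` and the worker names are hypothetical.

```python
from itertools import cycle


class RoundRobinPolicy:
    """Pick workers in a fixed rotation, one per request."""

    def __init__(self, workers):
        workers = list(workers)
        if not workers:
            raise ValueError("at least one worker is required")
        self._rotation = cycle(workers)

    def select(self):
        """Return the next worker in the rotation."""
        return next(self._rotation)


# Hypothetical prefill worker names, for illustration only.
policy = RoundRobinPolicy(["prefill-0", "prefill-1", "prefill-2"])
print([policy.select() for _ in range(5)])
```

The rotation ignores load and cache state, which is why routers typically offer it as one policy among several.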
…p8-1k1k Add new LHS datapoint for SGL-GB200-FP8-1k1k
Summary by CodeRabbit