[R3] Avoid implicit CUDA sync in routed experts DP slicing#24550

Merged
hnyls2002 merged 5 commits into sgl-project:main from zyzshishui:fix-r3-overlap
May 7, 2026

Conversation

@zyzshishui
Contributor

Motivation

Fixes #24514

Modifications

Compute the routed-experts DP local slice indices from forward_batch.global_num_tokens_cpu instead of forward_batch.global_num_tokens_gpu.

This keeps the slice bounds as Python integers while preserving the existing indexing semantics:

  • non-CUDA-graph path: use the prefix sum of per-DP token counts
  • CUDA graph path: use dp_rank * cuda_graph_batch
  • DeepEP path remains unchanged, since capture already all-gathers into the head of the per-rank buffer
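A minimal sketch of the idea behind the fix (the function below is illustrative, not SGLang's actual code; only the names `global_num_tokens_cpu`, `dp_rank`, and `cuda_graph_batch` come from the PR description). The key point is that slice bounds derived from host-side Python ints never touch the GPU, whereas reading bounds out of a GPU tensor forces an implicit device-to-host sync that breaks overlap:

```python
def local_slice_bounds(global_num_tokens_cpu, dp_rank, cuda_graph_batch=None):
    """Return (start, end) as Python ints for this DP rank's token slice.

    global_num_tokens_cpu: list of per-DP-rank token counts, already on host.
    cuda_graph_batch: padded per-rank batch size when replaying a CUDA graph.
    """
    if cuda_graph_batch is not None:
        # CUDA graph path: every rank is padded to the captured batch size.
        start = dp_rank * cuda_graph_batch
        end = start + cuda_graph_batch
    else:
        # Non-CUDA-graph path: prefix sum of per-DP-rank token counts.
        start = sum(global_num_tokens_cpu[:dp_rank])
        end = start + global_num_tokens_cpu[dp_rank]
    return start, end
```

Because `start` and `end` are plain ints, indexing like `buffer[start:end]` enqueues work asynchronously; deriving the same bounds from `global_num_tokens_gpu` would require a `.item()`-style read that stalls the stream.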

Accuracy Tests

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dbefc8d428

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Three comment threads on python/sglang/srt/state_capturer/routed_experts.py (all marked Outdated)
@hnyls2002
Collaborator

/rerun-test test_return_routed_experts.py

@github-actions
Contributor

github-actions Bot commented May 6, 2026

4-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/rl/test_return_routed_experts.py

@hnyls2002 hnyls2002 merged commit 4a279d9 into sgl-project:main May 7, 2026
85 of 101 checks passed
@zyzshishui zyzshishui deleted the fix-r3-overlap branch May 7, 2026 02:20
ltcs11 added a commit to ltcs11/sglang that referenced this pull request May 7, 2026
* main: (894 commits)
  [Bug Fix] Fix RunAI streamer: corrupted weights, missing quant init, and broken URIs for multimodal models (sgl-project#22715)
  [Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm (sgl-project#24268)
  propagate pytest exit code from test __main__ entries (sgl-project#24487)
  [R3] Avoid implicit CUDA sync in routed experts DP slicing (sgl-project#24550)
  Add ChatCompletionRequest-style support to /v1/tokenize (sgl-project#23981)
  Support Triton MLA FP8 KV cache (sgl-project#20479)
  [diffusion] chore: align LTX-2 with official (sgl-project#24313)
  Expand support matrix for pypi wheel release (sgl-project#24565)
  [codex] Optimize Z-Image packed QKV (sgl-project#24117)
  [Misc] Fix breaking weight checker test (sgl-project#24553)
  [LoRA] Fix qkv_proj LoRA buffer sizing when tp_size > num_key_value_heads (sgl-project#24420)
  ci: bump test_mimo_models.py est_time 330 → 610 (sgl-project#24551)
  [CI] Temporarily disable marco/mcdse-2b-v1 in test_embedding_models (sgl-project#24279)
  Improve metrics, observability, and PD deploy tooling (sgl-project#24521)
  Fix diffusion fallback guards and validation (sgl-project#23335)
  [PD] Prevent update_status to Failed from cleared entries (sgl-project#24539)
  [CP] Register KV cache allgather buffer with symmetric memory (sgl-project#24040)
  Support getting checksums in weight checker (sgl-project#24537)
  Refactor buffer patterns in weight checker (sgl-project#24538)
  Add unit and end-to-end tests for weight checker (sgl-project#24536)
  ...

# Conflicts:
#	python/sglang/srt/managers/scheduler.py
#	python/sglang/srt/model_executor/model_runner.py
LLThomas pushed a commit to LLThomas/sglang that referenced this pull request May 8, 2026


Development

Successfully merging this pull request may close these issues.

[R3] Non-DeepEP DP attention codepath causes implicit CUDA sync in routed experts overlap

3 participants