Feat/dp attn 0514 by shiyu7 · Pull Request #490 · bytedance-iaas/sglang

shiyu7 · 2026-05-14T09:33:34Z

Motivation

Modifications

Accuracy Tests

Speed Tests and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review and Merge Process

Ping Merge Oncalls to start the process. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

gemini-code-assist

Code Review

This pull request updates the distributed communication logic in communicator.py and deepseek_v4.py to handle DP buffer group selection more dynamically. Specifically, it introduces a check to determine whether to use the tensor parallel group or the attention tensor parallel group based on the relationship between the TP world size and the attention DP size. Feedback was provided to ensure that the group selection logic in the DeepSeek-V4 model remains consistent with the implementation in the layer communicator to avoid potential issues with symmetric memory allocation.

gemini-code-assist · 2026-05-14T09:36:00Z

+            hidden_states, global_hidden_states = (
+                get_local_dp_buffer(get_attention_tp_group()),
+                hidden_states,
+            )


For consistency with the logic in LayerCommunicator and to ensure correct symmetric memory allocation when the tensor parallel size equals the attention data parallel size, the group for the local DP buffer should be selected based on whether tp_size == dp_size. This pattern is followed in communicator.py for all dp_scatter operations.

Suggested change

-            hidden_states, global_hidden_states = (
-                get_local_dp_buffer(get_attention_tp_group()),
-                hidden_states,
-            )
+            if get_tensor_model_parallel_world_size() == get_attention_dp_size():
+                group = get_tp_group()
+            else:
+                group = get_attention_tp_group()
+            hidden_states, global_hidden_states = (
+                get_local_dp_buffer(group),
+                hidden_states,
+            )
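
The selection rule in the suggestion can be checked in isolation. The following is a minimal sketch, not SGLang code: `FakeGroup` and `select_dp_buffer_group` are hypothetical stand-ins introduced here for illustration, and the sizes are passed in as plain integers instead of being read from `get_tensor_model_parallel_world_size()` and `get_attention_dp_size()`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeGroup:
    """Hypothetical stand-in for a process-group handle (not an SGLang class)."""

    name: str


def select_dp_buffer_group(tp_world_size, attn_dp_size, tp_group, attn_tp_group):
    """Return the group to pass to the local DP buffer helper.

    Mirrors the rule from the review comment: use the full TP group when the
    TP world size equals the attention DP size, otherwise the attention TP group.
    """
    if tp_world_size == attn_dp_size:
        return tp_group
    return attn_tp_group


tp_group = FakeGroup("tp")
attn_tp_group = FakeGroup("attention_tp")

# TP size equals attention DP size: the full TP group is selected.
print(select_dp_buffer_group(8, 8, tp_group, attn_tp_group).name)
# TP size differs from attention DP size: the attention TP group is selected.
print(select_dp_buffer_group(8, 2, tp_group, attn_tp_group).name)
```

Factoring the choice into one helper is one possible way to keep deepseek_v4.py and communicator.py from drifting apart, which is the consistency concern raised in the review. Whether the PR adopts such a helper is not stated here.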

fix: fix get_global_dp_buffer function param

15adcc0

github-actions Bot added the deepseek label May 14, 2026

zhangxiaolei123456 merged commit 5ec497c into bytedance-iaas:main_deepseek_dp_attention May 14, 2026
1 check passed

gemini-code-assist Bot reviewed May 14, 2026

zhangxiaolei123456 merged 1 commit into bytedance-iaas:main_deepseek_dp_attention from shiyu7:feat/dp_attn_0514


2 participants
