
[NPU] fix normal DeepEP mode num_tokens_per_rdma_rank error caused by none #22972

Open
ZeyuanChen2000 wants to merge 2 commits into sgl-project:main from ZeyuanChen2000:sgl-deepep


@ZeyuanChen2000
Contributor

Motivation

When expert_distribution_recorder_mode = "per_token" and the NPU is deployed on a single server, the on_deepep_dispatch_normal method of the expert distribution recorder raises an AttributeError.

Modifications

This change adds a None check for num_tokens_per_rdma_rank so that the RDMA parameter is handled correctly in a single-node environment.
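
The kind of guard described here can be sketched as follows. This is a minimal illustration, not the code from this PR: `FakeTensor`, `to_list_or_none`, and `SketchRecorder` are hypothetical names, and the method signature is an assumption made for the example. It only shows why calling a tensor method on None fails and how a check avoids it.

```python
from typing import Any, Dict, List, Optional


class FakeTensor:
    """Hypothetical stand-in for a device tensor exposing .cpu().tolist()."""

    def __init__(self, values: List[int]) -> None:
        self._values = list(values)

    def cpu(self) -> "FakeTensor":
        # A real tensor would copy to host memory; here it is a no-op.
        return self

    def tolist(self) -> List[int]:
        return list(self._values)


def to_list_or_none(tensor: Optional[FakeTensor]) -> Optional[List[int]]:
    # Without this guard, None.cpu() raises AttributeError, which is the
    # failure mode the PR description reports for the single-node case.
    if tensor is None:
        return None
    return tensor.cpu().tolist()


class SketchRecorder:
    """Hypothetical recorder; not the actual sglang class."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def on_deepep_dispatch_normal(
        self,
        layer_idx: int,
        num_tokens_per_rank: FakeTensor,
        num_tokens_per_rdma_rank: Optional[FakeTensor],
        num_tokens_per_expert: FakeTensor,
    ) -> None:
        # Assumed signature for illustration only.
        self.records.append(
            {
                "layer_id": layer_idx,
                "num_tokens_per_rank": num_tokens_per_rank.cpu().tolist(),
                # May be None when no RDMA ranks exist (single node).
                "num_tokens_per_rdma_rank": to_list_or_none(
                    num_tokens_per_rdma_rank
                ),
                "num_tokens_per_expert": num_tokens_per_expert.cpu().tolist(),
            }
        )
```

With this sketch, passing None for num_tokens_per_rdma_rank stores None in the record instead of raising, while the multi-node path is unchanged.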

Accuracy Tests

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds a null check for num_tokens_per_rdma_rank in the on_deepep_dispatch_normal function to prevent potential attribute errors in environments where RDMA is not initialized. I have no feedback to provide.

@sglang-npu-bot
Collaborator

/tag-and-rerun-ci

@ZeyuanChen2000 ZeyuanChen2000 changed the title [NPU] fix normal DeepEP mode num_tokens_per_rdma_rank error caused by… [NPU] fix normal DeepEP mode num_tokens_per_rdma_rank error caused by None Apr 27, 2026
@ZeyuanChen2000 ZeyuanChen2000 changed the title [NPU] fix normal DeepEP mode num_tokens_per_rdma_rank error caused by None [NPU] fix normal DeepEP mode num_tokens_per_rdma_rank error caused by none Apr 27, 2026
@ZeyuanChen2000
Contributor Author

/rerun-failed-ci

@ZeyuanChen2000
Contributor Author

/rerun-failed-ci

@ZeyuanChen2000
Contributor Author

/rerun-failed-ci

@ZeyuanChen2000
Contributor Author

/rerun-failed-ci

@ZeyuanChen2000
Contributor Author

/rerun-failed-ci


3 participants