Skip to content

[Bugfix] Fix the bug in initializing the shared_weight communication domain in sfa-cp, and fix the mtp weight load in pp>1 situation#4913

Merged
jianzs merged 3 commits intovllm-project:mainfrom
zzhx1:flashcomm_oshared
Dec 15, 2025
Merged

[Bugfix] Fix the bug in initializing the shared_weight communication domain in sfa-cp, and fix the mtp weight load in pp>1 situation#4913
jianzs merged 3 commits intovllm-project:mainfrom
zzhx1:flashcomm_oshared

Conversation

@zzhx1
Copy link
Copy Markdown
Contributor

@zzhx1 zzhx1 commented Dec 11, 2025

What this PR does / why we need it?

In PR #4188, a small bug was introduced that caused sfa-cp to be unable to find the global_pp_size parameter during initialization, and this PR fixed the issue.

Does this PR introduce any user-facing change?

How was this patch tested?

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly fixes a bug where initializing the shared_weight communication domain would fail due to undefined variables. The change moves the variable definitions to a higher scope, ensuring they are always available when needed. While this fixes the immediate bug, I've identified a potential resource leak related to the _SHARED_WEIGHT group initialization that this change makes more likely to occur. Please see my detailed comment.

Comment thread vllm_ascend/distributed/parallel_state.py Outdated
@github-actions
Copy link
Copy Markdown
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

@zzhx1 zzhx1 force-pushed the flashcomm_oshared branch from efcb21d to 5cc0f2c Compare December 11, 2025 10:07
@ApsarasX ApsarasX added ready read for review ready-for-test start test by label for PR labels Dec 11, 2025
@zzhx1 zzhx1 force-pushed the flashcomm_oshared branch from 57a9d61 to 9ecd1ee Compare December 12, 2025 05:41
@zzhx1 zzhx1 changed the title [Bugfix] Fix the bug in initializing the shared_weight communication domain in sfa-cp. [Bugfix] Fix the bug in initializing the shared_weight communication domain in sfa-cp, and fix the mtp weight load in pp>1 situation Dec 12, 2025
@github-actions
Copy link
Copy Markdown
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: zzhx1 <zzh_201018@outlook.com>
@zzhx1 zzhx1 force-pushed the flashcomm_oshared branch from 4ab52cc to fffe348 Compare December 14, 2025 17:36
@jianzs jianzs merged commit e16444f into vllm-project:main Dec 15, 2025
14 of 16 checks passed
chenaoxuan pushed a commit to chenaoxuan/vllm-ascend that referenced this pull request Dec 20, 2025
…domain in sfa-cp, and fix the mtp weight load in pp>1 situation (vllm-project#4913)

### What this PR does / why we need it?
In PR vllm-project#4188, a small bug was introduced that caused sfa-cp to be unable
to find the global_pp_size parameter during initialization, and this PR
fixed the issue.

- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

Signed-off-by: zzhx1 <zzh_201018@outlook.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
…domain in sfa-cp, and fix the mtp weight load in pp>1 situation (vllm-project#4913)

### What this PR does / why we need it?
In PR vllm-project#4188, a small bug was introduced that caused sfa-cp to be unable
to find the global_pp_size parameter during initialization, and this PR
fixed the issue.

- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

Signed-off-by: zzhx1 <zzh_201018@outlook.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
…domain in sfa-cp, and fix the mtp weight load in pp>1 situation (vllm-project#4913)

### What this PR does / why we need it?
In PR vllm-project#4188, a small bug was introduced that caused sfa-cp to be unable
to find the global_pp_size parameter during initialization, and this PR
fixed the issue.

- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

Signed-off-by: zzhx1 <zzh_201018@outlook.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready read for review ready-for-test start test by label for PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants