[Feat] Adapt FlashComm2 with PCP #5114

Closed
dsxsteven wants to merge 9 commits into vllm-project:main from dsxsteven:main_1217_adaptFlashComm2

@dsxsteven dsxsteven commented Dec 17, 2025

What this PR does / why we need it?

Currently, FlashComm2 with the o_shared linear layer cannot be enabled together with PCP. This pull request removes that limitation.
For how to enable FlashComm2 with o_shared linear, see #4188.

Does this PR introduce any user-facing change?

NO

How was this patch tested?

Signed-off-by: daishixun <dsxsteven@sina.com>

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adapts FlashComm2 to work with Prefill Context Parallelism (PCP) by updating the logic for creating shared weight groups. The core change correctly incorporates the PCP size into the global rank calculation. I've suggested a minor refactoring to improve the readability of this calculation. A key concern is the lack of unit tests for the modified _create_shared_weight_group function. The existing tests do not seem to cover the conditions under which this function is called, which is a significant gap for such a critical part of the distributed setup. I strongly recommend adding tests to validate the new rank calculation logic with PCP enabled.

Comment on lines +173 to +177
base = (
    dp_idx * global_pp_size * global_pcp_size * global_tp_size
    + pp_idx * global_pcp_size * global_tp_size
    + pcp_idx * global_tp_size
)

high

The calculation for base is functionally correct, but it can be refactored for better readability and maintainability. By factoring out global_tp_size and nesting the multiplications, the logic for calculating the rank based on the (dp, pp, pcp) indices becomes clearer. This also slightly improves efficiency by reducing the number of multiplications.

                    base = ((dp_idx * global_pp_size + pp_idx) * global_pcp_size + pcp_idx) * global_tp_size
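
To make the equivalence of the two forms concrete, here is a minimal Python sketch that compares them over a small grid of indices. It is an illustration only, not the code from the PR: the helper names (base_expanded, base_nested, all_bases), the shortened parameter names, and the sizes are hypothetical, and the final check assumes that each base starts a contiguous block of global_tp_size ranks, which the snippets above suggest but do not state.

```python
from itertools import product


def base_expanded(dp_idx, pp_idx, pcp_idx, pp_size, pcp_size, tp_size):
    """Base rank as written in the diff (sum of three products)."""
    return (
        dp_idx * pp_size * pcp_size * tp_size
        + pp_idx * pcp_size * tp_size
        + pcp_idx * tp_size
    )


def base_nested(dp_idx, pp_idx, pcp_idx, pp_size, pcp_size, tp_size):
    """Base rank in the nested form suggested in the review comment."""
    return ((dp_idx * pp_size + pp_idx) * pcp_size + pcp_idx) * tp_size


def all_bases(dp_size, pp_size, pcp_size, tp_size, fn):
    """Enumerate base ranks for every (dp, pp, pcp) index triple."""
    return [
        fn(dp, pp, pcp, pp_size, pcp_size, tp_size)
        for dp, pp, pcp in product(
            range(dp_size), range(pp_size), range(pcp_size)
        )
    ]


# Hypothetical sizes, chosen only for illustration.
DP, PP, PCP, TP = 2, 2, 2, 4

expanded = all_bases(DP, PP, PCP, TP, base_expanded)
nested = all_bases(DP, PP, PCP, TP, base_nested)

# Both forms give the same base for every index triple.
assert expanded == nested

# Assumption: each base starts a contiguous block of TP ranks.
# Under that assumption the blocks cover every rank exactly once.
ranks = sorted(b + tp for b in expanded for tp in range(TP))
assert ranks == list(range(DP * PP * PCP * TP))

print(expanded)  # [0, 4, 8, 12, 16, 20, 24, 28]
```

The expanded form uses six multiplications per call and the nested form uses three, which is the efficiency difference the comment refers to.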

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Signed-off-by: daishixun <dsxsteven@sina.com>
@weijinqian0 weijinqian0 added the ready (read for review) and ready-for-test (start test by label for PR) labels Dec 17, 2025
Signed-off-by: daishixun <dsxsteven@sina.com>
Signed-off-by: daishixun <dsxsteven@sina.com>
Signed-off-by: daishixun <dsxsteven@sina.com>
Signed-off-by: daishixun <dsxsteven@sina.com>
Signed-off-by: daishixun <dsxsteven@sina.com>
Signed-off-by: daishixun <dsxsteven@sina.com>
dsxsteven commented Dec 18, 2025

https://github.com/vllm-project/vllm-ascend/actions/runs/20332866607/job/58421576990?pr=5114
All e2e tests passed in this run; recording it here so it does not have to be rerun because of the conflict.

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

wangxiyuan previously approved these changes Dec 19, 2025
zzhx1 commented Dec 19, 2025

@wangxiyuan @dsxsteven
Can you temporarily pause merging this PR? I am refactoring the FlashComm2 interfaces in #5181, which also refactors _create_shared_weight_group in parallel_state. Please plan to integrate PCP after that.

@wangxiyuan

sure

@wangxiyuan wangxiyuan dismissed their stale review December 19, 2025 06:17

wait for refactor

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions bot commented Jan 7, 2026

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@dsxsteven dsxsteven closed this Jan 26, 2026
@dsxsteven dsxsteven deleted the main_1217_adaptFlashComm2 branch March 10, 2026 03:34

Labels

merge-conflicts, ready (read for review), ready-for-test (start test by label for PR)

4 participants