[Feat] Adapt FlashComm2 with PCP#5114
Conversation
Signed-off-by: daishixun <dsxsteven@sina.com>
There was a problem hiding this comment.
Code Review
This pull request adapts FlashComm2 to work with Prefill Context Parallelism (PCP) by updating the logic for creating shared weight groups. The core change correctly incorporates the PCP size into the global rank calculation. I've suggested a minor refactoring to improve the readability of this calculation. A key concern is the lack of unit tests for the modified _create_shared_weight_group function. The existing tests do not seem to cover the conditions under which this function is called, which is a significant gap for such a critical part of the distributed setup. I strongly recommend adding tests to validate the new rank calculation logic with PCP enabled.
| base = ( | ||
| dp_idx * global_pp_size * global_pcp_size * global_tp_size | ||
| + pp_idx * global_pcp_size * global_tp_size | ||
| + pcp_idx * global_tp_size | ||
| ) |
There was a problem hiding this comment.
The calculation for base is functionally correct, but it can be refactored for better readability and maintainability. By factoring out global_tp_size and nesting the multiplications, the logic for calculating the rank based on the (dp, pp, pcp) indices becomes clearer. This also slightly improves efficiency by reducing the number of multiplications.
base = ((dp_idx * global_pp_size + pp_idx) * global_pcp_size + pcp_idx) * global_tp_size|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
Signed-off-by: daishixun <dsxsteven@sina.com>
Signed-off-by: daishixun <dsxsteven@sina.com>
Signed-off-by: daishixun <dsxsteven@sina.com>
Signed-off-by: daishixun <dsxsteven@sina.com>
Signed-off-by: daishixun <dsxsteven@sina.com>
Signed-off-by: daishixun <dsxsteven@sina.com>
|
https://github.com/vllm-project/vllm-ascend/actions/runs/20332866607/job/58421576990?pr=5114 |
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
|
@wangxiyuan @dsxsteven |
|
sure |
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
What this PR does / why we need it?
Currently, enabling FlashComm2 with
o_shared lineardoes not support enabling PCP simultaneously. This pull request addresses this issue.To enable Flashcomm2 use o_shared linear, we reference on #4188
Does this PR introduce any user-facing change?
NO
How was this patch tested?