[Bugfix]: fix pp errors when applying flashcomm1#6282
zxdukki wants to merge 2 commits into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
The pull request introduces a new method sync_and_slice_intermediate_tensors to correctly handle intermediate tensor synchronization and slicing, particularly when sequence parallelism (flashcomm1) is enabled. This addresses the bug where the intermediate tensor's copy length was incorrectly calculated, leading to errors in pipeline parallelism setups. The implementation correctly uses ceiling division for copy_len to ensure proper sharding across tensor parallel ranks.
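A minimal sketch of the copy-length math described in this review. The function and variable names below are illustrative, not the actual vllm-ascend implementation: the point is that when sequence parallelism (flashcomm1) scatters the residual across tensor-parallel ranks, each rank holds ceil(num_tokens / tp_size) rows, so the copy length must use ceiling division rather than the raw token count.

```python
# Hypothetical sketch of the slicing logic, not the vllm-ascend source.

def compute_copy_len(num_tokens: int, tp_size: int, enable_sp: bool) -> int:
    # With SP enabled, tokens are sharded across tp_size ranks; use
    # ceiling division (via negated floor division) so the last partial
    # shard is not truncated. Without SP, copy all tokens.
    return -(-num_tokens // tp_size) if enable_sp else num_tokens

def sync_and_slice(tensors: dict, num_tokens: int,
                   tp_size: int, enable_sp: bool) -> dict:
    # Slice each intermediate tensor down to the per-rank copy length.
    copy_len = compute_copy_len(num_tokens, tp_size, enable_sp)
    return {name: rows[:copy_len] for name, rows in tensors.items()}

# Example: 10 tokens over 4 TP ranks with SP on -> ceil(10/4) = 3 rows per rank.
example = {"hidden_states": list(range(16)), "residual": list(range(16))}
sliced = sync_and_slice(example, num_tokens=10, tp_size=4, enable_sp=True)
print(len(sliced["hidden_states"]))  # -> 3
```

With `enable_sp=False` the same call would copy all 10 rows, which is the pre-#6191 behavior for pure pipeline parallelism.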
@lidenghui1110 PTAL
It looks like the refactor in #6191 missed this part. Maybe we need a test case to cover the PP+FC1 situation. And I have some questions here:
1. It seems that
2. Yes. I tried to remove the duplicated code in dummy_run; please take a look at the latest commit. @lidenghui1110
LGTM. I'm not familiar with ubatch/dbo; let's see the code refactor in the future.
Run CI again.
It seems that the community has refactored the model runner using
Good! That's fine.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
@zxdukki Could you please move this PR forward? PP has been unavailable in the main branch for a long time due to this bug.
Sorry for the late reply, I will check it today.
Signed-off-by: zhuohuan <zxdu1997@gmail.com>
It should be fixed in #7896; closing it.
What this PR does / why we need it?
Currently vllm-ascend cannot correctly calculate the size of intermediate tensors in preprocess when enabling PP + SP/flashcomm1 after merging #6191. We should replace is_residual_scattered_for_sp with enable_sp in sync_and_slice_intermediate_tensors.
Does this PR introduce any user-facing change?
No
How was this patch tested?
vllm: v0.14.0
vllm-ascend: main