
[Bugfix]: fix pp errors when applying flashcomm1 #6282

Closed

zxdukki wants to merge 2 commits into vllm-project:main from zxdukki:dev-v0.14.0-pp-fix

Conversation

@zxdukki
Contributor

@zxdukki zxdukki commented Jan 26, 2026

What this PR does / why we need it?

Currently vllm-ascend cannot correctly calculate the size of intermediate tensors in `preprocess` when PP + SP/flashcomm1 is enabled, after merging #6191.

We should replace `is_residual_scattered_for_sp` with `enable_sp` in `sync_and_slice_intermediate_tensors`.
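For context, a minimal, hypothetical sketch of the sizing logic in question (the helper name is mine; identifiers otherwise follow the PR discussion, and the actual vllm-ascend code differs in detail):

```python
def intermediate_copy_len(num_tokens: int, tp_size: int, enable_sp: bool) -> int:
    """Hypothetical helper: how many tokens of the residual this rank holds."""
    # Under SP/flashcomm1 the residual is scattered across TP ranks, so each
    # rank holds only a ceil(num_tokens / tp_size) slice. Checking
    # is_residual_scattered_for_sp returned False in the PP preprocess path,
    # which produced a full-length copy and a shape mismatch; gating on
    # enable_sp sizes the slice correctly.
    if enable_sp:
        return (num_tokens + tp_size - 1) // tp_size  # ceiling division
    return num_tokens
```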


Does this PR introduce any user-facing change?

No

How was this patch tested?

vllm: v0.14.0
vllm-ascend: main

@zxdukki zxdukki requested a review from MengqingCao as a code owner January 26, 2026 13:31
@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write a clear commit message and fill in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

The pull request introduces a new method `sync_and_slice_intermediate_tensors` to correctly handle intermediate tensor synchronization and slicing, particularly when sequence parallelism (flashcomm1) is enabled. This addresses the bug where the intermediate tensor's copy length was incorrectly calculated, leading to errors in pipeline parallelism setups. The implementation correctly uses ceiling division for `copy_len` to ensure proper sharding across tensor parallel ranks.
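For reference, the overall shape of such a method might look like the sketch below, modeled on vLLM's GPU model runner; `self.enable_sp` is a hypothetical flag, and the real vllm-ascend code differs in detail:

```python
from vllm.sequence import IntermediateTensors

def sync_and_slice_intermediate_tensors(self, num_tokens: int,
                                        intermediate_tensors, sync_self: bool):
    tp = self.vllm_config.parallel_config.tensor_parallel_size
    # Ceiling division: tokens held per rank when the residual is scattered.
    sp_len = (num_tokens + tp - 1) // tp
    enable_sp = self.enable_sp  # hypothetical flag: SP/flashcomm1 active
    if sync_self:
        # Copy tensors received from the previous PP rank into the
        # persistent buffers; the scattered residual uses the short slice.
        for key, tensor in intermediate_tensors.items():
            copy_len = sp_len if (key == "residual" and enable_sp) else num_tokens
            self.intermediate_tensors[key][:copy_len].copy_(
                tensor[:copy_len], non_blocking=True)
    # Return views of the persistent buffers sized for this batch.
    return IntermediateTensors({
        key: value[:sp_len] if (key == "residual" and enable_sp)
        else value[:num_tokens]
        for key, value in self.intermediate_tensors.items()
    })
```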

@jianzs
Collaborator

jianzs commented Jan 27, 2026

@lidenghui1110 PTAL

@jianzs jianzs added the ready and ready-for-test labels Jan 27, 2026
@lidenghui1110
Contributor

lidenghui1110 commented Jan 27, 2026

> @lidenghui1110 PTAL

It looks like the refactor in #6191 missed this part. Maybe we need a test case to cover the PP + fc1 situation. And I have some questions here:

  1. Since we are trying to refactor the model runner to be more like GPUModelRunner, shall we use the same logic as vLLM via `_determine_batch_execution_and_padding`?
  2. This PR only uses `sync_and_slice_intermediate_tensors` in `preprocess`; can we use the same logic in `dummy_run`? The code is duplicated, and in vLLM both paths use `sync_and_slice_intermediate_tensors` (see the sketch below).
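For illustration, sharing the helper between the two paths might look like this (hypothetical call sites, following vLLM's pattern):

```python
from vllm.distributed import get_pp_group

# execute_model / preprocess path: copy what the previous PP rank sent,
# then hand the model correctly sized views.
if not get_pp_group().is_first_rank:
    intermediate_tensors = self.sync_and_slice_intermediate_tensors(
        num_input_tokens, intermediate_tensors, sync_self=True)

# dummy_run path: nothing was received, so skip the copy and only slice
# the persistent buffers to the dummy batch size.
if not get_pp_group().is_first_rank:
    intermediate_tensors = self.sync_and_slice_intermediate_tensors(
        num_tokens, None, sync_self=False)
```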

@zxdukki zxdukki force-pushed the dev-v0.14.0-pp-fix branch from b7bdc37 to c804ae7 on January 27, 2026 10:01
@zxdukki
Contributor Author

zxdukki commented Jan 27, 2026

> @lidenghui1110 PTAL
>
> It looks like the refactor in #6191 missed this part. Maybe we need a test case to cover the PP + fc1 situation. And I have some questions here:
>
>   1. Since we are trying to refactor the model runner to be more like GPUModelRunner, shall we use the same logic as vLLM via `_determine_batch_execution_and_padding`?
>   2. This PR only uses `sync_and_slice_intermediate_tensors` in `preprocess`; can we use the same logic in `dummy_run`? The code is duplicated, and in vLLM both paths use `sync_and_slice_intermediate_tensors`.

1. It seems that `_determine_batch_execution_and_padding` in vLLM includes the logic of (1) padding for SP, (2) `dispatch_cudagraph`, (3) the ubatch sync/handshake for DP, and (4) `_sync_metadata_across_dp`. In my opinion, we could refactor the model runner to use this function after integrating the ubatch/DBO logic into vllm-ascend, i.e., use `coordinate_batch_across_dp` instead of `_sync_metadata_across_dp` and move these pieces into `_determine_batch_execution_and_padding` (roughly outlined below).

2. Yes, I removed the duplicated code in `dummy_run`; please take a look at the latest commit. @lidenghui1110
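Roughly, the structure described in point 1 (a paraphrased outline only; helper names other than `_sync_metadata_across_dp` and the cudagraph dispatch are hypothetical, not vLLM's actual code):

```python
def _determine_batch_execution_and_padding(self, num_tokens: int):
    # (3)+(4): ubatch handshake and metadata sync across DP ranks
    num_tokens_across_dp = self._sync_metadata_across_dp(num_tokens)
    # (1): pad the token count so SP can split it evenly across TP ranks
    num_padded_tokens = self._pad_for_sequence_parallel(num_tokens)
    # (2): choose the cudagraph runtime mode for the padded size
    cudagraph_mode = self._dispatch_cudagraph(num_padded_tokens)
    return num_padded_tokens, num_tokens_across_dp, cudagraph_mode
```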

@lidenghui1110
Contributor

> @lidenghui1110 PTAL
>
> It looks like the refactor in #6191 missed this part. Maybe we need a test case to cover the PP + fc1 situation. And I have some questions here:
>
>   1. Since we are trying to refactor the model runner to be more like GPUModelRunner, shall we use the same logic as vLLM via `_determine_batch_execution_and_padding`?
>   2. This PR only uses `sync_and_slice_intermediate_tensors` in `preprocess`; can we use the same logic in `dummy_run`? The code is duplicated, and in vLLM both paths use `sync_and_slice_intermediate_tensors`.
>
> 1. It seems that `_determine_batch_execution_and_padding` in vLLM includes the logic of (1) padding for SP, (2) `dispatch_cudagraph`, (3) the ubatch sync/handshake for DP, and (4) `_sync_metadata_across_dp`. In my opinion, we could refactor the model runner to use this function after integrating the ubatch/DBO logic into vllm-ascend, i.e., use `coordinate_batch_across_dp` instead of `_sync_metadata_across_dp` and move these pieces into `_determine_batch_execution_and_padding`.
>
> 2. Yes, I removed the duplicated code in `dummy_run`; please take a look at the latest commit. @lidenghui1110

LGTM. I'm not familiar with ubatch/DBO; let's revisit this in a future refactor.

@zxdukki zxdukki force-pushed the dev-v0.14.0-pp-fix branch from c804ae7 to 805a6b5 on January 27, 2026 12:58
@zxdukki
Contributor Author

zxdukki commented Jan 27, 2026

Run CI again.

@zxdukki zxdukki force-pushed the dev-v0.14.0-pp-fix branch from 805a6b5 to ef8c973 on January 28, 2026 08:53
@zxdukki
Contributor Author

zxdukki commented Jan 28, 2026

It seems that the community has refactored the model runner using `_determine_batch_execution_and_padding`! @lidenghui1110
So we do not need to pad the tokens anymore. However, we should still replace `is_residual_scattered_for_sp` with `enable_sp` in `sync_and_slice_intermediate_tensors`.

@zxdukki zxdukki force-pushed the dev-v0.14.0-pp-fix branch from ef8c973 to 9054174 on January 28, 2026 09:55
@lidenghui1110
Contributor

> It seems that the community has refactored the model runner using `_determine_batch_execution_and_padding`! @lidenghui1110 So we do not need to pad the tokens anymore. However, we should still replace `is_residual_scattered_for_sp` with `enable_sp` in `sync_and_slice_intermediate_tensors`.

Good! That's fine.

@zxdukki zxdukki force-pushed the dev-v0.14.0-pp-fix branch 3 times, most recently from 4153d4d to 17ef6b0 on January 29, 2026 11:50
@github-actions
Contributor

github-actions Bot commented Feb 1, 2026

This pull request has conflicts; please resolve them before we can evaluate the pull request.

@github-actions
Contributor

This pull request has conflicts; please resolve them before we can evaluate the pull request.

@lidenghui1110
Contributor

@zxdukki Could you please move this PR forward? PP has been unavailable in the main branch for a long time due to this bug.

@zxdukki
Contributor Author

zxdukki commented Mar 2, 2026

> @zxdukki Could you please move this PR forward? PP has been unavailable in the main branch for a long time due to this bug.

Sorry for the late reply; I will check it today.

@zxdukki zxdukki force-pushed the dev-v0.14.0-pp-fix branch from 58ddf1d to cc43032 on March 3, 2026 06:04
Signed-off-by: zhuohuan <zxdu1997@gmail.com>
@zxdukki zxdukki force-pushed the dev-v0.14.0-pp-fix branch from e9d9620 to 1910991 on March 4, 2026 03:42
@github-actions
Contributor

This pull request has conflicts; please resolve them before we can evaluate the pull request.

@zxdukki
Contributor Author

zxdukki commented Apr 28, 2026

It should be fixed in #7896, so I'm closing this.

@zxdukki zxdukki closed this Apr 28, 2026

Labels

merge-conflicts, ready, ready-for-test

3 participants