
[Bugfix]: fix pp errors when applying flashcomm1 #6282

Closed

zxdukki wants to merge 2 commits into vllm-project:main from zxdukki:dev-v0.14.0-pp-fix

Conversation

@zxdukki
Contributor

@zxdukki zxdukki commented Jan 26, 2026

What this PR does / why we need it?

Currently vllm-ascend cannot correctly calculate the size of intermediate tensors in `preprocess` when PP + SP/flashcomm1 is enabled, after merging #6191.

We should replace `is_residual_scattered_for_sp` with `enable_sp` in `sync_and_slice_intermediate_tensors`.
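For context, a minimal, hypothetical sketch of the sizing logic in question (the helper name is mine; identifiers otherwise follow the PR discussion, and the actual vllm-ascend code differs in detail):

```python
def intermediate_copy_len(num_tokens: int, tp_size: int, enable_sp: bool) -> int:
    """Hypothetical helper: how many tokens of the residual this rank holds."""
    # Under SP/flashcomm1 the residual is scattered across TP ranks, so each
    # rank holds only a ceil(num_tokens / tp_size) slice. Checking
    # is_residual_scattered_for_sp returned False in the PP preprocess path,
    # which produced a full-length copy and a shape mismatch; gating on
    # enable_sp sizes the slice correctly.
    if enable_sp:
        return (num_tokens + tp_size - 1) // tp_size  # ceiling division
    return num_tokens
```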


Does this PR introduce any user-facing change?

No

How was this patch tested?

vllm: v0.14.0
vllm-ascend: main

@zxdukki zxdukki requested a review from MengqingCao as a code owner January 26, 2026 13:31
@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write a clear commit message and fill in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

The pull request introduces a new method `sync_and_slice_intermediate_tensors` to correctly handle intermediate tensor synchronization and slicing, particularly when sequence parallelism (flashcomm1) is enabled. This addresses the bug where the intermediate tensor's copy length was incorrectly calculated, leading to errors in pipeline parallelism setups. The implementation correctly uses ceiling division for `copy_len` to ensure proper sharding across tensor parallel ranks.
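For reference, the overall shape of such a method might look like the sketch below, modeled on vLLM's GPU model runner; `self.enable_sp` is a hypothetical flag, and the real vllm-ascend code differs in detail:

```python
from vllm.sequence import IntermediateTensors

def sync_and_slice_intermediate_tensors(self, num_tokens: int,
                                        intermediate_tensors, sync_self: bool):
    tp = self.vllm_config.parallel_config.tensor_parallel_size
    # Ceiling division: tokens held per rank when the residual is scattered.
    sp_len = (num_tokens + tp - 1) // tp
    enable_sp = self.enable_sp  # hypothetical flag: SP/flashcomm1 active
    if sync_self:
        # Copy tensors received from the previous PP rank into the
        # persistent buffers; the scattered residual uses the short slice.
        for key, tensor in intermediate_tensors.items():
            copy_len = sp_len if (key == "residual" and enable_sp) else num_tokens
            self.intermediate_tensors[key][:copy_len].copy_(
                tensor[:copy_len], non_blocking=True)
    # Return views of the persistent buffers sized for this batch.
    return IntermediateTensors({
        key: value[:sp_len] if (key == "residual" and enable_sp)
        else value[:num_tokens]
        for key, value in self.intermediate_tensors.items()
    })
```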

@jianzs
Collaborator

jianzs commented Jan 27, 2026

@lidenghui1110 PTAL

@jianzs jianzs added the ready and ready-for-test labels Jan 27, 2026
@lidenghui1110
Contributor

lidenghui1110 commented Jan 27, 2026

> @lidenghui1110 PTAL

It looks like the refactor in #6191 missed this part. Maybe we need a test case to cover the PP + fc1 situation. And I have some questions here:

  1. Since we are trying to refactor the model runner to be more like GPUModelRunner, shall we use the same logic as vLLM via `_determine_batch_execution_and_padding`?
  2. This PR only uses `sync_and_slice_intermediate_tensors` in `preprocess`; can we use the same logic in `dummy_run`? The code is duplicated, and in vLLM both paths use `sync_and_slice_intermediate_tensors` (see the sketch below).
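For illustration, sharing the helper between the two paths might look like this (hypothetical call sites, following vLLM's pattern):

```python
from vllm.distributed import get_pp_group

# execute_model / preprocess path: copy what the previous PP rank sent,
# then hand the model correctly sized views.
if not get_pp_group().is_first_rank:
    intermediate_tensors = self.sync_and_slice_intermediate_tensors(
        num_input_tokens, intermediate_tensors, sync_self=True)

# dummy_run path: nothing was received, so skip the copy and only slice
# the persistent buffers to the dummy batch size.
if not get_pp_group().is_first_rank:
    intermediate_tensors = self.sync_and_slice_intermediate_tensors(
        num_tokens, None, sync_self=False)
```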

@zxdukki zxdukki force-pushed the dev-v0.14.0-pp-fix branch from b7bdc37 to c804ae7 on January 27, 2026 10:01
@zxdukki
Contributor Author

zxdukki commented Jan 27, 2026

> @lidenghui1110 PTAL
>
> It looks like the refactor in #6191 missed this part. Maybe we need a test case to cover the PP + fc1 situation. And I have some questions here:
>
>   1. Since we are trying to refactor the model runner to be more like GPUModelRunner, shall we use the same logic as vLLM via `_determine_batch_execution_and_padding`?
>   2. This PR only uses `sync_and_slice_intermediate_tensors` in `preprocess`; can we use the same logic in `dummy_run`? The code is duplicated, and in vLLM both paths use `sync_and_slice_intermediate_tensors`.

1. It seems that `_determine_batch_execution_and_padding` in vLLM includes the logic of (1) padding for SP, (2) `dispatch_cudagraph`, (3) the ubatch sync/handshake for DP, and (4) `_sync_metadata_across_dp`. In my opinion, we could refactor the model runner to use this function after integrating the ubatch/DBO logic into vllm-ascend, i.e., use `coordinate_batch_across_dp` instead of `_sync_metadata_across_dp` and move these pieces into `_determine_batch_execution_and_padding` (roughly outlined below).

2. Yes, I removed the duplicated code in `dummy_run`; please take a look at the latest commit. @lidenghui1110
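Roughly, the structure described in point 1 (a paraphrased outline only; helper names other than `_sync_metadata_across_dp` and the cudagraph dispatch are hypothetical, not vLLM's actual code):

```python
def _determine_batch_execution_and_padding(self, num_tokens: int):
    # (3)+(4): ubatch handshake and metadata sync across DP ranks
    num_tokens_across_dp = self._sync_metadata_across_dp(num_tokens)
    # (1): pad the token count so SP can split it evenly across TP ranks
    num_padded_tokens = self._pad_for_sequence_parallel(num_tokens)
    # (2): choose the cudagraph runtime mode for the padded size
    cudagraph_mode = self._dispatch_cudagraph(num_padded_tokens)
    return num_padded_tokens, num_tokens_across_dp, cudagraph_mode
```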

@lidenghui1110
Contributor

> @lidenghui1110 PTAL
>
> It looks like the refactor in #6191 missed this part. Maybe we need a test case to cover the PP + fc1 situation. And I have some questions here:
>
>   1. Since we are trying to refactor the model runner to be more like GPUModelRunner, shall we use the same logic as vLLM via `_determine_batch_execution_and_padding`?
>   2. This PR only uses `sync_and_slice_intermediate_tensors` in `preprocess`; can we use the same logic in `dummy_run`? The code is duplicated, and in vLLM both paths use `sync_and_slice_intermediate_tensors`.
>
> 1. It seems that `_determine_batch_execution_and_padding` in vLLM includes the logic of (1) padding for SP, (2) `dispatch_cudagraph`, (3) the ubatch sync/handshake for DP, and (4) `_sync_metadata_across_dp`. In my opinion, we could refactor the model runner to use this function after integrating the ubatch/DBO logic into vllm-ascend, i.e., use `coordinate_batch_across_dp` instead of `_sync_metadata_across_dp` and move these pieces into `_determine_batch_execution_and_padding`.
>
> 2. Yes, I removed the duplicated code in `dummy_run`; please take a look at the latest commit. @lidenghui1110

LGTM. I'm not familiar with ubatch/DBO; let's revisit this in a future refactor.

@zxdukki zxdukki force-pushed the dev-v0.14.0-pp-fix branch from c804ae7 to 805a6b5 on January 27, 2026 12:58
@zxdukki
Contributor Author

zxdukki commented Jan 27, 2026

Run CI again.

@zxdukki zxdukki force-pushed the dev-v0.14.0-pp-fix branch from 805a6b5 to ef8c973 on January 28, 2026 08:53
@zxdukki
Contributor Author

zxdukki commented Jan 28, 2026

It seems that the community has refactored the model runner using `_determine_batch_execution_and_padding`! @lidenghui1110
So we do not need to pad the tokens anymore. However, we should still replace `is_residual_scattered_for_sp` with `enable_sp` in `sync_and_slice_intermediate_tensors`.

@zxdukki zxdukki force-pushed the dev-v0.14.0-pp-fix branch from ef8c973 to 9054174 on January 28, 2026 09:55
@lidenghui1110
Contributor

> It seems that the community has refactored the model runner using `_determine_batch_execution_and_padding`! @lidenghui1110 So we do not need to pad the tokens anymore. However, we should still replace `is_residual_scattered_for_sp` with `enable_sp` in `sync_and_slice_intermediate_tensors`.

Good! That's fine.

@zxdukki zxdukki force-pushed the dev-v0.14.0-pp-fix branch 3 times, most recently from 4153d4d to 17ef6b0 on January 29, 2026 11:50
@github-actions
Contributor

github-actions Bot commented Feb 1, 2026

This pull request has conflicts; please resolve them before we can evaluate the pull request.

@github-actions
Contributor

This pull request has conflicts; please resolve them before we can evaluate the pull request.

@lidenghui1110
Contributor

@zxdukki Could you please move this PR forward? PP has been unavailable in the main branch for a long time due to this bug.

@zxdukki
Contributor Author

zxdukki commented Mar 2, 2026

> @zxdukki Could you please move this PR forward? PP has been unavailable in the main branch for a long time due to this bug.

Sorry for the late reply; I will check it today.

@zxdukki zxdukki force-pushed the dev-v0.14.0-pp-fix branch from 58ddf1d to cc43032 on March 3, 2026 06:04
Signed-off-by: zhuohuan <zxdu1997@gmail.com>
@zxdukki zxdukki force-pushed the dev-v0.14.0-pp-fix branch from e9d9620 to 1910991 on March 4, 2026 03:42
@github-actions
Contributor

This pull request has conflicts; please resolve them before we can evaluate the pull request.

@zxdukki
Contributor Author

zxdukki commented Apr 28, 2026

It should be fixed in #7896, so I'm closing this.

@zxdukki zxdukki closed this Apr 28, 2026

Labels

merge-conflicts, ready, ready-for-test

3 participants