Revert "[bugfix]limit graph replay sync (#5761)" by Angazenn · Pull Request #5965 · vllm-project/vllm-ascend

Angazenn · 2026-01-16T15:06:19Z

What this PR does / why we need it?

Reverts #5761 to fix accuracy issues when using piecewise graph mode.

Does this PR introduce any user-facing change?

How was this patch tested?

vLLM version: v0.13.0
vLLM main: vllm-project/vllm@2c24bc6

github-actions · 2026-01-16T15:06:34Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

gemini-code-assist

Code Review

This pull request reverts a previous change, making the torch.npu.synchronize() call before graph replay unconditional. This is a necessary change to ensure correctness and prevent race conditions between model iterations, especially in asynchronous execution environments. However, the unconditional device-wide synchronization can be a performance bottleneck. I've added a suggestion to consider a more fine-grained, event-based synchronization mechanism to mitigate the performance impact while maintaining correctness.

gemini-code-assist · 2026-01-16T15:09:52Z

        # so that update_attn_params only executes after the previous graph replay has fully completed.
-        if self.runtime_mode == CUDAGraphMode.FULL:
-            torch.npu.synchronize()
+        torch.npu.synchronize()


While making the synchronization unconditional correctly addresses a potential race condition, using torch.npu.synchronize() can introduce a significant performance bottleneck as it stalls the CPU and waits for all kernels on the device to complete. A more performant approach would be to use explicit event-based synchronization. For instance, you could record an event after the update_attn_params call in the previous iteration and have the current iteration's stream wait for that specific event before replaying the graph. This would avoid a full device-wide synchronization and improve overall throughput.
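The event-based approach suggested above could be sketched roughly as follows. This is only an illustration of the reviewer's idea, not code from the PR or from vllm-ascend: `ReplayFence` and its method names are hypothetical, and `device_api` is assumed to be a module that exposes `Event()` and `current_stream()` in the style of `torch.cuda` (passing `torch.npu` here is an assumption). Whether this ordering is enough to avoid the accuracy issue that motivated the revert is not established in this thread.

```python
class ReplayFence:
    """Orders graph replay after the previous iteration's parameter update.

    `device_api` is any module exposing `Event()` and `current_stream()`
    in the style of `torch.cuda`; using `torch.npu` is an assumption.
    """

    def __init__(self, device_api):
        self._api = device_api
        self._update_done = None  # event from the previous iteration, if any

    def mark_update_done(self):
        # Record an event on the current stream right after the
        # update_attn_params work for this iteration has been enqueued.
        event = self._api.Event()
        event.record()
        self._update_done = event

    def wait_before_replay(self):
        # Make the current stream wait for that single event instead of
        # blocking the host on all device work with synchronize().
        if self._update_done is None:
            return False  # first iteration: nothing to wait for
        self._api.current_stream().wait_event(self._update_done)
        return True
```

In a model runner, `mark_update_done()` would be called after `update_attn_params` and `wait_before_replay()` just before the graph replay. Note that a stream-side `wait_event` only orders device work and does not block the host; if the hazard is on the host side, as the code comment about `update_attn_params` suggests it may be, a host-blocking wait on the event (for example `event.synchronize()` in the `torch.cuda` API) would be the narrower alternative to a device-wide synchronize.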

This reverts commit 4453c60. Signed-off-by: Angazenn <supperccell@163.com>

…oject#5965) ### What this PR does / why we need it? reverts vllm-project#5761 to fix accuracy issues when using piecewise graph mode. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 Signed-off-by: Angazenn <supperccell@163.com>

…oject#5965) ### What this PR does / why we need it? reverts vllm-project#5761 to fix accuracy issues when using piecewise graph mode. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 Signed-off-by: Angazenn <supperccell@163.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>


caozuoba · 2026-04-17T07:15:09Z

@Angazenn @yiz-liu @wangxiyuan Hi, regarding #5761 and the revert in #5965, could you share what exact accuracy issue was observed after removing the PIECEWISE replay sync?

Was it output corruption / gibberish, nondeterminism, or a measurable benchmark accuracy drop? Was the root cause identified?

Angazenn requested a review from yiz-liu as a code owner January 16, 2026 15:06

Revert "[bugfix]limit graph replay sync (vllm-project#5761)"

e57aa44


Angazenn force-pushed the revert_main branch from 68d17cc to e57aa44 Compare January 16, 2026 15:11

wangxiyuan merged commit 7feb745 into vllm-project:main Jan 16, 2026
8 checks passed

Angazenn deleted the revert_main branch February 4, 2026 06:30

