Revert "[bugfix]limit graph replay sync (#5761)"#5965

Merged
wangxiyuan merged 1 commit into vllm-project:main from Angazenn:revert_main
Jan 16, 2026

Conversation

@Angazenn
Collaborator

@Angazenn Angazenn commented Jan 16, 2026

What this PR does / why we need it?

Reverts #5761 to fix accuracy issues when using piecewise graph mode.

Does this PR introduce any user-facing change?

How was this patch tested?

@Angazenn Angazenn requested a review from yiz-liu as a code owner January 16, 2026 15:06
@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write a commit message that fulfills the PR description, so reviewers and future developers can understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request reverts a previous change, making the torch.npu.synchronize() call before graph replay unconditional. This is a necessary change to ensure correctness and prevent race conditions between model iterations, especially in asynchronous execution environments. However, the unconditional device-wide synchronization can be a performance bottleneck. I've added a suggestion to consider a more fine-grained, event-based synchronization mechanism to mitigate the performance impact while maintaining correctness.

 # so that update_attn_params only executes after the previous graph replay has fully completed.
-if self.runtime_mode == CUDAGraphMode.FULL:
-    torch.npu.synchronize()
+torch.npu.synchronize()
Contributor


Severity: high

While making the synchronization unconditional correctly addresses a potential race condition, using torch.npu.synchronize() can introduce a significant performance bottleneck as it stalls the CPU and waits for all kernels on the device to complete. A more performant approach would be to use explicit event-based synchronization. For instance, you could record an event after the update_attn_params call in the previous iteration and have the current iteration's stream wait for that specific event before replaying the graph. This would avoid a full device-wide synchronization and improve overall throughput.
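The reviewer's suggestion can be sketched as follows. This is a hypothetical illustration, not the project's code: it assumes `torch.npu` mirrors the `torch.cuda` Event/Stream API (as the torch_npu adapter does), and the names `GraphRunner`, `update_attn_params`, and `update_done` are made up for the example. The graph replay waits only on the event recorded after the previous iteration's parameter update, rather than stalling the CPU with a device-wide `torch.npu.synchronize()`.

```python
import torch

class GraphRunner:
    """Hypothetical wrapper around a captured graph; names are illustrative."""

    def __init__(self, graph):
        self.graph = graph
        # Event recorded after each iteration's attention-param update.
        self.update_done = torch.npu.Event()
        self.first_run = True

    def run(self, *args):
        if not self.first_run:
            # Block the current stream only until the previous iteration's
            # update_attn_params has completed, instead of calling
            # torch.npu.synchronize(), which waits for the whole device.
            torch.npu.current_stream().wait_event(self.update_done)
        self.graph.replay()
        self.update_attn_params(*args)
        # Mark the point the next replay must wait for.
        self.update_done.record()
        self.first_run = False
```

Whether this is safe in practice depends on whether the accuracy issue stemmed purely from the replay/update ordering; if other kernels race with the replay, a broader synchronization point would still be needed.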

This reverts commit 4453c60.

Signed-off-by: Angazenn <supperccell@163.com>
@wangxiyuan wangxiyuan merged commit 7feb745 into vllm-project:main Jan 16, 2026
8 checks passed
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
…oject#5965)

### What this PR does / why we need it?
reverts vllm-project#5761 to fix accuracy issues when using piecewise graph mode.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@2c24bc6

Signed-off-by: Angazenn <supperccell@163.com>
@Angazenn Angazenn deleted the revert_main branch February 4, 2026 06:30
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
@caozuoba
Contributor

@Angazenn @yiz-liu @wangxiyuan Hi, regarding #5761 and the revert in #5965, could you share what exact accuracy issue was observed after removing the PIECEWISE replay sync?

Was it output corruption / gibberish, nondeterminism, or a measurable benchmark accuracy drop? Was the root cause identified?
