update to vllm 12-19 #5223

Merged
wangxiyuan merged 8 commits into vllm-project:main from leo-pony:update_12_22
Dec 23, 2025

Conversation

@leo-pony
Collaborator

@leo-pony leo-pony commented Dec 22, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

Fixes vLLM breaking changes:

  1. [Perf] Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement (vllm#29558)
    Fix: add the now-required `all2all_backend` parameter. Its only effect on the original `set_splitting_ops_for_v1` implementation is that vLLM disables graph mode when `deepep_high_throughput` is enabled; it has no effect on the vllm-ascend logic.

  2. Migrate legacy ViT MultiHeadAttention to new MMEncoderAttention interface (vllm-project/vllm#30684)
    Fix: the GPU does not need to convert qkv to 3-D because its flash_attention operator accepts both the 4-D and 3-D layouts (`b s h d` and `s b (h d)`), while the NPU's flash_attention_unpad operator only supports the 3-D layout `s b (h d)`. We therefore introduce a `reshape_qkv_to_3d` operation.
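The reshape can be sketched as follows (a hedged illustration using NumPy in place of torch; `reshape_qkv_to_3d` here is a stand-in for the actual vllm-ascend helper, whose name and signature may differ):

```python
import numpy as np

# Illustrative sketch: collapse a 4-D (b, s, h, d) tensor into the 3-D
# (s, b, h*d) layout that the NPU flash_attention_unpad kernel expects.
def reshape_qkv_to_3d(x: np.ndarray) -> np.ndarray:
    b, s, h, d = x.shape
    # (b, s, h, d) -> (s, b, h, d) -> (s, b, h*d)
    return np.ascontiguousarray(x.transpose(1, 0, 2, 3)).reshape(s, b, h * d)

# (batch=2, seq=8, heads=4, head_dim=16) -> (8, 2, 64)
q = np.zeros((2, 8, 4, 16), dtype=np.float32)
out = reshape_qkv_to_3d(q)
```

The GPU kernel tolerates either layout, so only the NPU path needs this step before the attention call.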

  3. Skip the Tencent-Hunyuan/HunyuanOCR test case, as it hits the following issue after the vLLM code upgrade:
    #5297

How was this patch tested?

Co-authored-by: zxwang <1476209578@qq.com>

@leo-pony leo-pony marked this pull request as draft December 22, 2025 02:48
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates the versioning policy documentation to include a specific vLLM commit hash for compatibility with the main branch. While this adds precision, the pull request description is empty. More importantly, the corresponding Chinese translation file has not been updated, leading to inconsistent documentation. This should be addressed to ensure all users have correct information.

| vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
|-------------|--------------|------------------|-------------|--------------------|
| main | v0.13.0 tag | >= 3.10, < 3.12 | 8.3.RC2 | 2.8.0 / 2.8.0 |
| main | 5fbfa8d9ef15948599631baeb91e8220b2ee9bcc, v0.13.0 tag | >= 3.10, < 3.12 | 8.3.RC2 | 2.8.0 / 2.8.0 |
Contributor


high

The compatibility matrix has been updated, but the corresponding Chinese translation file (docs/source/locale/zh_CN/LC_MESSAGES/community/versioning_policy.po) was not updated. This will cause a discrepancy and provide outdated information to users relying on the Chinese documentation. Please update the translation files to ensure consistency across all supported languages.

@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@leo-pony leo-pony changed the title update to vllm 12-18 update to vllm 12-19 Dec 22, 2025
@leo-pony leo-pony changed the title update to vllm 12-19 update to vllm 12-18 Dec 22, 2025
@github-actions github-actions bot added documentation Improvements or additions to documentation ci/build labels Dec 22, 2025
@leo-pony leo-pony added ready read for review ready-for-test start test by label for PR and removed ready read for review ready-for-test start test by label for PR labels Dec 22, 2025
@leo-pony leo-pony marked this pull request as ready for review December 23, 2025 03:28
@leo-pony leo-pony changed the title update to vllm 12-18 update to vllm 12-19 Dec 23, 2025
leo-pony and others added 8 commits December 23, 2025 11:15
Signed-off-by: leo-pony <nengjunma@outlook.com>
…29558'

Signed-off-by: leo-pony <nengjunma@outlook.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
Signed-off-by: zxwang <1476209578@qq.com>
Signed-off-by: zxwang <1476209578@qq.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
@leo-pony leo-pony added ready read for review ready-for-test start test by label for PR labels Dec 23, 2025
@wangxiyuan wangxiyuan merged commit 3b59f20 into vllm-project:main Dec 23, 2025
53 of 66 checks passed
@leo-pony leo-pony deleted the update_12_22 branch December 30, 2025 06:25
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026

Labels

ci/build · documentation (Improvements or additions to documentation) · ready (read for review) · ready-for-test (start test by label for PR)

3 participants