[Main2Main] Upgrade vLLM to 0303 #6944
Merged
wangxiyuan merged 11 commits into vllm-project:main on Mar 6, 2026
Conversation
Contributor
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Contributor
This pull request has conflicts; please resolve them before we can evaluate the pull request.
Force-pushed from b919e3c to b097c45
wangxiyuan approved these changes on Mar 5, 2026
Root causes:
- CudagraphDispatcher.dispatch(): disable_full replaced with valid_modes/invalid_modes (PR #34102)
- compile_or_warm_up_model() now returns float compilation_time (PR #35503)
- MMEncoderAttention forward methods added sequence_lengths param (PR #35564)
- Removed auto-forcing of +rms_norm for sequence parallelism (PR #35410)

Upstream commit range: 15d76f74e2fdb12a95ea00f0ca283acf6219a2b7..6290470843c131681e3e1318ae71070a34f33225

Co-Authored-By: Claude Code <noreply@anthropic.com>
Signed-off-by: MrZ20 <2609716663@qq.com>
Root causes:
- CudagraphDispatcher.dispatch() API changed: disable_full -> valid_modes/invalid_modes (#34102)
- compile_or_warm_up_model() must return float compilation_time (#35503)
- MMEncoderAttention.forward_oot() gained new sequence_lengths param (#34580)
- +rms_norm no longer auto-forced for SP, breaks Ascend without CUDA _C ops (#35410)

Upstream commit range: 15d76f74e2fdb12a95ea00f0ca283acf6219a2b7..6290470843c131681e3e1318ae71070a34f33225

Co-Authored-By: Claude Code <noreply@anthropic.com>
Signed-off-by: MrZ20 <2609716663@qq.com>
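Of the breaks listed in these commit messages, the `compile_or_warm_up_model()` change is the easiest to illustrate. Below is a minimal sketch of the adaptation, assuming a hypothetical out-of-tree worker (`NPUWorkerCompat` and `_warm_up_model` are illustrative names; only the float return type comes from the PR description):

```python
import time


class NPUWorkerCompat:
    """Hypothetical out-of-tree worker; only the return type change is real."""

    def compile_or_warm_up_model(self) -> float:
        start = time.perf_counter()
        self._warm_up_model()
        # Per vllm-project/vllm#35503 (as described above), upstream now
        # expects the compilation time as a float instead of None.
        return time.perf_counter() - start

    def _warm_up_model(self) -> None:
        # Placeholder for the device-specific warm-up / graph-capture work.
        pass
```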
Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request on Mar 7, 2026
### What this PR does / why we need it?
break:
- vllm-project/vllm#34102 `disable_full` param replaced with `valid_modes`/`invalid_modes` API
- vllm-project/vllm#35503 Now must return float `compilation_time`
- vllm-project/vllm#35564 New `sequence_lengths` param added
- vllm-project/vllm#33807 A check was performed (`if runner_backend != "auto"`)
- vllm-project/vllm#34861 `BaseDeviceCommunicator` now accesses PyTorch's internal `pg_map` to check process group state
- vllm-project/vllm#35274

**Important change:**
- vllm-project/vllm#28672 `matcher_utils` directly accesses `torch.ops._C.*` during the import phase. In the Ascend environment, some unregistered ops trigger `AttributeError`, causing e2e initialization failure.
  https://github.com/vllm-project/vllm-ascend/actions/runs/22607260487/job/65502047131#step:10:2323
  https://github.com/vllm-project/vllm/blob/main/vllm/compilation/passes/fusion/matcher_utils.py#L29

This PR adds temporary compatibility placeholders (rms_norm, fused_add_rms_norm, rotate_embedding, static/dynamic fp8 quant, silu_and_mul) to `vllm_ascend/patch/platform/patch_fusion_matcher_compat_ops.py` to ensure no crashes during the import phase. Upstream repairs will be considered later.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- vLLM version: v0.16.0
- vLLM main: vllm-project/vllm@15d76f7

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: Meihan-chen <jcccx.cmh@gmail.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>
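For the #34102 item in the quoted message, a plugin can probe the installed vLLM's `dispatch()` signature and call whichever form exists. This is a hedged sketch: `dispatch_compat` is a hypothetical helper, and the `"FULL"` mode value is a placeholder, not a name confirmed by upstream:

```python
import inspect


def dispatch_compat(dispatcher, *args, disable_full=False, **kwargs):
    """Hypothetical helper bridging the pre/post-#34102 dispatch() APIs."""
    params = inspect.signature(dispatcher.dispatch).parameters
    if "disable_full" in params:
        # Older vLLM: a single boolean switch disables full-graph dispatch.
        return dispatcher.dispatch(*args, disable_full=disable_full, **kwargs)
    # Newer vLLM (per this PR): explicit mode sets replace the boolean.
    # "FULL" is a placeholder mode name used for illustration only.
    invalid_modes = ["FULL"] if disable_full else []
    return dispatcher.dispatch(*args, invalid_modes=invalid_modes, **kwargs)
```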
What this PR does / why we need it?
Breaking changes:
- vllm#34102: `disable_full` param replaced with `valid_modes`/`invalid_modes` API
- vllm#35503: `compile_or_warm_up_model()` must now return a float `compilation_time`
- vllm#35564: new `sequence_lengths` param added (see the sketch after this list)
- vllm#33807 (`--moe-backend` arg for explicit kernel selection): a check is now performed (`if runner_backend != "auto"`)
- vllm#34861: `BaseDeviceCommunicator` now accesses PyTorch's internal `pg_map` to check process group state
- vllm#35274 (bc-lint)
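For the vllm#35564 item, here is a sketch of keeping a single `forward_oot()` signature working on both sides of the change. The class name, tensor shapes, and masking logic are illustrative assumptions; only the `sequence_lengths` parameter itself comes from this PR:

```python
from typing import Optional

import torch
import torch.nn.functional as F


class AscendMMEncoderAttentionCompat:
    """Hypothetical out-of-tree override; only sequence_lengths is from the PR."""

    def forward_oot(
        self,
        query: torch.Tensor,   # assumed (batch, heads, q_len, head_dim)
        key: torch.Tensor,     # assumed (batch, heads, kv_len, head_dim)
        value: torch.Tensor,
        sequence_lengths: Optional[torch.Tensor] = None,  # new in recent vLLM
    ) -> torch.Tensor:
        # Giving the new parameter a default keeps one override callable from
        # both pre- and post-#35564 vLLM: older callers simply never pass it.
        attn_mask = None
        if sequence_lengths is not None:
            # One plausible use for the new arg: mask padded key positions.
            positions = torch.arange(key.shape[-2], device=key.device)
            valid = positions[None, :] < sequence_lengths[:, None]
            attn_mask = valid[:, None, None, :]  # broadcast over heads, q_len
        # Naive stand-in computation so the sketch runs; the real kernel is
        # NPU-specific.
        return F.scaled_dot_product_attention(query, key, value, attn_mask=attn_mask)
```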
Important change:
- vllm#28672: `matcher_utils` directly accesses `torch.ops._C.*` during the import phase. In the Ascend environment, some unregistered ops trigger `AttributeError`, causing e2e initialization failure.
https://github.com/vllm-project/vllm-ascend/actions/runs/22607260487/job/65502047131#step:10:2323
https://github.com/vllm-project/vllm/blob/main/vllm/compilation/passes/fusion/matcher_utils.py#L29
This PR adds temporary compatibility placeholders (rms_norm, fused_add_rms_norm, rotate_embedding, static/dynamic fp8 quant, silu_and_mul) to `vllm_ascend/patch/platform/patch_fusion_matcher_compat_ops.py` to ensure no crashes during the import phase. Upstream repairs will be considered later.
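As an illustration of the placeholder idea (not the actual contents of `patch_fusion_matcher_compat_ops.py`), registering no-op schemas under the `_C` namespace is one way to make `torch.ops._C.<name>` lookups succeed at import time. The op list below paraphrases the names above, and the generic schema is a deliberate guess:

```python
import torch

# Op names paraphrase the PR's list; exact upstream names/schemas may differ.
_PLACEHOLDER_OPS = [
    "rms_norm",
    "fused_add_rms_norm",
    "rotate_embedding",
    "static_scaled_fp8_quant",
    "dynamic_scaled_fp8_quant",
    "silu_and_mul",
]

# FRAGMENT lets us extend (or create) the "_C" namespace without owning it.
_lib = torch.library.Library("_C", "FRAGMENT")

for _name in _PLACEHOLDER_OPS:
    if not hasattr(torch.ops._C, _name):
        # A generic schema is enough: these ops are never executed on Ascend;
        # they only need to *exist* so matcher_utils can import cleanly.
        _lib.define(f"{_name}(Tensor x) -> Tensor")
```

The real patch may register different schemas; the point is only that attribute access on `torch.ops._C` during import stops raising `AttributeError`.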
Does this PR introduce any user-facing change?
How was this patch tested?
- vLLM version: v0.16.0
- vLLM main: vllm-project/vllm@15d76f7