[Main2Main] Upgrade vllm commit to 0105#5595
Conversation
Code Review
This pull request upgrades the vllm dependency to a newer commit. The changes primarily consist of adaptations to the updated vllm API, including modifications to function signatures and return types in the worker and model runner components. Additionally, some tests have been updated to reflect changes in model attributes and to improve test robustness. I've identified a potential bug in vllm_ascend/spec_decode/eagle_proposer.py where an incorrect set of layers is used to determine the attention layer name, which could lead to runtime errors. Other changes appear to be correct and necessary for the upgrade.
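The bug class flagged above — deriving an attention layer name from the wrong layer collection — can be illustrated with a minimal, hypothetical sketch. The dictionary names below are illustrative only, not vllm-ascend's actual data structures:

```python
# Hypothetical sketch: picking an attention layer name for an EAGLE-style
# speculative proposer. The bug class flagged in the review is consulting
# the target model's layer map when the draft model's map is intended.
target_layers = {"model.layers.0.attn": object(), "model.layers.1.attn": object()}
draft_layers = {"draft.layers.0.attn": object()}

def first_attn_layer_name(layers: dict) -> str:
    # Deterministically pick the first attention layer name
    # (dict insertion order is preserved in Python 3.7+).
    return next(iter(layers))

# Correct: consult the draft model's own layers, not target_layers.
name = first_attn_layer_name(draft_layers)
print(name)  # -> draft.layers.0.attn
```

Passing `target_layers` here would silently return a name that does not exist in the draft model, which is the kind of mismatch that surfaces only at runtime.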
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
This pull request has conflicts; please resolve them before we can evaluate the pull request.
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>
…to FIA_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (58 commits)
  [Main2Main] Upgrade vllm commit to 0106 (vllm-project#5617)
  [CI] update bisheng version (vllm-project#5621)
  [UT][PCP&DCP] UT for block_table.py (vllm-project#5032)
  [Main2Main] Upgrade vllm commit to 0105 (vllm-project#5595)
  [CI] mv ops to correct path (vllm-project#5615)
  [BugFix] Fix Smoke Testing Bug for DSR1 longseq (vllm-project#5613)
  Revert "[Feat] enable hierarchical mc2 ops on A2 by default (vllm-project#5545)" (vllm-project#5611)
  [TRITON][TEST] Add nightly test for triton split_qkv_rmsnorm_rope (vllm-project#5267)
  [perf] Fix MLAPO weight disposal for KV-consumer MLA in PD-mix deploy... (vllm-project#5192)
  [docs] Correct image about prefill phase of PCP (vllm-project#5598)
  [CI] update triton-ascend version (vllm-project#5584)
  [P/D] Remove mooncake kvpool unused parameter `local_hostname` (vllm-project#5574)
  [Bugfix] record cos and sin cache in AscendRotaryEmbedding (vllm-project#5516)
  [bugfix] fix test_camem failed with triton-ascend (vllm-project#5492)
  [UT] add triton ops ut: test_fused_qkvzba_split_reshape_cat (vllm-project#5474)
  [CI] Download models from ms (vllm-project#5405)
  Docs: Add A3 Docker image guidance for Atlas A3 machines (vllm-project#5256)
  [Doc] Add NNAL installation guide and requirements (vllm-project#5235)
  Add the requirement of arctic-inference which speculative decoding with suffix_decode (vllm-project#5045)
  [BugFix][Fusion] Fix graph fusion failure problem (vllm-project#5253)
  ...
### What this PR does / why we need it?

Upgrade vllm commit to 0105 (8be6432bdaf6275664d857b1e5e9bf8ed1ce299e)

1. Remove the `maybe_padded_num_tokens` arg in `model_runner_v1.py`, since vllm-project/vllm#31517 deleted the unused arg
2. Remove dense `Qwen/Qwen3-0.6B` from `tests/e2e/multicard/test_aclgraph_capture_replay.py` and `tests/e2e/multicard/test_data_parallel.py` due to vllm-project/vllm#30739, after which offline data parallel mode is no longer supported/useful for dense models
3. Adapt `vllm_ascend/worker/worker.py` due to vllm-project/vllm#31584
4. Adapt the `self.block_size` access due to vllm-project/vllm#31540
5. Modify `test_mla_v1.py` due to vllm-project/vllm#28454, which refactored `get_head_size()`

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@7157596

Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
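Change 1 — dropping an argument the upstream callee no longer accepts — follows a common pattern when tracking a fast-moving dependency: gate the call site on the callee's current signature instead of hard-coding one API version. A minimal sketch (the `call_compat` helper and `new_init_model_kwargs` function are hypothetical, not part of vllm or vllm-ascend):

```python
import inspect

def call_compat(fn, /, *args, **kwargs):
    """Call fn, silently dropping keyword args it no longer accepts.

    Hypothetical compatibility helper for tracking upstream API changes;
    not part of vllm or vllm-ascend.
    """
    params = inspect.signature(fn).parameters
    accepts_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if not accepts_var_kw:
        # Keep only the keywords the callee actually declares.
        kwargs = {k: v for k, v in kwargs.items() if k in params}
    return fn(*args, **kwargs)

# Stand-in for the post-upgrade upstream function, which dropped the
# maybe_padded_num_tokens parameter.
def new_init_model_kwargs(num_reqs):
    return {"num_reqs": num_reqs}

out = call_compat(new_init_model_kwargs, num_reqs=4, maybe_padded_num_tokens=128)
print(out)  # -> {'num_reqs': 4}
```

In practice a pinned-commit upgrade like this PR simply deletes the argument at the call site; a signature-gated shim like the above is mainly useful when one codebase must support multiple upstream versions at once.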
What this PR does / why we need it?

Upgrade vllm commit to 0105 (8be6432bdaf6275664d857b1e5e9bf8ed1ce299e)

1. Remove the `maybe_padded_num_tokens` arg in `model_runner_v1.py`, since [Core] Remove unused `num_tokens` parameter from `_init_model_kwargs` (vllm#31517) deleted the unused arg
2. Remove dense `Qwen/Qwen3-0.6B` from `tests/e2e/multicard/test_aclgraph_capture_replay.py` and `tests/e2e/multicard/test_data_parallel.py` due to [BugFix] Support online dense model DP without overhead (vllm#30739), after which offline data parallel mode is no longer supported/useful for dense models
3. Adapt `vllm_ascend/worker/worker.py` due to [BugFix] Fix async scheduling for pooling models (vllm#31584)
4. Adapt the `self.block_size` access due to [Bugfix] Fix block size used in EAGLE slot mapping (vllm#31540)
5. Modify `test_mla_v1.py` due to [Core] Parse vLLM engine required fields from hf_config to model_arch_config (vllm#28454), which refactored `get_head_size()`

Does this PR introduce any user-facing change?

How was this patch tested?
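Change 4 — adapting the `self.block_size` access — is an instance of an attribute-relocation shim: read the value from its new location first and fall back to the legacy one. The attribute names below (`kv_block_size`, `block_size` on a cache config) are illustrative assumptions, not the actual paths changed by vllm#31540; check the referenced commit for the real ones:

```python
# Hypothetical sketch of an attribute-relocation shim. The attribute names
# here are illustrative only; consult vllm#31540 for the actual relocation.
class LegacyCacheConfig:
    block_size = 16  # old-style location of the KV-cache block size

class Runner:
    def __init__(self, cache_config):
        self.cache_config = cache_config

    @property
    def block_size(self) -> int:
        # Prefer a new-style attribute if present, else fall back to the
        # legacy one (both getattr calls are evaluated eagerly, which is
        # fine here since the legacy attribute always exists).
        return getattr(self.cache_config, "kv_block_size",
                       getattr(self.cache_config, "block_size"))

r = Runner(LegacyCacheConfig())
print(r.block_size)  # -> 16
```

For a repository pinned to a single upstream commit, as vllm-ascend is here, the simpler fix is to update the access in place; the fallback pattern matters only during transitions where both layouts must work.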