[Main2Main] Upgrade vllm commit to 0109#5752
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request updates the vllm dependency and introduces compatibility code. The changes look reasonable, but there is a critical issue with how the vllm version is being checked. The new code uses vllm_version_is, which performs an exact version match. This is very brittle and will likely break with older or newer patch/minor versions of vllm. I've left comments with suggestions to use version range comparisons for more robust and future-proof code.
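To make the reviewer's point concrete, here is a minimal sketch of the difference between an exact version match and a range comparison. The function bodies below are illustrative assumptions, not the actual vllm-ascend helpers; real code should use a proper version parser (e.g. `packaging.version`) rather than this simplified tuple parsing.

```python
def parse_version(v: str) -> tuple:
    # "0.13.1" -> (0, 13, 1); ignores pre-release suffixes for simplicity.
    return tuple(int(p) for p in v.split(".")[:3])

def vllm_version_is(current: str, target: str) -> bool:
    # Exact match: breaks as soon as a patch release bumps the version.
    return current == target

def vllm_version_at_least(current: str, minimum: str) -> bool:
    # Range comparison: keeps working across patch/minor releases.
    return parse_version(current) >= parse_version(minimum)

# The exact check stops matching on the very next patch release,
# while the range check still selects the new code path.
print(vllm_version_is("0.13.1", "0.13.0"))        # False
print(vllm_version_at_least("0.13.1", "0.13.0"))  # True
```

This is why gating new-API code paths on `vllm_version_at_least`-style comparisons is more future-proof than exact pinning.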
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
…to eplb_refactor

* 'main' of https://github.com/vllm-project/vllm-ascend:
  - [CI] Fix lint CI (vllm-project#5880)
  - [Feature] implement eagle spec decoding for model runner v2 (vllm-project#5840)
  - [Quantization] Support compressed tensors moe w8a8 int8 dynamic weight (vllm-project#5718)
  - [EPLB][Bugfix] Get expert map from layers (vllm-project#5817)
  - [Bugfix] Fixed an accuracy problem of sp with eagle3 (vllm-project#5816)
  - [P/D] bugfix for p node force free requset (vllm-project#5431)
  - [Lint]Style: Convert `example` to `ruff format` (vllm-project#5863)
  - [Main2Main] Upgrade vllm commit to 0109 (vllm-project#5752)
  - [Bugfix][P/D] fix layerwise connector for decoder tp size > num kv heads (vllm-project#5846)
  - [Test][e2e][LoRA] Add more e2e tests to cover scenarios of LoRA (vllm-project#4075)
  - [CustomOp][Perf] Merge Q/K split to simplify AscendApplyRotaryEmb for better performance (vllm-project#5799)
  - [Lint]Style: Convert `root`, `benchmarks`, `tools` and `docs` to `ruff format` (vllm-project#5843)
  - enable ep32 for dispatch_ffn_combine (vllm-project#5787)
### What this PR does / why we need it?

Upgrade vllm commit to 0109 (bde38c11df0ea066a740efe9b77fff5418be45df)

1. Remove `init_cached_hf_modules` due to vllm-project/vllm#31786
2. Fix spec_decode e2e test due to the vllm-project/vllm#29821 break
3. Fix `vllm.v1.attention.backends.utils` due to vllm-project/vllm#31891
4. Fix `self.seq_lens - query_lens` to run on the same device due to vllm-project/vllm#31773
5. Skip model_runner_v2 e2e test due to `'_OpNamespace' '_C' object has no attribute 'get_cuda_view_from_cpu_tensor'`

- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2f4e654

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
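Fix (4) above is about a device mismatch: after the upstream change, `seq_lens` can live on the accelerator while `query_lens` is built on the CPU, and subtracting tensors on different devices raises a RuntimeError. A minimal sketch of the alignment, with illustrative tensor names and shapes (not the actual model-runner code):

```python
import torch

def num_computed_tokens(seq_lens: torch.Tensor,
                        query_lens: torch.Tensor) -> torch.Tensor:
    # Move query_lens onto seq_lens' device before the elementwise
    # subtraction, so both operands of `-` live on the same device.
    return seq_lens - query_lens.to(seq_lens.device)

seq_lens = torch.tensor([8, 16, 32])    # total tokens per sequence
query_lens = torch.tensor([1, 1, 4])    # tokens being computed this step
print(num_computed_tokens(seq_lens, query_lens).tolist())  # [7, 15, 28]
```

On CPU-only tensors the `.to()` is a no-op, so the alignment costs nothing when the devices already match.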
- Remove is_skipped flag from tests/e2e/singlecard/model_runner_v2/test_basic.py
- Test was originally skipped due to the get_cuda_view_from_cpu_tensor error (vllm-project#5752)
- Recent model_runner_v2 improvements may have resolved the issue:
  - vllm-project#7110: Added aclgraph support
  - vllm-project#7496: Optimized post_update performance
  - vllm-project#7221: Optimized _topk_log_softmax_kernel performance
- CI will verify whether the test now passes

Signed-off-by: hejianping <hejianping7@huawei.com>
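For readers unfamiliar with the pattern, un-skipping a test usually means dropping a skip marker so CI exercises the test again. The stand-in below is purely illustrative; the real test lives in tests/e2e/singlecard/model_runner_v2/test_basic.py and its body is assumed here:

```python
# Before the change, the test carried a skip marker roughly like:
#
#   @pytest.mark.skip(reason="'_OpNamespace' '_C' object has no attribute "
#                            "'get_cuda_view_from_cpu_tensor'")
#
# Removing the marker re-enables collection, so the next CI run reports a
# real pass/fail instead of a skip.
def test_basic():
    # Placeholder body; the real test runs a model end to end on one card.
    assert 1 + 1 == 2
```

If the underlying `get_cuda_view_from_cpu_tensor` error still reproduces, CI will surface it as a failure rather than silently skipping.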
What this PR does / why we need it?
Upgrade vllm commit to 0109 (bde38c11df0ea066a740efe9b77fff5418be45df)

1. Remove `init_cached_hf_modules` due to [Chore] Try remove `init_cached_hf_modules` (vllm#31786)
2. Fix spec_decode e2e test due to the vllm#29821 break
3. Fix `vllm.v1.attention.backends.utils` due to [Chore] Migrate V0 attention utils (vllm#31891)
4. Fix `self.seq_lens - query_lens` to run on the same device due to [Attention][1/n] Remove usage of deprecated `seq_lens_cpu` and `num_computed_tokens_cpu` CommonAttentionMetadata properties (vllm#31773)
5. Skip model_runner_v2 e2e test due to `'_OpNamespace' '_C' object has no attribute 'get_cuda_view_from_cpu_tensor'`

Does this PR introduce any user-facing change?
How was this patch tested?