
[Main2Main] Upgrade to newest vLLM 0204 #6510

Closed
zhangxinyuehfad wants to merge 10 commits into vllm-project:main from zhangxinyuehfad:upgrade_main_0228

Conversation

@zhangxinyuehfad
Collaborator

@zhangxinyuehfad zhangxinyuehfad commented Feb 3, 2026

What this PR does / why we need it?

[Main2Main] Upgrade to newest vLLM 0128

  1. Fix the "'tuple' object has no attribute 'job_id'" error introduced by [Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3next vllm#27492
  2. Fix the mismatch where all_moe_layers no longer equals vllm.moe_forward / vllm.moe_forward_shared, introduced by [torch.compile] Speed up MOE handling in forward_context vllm#33184
  3. Fix the "got multiple values for keyword argument 'add_special_tokens'" error introduced by [Frontend] Use new Renderer for Completions and Tokenize API vllm#32863
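Fix 3 above is an instance of a common Python failure mode: the same argument supplied both explicitly and via **kwargs. A minimal, self-contained reproduction (the function and call site here are illustrative stand-ins, not the actual vLLM Renderer API):

```python
# Minimal reproduction of the "got multiple values for keyword argument"
# error class fixed in item 3. render_prompt is a made-up stand-in for
# the real vLLM call site.
def render_prompt(prompt, add_special_tokens=True):
    return {"prompt": prompt, "add_special_tokens": add_special_tokens}

extra = {"add_special_tokens": False}

# Supplying the argument both explicitly and via **extra raises TypeError.
msg = ""
try:
    render_prompt("hello", add_special_tokens=True, **extra)
except TypeError as exc:
    msg = str(exc)  # "...got multiple values for keyword argument 'add_special_tokens'"

# One fix: let a single source own the keyword before forwarding kwargs.
extra.pop("add_special_tokens", None)
result = render_prompt("hello", add_special_tokens=True, **extra)
```

When an API refactor (like the Renderer change) starts passing a keyword explicitly, any caller that still forwards the same key inside **kwargs hits this TypeError.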

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions
Contributor

github-actions bot commented Feb 3, 2026

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist
Contributor

Summary of Changes

Hello @zhangxinyuehfad, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on upgrading the vllm_ascend project's compatibility with a more recent version of the core vLLM library. The changes involve adapting import paths for key attention-related components, which have been refactored in newer vLLM releases. This update ensures that vllm_ascend can continue to function correctly with the latest vLLM developments while maintaining support for existing versions.

Highlights

  • Documentation Update: The versioning_policy.md file has been updated to reflect compatibility with a newer vLLM commit (af9b69f977bd1166ed63c46f9ccbd3a02344ae4f) alongside the v0.15.0 tag.
  • Conditional Imports for Attention Layers: Multiple Python files now use conditional import logic (if vllm_version_is("v0.15.0")) to correctly import Attention and MLAAttention classes from their respective locations in different vLLM versions. This ensures compatibility with both older (v0.15.0) and newer vLLM API structures.
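The version gate described above follows a standard shape. Below is a runnable analog using only stdlib modules; in vllm_ascend the check is vllm_version_is("v0.15.0") and the imports target vLLM's Attention/MLAAttention classes (the exact vLLM module paths are not reproduced here).

```python
import sys

# Runnable analog of the version-gated import pattern this PR adds.
# vllm_ascend gates on vllm_version_is("v0.15.0"); sys.version_info
# stands in here so the sketch runs without vLLM installed.
if sys.version_info >= (3, 3):
    from collections.abc import Mapping   # newer module location
else:
    from collections import Mapping       # legacy module location
```

The trade-off: this pattern is explicit about which release uses which path, but it breaks silently if a third layout appears, since the version string must be kept in sync with upstream.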


Changelog
  • docs/source/community/versioning_policy.md
    • Updated the vLLM version entry in the compatibility table to include a specific commit hash (af9b69f977bd1166ed63c46f9ccbd3a02344ae4f) alongside the v0.15.0 tag for the main branch of vLLM Ascend.
  • tests/ut/quantization/test_modelslim_config.py
    • Modified the import of Attention to be conditional based on the vllm_version_is("v0.15.0") utility, addressing a change in the class's module path.
  • vllm_ascend/compilation/npugraph_ex_passes/graphex_qknorm_rope_fusion_pass.py
    • Implemented conditional import for Attention to handle different module locations across vLLM versions.
  • vllm_ascend/compilation/passes/qknorm_rope_fusion_pass.py
    • Updated Attention import to be conditional, ensuring compatibility with varying vLLM API structures.
  • vllm_ascend/distributed/kv_transfer/kv_pool/cpu_offload/cpu_offload_connector.py
    • Adjusted imports for Attention and MLAAttention to be conditional, accommodating their relocation in newer vLLM versions.
  • vllm_ascend/ops/mla.py
    • Changed the import of MLAAttention to be conditional, supporting different module paths.
  • vllm_ascend/patch/worker/patch_qwen3_next_mtp.py
    • Modified the import of Attention to be conditional, adapting to changes in vLLM's internal structure.
  • vllm_ascend/quantization/modelslim_config.py
    • Updated Attention import to be conditional, ensuring correct class resolution across vLLM versions.
  • vllm_ascend/worker/model_runner_v1.py
    • Implemented conditional imports for Attention and MLAAttention to maintain compatibility with different vLLM API versions.
Ignored Files
  • Ignored by pattern: .github/workflows/** (6)
    • .github/workflows/_pre_commit.yml
    • .github/workflows/bot_pr_create.yaml
    • .github/workflows/dockerfiles/Dockerfile.lint
    • .github/workflows/pr_test_full.yaml
    • .github/workflows/pr_test_light.yaml
    • .github/workflows/schedule_codecov_refresh.yaml
Activity
  • No specific activity (comments, reviews, etc.) has been recorded for this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature               Command               Description
Code Review           /gemini review        Performs a code review of the pull request in its current state.
Pull Request Summary  /gemini summary       Provides a summary of the pull request in its current state.
Comment               @gemini-code-assist   Responds when explicitly tagged, in both pull request comments and review comments.
Help                  /gemini help          Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  [1] Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

@vllm-ascend-ci vllm-ascend-ci added the 'ready' (read for review) and 'ready-for-test' (start test by label for PR) labels Feb 3, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request upgrades the vLLM dependency and adapts the codebase to handle a refactoring where the Attention and MLAAttention modules were moved. The changes introduce conditional imports based on the vLLM version. My main feedback is to replace the version-checking logic with a more robust try...except ImportError pattern. This is a standard way to handle such changes and is more maintainable as it doesn't rely on specific version strings.
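The try...except ImportError pattern the reviewer suggests can be generalized into a small helper. This is a sketch of the idea, not code from the PR; the vLLM module paths in the commented-out example are hypothetical, so the runnable demo below uses stdlib names instead.

```python
import importlib

def import_first(candidates):
    """Return the first importable attribute among (module_path, attr) pairs.

    Generic form of the try/except ImportError fallback: probe the newest
    module path first, then fall back to older ones, with no dependency
    on a specific version string.
    """
    for module_path, attr in candidates:
        try:
            module = importlib.import_module(module_path)
            return getattr(module, attr)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f"no importable candidate in: {candidates}")

# In vllm_ascend the candidates would be the new and old vLLM locations
# of Attention/MLAAttention (paths omitted here, as they vary by release).
# Runnable stdlib demo: the first path is deliberately missing.
OrderedDict = import_first([
    ("collections.does_not_exist", "OrderedDict"),  # missing, skipped
    ("collections", "OrderedDict"),                 # resolved here
])
```

Compared with the version-string check, this degrades gracefully when upstream moves a class again: the fallback chain just gains one more candidate.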

Comment thread tests/ut/quantization/test_modelslim_config.py
Comment thread vllm_ascend/compilation/passes/qknorm_rope_fusion_pass.py
Comment thread vllm_ascend/ops/mla.py
Comment thread vllm_ascend/patch/worker/patch_qwen3_next_mtp.py
Comment thread vllm_ascend/quantization/modelslim_config.py
Comment thread vllm_ascend/worker/model_runner_v1.py
@vllm-ascend-ci vllm-ascend-ci added and then removed the 'ready-for-test' (start test by label for PR) label Feb 3, 2026
@zhangxinyuehfad zhangxinyuehfad changed the title [Main2Main] Upgrade to newest vLLM 0228 [Main2Main] Upgrade to newest vLLM 0128 Feb 3, 2026
@github-actions
Contributor

github-actions bot commented Feb 4, 2026

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@zhangxinyuehfad zhangxinyuehfad force-pushed the upgrade_main_0228 branch 2 times, most recently from cc50bc9 to e8d4231 on February 4, 2026 03:06
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
wangxiyuan and others added 4 commits February 4, 2026 12:04
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
@zhangxinyuehfad zhangxinyuehfad force-pushed the upgrade_main_0228 branch 2 times, most recently from 3efb455 to aa7311c on February 4, 2026 10:18
Meihan-chen and others added 5 commits February 5, 2026 10:07
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
…_special_tokens'

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
@vllm-ascend-ci vllm-ascend-ci changed the title [Main2Main] Upgrade to newest vLLM 0128 [Main2Main] Upgrade to newest vLLM 0204 Feb 5, 2026
@github-actions
Contributor

github-actions bot commented Feb 5, 2026

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@zhangxinyuehfad zhangxinyuehfad removed the 'ready' (read for review) and 'ready-for-test' (start test by label for PR) labels Mar 2, 2026


4 participants