
[Main2Main] Upgrade to newest vLLM 0204 #6510

Closed
zhangxinyuehfad wants to merge 10 commits into vllm-project:main from zhangxinyuehfad:upgrade_main_0228

Conversation

@zhangxinyuehfad
Collaborator

@zhangxinyuehfad zhangxinyuehfad commented Feb 3, 2026

What this PR does / why we need it?

[Main2Main] Upgrade to newest vLLM 0128

  1. Fix the "'tuple' object has no attribute 'job_id'" error introduced by [Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3next vllm#27492
  2. Fix the mismatch where all_moe_layers no longer equals vllm.moe_forward / vllm.moe_forward_shared, introduced by [torch.compile] Speed up MOE handling in forward_context vllm#33184
  3. Fix the "got multiple values for keyword argument 'add_special_tokens'" error introduced by [Frontend] Use new Renderer for Completions and Tokenize API vllm#32863
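Fix 3 above is an instance of a common Python failure mode: the same argument supplied both explicitly and via **kwargs. A minimal, self-contained reproduction (the function and call site here are illustrative stand-ins, not the actual vLLM Renderer API):

```python
# Minimal reproduction of the "got multiple values for keyword argument"
# error class fixed in item 3. render_prompt is a made-up stand-in for
# the real vLLM call site.
def render_prompt(prompt, add_special_tokens=True):
    return {"prompt": prompt, "add_special_tokens": add_special_tokens}

extra = {"add_special_tokens": False}

# Supplying the argument both explicitly and via **extra raises TypeError.
msg = ""
try:
    render_prompt("hello", add_special_tokens=True, **extra)
except TypeError as exc:
    msg = str(exc)  # "...got multiple values for keyword argument 'add_special_tokens'"

# One fix: let a single source own the keyword before forwarding kwargs.
extra.pop("add_special_tokens", None)
result = render_prompt("hello", add_special_tokens=True, **extra)
```

When an API refactor (like the Renderer change) starts passing a keyword explicitly, any caller that still forwards the same key inside **kwargs hits this TypeError.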

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions
Contributor

github-actions bot commented Feb 3, 2026

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist
Contributor

Summary of Changes

Hello @zhangxinyuehfad, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on upgrading the vllm_ascend project's compatibility with a more recent version of the core vLLM library. The changes involve adapting import paths for key attention-related components, which have been refactored in newer vLLM releases. This update ensures that vllm_ascend can continue to function correctly with the latest vLLM developments while maintaining support for existing versions.

Highlights

  • Documentation Update: The versioning_policy.md file has been updated to reflect compatibility with a newer vLLM commit (af9b69f977bd1166ed63c46f9ccbd3a02344ae4f) alongside the v0.15.0 tag.
  • Conditional Imports for Attention Layers: Multiple Python files now use conditional import logic (if vllm_version_is("v0.15.0")) to correctly import Attention and MLAAttention classes from their respective locations in different vLLM versions. This ensures compatibility with both older (v0.15.0) and newer vLLM API structures.
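The version gate described above follows a standard shape. Below is a runnable analog using only stdlib modules; in vllm_ascend the check is vllm_version_is("v0.15.0") and the imports target vLLM's Attention/MLAAttention classes (the exact vLLM module paths are not reproduced here).

```python
import sys

# Runnable analog of the version-gated import pattern this PR adds.
# vllm_ascend gates on vllm_version_is("v0.15.0"); sys.version_info
# stands in here so the sketch runs without vLLM installed.
if sys.version_info >= (3, 3):
    from collections.abc import Mapping   # newer module location
else:
    from collections import Mapping       # legacy module location
```

The trade-off: this pattern is explicit about which release uses which path, but it breaks silently if a third layout appears, since the version string must be kept in sync with upstream.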


Changelog
  • docs/source/community/versioning_policy.md
    • Updated the vLLM version entry in the compatibility table to include a specific commit hash (af9b69f977bd1166ed63c46f9ccbd3a02344ae4f) alongside the v0.15.0 tag for the main branch of vLLM Ascend.
  • tests/ut/quantization/test_modelslim_config.py
    • Modified the import of Attention to be conditional based on the vllm_version_is("v0.15.0") utility, addressing a change in the class's module path.
  • vllm_ascend/compilation/npugraph_ex_passes/graphex_qknorm_rope_fusion_pass.py
    • Implemented conditional import for Attention to handle different module locations across vLLM versions.
  • vllm_ascend/compilation/passes/qknorm_rope_fusion_pass.py
    • Updated Attention import to be conditional, ensuring compatibility with varying vLLM API structures.
  • vllm_ascend/distributed/kv_transfer/kv_pool/cpu_offload/cpu_offload_connector.py
    • Adjusted imports for Attention and MLAAttention to be conditional, accommodating their relocation in newer vLLM versions.
  • vllm_ascend/ops/mla.py
    • Changed the import of MLAAttention to be conditional, supporting different module paths.
  • vllm_ascend/patch/worker/patch_qwen3_next_mtp.py
    • Modified the import of Attention to be conditional, adapting to changes in vLLM's internal structure.
  • vllm_ascend/quantization/modelslim_config.py
    • Updated Attention import to be conditional, ensuring correct class resolution across vLLM versions.
  • vllm_ascend/worker/model_runner_v1.py
    • Implemented conditional imports for Attention and MLAAttention to maintain compatibility with different vLLM API versions.
Ignored Files
  • Ignored by pattern: .github/workflows/** (6)
    • .github/workflows/_pre_commit.yml
    • .github/workflows/bot_pr_create.yaml
    • .github/workflows/dockerfiles/Dockerfile.lint
    • .github/workflows/pr_test_full.yaml
    • .github/workflows/pr_test_light.yaml
    • .github/workflows/schedule_codecov_refresh.yaml
Activity
  • No specific activity (comments, reviews, etc.) has been recorded for this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature               Command               Description
Code Review           /gemini review        Performs a code review of the pull request in its current state.
Pull Request Summary  /gemini summary       Provides a summary of the pull request in its current state.
Comment               @gemini-code-assist   Responds when explicitly tagged, in both pull request comments and review comments.
Help                  /gemini help          Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  [1] Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

@vllm-ascend-ci vllm-ascend-ci added the 'ready' (read for review) and 'ready-for-test' (start test by label for PR) labels Feb 3, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request upgrades the vLLM dependency and adapts the codebase to handle a refactoring where the Attention and MLAAttention modules were moved. The changes introduce conditional imports based on the vLLM version. My main feedback is to replace the version-checking logic with a more robust try...except ImportError pattern. This is a standard way to handle such changes and is more maintainable as it doesn't rely on specific version strings.
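The try...except ImportError pattern the reviewer suggests can be generalized into a small helper. This is a sketch of the idea, not code from the PR; the vLLM module paths in the commented-out example are hypothetical, so the runnable demo below uses stdlib names instead.

```python
import importlib

def import_first(candidates):
    """Return the first importable attribute among (module_path, attr) pairs.

    Generic form of the try/except ImportError fallback: probe the newest
    module path first, then fall back to older ones, with no dependency
    on a specific version string.
    """
    for module_path, attr in candidates:
        try:
            module = importlib.import_module(module_path)
            return getattr(module, attr)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f"no importable candidate in: {candidates}")

# In vllm_ascend the candidates would be the new and old vLLM locations
# of Attention/MLAAttention (paths omitted here, as they vary by release).
# Runnable stdlib demo: the first path is deliberately missing.
OrderedDict = import_first([
    ("collections.does_not_exist", "OrderedDict"),  # missing, skipped
    ("collections", "OrderedDict"),                 # resolved here
])
```

Compared with the version-string check, this degrades gracefully when upstream moves a class again: the fallback chain just gains one more candidate.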

Comment thread tests/ut/quantization/test_modelslim_config.py
Comment thread vllm_ascend/compilation/passes/qknorm_rope_fusion_pass.py
Comment thread vllm_ascend/ops/mla.py
Comment thread vllm_ascend/patch/worker/patch_qwen3_next_mtp.py
Comment thread vllm_ascend/quantization/modelslim_config.py
Comment thread vllm_ascend/worker/model_runner_v1.py
@vllm-ascend-ci vllm-ascend-ci added and then removed the 'ready-for-test' (start test by label for PR) label Feb 3, 2026
@zhangxinyuehfad zhangxinyuehfad changed the title [Main2Main] Upgrade to newest vLLM 0228 [Main2Main] Upgrade to newest vLLM 0128 Feb 3, 2026
@github-actions
Contributor

github-actions bot commented Feb 4, 2026

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@zhangxinyuehfad zhangxinyuehfad force-pushed the upgrade_main_0228 branch 2 times, most recently from cc50bc9 to e8d4231 on February 4, 2026 03:06
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
wangxiyuan and others added 4 commits February 4, 2026 12:04
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
@zhangxinyuehfad zhangxinyuehfad force-pushed the upgrade_main_0228 branch 2 times, most recently from 3efb455 to aa7311c on February 4, 2026 10:18
Meihan-chen and others added 5 commits February 5, 2026 10:07
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
…_special_tokens'

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
@vllm-ascend-ci vllm-ascend-ci changed the title [Main2Main] Upgrade to newest vLLM 0128 [Main2Main] Upgrade to newest vLLM 0204 Feb 5, 2026
@github-actions
Contributor

github-actions bot commented Feb 5, 2026

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@zhangxinyuehfad zhangxinyuehfad removed the 'ready' (read for review) and 'ready-for-test' (start test by label for PR) labels Mar 2, 2026


4 participants