[Misc] Upgrade vllm version to 0408 by Potabk · Pull Request #8060 · vllm-project/vllm-ascend

Potabk · 2026-04-08T14:30:20Z

What this PR does / why we need it?

For FusedMoE (upstream FusedMoE refactor):
vllm-project/vllm#33049
vllm-project/vllm#35949

For qwen3_vl:
vllm-project/vllm#34539
Upstream added a new Triton kernel for fast RoPE position encoding. I've added a patch that falls back to the native implementation. We'll consider registering a custom operator with an Ascend implementation later.

vllm-project/vllm#38361
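The fallback for qwen3_vl can be pictured as a module-level patch that redirects a Triton-only entry point to a native implementation. This is a minimal sketch, not the PR's code: `upstream_rope`, `apply_rope`, and `patch_rope_to_native` are hypothetical names, and the native function is plain 1-D RoPE in the half-split layout, not necessarily the variant the upstream Triton kernel computes.

```python
import math
import types
from typing import List

# Hypothetical stand-in for an upstream module whose fast path needs Triton.
upstream_rope = types.ModuleType("upstream_rope")


def _triton_apply_rope(x: List[float], position: int) -> List[float]:
    # Simulates a kernel that cannot run on a non-Triton device.
    raise RuntimeError("Triton kernel is not available on this device")


upstream_rope.apply_rope = _triton_apply_rope


def native_apply_rope(x: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Plain RoPE: rotate each pair (x[i], x[i + d/2]) by position * base**(-2i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("head dimension must be even")
    half = d // 2
    out = list(x)
    for i in range(half):
        theta = position * base ** (-2.0 * i / d)
        cos, sin = math.cos(theta), math.sin(theta)
        x1, x2 = x[i], x[i + half]
        out[i] = x1 * cos - x2 * sin
        out[i + half] = x1 * sin + x2 * cos
    return out


def patch_rope_to_native(module: types.ModuleType) -> None:
    """Hypothetical patch: point the module-level entry point at the native path."""
    module.apply_rope = native_apply_rope


patch_rope_to_native(upstream_rope)
```

Patching the attribute on the module (rather than editing call sites) keeps every caller that looks up `module.apply_rope` at call time on the native path; a custom Ascend operator could later replace the same attribute.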

Does this PR introduce any user-facing change?

How was this patch tested?

vLLM version:
vLLM main: vllm-project/vllm@29e4870

gemini-code-assist · 2026-04-08T14:30:33Z

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request updates the vLLM dependency to a newer commit and aligns the Ascend-specific implementation with recent upstream architectural changes. Key adjustments include refactoring the MoE runner to handle gate parameter storage correctly and cleaning up the input batch initialization logic to reduce code duplication and improve maintainability.

Highlights

- vLLM Version Upgrade: Updated the vLLM dependency to commit 5af684c31912232e5c89484c2e8259e0fac6c55b across all Dockerfiles and documentation.
- MoE Runner Refactoring: Refactored AscendMoERunner to accommodate upstream changes in how gate parameters are stored, ensuring compatibility with the updated FusedMoE implementation.
- Input Batch Optimization: Simplified NPU input batch initialization by leveraging the upstream base class and streamlining NPU-specific tracking structures.
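The input batch simplification follows a familiar subclassing pattern: the base class owns the shared initialization and the subclass keeps only device-specific tracking. A minimal sketch with hypothetical classes: `InputBatch` here is a stand-in, not vLLM's real class or signature, and `npu_flags` is an invented NPU-only field.

```python
from typing import Dict, List


class InputBatch:
    """Hypothetical stand-in for an upstream input-batch base class."""

    def __init__(self, max_num_reqs: int) -> None:
        self.max_num_reqs = max_num_reqs
        self.req_ids: List[str] = []
        self.req_id_to_index: Dict[str, int] = {}

    def add_request(self, req_id: str) -> int:
        if len(self.req_ids) >= self.max_num_reqs:
            raise ValueError("batch is full")
        index = len(self.req_ids)
        self.req_ids.append(req_id)
        self.req_id_to_index[req_id] = index
        return index


class NPUInputBatch(InputBatch):
    """Reuses the base initialization and adds only NPU-specific tracking."""

    def __init__(self, max_num_reqs: int) -> None:
        super().__init__(max_num_reqs)  # shared fields come from the base class
        # Hypothetical NPU-only structure: one flag per request slot.
        self.npu_flags: List[bool] = [False] * max_num_reqs

    def add_request(self, req_id: str, npu_flag: bool = False) -> int:
        index = super().add_request(req_id)  # base class assigns the slot
        self.npu_flags[index] = npu_flag
        return index
```

Deferring to `super().__init__` means upstream changes to the shared fields flow into the NPU batch without duplicating them.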

Ignored Files

Ignored by pattern: .github/workflows/** (6)
- .github/workflows/bot_pr_create.yaml
- .github/workflows/dockerfiles/Dockerfile.lint
- .github/workflows/pr_test_full.yaml
- .github/workflows/pr_test_light.yaml
- .github/workflows/schedule_codecov_refresh.yaml
- .github/workflows/schedule_test_benchmarks.yaml

github-actions · 2026-04-08T14:30:38Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

- A PR should do only one thing; smaller PRs enable faster reviews.
- Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
- Write the commit message by filling out the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

gemini-code-assist

Code Review

This pull request updates the vLLM commit hash across multiple Dockerfiles and documentation files to synchronize with the latest upstream changes. It also refactors the MoE runner initialization in vllm_ascend to resolve circular dependencies and potential AttributeError issues by bypassing the FusedMoE.shared_experts property during object construction. Additionally, it cleans up unused imports and refines the NPUInputBatch initialization logic to improve compatibility with upstream structures. The review comments identified critical circular dependency risks in the _init_runner methods, suggesting safer getattr usage to prevent initialization failures.
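The construction-order problem described in the review can be reproduced in a few lines. In this hypothetical sketch (the class names are illustrative, not the real `FusedMoE` or `AscendMoERunner` API), a `shared_experts` property resolves through the runner, so reading it inside `_init_runner` fails with `AttributeError` because the runner does not exist yet. Reading the backing attribute through `getattr` with a default breaks the dependency.

```python
class Runner:
    """Hypothetical MoE runner that needs the layer's shared experts."""

    def __init__(self, shared_experts):
        self.shared_experts = shared_experts


class MoELayerUnsafe:
    def __init__(self, shared_experts=None):
        self._shared_experts = shared_experts
        self.runner = self._init_runner()

    @property
    def shared_experts(self):
        # After construction, shared experts are resolved through the runner.
        return self.runner.shared_experts

    def _init_runner(self):
        # Reads the property before self.runner is assigned -> AttributeError.
        return Runner(self.shared_experts)


class MoELayerSafe(MoELayerUnsafe):
    def _init_runner(self):
        # Bypass the property and read the backing attribute, with a safe default.
        return Runner(getattr(self, "_shared_experts", None))
```

Once construction finishes, `MoELayerSafe("experts").shared_experts` returns `"experts"` through the runner as intended.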

github-actions · 2026-04-09T01:53:53Z

This pull request has conflicts; please resolve them before we can evaluate the pull request.

Potabk · 2026-04-10T06:36:44Z

The full test suite passed here: https://github.com/vllm-project/vllm-ascend/actions/runs/24225696146/job/70726469550?pr=8060

Signed-off-by: wangli <wangli858794774@gmail.com>

Potabk · 2026-04-10T08:17:23Z

```diff
@@ -49,9 +49,7 @@ RUN pip config set global.index-url ${PIP_INDEX_URL} && \
 # Install vLLM
 ARG VLLM_REPO=https://github.com/vllm-project/vllm.git
 ARG VLLM_COMMIT=v0.19.0
```

Signed-off-by: wangli <wangli858794774@gmail.com>
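Because the vLLM pin has to match across every Dockerfile, a small consistency check can catch drift after an upgrade like this one. A minimal sketch, assuming each Dockerfile declares the pin as `ARG VLLM_COMMIT=<ref>` as in the hunk above; `find_vllm_pins`, `check_consistent_pins`, and the file names in the usage are hypothetical, not part of the repository.

```python
import re
from typing import Dict, Optional

# Matches lines such as "ARG VLLM_COMMIT=v0.19.0" (assumed declaration style).
_PIN_RE = re.compile(r"^[ \t]*ARG[ \t]+VLLM_COMMIT=(\S+)[ \t]*$", re.MULTILINE)


def find_vllm_pins(dockerfiles: Dict[str, str]) -> Dict[str, str]:
    """Map each Dockerfile path to its VLLM_COMMIT value, skipping files without one."""
    pins: Dict[str, str] = {}
    for path, text in dockerfiles.items():
        match = _PIN_RE.search(text)
        if match:
            pins[path] = match.group(1)
    return pins


def check_consistent_pins(dockerfiles: Dict[str, str]) -> Optional[str]:
    """Return the single shared pin, or raise if Dockerfiles disagree."""
    pins = find_vllm_pins(dockerfiles)
    values = set(pins.values())
    if len(values) > 1:
        raise ValueError(f"inconsistent VLLM_COMMIT pins: {pins}")
    return values.pop() if values else None
```

In practice the dictionary would be built by reading the Dockerfiles from disk; taking text in memory keeps the check easy to unit test.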

Potabk requested review from LCAIZJ, MengqingCao, Yikun, realliujiaxu, wangxiyuan, whx-sjtu and zzzzwwjj as code owners April 8, 2026 14:30

github-actions Bot added the documentation, ci/build, and module:ops labels Apr 8, 2026

gemini-code-assist Bot reviewed Apr 8, 2026

Comment thread vllm_ascend/_310p/fused_moe/fused_moe.py Outdated

Comment thread vllm_ascend/ops/fused_moe/fused_moe.py Outdated

Potabk force-pushed the 0408 branch from cee4379 to 9e8f2a2 April 8, 2026 14:35

Potabk added the ready and ready-for-test labels Apr 8, 2026

github-actions Bot added the merge-conflicts label Apr 9, 2026

Potabk force-pushed the 0408 branch from 50e705a to 2a7a616 April 9, 2026 06:49

github-actions Bot removed the merge-conflicts label Apr 9, 2026

Potabk force-pushed the 0408 branch 2 times, most recently from 7819761 to 786cde3 April 10, 2026 04:01

leo-pony reviewed Apr 10, 2026

Comment thread .github/workflows/_e2e_test.yaml Outdated

Potabk added 2 commits April 10, 2026 14:49

support vllm 0408

a80bf64

Signed-off-by: wangli <wangli858794774@gmail.com>

mock triton in ut

6408a8c

Signed-off-by: wangli <wangli858794774@gmail.com>
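The "mock triton in ut" commit suggests the unit tests now import code that depends on Triton, which may not be installed on CI hosts. The commit's contents are not shown here, so this is only a minimal sketch of one common approach: temporarily placing a `MagicMock` into `sys.modules` with `unittest.mock.patch.dict`, assuming only `triton`, `triton.language`, and `triton.jit` need to resolve. The helper names are hypothetical.

```python
import sys
from unittest import mock


def fake_triton_modules() -> dict:
    """Return sys.modules entries that stand in for the triton package."""
    triton = mock.MagicMock(name="triton")
    # Make @triton.jit (with or without arguments) a pass-through decorator,
    # so kernel definitions stay importable as plain Python functions.
    triton.jit = lambda fn=None, **kwargs: fn if fn is not None else (lambda f: f)
    return {"triton": triton, "triton.language": triton.language}


def import_kernel_under_mock():
    """Import 'triton' inside the patch; sys.modules is restored on exit."""
    with mock.patch.dict(sys.modules, fake_triton_modules()):
        import triton
        import triton.language as tl  # resolves to the MagicMock child

        @triton.jit
        def add_one(x):
            return x + 1

        return add_one, tl
```

Scoping the patch with a context manager keeps the fake module from leaking into other tests, which matters if a real Triton install is present.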

Potabk force-pushed the 0408 branch from 786cde3 to 6408a8c April 10, 2026 06:50

revert continue on error

7a50a64

Signed-off-by: wangli <wangli858794774@gmail.com>

Potabk commented Apr 10, 2026

wangxiyuan approved these changes Apr 10, 2026

rename dockerfile env

44330e2

Signed-off-by: wangli <wangli858794774@gmail.com>

wangxiyuan merged commit 3736ad8 into vllm-project:main Apr 10, 2026
51 checks passed

wangxiyuan merged 4 commits into vllm-project:main from Potabk:0408
