[Misc] Main2main to 0420 (#8610)
**Commit range:** `6f786f2`..`d886c26`

| Priority | Change | vLLM File | vllm-ascend File | Description |
|:---|:---|:---|:---|:---|
| P0 | Zero-expert logical expert boundary | `vllm/model_executor/layers/fused_moe/layer.py` | `vllm_ascend/ops/fused_moe/fused_moe.py`, `vllm_ascend/_310p/fused_moe/fused_moe.py`, `vllm_ascend/quantization/methods/w8a8_dynamic.py`, `vllm_ascend/_310p/quantization/methods/w8a8_dynamic.py` | Use `logical_num_experts` when available so Ascend zero-expert handling stays compatible with the upstream FusedMoE router changes on main. |
| P1 | Main2main commit reference bump | `docs/source/conf.py`, `.github/workflows/pr_test_full.yaml`, `.github/workflows/pr_test_light.yaml`, `.github/workflows/dockerfiles/Dockerfile.lint` | same | Update main-branch vLLM commit references from `6f786f2c506cb07f4566771fdc62e640e2c4a176` to `d886c26d4d4fef7d079696beb4ece1cfb4b008a8`. |

Changed files:

- `.github/workflows/dockerfiles/Dockerfile.lint`
- `.github/workflows/pr_test_full.yaml`
- `.github/workflows/pr_test_light.yaml`
- `docs/source/conf.py`
- `vllm_ascend/ops/fused_moe/fused_moe.py`
- `vllm_ascend/_310p/fused_moe/fused_moe.py`
- `vllm_ascend/quantization/methods/w8a8_dynamic.py`
- `vllm_ascend/_310p/quantization/methods/w8a8_dynamic.py`

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
CI Fix Summary (run ID: 24638481321)
**Commit range:** `6f786f2`..`f150107`
Issues Fixed
| Error | Upstream Cause Commit | Affected Files | Fix Description |
|:---|:---|:---|:---|
| `AttributeError: 'float' object has no attribute 'language_model'` | `c08f3b2a` — "Measure encoder compile time seperate from llm backbone" | `vllm_ascend/worker/worker.py`, `tests/ut/worker/test_worker_v1.py` | Return `CompilationTimes` on vLLM main, keep scalar return for v0.19.0 via `vllm_version_is("0.19.0")`. |
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
**Summary of Changes** (posted by Gemini Code Assist): This pull request updates the codebase to maintain compatibility with newer versions of vLLM while preserving support for v0.19.0. It primarily addresses breaking changes in model component structures and data processing logic by introducing version-aware conditional checks. Additionally, it refines dependency management and internal API usage to ensure stability across different vLLM releases.
Code Review
This pull request updates the vLLM dependency to v0.19.0 and implements compatibility logic for fused MLA weights, updated `dp_metadata` fields, and changes to the `CompilationTimes` structure.

Suggested PR Title: `[Ops][Test][Doc][Misc] Update vLLM to v0.19.0 and handle fused MLA weights`

Suggested PR Summary:

> ### What this PR does / why we need it?
> This PR updates the vLLM commit hash to v0.19.0 and introduces compatibility changes for fused `wk` and `weights_proj` in MLA/SFA, `dp_metadata` field changes, and `CompilationTimes` structure updates.
>
> ### Does this PR introduce _any_ user-facing change?
> No.
>
> ### How was this patch tested?
> CI passed with updated unit tests supporting the new vLLM version.

Feedback: the reviewer identified a bug in `worker.py` where `compilation_config` is accessed incorrectly, noted that the fused-weights handling appears to be inverted across several modules (v0.19.0 should use the fused `wk_weights_proj` attribute), and recommended using version range checks instead of exact equality for better forward compatibility.
```python
"main_vllm_commit": "ccaf5ffaa3e1fb2a081b2c9e403ac0e4dfc142c8",
# vLLM tag for main branch
"main_vllm_tag": "v0.19.0",
```
The Pull Request title and summary do not adhere to the repository style guide. Please update them according to the following suggestions:
Suggested PR Title:

[Ops][Test][Doc][Misc] Update vLLM to v0.19.0 and handle fused MLA weights

Suggested PR Summary:
### What this PR does / why we need it?
This PR updates the vLLM commit hash to a newer version of v0.19.0 and introduces compatibility changes to handle breaking changes in upstream vLLM, including:
- Fused `wk` and `weights_proj` in MLA/SFA.
- Changes in `dp_metadata` fields.
- Changes in `CompilationTimes` structure.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI passed with updated unit tests supporting the new vLLM version.

References
- PR Title and Summary must follow a specific format and be provided in markdown code blocks. (link)
```python
return CompilationTimes(
    language_model=self.vllm_config.compilation_config.compilation_time,
    encoder=self.compilation_config.encoder_compilation_time,
)
```
There is a bug on line 484: `self.compilation_config` is not a valid attribute of `NPUWorker`. It should be accessed via `self.vllm_config.compilation_config`.
```diff
 return CompilationTimes(
     language_model=self.vllm_config.compilation_config.compilation_time,
-    encoder=self.compilation_config.encoder_compilation_time,
+    encoder=self.vllm_config.compilation_config.encoder_compilation_time,
 )
```
```python
if vllm_version_is("0.19.0"):
    self.wk = vllm_indexer.wk
    self.weights_proj = vllm_indexer.weights_proj
else:
    self.wk_weights_proj = vllm_indexer.wk_weights_proj
```
The logic for handling fused weights appears to be inverted. The comment states that upstream fused `wk` and `weights_proj` into `wk_weights_proj`. Since this PR targets vLLM v0.19.0, the `if vllm_version_is("0.19.0")` block should use the fused weight, while the `else` block should handle older versions.

Additionally, using `vllm_version_is` for exact equality is fragile, as it will fail for future versions (e.g. `0.19.1`). Consider using a version range check if available.
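One way to realize the range-check suggestion, sketched with a hypothetical helper (`vllm_version_is` is real in vllm-ascend, but `vllm_version_at_least` and the tuple parser below are assumptions, not its actual utilities):

```python
# Hypothetical range check: compare versions as integer tuples instead of
# exact string equality, so a patch release like 0.19.1 still takes the
# same branch as 0.19.0.

def _parse(version: str) -> tuple:
    # Naive parser for plain "X.Y.Z" strings; pre-release suffixes are ignored.
    return tuple(int(p) for p in version.split(".") if p.isdigit())


def vllm_version_at_least(installed: str, minimum: str) -> bool:
    return _parse(installed) >= _parse(minimum)


assert vllm_version_at_least("0.19.0", "0.19.0")
assert vllm_version_at_least("0.19.1", "0.19.0")  # exact equality would miss this
assert not vllm_version_at_least("0.18.2", "0.19.0")
```

In production code, `packaging.version.Version` is the more robust choice, since it also handles pre-release and dev suffixes.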
```diff
 if vllm_version_is("0.19.0"):
-    self.wk = vllm_indexer.wk
-    self.weights_proj = vllm_indexer.weights_proj
+    self.wk_weights_proj = vllm_indexer.wk_weights_proj
 else:
-    self.wk_weights_proj = vllm_indexer.wk_weights_proj
+    self.wk = vllm_indexer.wk
+    self.weights_proj = vllm_indexer.weights_proj
```
```python
if vllm_version_is("0.19.0"):
    self.wk = self.indexer.wk
    self.weights_proj = self.indexer.weights_proj
else:
    self.wk_weights_proj = self.indexer.wk_weights_proj
```
The logic for fused weights appears to be inverted here, which is inconsistent with the logic used in `ascend_forward_context.py`. If v0.19.0 is the version that introduced fused weights, it should use `wk_weights_proj`.
```diff
 if vllm_version_is("0.19.0"):
-    self.wk = self.indexer.wk
-    self.weights_proj = self.indexer.weights_proj
+    self.wk_weights_proj = self.indexer.wk_weights_proj
 else:
-    self.wk_weights_proj = self.indexer.wk_weights_proj
+    self.wk = self.indexer.wk
+    self.weights_proj = self.indexer.weights_proj
```
```python
if vllm_version_is("0.19.0"):
    k_li, _ = self.wk(x)  # [b,s,7168] @ [7168,128] = [b,s,128]
else:
    kw, _ = self.wk_weights_proj(x)
    k_li = kw[:, : self.head_dim]
```
The logic for fused weights appears to be inverted. If v0.19.0 has fused weights, it should use `wk_weights_proj`.
```diff
 if vllm_version_is("0.19.0"):
-    k_li, _ = self.wk(x)  # [b,s,7168] @ [7168,128] = [b,s,128]
+    kw, _ = self.wk_weights_proj(x)
+    k_li = kw[:, : self.head_dim]
 else:
-    kw, _ = self.wk_weights_proj(x)
-    k_li = kw[:, : self.head_dim]
+    k_li, _ = self.wk(x)  # [b,s,7168] @ [7168,128] = [b,s,128]
```
```python
if vllm_version_is("0.19.0"):
    weights, _ = self.weights_proj(x)
else:
    kw, _ = self.wk_weights_proj(x)
    weights = kw[:, self.head_dim :]
```
The logic for fused weights appears to be inverted. If v0.19.0 has fused weights, it should use `wk_weights_proj`.
```diff
 if vllm_version_is("0.19.0"):
-    weights, _ = self.weights_proj(x)
+    kw, _ = self.wk_weights_proj(x)
+    weights = kw[:, self.head_dim :]
 else:
-    kw, _ = self.wk_weights_proj(x)
-    weights = kw[:, self.head_dim :]
+    weights, _ = self.weights_proj(x)
```
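The two review comments above both split one fused projection output by `head_dim`. Here is a minimal numeric sketch of that split, with plain Python lists standing in for tensors and a dummy projection in place of the real linear layer (all names and values below are invented for illustration):

```python
# Sketch of the fused-weight split: one projection produces [k_li | weights]
# concatenated along the last dimension, and each consumer slices by head_dim.

head_dim = 3


def wk_weights_proj(x):
    # Pretend fused projection: the first head_dim outputs are k_li, the
    # remainder are router weights (deterministic dummy values here).
    fused = [v * 2 for v in range(head_dim)] + [v + 10 for v in range(2)]
    return fused, None  # (output, bias), mirroring a vLLM linear layer's return


x = [0.0] * 8                 # dummy input; its contents are irrelevant here
kw, _ = wk_weights_proj(x)
k_li = kw[:head_dim]          # kw[:, :head_dim] in the tensor version
weights = kw[head_dim:]       # kw[:, head_dim:] in the tensor version
print(k_li, weights)
```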
```python
if not vllm_version_is("0.19.0"):
    from vllm.v1.worker.worker_base import CompilationTimes  # noqa: E402
```
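An alternative to a version gate for an import like this is a guarded import with a local fallback. This is only a sketch: the fallback's fields are assumed for illustration, not taken from vLLM's actual class definition.

```python
# Sketch of a guarded import: only newer vLLM exposes CompilationTimes, so
# environments without it fall back to a local stand-in with assumed fields.
from dataclasses import dataclass

try:
    from vllm.v1.worker.worker_base import CompilationTimes  # newer vLLM main
except ImportError:
    @dataclass
    class CompilationTimes:  # fallback stand-in; fields are assumptions
        language_model: float
        encoder: float


t = CompilationTimes(language_model=1.2, encoder=0.4)
print(t.encoder)
```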
Signed-off-by: Meihan-chen <zr010426ztt@outlook.com>
Force-pushed from `1731f6b` to `3b5c6ea`.
Force-pushed from `3fdf6ab` to `7f497ba`.
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?