[Misc]Main2main to 0420 #8610

Closed

Meihan-chen wants to merge 10 commits into vllm-project:main from Meihan-chen:main2main_0423

Conversation

@Meihan-chen Meihan-chen (Collaborator) commented Apr 23, 2026

What this PR does / why we need it?

  1. Fix `ModuleNotFoundError: No module named 'vllm.model_executor.layers.fused_moe.runner.default_moe_runner'` caused by [MoE Refactor] Combine MoERunnerBase + DefaultMoERunner vllm#40560 (sketched below).
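
A minimal sketch of the guarded-import pattern this kind of breakage calls for (not the actual vllm-ascend patch; the post-refactor path and symbol below are assumptions):

```python
try:
    # Module path before vllm-project/vllm#40560 removed it.
    from vllm.model_executor.layers.fused_moe.runner.default_moe_runner import (
        DefaultMoERunner,
    )
except ModuleNotFoundError:
    # Hypothetical post-refactor location after MoERunnerBase and
    # DefaultMoERunner were combined; the real path may differ.
    from vllm.model_executor.layers.fused_moe.runner import (  # type: ignore
        MoERunner as DefaultMoERunner,
    )
```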

Does this PR introduce any user-facing change?

How was this patch tested?

**Commit range:** `6f786f2`..`d886c26`

| Priority | Change | vLLM File | vllm-ascend File | Description |
|:---|:---|:---|:---|:---|
| P0 | Zero-expert logical expert boundary | `vllm/model_executor/layers/fused_moe/layer.py` | `vllm_ascend/ops/fused_moe/fused_moe.py`, `vllm_ascend/_310p/fused_moe/fused_moe.py`, `vllm_ascend/quantization/methods/w8a8_dynamic.py`, `vllm_ascend/_310p/quantization/methods/w8a8_dynamic.py` | Use `logical_num_experts` when available so Ascend zero-expert handling stays compatible with the upstream FusedMoE router changes on main. |
| P1 | Main2main commit reference bump | `docs/source/conf.py`, `.github/workflows/pr_test_full.yaml`, `.github/workflows/pr_test_light.yaml`, `.github/workflows/dockerfiles/Dockerfile.lint` | same | Update main-branch vLLM commit references from `6f786f2c506cb07f4566771fdc62e640e2c4a176` to `d886c26d4d4fef7d079696beb4ece1cfb4b008a8`. |
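
The P0 row compresses the actual compatibility shim; a minimal sketch of the attribute fallback it describes, where the pre-refactor fallback name is an assumption:

```python
# Prefer the logical expert count when the installed vLLM exposes it
# (the upstream FusedMoE router change); otherwise fall back so older
# commits keep working.
def resolve_num_experts(layer):  # hypothetical helper
    logical = getattr(layer, "logical_num_experts", None)
    return logical if logical is not None else layer.global_num_experts
```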

- `.github/workflows/dockerfiles/Dockerfile.lint`
- `.github/workflows/pr_test_full.yaml`
- `.github/workflows/pr_test_light.yaml`
- `docs/source/conf.py`
- `vllm_ascend/ops/fused_moe/fused_moe.py`
- `vllm_ascend/_310p/fused_moe/fused_moe.py`
- `vllm_ascend/quantization/methods/w8a8_dynamic.py`
- `vllm_ascend/_310p/quantization/methods/w8a8_dynamic.py`

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
CI Fix Summary (run ID: 24638481321)

Commit range: `6f786f2`..`f150107`

Issues Fixed

| Error | Upstream Cause Commit | Affected Files | Fix Description |
|:---|:---|:---|:---|
| `AttributeError: 'float' object has no attribute 'language_model'` | `c08f3b2a` — Measure encoder compile time separate from llm backbone | `vllm_ascend/worker/worker.py`, `tests/ut/worker/test_worker_v1.py` | Return `CompilationTimes` on vLLM main, keep scalar return for v0.19.0 via `vllm_version_is("0.19.0")`. |
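
The fix in the row above hinges on a single version gate. A minimal sketch of its shape, assuming a hypothetical method name and the attribute names quoted in the review further down:

```python
from vllm_ascend.utils import vllm_version_is  # version helper used by this PR

def get_compilation_times(self):  # hypothetical method name
    if vllm_version_is("0.19.0"):
        # v0.19.0 callers still expect the bare scalar.
        return self.vllm_config.compilation_config.compilation_time
    # vLLM main returns the structured CompilationTimes instead.
    from vllm.v1.worker.worker_base import CompilationTimes
    return CompilationTimes(
        language_model=self.vllm_config.compilation_config.compilation_time,
        encoder=self.vllm_config.compilation_config.encoder_compilation_time,
    )
```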

@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@Meihan-chen Meihan-chen added the ready (read for review) and ready-for-test (start test by label for PR) labels, and removed the documentation (Improvements or additions to documentation), ci/build, module:tests, module:ops, and module:core labels on Apr 23, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request updates the codebase to maintain compatibility with newer versions of vLLM while preserving support for v0.19.0. It primarily addresses breaking changes in model component structures and data processing logic by introducing version-aware conditional checks. Additionally, it refines dependency management and internal API usage to ensure stability across different vLLM releases.

Highlights

  • vLLM Version Compatibility: Introduced a vllm_version_is utility to handle conditional logic for supporting both vLLM v0.19.0 and newer versions (a minimal sketch of such a helper follows this list).
  • Refactored Model Components: Updated IndexerWrapper and related attention modules to accommodate upstream changes where wk and weights_proj were fused into wk_weights_proj.
  • Data Processing Updates: Adjusted how max_tokens_across_dp is calculated to support different data structures across vLLM versions.
  • Dependency Update: Constrained the transformers dependency to transformers>=4.57.4, <5.0 in requirements.txt.
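
The exact-match helper from the first highlight is simple in shape; a minimal sketch, assuming nothing beyond vLLM exposing `__version__` (the project's real `vllm_version_is` lives in `vllm_ascend/utils.py` and may differ):

```python
from vllm import __version__ as vllm_version

def vllm_version_is(target: str) -> bool:
    """True when the installed vLLM version string equals `target` exactly."""
    return vllm_version == target
```

These exact-equality semantics are what the reviewer later flags as fragile for future 0.19.x releases.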

Ignored Files
  • Ignored by pattern: .github/workflows/** (4)
    • .github/workflows/_e2e_test.yaml
    • .github/workflows/dockerfiles/Dockerfile.lint
    • .github/workflows/pr_test_full.yaml
    • .github/workflows/pr_test_light.yaml

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the vLLM dependency to v0.19.0 and implements compatibility logic for fused MLA weights, updated dp_metadata fields, and changes to the CompilationTimes structure.

Suggested PR Title:

[Ops][Test][Doc][Misc] Update vLLM to v0.19.0 and handle fused MLA weights

Suggested PR Summary:

### What this PR does / why we need it?
This PR updates the vLLM commit hash to v0.19.0 and introduces compatibility changes for fused `wk` and `weights_proj` in MLA/SFA, `dp_metadata` field changes, and `CompilationTimes` structure updates.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with updated unit tests supporting the new vLLM version.

Feedback: The reviewer identified a bug in worker.py where compilation_config is accessed incorrectly, and noted that the logic for handling fused weights appears to be inverted across several modules (v0.19.0 should use the fused wk_weights_proj attribute). The reviewer also recommended using version range checks instead of exact equality for better future compatibility.

Comment thread docs/source/conf.py
Comment on lines +83 to 85
"main_vllm_commit": "ccaf5ffaa3e1fb2a081b2c9e403ac0e4dfc142c8",
# vLLM tag for main branch
"main_vllm_tag": "v0.19.0",

Severity: high

The Pull Request title and summary do not adhere to the repository style guide. Please update them according to the following suggestions:

Suggested PR Title:

[Ops][Test][Doc][Misc] Update vLLM to v0.19.0 and handle fused MLA weights

Suggested PR Summary:

### What this PR does / why we need it?
This PR updates the vLLM commit hash to a newer commit on v0.19.0 and introduces compatibility changes to handle breaking changes in upstream vLLM, including:
- Fused `wk` and `weights_proj` in MLA/SFA.
- Changes in `dp_metadata` fields.
- Changes in `CompilationTimes` structure.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with updated unit tests supporting the new vLLM version.
References
  1. PR Title and Summary must follow a specific format and be provided in markdown code blocks. (link)

Comment thread vllm_ascend/worker/worker.py
Comment on lines +482 to +485
return CompilationTimes(
    language_model=self.vllm_config.compilation_config.compilation_time,
    encoder=self.compilation_config.encoder_compilation_time,
)

Severity: high

There is a bug on line 484: self.compilation_config is not a valid attribute of NPUWorker. It should be accessed via self.vllm_config.compilation_config.

Suggested change
-return CompilationTimes(
-    language_model=self.vllm_config.compilation_config.compilation_time,
-    encoder=self.compilation_config.encoder_compilation_time,
-)
+return CompilationTimes(
+    language_model=self.vllm_config.compilation_config.compilation_time,
+    encoder=self.vllm_config.compilation_config.encoder_compilation_time,
+)

Comment thread vllm_ascend/ops/mla.py
Comment on lines +58 to +62
if vllm_version_is("0.19.0"):
    self.wk = vllm_indexer.wk
    self.weights_proj = vllm_indexer.weights_proj
else:
    self.wk_weights_proj = vllm_indexer.wk_weights_proj

Severity: high

The logic for handling fused weights appears to be inverted. The comment states that upstream fused wk and weights_proj into wk_weights_proj. Since this PR targets vLLM v0.19.0, the if vllm_version_is("0.19.0") block should use the fused weight, while the else block should handle older versions.

Additionally, using vllm_version_is for exact equality is fragile as it will fail for future versions (e.g., 0.19.1). Consider using a version range check if available.

Suggested change
-if vllm_version_is("0.19.0"):
-    self.wk = vllm_indexer.wk
-    self.weights_proj = vllm_indexer.weights_proj
-else:
-    self.wk_weights_proj = vllm_indexer.wk_weights_proj
+if vllm_version_is("0.19.0"):
+    self.wk_weights_proj = vllm_indexer.wk_weights_proj
+else:
+    self.wk = vllm_indexer.wk
+    self.weights_proj = vllm_indexer.weights_proj

Comment on lines +443 to +447
if vllm_version_is("0.19.0"):
    self.wk = self.indexer.wk
    self.weights_proj = self.indexer.weights_proj
else:
    self.wk_weights_proj = self.indexer.wk_weights_proj

Severity: high

The logic for fused weights appears to be inverted here, which is inconsistent with the logic used in ascend_forward_context.py. If v0.19.0 is the version that introduced fused weights, it should use wk_weights_proj.

Suggested change
-if vllm_version_is("0.19.0"):
-    self.wk = self.indexer.wk
-    self.weights_proj = self.indexer.weights_proj
-else:
-    self.wk_weights_proj = self.indexer.wk_weights_proj
+if vllm_version_is("0.19.0"):
+    self.wk_weights_proj = self.indexer.wk_weights_proj
+else:
+    self.wk = self.indexer.wk
+    self.weights_proj = self.indexer.weights_proj

Comment on lines +916 to +920
if vllm_version_is("0.19.0"):
    k_li, _ = self.wk(x)  # [b,s,7168] @ [7168,128] = [b,s,128]
else:
    kw, _ = self.wk_weights_proj(x)
    k_li = kw[:, : self.head_dim]

Severity: high

The logic for fused weights appears to be inverted. If v0.19.0 has fused weights, it should use wk_weights_proj.

Suggested change
-if vllm_version_is("0.19.0"):
-    k_li, _ = self.wk(x)  # [b,s,7168] @ [7168,128] = [b,s,128]
-else:
-    kw, _ = self.wk_weights_proj(x)
-    k_li = kw[:, : self.head_dim]
+if vllm_version_is("0.19.0"):
+    kw, _ = self.wk_weights_proj(x)
+    k_li = kw[:, : self.head_dim]
+else:
+    k_li, _ = self.wk(x)  # [b,s,7168] @ [7168,128] = [b,s,128]

Comment on lines +389 to +393
if vllm_version_is("0.19.0"):
    weights, _ = self.weights_proj(x)
else:
    kw, _ = self.wk_weights_proj(x)
    weights = kw[:, self.head_dim :]

Severity: high

The logic for fused weights appears to be inverted. If v0.19.0 has fused weights, it should use wk_weights_proj.

Suggested change
-if vllm_version_is("0.19.0"):
-    weights, _ = self.weights_proj(x)
-else:
-    kw, _ = self.wk_weights_proj(x)
-    weights = kw[:, self.head_dim :]
+if vllm_version_is("0.19.0"):
+    kw, _ = self.wk_weights_proj(x)
+    weights = kw[:, self.head_dim :]
+else:
+    weights, _ = self.weights_proj(x)
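
For intuition about the slicing in these suggestions: fusing two projections means concatenating their weight matrices along the output dimension, so slicing the fused output recovers each original product. A toy, self-contained illustration with made-up small shapes (the real `wk` is [7168, 128]):

```python
import torch

hidden, head_dim, n_weights = 16, 4, 3  # toy sizes; real wk is [7168, 128]
wk = torch.randn(hidden, head_dim)
weights_proj = torch.randn(hidden, n_weights)
wk_weights_proj = torch.cat([wk, weights_proj], dim=1)  # fused weight

x = torch.randn(2, hidden)
fused_out = x @ wk_weights_proj
k_li = fused_out[:, :head_dim]      # same as x @ wk
weights = fused_out[:, head_dim:]   # same as x @ weights_proj
assert torch.allclose(k_li, x @ wk, atol=1e-5)
assert torch.allclose(weights, x @ weights_proj, atol=1e-5)
```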

Comment on lines +66 to +67
if not vllm_version_is("0.19.0"):
    from vllm.v1.worker.worker_base import CompilationTimes  # noqa: E402

Severity: high

Using vllm_version_is("0.19.0") for conditional imports and logic is fragile. If a breaking change was introduced in 0.19.0 and persists in future versions, this check will fail for versions like 0.19.1 or 0.20.0. It is recommended to use a version range check to handle future versions correctly.
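
One hedged way to express the recommended range check (illustrative only; it assumes main's version string has been bumped past the 0.19.0 tag, so dev builds compare greater):

```python
from packaging.version import Version

from vllm import __version__ as vllm_version

# Treat "newer than 0.19.0" as one bucket instead of special-casing
# every future release.
if Version(vllm_version) > Version("0.19.0"):
    from vllm.v1.worker.worker_base import CompilationTimes  # noqa: E402
```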

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: Meihan-chen <zr010426ztt@outlook.com>
@Meihan-chen Meihan-chen changed the title from "[Misc]Main2main to 0423" to "[Misc]Main2main to 0420" on Apr 27, 2026