[Misc] Main2main to 0420 (#8610)
**Commit range:** `6f786f2`..`d886c26`

| Priority | Change | vLLM File | vllm-ascend File | Description |
|:---|:---|:---|:---|:---|
| P0 | Zero-expert logical expert boundary | `vllm/model_executor/layers/fused_moe/layer.py` | `vllm_ascend/ops/fused_moe/fused_moe.py`, `vllm_ascend/_310p/fused_moe/fused_moe.py`, `vllm_ascend/quantization/methods/w8a8_dynamic.py`, `vllm_ascend/_310p/quantization/methods/w8a8_dynamic.py` | Use `logical_num_experts` when available so Ascend zero-expert handling stays compatible with the upstream FusedMoE router changes on main. |
| P1 | Main2main commit reference bump | `docs/source/conf.py`, `.github/workflows/pr_test_full.yaml`, `.github/workflows/pr_test_light.yaml`, `.github/workflows/dockerfiles/Dockerfile.lint` | same | Update main-branch vLLM commit references from `6f786f2c506cb07f4566771fdc62e640e2c4a176` to `d886c26d4d4fef7d079696beb4ece1cfb4b008a8`. |

Changed files:

- `.github/workflows/dockerfiles/Dockerfile.lint`
- `.github/workflows/pr_test_full.yaml`
- `.github/workflows/pr_test_light.yaml`
- `docs/source/conf.py`
- `vllm_ascend/ops/fused_moe/fused_moe.py`
- `vllm_ascend/_310p/fused_moe/fused_moe.py`
- `vllm_ascend/quantization/methods/w8a8_dynamic.py`
- `vllm_ascend/_310p/quantization/methods/w8a8_dynamic.py`

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
CI Fix Summary (run ID: 24638481321)
**Commit range:** `6f786f2`..`f150107`
Issues Fixed
| Error | Upstream Cause Commit | Affected Files | Fix Description |
|:---|:---|:---|:---|
| `AttributeError: 'float' object has no attribute 'language_model'` | `c08f3b2a` — "Measure encoder compile time seperate from llm backbone" | `vllm_ascend/worker/worker.py`, `tests/ut/worker/test_worker_v1.py` | Return `CompilationTimes` on vLLM main, keep scalar return for v0.19.0 via `vllm_version_is("0.19.0")`. |
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
**Summary of Changes** (posted by Gemini Code Assist): This pull request updates the codebase to maintain compatibility with newer versions of vLLM while preserving support for v0.19.0. It primarily addresses breaking changes in model component structures and data processing logic by introducing version-aware conditional checks. Additionally, it refines dependency management and internal API usage to ensure stability across different vLLM releases.
Code Review
This pull request updates the vLLM dependency to v0.19.0 and implements compatibility logic for fused MLA weights, updated `dp_metadata` fields, and changes to the `CompilationTimes` structure.

Suggested PR Title: `[Ops][Test][Doc][Misc] Update vLLM to v0.19.0 and handle fused MLA weights`

Suggested PR Summary:

> ### What this PR does / why we need it?
> This PR updates the vLLM commit hash to v0.19.0 and introduces compatibility changes for fused `wk` and `weights_proj` in MLA/SFA, `dp_metadata` field changes, and `CompilationTimes` structure updates.
>
> ### Does this PR introduce _any_ user-facing change?
> No.
>
> ### How was this patch tested?
> CI passed with updated unit tests supporting the new vLLM version.

Feedback: the reviewer identified a bug in `worker.py` where `compilation_config` is accessed incorrectly, noted that the fused-weights handling appears to be inverted across several modules (v0.19.0 should use the fused `wk_weights_proj` attribute), and recommended using version range checks instead of exact equality for better forward compatibility.
```python
"main_vllm_commit": "ccaf5ffaa3e1fb2a081b2c9e403ac0e4dfc142c8",
# vLLM tag for main branch
"main_vllm_tag": "v0.19.0",
```
The Pull Request title and summary do not adhere to the repository style guide. Please update them according to the following suggestions:
Suggested PR Title:

[Ops][Test][Doc][Misc] Update vLLM to v0.19.0 and handle fused MLA weights

Suggested PR Summary:
### What this PR does / why we need it?
This PR updates the vLLM commit hash to a newer version of v0.19.0 and introduces compatibility changes to handle breaking changes in upstream vLLM, including:
- Fused `wk` and `weights_proj` in MLA/SFA.
- Changes in `dp_metadata` fields.
- Changes in `CompilationTimes` structure.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI passed with updated unit tests supporting the new vLLM version.

References
- PR Title and Summary must follow a specific format and be provided in markdown code blocks. (link)
```python
return CompilationTimes(
    language_model=self.vllm_config.compilation_config.compilation_time,
    encoder=self.compilation_config.encoder_compilation_time,
)
```
There is a bug on line 484: `self.compilation_config` is not a valid attribute of `NPUWorker`. It should be accessed via `self.vllm_config.compilation_config`.
```diff
 return CompilationTimes(
     language_model=self.vllm_config.compilation_config.compilation_time,
-    encoder=self.compilation_config.encoder_compilation_time,
+    encoder=self.vllm_config.compilation_config.encoder_compilation_time,
 )
```
```python
if vllm_version_is("0.19.0"):
    self.wk = vllm_indexer.wk
    self.weights_proj = vllm_indexer.weights_proj
else:
    self.wk_weights_proj = vllm_indexer.wk_weights_proj
```
The logic for handling fused weights appears to be inverted. The comment states that upstream fused `wk` and `weights_proj` into `wk_weights_proj`. Since this PR targets vLLM v0.19.0, the `if vllm_version_is("0.19.0")` block should use the fused weight, while the `else` block should handle older versions.

Additionally, using `vllm_version_is` for exact equality is fragile, as it will fail for future versions (e.g. `0.19.1`). Consider using a version range check if available.
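One way to realize the range-check suggestion, sketched with a hypothetical helper (`vllm_version_is` is real in vllm-ascend, but `vllm_version_at_least` and the tuple parser below are assumptions, not its actual utilities):

```python
# Hypothetical range check: compare versions as integer tuples instead of
# exact string equality, so a patch release like 0.19.1 still takes the
# same branch as 0.19.0.

def _parse(version: str) -> tuple:
    # Naive parser for plain "X.Y.Z" strings; pre-release suffixes are ignored.
    return tuple(int(p) for p in version.split(".") if p.isdigit())


def vllm_version_at_least(installed: str, minimum: str) -> bool:
    return _parse(installed) >= _parse(minimum)


assert vllm_version_at_least("0.19.0", "0.19.0")
assert vllm_version_at_least("0.19.1", "0.19.0")  # exact equality would miss this
assert not vllm_version_at_least("0.18.2", "0.19.0")
```

In production code, `packaging.version.Version` is the more robust choice, since it also handles pre-release and dev suffixes.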
```diff
 if vllm_version_is("0.19.0"):
-    self.wk = vllm_indexer.wk
-    self.weights_proj = vllm_indexer.weights_proj
+    self.wk_weights_proj = vllm_indexer.wk_weights_proj
 else:
-    self.wk_weights_proj = vllm_indexer.wk_weights_proj
+    self.wk = vllm_indexer.wk
+    self.weights_proj = vllm_indexer.weights_proj
```
```python
if vllm_version_is("0.19.0"):
    self.wk = self.indexer.wk
    self.weights_proj = self.indexer.weights_proj
else:
    self.wk_weights_proj = self.indexer.wk_weights_proj
```
The logic for fused weights appears to be inverted here, which is inconsistent with the logic used in `ascend_forward_context.py`. If v0.19.0 is the version that introduced fused weights, it should use `wk_weights_proj`.
```diff
 if vllm_version_is("0.19.0"):
-    self.wk = self.indexer.wk
-    self.weights_proj = self.indexer.weights_proj
+    self.wk_weights_proj = self.indexer.wk_weights_proj
 else:
-    self.wk_weights_proj = self.indexer.wk_weights_proj
+    self.wk = self.indexer.wk
+    self.weights_proj = self.indexer.weights_proj
```
```python
if vllm_version_is("0.19.0"):
    k_li, _ = self.wk(x)  # [b,s,7168] @ [7168,128] = [b,s,128]
else:
    kw, _ = self.wk_weights_proj(x)
    k_li = kw[:, : self.head_dim]
```
The logic for fused weights appears to be inverted. If v0.19.0 has fused weights, it should use `wk_weights_proj`.
```diff
 if vllm_version_is("0.19.0"):
-    k_li, _ = self.wk(x)  # [b,s,7168] @ [7168,128] = [b,s,128]
+    kw, _ = self.wk_weights_proj(x)
+    k_li = kw[:, : self.head_dim]
 else:
-    kw, _ = self.wk_weights_proj(x)
-    k_li = kw[:, : self.head_dim]
+    k_li, _ = self.wk(x)  # [b,s,7168] @ [7168,128] = [b,s,128]
```
```python
if vllm_version_is("0.19.0"):
    weights, _ = self.weights_proj(x)
else:
    kw, _ = self.wk_weights_proj(x)
    weights = kw[:, self.head_dim :]
```
The logic for fused weights appears to be inverted. If v0.19.0 has fused weights, it should use `wk_weights_proj`.
```diff
 if vllm_version_is("0.19.0"):
-    weights, _ = self.weights_proj(x)
+    kw, _ = self.wk_weights_proj(x)
+    weights = kw[:, self.head_dim :]
 else:
-    kw, _ = self.wk_weights_proj(x)
-    weights = kw[:, self.head_dim :]
+    weights, _ = self.weights_proj(x)
```
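The two review comments above both split one fused projection output by `head_dim`. Here is a minimal numeric sketch of that split, with plain Python lists standing in for tensors and a dummy projection in place of the real linear layer (all names and values below are invented for illustration):

```python
# Sketch of the fused-weight split: one projection produces [k_li | weights]
# concatenated along the last dimension, and each consumer slices by head_dim.

head_dim = 3


def wk_weights_proj(x):
    # Pretend fused projection: the first head_dim outputs are k_li, the
    # remainder are router weights (deterministic dummy values here).
    fused = [v * 2 for v in range(head_dim)] + [v + 10 for v in range(2)]
    return fused, None  # (output, bias), mirroring a vLLM linear layer's return


x = [0.0] * 8                 # dummy input; its contents are irrelevant here
kw, _ = wk_weights_proj(x)
k_li = kw[:head_dim]          # kw[:, :head_dim] in the tensor version
weights = kw[head_dim:]       # kw[:, head_dim:] in the tensor version
print(k_li, weights)
```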
```python
if not vllm_version_is("0.19.0"):
    from vllm.v1.worker.worker_base import CompilationTimes  # noqa: E402
```
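An alternative to a version gate for an import like this is a guarded import with a local fallback. This is only a sketch: the fallback's fields are assumed for illustration, not taken from vLLM's actual class definition.

```python
# Sketch of a guarded import: only newer vLLM exposes CompilationTimes, so
# environments without it fall back to a local stand-in with assumed fields.
from dataclasses import dataclass

try:
    from vllm.v1.worker.worker_base import CompilationTimes  # newer vLLM main
except ImportError:
    @dataclass
    class CompilationTimes:  # fallback stand-in; fields are assumptions
        language_model: float
        encoder: float


t = CompilationTimes(language_model=1.2, encoder=0.4)
print(t.encoder)
```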
Signed-off-by: Meihan-chen <zr010426ztt@outlook.com>
Force-pushed from `1731f6b` to `3b5c6ea`.
Force-pushed from `3fdf6ab` to `7f497ba`.
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?