[300I][Bugfix] fix unquant model weight nd2nz error#6851

Merged
wangxiyuan merged 16 commits into vllm-project:main from Tflowers-0129:fixnd2nznew on Mar 3, 2026
Conversation

@Tflowers-0129 (Contributor) commented Feb 27, 2026

What this PR does / why we need it?

  • This PR fixes an issue with weight format conversion for unquantized models running on Ascend 310P devices.

  • The changes refactor the logic for converting weights to the FRACTAL_NZ format. Previously, this was handled in a 310P-specific linear layer implementation (AscendUnquantizedLinearMethod310). This implementation has been removed, and the logic is now centralized in the maybe_trans_nz utility function. This function now checks if the device is a 310P and applies the NZ format cast accordingly for float16/bfloat16 weights.

  • This refactoring simplifies the code by removing platform-specific duplication and ensures correct weight handling for unquantized models on 310P.
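
The centralized check described above can be sketched in a few lines. This is a hypothetical illustration, not the PR's actual code: `FakeWeight`, `is_310p`, and `npu_format_cast` are stand-ins for the real torch tensor and torch_npu/vllm_ascend APIs, and the format-constant value is an assumption.

```python
from dataclasses import dataclass

ACL_FORMAT_FRACTAL_NZ = 29  # assumed CANN format id for the NZ layout

@dataclass
class FakeWeight:
    """Minimal stand-in for a torch tensor: just a dtype plus a layout tag."""
    dtype: str
    acl_format: object = "ND"  # weights start out in the default ND format

def is_310p() -> bool:
    """Stand-in for the real SoC-version check in vllm_ascend.utils."""
    return True  # pretend we are on a 310P device for this sketch

def npu_format_cast(w: FakeWeight, fmt) -> FakeWeight:
    """Stand-in for torch_npu.npu_format_cast; here we just retag the weight."""
    w.acl_format = fmt
    return w

def maybe_trans_nz(weight: FakeWeight) -> FakeWeight:
    # On 310P, float16/bfloat16 weights are cast to FRACTAL_NZ; any other
    # dtype (or any other SoC) keeps the default ND layout.
    if is_310p() and weight.dtype in ("float16", "bfloat16"):
        return npu_format_cast(weight, ACL_FORMAT_FRACTAL_NZ)
    return weight

print(maybe_trans_nz(FakeWeight("float16")).acl_format)  # 29
print(maybe_trans_nz(FakeWeight("float32")).acl_format)  # ND
```

Centralizing the dtype/SoC branch in one utility is what lets the 310P-specific linear method be deleted: every caller that loads a weight can route it through the same function.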

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests and local testing.

@gemini-code-assist commented

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug affecting unquantized model weights on Ascend 310P devices, ensuring they are correctly formatted for optimal performance and preventing nd2nz errors. Concurrently, it includes a refactoring effort to generalize the AscendUnquantizedLinearMethod by relocating it to a broader operational module, streamlining the codebase.

Highlights

  • Bug Fix: Unquantized Model Weight Formatting: Resolved an issue where unquantized bf16/fp16 model weights on Ascend 310P devices were not consistently cast to the ACL_FORMAT_FRACTAL_NZ format, which could lead to nd2nz errors during model execution.
  • Code Refactoring: Refactored the AscendUnquantizedLinearMethod by moving its definition from a 310P-specific file to a more general vllm_ascend.ops.linear module, improving code organization and reusability.


Changelog
  • vllm_ascend/_310p/ops/linear.py
    • Removed the file containing 310P-specific linear operation definitions, as its functionality was generalized.
  • vllm_ascend/_310p/quantization/modelslim_config.py
    • Updated the import statement for AscendUnquantizedLinearMethod to reflect its new, generalized location.
    • Modified the instantiation of AscendUnquantizedLinearMethod310 to AscendUnquantizedLinearMethod.
  • vllm_ascend/utils.py
    • Extended the condition in maybe_trans_nz to ensure that bf16/fp16 weights are always cast to ACL_FORMAT_FRACTAL_NZ when running on 310P devices, in addition to the existing environment variable check.

@gemini-code-assist bot left a comment

Code Review

This pull request refactors the weight conversion logic for unquantized models on Ascend 310P devices. The platform-specific logic for converting weights to the FRACTAL_NZ format is moved from a dedicated 310P linear layer implementation into the generic maybe_trans_nz utility function. This simplifies the codebase by removing duplicated logic and fixes an issue with weight conversion. The change is a good improvement.

I have one suggestion to improve code maintainability by updating an outdated comment in vllm_ascend/utils.py.

As per the repository's style guide, I'm also suggesting an updated title and summary for this pull request:

Suggested PR Title:

[310P][Ops][BugFix] Fix unquantized model weight NZ conversion

Suggested PR Summary:

### What this PR does / why we need it?
This PR fixes an issue with weight format conversion for unquantized models running on Ascend 310P devices.

The changes refactor the logic for converting weights to the FRACTAL_NZ format. Previously, this was handled in a 310P-specific linear layer implementation (`AscendUnquantizedLinearMethod310`). This implementation has been removed, and the logic is now centralized in the `maybe_trans_nz` utility function. This function now checks if the device is a 310P and applies the NZ format cast accordingly for `float16`/`bfloat16` weights.

This refactoring simplifies the code by removing platform-specific duplication and ensures correct weight handling for unquantized models on 310P.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed. The change is a refactoring that preserves existing behavior for 310P devices while improving code structure.

Comment thread: vllm_ascend/utils.py (outdated)
@github-actions commented

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@github-actions commented

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Comment thread: vllm_ascend/_310p/ops/vocab_parallel_embedding.py (outdated)
Comment thread: vllm_ascend/utils.py
Comment on lines +26 to +31

class _AscendParallelLMHead310QuantMethod(QuantizeMethodBase):
    def __init__(self, base_method: QuantizeMethodBase):
        self._base_method = base_method

    def __getattr__(self, name: str):
        return getattr(self._base_method, name)

This unconventional inheritance-by-delegation pattern carries the following risks:

  1. isinstance(obj, UnquantizedEmbeddingMethod) will return False, planting a trap for future framework optimizations.
  2. If the base class's create_weights, apply, or other methods gain new parameters later, this wrapper will fail to adapt.
  3. If another class ever needs to inherit from AscendParallelLMHead310QuantMethod, super().__init__() will not behave as expected.
  4. __getattr__ is triggered on every attribute access, adding performance overhead compared with real inheritance.
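
The isinstance pitfall noted in the review above can be demonstrated in a few lines. This is a self-contained sketch: the class names mirror the reviewed snippet, but the bodies are illustrative stand-ins, not the actual vLLM code.

```python
class QuantizeMethodBase:
    """Stand-in for vLLM's QuantizeMethodBase."""

class UnquantizedEmbeddingMethod(QuantizeMethodBase):
    """Stand-in for the base method being wrapped."""

class DelegatingMethod(QuantizeMethodBase):
    """Delegation wrapper in the style of the reviewed snippet."""
    def __init__(self, base_method):
        self._base_method = base_method

    def __getattr__(self, name):
        # Triggered on every attribute miss -- per-access overhead (risk 4).
        return getattr(self._base_method, name)

class SubclassMethod(UnquantizedEmbeddingMethod):
    """Real inheritance, for comparison."""

wrapped = DelegatingMethod(UnquantizedEmbeddingMethod())
print(isinstance(wrapped, UnquantizedEmbeddingMethod))           # False
print(isinstance(SubclassMethod(), UnquantizedEmbeddingMethod))  # True
```

Any framework code that dispatches on the wrapped method's concrete type will silently take the wrong branch for the delegating wrapper, which is why the review favors a real subclass here.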

from vllm_ascend.utils import maybe_trans_nz


class _AscendParallelLMHead310QuantMethod(QuantizeMethodBase):

Prefixing the class name with an underscore (_) is inappropriate here; it should be AscendUnquantizedEmbeddingMethod310, consistent with the naming style of the other classes.

Signed-off-by: Tflowers-0129 <2906339855@qq.com>
@wangxiyuan wangxiyuan merged commit 2064afe into vllm-project:main Mar 3, 2026
25 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Mar 5, 2026
…to qwen3next_graph

* 'main' of https://github.com/vllm-project/vllm-ascend: (40 commits)
  [Feature] Add docs of batch invariance and make some extra operators patch (vllm-project#6910)
  [bugfix]Qwen2.5VL accurate question (vllm-project#6975)
  [CI] Add DeepSeek-V3.2 large EP nightly ci (vllm-project#6378)
  [Ops][BugFix] Fix RoPE shape mismatch for mtp models with flashcomm v1 enabled (vllm-project#6939)
  [bugfix]fix file not found error in nightly of single-node (vllm-project#6976)
  [Bugfix] Fix the acceptance rates dorp issue when applying eagle3 to QuaRot model (vllm-project#6914)
  [CI] Enable auto upgrade e2e estimated time for auto-partition suites (vllm-project#6840)
  [Doc][Misc] Fix msprobe_guide.md documentation issues (vllm-project#6965)
  [Nightly][Refactor]Migrate nightly single-node model tests from `.py` to `.yaml` (vllm-project#6503)
  [BugFix] Improve GDN layer detection for multimodal models (vllm-project#6941)
  [feat]ds3.2 pcp support mtp and chunkprefill (vllm-project#6917)
  [CPU binding] Implement global CPU slicing and improve IRQ binding for Ascend NPUs (vllm-project#6945)
  [Triton] Centralize Ascend extension op dispatch in triton_utils (vllm-project#6937)
  [csrc][bugfix] Add compile-time Ascend950/910_95 compatibility for custom ops between CANN8.5 and 9.0 (vllm-project#6936)
  [300I][Bugfix] fix unquant model weight nd2nz error (vllm-project#6851)
  [doc] fix supported_models (vllm-project#6930)
  [CI] nightly test timeout (vllm-project#6912)
  [CI] Upgrade CANN to 8.5.1 (vllm-project#6897)
  [Model]Add Qwen3-Omni quantization Ascend NPU adaptation and optimization (vllm-project#6828)
  [P/D][v0.16.0]Adapt to RecomputeScheduler in vLLM 0.16.0 (vllm-project#6898)
  ...
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
- vLLM version: v0.15.0
- vLLM main:
vllm-project/vllm@83b47f6

---------

Signed-off-by: Tflowers-0129 <2906339855@qq.com>