[refactor] refactor weight trans nz and transpose #4878

Merged
zzzzwwjj merged 2 commits into vllm-project:main from zzzzwwjj:weight_nz
Dec 19, 2025

Collaborator

@zzzzwwjj zzzzwwjj commented Dec 10, 2025

What this PR does / why we need it?

VLLM_ASCEND_ENABLE_NZ now accepts three values:
0: disable NZ;
1: enable NZ only for quantized weights;
2: enable NZ wherever it is supported.

The default is VLLM_ASCEND_ENABLE_NZ=1.
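
As a rough illustration of the three values described above, the variable could be read along these lines. This is only a sketch: the `NzMode` enum and the `read_nz_mode` helper are hypothetical names introduced here, not taken from the vllm-ascend code.

```python
import os
from enum import IntEnum


class NzMode(IntEnum):
    """Hypothetical names for the three VLLM_ASCEND_ENABLE_NZ values."""
    DISABLED = 0    # never cast weights to NZ
    QUANT_ONLY = 1  # cast only quantized weights (the default)
    ALWAYS = 2      # cast wherever NZ is supported


def read_nz_mode(environ=os.environ) -> NzMode:
    """Read VLLM_ASCEND_ENABLE_NZ, falling back to 1 when it is unset."""
    raw = environ.get("VLLM_ASCEND_ENABLE_NZ", "1")
    try:
        # int() rejects non-numeric text, NzMode() rejects values outside 0..2.
        return NzMode(int(raw))
    except ValueError as exc:
        raise ValueError(
            f"VLLM_ASCEND_ENABLE_NZ must be 0, 1 or 2, got {raw!r}") from exc
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise in a unit test without touching the real environment.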

All cases are shown in the table below:

|  | W4A4 | W4A8 | W8A8 | fp16/bf16 | fp32 |
|---|---|---|---|---|---|
| trans nz | can't support nz | trans nz by default | trans nz by default | trans nz when VLLM_ASCEND_ENABLE_NZ is 2 | can't support nz |
| transpose | only support not transpose case | only support transpose case | only support transpose case | linear: only support not transpose case<br>gmm: only support transpose case | same to fp16/bf16 |
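
The table can also be read as a pair of small decision functions. The sketch below simply encodes its two rows. The function names, the string labels for the weight formats and the `op` argument are assumptions made for illustration, and the exceptional cases (MLAPO, `W_UV`) are deliberately not modelled.

```python
# Weight formats from the table; the string labels are illustrative.
NZ_UNSUPPORTED = {"W4A4", "fp32"}  # "can't support nz"
NZ_BY_DEFAULT = {"W4A8", "W8A8"}   # "trans nz by default"
NZ_OPT_IN = {"fp16", "bf16"}       # only when VLLM_ASCEND_ENABLE_NZ is 2
ALL_FORMATS = NZ_UNSUPPORTED | NZ_BY_DEFAULT | NZ_OPT_IN


def should_trans_nz(weight_format: str, nz_mode: int = 1) -> bool:
    """Return True if the 'trans nz' row says this format is cast to NZ."""
    if weight_format not in ALL_FORMATS:
        raise ValueError(f"unknown weight format: {weight_format!r}")
    if nz_mode not in (0, 1, 2):
        raise ValueError(f"nz_mode must be 0, 1 or 2, got {nz_mode!r}")
    if nz_mode == 0 or weight_format in NZ_UNSUPPORTED:
        return False
    if weight_format in NZ_BY_DEFAULT:
        return True
    return nz_mode == 2  # fp16/bf16 are opt-in


def needs_transpose(weight_format: str, op: str = "linear") -> bool:
    """Return True if the 'transpose' row lists only the transposed layout."""
    if weight_format not in ALL_FORMATS:
        raise ValueError(f"unknown weight format: {weight_format!r}")
    if op not in ("linear", "gmm"):
        raise ValueError(f"unknown op: {op!r}")
    if weight_format == "W4A4":
        return False
    if weight_format in NZ_BY_DEFAULT:  # W4A8 and W8A8
        return True
    # fp16/bf16 and fp32: linear keeps the original layout, gmm is transposed.
    return op == "gmm"
```

With the default value of 1, `should_trans_nz("W8A8")` is true while `should_trans_nz("bf16")` is false, which matches the user-facing change described in this PR.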

Some exceptional cases:

  1. The MLAPO op needs additional processing on its weights, including the NZ cast. When the MLAPO op is used, some weights are forcibly converted to NZ;
  2. The MLA/SFA weight W_UV is consumed by the op torch.ops._C_ascend.batch_matmul_transpose, which does not currently support NZ.

Does this PR introduce any user-facing change?

fp16/bf16 weights are no longer converted to NZ by default.

How was this patch tested?

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the handling of the VLLM_ASCEND_ENABLE_NZ environment variable by centralizing the logic into a new maybe_trans_nz helper function. This is a significant improvement in code clarity and maintainability. The changes are consistently applied across various modules, and the tests have been updated to reflect the new behavior. However, I've identified a critical issue in one of the quantization files where a torch.nn.Parameter is incorrectly replaced by a torch.Tensor, which could lead to incorrect model behavior.

Comment on lines +256 to +257
layer.w13_weight = maybe_trans_nz(layer.w13_weight)
layer.w2_weight = maybe_trans_nz(layer.w2_weight)

critical

The maybe_trans_nz function returns a torch.Tensor. By assigning the result directly to layer.w13_weight and layer.w2_weight, you are replacing the torch.nn.Parameter objects with regular tensors. This will cause them to no longer be treated as model parameters, which can lead to issues with device placement, state dicts, and optimizer behavior.

The original code used an in-place operation torch_npu.npu_format_cast_, which preserved the Parameter status. To fix this, you should assign the result of maybe_trans_nz to the .data attribute of the parameter.

Suggested change
- layer.w13_weight = maybe_trans_nz(layer.w13_weight)
- layer.w2_weight = maybe_trans_nz(layer.w2_weight)
+ layer.w13_weight.data = maybe_trans_nz(layer.w13_weight.data)
+ layer.w2_weight.data = maybe_trans_nz(layer.w2_weight.data)

@zzzzwwjj zzzzwwjj changed the title from "[refactor] refactor VLLM_ASCEND_ENABLE_NZ" to "[refactor] refactor weight trans nz and transpose" Dec 10, 2025
@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@yiz-liu
Collaborator

yiz-liu commented Dec 10, 2025

Maybe we should change it to VLLM_ASCEND_FP16_ENABLE_NZ and default it to 0. Also leave it for training environment to disable all "cast tensor format to NZ" operations.

@zzzzwwjj zzzzwwjj force-pushed the weight_nz branch 2 times, most recently from 57e68ed to 7ffeef5 Compare December 11, 2025 02:20
@zzzzwwjj zzzzwwjj force-pushed the weight_nz branch 2 times, most recently from 9587107 to 57cfc83 Compare December 11, 2025 02:57
@zzzzwwjj zzzzwwjj force-pushed the weight_nz branch 3 times, most recently from 7f486d9 to 97ce196 Compare December 11, 2025 14:39
layer.w13_weight.data = layer.w13_weight.data.transpose(1, 2).contiguous()
layer.w2_weight.data = layer.w2_weight.data.transpose(1, 2).contiguous()

wangxiyuan added a commit that referenced this pull request Dec 18, 2025
I'd like to nominate @zzzzwwjj @realliujiaxu @LCAIZJ to join vLLM Ascend
committer team.

@zzzzwwjj
---
- Review Quality:
He has completed 80+ reviews since April 2025, including high-quality
reviews such as
#3232 (comment),
#4822 (comment),
#4768 (comment).

- Sustained Contributions:
15+ valuable bug fixes and refactors.

https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Azzzzwwjj+is%3Aclosed+review%3Aapproved
Continuous optimization of code architecture

https://github.com/vllm-project/vllm-ascend/pulls?q=author%3Azzzzwwjj+is%3Amerged

- Quality Contribution:
#1229
#1979
#4359
#4878

- Community Involvement:
He led #1147, the first
refactor of AscendFusedMoE.
He presented on large-scale distributed inference and
reinforcement learning at the vLLM-Ascend meetup on August 2nd.

@realliujiaxu
---
- Review Quality:
He has completed [40+
reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Arealliujiaxu+-author%3Arealliujiaxu+)
since September, including
#4868 (comment),
#2275 (comment).

- Sustained Contributions:
He has completed [17
commits](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Arealliujiaxu+is%3Amerged),
continuously optimizing the performance of MoE models.

- Quality Contribution:

Contributed the Flash Comm1 feature to the community, supporting both
eager and aclgraph execution modes and compatible with multiple MoE
models including DeepSeek and GLM4.5.
  - #3334
  - #3420
  - #3015
  
  co-author:
  - #3495
  - #4868

- Community Involvement:
1. Completed two major refactors, enabling vllm-ascend to evolve more
rapidly and robustly: [Linear
module](#2867) and
[rejection
sampler](#4975).
2. Fixed [8
bugs](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Arealliujiaxu+is%3Amerged+bugfix+)
in graph mode, spec decoding and async scheduling.

@LCAIZJ
---
- Review Quality: He's been the go-to reviewer for virtually all PD
disaggregation and KV Pool related PRs, having completed [30+
reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3ALCAIZJ+is%3Aopen+-author%3ALCAIZJ+)
since May 2025. Notable examples include
[discussion_r2553887360](#4345 (comment)),
[issuecomment-3540994801](#4161 (comment)),
and
[discussion_r2492593988](#3981 (comment)),
all demonstrating thorough and insightful feedback.
- Sustained and Quality Contributions: His contributions reflect a
strong grasp of both the vLLM and vLLM Ascend codebases, particularly in
prefill-decode disaggregation and KV pool areas ([7 PRs
merged](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3ALCAIZJ+is%3Amerged+)).
  - Prefill-Decode Disaggregation: Delivered KV transfer functionality using
    Mooncake TransferEngine and enabled layerwise KV transfer
    (#1568, #2602).
  - KV Pool: Developed the foundational KV Pool infrastructure and migrated
    it to the latest ADXL stack
    (#2913, #3350).
- Quality Contribution:
#1568
#2602
#2913
#3350
- Community Involvement:
He actively responds to [community
issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20commenter%3ALCAIZJ%20is%3Aopen%20-author%3ALCAIZJ),
continuously monitors functionality and accuracy issues related to PD
disaggregation and KV Pool, and proactively delivers [bug
fixes](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3ALCAIZJ+is%3Amerged+bugfix).
- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

Signed-off-by: zzzzwwjj <1183291235@qq.com>
@zzzzwwjj zzzzwwjj merged commit cc23067 into vllm-project:main Dec 19, 2025
14 of 16 checks passed
@zzzzwwjj zzzzwwjj deleted the weight_nz branch December 19, 2025 06:27
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Dec 19, 2025
…to eplb_refactor

* 'main' of https://github.com/vllm-project/vllm-ascend: (52 commits)
  [Doc]Add the user_guide doc file regarding fine-grained TP. (vllm-project#5084)
  [pref] qwen3_next add triton ops : fused_sigmoid_gating_delta_rule_update (vllm-project#4818)
  [Feature] Add token mask for DispatchGmmCombineDecode operator (vllm-project#5171)
  [CI] Improve CI (vllm-project#5078)
  [Refactor] remove some metadata variables in attention_v1. (vllm-project#5160)
  Add Qwen3-VL-235B-A22B-Instruct tutorials (vllm-project#5167)
  [Doc] Add a perf tune section (vllm-project#5127)
  [Image] Refactor image build (vllm-project#5175)
  [refactor] refactor weight trans nz and transpose (vllm-project#4878)
  [BugFix]Fix precision issue for LoRA feature (vllm-project#4141)
  【Doc】Deepseekv3.1/R1 doc enhancement (vllm-project#4827)
  support basic long_seq feature st (vllm-project#5140)
  [Bugfix] install trition for test_custom_op (vllm-project#5112)
  [2/N][Pangu][MoE] Remove Pangu Related Code (vllm-project#5130)
  [bugfix] Use FUSED_MC2 MoE comm path for the op `dispatch_ffn_combine` (vllm-project#5156)
  [BugFix] Fix top_p,top_k issue with EAGLE and add top_p,top_k in EAGLE e2e (vllm-project#5131)
  [Doc][P/D] Fix MooncakeConnector's name (vllm-project#5172)
  [Bugfix] Fix in_profile_run in mtp_proposer dummy_run (vllm-project#5165)
  [Doc] Refact benchmark doc (vllm-project#5173)
  [Nightly]  Avoid max_model_len being smaller than the decoder prompt to prevent single-node-accuray-tests from failing (vllm-project#5174)
  ...

Signed-off-by: 白永斌 <baiyongbin3@h-partners.com>
chenaoxuan pushed a commit to chenaoxuan/vllm-ascend that referenced this pull request Dec 20, 2025
…t#5152)

chenaoxuan pushed a commit to chenaoxuan/vllm-ascend that referenced this pull request Dec 20, 2025
- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

Signed-off-by: zzzzwwjj <1183291235@qq.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
…t#5152)

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
…t#5152)

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026