[Feat] Flash comm allgher ep #3334

Merged
jianzs merged 2 commits into vllm-project:main from realliujiaxu:flash-comm-allgher-ep on Oct 15, 2025

realliujiaxu (Collaborator) commented on Oct 9, 2025

What this PR does / why we need it?

Support flash comm v1 (Sequence Parallelism) for Allgather EP.

  • DeepSeek R1 (MLA model), eager and aclgraph
  • GLM 4.5 MoE (GQA model), eager and aclgraph
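
For readers new to sequence parallelism, here is a toy, single-process sketch of the general idea: the token dimension is padded to a multiple of the tensor-parallel size, split into equal shards (one per rank), and later restored by an all-gather. This is not the code in this PR, which runs as custom ops on NPUs. The function names `pad_to_multiple`, `shard_tokens`, and `all_gather` are hypothetical and exist only for illustration.

```python
from typing import List


def pad_to_multiple(num_tokens: int, tp_size: int) -> int:
    """Round num_tokens up to the next multiple of tp_size."""
    return (num_tokens + tp_size - 1) // tp_size * tp_size


def shard_tokens(tokens: List[int], tp_size: int, pad_value: int = 0) -> List[List[int]]:
    """Pad the token list, then split it into tp_size equal contiguous shards."""
    padded_len = pad_to_multiple(len(tokens), tp_size)
    padded = tokens + [pad_value] * (padded_len - len(tokens))
    shard_len = padded_len // tp_size
    return [padded[r * shard_len:(r + 1) * shard_len] for r in range(tp_size)]


def all_gather(shards: List[List[int]], num_tokens: int) -> List[int]:
    """Simulate an all-gather: concatenate every rank's shard, then drop the padding."""
    gathered = [tok for shard in shards for tok in shard]
    return gathered[:num_tokens]


# Illustrative values only: 10 tokens spread over 4 ranks.
tokens = list(range(10))
shards = shard_tokens(tokens, tp_size=4)
restored = all_gather(shards, len(tokens))
```

Each rank works on one shard between the split and the gather, which is why the token count has to be padded to a multiple of the number of ranks first.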

Does this PR introduce any user-facing change?

How was this patch tested?

Set export VLLM_ASCEND_ENABLE_FLASHCOMM=1 on A2.
I tested DeepSeek-R1 on GSM8K:
[image: GSM8K test result]
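
As a rough sketch of how the environment variable above might be applied from Python before launching a server: the helper `build_launch` is hypothetical, and the model identifier and tensor-parallel size are placeholders, not the configuration used for the test in this PR. Only `VLLM_ASCEND_ENABLE_FLASHCOMM=1` comes from the description; the `vllm serve` flags shown are standard vLLM options.

```python
import os
from typing import Dict, List, Tuple


def build_launch(model: str, tp_size: int) -> Tuple[Dict[str, str], List[str]]:
    """Return the environment and argv for a server launch with flash comm enabled."""
    env = dict(os.environ)
    # The switch named in the PR description.
    env["VLLM_ASCEND_ENABLE_FLASHCOMM"] = "1"
    argv = [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tp_size),
        "--enable-expert-parallel",
    ]
    return env, argv


# Placeholder model id and parallel size, chosen only for illustration.
env, argv = build_launch("deepseek-ai/DeepSeek-R1", tp_size=16)
# On a machine with vllm-ascend installed, the launch itself would be:
# subprocess.run(argv, env=env, check=True)
```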

github-actions bot (Contributor) commented on Oct 9, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces support for a new Allgather-based communication strategy for sequence parallelism within expert parallel models, which appears to be a valuable performance enhancement. The changes primarily involve adding the EP_ALLGATHER communication type, its implementation using custom ops for padding and communication, and the necessary logic to enable this path. While the overall approach is sound, I've identified a critical issue in vllm_ascend/utils.py where the is_moe_model function is hardcoded to always return True. This will incorrectly classify all models as MoE models, potentially leading to incorrect behavior and performance degradation for non-MoE models. This needs to be addressed before merging.
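
To illustrate the concern raised in the review, here is a minimal sketch of a check that inspects the model configuration instead of returning True unconditionally. It is not the fix that was eventually merged. The helper name `looks_like_moe` is hypothetical, and the list of attribute names is an assumption based on fields that Hugging Face-style MoE configs commonly use for the expert count.

```python
from types import SimpleNamespace

# Assumed attribute names that commonly hold an expert count in
# Hugging Face-style model configs; this list is illustrative, not exhaustive.
_EXPERT_COUNT_ATTRS = ("n_routed_experts", "num_experts", "num_local_experts")


def looks_like_moe(hf_config) -> bool:
    """Return True only if the config advertises a positive expert count."""
    for attr in _EXPERT_COUNT_ATTRS:
        value = getattr(hf_config, attr, None)
        if isinstance(value, int) and value > 0:
            return True
    return False


# Stand-in configs with illustrative values.
dense_cfg = SimpleNamespace(hidden_size=4096)
moe_cfg = SimpleNamespace(hidden_size=7168, n_routed_experts=256)
```

With a check of this shape, a dense model falls through to False and never enters the MoE-only communication path.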

github-actions bot (Contributor) commented on Oct 9, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

realliujiaxu force-pushed the flash-comm-allgher-ep branch from ea81d0c to e37b987 on October 10, 2025 01:16
realliujiaxu changed the title from "Flash comm allgher ep" to "[Feat] Flash comm allgher ep" on Oct 10, 2025
jianzs added the ready (read for review) and ready-for-test (start test by label for PR) labels on Oct 11, 2025
realliujiaxu force-pushed the flash-comm-allgher-ep branch from 82e8af1 to 23b6b65 on October 14, 2025 02:54
Signed-off-by: realliujiaxu <realliujiaxu@163.com>
jianzs merged commit f69a83b into vllm-project:main on Oct 15, 2025
17 checks passed
MrZ20 pushed a commit to MrZ20/vllm-ascend that referenced this pull request Oct 17, 2025
Support flash comm v1 (Sequence Parallelism) for Allgather EP.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: realliujiaxu <realliujiaxu@163.com>
Co-authored-by: zhaozx-cn <zhaozx2116@163.com>
Signed-off-by: MrZ20 <2609716663@qq.com>
MengqingCao pushed a commit that referenced this pull request Oct 17, 2025
### What this PR does / why we need it?

Fix 3 bugs in flash comm1 of Allgather EP (#3334):
1. Calling `enable_sp()` with the `vllm_config` argument triggers a lot of
warning logs; this PR caches its return value.
2. `num_tokens_after_padding` should be a CPU tensor because it is used as
`num_tokens_across_dp_cpu` in `DPMetadata`. Otherwise it causes many
device-to-host (D2H) copies when running the model.
3. In PD (prefill-decode disaggregation), the model runner executes
`kv_connector_no_forward`, where `num_tokens` is None.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: realliujiaxu <realliujiaxu@163.com>
ZYang6263 pushed a commit to rjg-lyh/vllm-ascend that referenced this pull request Oct 23, 2025
weijinqian0 pushed a commit that referenced this pull request Nov 4, 2025
### What this PR does / why we need it?
Move quantization before the allgather in Allgather EP. Depends on
#3334.

DeepSeek R1 W8A8 performance on A2 with
`HCCL_ALGO="level0:NA;level1:pipeline"`:
| Seq length | Mean TTFT (ms), main | Mean TTFT (ms), this PR |
|----------|----------|----------|
| 4k   |  375.21  | 364.99   |
| 16k  | 1465.23   | 1421.75  |
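
For reference, the relative change implied by the table can be computed with a few lines of Python. The figures are copied from the table; the helper name `relative_improvement` is just an illustration.

```python
def relative_improvement(baseline_ms: float, candidate_ms: float) -> float:
    """Percentage reduction of candidate_ms relative to baseline_ms."""
    return (baseline_ms - candidate_ms) / baseline_ms * 100.0


# (main, this PR) mean TTFT in milliseconds, taken from the table.
ttft = {"4k": (375.21, 364.99), "16k": (1465.23, 1421.75)}
gains = {seq_len: relative_improvement(*pair) for seq_len, pair in ttft.items()}

for seq_len, gain in gains.items():
    print(f"{seq_len}: {gain:.2f}% lower mean TTFT")
```

This works out to roughly 2.7% lower mean TTFT at 4k and roughly 3.0% at 16k.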
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@83f478b

---------

Signed-off-by: realliujiaxu <realliujiaxu@163.com>
Pz1116 pushed a commit to Pz1116/vllm-ascend that referenced this pull request Nov 5, 2025
realliujiaxu added a commit to realliujiaxu/vllm-ascend that referenced this pull request Nov 13, 2025
luolun pushed a commit to luolun/vllm-ascend that referenced this pull request Nov 19, 2025
luolun pushed a commit to luolun/vllm-ascend that referenced this pull request Nov 19, 2025
luolun pushed a commit to luolun/vllm-ascend that referenced this pull request Nov 19, 2025
hwhaokun pushed a commit to hwhaokun/vllm-ascend that referenced this pull request Nov 19, 2025
hwhaokun pushed a commit to hwhaokun/vllm-ascend that referenced this pull request Nov 19, 2025
hwhaokun pushed a commit to hwhaokun/vllm-ascend that referenced this pull request Nov 19, 2025
NSDie pushed a commit to NSDie/vllm-ascend that referenced this pull request Nov 24, 2025
NSDie pushed a commit to NSDie/vllm-ascend that referenced this pull request Nov 24, 2025
NSDie pushed a commit to NSDie/vllm-ascend that referenced this pull request Nov 24, 2025
Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 9, 2025
Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 9, 2025
Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 10, 2025
wangxiyuan added a commit that referenced this pull request Dec 18, 2025
I'd like to nominate @zzzzwwjj @realliujiaxu @LCAIZJ to join the vLLM Ascend
committer team.

@zzzzwwjj
---
- Review Quality:
He has completed 80+ reviews since April 2025, including high-quality
reviews such as
#3232 (comment),
#4822 (comment), and
#4768 (comment).

- Sustained Contributions:
15+ valuable, high-quality bug fixes and refactors:

https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Azzzzwwjj+is%3Aclosed+review%3Aapproved
Continuous optimization of the code architecture:

https://github.com/vllm-project/vllm-ascend/pulls?q=author%3Azzzzwwjj+is%3Amerged

- Quality Contribution:
#1229
#1979
#4359
#4878

- Community Involvement:
He led #1147, the first refactor of AscendFusedMoE.
He shared topics about large-scale distributed inference and
reinforcement learning at the vLLM-Ascend meetup on August 2nd.

@realliujiaxu
---
- Review Quality:
He has completed [40+
reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Arealliujiaxu+-author%3Arealliujiaxu+)
since September, including
#4868 (comment) and
#2275 (comment).

- Sustained Contributions:
He has completed [17
commits](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Arealliujiaxu+is%3Amerged),
continuously optimizing the performance of MoE models.

- Quality Contribution:

Contributed the Flash Comm1 feature to the community, supporting both
eager and aclgraph execution modes and compatible with multiple MoE
models including DeepSeek and GLM4.5.
  - #3334
  - #3420
  - #3015

  co-author:
  - #3495
  - #4868

- Community Involvement:
1. Completed two major refactors, enabling vllm-ascend to evolve more
rapidly and robustly: [Linear
module](#2867) and
[rejection
sampler](#4975)
2. [Fixed 8
bugs](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Arealliujiaxu+is%3Amerged+bugfix+)
in graph mode, spec decoding and async scheduling.

@LCAIZJ
---
- Review Quality: He's been the go-to reviewer for virtually all PD
disaggregation and KV Pool related PRs, having completed [30+
reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3ALCAIZJ+is%3Aopen+-author%3ALCAIZJ+)
since May 2025. Notable examples include
[discussion_r2553887360](#4345 (comment)),
[issuecomment-3540994801](#4161 (comment)),
and
[discussion_r2492593988](#3981 (comment)),
all demonstrating thorough and insightful feedback.
- Sustained and Quality Contributions: His contributions reflect a
strong grasp of both the vLLM and vLLM Ascend codebases, particularly in
prefill-decode disaggregation and KV pool areas ([7 PRs
merged](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3ALCAIZJ+is%3Amerged+)).
Prefill-Decode Disaggregation: Delivered KV transfer functionality using
Mooncake TransferEngine and enabled layerwise KV transfer
#1568
#2602
KV Pool: Developed the foundational KV Pool infrastructure and migrated
it to the latest ADXL stack
#2913
#3350
- Quality Contribution:
#1568
#2602
#2913
#3350
- Community Involvement:
He actively responds to [community
issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20commenter%3ALCAIZJ%20is%3Aopen%20-author%3ALCAIZJ),
continuously monitors functionality and accuracy issues related to PD
disaggregation and KV Pool, and proactively delivers [bug
fixes](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3ALCAIZJ+is%3Amerged+bugfix).
- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
chenaoxuan pushed a commit to chenaoxuan/vllm-ascend that referenced this pull request Dec 20, 2025
…t#5152)

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
…t#5152)

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
…t#5152)
