
[1/N][refactor] torchair deepseek modeling refactor#2384

Merged

wangxiyuan merged 1 commit into vllm-project:main from linfeng-yuan:torhcair_deepseek_modeling_refactor_00 on Aug 18, 2025

Conversation

@linfeng-yuan
Collaborator

@linfeng-yuan linfeng-yuan commented Aug 14, 2025

What this PR does / why we need it?

Move the torchair-related model architectures into the torchair module to make the code clearer. As a next step, we'll remove all torchair-related code outside of the torchair module.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the DeepSeek model implementations to use torchair on Ascend NPUs. This involves adding new torchair-specific model files for DeepSeek V2, V3, and MTP, and registering them with vLLM's model registry. The changes are well-structured and introduce necessary hardware-specific optimizations. I've found one critical issue with the model registration keys that would prevent the new models from being loaded.

@linfeng-yuan force-pushed the torhcair_deepseek_modeling_refactor_00 branch 2 times, most recently from 05c11db to 23dedb2 on August 14, 2025 18:34

Comment thread on vllm_ascend/torchair/utils.py (outdated):

        write_kv_cache_bytes_to_file(torch.distributed.get_rank(),
                                     self.new_kv_cache_bytes)

    def load_model(self) -> None:
Collaborator


How about registering the model in __init__, so that we don't need to override this function?

Collaborator Author


> How about registering the model in __init__, so that we don't need to override this function?

It works. I'll remove this block and register the torchair models in the __init__ function.
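
For readers following along, here is a minimal sketch of the approach agreed on above: registering the models once at import time (e.g. from the package `__init__`) so that no worker method needs to be overridden. The module paths and class names below are illustrative assumptions, not necessarily the exact ones landed in this PR; note that the registry keys must match the architecture names reported by the model config, which is the kind of mismatch the review above flagged.

```python
from vllm import ModelRegistry


def register_torchair_models() -> None:
    """Register torchair model variants with vLLM's model registry.

    vLLM accepts a "module.path:ClassName" string for lazy registration,
    so the class is imported only when that architecture is requested.
    Calling this from the package __init__ is therefore cheap.
    """
    # Hypothetical module paths and class names for illustration.
    ModelRegistry.register_model(
        "DeepseekV2ForCausalLM",
        "vllm_ascend.torchair.models.torchair_deepseek_v2:TorchairDeepseekV2ForCausalLM",
    )
    ModelRegistry.register_model(
        "DeepseekV3ForCausalLM",
        "vllm_ascend.torchair.models.torchair_deepseek_v3:TorchairDeepseekV3ForCausalLM",
    )


# Registration becomes a side effect of importing the package.
register_torchair_models()
```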

@linfeng-yuan linfeng-yuan force-pushed the torhcair_deepseek_modeling_refactor_00 branch 4 times, most recently from 7832e60 to 39fc507 on August 15, 2025 04:11
@codecov

codecov Bot commented Aug 15, 2025

Codecov Report

❌ Patch coverage is 20.00000% with 4 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@1b40665). Learn more about missing BASE report.
⚠️ Report is 1 commit behind head on main.

Files with missing lines        Patch %   Lines
vllm_ascend/torchair/utils.py   20.00%    4 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2384   +/-   ##
=======================================
  Coverage        ?   76.16%           
=======================================
  Files           ?      120           
  Lines           ?    13537           
  Branches        ?        0           
=======================================
  Hits            ?    10311           
  Misses          ?     3226           
  Partials        ?        0           
Flag        Coverage Δ
unittests   76.16% <20.00%> (?)

Flags with carried forward coverage won't be shown.


@linfeng-yuan linfeng-yuan force-pushed the torhcair_deepseek_modeling_refactor_00 branch from 39fc507 to 415720b on August 18, 2025 05:51
@linfeng-yuan linfeng-yuan force-pushed the torhcair_deepseek_modeling_refactor_00 branch from 415720b to 0832aa5 on August 18, 2025 06:21
Signed-off-by: linfeng-yuan <1102311262@qq.com>
@linfeng-yuan linfeng-yuan force-pushed the torhcair_deepseek_modeling_refactor_00 branch from 0832aa5 to eb439e0 on August 18, 2025 06:28
@wangxiyuan wangxiyuan merged commit 3fc31ee into vllm-project:main Aug 18, 2025
17 of 19 checks passed
@wangxiyuan
Collaborator

Merging this to unblock other refactor work. The CI failure is related to another known issue.

wangxiyuan pushed a commit that referenced this pull request Sep 4, 2025
### What this PR does / why we need it?

1. Similar to #2384, this PR adds torchair-specific modeling for
pangu.
2. Fixes a bug introduced by routed_scaling_factor in #2675.
3. Removes the eager test case for pangu since a torchair test case
already exists.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?


- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@6997a25

---------

Signed-off-by: zengyanjia <z00883269@china.huawei.com>
Signed-off-by: Angazenn <supperccell@163.com>
Co-authored-by: zengyanjia <z00883269@china.huawei.com>
linfeng-yuan pushed a commit that referenced this pull request May 9, 2026
- ✅ **Review Quality:**
He has completed [50+
reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+reviewed-by%3Alinfeng-yuan)
since April 2025, covering graph mode, MoE, quantization, model support,
and performance-related changes.

In addition to regular review work, he has also participated in complex
feature development and review, such as
[#6670](#6670) (MoE
MXFP8 quantization), where he helped with A5 MXFP8 integration,
compatibility cleanup, dispatch updates, and implementation fixes.

- ✅ **Sustained Contributions:**
He has [60+ merged
PRs](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+is%3Amerged+author%3Alinfeng-yuan)
since April 2025, with continuous activity across major release cycles.

- ✅ **Quality Contributions:**

  **Torchair Graph Mode & Wide-EP / MoE — Feature Owner (2025 Q2~Q4):**
He was the Feature Owner for DeepSeek high-throughput inference under
torchair graph mode and the Wide-EP project. He drove graph mode
performance optimization
([#731](#731)), landed
super-kernel fusion for quantized DSR1
([#3485](#3485)), and
added initial MoE support for Model Runner v2
([#7922](#7922)).

  **Ascend950 (A5) — Feature Owner:**
He authored the [RFC roadmap
(#7157)](#7157) for A5
support, landed initial build support
([#7151](#7151)),
co-authored MXFP8 and MXFP4 quantization support for A5
([#6670](#6670),
[#7877](#7877)), and
fixed the MXFP8 scale normalization issue that unblocked A5 quantized
inference
([#7573](#7573)).

  **DeepSeek Low-Latency & Post-Processing:**
He improved DSv3.2 performance by eliminating HD synchronization
([#4805](#4805)),
improved rejection sampler performance and eliminated D2H sync in
TopKTopPSampler
([#4154](#4154)), and
added a penalty-related Triton kernel for sampling performance
([#7794](#7794)).

- ✅ **Community Involvement:**
He led a 2-part torchair modeling refactor
([#2384](#2384),
[#2459](#2459)) and
deleted ~2K lines of redundant DeepSeek modeling code as upstream
absorbed the changes
([#2849](#2849)). He
also replaced scattered business kwargs with typed request objects
across MoE stage boundaries
([#7024](#7024)).

Since March 2026, he has taken part in issue triage and user support,
responding to [30+
issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue+commenter%3Alinfeng-yuan+updated%3A%3E2026-03-01)
covering graph mode failures, quantization accuracy regressions, MoE
deployment problems, and multi-node communication issues.

- vLLM version: v0.19.1
- vLLM main:
vllm-project/vllm@4d51588

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>