[1/N][refactor] torchair deepseek modeling refactor#2384
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request refactors the DeepSeek model implementations to use torchair on Ascend NPUs. This involves adding new torchair-specific model files for DeepSeek V2, V3, and MTP, and registering them with vLLM's model registry. The changes are well-structured and introduce necessary hardware-specific optimizations. I've found one critical issue with the model registration keys that would prevent the new models from being loaded.
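The registration-key concern flagged above can be illustrated with a minimal sketch: vLLM resolves a model class by the architecture string found in the checkpoint's `config.json`, so the key used at registration time must match that string exactly. The registry below is a simplified stand-in for vLLM's real `ModelRegistry`, and the class paths are hypothetical, not the actual keys from this PR.

```python
# Minimal sketch of architecture-keyed model resolution (illustrative only;
# vLLM's real ModelRegistry is more elaborate). Paths are hypothetical.
_MODEL_REGISTRY: dict[str, str] = {}

def register_model(arch: str, qualname: str) -> None:
    """Map an architecture string (as in config.json) to a class path."""
    _MODEL_REGISTRY[arch] = qualname

def resolve_model(arch: str) -> str:
    # A key mismatch here is the failure mode flagged in review:
    # the new model class is never found and loading fails.
    if arch not in _MODEL_REGISTRY:
        raise KeyError(f"unknown architecture: {arch!r}")
    return _MODEL_REGISTRY[arch]

# Hypothetical torchair-specific registration:
register_model(
    "DeepseekV2ForCausalLM",
    "vllm_ascend.torchair.models.torchair_deepseek_v2:TorchairDeepseekV2ForCausalLM")
```

If the registered key diverges from the architecture string in the model config, `resolve_model` raises immediately, which is why the review calls the key mismatch critical.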
Force-pushed: 05c11db → 23dedb2
```python
write_kv_cache_bytes_to_file(torch.distributed.get_rank(),
                             self.new_kv_cache_bytes)

def load_model(self) -> None:
```
How about registering the model in `__init__`, so that we don't need to override this function?
That works. I'll remove this block and register the torchair models in the `__init__` function.
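The suggestion above, registering at package import time instead of overriding `load_model()`, can be sketched as follows. This is a hedged illustration of the pattern, not the PR's actual code; the module paths and class names are assumptions.

```python
# Sketch: perform torchair model registration once, at import time
# (i.e. from the package's __init__.py), so no load_model() override
# is needed. Registry, paths, and names are illustrative.

_REGISTERED = False

def register_torchair_models(registry: dict) -> None:
    """Idempotent registration, intended to be called from __init__.py."""
    global _REGISTERED
    if _REGISTERED:
        return
    registry["DeepseekV2ForCausalLM"] = (
        "vllm_ascend.torchair.models.torchair_deepseek_v2:"
        "TorchairDeepseekV2ForCausalLM")
    registry["DeepseekV3ForCausalLM"] = (
        "vllm_ascend.torchair.models.torchair_deepseek_v3:"
        "TorchairDeepseekV3ForCausalLM")
    _REGISTERED = True

registry: dict = {}
register_torchair_models(registry)
register_torchair_models(registry)  # second call is a no-op
```

The guard keeps registration idempotent, so repeated imports or worker re-initialization cannot register the same architectures twice.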
Force-pushed: 7832e60 → 39fc507
Codecov Report ❌ Patch coverage is
Additional details and impacted files:

```
@@ Coverage Diff @@
##            main    #2384   +/- ##
=======================================
  Coverage       ?   76.16%
=======================================
  Files          ?      120
  Lines          ?    13537
  Branches       ?        0
=======================================
  Hits           ?    10311
  Misses         ?     3226
  Partials       ?        0
=======================================
```

☔ View full report in Codecov by Sentry.
Force-pushed: 39fc507 → 415720b
Force-pushed: 415720b → 0832aa5
Signed-off-by: linfeng-yuan <1102311262@qq.com>
Force-pushed: 0832aa5 → eb439e0
Merging this to unblock other refactor work. The CI failure is related to another known issue.
### What this PR does / why we need it? 1. Similar to #2384, this PR adds torchair-specific modeling for pangu. 2. Fixes a bug introduced by routed_scaling_factor in #2675. 3. Removes the eager test case for pangu, since a torchair test case already exists. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - vLLM version: v0.10.1.1 - vLLM main: vllm-project/vllm@6997a25 --------- Signed-off-by: zengyanjia <z00883269@china.huawei.com> Signed-off-by: Angazenn <supperccell@163.com> Co-authored-by: zengyanjia <z00883269@china.huawei.com>
### What this PR does / why we need it? Move torchair-related model architectures into the torchair module to make the code clearer. As a next step, we'll remove all torchair-related code outside of the torchair module. ### Does this PR introduce _any_ user-facing change? No. - vLLM version: v0.10.0 - vLLM main: vllm-project/vllm@08d5f71 Signed-off-by: linfeng-yuan <1102311262@qq.com>
- ✅ **Review Quality:** He has completed [50+ reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+reviewed-by%3Alinfeng-yuan) since April 2025, covering graph mode, MoE, quantization, model support, and performance-related changes. In addition to regular review work, he has also participated in complex feature development and review, such as [#6670](#6670) (MoE MXFP8 quantization), where he helped with A5 MXFP8 integration, compatibility cleanup, dispatch updates, and implementation fixes. - ✅ **Sustained Contributions:** He has [60+ merged PRs](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+is%3Amerged+author%3Alinfeng-yuan) since April 2025, with continuous activity across major release cycles. - ✅ **Quality Contributions:** **Torchair Graph Mode & Wide-EP / MoE — Feature Owner (2025 Q2~Q4):** He was the Feature Owner for DeepSeek high-throughput inference under torchair graph mode and the Wide-EP project. He drove graph mode performance optimization ([#731](#731)), landed super-kernel fusion for quantized DSR1 ([#3485](#3485)), and added initial MoE support for Model Runner v2 ([#7922](#7922)). **Ascend950 (A5) — Feature Owner:** He authored the [RFC roadmap (#7157)](#7157) for A5 support, landed initial build support ([#7151](#7151)), co-authored MXFP8 and MXFP4 quantization support for A5 ([#6670](#6670), [#7877](#7877)), and fixed the MXFP8 scale normalization issue that unblocked A5 quantized inference ([#7573](#7573)). **DeepSeek Low-Latency & Post-Processing:** He improved DSv3.2 performance by eliminating HD synchronization ([#4805](#4805)), improved rejection sampler performance and eliminated D2H sync in TopKTopPSampler ([#4154](#4154)), and added a penalty-related Triton kernel for sampling performance ([#7794](#7794)). 
- ✅ **Community Involvement:** He led a 2-part torchair modeling refactor ([#2384](#2384), [#2459](#2459)) and deleted ~2K lines of redundant DeepSeek modeling code as upstream absorbed the changes ([#2849](#2849)). He also replaced scattered business kwargs with typed request objects across MoE stage boundaries ([#7024](#7024)). Since March 2026, he has taken part in issue triage and user support, responding to [30+ issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue+commenter%3Alinfeng-yuan+updated%3A%3E2026-03-01) covering graph mode failures, quantization accuracy regressions, MoE deployment problems, and multi-node communication issues. - vLLM version: v0.19.1 - vLLM main: vllm-project/vllm@4d51588 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
What this PR does / why we need it?
Move torchair-related model architectures into the torchair module to make the code clearer. As a next step, we'll remove all torchair-related code outside of the torchair module.
Does this PR introduce any user-facing change?
No.
How was this patch tested?