
[Feature] Support kv nz feature for DeepSeek decode node in disagg-prefill scenario #3072

Merged
jianzs merged 7 commits into vllm-project:main from jianzs:gh-kvnz-wo-torchair
Dec 31, 2025

Conversation

@jianzs (Collaborator) commented Sep 21, 2025

What this PR does / why we need it?

By converting the KV cache from ND to NZ format when the decode node receives it, this PR ensures that the KV NZ feature works correctly during the decoding phase in the disagg-prefill scenario.
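The ND-to-NZ conversion described above can be illustrated with a small NumPy sketch. This is only an illustrative model of FRACTAL_NZ tiling (16×16 blocks stored block-column-major); the real conversion on Ascend NPUs is done by format casts in the runtime, and the exact block ordering shown here is an assumption, not taken from this PR:

```python
import numpy as np

# Illustrative sizes; NZ block dimensions on Ascend are typically 16x16.
M, N, M0, N0 = 32, 48, 16, 16
a = np.arange(M * N, dtype=np.float32).reshape(M, N)  # ND (row-major) layout

# ND -> NZ: split into (M0, N0) tiles, then order tiles column-major,
# giving shape (N // N0, M // M0, M0, N0).
nz = a.reshape(M // M0, M0, N // N0, N0).transpose(2, 0, 1, 3)

# Round-trip back to ND to check the transform is lossless.
nd = nz.transpose(1, 2, 0, 3).reshape(M, N)
assert nz.shape == (3, 2, 16, 16)
assert np.array_equal(a, nd)
```

The point of the transform is purely layout: the same values are rearranged so matrix-multiply engines can read contiguous tiles, which is why a decode node that receives ND-formatted KV blocks must convert them before the NZ kernels can consume them.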

Does this PR introduce any user-facing change?

Add enable_kv_nz configuration option in additional_config.
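As a minimal sketch of how the new option is consumed (it mirrors the `additional_config.get` call quoted in the review thread; everything else here is illustrative plumbing, not code from the PR):

```python
# User-supplied extra configuration; only the "enable_kv_nz" key name
# comes from this PR.
additional_config = {"enable_kv_nz": True}

# vllm-ascend reads the flag with a False default, so omitting the key
# leaves the KV NZ path disabled.
enable_kv_nz = additional_config.get("enable_kv_nz", False)
print(enable_kv_nz)  # -> True
```

Because the default is `False`, existing deployments that never set the key are unaffected by this change.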

How was this patch tested?

CI passed.

@github-actions bot (Contributor) commented:

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request refactors the enable_kv_nz configuration by moving it from TorchairGraphConfig to the higher-level AscendConfig. This change successfully removes the restriction that kv nz can only be used when torchair is enabled. The implementation looks correct and consistent across the modified files. However, a significant issue is that the unit tests in tests/ut/test_ascend_config.py have not been updated to reflect this refactoring, which could lead to a broken test suite and potential regressions. It is crucial to update these tests to align with the new configuration structure and behavior.

Comment thread on vllm_ascend/ascend_config.py (outdated):
ascend_scheduler_config)
# TODO: Once https://github.com/vllm-project/vllm/issues/22246 is merged in vLLM, remove this config

self.enable_kv_nz = additional_config.get("enable_kv_nz", False)
@gemini-code-assist bot (Contributor) — severity: high

While moving enable_kv_nz to AscendConfig is a good refactoring to generalize its usage, the corresponding unit tests in tests/ut/test_ascend_config.py appear to be outdated. The tests still reference torchair_graph_config.enable_kv_nz and check for behavior that was removed (i.e., that enable_kv_nz is only valid with torchair). Please update the tests to reflect these changes to ensure correctness and prevent future regressions.

@github-actions github-actions bot added the documentation (Improvements or additions to documentation) and module:tests labels Sep 21, 2025
@jianzs jianzs force-pushed the gh-kvnz-wo-torchair branch from 7281669 to ba1c909 Compare September 21, 2025 13:44
@jianzs jianzs marked this pull request as draft September 21, 2025 15:56
@github-actions bot (Contributor) commented:

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@jianzs jianzs force-pushed the gh-kvnz-wo-torchair branch from 05ffcbd to 4cdd047 Compare November 11, 2025 15:37
@jianzs jianzs changed the title from [Feature] Remove restriction on kv nz usage without torchair to [Feature] Support kv nz feature for DeepSeek decode node in disagg-prefill scenario Nov 11, 2025
@jianzs jianzs marked this pull request as ready for review November 11, 2025 15:48
@jianzs jianzs force-pushed the gh-kvnz-wo-torchair branch from 70d23de to 10bd91d Compare November 11, 2025 15:49
@zzzzwwjj (Collaborator) commented:

Please test the case without torchair and post the performance data, thanks!

@github-actions bot (Contributor) commented:

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@jianzs jianzs force-pushed the gh-kvnz-wo-torchair branch from 244da13 to eabdc89 Compare December 11, 2025 15:47
@jianzs jianzs force-pushed the gh-kvnz-wo-torchair branch 2 times, most recently from 24041aa to 51c6cc5 Compare December 12, 2025 02:59
@github-actions bot (Contributor) commented:

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@jianzs jianzs force-pushed the gh-kvnz-wo-torchair branch from 51c6cc5 to ff0acc5 Compare December 13, 2025 13:24
@jianzs jianzs added the ready (read for review) and ready-for-test (start test by label for PR) labels Dec 13, 2025
@jianzs jianzs force-pushed the gh-kvnz-wo-torchair branch 2 times, most recently from 07132ce to 6783787 Compare December 30, 2025 07:24
@github-actions bot (Contributor) commented:

This pull request has conflicts, please resolve those before we can evaluate the pull request.

jianzs and others added 7 commits December 30, 2025 20:53
Co-authored-by: ghphotoframe <854746559@qq.com>
Co-authored-by: alex101-ops <alex1015718386@gmail.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
@jianzs jianzs force-pushed the gh-kvnz-wo-torchair branch from d541b16 to 1e1ac42 Compare December 30, 2025 12:54
@jianzs jianzs merged commit 38570cf into vllm-project:main Dec 31, 2025
19 checks passed
wjunLu pushed a commit to wjunLu/vllm-ascend that referenced this pull request Jan 4, 2026
…efill scenario (vllm-project#3072)

By converting the KV cache from ND to NZ format when the decode node
receives it, this PR ensures that the KV NZ feature works correctly
during the decoding phase in disagg-prefill scenario.

- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@83f478b

---------

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Co-authored-by: ghphotoframe <854746559@qq.com>
Co-authored-by: alex101-ops <alex1015718386@gmail.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>
Rozwel-dx pushed a commit to Rozwel-dx/vllm-ascend that referenced this pull request Jan 8, 2026
…efill scenario (vllm-project#3072)

By converting the KV cache from ND to NZ format when the decode node
receives it, this PR ensures that the KV NZ feature works correctly
during the decoding phase in disagg-prefill scenario.

- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@83f478b

---------

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Co-authored-by: ghphotoframe <854746559@qq.com>
Co-authored-by: alex101-ops <alex1015718386@gmail.com>
wangxiyuan pushed a commit that referenced this pull request Feb 3, 2026
…s_nq.py (#6505)

### What this PR does / why we need it?
Remove kv_cache nz test case for test_mla_preprocess_nq.py. This case is
added by #3072 but has
not been tested on bf16 scenario. Results show that this is not
currently supported.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed with existing test.


- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

Signed-off-by: whx-sjtu <2952154980@qq.com>
chenchuw886 pushed a commit to chenchuw886/vllm-ascend that referenced this pull request Feb 12, 2026
…s_nq.py (vllm-project#6505)

### What this PR does / why we need it?
Remove kv_cache nz test case for test_mla_preprocess_nq.py. This case is
added by vllm-project#3072 but has
not been tested on bf16 scenario. Results show that this is not
currently supported.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed with existing test.

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: momochenchuw <chenchuw@huawei.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
…efill scenario (vllm-project#3072)

By converting the KV cache from ND to NZ format when the decode node
receives it, this PR ensures that the KV NZ feature works correctly
during the decoding phase in disagg-prefill scenario.

- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@83f478b

---------

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Co-authored-by: ghphotoframe <854746559@qq.com>
Co-authored-by: alex101-ops <alex1015718386@gmail.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
…s_nq.py (vllm-project#6505)

### What this PR does / why we need it?
Remove kv_cache nz test case for test_mla_preprocess_nq.py. This case is
added by vllm-project#3072 but has
not been tested on bf16 scenario. Results show that this is not
currently supported.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed with existing test.

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
…efill scenario (vllm-project#3072)

By converting the KV cache from ND to NZ format when the decode node
receives it, this PR ensures that the KV NZ feature works correctly
during the decoding phase in disagg-prefill scenario.

- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@83f478b

---------

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Co-authored-by: ghphotoframe <854746559@qq.com>
Co-authored-by: alex101-ops <alex1015718386@gmail.com>
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
…s_nq.py (vllm-project#6505)

### What this PR does / why we need it?
Remove kv_cache nz test case for test_mla_preprocess_nq.py. This case is
added by vllm-project#3072 but has
not been tested on bf16 scenario. Results show that this is not
currently supported.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed with existing test.


- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

Signed-off-by: whx-sjtu <2952154980@qq.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
…efill scenario (vllm-project#3072)

By converting the KV cache from ND to NZ format when the decode node
receives it, this PR ensures that the KV NZ feature works correctly
during the decoding phase in disagg-prefill scenario.

- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@83f478b

---------

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Co-authored-by: ghphotoframe <854746559@qq.com>
Co-authored-by: alex101-ops <alex1015718386@gmail.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
…s_nq.py (vllm-project#6505)

### What this PR does / why we need it?
Remove kv_cache nz test case for test_mla_preprocess_nq.py. This case is
added by vllm-project#3072 but has
not been tested on bf16 scenario. Results show that this is not
currently supported.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed with existing test.

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
…s_nq.py (vllm-project#6505)

### What this PR does / why we need it?
Remove kv_cache nz test case for test_mla_preprocess_nq.py. This case is
added by vllm-project#3072 but has
not been tested on bf16 scenario. Results show that this is not
currently supported.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed with existing test.


- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

Signed-off-by: whx-sjtu <2952154980@qq.com>
jiangyunfan1 pushed a commit to jiangyunfan1/vllm-ascend that referenced this pull request Apr 9, 2026
…s_nq.py (vllm-project#6505)

### What this PR does / why we need it?
Remove kv_cache nz test case for test_mla_preprocess_nq.py. This case is
added by vllm-project#3072 but has
not been tested on bf16 scenario. Results show that this is not
currently supported.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed with existing test.


- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

Signed-off-by: whx-sjtu <2952154980@qq.com>

Labels

documentation (Improvements or additions to documentation), module:core, module:tests, ready (read for review), ready-for-test (start test by label for PR)


3 participants