
[Feat][UT] Support Deepseekv32 FULL_DECODE_ONLY mode and add unit test of sfa_v1#3763

Merged
yiz-liu merged 2 commits into vllm-project:main from 1Fire4:deepseekv32_fullgraph
Nov 3, 2025
Conversation

Contributor

@1Fire4 1Fire4 commented Oct 25, 2025

What this PR does / why we need it?

  • Add support for DeepSeek v3.2 in FULL_DECODE_ONLY mode.
  • Add unit test for sfa_v1.

Does this PR introduce any user-facing change?

No

How was this patch tested?

@gemini-code-assist Bot left a comment
Code Review

This pull request adds support for DeepSeek v3.2 in FULL_DECODE_ONLY mode and includes a new end-to-end test to verify this functionality. Additionally, it introduces a comprehensive suite of unit tests for the sfa_v1 attention mechanism. The changes appear to be well-structured and align with the PR's objectives. However, I've identified a critical issue in the newly added unit tests that needs to be addressed.

Comment thread tests/ut/attention/test_sfa_v1.py Outdated
Comment on lines +375 to +381
self.assertEqual(self.impl.W_UK_T.shape[0], self.impl.num_heads)
self.assertEqual(self.impl.W_UK_T.shape[1], self.impl.qk_nope_head_dim)
self.assertEqual(self.impl.W_UK_T.shape[2], self.impl.kv_lora_rank)

self.assertEqual(self.impl.W_UV.shape[0], self.impl.num_heads)
self.assertEqual(self.impl.W_UV.shape[1], self.impl.kv_lora_rank)
self.assertEqual(self.impl.W_UV.shape[2], self.impl.v_head_dim)
critical

The test asserts attributes W_UK_T and W_UV on the self.impl object. However, looking at the implementation of process_weights_after_loading in vllm_ascend/attention/sfa_v1.py, the attributes being set are kv_b_proj_w_k and kv_b_proj_w_v. This discrepancy will cause the test to fail. The test should be updated to assert against the correct attribute names.

Suggested change
self.assertEqual(self.impl.W_UK_T.shape[0], self.impl.num_heads)
self.assertEqual(self.impl.W_UK_T.shape[1], self.impl.qk_nope_head_dim)
self.assertEqual(self.impl.W_UK_T.shape[2], self.impl.kv_lora_rank)
self.assertEqual(self.impl.W_UV.shape[0], self.impl.num_heads)
self.assertEqual(self.impl.W_UV.shape[1], self.impl.kv_lora_rank)
self.assertEqual(self.impl.W_UV.shape[2], self.impl.v_head_dim)
self.assertEqual(self.impl.kv_b_proj_w_k.shape[0], self.impl.num_heads)
self.assertEqual(self.impl.kv_b_proj_w_k.shape[1], self.impl.qk_nope_head_dim)
self.assertEqual(self.impl.kv_b_proj_w_k.shape[2], self.impl.kv_lora_rank)
self.assertEqual(self.impl.kv_b_proj_w_v.shape[0], self.impl.num_heads)
self.assertEqual(self.impl.kv_b_proj_w_v.shape[1], self.impl.kv_lora_rank)
self.assertEqual(self.impl.kv_b_proj_w_v.shape[2], self.impl.v_head_dim)
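For readers unfamiliar with how these shapes arise: in MLA/SFA-style attention, the fused kv_b_proj weight is split per head into a key (nope) part and a value part. The sketch below is a hypothetical illustration of that split — the dimension values and the transpose layout are assumptions for the example, not the actual vllm_ascend/attention/sfa_v1.py code:

```python
import numpy as np

# Hypothetical dimensions; the real values come from the model config.
num_heads, qk_nope_head_dim, v_head_dim, kv_lora_rank = 16, 128, 128, 512

# Fused kv_b_proj weight: one (qk_nope_head_dim + v_head_dim) block per head,
# projected up from the kv_lora_rank latent dimension.
kv_b_proj = np.zeros(
    (num_heads * (qk_nope_head_dim + v_head_dim), kv_lora_rank), dtype=np.float16
)

# Split per head into the key (nope) part and the value part.
w = kv_b_proj.reshape(num_heads, qk_nope_head_dim + v_head_dim, kv_lora_rank)
kv_b_proj_w_k = w[:, :qk_nope_head_dim, :]                      # (H, qk_nope, rank)
kv_b_proj_w_v = w[:, qk_nope_head_dim:, :].transpose(0, 2, 1)   # (H, rank, v_dim)

# These are exactly the shapes the corrected unit-test assertions check.
assert kv_b_proj_w_k.shape == (num_heads, qk_nope_head_dim, kv_lora_rank)
assert kv_b_proj_w_v.shape == (num_heads, kv_lora_rank, v_head_dim)
```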

@github-actions
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@1Fire4 1Fire4 force-pushed the deepseekv32_fullgraph branch 3 times, most recently from 07d98a4 to adb45ed Compare October 27, 2025 12:02
@1Fire4 1Fire4 force-pushed the deepseekv32_fullgraph branch 4 times, most recently from c1e3831 to 48863c6 Compare October 28, 2025 07:23
Comment thread vllm_ascend/worker/model_runner_v1.py Outdated
Comment on lines +1780 to +1781
if forward_context.cudagraph_runtime_mode == CUDAGraphMode.FULL \
and not self.ascend_config.use_sfa:
Collaborator

Why skip SFA if we are to support DeepSeek-v3.2 in FULL_DECODE_ONLY?

Contributor Author

The tiling update of the SFA has already been delegated to the device, so manual updates are no longer required.
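A minimal sketch of the guard under discussion — a hypothetical reconstruction, not the actual model_runner_v1.py code: the host-side tiling update before a full-graph replay is only needed for non-SFA attention, because SFA's tiling update is delegated to the device:

```python
from enum import Enum

class CUDAGraphMode(Enum):
    NONE = 0
    PIECEWISE = 1
    FULL = 2

def needs_manual_tiling_update(runtime_mode: CUDAGraphMode, use_sfa: bool) -> bool:
    # Only FULL graph replays with non-SFA attention need a host-side
    # tiling update; SFA updates its tiling on device.
    return runtime_mode == CUDAGraphMode.FULL and not use_sfa

# SFA skips the manual update even in FULL mode.
assert needs_manual_tiling_update(CUDAGraphMode.FULL, use_sfa=False) is True
assert needs_manual_tiling_update(CUDAGraphMode.FULL, use_sfa=True) is False
```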

@yiz-liu yiz-liu added the ready (read for review) and ready-for-test (start test by label for PR) labels Oct 28, 2025
@1Fire4 1Fire4 force-pushed the deepseekv32_fullgraph branch from 48863c6 to 38eb602 Compare October 28, 2025 12:59
@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@1Fire4 1Fire4 force-pushed the deepseekv32_fullgraph branch from 38eb602 to 1777c6e Compare October 28, 2025 13:29
@1Fire4 1Fire4 changed the title support Deepseekv32 FULL_DECODE_ONLY mode and add unit test of sfa_v1 [Fea] [UT] support Deepseekv32 FULL_DECODE_ONLY mode and add unit test of sfa_v1 Oct 30, 2025
@whx-sjtu
Collaborator

whx-sjtu commented Oct 30, 2025

There will be a big refactor of sfa_v1 and deepseek v3.2 today, but I think this refactor will actually lighten your workload. Please refer to PR #3769 for more information.

@1Fire4
Contributor Author

1Fire4 commented Oct 30, 2025

There will be a big refactor of sfa_v1 and deepseek v3.2 today, but I think this refactor will actually lighten your workload. Please refer to PR #3769 for more information.

Thank you for the reminder. If your PR gets merged, I’ll update my PR accordingly.

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@1Fire4 1Fire4 force-pushed the deepseekv32_fullgraph branch from 1777c6e to 363ca90 Compare October 30, 2025 12:43
@1Fire4 1Fire4 force-pushed the deepseekv32_fullgraph branch 2 times, most recently from eb36228 to 6aa77f9 Compare October 31, 2025 01:32
Signed-off-by: 1Fire4 <wangdingyi2@huawei.com>
@1Fire4 1Fire4 force-pushed the deepseekv32_fullgraph branch 4 times, most recently from 2b7f5fc to 72ac33a Compare October 31, 2025 04:04
Signed-off-by: 1Fire4 <wangdingyi2@huawei.com>
@1Fire4 1Fire4 force-pushed the deepseekv32_fullgraph branch from 72ac33a to bfb20ab Compare October 31, 2025 06:12
@yiz-liu yiz-liu changed the title [Fea] [UT] support Deepseekv32 FULL_DECODE_ONLY mode and add unit test of sfa_v1 [Feat][UT] Support Deepseekv32 FULL_DECODE_ONLY mode and add unit test of sfa_v1 Nov 3, 2025
@yiz-liu yiz-liu merged commit 0b9b6d7 into vllm-project:main Nov 3, 2025
24 checks passed
luolun pushed a commit to luolun/vllm-ascend that referenced this pull request Nov 19, 2025
…t of sfa_v1 (vllm-project#3763)

- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@83f478b

Signed-off-by: 1Fire4 <wangdingyi2@huawei.com>
Signed-off-by: luolun <luolun1995@cmbchina.com>
hwhaokun pushed a commit to hwhaokun/vllm-ascend that referenced this pull request Nov 19, 2025
…t of sfa_v1 (vllm-project#3763)

Signed-off-by: 1Fire4 <wangdingyi2@huawei.com>
Signed-off-by: hwhaokun <haokun0405@163.com>
NSDie pushed a commit to NSDie/vllm-ascend that referenced this pull request Nov 24, 2025
…t of sfa_v1 (vllm-project#3763)

Signed-off-by: 1Fire4 <wangdingyi2@huawei.com>
Signed-off-by: nsdie <yeyifan@huawei.com>
Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 10, 2025
…t of sfa_v1 (vllm-project#3763)

Signed-off-by: 1Fire4 <wangdingyi2@huawei.com>

Labels

module:tests · ready (read for review) · ready-for-test (start test by label for PR)

3 participants