[Feat][UT] Support Deepseekv32 FULL_DECODE_ONLY mode and add unit test of sfa_v1 (#3763)
Conversation
Code Review
This pull request adds support for DeepSeek v3.2 in FULL_DECODE_ONLY mode and includes a new end-to-end test to verify this functionality. Additionally, it introduces a comprehensive suite of unit tests for the sfa_v1 attention mechanism. The changes appear to be well-structured and align with the PR's objectives. However, I've identified a critical issue in the newly added unit tests that needs to be addressed.
```python
self.assertEqual(self.impl.W_UK_T.shape[0], self.impl.num_heads)
self.assertEqual(self.impl.W_UK_T.shape[1], self.impl.qk_nope_head_dim)
self.assertEqual(self.impl.W_UK_T.shape[2], self.impl.kv_lora_rank)

self.assertEqual(self.impl.W_UV.shape[0], self.impl.num_heads)
self.assertEqual(self.impl.W_UV.shape[1], self.impl.kv_lora_rank)
self.assertEqual(self.impl.W_UV.shape[2], self.impl.v_head_dim)
```
The test asserts the attributes `W_UK_T` and `W_UV` on the `self.impl` object. However, in the implementation of `process_weights_after_loading` in `vllm_ascend/attention/sfa_v1.py`, the attributes actually set are `kv_b_proj_w_k` and `kv_b_proj_w_v`. This discrepancy will cause the test to fail; the assertions should target the correct attribute names.
```diff
-self.assertEqual(self.impl.W_UK_T.shape[0], self.impl.num_heads)
-self.assertEqual(self.impl.W_UK_T.shape[1], self.impl.qk_nope_head_dim)
-self.assertEqual(self.impl.W_UK_T.shape[2], self.impl.kv_lora_rank)
-self.assertEqual(self.impl.W_UV.shape[0], self.impl.num_heads)
-self.assertEqual(self.impl.W_UV.shape[1], self.impl.kv_lora_rank)
-self.assertEqual(self.impl.W_UV.shape[2], self.impl.v_head_dim)
+self.assertEqual(self.impl.kv_b_proj_w_k.shape[0], self.impl.num_heads)
+self.assertEqual(self.impl.kv_b_proj_w_k.shape[1], self.impl.qk_nope_head_dim)
+self.assertEqual(self.impl.kv_b_proj_w_k.shape[2], self.impl.kv_lora_rank)
+self.assertEqual(self.impl.kv_b_proj_w_v.shape[0], self.impl.num_heads)
+self.assertEqual(self.impl.kv_b_proj_w_v.shape[1], self.impl.kv_lora_rank)
+self.assertEqual(self.impl.kv_b_proj_w_v.shape[2], self.impl.v_head_dim)
```
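The expected shapes in the suggestion follow from how a `kv_b_proj`-style weight is typically split into per-head K and V components in MLA/SFA attention. The sketch below walks through those shapes with plain nested lists and small hypothetical dimensions; the real `process_weights_after_loading` in `vllm_ascend/attention/sfa_v1.py` operates on torch tensors with the model's actual config values.

```python
# Hypothetical, small dimensions for illustration only.
num_heads, qk_nope_head_dim, kv_lora_rank, v_head_dim = 2, 3, 5, 3

# kv_b_proj weight: [(qk_nope + v) * heads, lora_rank], as nested lists.
rows = num_heads * (qk_nope_head_dim + v_head_dim)
kv_b_proj_weight = [[0.0] * kv_lora_rank for _ in range(rows)]

# View as [heads, qk_nope + v, lora_rank], then split the middle dim.
per_head = qk_nope_head_dim + v_head_dim
w = [kv_b_proj_weight[h * per_head:(h + 1) * per_head] for h in range(num_heads)]
kv_b_proj_w_k = [head[:qk_nope_head_dim] for head in w]   # K half, kept as-is
w_vc = [head[qk_nope_head_dim:] for head in w]            # V half, pre-transpose
# Transpose the V half so matmul absorbs it: [heads, lora_rank, v_head_dim].
kv_b_proj_w_v = [list(map(list, zip(*head))) for head in w_vc]

def shape(x):
    """Shape of a regularly nested list."""
    s = []
    while isinstance(x, list):
        s.append(len(x))
        x = x[0]
    return tuple(s)

print(shape(kv_b_proj_w_k))  # [num_heads, qk_nope_head_dim, kv_lora_rank]
print(shape(kv_b_proj_w_v))  # [num_heads, kv_lora_rank, v_head_dim]
```

This matches the three `assertEqual` checks per attribute in the suggested test.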
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
```python
if forward_context.cudagraph_runtime_mode == CUDAGraphMode.FULL \
        and not self.ascend_config.use_sfa:
```
Why skip SFA if we are to support DeepSeek-v3.2 in FULL_DECODE_ONLY?
The SFA tiling update has already been delegated to the device, so manual host-side updates are no longer required.
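The reply above implies the `use_sfa` exclusion in the FULL-graph check is obsolete once tiling is updated on device. This hypothetical stand-alone sketch (the names `needs_manual_tiling_update` and `sfa_tiling_on_device` are illustrative, not from the codebase) shows the resulting decision logic under that assumption:

```python
from enum import Enum

class CUDAGraphMode(Enum):
    NONE = 0
    PIECEWISE = 1
    FULL = 2

def needs_manual_tiling_update(runtime_mode, use_sfa, sfa_tiling_on_device=True):
    # With device-side SFA tiling, SFA no longer needs a host-side update
    # in FULL capture mode, so the old `not use_sfa` guard can be dropped.
    if runtime_mode == CUDAGraphMode.FULL and use_sfa and sfa_tiling_on_device:
        return False
    # Non-SFA attention in FULL mode still takes the manual-update path.
    return runtime_mode == CUDAGraphMode.FULL

print(needs_manual_tiling_update(CUDAGraphMode.FULL, use_sfa=True))   # False
print(needs_manual_tiling_update(CUDAGraphMode.FULL, use_sfa=False))  # True
```

In other words, SFA is no longer skipped; it simply has nothing left to update on the host.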
There will be a big refactor to …
Thank you for the reminder. If your PR gets merged, I'll update my PR accordingly.
…t of sfa_v1 (vllm-project#3763)

### What this PR does / why we need it?
- Add support for DeepSeek v3.2 in FULL_DECODE_ONLY mode.
- Add unit test for sfa_v1.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@83f478b

Signed-off-by: 1Fire4 <wangdingyi2@huawei.com>
Signed-off-by: luolun <luolun1995@cmbchina.com>