[TRITON][TEST] Add nightly test for triton split_qkv_rmsnorm_rope #5267
zzzzwwjj merged 4 commits into vllm-project:main
Conversation
Signed-off-by: Angazenn <supperccell@163.com>
Code Review
This pull request adds a nightly test for the Triton split_rmsnorm_rope operator. While the intention is good, the new test file contains several critical bugs that will prevent it from running correctly. These include missing imports, incorrect logic in the reference implementation, incorrect variable calculations, and numerous undefined variables and typos. I have provided detailed comments and code suggestions to fix these issues. Addressing them will make the tests functional and provide valuable coverage for the operator.
if norm_bias is not None:
    norm_bias = norm_bias.to(torch.float32).cpu().numpy()
reciprocal_std = 1 / np.sqrt(np.mean(input**2, axis=-1, keepdims=True) + eps)
out = input * reciprocal_std * norm_weight + norm_bias
If norm_bias is None, this line will raise a TypeError because you cannot add None to a numpy array. You should handle the case where norm_bias is None.
Suggested change:
- out = input * reciprocal_std * norm_weight + norm_bias
+ out = input * reciprocal_std * norm_weight + (norm_bias if norm_bias is not None else 0.0)
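The None-safe handling can also be written as a small standalone reference function. This is a sketch only: the name `rmsnorm_ref` and the `eps` default are illustrative and are not the test's actual helper.

```python
import numpy as np

def rmsnorm_ref(x, weight, bias=None, eps=1e-6):
    """NumPy reference RMSNorm with an optional bias (illustrative sketch)."""
    x = x.astype(np.float32)
    # 1 / RMS(x) along the hidden dimension.
    reciprocal_std = 1.0 / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)
    out = x * reciprocal_std * weight
    # Only add the bias when one is provided, avoiding the None + ndarray TypeError.
    if bias is not None:
        out = out + bias
    return out
```

Branching on `bias is not None` keeps the default path allocation-free, while the inline `(norm_bias if norm_bias is not None else 0.0)` form from the suggestion is equivalent for correctness.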
init_device_properties_triton()
q_hidden_size = num_q_heads * head_size
kv_hidden_size = num_kv_heads * head_size * 2
The kv_hidden_size is incorrectly calculated. It should be the product of num_kv_heads and head_size. The multiplication by 2 is incorrect here, as the total size of k and v tensors is accounted for later when creating the qkv tensor (kv_hidden_size * 2).
Suggested change:
- kv_hidden_size = num_kv_heads * head_size * 2
+ kv_hidden_size = num_kv_heads * head_size
_q = _q.reshape(bsz, 1, -1, head_size)
_k = _k.reshape(bsz, 1, -1, head_size)

# rope
q_gold, k_gold = custom_rope(_q, _k, sin, cos)
q_gold = cus_q.reshape(bsz, -1)
k_gold = cus_k.reshape(bsz, -1)

# Compare the results.
torch.testing.assert_close(q.to(torch.float32),
                           q_gold,
                           atol=DEFAULT_ATOL,
                           rtol=DEFAULT_RTOL)

torch.testing.assert_close(k.to(torch.float32),
                           k_gold,
                           atol=DEFAULT_ATOL,
                           rtol=DEFAULT_RTOL)

torch.testing.assert_close(v.to(torch.float32),
                           v_gold,
                           atol=DEFAULT_ATOL,
                           rtol=DEFAULT_RTOL)
This block of code contains several undefined variables that will cause the test to fail:
- `bsz` is not defined; it should be `num_tokens`.
- `cus_q` and `cus_k` are not defined; these are likely typos for `q_gold` and `k_gold`.
- `v_gold` is not defined; the reference tensor for `v` should be `_v` from the split operation.
The call to custom_rope also needs to be updated to pass head_size, as suggested in another comment.
_q = _q.reshape(num_tokens, 1, -1, head_size)
_k = _k.reshape(num_tokens, 1, -1, head_size)
# rope
q_gold, k_gold = custom_rope(_q, _k, sin, cos, head_size)
q_gold = q_gold.reshape(num_tokens, -1)
k_gold = k_gold.reshape(num_tokens, -1)
# Compare the results.
torch.testing.assert_close(q.to(torch.float32),
q_gold,
atol=DEFAULT_ATOL,
rtol=DEFAULT_RTOL)
torch.testing.assert_close(k.to(torch.float32),
k_gold,
atol=DEFAULT_ATOL,
rtol=DEFAULT_RTOL)
torch.testing.assert_close(v.to(torch.float32),
_v.to(torch.float32),
atol=DEFAULT_ATOL,
rtol=DEFAULT_RTOL)
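For reference, a rotate-half RoPE can be sketched in NumPy as below. The layout is an assumption: the test's `custom_rope` may use an interleaved convention instead, so `rope_ref` is illustrative only and not the project's helper.

```python
import numpy as np

def rope_ref(q, k, sin, cos):
    """Hypothetical rotate-half RoPE reference (illustrative sketch)."""
    def rotate_half(x):
        # Split the head dimension in two and rotate: (x1, x2) -> (-x2, x1).
        half = x.shape[-1] // 2
        x1, x2 = x[..., :half], x[..., half:]
        return np.concatenate((-x2, x1), axis=-1)
    # Standard RoPE combination; sin/cos broadcast over the leading dims.
    q_out = q * cos + rotate_half(q) * sin
    k_out = k * cos + rotate_half(k) * sin
    return q_out, k_out
```

A quick sanity check on any such reference: with `sin = 0` and `cos = 1` (rotation angle zero) the transform must be the identity, which is also a useful degenerate case before comparing against the Triton kernel with `torch.testing.assert_close`.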
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Parallel test in #5320.
…to FIA_rebase * 'main' of https://github.com/vllm-project/vllm-ascend: (58 commits)
- [Main2Main] Upgrade vllm commit to 0106 (vllm-project#5617)
- [CI] update bisheng version (vllm-project#5621)
- [UT][PCP&DCP] UT for block_table.py (vllm-project#5032)
- [Main2Main] Upgrade vllm commit to 0105 (vllm-project#5595)
- [CI] mv ops to correct path (vllm-project#5615)
- [BugFix] Fix Smoke Testing Bug for DSR1 longseq (vllm-project#5613)
- Revert "[Feat] enable hierarchical mc2 ops on A2 by default (vllm-project#5545)" (vllm-project#5611)
- [TRITON][TEST] Add nightly test for triton split_qkv_rmsnorm_rope (vllm-project#5267)
- [perf] Fix MLAPO weight disposal for KV-consumer MLA in PD-mix deploy... (vllm-project#5192)
- [docs] Correct image about prefill phase of PCP (vllm-project#5598)
- [CI] update triton-ascend version (vllm-project#5584)
- [P/D] Remove mooncake kvpool unused parameter `local_hostname` (vllm-project#5574)
- [Bugfix] record cos and sin cache in AscendRotaryEmbedding (vllm-project#5516)
- [bugfix] fix test_camem failed with triton-ascend (vllm-project#5492)
- [UT] add triton ops ut: test_fused_qkvzba_split_reshape_cat (vllm-project#5474)
- [CI] Download models from ms (vllm-project#5405)
- Docs: Add A3 Docker image guidance for Atlas A3 machines (vllm-project#5256)
- [Doc] Add NNAL installation guide and requirements (vllm-project#5235)
- Add the requirement of arctic-inference for speculative decoding with suffix_decode (vllm-project#5045)
- [BugFix][Fusion] Fix graph fusion failure problem (vllm-project#5253)
- ...
…m-project#5267)
### What this PR does / why we need it?
Add nightly test for triton split_rmsnorm_rope
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: release/v0.13.0
- vLLM main: vllm-project/vllm@ad32e3e
---------
Signed-off-by: Angazenn <supperccell@163.com>
…m-project#5267)
### What this PR does / why we need it?
Add nightly test for triton split_rmsnorm_rope
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: release/v0.13.0
- vLLM main: vllm-project/vllm@ad32e3e
---------
Signed-off-by: Angazenn <supperccell@163.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
What this PR does / why we need it?
Add nightly test for triton split_rmsnorm_rope
Does this PR introduce any user-facing change?
How was this patch tested?