
[TEST] test e2e for split_qkv_rmsnorm_rope #5320

Closed
Angazenn wants to merge 1 commit into vllm-project:main from Angazenn:triton_e2e

Conversation

@Angazenn (Collaborator) commented Dec 24, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: Angazenn <supperccell@163.com>
@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces end-to-end tests for the split_qkv_rmsnorm_rope custom operator, covering cases with and without bias. The change to make bias arguments optional in the operator implementation is correct. However, the new test file contains two nearly identical test functions, one for the case with bias and one without. This significant code duplication can be avoided by merging them into a single, parameterized test function, which would improve maintainability. I've provided a suggestion to refactor the tests accordingly.

Comment on lines +58 to +214
@pytest.mark.parametrize("num_tokens", NUM_TOKENS)
@pytest.mark.parametrize("num_q_heads, num_kv_heads", NUM_QKV_HEADS)
@pytest.mark.parametrize("head_size", HEAD_SIZES)
@pytest.mark.parametrize("eps", EPS)
@pytest.mark.parametrize("dtype", DTYPES)
@pytest.mark.parametrize("seed", SEEDS)
@pytest.mark.parametrize("device", DEVICES)
@torch.inference_mode()
def test_split_qkv_rmsnorm_rope(num_tokens, num_q_heads, num_kv_heads,
                                head_size, eps, dtype, seed, device):
    torch.manual_seed(seed)
    torch.set_default_device(device)
    init_device_properties_triton()

    q_hidden_size = num_q_heads * head_size
    kv_hidden_size = num_kv_heads * head_size
    qkv = torch.randn(num_tokens,
                      q_hidden_size + kv_hidden_size * 2,
                      dtype=dtype,
                      device=device)
    q_weight = torch.randn(head_size, dtype=dtype, device=device)
    k_weight = torch.randn(head_size, dtype=dtype, device=device)
    sin = torch.from_numpy(
        np.random.uniform(0, 1,
                          [num_tokens, 1, 1, head_size])).to(dtype).npu()
    cos = torch.from_numpy(
        np.random.uniform(0, 1,
                          [num_tokens, 1, 1, head_size])).to(dtype).npu()
    # fused kernel
    q, k, v = torch.ops.vllm.qkv_rmsnorm_rope(input=qkv,
                                              q_weight=q_weight,
                                              k_weight=k_weight,
                                              q_hidden_size=q_hidden_size,
                                              kv_hidden_size=kv_hidden_size,
                                              head_dim=head_size,
                                              eps=eps,
                                              cos=cos,
                                              sin=sin)

    # split
    _q, _k, v_gold = qkv.cpu().split(
        [q_hidden_size, kv_hidden_size, kv_hidden_size], dim=-1)
    # norm
    _q = rms_norm(_q.reshape(-1, head_size), q_weight.cpu(), eps)
    _k = rms_norm(_k.reshape(-1, head_size), k_weight.cpu(), eps)
    _q = _q.reshape(num_tokens, 1, -1, head_size)
    _k = _k.reshape(num_tokens, 1, -1, head_size)

    # rope
    q_gold, k_gold = custom_rope(_q, _k, sin.cpu(), cos.cpu())
    q_gold = q_gold.reshape(num_tokens, -1)
    k_gold = k_gold.reshape(num_tokens, -1)

    # Compare the results.
    torch.testing.assert_close(q.to(torch.float32).cpu(),
                               q_gold,
                               atol=DEFAULT_ATOL,
                               rtol=DEFAULT_RTOL)

    torch.testing.assert_close(k.to(torch.float32).cpu(),
                               k_gold,
                               atol=DEFAULT_ATOL,
                               rtol=DEFAULT_RTOL)

    torch.testing.assert_close(v.to(torch.float32).cpu(),
                               v_gold.to(torch.float32),
                               atol=DEFAULT_ATOL,
                               rtol=DEFAULT_RTOL)

    gc.collect()
    torch.npu.empty_cache()
    torch.npu.reset_peak_memory_stats()


@pytest.mark.parametrize("num_tokens", NUM_TOKENS)
@pytest.mark.parametrize("num_q_heads, num_kv_heads", NUM_QKV_HEADS)
@pytest.mark.parametrize("head_size", HEAD_SIZES)
@pytest.mark.parametrize("eps", EPS)
@pytest.mark.parametrize("dtype", DTYPES)
@pytest.mark.parametrize("seed", SEEDS)
@pytest.mark.parametrize("device", DEVICES)
@torch.inference_mode()
def test_split_qkv_rmsnorm_rope_with_bias(num_tokens, num_q_heads,
                                          num_kv_heads, head_size, eps, dtype,
                                          seed, device):
    torch.manual_seed(seed)
    torch.set_default_device(device)
    init_device_properties_triton()

    q_hidden_size = num_q_heads * head_size
    kv_hidden_size = num_kv_heads * head_size
    qkv = torch.randn(num_tokens,
                      q_hidden_size + kv_hidden_size * 2,
                      dtype=dtype,
                      device=device)
    q_weight = torch.randn(head_size, dtype=dtype, device=device)
    k_weight = torch.randn(head_size, dtype=dtype, device=device)
    q_bias = torch.randn(head_size, dtype=dtype, device=device)
    k_bias = torch.randn(head_size, dtype=dtype, device=device)
    sin = torch.from_numpy(
        np.random.uniform(0, 1,
                          [num_tokens, 1, 1, head_size])).to(dtype).npu()
    cos = torch.from_numpy(
        np.random.uniform(0, 1,
                          [num_tokens, 1, 1, head_size])).to(dtype).npu()
    # fused kernel
    q, k, v = torch.ops.vllm.qkv_rmsnorm_rope(input=qkv,
                                              q_weight=q_weight,
                                              k_weight=k_weight,
                                              q_hidden_size=q_hidden_size,
                                              kv_hidden_size=kv_hidden_size,
                                              head_dim=head_size,
                                              eps=eps,
                                              q_bias=q_bias,
                                              k_bias=k_bias,
                                              cos=cos,
                                              sin=sin)

    # split
    _q, _k, v_gold = qkv.cpu().split(
        [q_hidden_size, kv_hidden_size, kv_hidden_size], dim=-1)
    # norm
    _q = rms_norm(_q.reshape(-1, head_size),
                  q_weight.cpu(),
                  eps,
                  norm_bias=q_bias.cpu())
    _k = rms_norm(_k.reshape(-1, head_size),
                  k_weight.cpu(),
                  eps,
                  norm_bias=k_bias.cpu())
    _q = _q.reshape(num_tokens, 1, -1, head_size)
    _k = _k.reshape(num_tokens, 1, -1, head_size)

    # rope
    q_gold, k_gold = custom_rope(_q, _k, sin.cpu(), cos.cpu())
    q_gold = q_gold.reshape(num_tokens, -1)
    k_gold = k_gold.reshape(num_tokens, -1)

    # Compare the results.
    torch.testing.assert_close(q.to(torch.float32).cpu(),
                               q_gold,
                               atol=DEFAULT_ATOL,
                               rtol=DEFAULT_RTOL)

    torch.testing.assert_close(k.to(torch.float32).cpu(),
                               k_gold,
                               atol=DEFAULT_ATOL,
                               rtol=DEFAULT_RTOL)

    torch.testing.assert_close(v.to(torch.float32).cpu(),
                               v_gold.to(torch.float32),
                               atol=DEFAULT_ATOL,
                               rtol=DEFAULT_RTOL)

    gc.collect()
    torch.npu.empty_cache()
    torch.npu.reset_peak_memory_stats()
@gemini-code-assist (Bot, Contributor) left an inline comment

Severity: high

The two test functions test_split_qkv_rmsnorm_rope and test_split_qkv_rmsnorm_rope_with_bias are largely identical, leading to significant code duplication. This makes the tests harder to read and maintain, as any future changes would need to be applied in two places.

To improve maintainability, these can be consolidated into a single test function parameterized by a with_bias boolean flag. This will remove over 80 lines of duplicated code and make the test logic clearer.

@pytest.mark.parametrize("with_bias", [False, True])
@pytest.mark.parametrize("num_tokens", NUM_TOKENS)
@pytest.mark.parametrize("num_q_heads, num_kv_heads", NUM_QKV_HEADS)
@pytest.mark.parametrize("head_size", HEAD_SIZES)
@pytest.mark.parametrize("eps", EPS)
@pytest.mark.parametrize("dtype", DTYPES)
@pytest.mark.parametrize("seed", SEEDS)
@pytest.mark.parametrize("device", DEVICES)
@torch.inference_mode()
def test_split_qkv_rmsnorm_rope(with_bias, num_tokens, num_q_heads,
                                num_kv_heads, head_size, eps, dtype, seed,
                                device):
    torch.manual_seed(seed)
    torch.set_default_device(device)
    init_device_properties_triton()

    q_hidden_size = num_q_heads * head_size
    kv_hidden_size = num_kv_heads * head_size
    qkv = torch.randn(num_tokens,
                      q_hidden_size + kv_hidden_size * 2,
                      dtype=dtype,
                      device=device)
    q_weight = torch.randn(head_size, dtype=dtype, device=device)
    k_weight = torch.randn(head_size, dtype=dtype, device=device)
    sin = torch.from_numpy(
        np.random.uniform(0, 1,
                          [num_tokens, 1, 1, head_size])).to(dtype).npu()
    cos = torch.from_numpy(
        np.random.uniform(0, 1,
                          [num_tokens, 1, 1, head_size])).to(dtype).npu()

    q_bias, k_bias = None, None
    norm_q_bias, norm_k_bias = None, None
    if with_bias:
        q_bias = torch.randn(head_size, dtype=dtype, device=device)
        k_bias = torch.randn(head_size, dtype=dtype, device=device)
        norm_q_bias = q_bias.cpu()
        norm_k_bias = k_bias.cpu()

    # fused kernel
    q, k, v = torch.ops.vllm.qkv_rmsnorm_rope(input=qkv,
                                              q_weight=q_weight,
                                              k_weight=k_weight,
                                              q_hidden_size=q_hidden_size,
                                              kv_hidden_size=kv_hidden_size,
                                              head_dim=head_size,
                                              eps=eps,
                                              q_bias=q_bias,
                                              k_bias=k_bias,
                                              cos=cos,
                                              sin=sin)

    # split
    _q, _k, v_gold = qkv.cpu().split(
        [q_hidden_size, kv_hidden_size, kv_hidden_size], dim=-1)
    # norm
    _q = rms_norm(_q.reshape(-1, head_size),
                  q_weight.cpu(),
                  eps,
                  norm_bias=norm_q_bias)
    _k = rms_norm(_k.reshape(-1, head_size),
                  k_weight.cpu(),
                  eps,
                  norm_bias=norm_k_bias)
    _q = _q.reshape(num_tokens, 1, -1, head_size)
    _k = _k.reshape(num_tokens, 1, -1, head_size)

    # rope
    q_gold, k_gold = custom_rope(_q, _k, sin.cpu(), cos.cpu())
    q_gold = q_gold.reshape(num_tokens, -1)
    k_gold = k_gold.reshape(num_tokens, -1)

    # Compare the results.
    torch.testing.assert_close(q.to(torch.float32).cpu(),
                               q_gold,
                               atol=DEFAULT_ATOL,
                               rtol=DEFAULT_RTOL)

    torch.testing.assert_close(k.to(torch.float32).cpu(),
                               k_gold,
                               atol=DEFAULT_ATOL,
                               rtol=DEFAULT_RTOL)

    torch.testing.assert_close(v.to(torch.float32).cpu(),
                               v_gold.to(torch.float32),
                               atol=DEFAULT_ATOL,
                               rtol=DEFAULT_RTOL)

    gc.collect()
    torch.npu.empty_cache()
    torch.npu.reset_peak_memory_stats()
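The reference helpers `rms_norm` and `custom_rope` that the golden path calls are not shown in this diff. As a rough illustration of what such CPU references typically compute, here is a minimal NumPy sketch; the signatures, the optional `norm_bias` argument, and the rotate-half RoPE convention are assumptions for illustration, not taken from the PR:

```python
import numpy as np


def rms_norm(x, weight, eps, norm_bias=None):
    # RMSNorm over the last dim: x / sqrt(mean(x^2) + eps) * weight (+ bias).
    x32 = x.astype(np.float32)
    var = np.mean(x32 ** 2, axis=-1, keepdims=True)
    out = x32 / np.sqrt(var + eps) * weight.astype(np.float32)
    if norm_bias is not None:
        out = out + norm_bias.astype(np.float32)
    return out


def rotate_half(x):
    # Split the last dim in half and rotate: (x1, x2) -> (-x2, x1).
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([-x2, x1], axis=-1)


def custom_rope(q, k, sin, cos):
    # Classic rotary embedding applied per head: x * cos + rotate_half(x) * sin.
    q_out = q * cos + rotate_half(q) * sin
    k_out = k * cos + rotate_half(k) * sin
    return q_out, k_out
```

With `cos` all ones and `sin` all zeros, `custom_rope` reduces to the identity, which is a handy sanity check when debugging tolerance failures in tests like the ones above.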

@github-actions (Bot)

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@github-actions (Bot)

This pull request has conflicts; please resolve them before we can evaluate it.

@wangxiyuan (Collaborator)

Any progress? If this PR is still alive, please rebase onto main and make CI happy; otherwise you can close it. Thanks!

@Angazenn (Collaborator, Author) commented Jan 5, 2026

Merged by #5267.

@Angazenn Angazenn closed this Jan 5, 2026
@Angazenn Angazenn deleted the triton_e2e branch February 4, 2026 06:30