test: add nan/inf clamp regression test for fused_topk_bias #40553
Add test_fused_topk_bias_nan_inf_clamp to cover the same NaN/Inf scenario as test_fused_topk_nan_inf_clamp but for the fused_topk_bias entry point. On CUDA, fused_topk_bias routes through the same topk_softmax/topk_sigmoid kernels fixed in vllm-project#39391 (lines 131/154 of topk_softmax_kernels.cu), so the clamp already applies.

Closes vllm-project#40457

Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
@claude review
cc @vadiklyutiy
Code Review
This pull request adds a regression test for the fused_topk_bias kernel to ensure that NaN or Inf values in gating logits do not lead to duplicate expert IDs or non-finite weights when expert score correction bias is present. I have no feedback to provide as there were no review comments.
I think we should also add this test to CI. I don't see the test's name or corresponding dir in
Verified the test is automatically collected by the existing Kernels MoE Test CI step (no .buildkite/ changes needed). Does this suffice? I'm not familiar with vLLM CI. Thanks.
Oops, missed it. All good.
Purpose
PR #39391 fixed NaN/Inf gating logits producing duplicate expert IDs in fused_topk by clamping scores to 0 in topk_softmax_kernels.cu (lines 131/154 for the warp-level path, line 457 for the fallback path). However, the regression test added in that PR only covered fused_topk. This PR adds the same nan/inf regression test for fused_topk_bias, which is used by DeepSeek-style models with e_score_correction_bias.

On CUDA, fused_topk_bias routes through the same topk_softmax/topk_sigmoid C++ kernels that were patched in #39391, so the clamp already applies. This PR confirms that coverage with an explicit test.

Test Plan
- Added test_fused_topk_bias_nan_inf_clamp to tests/kernels/moe/test_fused_topk.py, parametrized over:
  - dtype: bfloat16, float16, float32
  - scoring_func: softmax, sigmoid
  - bad_value: NaN, Inf
  - num_experts: 6, 8, 16
  - topk: 3, 4
- Verified the test is automatically collected by the existing Kernels MoE Test CI step (no .buildkite/ changes needed).
- Ran the full test_fused_topk.py suite (720 tests) to check for regressions.

Test Result
nan/inf regression tests (144 cases)
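As a rough cross-check of the case count, the parameter grid listed in the Test Plan can be enumerated with the standard library. This is only a counting sketch: the dictionary below restates the listed values as plain strings and integers and is not the actual pytest parametrization. The grid yields 72 combinations for one test function; the 144 figure would match if it counts both test_fused_topk_nan_inf_clamp and test_fused_topk_bias_nan_inf_clamp over the same grid, which is an assumption rather than something stated in the PR.

```python
from itertools import product

# Values restated from the Test Plan; names mirror the listed parameters.
grid = {
    "dtype": ["bfloat16", "float16", "float32"],
    "scoring_func": ["softmax", "sigmoid"],
    "bad_value": ["nan", "inf"],
    "num_experts": [6, 8, 16],
    "topk": [3, 4],
}

# Cartesian product of all parameter values, one tuple per test case.
cases = list(product(*grid.values()))
cases_per_test = len(cases)  # 3 * 2 * 2 * 3 * 2

# Assumption: two test functions share this grid.
cases_for_two_tests = 2 * cases_per_test
```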
Full test suite (720 cases, 2×H200, CUDA 13.0)
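For readers unfamiliar with the failure mode described under Purpose, the invariants the new test checks (unique expert IDs and finite weights when a gating logit is NaN or Inf) can be sketched in plain Python. This is a hypothetical reference model, not the vLLM kernel or the test added in this PR: the helper names softmax and topk_with_bias are made up for illustration, only the softmax path is shown, and it assumes the bias affects expert selection only while the returned weights come from the unbiased scores.

```python
import math


def softmax(logits):
    # Naive softmax. A NaN or +Inf logit turns the whole row into NaN,
    # because the normalizing sum itself becomes NaN.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topk_with_bias(logits, bias, k):
    # Clamp non-finite scores to 0.0, mirroring the clamp described in the PR.
    scores = [s if math.isfinite(s) else 0.0 for s in softmax(logits)]
    # Assumption: the correction bias is added only for choosing experts.
    choice = [s + b for s, b in zip(scores, bias)]
    # Sorting indices (stable, descending) can never pick the same expert twice.
    ids = sorted(range(len(choice)), key=lambda i: choice[i], reverse=True)[:k]
    # Returned weights come from the clamped, unbiased scores.
    weights = [scores[i] for i in ids]
    return ids, weights


# The two invariants the regression test is described as checking.
for bad in (float("nan"), float("inf")):
    ids, weights = topk_with_bias([0.1, bad, 0.3, 0.2, 0.0, 0.4], [0.0] * 6, 3)
    assert len(set(ids)) == len(ids)
    assert all(math.isfinite(w) for w in weights)
```

A pure-Python sort cannot reproduce the duplicate-ID bug itself, which is specific to the CUDA kernels; the sketch only makes the expected post-clamp behavior concrete.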