
[Bugfix] Fix unstable silu_mul+nvfp4 quant fusion test#24370

Merged
ProExpertProg merged 2 commits into vllm-project:main from elvischenv:elvischenv/fix-unstable-silu-nvfp4-quant-test
Sep 6, 2025

Conversation

@elvischenv (Contributor) commented Sep 6, 2025

Purpose

After the silu_mul + quant fusion tests were enabled on CI, I observed that the fusion test with nvfp4 quant is unstable:
Passed:
https://buildkite.com/vllm/ci/builds/29637/steps/canvas?sid=01991d7c-263a-4e93-b62a-00096c5d7489
Failed:
https://buildkite.com/vllm/ci/builds/29627/steps/canvas?sid=01991ce7-509c-4c7d-9b45-a893b15243e7

        # Check that it gives the same answer
>       torch.testing.assert_close(result[0].to(dtype=torch.float16),
                                   result2[0].to(dtype=torch.float16),
                                   atol=1e-3,
                                   rtol=1e-3)
E       AssertionError: Tensor-likes are not close!
E       
E       Mismatched elements: 107 / 128 (83.6%)
E       Greatest absolute difference: 0.205078125 at index (25,) (up to 0.001 allowed)
E       Greatest relative difference: 3.34375 at index (30,) (up to 0.001 allowed)
tests/compile/test_silu_mul_quant_fusion.py:134: AssertionError

The reason is that the unit test created the nvfp4 tensor and scale randomly and independently, which can break accuracy. This PR uses a more consistent way to create the initial nvfp4 tensor, following the approach in the nvfp4 matmul test:

    a_dtype = torch.randn((m, k), dtype=dtype, device=device)
    b_dtype = torch.randn((n, k), dtype=dtype, device=device)
    a_global_scale = ((FLOAT8_E4M3_MAX * FLOAT4_E2M1_MAX) /
                      torch.amax(a_dtype.flatten(), dim=-1)).to(torch.float32)
    b_global_scale = ((FLOAT8_E4M3_MAX * FLOAT4_E2M1_MAX) /
                      torch.amax(b_dtype.flatten(), dim=-1)).to(torch.float32)
    alpha = 1. / (a_global_scale * b_global_scale)
    # ops.scaled_fp4_quant returns swizzled scales, while weights
    # from checkpoints are in linear scales.
    a_fp4, a_scale_interleaved = ops.scaled_fp4_quant(a_dtype, a_global_scale)
    b_fp4, b_scale_interleaved = ops.scaled_fp4_quant(b_dtype, b_global_scale)
    # get_ref_results unswizzles the scales internally.
    expected_out = get_ref_results(a_fp4, b_fp4, a_scale_interleaved,
                                   b_scale_interleaved, a_global_scale,
                                   b_global_scale, m, n, dtype, block_size,
                                   device)
    out = ops.cutlass_scaled_fp4_mm(a_fp4, b_fp4, a_scale_interleaved,
                                    b_scale_interleaved, alpha, dtype)
    torch.testing.assert_close(out,
                               expected_out.to(dtype=dtype),
                               atol=1e-1,
                               rtol=1e-1)

Test Plan && Test Result

tests/compile/test_silu_mul_quant_fusion.py::test_fusion_silu_and_mul_quant[False-TestSiluMulNvfp4QuantModel-128-64]
Running the test 500 times:
main:

======= 23 failed, 477 passed, 6 warnings in 259.20s (0:04:19) =====

PR:

======= 500 passed, 6 warnings in 182.94s (0:03:02) ======

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@gemini-code-assist (bot) left a comment
Code Review

This pull request addresses an unstable silu_mul + nvfp4 quantization fusion test by improving the test data generation. Previously, random quantized tensors and scales were created independently, leading to potential accuracy issues and test flakiness. The new approach generates a floating-point tensor first and then quantizes it, ensuring consistency between the data and its scales. This is a solid improvement that makes the test more reliable. My review includes one high-severity suggestion to make a new utility function more robust by correctly handling tensors with negative values.
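The reviewer's concern about negative values can be illustrated with a small, dependency-free sketch. The constants below are the FP8 E4M3 and FP4 E2M1 format maxima used in the nvfp4 path; the helper name is illustrative (the real test computes the scale with `torch.amax`):

```python
# Format maxima for FP8 E4M3 (448.0) and FP4 E2M1 (6.0), as used in the
# nvfp4 quantization path.
FLOAT8_E4M3_MAX = 448.0
FLOAT4_E2M1_MAX = 6.0

def global_scale(values):
    # Taking the max over raw values would pick 2.0 for [-10.0, 1.0, 2.0]
    # and produce a scale that clips the -10.0 entry after quantization;
    # using the absolute maximum accounts for negative extremes.
    return (FLOAT8_E4M3_MAX * FLOAT4_E2M1_MAX) / max(abs(v) for v in values)

scale = global_scale([-10.0, 1.0, 2.0])  # 2688.0 / 10.0 = 268.8
```

In the quoted matmul-test snippet this issue does not surface in practice because `torch.randn` inputs are symmetric around zero, but a utility intended for arbitrary tensors should scale by the absolute maximum.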

@elvischenv force-pushed the elvischenv/fix-unstable-silu-nvfp4-quant-test branch 4 times, most recently from aba2c24 to 315f94e on September 6, 2025 16:51
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
@elvischenv force-pushed the branch from bce6557 to af7edda on September 6, 2025 17:32
@elvischenv force-pushed the branch from cf37285 to c56e5c3 on September 6, 2025 17:59
Comment on lines +104 to +106:

    "model_class",
    cast(list[type], [TestSiluMulFp8QuantModel, TestSiluMulNvfp4QuantModel]
         if is_nvfp4_supported() else [TestSiluMulFp8QuantModel]))
@elvischenv (Contributor, Author) commented Sep 6, 2025
To fix the mypy issue:
https://github.com/vllm-project/vllm/actions/runs/17517499414/job/49756911227?pr=24370

Error: tests/compile/test_silu_mul_quant_fusion.py:102: error: List item 0 has incompatible type "type[TestSiluMulFp8QuantModel]"; expected "type[object]"  [list-item]
Error: tests/compile/test_silu_mul_quant_fusion.py:102: error: List item 1 has incompatible type "type[TestSiluMulNvfp4QuantModel]"; expected "type[object]"  [list-item]
Error: tests/compile/test_silu_mul_quant_fusion.py:103: error: List item 0 has incompatible type "type[TestSiluMulFp8QuantModel]"; expected "type[object]"  [list-item]
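The errors arise because mypy infers the element type of the list from one branch of the conditional and then rejects items of the other concrete class. A minimal sketch of the fix (the class names mirror the test, the `is_nvfp4_supported` body here is a placeholder):

```python
from typing import cast

class TestSiluMulFp8QuantModel: ...
class TestSiluMulNvfp4QuantModel: ...

def is_nvfp4_supported() -> bool:
    return True  # placeholder for the real hardware capability check

# Casting to list[type] gives both branches of the conditional the same,
# wider element type, which silences the list-item errors above. cast() is
# a no-op at runtime.
model_classes = cast(
    list[type],
    [TestSiluMulFp8QuantModel, TestSiluMulNvfp4QuantModel]
    if is_nvfp4_supported() else [TestSiluMulFp8QuantModel])
```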

@ProExpertProg ProExpertProg enabled auto-merge (squash) September 6, 2025 19:13
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 6, 2025
@ProExpertProg ProExpertProg merged commit e68dc2f into vllm-project:main Sep 6, 2025
29 checks passed
@elvischenv elvischenv deleted the elvischenv/fix-unstable-silu-nvfp4-quant-test branch September 8, 2025 13:12
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

Labels

ready ONLY add when PR is ready to merge/full CI is needed
