
[Bugfix] Fix unstable silu_mul+nvfp4 quant fusion test#24370

Merged
ProExpertProg merged 2 commits into vllm-project:main from elvischenv:elvischenv/fix-unstable-silu-nvfp4-quant-test
Sep 6, 2025

Conversation

@elvischenv (Contributor) commented Sep 6, 2025

Purpose

After the silu_mul + quant fusion tests were enabled on CI, I observed that the fusion test with nvfp4 quant is unstable:
Passed:
https://buildkite.com/vllm/ci/builds/29637/steps/canvas?sid=01991d7c-263a-4e93-b62a-00096c5d7489
Failed:
https://buildkite.com/vllm/ci/builds/29627/steps/canvas?sid=01991ce7-509c-4c7d-9b45-a893b15243e7

        # Check that it gives the same answer
>       torch.testing.assert_close(result[0].to(dtype=torch.float16),
                                   result2[0].to(dtype=torch.float16),
                                   atol=1e-3,
                                   rtol=1e-3)
E       AssertionError: Tensor-likes are not close!
E       
E       Mismatched elements: 107 / 128 (83.6%)
E       Greatest absolute difference: 0.205078125 at index (25,) (up to 0.001 allowed)
E       Greatest relative difference: 3.34375 at index (30,) (up to 0.001 allowed)
tests/compile/test_silu_mul_quant_fusion.py:134: AssertionError

The reason is that the unit test created the nvfp4 tensor and scale randomly and independently, which can break accuracy. This PR uses a more consistent way to create the initial nvfp4 tensor, following the approach in the nvfp4 matmul test:

    a_dtype = torch.randn((m, k), dtype=dtype, device=device)
    b_dtype = torch.randn((n, k), dtype=dtype, device=device)
    a_global_scale = ((FLOAT8_E4M3_MAX * FLOAT4_E2M1_MAX) /
                      torch.amax(a_dtype.flatten(), dim=-1)).to(torch.float32)
    b_global_scale = ((FLOAT8_E4M3_MAX * FLOAT4_E2M1_MAX) /
                      torch.amax(b_dtype.flatten(), dim=-1)).to(torch.float32)
    alpha = 1. / (a_global_scale * b_global_scale)
    # ops.scaled_fp4_quant returns swizzled scales, while weights
    # from checkpoints are in linear scales.
    a_fp4, a_scale_interleaved = ops.scaled_fp4_quant(a_dtype, a_global_scale)
    b_fp4, b_scale_interleaved = ops.scaled_fp4_quant(b_dtype, b_global_scale)
    # get_ref_results unswizzles the scales internally.
    expected_out = get_ref_results(a_fp4, b_fp4, a_scale_interleaved,
                                   b_scale_interleaved, a_global_scale,
                                   b_global_scale, m, n, dtype, block_size,
                                   device)
    out = ops.cutlass_scaled_fp4_mm(a_fp4, b_fp4, a_scale_interleaved,
                                    b_scale_interleaved, alpha, dtype)
    torch.testing.assert_close(out,
                               expected_out.to(dtype=dtype),
                               atol=1e-1,
                               rtol=1e-1)

Test Plan && Test Result

tests/compile/test_silu_mul_quant_fusion.py::test_fusion_silu_and_mul_quant[False-TestSiluMulNvfp4QuantModel-128-64]
Running the test 500 times:
main:

======= 23 failed, 477 passed, 6 warnings in 259.20s (0:04:19) =====

PR:

======= 500 passed, 6 warnings in 182.94s (0:03:02) ======

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@gemini-code-assist (bot) left a comment
Code Review

This pull request addresses an unstable silu_mul + nvfp4 quantization fusion test by improving the test data generation. Previously, random quantized tensors and scales were created independently, leading to potential accuracy issues and test flakiness. The new approach generates a floating-point tensor first and then quantizes it, ensuring consistency between the data and its scales. This is a solid improvement that makes the test more reliable. My review includes one high-severity suggestion to make a new utility function more robust by correctly handling tensors with negative values.
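The reviewer's concern about negative values can be illustrated with a small, dependency-free sketch. The constants below are the FP8 E4M3 and FP4 E2M1 format maxima used in the nvfp4 path; the helper name is illustrative (the real test computes the scale with `torch.amax`):

```python
# Format maxima for FP8 E4M3 (448.0) and FP4 E2M1 (6.0), as used in the
# nvfp4 quantization path.
FLOAT8_E4M3_MAX = 448.0
FLOAT4_E2M1_MAX = 6.0

def global_scale(values):
    # Taking the max over raw values would pick 2.0 for [-10.0, 1.0, 2.0]
    # and produce a scale that clips the -10.0 entry after quantization;
    # using the absolute maximum accounts for negative extremes.
    return (FLOAT8_E4M3_MAX * FLOAT4_E2M1_MAX) / max(abs(v) for v in values)

scale = global_scale([-10.0, 1.0, 2.0])  # 2688.0 / 10.0 = 268.8
```

In the quoted matmul-test snippet this issue does not surface in practice because `torch.randn` inputs are symmetric around zero, but a utility intended for arbitrary tensors should scale by the absolute maximum.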

@elvischenv force-pushed the elvischenv/fix-unstable-silu-nvfp4-quant-test branch 4 times, most recently from aba2c24 to 315f94e on September 6, 2025 16:51
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
@elvischenv force-pushed the branch from bce6557 to af7edda on September 6, 2025 17:32
@elvischenv force-pushed the branch from cf37285 to c56e5c3 on September 6, 2025 17:59
Comment on lines +104 to +106:

    "model_class",
    cast(list[type], [TestSiluMulFp8QuantModel, TestSiluMulNvfp4QuantModel]
         if is_nvfp4_supported() else [TestSiluMulFp8QuantModel]))
@elvischenv (Contributor, Author) commented Sep 6, 2025
To fix the mypy issue:
https://github.com/vllm-project/vllm/actions/runs/17517499414/job/49756911227?pr=24370

Error: tests/compile/test_silu_mul_quant_fusion.py:102: error: List item 0 has incompatible type "type[TestSiluMulFp8QuantModel]"; expected "type[object]"  [list-item]
Error: tests/compile/test_silu_mul_quant_fusion.py:102: error: List item 1 has incompatible type "type[TestSiluMulNvfp4QuantModel]"; expected "type[object]"  [list-item]
Error: tests/compile/test_silu_mul_quant_fusion.py:103: error: List item 0 has incompatible type "type[TestSiluMulFp8QuantModel]"; expected "type[object]"  [list-item]
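The errors arise because mypy infers the element type of the list from one branch of the conditional and then rejects items of the other concrete class. A minimal sketch of the fix (the class names mirror the test, the `is_nvfp4_supported` body here is a placeholder):

```python
from typing import cast

class TestSiluMulFp8QuantModel: ...
class TestSiluMulNvfp4QuantModel: ...

def is_nvfp4_supported() -> bool:
    return True  # placeholder for the real hardware capability check

# Casting to list[type] gives both branches of the conditional the same,
# wider element type, which silences the list-item errors above. cast() is
# a no-op at runtime.
model_classes = cast(
    list[type],
    [TestSiluMulFp8QuantModel, TestSiluMulNvfp4QuantModel]
    if is_nvfp4_supported() else [TestSiluMulFp8QuantModel])
```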

@ProExpertProg ProExpertProg enabled auto-merge (squash) September 6, 2025 19:13
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 6, 2025
@ProExpertProg ProExpertProg merged commit e68dc2f into vllm-project:main Sep 6, 2025
29 checks passed
@elvischenv elvischenv deleted the elvischenv/fix-unstable-silu-nvfp4-quant-test branch September 8, 2025 13:12
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

Labels

ready ONLY add when PR is ready to merge/full CI is needed
