
[Bugfix] Handle NaN in QuantFP8 Native Forward #41427

Open

alex-jw-brooks wants to merge 2 commits into vllm-project:main from alex-jw-brooks:fp8_native_nan

Conversation

@alex-jw-brooks (Contributor) commented Apr 30, 2026

Purpose

Encountered while investigating this issue in Granite Speech: #41284
A related fix for the source of the NaNs: #41424

While digging into the cause of FP8 outputs turning into garbage for 0.17 and 0.20, I eventually found that corrupted bias values were producing a bunch of NaNs. It looks like the reason this didn't become a problem until recently is that the native forward doesn't handle NaNs, while the underlying CUDA kernels do, since they use fmaxf etc., which handles NaNs implicitly.

This PR adds NaN handling to the native forward implementation, plus a test to ensure that it lines up with the CUDA forward.
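
For illustration, here is a minimal PyTorch sketch of the mismatch (the scale computation and the 448.0 e4m3 max are simplified stand-ins, not vLLM's actual code):

import torch

x = torch.tensor([1.0, -3.0, float("nan"), 2.0])

# A pure-PyTorch abs().max() reduction propagates the NaN into the scale...
naive_scale = x.abs().max() / 448.0  # -> nan
# ...while CUDA's fmaxf(a, b) returns the non-NaN operand, so the kernel's
# reduction effectively ignores NaNs. Zeroing NaNs first reproduces that:
x_clean = torch.nan_to_num(x, nan=0.0)
matched_scale = x_clean.abs().max() / 448.0  # finite, like the CUDA path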

Test Plan

This also gives normal outputs with the broken example in #41424; that PR is the correct fix for the root cause of the NaNs, while this one aligns the behavior of the two forward implementations.

Test Result

The new test fails on main with "Native scales contain NaN" and passes on this branch.

@DarkLight1337 @robertgshaw2-redhat Could you PTAL?

Signed-off-by: Alex Brooks <albrooks@redhat.com>

@claude (Bot) left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify (Bot) added the bug (Something isn't working) label on Apr 30, 2026
Contributor

@gemini-code-assist (Bot) left a comment


Code Review

This pull request introduces NaN handling in the forward_native method of QuantFP8 by converting NaNs to zeros, aligning its behavior with the CUDA kernel. It also adds a test case to verify this behavior. Feedback suggests moving the NaN conversion to the start of the method to ensure group quantization is also covered, expanding the test suite to include group quantization shapes, and strengthening the test assertions to verify exact alignment across the entire tensor.


# Replace NaN with 0 to match the CUDA kernel's behavior, since the underlying
# CUDA kernels use fmaxf, which won't propagate NaNs if we have numeric values.
x_f = torch.nan_to_num(x.to(torch.float32), nan=0.0)

high

The NaN handling logic is currently placed after the check for is_group_quant (lines 186-188). This means that dynamic group quantization (e.g., block quantization) will still use the original tensor x containing NaNs when calling _quantize_group_native, leading to NaN scales and outputs in that path. To ensure consistent behavior across all native quantization modes, this nan_to_num conversion should be moved to the beginning of the forward_native method, before the is_group_quant check.
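
A sketch of the suggested reordering (the signature and surrounding structure are abbreviated and illustrative, not the actual file contents):

def forward_native(self, x, scale=None, scale_ub=None):
    # Zero NaNs up front so every native path (group, per-token, per-tensor)
    # sees the same sanitized input that the CUDA kernels effectively see.
    x = torch.nan_to_num(x, nan=0.0)

    if self.is_group_quant:
        return self._quantize_group_native(x)

    x_f = x.to(torch.float32)
    ...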



@pytest.mark.parametrize("dtype", DTYPES)
@pytest.mark.parametrize("group_shape_name", ["PER_TOKEN", "PER_TENSOR"])

high

The test currently only covers PER_TOKEN and PER_TENSOR quantization. It should also include a group quantization shape (e.g., a block size like (1, 128)) to ensure that the is_group_quant path in forward_native (which calls _quantize_group_native) also handles NaNs correctly. This is particularly important as the current implementation in input_quant_fp8.py appears to skip NaN handling for the group quantization path.
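
For example, the parametrization could be extended along these lines (the added case name and how it maps to a (1, 128) group shape inside the test are assumptions, not code from the PR):

@pytest.mark.parametrize("dtype", DTYPES)
@pytest.mark.parametrize(
    "group_shape_name", ["PER_TOKEN", "PER_TENSOR", "GROUP_1x128"]
)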


# Quantized outputs should match at non-NaN input positions.
valid = ~torch.isnan(x)
torch.testing.assert_close(out_native.float()[valid], out_cuda.float()[valid])

high

Since the goal of this PR is to align the native forward behavior with the CUDA kernel (which treats NaNs as 0), we should verify that the outputs match exactly across the entire tensor, including at the positions where NaNs were injected. Using [valid] masking allows the outputs to differ at NaN positions as long as they are not NaN, which doesn't fully guarantee alignment.

Suggested change
torch.testing.assert_close(out_native.float()[valid], out_cuda.float()[valid])
torch.testing.assert_close(out_native.float(), out_cuda.float())
