Conversation

@yucai-intel (Contributor) commented Nov 21, 2025

To resolve #2219
This PR temporarily works around an issue where FP16's -0.0 is erroneously converted to NaN during certain fusion passes (fp16 -> fp32 -> fp8). As a workaround, we avoid using the sycl::half data type in the intermediate conversion step so that the problematic fusion does not occur.
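As a rough sketch of the intended conversion path (this is not the exact diff; the helper name half_to_fp8_workaround is hypothetical, and it leans on PyTorch's c10::detail::fp16_ieee_to_fp32_value routine rather than a sycl::half intermediate):

```cpp
// Sketch only: widen FP16 -> FP32 through c10's bit-level helper instead of a
// sycl::half intermediate, so the fp16 -> fp32 -> fp8 chain is not fused into
// the pattern that turns -0.0 into NaN.
#include <c10/util/Half.h>
#include <c10/util/Float8_e4m3fn.h>

inline c10::Float8_e4m3fn half_to_fp8_workaround(c10::Half src_val) {
  // src_val.x holds the raw FP16 bit pattern (0x8000 for -0.0).
  float as_fp32 = c10::detail::fp16_ieee_to_fp32_value(src_val.x);
  // Narrowing -0.0f to e4m3fn should yield -0.0 (0x80), not NaN.
  return c10::Float8_e4m3fn(as_fp32);
}
```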

@yucai-intel changed the title from "Float8 Conversion: Forced Correction for -0.0" to "Temporary Fix for FP16 -> FP8 conversion failure on -0.0" Nov 27, 2025
@yucai-intel marked this pull request as ready for review November 27, 2025 08:44
@guangyey (Contributor) left a comment

One question: how did you identify this as a compiler issue? Was a reproducer found, or a regressing compiler version detected?

@CuiYifeng (Contributor) commented Nov 28, 2025

> One question: how did you identify this as a compiler issue? Was a reproducer found, or a regressing compiler version detected?

@guangyey Thanks for the question. We found that this issue does not occur with the following explicit fp16->fp32->fp8 conversion:

```python
import torch

# Explicit two-step conversion (submitted as two kernels): fp16 -> fp32, then fp32 -> fp8
x = torch.tensor(-0.0, dtype=torch.float16).xpu()
y = x.to(torch.float32)
z = y.to(torch.float8_e4m3fn)
print(z)  # -0.0, as expected
```

However, we get NaN with the following usage, where the fp16 -> fp32 conversion is implicit:

```python
import torch

# Direct conversion (submitted as a single kernel): the fp16 -> fp32 step is implicit
x = torch.tensor(-0.0, dtype=torch.float16).xpu()
z = x.to(torch.float8_e4m3fn)
print(z)  # nan instead of -0.0
```

The key difference between the two cases is that the first conversion is submitted as two kernels, while the second is submitted as a single kernel, and some optimizations apply only in the single-kernel case. This conjecture has been confirmed with a local reproducer.
Furthermore, we are not yet sure whether the problem is caused by the compiler or by IGC, so I have updated the PR description.
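For reference, the expected bit patterns follow directly from the IEEE-754 half-precision format and the e4m3fn encoding (this is an illustration, not code from this PR): FP16 -0.0 is 0x8000 and should map to FP8 e4m3fn 0x80 (negative zero), while e4m3fn reserves exponent-and-mantissa-all-ones (0x7F/0xFF) for NaN.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // FP16 -0.0: sign = 1, exponent = 0, mantissa = 0.
  uint16_t fp16_neg_zero = 0x8000;
  // Correct FP8 e4m3fn result for -0.0: sign bit only.
  uint8_t fp8_neg_zero = 0x80;
  // e4m3fn NaN encoding: exponent and mantissa all ones (the format has no infinities).
  uint8_t fp8_nan = 0x7F;
  std::printf("fp16 -0.0 = 0x%04X, expected fp8 = 0x%02X, fp8 NaN = 0x%02X\n",
              fp16_neg_zero, fp8_neg_zero, fp8_nan);
  return 0;
}
```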

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@CuiYifeng requested a review from Copilot November 28, 2025 13:13
Copilot AI left a comment
Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.



@CuiYifeng self-requested a review December 1, 2025 02:59
@CuiYifeng requested a review from Copilot December 1, 2025 06:08
Copilot AI left a comment
Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.



@CuiYifeng (Contributor) commented

@guangyey @EikanWang The fix has been updated. Please take a look, thanks.

@guangyey (Contributor) left a comment

c10::detail::fp16_ieee_to_fp32_value(src_val.x) is functionally equivalent to the fallback path where FP8 values are first converted to FP32 on the CPU. I don't know what the root cause is, but the workaround seems good to me.
Let's have @EikanWang make the final stamp.


Development

Successfully merging this pull request may close these issues.

float8_e4m3fn precision overflow
