Conversation

@yucai-intel (Contributor) commented Nov 21, 2025

To resolve #2219
This PR temporarily works around an issue where FP16's -0.0 is erroneously converted to NaN during certain fusion passes (fp16 -> fp32 -> fp8). As a workaround, we avoid using the sycl::half data type in the intermediate conversion step so that the problematic fusion does not occur.
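As a rough sketch of the intended conversion path (this is not the exact diff; the helper name half_to_fp8_workaround is hypothetical, and it leans on PyTorch's c10::detail::fp16_ieee_to_fp32_value routine rather than a sycl::half intermediate):

```cpp
// Sketch only: widen FP16 -> FP32 through c10's bit-level helper instead of a
// sycl::half intermediate, so the fp16 -> fp32 -> fp8 chain is not fused into
// the pattern that turns -0.0 into NaN.
#include <c10/util/Half.h>
#include <c10/util/Float8_e4m3fn.h>

inline c10::Float8_e4m3fn half_to_fp8_workaround(c10::Half src_val) {
  // src_val.x holds the raw FP16 bit pattern (0x8000 for -0.0).
  float as_fp32 = c10::detail::fp16_ieee_to_fp32_value(src_val.x);
  // Narrowing -0.0f to e4m3fn should yield -0.0 (0x80), not NaN.
  return c10::Float8_e4m3fn(as_fp32);
}
```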

@yucai-intel changed the title from "Float8 Conversion: Forced Correction for -0.0" to "Temporary Fix for FP16 -> FP8 conversion failure on -0.0" Nov 27, 2025
@yucai-intel marked this pull request as ready for review November 27, 2025 08:44
@guangyey (Contributor) left a comment

One question: how did you identify this as a compiler issue? Was a reproducer found, or a regressing compiler version detected?

@CuiYifeng (Contributor) commented Nov 28, 2025

> One question: how did you identify this as a compiler issue? Was a reproducer found, or a regressing compiler version detected?

@guangyey Thanks for the question. We found that this issue does not occur with the following explicit fp16->fp32->fp8 conversion:

```python
import torch

# Explicit two-step conversion (submitted as two kernels): fp16 -> fp32, then fp32 -> fp8
x = torch.tensor(-0.0, dtype=torch.float16).xpu()
y = x.to(torch.float32)
z = y.to(torch.float8_e4m3fn)
print(z)  # -0.0, as expected
```

However, we get NaN with the following usage, where the fp16 -> fp32 conversion is implicit:

```python
import torch

# Direct conversion (submitted as a single kernel): the fp16 -> fp32 step is implicit
x = torch.tensor(-0.0, dtype=torch.float16).xpu()
z = x.to(torch.float8_e4m3fn)
print(z)  # nan instead of -0.0
```

The key difference between the two cases is that the first conversion is submitted as two kernels, while the second is submitted as a single kernel, and some optimizations apply only in the single-kernel case. This conjecture has been confirmed with a local reproducer.
Furthermore, we are not yet sure whether the problem is caused by the compiler or by IGC, so I have updated the PR description.
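For reference, the expected bit patterns follow directly from the IEEE-754 half-precision format and the e4m3fn encoding (this is an illustration, not code from this PR): FP16 -0.0 is 0x8000 and should map to FP8 e4m3fn 0x80 (negative zero), while e4m3fn reserves exponent-and-mantissa-all-ones (0x7F/0xFF) for NaN.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // FP16 -0.0: sign = 1, exponent = 0, mantissa = 0.
  uint16_t fp16_neg_zero = 0x8000;
  // Correct FP8 e4m3fn result for -0.0: sign bit only.
  uint8_t fp8_neg_zero = 0x80;
  // e4m3fn NaN encoding: exponent and mantissa all ones (the format has no infinities).
  uint8_t fp8_nan = 0x7F;
  std::printf("fp16 -0.0 = 0x%04X, expected fp8 = 0x%02X, fp8 NaN = 0x%02X\n",
              fp16_neg_zero, fp8_neg_zero, fp8_nan);
  return 0;
}
```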

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@CuiYifeng requested a review from Copilot November 28, 2025 13:13
Copilot AI left a comment
Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.



@CuiYifeng self-requested a review December 1, 2025 02:59
@CuiYifeng requested a review from Copilot December 1, 2025 06:08
Copilot AI left a comment
Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.



@CuiYifeng (Contributor) commented

@guangyey @EikanWang The fix has been updated. Please take a look, thanks.

@guangyey (Contributor) left a comment

c10::detail::fp16_ieee_to_fp32_value(src_val.x) is functionally equivalent to the fallback path where FP8 values are first converted to FP32 on the CPU. I don't know what the root cause is, but the workaround seems good to me.
Let's have @EikanWang make the final stamp.


Development

Successfully merging this pull request may close these issues.

float8_e4m3fn precision overflow
