Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLVM miscompiles passing/returning half on several backends by using lossy conversions #97981

Open
2 of 7 tasks
beetrees opened this issue Jul 8, 2024 · 9 comments
Open
2 of 7 tasks

Comments

@beetrees
Copy link
Contributor

beetrees commented Jul 8, 2024

Consider the following IR:

define half @to_half(i16 %bits) {
    %f = bitcast i16 %bits to half
    ret half %f
}

define i16 @from_half(half %f) {
    %bits = bitcast half %f to i16
    ret i16 %bits
}

As the only operation involved is a bitcast (in particular, there are no floating point type conversions in the LLVM IR), the values returned from to_half and from_half should be bit-for-bit identical to the value passed to them as their only argument (just a different type). However, several targets pass/return half as a float. On these targets, LLVM will use the default float conversion builtins (such as __gnu_h2f_ieee and __gnu_f2h_ieee) to convert between half and float. The issue is that these builtins silence signalling NaNs which changes the NaN payload, meaning that the roundtrip of half -> float -> half is not lossless and causes the generated ASM to violate the semantics of LLVM IR. This miscompilation is similar to #66803.

By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked):

As none of these target's ABI specifications (that I've been able to find) specify how half should be passed (nor does Clang support _Float16 on any of these targets), and given that these targets are a subset of those affected by #97975, I'm filing this as a single issue as the ABI has probably been selected as an automatic default by LLVM rather than a deliberate choice by the backends. Ultimately there are two possible solutions: either fix LLVM to codegen lossless conversions between half and float when needed for the ABI (one way to do this would be with a new pair of builtins that don't silence signalling NaNs), or change the ABIs to pass/return half without converting it to float (probably using the same ABI as i16, but some targets might have better options).

Related to #97975.

@llvmbot
Copy link
Member

llvmbot commented Jul 8, 2024

@llvm/issue-subscribers-backend-powerpc

Author: None (beetrees)

Consider the following IR: ```llvm define half @to_half(i16 %bits) { %f = bitcast i16 %bits to half ret half %f }

define i16 @from_half(half %f) {
%bits = bitcast half %f to i16
ret i16 %bits
}

As the only operation involved is a bitcast (in particular, there are no floating point type conversions in the LLVM IR), the values returned from `to_half` and `from_half` should be bit-for-bit identical to the value passed to them as their only argument (just a different type). However, several targets pass/return `half` as a `float`. On these targets, LLVM will use the default float conversion builtins (such as `__gnu_h2f_ieee` and `__gnu_f2h_ieee`) to convert between `half` and `float`. The issue is that these builtins silence signalling NaNs which changes the NaN payload, meaning that the roundtrip of `half` -&gt; `float` -&gt; `half` is not lossless and causes the generated ASM to violate the semantics of LLVM IR. This miscompilation is similar to #<!-- -->66803.

By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked):

- Hexagon (`hexagon-unknown-linux-musl`)
- MIPS (`mips64el-unknown-linux-gnuabi64`)
- PowerPC (`powerpc64le-unknown-linux-gnu`)
- SPARC (`sparc64-unknown-linux-gnu`)
- WASM (`wasm32-unknown-wasi`): Already reported in #<!-- -->96438

As none of these target's ABI specifications (that I've been able to find) specify how `half` should be passed (nor does Clang support `_Float16` on any of these targets), and given that these targets are a subset of those affected by #<!-- -->97975, I'm filing this as a single issue as the ABI has probably been selected as an automatic default by LLVM rather than a deliberate choice by the backends. Ultimately there are two possible solutions: either fix LLVM to codegen lossless conversions between `half` and `float` when needed for the ABI (one way to do this would be with a new pair of builtins that don't silence signalling NaNs), or change the ABIs to pass/return `half` without converting it to `float` (probably using the same ABI as `i16`, but some targets might have better options).
</details>

@llvmbot
Copy link
Member

llvmbot commented Jul 8, 2024

@llvm/issue-subscribers-backend-mips

Author: None (beetrees)

Consider the following IR: ```llvm define half @to_half(i16 %bits) { %f = bitcast i16 %bits to half ret half %f }

define i16 @from_half(half %f) {
%bits = bitcast half %f to i16
ret i16 %bits
}

As the only operation involved is a bitcast (in particular, there are no floating point type conversions in the LLVM IR), the values returned from `to_half` and `from_half` should be bit-for-bit identical to the value passed to them as their only argument (just a different type). However, several targets pass/return `half` as a `float`. On these targets, LLVM will use the default float conversion builtins (such as `__gnu_h2f_ieee` and `__gnu_f2h_ieee`) to convert between `half` and `float`. The issue is that these builtins silence signalling NaNs which changes the NaN payload, meaning that the roundtrip of `half` -&gt; `float` -&gt; `half` is not lossless and causes the generated ASM to violate the semantics of LLVM IR. This miscompilation is similar to #<!-- -->66803.

By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked):

- Hexagon (`hexagon-unknown-linux-musl`)
- MIPS (`mips64el-unknown-linux-gnuabi64`)
- PowerPC (`powerpc64le-unknown-linux-gnu`)
- SPARC (`sparc64-unknown-linux-gnu`)
- WASM (`wasm32-unknown-wasi`): Already reported in #<!-- -->96438

As none of these target's ABI specifications (that I've been able to find) specify how `half` should be passed (nor does Clang support `_Float16` on any of these targets), and given that these targets are a subset of those affected by #<!-- -->97975, I'm filing this as a single issue as the ABI has probably been selected as an automatic default by LLVM rather than a deliberate choice by the backends. Ultimately there are two possible solutions: either fix LLVM to codegen lossless conversions between `half` and `float` when needed for the ABI (one way to do this would be with a new pair of builtins that don't silence signalling NaNs), or change the ABIs to pass/return `half` without converting it to `float` (probably using the same ABI as `i16`, but some targets might have better options).
</details>

@llvmbot
Copy link
Member

llvmbot commented Jul 8, 2024

@llvm/issue-subscribers-backend-hexagon

Author: None (beetrees)

Consider the following IR: ```llvm define half @to_half(i16 %bits) { %f = bitcast i16 %bits to half ret half %f }

define i16 @from_half(half %f) {
%bits = bitcast half %f to i16
ret i16 %bits
}

As the only operation involved is a bitcast (in particular, there are no floating point type conversions in the LLVM IR), the values returned from `to_half` and `from_half` should be bit-for-bit identical to the value passed to them as their only argument (just a different type). However, several targets pass/return `half` as a `float`. On these targets, LLVM will use the default float conversion builtins (such as `__gnu_h2f_ieee` and `__gnu_f2h_ieee`) to convert between `half` and `float`. The issue is that these builtins silence signalling NaNs which changes the NaN payload, meaning that the roundtrip of `half` -&gt; `float` -&gt; `half` is not lossless and causes the generated ASM to violate the semantics of LLVM IR. This miscompilation is similar to #<!-- -->66803.

By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked):

- Hexagon (`hexagon-unknown-linux-musl`)
- MIPS (`mips64el-unknown-linux-gnuabi64`)
- PowerPC (`powerpc64le-unknown-linux-gnu`)
- SPARC (`sparc64-unknown-linux-gnu`)
- WASM (`wasm32-unknown-wasi`): Already reported in #<!-- -->96438

As none of these target's ABI specifications (that I've been able to find) specify how `half` should be passed (nor does Clang support `_Float16` on any of these targets), and given that these targets are a subset of those affected by #<!-- -->97975, I'm filing this as a single issue as the ABI has probably been selected as an automatic default by LLVM rather than a deliberate choice by the backends. Ultimately there are two possible solutions: either fix LLVM to codegen lossless conversions between `half` and `float` when needed for the ABI (one way to do this would be with a new pair of builtins that don't silence signalling NaNs), or change the ABIs to pass/return `half` without converting it to `float` (probably using the same ABI as `i16`, but some targets might have better options).
</details>

@llvmbot
Copy link
Member

llvmbot commented Jul 8, 2024

@llvm/issue-subscribers-backend-webassembly

Author: None (beetrees)

Consider the following IR: ```llvm define half @to_half(i16 %bits) { %f = bitcast i16 %bits to half ret half %f }

define i16 @from_half(half %f) {
%bits = bitcast half %f to i16
ret i16 %bits
}

As the only operation involved is a bitcast (in particular, there are no floating point type conversions in the LLVM IR), the values returned from `to_half` and `from_half` should be bit-for-bit identical to the value passed to them as their only argument (just a different type). However, several targets pass/return `half` as a `float`. On these targets, LLVM will use the default float conversion builtins (such as `__gnu_h2f_ieee` and `__gnu_f2h_ieee`) to convert between `half` and `float`. The issue is that these builtins silence signalling NaNs which changes the NaN payload, meaning that the roundtrip of `half` -&gt; `float` -&gt; `half` is not lossless and causes the generated ASM to violate the semantics of LLVM IR. This miscompilation is similar to #<!-- -->66803.

By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked):

- Hexagon (`hexagon-unknown-linux-musl`)
- MIPS (`mips64el-unknown-linux-gnuabi64`)
- PowerPC (`powerpc64le-unknown-linux-gnu`)
- SPARC (`sparc64-unknown-linux-gnu`)
- WASM (`wasm32-unknown-wasi`): Already reported in #<!-- -->96438

As none of these target's ABI specifications (that I've been able to find) specify how `half` should be passed (nor does Clang support `_Float16` on any of these targets), and given that these targets are a subset of those affected by #<!-- -->97975, I'm filing this as a single issue as the ABI has probably been selected as an automatic default by LLVM rather than a deliberate choice by the backends. Ultimately there are two possible solutions: either fix LLVM to codegen lossless conversions between `half` and `float` when needed for the ABI (one way to do this would be with a new pair of builtins that don't silence signalling NaNs), or change the ABIs to pass/return `half` without converting it to `float` (probably using the same ABI as `i16`, but some targets might have better options).
</details>

@vchuravy
Copy link
Contributor

vchuravy commented Jul 8, 2024

FYI: On PPC atleast you need to add zext I think. (On PPC i16 is being passed as i32 and that can cause issue)

define half @to_half(zext i16 %bits) {
    %f = bitcast i16 %bits to half
    ret half %f
}

@programmerjake
Copy link
Contributor

FYI: On PPC atleast you need to add zext I think. (On PPC i16 is being passed as i32 and that can cause issue)

I remember seeing somewhere that s390x requires either zext or sext for all integer arguments and returns, since they have different ABIs and the generated code assumes they've been sign/zero extended properly.

@arsenm
Copy link
Contributor

arsenm commented Jul 10, 2024

The issue is that these builtins silence signalling NaNs which changes the NaN payload, meaning that the roundtrip of half -> float -> half is not lossless and causes the generated ASM to violate the semantics of LLVM IR.

The handling here treats the ABI as implicitly adding an fpext/fptrunc in the argument/return lowering (or a strict_fpext in the strictfp case). LLVM semantics do not guarantee signaling nan quieting, but ideally there wouldn't be an implicit cast in the first place. This is a consequence of SelectionDAG's type legalization logic, where without a legal f16 types assumes promotion to float in all contexts.

I agree this isn't the most sensible behavior. AMDGPU suffers the same issue for the antique targets without legal f16, which I've always found annoying. I think it would make more sense to treat it as i16, or if we really want to keep it in float registers, as the low bits (or high on big endian) that just happen to be stored in a float that need to be extracted as an integer.

@beetrees
Copy link
Contributor Author

beetrees commented Aug 3, 2024

I've confirmed that the experimental C-SKY backend also appears to experience this issue.

@beetrees
Copy link
Contributor Author

The LoongArch backend also appears to have this issue.

tgross35 added a commit to tgross35/compiler-builtins that referenced this issue Nov 5, 2024
CI in [1] seems to indicate that there are cases where the `f16`
infinite recursion bug ([2], [3]) can make its way into what gets called
during tests, even though this doesn't seem to be the usual case. In
order to make sure that we avoid these completely, just unset
`f16_enabled` on any platforms that have the recursion problem.

This also refactors the `match` statement to be more in line with
`library/std/build.rs`.

[1]: rust-lang#729
[2]: llvm/llvm-project#97981
[3]: rust-lang#651
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants