Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Behavior of overflowing floating-point to integer conversions #66603

Open
dzaima opened this issue Sep 17, 2023 · 11 comments
Open

Behavior of overflowing floating-point to integer conversions #66603

dzaima opened this issue Sep 17, 2023 · 11 comments

Comments

@dzaima
Copy link

dzaima commented Sep 17, 2023

This issue is either:

a) a missed optimization in the x86-64 backend; or
b) a wrong optimization in the aarch64 backend.

Consider the following code:

#include<stdint.h>
#include<stdbool.h>
bool is_i8(double x) {
    return x == (int8_t) x;
}
bool is_i32(double x) {
    return x == (int32_t)x;
}

As per the C standard, these functions invoke undefined behavior if given arguments that, rounded to an integer, don't fit in the desired type. Thus, is_i8 can always be replaced with is_i32 (alive2 proof from the optimized IR).

However, the x86-64 backend does not do this, and thus is_i8 has an unnecessary movsx eax, al instruction that can be eliminated. The aarch64 backend does do this optimization. compiler explorer.

GCC preserves the int8_t cast on both x86-64 and ARM64 (and everything else I tested in CE): https://godbolt.org/z/78MbdGbPz. (EDIT: on gcc≤13.2, this is only the case with -ftrapping-math, which gcc has on by default - adding -fno-trapping-math (and -msse4.1 on x86-64) it'll convert it to a round-comparison; on gcc≥13.3 it keeps the integer conversion always as far as I can tell)

And while C does allow the optimization in question, it means that x == (T)x cannot be used as a check for whether the floating-point value x fits in the integer type T (even though it would work if the conversion gave any valid value of T in place of UB/poison). And, as far as I can tell, there is no alternative way to do a check like this anywhere near as performantly, without writing platform-specific assembly, which is, IMO, a quite problematic issue, though not really a clang-specific one.

@dzaima
Copy link
Author

dzaima commented Oct 10, 2023

Some additional info - in the C standard, Annex F, F.4, says that, on float to integer conversion of an overflowing value (NaN, infinity, exceeded range), "the resulting value is unspecified.".

Granted, LLVM does not, to my understanding, claim to support Annex F, but this is a nudge towards making the result unspecified instead of undefined (which would mean that x == (int8_t) x would be usable as a fits-in-int8 check, and would match option B in the original issue).

@nikic
Copy link
Contributor

nikic commented Oct 10, 2023

Clang considers this undefined behavior by default, but you can use -fno-strict-float-cast-overflow to make overflowing float to int casts well-defined. (The documentation does not mention it, but they become saturating.)

@dzaima
Copy link
Author

dzaima commented Oct 10, 2023

Thanks, that's a useful note. Though unfortunately it saturating means significantly more generated code on x86-64 & risc-v, so it's not much better in my case where I do want it to be as fast as possible.

I suppose, with, -fno-strict-float-cast-overflow, x == (int8_t)x could be optimized to not do the saturation, but it'd still slow down other things doing a cast.

@nikic
Copy link
Contributor

nikic commented Oct 10, 2023

As we don't seem to promise any specific behavior in the documentation, it might be possible to change it use freeze(fptoui) instead of fptui.sat, which would be about what you want. Not sure whether people rely on the current saturating implementation.

@dzaima
Copy link
Author

dzaima commented Oct 10, 2023

Seems adding a freeze over fptosi in my as_i8 example doesn't actually change the behavior of the aarch64 & risc-v output: https://godbolt.org/z/YznfaEsnb; I guess that's a miscompilation?

@nikic
Copy link
Contributor

nikic commented Oct 13, 2023

Simpler example: https://godbolt.org/z/afeaEf8oo

This does look like a miscompile to me. We have this type-legalized SDAG:

SelectionDAG has 11 nodes:
  t0: ch,glue = EntryToken
            t2: f64,ch = CopyFromReg t0, Register:f64 %0
          t9: i32 = fp_to_sint t2
        t11: i32 = AssertSext t9, ValueType:ch:i8
      t12: i32 = freeze t11
    t13: i32 = sign_extend_inreg t12, ValueType:ch:i8
  t7: ch,glue = CopyToReg t0, Register:i32 $w0, t13
  t8: ch = AArch64ISD::RET_GLUE t7, Register:i32 $w0, t7:1

Which gets DAGCombined to:

  t0: ch,glue = EntryToken
          t2: f64,ch = CopyFromReg t0, Register:f64 %0
        t9: i32 = fp_to_sint t2
      t14: i32 = freeze t9
    t11: i32 = AssertSext t14, ValueType:ch:i8
  t7: ch,glue = CopyToReg t0, Register:i32 $w0, t11
  t8: ch = AArch64ISD::RET_GLUE t7, Register:i32 $w0, t7:1

Note that the freeze was moved past the AssertSext. This happens because these are explicitly listed as not creation undef/poison:

case ISD::AssertSext:
case ISD::AssertZext:

This looks incorrect to me.

@nikic
Copy link
Contributor

nikic commented Oct 13, 2023

This was introduced in 7e294e676e32f by @RKSimon. What was the rationale for that change? Wouldn't these nodes return poison if the assertion does not hold?

@RKSimon
Copy link
Collaborator

RKSimon commented Oct 13, 2023

IIRC the descriptions at the moment doesn't allow for cases where the assert nodes don't hold.

@RKSimon
Copy link
Collaborator

RKSimon commented Oct 13, 2023

Feel free to remove the assertzext/sext cases if it's causing a problem and we can revisit this.

@RKSimon RKSimon self-assigned this Oct 18, 2023
@dzaima
Copy link
Author

dzaima commented Oct 19, 2023

Some issues remain - without a changed -fno-strict-float-cast-overflow or some other addition, the freeze-using version cannot be written in C.

And, less importantly, there's the missed optimization for x86-64 of dropping movsx eax, al for current x == (int8_t) x. (and fwiw, for ≥SSE4.1 (and aarch64), transforming all x == (intType)x to x == floor(x) might be an alternative)

@dzaima
Copy link
Author

dzaima commented Jul 18, 2024

As we don't seem to promise any specific behavior in the documentation, it might be possible to change it use freeze(fptoui) instead of fptui.sat, which would be about what you want. Not sure whether people rely on the current saturating implementation.

I would perhaps propose -ffloat-cast-overflow=(saturate|arbitrary|undefined), mapping to fpto[us]i.sat, fpto[us]i+freeze, and regular poisoning/UB fpto[us]i respectively; a feature of saturating others might be relying on would be having consistent results across multiple invocations or even platforms. Could maybe alternatively be as options under -fstrict-float-cast-overflow directly if wanting to avoid two options relating to this, but the name is slightly weird for that.

Also, --help does state, since clang 7:

  -fno-strict-float-cast-overflow
                          Relax language rules and try to match the behavior of the target's native float-to-int conversion instructions

which might've been the case back then (it definitely doesn't saturate at least on clang≤13), but since clang 14 it's rather inaccurate as x86's native behavior isn't saturating.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants