Behavior of overflowing floating-point to integer conversions #66603

dzaima · 2023-09-17T20:43:09Z

This issue is either:

a) a missed optimization in the x86-64 backend; or
~~b) a wrong optimization in the aarch64 backend.~~

Consider the following code:

#include<stdint.h>
#include<stdbool.h>
bool is_i8(double x) {
    return x == (int8_t) x;
}
bool is_i32(double x) {
    return x == (int32_t)x;
}

As per the C standard, these functions invoke undefined behavior if given arguments that, rounded to an integer, don't fit in the desired type. Thus, is_i8 can always be replaced with is_i32 (alive2 proof from the optimized IR).

However, the x86-64 backend does not do this, and thus is_i8 has an unnecessary movsx eax, al instruction that can be eliminated. The aarch64 backend does perform this optimization (Compiler Explorer link).

GCC preserves the int8_t cast on both x86-64 and ARM64 (and everything else I tested in CE): https://godbolt.org/z/78MbdGbPz. (EDIT: on gcc≤13.2, this is only the case with -ftrapping-math, which gcc has on by default - adding -fno-trapping-math (and -msse4.1 on x86-64) it'll convert it to a round-comparison; on gcc≥13.3 it keeps the integer conversion always as far as I can tell)

And while C does allow the optimization in question, it means that x == (T)x cannot be used as a check for whether the floating-point value x fits in the integer type T (even though it would work if the conversion gave any valid value of T in place of UB/poison). And, as far as I can tell, there is no alternative way to do a check like this anywhere near as performantly, without writing platform-specific assembly, which is, IMO, a quite problematic issue, though not really a clang-specific one.


dzaima · 2023-10-10T17:15:13Z

Some additional info - in the C standard, Annex F, F.4, says that, on float to integer conversion of an overflowing value (NaN, infinity, exceeded range), "the resulting value is unspecified.".

Granted, LLVM does not, to my understanding, claim to support Annex F, but this is a nudge towards making the result unspecified instead of undefined (which would mean that x == (int8_t) x would be usable as a fits-in-int8 check, and would match option B in the original issue).

nikic · 2023-10-10T19:12:52Z

Clang considers this undefined behavior by default, but you can use -fno-strict-float-cast-overflow to make overflowing float to int casts well-defined. (The documentation does not mention it, but they become saturating.)

dzaima · 2023-10-10T19:21:32Z

Thanks, that's a useful note. Unfortunately, saturating means significantly more generated code on x86-64 & RISC-V, so it's not much better in my case, where I do want it to be as fast as possible.

I suppose that, with -fno-strict-float-cast-overflow, x == (int8_t)x could be optimized to not do the saturation, but it'd still slow down other things doing a cast.

nikic · 2023-10-10T19:28:32Z

As we don't seem to promise any specific behavior in the documentation, it might be possible to change it to use freeze(fptoui) instead of fptoui.sat, which would be about what you want. Not sure whether people rely on the current saturating implementation.

dzaima · 2023-10-10T19:40:50Z

Seems adding a freeze over fptosi in my as_i8 example doesn't actually change the behavior of the aarch64 & risc-v output: https://godbolt.org/z/YznfaEsnb; I guess that's a miscompilation?

nikic · 2023-10-13T13:25:58Z

Simpler example: https://godbolt.org/z/afeaEf8oo

This does look like a miscompile to me. We have this type-legalized SDAG:

SelectionDAG has 11 nodes:
  t0: ch,glue = EntryToken
            t2: f64,ch = CopyFromReg t0, Register:f64 %0
          t9: i32 = fp_to_sint t2
        t11: i32 = AssertSext t9, ValueType:ch:i8
      t12: i32 = freeze t11
    t13: i32 = sign_extend_inreg t12, ValueType:ch:i8
  t7: ch,glue = CopyToReg t0, Register:i32 $w0, t13
  t8: ch = AArch64ISD::RET_GLUE t7, Register:i32 $w0, t7:1

Which gets DAGCombined to:

  t0: ch,glue = EntryToken
          t2: f64,ch = CopyFromReg t0, Register:f64 %0
        t9: i32 = fp_to_sint t2
      t14: i32 = freeze t9
    t11: i32 = AssertSext t14, ValueType:ch:i8
  t7: ch,glue = CopyToReg t0, Register:i32 $w0, t11
  t8: ch = AArch64ISD::RET_GLUE t7, Register:i32 $w0, t7:1

Note that the freeze was moved past the AssertSext. This happens because these are explicitly listed as not creating undef/poison:

llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

Lines 5006 to 5007 in 3f4bf99

    case ISD::AssertSext:
    case ISD::AssertZext:

This looks incorrect to me.

nikic · 2023-10-13T14:48:44Z

This was introduced in 7e294e676e32f by @RKSimon. What was the rationale for that change? Wouldn't these nodes return poison if the assertion does not hold?

RKSimon · 2023-10-13T15:42:36Z

IIRC the descriptions at the moment don't allow for cases where the assert nodes don't hold.

RKSimon · 2023-10-13T15:55:10Z

Feel free to remove the assertzext/sext cases if it's causing a problem and we can revisit this.

dzaima · 2023-10-19T14:15:47Z

Some issues remain - without a changed -fno-strict-float-cast-overflow or some other addition, the freeze-using version cannot be written in C.

And, less importantly, there's the missed optimization for x86-64 of dropping movsx eax, al for current x == (int8_t) x. (and fwiw, for ≥SSE4.1 (and aarch64), transforming all x == (intType)x to x == floor(x) might be an alternative)

dzaima · 2024-07-18T17:40:52Z

> As we don't seem to promise any specific behavior in the documentation, it might be possible to change it to use freeze(fptoui) instead of fptoui.sat, which would be about what you want. Not sure whether people rely on the current saturating implementation.

I would perhaps propose -ffloat-cast-overflow=(saturate|arbitrary|undefined), mapping to fpto[us]i.sat, fpto[us]i+freeze, and regular poisoning/UB fpto[us]i respectively; a feature of saturating others might be relying on would be having consistent results across multiple invocations or even platforms. Could maybe alternatively be as options under -fstrict-float-cast-overflow directly if wanting to avoid two options relating to this, but the name is slightly weird for that.

Also, --help does state, since clang 7:

  -fno-strict-float-cast-overflow
                          Relax language rules and try to match the behavior of the target's native float-to-int conversion instructions

which might've been the case back then (it definitely doesn't saturate at least on clang≤13), but since clang 14 it's rather inaccurate as x86's native behavior isn't saturating.

github-actions bot added the new issue label Sep 17, 2023

EugeneZelenko added llvm:optimizations undefined behaviour and removed new issue labels Sep 18, 2023

RKSimon self-assigned this Oct 18, 2023

RKSimon added a commit that referenced this issue Oct 19, 2023

[DAG] Add test coverage for Issue #66603

309e41d

RKSimon closed this as completed in 8505c3b Oct 19, 2023

nikic reopened this Oct 19, 2023

RKSimon removed their assignment Oct 19, 2023

madhur13490 mentioned this issue Oct 20, 2023

Revert commit ba8565fbcb975e2d067ce3ae5a7dbaae4953edd3 madhur13490/llvm-project#3

Closed

banach-space mentioned this issue Oct 24, 2023

[mlir][vector] Add scalable vectors to tests for vector.contract #70039

Merged

This was referenced Mar 13, 2024

[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison #84921

Merged

[SelectionDAG] Treat CopyFromReg as freezing the value #85932

Merged

bjope mentioned this issue Apr 26, 2024

[DAGCombiner] Freeze maybe poison operands when folding select to logic #84924

Merged

bjope mentioned this issue Jun 6, 2024

[DAG] Add freeze(assertext(x)) -> assertext(x) folds #94491

Open
