Add addition, subtraction, multiplication, and compare operations for `f128` #606

tgross35 · 2024-05-10T23:43:41Z

Division is not yet working, but all others were straigtforward to add so I split them out of #587.

This includes some bigger changes that are split per commit:

Split Int into Int and MinInt so we can use traits with bigint types
Add 256-bit bigint types for widening operations on 128-bit integers
Refactor test macros so that systems without implementations test against rustc_apfloat rather than just being skipped. I needed this so I can actually debug.
Add implementations and tests for the new operations
Change powerpc symbol names to match what LLVM emits, from https://gcc.gnu.org/wiki/Ieee128PowerPC

tgross35 · 2024-05-11T08:39:14Z

Powerpc seems to be hitting a SIGILL on stxvd2x vs34,0,r9. From https://bugzilla.redhat.com/show_bug.cgi?id=1045384 and https://bugzilla.redhat.com/show_bug.cgi?id=1002077 it seems like this may be a limitation of qemu, but information is scarce. I may just need to use the apfloat fallback.

Also looks like symbol names may be different, I need to double check https://gcc.gnu.org/wiki/Ieee128PowerPC

tgross35 · 2024-05-11T09:50:03Z

Ok, some more info. Compiling C for 32-bit powerpc seems to always emit __gcc_qadd regardless of -mlong-double flag. When forced to use __addkf3 it segfaults as expected because of the stxvd2x instruction. I think we are okay just ignoring this one as a likely limitation of qemu.

Compiling C for powerpc64 seems to use an ifunc resolver for __addkf3, which goes to __addkf3_hw for the single-instruction xsaddqp. Compiling Rust seems to do the exact same thing but I am getting different results, __addtf3(0x00000000000000000000000000000000, 0x00000000000000000000000000000001): std: 0x00000000000000000000000000000000, builtins: 0x00000000000000000000000000000001. ~~I get the same results when calling __addkf3 directly in C~~ edit: no I don't. Maybe the operation format of xsaddqp needs to be set somehow to choose between ppc doubledouble and ieee f128?

tgross35 · 2024-05-12T10:23:29Z

Some asm for anyone who understands it better:

Assembly generated from Rust (incorrect result)

0000000000010d00 <.add_entry>:
   10d00:   7c 08 02 a6     mflr    r0
   10d04:   f8 21 ff 91     stdu    r1,-112(r1)
   10d08:   f8 01 00 80     std     r0,128(r1)
   10d0c:   4b ff c6 95     bl      d3a0 <00000143.plt_call.__addkf3>
   10d10:   e8 41 00 28     ld      r2,40(r1)
   10d14:   38 21 00 70     addi    r1,r1,112
   10d18:   e8 01 00 10     ld      r0,16(r1)
   10d1c:   7c 08 03 a6     mtlr    r0
   10d20:   4e 80 00 20     blr

000000000000d3a0 <00000143.plt_call.__addkf3>:
    d3a0:       f8 41 00 28     std     r2,40(r1)
    d3a4:       3d 62 ff ff     addis   r11,r2,-1
    d3a8:       e9 8b 7f 58     ld      r12,32600(r11)
    d3ac:       7d 89 03 a6     mtctr   r12
    d3b0:       e8 4b 7f 60     ld      r2,32608(r11)
    d3b4:       4e 80 04 20     bctr

0000000000056790 <.__addkf3_resolve>:
   56790:       81 2d 8f 9c     lwz     r9,-28772(r13)
   56794:       75 29 00 40     andis.  r9,r9,64
   56798:       41 82 00 18     beq     567b0 <.__addkf3_resolve+0x20>
   5679c:       e8 62 80 10     ld      r3,-32752(r2)
   567a0:       4e 80 00 20     blr
   567a4:       60 00 00 00     nop
   567a8:       60 00 00 00     nop
   567ac:       60 42 00 00     ori     r2,r2,0
   567b0:       e8 62 80 18     ld      r3,-32744(r2)
   567b4:       4e 80 00 20     blr
        ...
   567c4:       60 00 00 00     nop
   567c8:       60 00 00 00     nop
   567cc:       60 42 00 00     ori     r2,r2,0

000000000005e6e0 <.__addkf3_hw>:
   5e6e0:       fc 42 18 08     xsaddqp v2,v2,v3
   5e6e4:       4e 80 00 20     blr
        ...
   5e6f4:       60 00 00 00     nop
   5e6f8:       60 00 00 00     nop
   5e6fc:       60 00 00 00     nop

0000000000057050 <.__addkf3_sw>:
   57050:       fb 41 ff d0     std     r26,-48(r1)
   57054:       fb 61 ff d8     std     r27,-40(r1)
   57058:       fb 81 ff e0     std     r28,-32(r1)
   5705c:       fb a1 ff e8     std     r29,-24(r1)
   57060:       fb c1 ff f0     std     r30,-16(r1)
   57064:       fb e1 ff f8     std     r31,-8(r1)
   57068:       f8 21 ff 41     stdu    r1,-192(r1)
   5706c:       39 21 00 70     addi    r9,r1,112
   57070:       39 41 00 70     addi    r10,r1,112
   // ... full sw implementation

Assembly generated from C (correct result)

0000000010000af8 <.add_entry>:
    10000af8:   7c 08 02 a6     mflr    r0
    10000afc:   f8 01 00 10     std     r0,16(r1)
    10000b00:   fb e1 ff f8     std     r31,-8(r1)
    10000b04:   f8 21 ff 81     stdu    r1,-128(r1)
    10000b08:   7c 3f 0b 78     mr      r31,r1
    10000b0c:   39 20 00 30     li      r9,48
    10000b10:   39 5f 00 80     addi    r10,r31,128
    10000b14:   7c 4a 4f 99     stxvd2x vs34,r10,r9
    10000b18:   39 20 00 40     li      r9,64
    10000b1c:   39 5f 00 80     addi    r10,r31,128
    10000b20:   7c 6a 4f 99     stxvd2x vs35,r10,r9
    10000b24:   39 20 00 40     li      r9,64
    10000b28:   39 5f 00 80     addi    r10,r31,128
    10000b2c:   7c 6a 4e 99     lxvd2x  vs35,r10,r9
    10000b30:   39 20 00 30     li      r9,48
    10000b34:   39 5f 00 80     addi    r10,r31,128
    10000b38:   7c 4a 4e 99     lxvd2x  vs34,r10,r9
    10000b3c:   4b ff fb a5     bl      100006e0 <00000019.plt_call.__addkf3>
    10000b40:   e8 41 00 28     ld      r2,40(r1)
    10000b44:   f0 02 14 96     xxmr    vs0,vs34
    10000b48:   f0 40 04 91     xxmr    vs34,vs0
    10000b4c:   38 3f 00 80     addi    r1,r31,128
    10000b50:   e8 01 00 10     ld      r0,16(r1)
    10000b54:   7c 08 03 a6     mtlr    r0
    10000b58:   eb e1 ff f8     ld      r31,-8(r1)
    10000b5c:   4e 80 00 20     blr
    10000b60:   00 00 00 00     .long 0x0
    10000b64:   00 00 00 01     .long 0x1
    10000b68:   80 01 00 01     lwz     r0,1(r1)

0000000010000c40 <.__addkf3_resolve>:
    10000c40:   81 2d 8f 9c     lwz     r9,-28772(r13)
    10000c44:   75 29 00 40     andis.  r9,r9,64
    10000c48:   41 82 00 18     beq     10000c60 <.__addkf3_resolve+0x20>
    10000c4c:   e8 62 80 10     ld      r3,-32752(r2)
    10000c50:   4e 80 00 20     blr
    10000c54:   60 00 00 00     nop
    10000c58:   60 00 00 00     nop
    10000c5c:   60 42 00 00     ori     r2,r2,0
    10000c60:   e8 62 80 18     ld      r3,-32744(r2)
    10000c64:   4e 80 00 20     blr
        ...
    10000c74:   60 00 00 00     nop
    10000c78:   60 00 00 00     nop
    10000c7c:   60 42 00 00     ori     r2,r2,0

0000000010008b90 <.__addkf3_hw>:
    10008b90:   fc 42 18 08     xsaddqp v2,v2,v3
    10008b94:   4e 80 00 20     blr
        ...
    10008ba4:   60 00 00 00     nop
    10008ba8:   60 00 00 00     nop
    10008bac:   60 00 00 00     nop

0000000010001500 <.__addkf3_sw>:
    10001500:   fb 41 ff d0     std     r26,-48(r1)
    10001504:   fb 61 ff d8     std     r27,-40(r1)
    10001508:   fb 81 ff e0     std     r28,-32(r1)
    1000150c:   fb a1 ff e8     std     r29,-24(r1)
    10001510:   fb c1 ff f0     std     r30,-16(r1)
    10001514:   fb e1 ff f8     std     r31,-8(r1)
    10001518:   f8 21 ff 41     stdu    r1,-192(r1)
    1000151c:   39 21 00 70     addi    r9,r1,112
    10001520:   39 41 00 70     addi    r10,r1,112
    // ...

Rust code

#![feature(f128)]

#[no_mangle]
#[inline(never)]
fn add_entry(a: f128, b: f128) -> f128 {
    a + b
}

fn main() {
    let a = f128::from_bits(0x0);
    let b = f128::from_bits(0x1);
    dbg!(a, b);
    let c = add_entry(a, b);
    dbg!(c);
}

C code

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

#define _Float128 __float128

typedef struct {
#if __BYTE_ORDER == __LITTLE_ENDIAN
    uint64_t lower, upper;
#elif __BYTE_ORDER == __BIG_ENDIAN
    uint64_t upper, lower;
#else
#error missing endian check
#endif
} __attribute__((aligned(_Alignof(_Float128)))) u128;

_Float128 __addkf3(_Float128, _Float128);

void f128_print(_Float128 val) {
    u128 ival = *((u128 *)(&val));
    printf("%#018" PRIx64 "%016" PRIx64 " %lf\n", ival.upper, ival.lower, (double)val);
}

_Float128 new_f128(uint64_t upper, uint64_t lower) {
    u128 val = { .lower = lower, .upper = upper };
    return *((_Float128 *)(&val));
}

_Float128 add_entry(_Float128 a, _Float128 b) {
#ifdef USE_ADDKF3
  return __addkf3(a, b);
#else
  return a + b;
#endif
}

int main() {
    _Float128 a = new_f128(0x0000000000000000, 0x0000000000000000);
    _Float128 b = new_f128(0x0000000000000000, 0x0000000000000001);
    f128_print(a);
    f128_print(b);
    _Float128 c = add_entry(a, b);
    f128_print(c);

    return 0;
}

tgross35 · 2024-05-12T22:58:51Z

I don't have any further insight on powerpc64, so I'll just disable testing against the system for now unless you have any suggestions @Amanieu. We're still testing against apfloat so our compiler-builtins is correct, but using the system symbols may cause precision issues for users. I think we can probably just address this more after everything merges.

This PR is probably best reviewed by-commit.

Amanieu · 2024-05-13T10:16:53Z

@lu-zero Maybe you have some insight into the powerpc issues?

tgross35 · 2024-05-14T07:28:20Z

Discussed some more at https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/f128.20system.20libraries.20noncompliant.20platforms/near/438486364, @ecnelises is going to take a look at some point. This PR doesn't break anything new so I don't think it is blocked, opened rust-lang/rust#125109 to track the issue further.

lu-zero · 2024-05-14T07:31:42Z

@Amanieu I can build and try on real hardware. We can ask for real runners, I think.

tgross35 · 2024-05-14T07:39:31Z

@lu-zero could you try the code at #606 (comment) on powerpc64? I put more about how I am building at rust-lang/rust#125109

Less pressing but it would also be nice if you have a way to confirm that stxvd2x does not sigill on real powerpc-* targets, as in #606 (comment)

lu-zero · 2024-05-14T07:56:37Z

so this is what I'm getting.

$ cargo run
   Compiling test_f128 v0.1.0 (/home/lu_zero/test_f128)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.23s
     Running `target/debug/test_f128`
[src/main.rs:12:5] a = 0x00000000000000000000000000000000
[src/main.rs:12:5] b = 0x00000000000000000000000000000001
[src/main.rs:14:5] c = 0x00000000000000000000000000000001

tgross35 · 2024-05-14T08:01:49Z

Well that is interesting. That is 64-bit, correct?

lu-zero · 2024-05-14T08:12:09Z

Yes, tested on power9, used as ppc64le

tgross35 · 2024-05-14T08:19:41Z

LE seems to work fine in qemu, the green test in CI isn't using the fallback. Only powerpc- and powerpc64- are causing the issues https://github.com/rust-lang/compiler-builtins/blob/3133d8f555580d93645eb763f94ec4cb59ac5880/testcrate/build.rs

lu-zero · 2024-05-14T08:25:36Z

Sadly less and less people care about BE and it is fairly annoying to set up a BE environment =/

tgross35 · 2024-05-14T08:33:57Z

Ah, well thanks for taking a look! I'll dig a little bit more but I think we are probably okay to not worry about it unless rust-lang/rust unit f128 unit tests have failures (we currently can't test much without this)

lu-zero · 2024-05-14T08:39:51Z

Either qemu gets fixed for BE or we'll need a BE runner.

tgross35 · 2024-05-14T08:50:27Z

I don't think the ppc64 BE is a qemu issue since the C version works fine. 32-bit hopefully should be just qemu.

Agreed that native runners wouldn't be a bad thing.

`MinInt` contains the basic methods that are only needed by integers involved in widening operations, i.e. big integers. `Int` retains all other operations and convenience methods.

Change float test macros to fall back to testing against `rustc_apfloat` when system implementations are not available, rather than just skipping tests. This allows for easier debugging where operations may not be supported.

PowerPC uses `kf` instead of `tf`: <https://gcc.gnu.org/wiki/Ieee128PowerPC>

Amanieu · 2024-05-16T14:08:04Z

This took a while to fully review, but LGTM!

tgross35 · 2024-05-16T17:47:05Z

Thanks for taking a look!

tgross35 force-pushed the f16-f128-intrinsics-min branch 7 times, most recently from 2e9bb45 to 7871c81 Compare May 11, 2024 05:43

tgross35 force-pushed the f16-f128-intrinsics-min branch 11 times, most recently from 5119b1a to 3133d8f Compare May 12, 2024 22:25

tgross35 marked this pull request as ready for review May 12, 2024 22:58

This was referenced May 13, 2024

Add missing functions for f16 and f128 #587

Closed

f128 symbols on powerpc64 give inaccurate results rust-lang/rust#125109

Open

This was referenced May 14, 2024

Tracking Issue for f16 and f128 float types rust-lang/rust#116909

Open

Add more intrinsics #611

Closed

tgross35 added 8 commits May 15, 2024 07:19

Split Int into Int and MinInt

9c6fcb5

`MinInt` contains the basic methods that are only needed by integers involved in widening operations, i.e. big integers. `Int` retains all other operations and convenience methods.

Add i256 and u256 bigint types

2868c26

Refactor float test macros to have a fallback

77faba1

Change float test macros to fall back to testing against `rustc_apfloat` when system implementations are not available, rather than just skipping tests. This allows for easier debugging where operations may not be supported.

Enable no-fail-fast for more usable test output

255c9f3

Implement f128 addition and subtraction

c8cc819

Implement f128 multiplication

58ad317

Implement f128 comparison

9bea196

Correct f128 extend and truncate symbol names on powerpc

6a847ab

PowerPC uses `kf` instead of `tf`: <https://gcc.gnu.org/wiki/Ieee128PowerPC>

tgross35 force-pushed the f16-f128-intrinsics-min branch from 3133d8f to 6a847ab Compare May 15, 2024 12:19

Amanieu merged commit 449643f into rust-lang:master May 16, 2024
23 checks passed

tgross35 deleted the f16-f128-intrinsics-min branch May 16, 2024 17:47

This was referenced May 17, 2024

Unable to find long double symbols on aarch64-apple-darwin #567

Closed

[PowerPC] SIGILL in PPCTargetLowering for powerpc-unknown-linux-gnu with pwr9 target feature llvm/llvm-project#92233

Closed

tgross35 mentioned this pull request Aug 26, 2024

release/19.x: [PowerPC] Respect endianness when bitcasting to fp128 (#95931) llvm/llvm-project#105623

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add addition, subtraction, multiplication, and compare operations for `f128` #606

Add addition, subtraction, multiplication, and compare operations for `f128` #606

tgross35 commented May 10, 2024 •

edited

Loading

tgross35 commented May 11, 2024 •

edited

Loading

tgross35 commented May 11, 2024 •

edited

Loading

tgross35 commented May 12, 2024 •

edited

Loading

tgross35 commented May 12, 2024

Amanieu commented May 13, 2024

tgross35 commented May 14, 2024

lu-zero commented May 14, 2024

tgross35 commented May 14, 2024 •

edited

Loading

lu-zero commented May 14, 2024

tgross35 commented May 14, 2024

lu-zero commented May 14, 2024 •

edited

Loading

tgross35 commented May 14, 2024

lu-zero commented May 14, 2024

tgross35 commented May 14, 2024

lu-zero commented May 14, 2024

tgross35 commented May 14, 2024

Amanieu commented May 16, 2024

tgross35 commented May 16, 2024

Add addition, subtraction, multiplication, and compare operations for f128 #606

Add addition, subtraction, multiplication, and compare operations for f128 #606

Conversation

tgross35 commented May 10, 2024 • edited Loading

tgross35 commented May 11, 2024 • edited Loading

tgross35 commented May 11, 2024 • edited Loading

tgross35 commented May 12, 2024 • edited Loading

tgross35 commented May 12, 2024

Amanieu commented May 13, 2024

tgross35 commented May 14, 2024

lu-zero commented May 14, 2024

tgross35 commented May 14, 2024 • edited Loading

lu-zero commented May 14, 2024

tgross35 commented May 14, 2024

lu-zero commented May 14, 2024 • edited Loading

tgross35 commented May 14, 2024

lu-zero commented May 14, 2024

tgross35 commented May 14, 2024

lu-zero commented May 14, 2024

tgross35 commented May 14, 2024

Amanieu commented May 16, 2024

tgross35 commented May 16, 2024

Add addition, subtraction, multiplication, and compare operations for `f128` #606

Add addition, subtraction, multiplication, and compare operations for `f128` #606

tgross35 commented May 10, 2024 •

edited

Loading

tgross35 commented May 11, 2024 •

edited

Loading

tgross35 commented May 11, 2024 •

edited

Loading

tgross35 commented May 12, 2024 •

edited

Loading

tgross35 commented May 14, 2024 •

edited

Loading

lu-zero commented May 14, 2024 •

edited

Loading