unsigned char is not uint1, truncations != C casts != promises #1606

andres-erbsen · 2023-05-23T04:37:42Z

Currently, variables for which range analysis determines the range [0, 1] are assigned the type uint1. That type is then printed to C, and perhaps to other languages, as a typedef for unsigned char. That's no good: casting to and adding in these two types produce different results.

We should stop printing uint1. If we really want to output code for which it is locally evident that a variable is in the range [0, 1], refencing it as x&1 (which is what we were also often doing up to recently) is less bad. We can probably do even better.

Our use of signed types also warrants some review.

The text was updated successfully, but these errors were encountered:

andres-erbsen · 2023-05-23T16:21:12Z

Re #1605 (comment):

We should probably parametrize the pass on a list of acceptable bitwidths to remove, then pass in [8; 16; 32; 64; 128] or something in BoundsPipeline

Yes, we could restrict the redundant-& optinization to uint8+, but could we remove uint1 altogether? Do you remember why/when we added it in the first place?

real-or-random · 2023-05-23T16:22:47Z

I'd recommend against using any type that has a rank less than the rank of int. (That is, don't use (unsigned) char and (unsigned) short. In practice, it also means don't use (u)int8_t and uint16_t ). These can be very nasty due to C's integer promotion rules, which say that before any arithmetic operation, any type A of rank less than int will first be promoted to either int or unsigned int, depending on whether all values of A fit in int or in unsigned int.

Want a nasty example? The following code has UB for some inputs due to signed (!) overflow:

uint16_t mul(uint16_t a, uint16_t b) {
    return a*b;
}

See here for more background: https://stackoverflow.com/questions/73600275/c-undefined-behavior-when-multiplying-uint16-t
A standard technique to avoid against these problems is to start multiplications with 1U * ... This will make sure that arithmetic only ever happens on unsigned types.

Our use of signed types also warrants some review.

Happy to look at some generated code here.
I think unsigned overflow can also happen with left shifts: 1 << 31 should also be UB on a platform where int has 32 bits (use 1U << 31 instead), but multiplication and left shift should be the only operations that can be problematic (unless you're on a non-existent platform where short is just one bit shorter than int, so overflow could happen with additions too).

(Moreover, small types may be slower.)

samoht · 2023-05-23T21:04:45Z

Hello,

I am not sure it is related, but we also just noticed some wrong behaviour of the inverse function when upgrading from clang 14.0.0 to 14.0.3 on macOS. I haven't managed to create a short repro yet, but I have this which use mirage-crypto: https://gist.github.com/samoht/7c27b6418148ee04590e571f390f1997

mc_p256_inv is implemented by https://github.com/mirage/mirage-crypto/blob/main/ec/native/p256_stubs.c#L94-L99 which calls https://github.com/mirage/mirage-crypto/blob/main/ec/native/inversion_template.h (which is almost a raw import of the extracted C code, with a small patch on top where the "inversion" function is declared "static", and the convenience function "inversion" is provided)

JasonGross · 2023-05-24T01:34:24Z

Yes, we could restrict the redundant-& optinization to uint8+, but could we remove uint1 altogether? Do you remember why/when we added it in the first place?

I think we need it because we want code that relaxes uint8 to uint32, but does not relax carries to uint32? Plausibly we should support for non-idempotent bounds relaxation functions that turn uint1 into uint8 but turn larger things into uint32? I could be misremembering though. (It's also possible we need it for bounds analysis, or that the rewrite rules that deal with carries expect casts to be uint1?)

andres-erbsen · 2023-05-26T01:19:51Z

@samoht take a look at #1605 -- if you got fiat-crypto code in the range of bad commits described there, it's reasonably likely the same issue. If not, I'm not sure.

hannesm · 2023-05-26T08:20:45Z

@samoht take a look at #1605 -- if you got fiat-crypto code in the range of bad commits described there, it's reasonably likely the same issue. If not, I'm not sure.

unfortunately, the error on some macOS machines / versions seem to be present with a commit after the revert (i.e. with c79cf60)

andres-erbsen · 2023-05-26T11:13:50Z

I took a look at divstep_precomp in https://github.com/mirage/mirage-crypto/blob/main/ec/native/np256_64.h#L1227 and it looks like that function has no arithmetic on uint1 or any other small integer type, and all casts to uint1 have &1 on them.

Lacking a specific guess for the issue, I figure we can perhaps just track down the change.
Does the issue show up if you prevent inlining of the functions called by the divstep template, and does it show up when you further prevent inlining of all functions in fiat-crypto-generated code? Can macos clang cross-compile for linux, and does the issue show up then? Can you make a good binary and a bad binary and diff the generated assembly code, or send it to me so I can?

samoht · 2023-05-30T08:45:26Z

@andres-erbsen I've added an ok (generated with clang 14.0.0) and a ko (compiled with clang 14.0.3) binary to the gist: https://gist.github.com/samoht/7c27b6418148ee04590e571f390f1997 -- let me know if you want to track this in a separate issue.

andres-erbsen · 2023-05-30T10:43:32Z

~~Thank you. It appears that any objdump in Arch repos doesn't grok mac executables, so if you could run objdump -M intel -d repro-ko.exe > repro-ko.s or such for both executables that'd be great.~~ Update: radare2 works.

samoht · 2023-05-30T11:36:56Z

removing noise

#include <stdint.h>
#include <stdio.h>

int main() {
  uint64_t x=UINT64_C(0xffffffffffffffff);
  printf("x=%llu\n", x);
}

is generating a different result on GCC/clang 14.0.0 vs. clang 14.0.3

CLANG-14.0.3
x=18446744073709551615
GCC / CLANG-14.0.0
x=18446744073709551615

And this value is used by fiat_p256_msat.

real-or-random · 2023-05-30T11:50:48Z

CLANG-14.0.3
x=18446744073709551615
GCC / CLANG-14.0.0
x=18446744073709551615

These values are the same.

samoht · 2023-05-30T11:59:25Z

Ok, I'm now very confused - it used to by x=-1 for clan 14.0.3, but now I cannot reproduce when scripting this (I copied/pasted the output of my scripts without reading the results...). I'll try to understand what is going on... apologies for the noise.

real-or-random · 2023-05-30T12:15:51Z

It probably doesn't matter for that issue, but %lld or %ld should give you -1, because d interprets (the unsigned value) as signed.

andres-erbsen · 2023-05-30T12:20:13Z

Ok, I looked into this for a bit. The diff is rather large so it's hard to say anything conclusive. However, it's rather striking that carry propagation (adcs with xzr) only appears prominently on the "ok" side. I also appears in the source code here. It's possible that whatever the ko side does is very clever and correct, but it wouldn't be the first time fiat-crypto code runs into a bug in exactly this scenario with a mainstream compiler. 🤔

samoht · 2023-05-30T16:12:47Z

ok more news - -O0 fixes the issue on clang 14.0.3.

andres-erbsen · 2023-05-30T19:05:52Z

On gcc I debugged a similar issue by listing the compilation passes and disabling them one-by-one. Clang docs seem to say they have a tool that does that https://llvm.org/docs/HowToSubmitABug.html#miscompilations and the other suggestions there also sound good.

Coincidentally, I recalled that we at least have a C-language test template at
https://github.com/search?q=repo%3Amit-plv%2Ffiat-crypto%20inversion_test_template&type=code so if that reproduces the issue it might be a good alternative to trying to bughunt through ocaml.

samoht · 2023-05-31T10:59:43Z

Bisecting the compilation passes with opt-bisect-limit points to this pass:

BISECT: running pass (303) InstCombinePass on fiat_p256_divstep

Also I've added a p256_64 test using the C-language test template, but this always seems to pass.

andres-erbsen · 2023-05-31T12:17:21Z

Great find! I figure it should be possible to make a test case in C still, but it'd probably require replicating the compilation-unit structure of the ocaml example. Perhaps the most sure we can get is to have the same object file for divstep and link it against an ocaml test harness and a C test harness, confirming with gdb that the same arguments get passed.

andres-erbsen · 2023-05-31T12:27:31Z

I looked through llvm/llvm-project@llvmorg-14.0.0...llvmorg-14.0.3 and the two commits that mention InstCombine are llvm/llvm-project@43ee392 llvm/llvm-project@9a3e81e but sticks out to me right now. But even if InstCombine is buggy pass in the code, the bug may have been already present and only triggered by a 14.0.3 change.

samoht · 2023-05-31T17:23:45Z

I managed to reproduce using C only, by tracing what values the OCaml code was sending to the C side.

#define LIMBS 4
#define WORD uint64_t
#define WORDSIZE 64
#define LEN_PRIME 256
#define CURVE_DESCRIPTION fiat_p256

#include "p256_64.h"
#include "inversion_template.h"
#include <stdio.h>
#include <string.h>
#include <assert.h>

int main (void) {
  uint64_t init[5];
  uint64_t zero[4];
  uint64_t result[4];
  int i;

  init[0] = 3164219169453123020;
  init[1] = 5610335456920920796;
  init[2] = 4531946527682827593;
  init[3] = UINT64_C(17678136164097313971);
  init[4] = 0;
  zero[0] = 0;
  zero[1] = 0;
  zero[2] = 0;
  zero[3] = 0;
  result[0] = 5547023629237264145;
  result[1] = 6634760001876509537;
  result[2] = 2568289811407991911;
  result[3] = UINT64_C(9585681005862132995);

  inverse(zero, init);

  for (i = 0; i < 4; i++) {
    if (zero[i] != result[i]) {
      printf("FAIL zero[%d]=%llu result[%d]=%llu\n", i, zero[i], i, result[i]);
      return 1;
    }
  }
  printf("PASS\n");
  return 0;

}

I've pushed to the gist two updated binaries (which are now much smaller!)

andres-erbsen · 2023-05-31T20:39:40Z

Ok that looks like we almost have it. If I had a machine that the bug reproduced on my approach at this point would be to aggressively inline and minimize the example that tells the difference between -O0 and -O1. I do it manually (while : inotifywait compile && test && bell...) but gcc has recommendations here.

andres-erbsen · 2023-06-01T11:45:25Z

Out of the two binaries you posted, the -ok one has ubsan clutter all over it so I won't spend time comparing them now. Minimizing is likely a more productive avenue anyway, because if we need to file a bug report we should post an example. Edit: if we do need to look at binaries, they should be as similar as you can make them otherwise.

samoht · 2023-06-02T17:02:55Z

I've pushed a smaller repro with less clutter. It is still too big?

andres-erbsen · 2023-06-02T19:37:58Z

The asm for the two sides looks very different (much more different than at first) due to different optimizations being applied. For example, the -ok file still has addcarryx and cmovznz as function calls instead of inlined.

samoht · 2023-06-03T10:40:45Z

Apologies for the painful process. I still haven't managed to find enough time to inline all of this properly and I do realise it doesn't help you very much.

But it's now clear something is fishing with inscombine:

clang -O1 -mllvm --instcombine-max-iterations=0 main.c # ok
clang -O1 -mllvm --instcombine-max-iterations=1 main.c # ko

Unfortunately, this iteration still leads to very different assembly outputs. I'll try to dig into this more later this weekend and manually inline functions.

real-or-random · 2023-06-03T11:47:08Z

Perhaps it's better to report to LLVM already now, even with a somewhat large source file. I can imagine that they have more experience and/or good suggestions on how to narrow down the issue further.

andres-erbsen · 2023-06-03T12:36:56Z

Yep, I think it's a fine time to report. Hopefully breaking as late as instcombine will dodge the question of why we believe a program so large to be devoid of undefined behavior.

andres-erbsen · 2023-06-03T14:17:28Z

The diff is actually nice and not-too-large again, but I still can't figure out what is going on 🤣 . The most glaring change seems to be the following. In particular, the addcarry of the highest limb of p256 isn't doing the obvious thing on either side:

https://andres.systems/tmp/rrddiff0.html

samoht · 2023-06-03T14:32:25Z

I have updated to clang version 16.0.4 (the one shipped with homebrew and not Xcode) and the bug disappeared - not totally sure what to do now. I have pushed another version of the binaries compiled with the same optimisation flags but with 2 versions of the compiler..

andres-erbsen · 2023-06-03T16:27:31Z

Ok, I looked at the new ok as well and here's a hypothesis. Looking at the last adcs in the image I posted, we can see on the left that it computes x, carry = limb + 0 + carry and then x = x-y whereas on the right we have x, carry = limb + (-y) + carry. The resulting x is the same, but the resulting carry could be different. In particular, in both -ok versions carry is 0 since limb < 2^64-1. In the -ko, carry could be nonzero depending on y. Below, these carry bits are fed into similar addition chains that I do not understand.

JasonGross · 2023-06-03T16:37:04Z

How far is the asm equivalence checker from being able to ingest these compilations and report on differences? (And would it be useful?)

andres-erbsen · 2023-06-03T16:39:16Z

This is AArch64 so quite far.

andres-erbsen · 2023-06-06T11:52:38Z

@samoht Would you be up to filing a report for this? Even though the issue has stopped appearing with latest compiler, it'd be good to make sure that the bad rule was actually fixed or removed instead of just being no longer triggered due to other optimizations changing. (Or at the very least, there should be a regression test in their repo.)

andres-erbsen · 2023-07-10T13:00:19Z

https://patchew.org/QEMU/[email protected]/ has been brought to my attention wrt the clang14 issue. I have not dug deep enough to confirm it is the same root cause, but the version matches. Just FYI, the linked patch says "There is not currently a version of Apple Clang which has the bug fix".

@TheLortex

…age, mirage-crypto-rng-lwt, mirage-crypto-rng-eio, mirage-crypto-rng-async, mirage-crypto-pk and mirage-crypto-ec (0.11.2) CHANGES: * mirage-crypto-rng-eio: improve portability by using eio 0.7's monotonic clock interface instead of mtime.clock.os. (mirage/mirage-crypto#176 @TheLortex) * mirage-crypto-rng-eio: update to eio 0.12 (mirage/mirage-crypto#182 @talex5) * mirage-crypto-rng: fix typo in RNG setup (mirage/mirage-crypto#179 @samueldurantes) * macOS: on arm64 with clang 14.0.3, avoid instcombine (due to miscompilations) reported by @samoht mit-plv/fiat-crypto#1606 (comment) re-reported in ulrikstrid/ocaml-jose#63 and mirleft/ocaml-tls#478 (mirage/mirage-crypto#185 @hannesm @kit-ty-kate) * avoid "stringop-overflow" warning on PPC64 and S390x (spurious warnings) when in devel mode (mirage/mirage-crypto#178 mirage/mirage-crypto#184 @avsm @hannesm) * stricter C prototypes, unsigned/signed integers (mirage/mirage-crypto#175 @MisterDA @haesbaert @avsm @hannesm) * support DragonFlyBSD (mirage/mirage-crypto#181 @movepointsolutions) * support GNU/Hurd (mirage/mirage-crypto#174 @pinotree)

hannesm · 2023-09-21T12:51:45Z

To update on the issue described within this issue (on macOS/arm64 with clang 14.0.3, there is a miscompilation - starting at #1606 (comment)). Since it is unclear when Apple will update their compiler toolchain, we shipped "mirage-crypto" with using "-mllvm --instcombine-max-iterations=0" on that specific platform and C compiler. Thanks a lot for the investigations and remarks in here.

@TheLortex

…age, mirage-crypto-rng-lwt, mirage-crypto-rng-eio, mirage-crypto-rng-async, mirage-crypto-pk and mirage-crypto-ec (0.11.2) CHANGES: * mirage-crypto-rng-eio: improve portability by using eio 0.7's monotonic clock interface instead of mtime.clock.os. (mirage/mirage-crypto#176 @TheLortex) * mirage-crypto-rng-eio: update to eio 0.12 (mirage/mirage-crypto#182 @talex5) * mirage-crypto-rng: fix typo in RNG setup (mirage/mirage-crypto#179 @samueldurantes) * macOS: on arm64 with clang 14.0.3, avoid instcombine (due to miscompilations) reported by @samoht mit-plv/fiat-crypto#1606 (comment) re-reported in ulrikstrid/ocaml-jose#63 and mirleft/ocaml-tls#478 (mirage/mirage-crypto#185 @hannesm @kit-ty-kate) * avoid "stringop-overflow" warning on PPC64 and S390x (spurious warnings) when in devel mode (mirage/mirage-crypto#178 mirage/mirage-crypto#184 @avsm @hannesm) * stricter C prototypes, unsigned/signed integers (mirage/mirage-crypto#175 @MisterDA @haesbaert @avsm @hannesm) * support DragonFlyBSD (mirage/mirage-crypto#181 @movepointsolutions) * support GNU/Hurd (mirage/mirage-crypto#174 @pinotree)

andres-erbsen mentioned this issue May 23, 2023

inversion / divstep code broken? #1605

Closed

hannesm mentioned this issue Jun 13, 2023

[new release] dnssec, dns, dns-tsig, dns-stub, dns-server, dns-resolver, dns-mirage, dns-client, dns-client-mirage, dns-client-lwt, dns-cli and dns-certify (7.0.2) ocaml/opam-repository#23915

Merged

hannesm mentioned this issue Jul 3, 2023

[new release] x509 (0.16.5) ocaml/opam-repository#24045

Merged

samoht mentioned this issue Jul 26, 2023

Update fiat mirage/mirage-crypto#177

Closed

ghost mentioned this issue Jul 26, 2023

Es256_priv Jwk always fails validate_signature with Invalid_signature ulrikstrid/ocaml-jose#63

Closed

unsigned char is not uint1, truncations != C casts != promises #1606

unsigned char is not uint1, truncations != C casts != promises #1606

Comments

andres-erbsen commented May 23, 2023

andres-erbsen commented May 23, 2023 • edited Loading

real-or-random commented May 23, 2023 • edited Loading

samoht commented May 23, 2023 • edited Loading

JasonGross commented May 24, 2023

andres-erbsen commented May 26, 2023

hannesm commented May 26, 2023

andres-erbsen commented May 26, 2023 • edited Loading

samoht commented May 30, 2023 • edited Loading

andres-erbsen commented May 30, 2023 • edited Loading

samoht commented May 30, 2023 • edited Loading

real-or-random commented May 30, 2023

samoht commented May 30, 2023 • edited Loading

real-or-random commented May 30, 2023

andres-erbsen commented May 30, 2023 • edited Loading

samoht commented May 30, 2023

andres-erbsen commented May 30, 2023

samoht commented May 31, 2023

andres-erbsen commented May 31, 2023

andres-erbsen commented May 31, 2023

samoht commented May 31, 2023 • edited Loading

andres-erbsen commented May 31, 2023

andres-erbsen commented Jun 1, 2023 • edited Loading

samoht commented Jun 2, 2023

andres-erbsen commented Jun 2, 2023

samoht commented Jun 3, 2023 • edited Loading

real-or-random commented Jun 3, 2023

andres-erbsen commented Jun 3, 2023

andres-erbsen commented Jun 3, 2023 • edited Loading

samoht commented Jun 3, 2023 • edited Loading

andres-erbsen commented Jun 3, 2023 • edited Loading

JasonGross commented Jun 3, 2023

andres-erbsen commented Jun 3, 2023

andres-erbsen commented Jun 6, 2023

andres-erbsen commented Jul 10, 2023

hannesm commented Sep 21, 2023

andres-erbsen commented May 23, 2023 •

edited

Loading

real-or-random commented May 23, 2023 •

edited

Loading

samoht commented May 23, 2023 •

edited

Loading

andres-erbsen commented May 26, 2023 •

edited

Loading

samoht commented May 30, 2023 •

edited

Loading

andres-erbsen commented May 30, 2023 •

edited

Loading

samoht commented May 30, 2023 •

edited

Loading

samoht commented May 30, 2023 •

edited

Loading

andres-erbsen commented May 30, 2023 •

edited

Loading

samoht commented May 31, 2023 •

edited

Loading

andres-erbsen commented Jun 1, 2023 •

edited

Loading

samoht commented Jun 3, 2023 •

edited

Loading

andres-erbsen commented Jun 3, 2023 •

edited

Loading

samoht commented Jun 3, 2023 •

edited

Loading

andres-erbsen commented Jun 3, 2023 •

edited

Loading