unsigned char is not uint1, truncations != C casts != promises #1606
Comments
Re #1605 (comment):
Yes, we could restrict the redundant-& optimization to uint8+, but could we remove uint1 altogether? Do you remember why/when we added it in the first place? |
I'd recommend against using any type that has a rank less than the rank of Want a nasty example? The following code has UB for some inputs due to signed (!) overflow:
See here for more background: https://stackoverflow.com/questions/73600275/c-undefined-behavior-when-multiplying-uint16-t
Happy to look at some generated code here. (Moreover, small types may be slower.) |
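The code from the comment above did not survive in this copy of the thread. As a stand-in, here is a minimal sketch of the pattern the linked Stack Overflow question describes, assuming a platform where int is 32 bits: both uint16_t operands are promoted to (signed) int before the multiplication, so a product such as 65535 * 65535 overflows int. The function names are made up for illustration and are not from fiat-crypto.

```c
#include <assert.h>
#include <stdint.h>

/* Risky: a and b are promoted to int before the multiplication. With a
   32-bit int, products above INT_MAX are signed overflow, which is UB.
   Only safe for inputs whose product fits in int. */
uint16_t mul_u16_promoted(uint16_t a, uint16_t b) {
    return (uint16_t)(a * b);
}

/* Safer: force the arithmetic into an unsigned 32-bit type, where
   wraparound is well defined, then truncate to 16 bits. */
uint16_t mul_u16_wrapping(uint16_t a, uint16_t b) {
    return (uint16_t)((uint32_t)a * (uint32_t)b);
}

/* Small self-check; the promoted variant is only called with inputs
   whose product fits in int. */
void check_mul_u16(void) {
    assert(mul_u16_promoted(3, 5) == 15);
    /* 65535 * 65535 = 0xFFFE0001, whose low 16 bits are 0x0001. */
    assert(mul_u16_wrapping(65535, 65535) == 1);
}
```

Calling check_mul_u16() from a test harness exercises both variants without triggering the overflow; building with -fsanitize=undefined and passing 65535, 65535 to mul_u16_promoted should make the problem visible on typical 32-bit-int platforms.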
Hello, I am not sure it is related, but we also just noticed some wrong behaviour of the
|
I think we need it because we want code that relaxes uint8 to uint32, but does not relax carries to uint32? Plausibly we should add support for non-idempotent bounds relaxation functions that turn uint1 into uint8 but turn larger things into uint32? I could be misremembering though. (It's also possible we need it for bounds analysis, or that the rewrite rules that deal with carries expect casts to be uint1?) |
I took a look at Lacking a specific guess for the issue, I figure we can perhaps just track down the change. |
@andres-erbsen I've added an ok (generated with clang 14.0.0) and a ko (compiled with clang 14.0.3) binary to the gist: https://gist.github.com/samoht/7c27b6418148ee04590e571f390f1997 -- let me know if you want to track this in a separate issue. |
|
removing noise
is generating a different result on GCC/clang 14.0.0 vs. clang 14.0.3
And this value is used by |
These values are the same. |
Ok, I'm now very confused - it used to be |
It probably doesn't matter for that issue, but |
Ok, I looked into this for a bit. The diff is rather large so it's hard to say anything conclusive. However, it's rather striking that carry propagation (adcs with xzr) only appears prominently on the "ok" side. It also appears in the source code here. It's possible that whatever the ko side does is very clever and correct, but it wouldn't be the first time fiat-crypto code runs into a bug in exactly this scenario with a mainstream compiler. 🤔 |
ok more news - |
On gcc I debugged a similar issue by listing the compilation passes and disabling them one-by-one. Clang docs seem to say they have a tool that does that https://llvm.org/docs/HowToSubmitABug.html#miscompilations and the other suggestions there also sound good. Coincidentally, I recalled that we at least have a C-language test template at |
Bisecting the compilation passes with
Also I've added a p256_64 test using the C-language test template, but this always seems to pass. |
Great find! I figure it should be possible to make a test case in C still, but it'd probably require replicating the compilation-unit structure of the ocaml example. Perhaps the surest way is to have the same object file for divstep and link it against an ocaml test harness and a C test harness, confirming with gdb that the same arguments get passed. |
I looked through llvm/llvm-project@llvmorg-14.0.0...llvmorg-14.0.3 and the two commits that mention |
I managed to reproduce using C only, by tracing what values the OCaml code was sending to the C side.
#define LIMBS 4
#define WORD uint64_t
#define WORDSIZE 64
#define LEN_PRIME 256
#define CURVE_DESCRIPTION fiat_p256
#include "p256_64.h"
#include "inversion_template.h"
#include <inttypes.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>
int main (void) {
  uint64_t init[5];
  uint64_t zero[4];
  uint64_t result[4];
  int i;
  /* Input traced from the OCaml side. */
  init[0] = 3164219169453123020;
  init[1] = 5610335456920920796;
  init[2] = 4531946527682827593;
  init[3] = UINT64_C(17678136164097313971);
  init[4] = 0;
  zero[0] = 0;
  zero[1] = 0;
  zero[2] = 0;
  zero[3] = 0;
  /* Expected output of inverse() for the input above. */
  result[0] = 5547023629237264145;
  result[1] = 6634760001876509537;
  result[2] = 2568289811407991911;
  result[3] = UINT64_C(9585681005862132995);
  inverse(zero, init);
  for (i = 0; i < 4; i++) {
    if (zero[i] != result[i]) {
      /* PRIu64 is the portable format specifier for uint64_t. */
      printf("FAIL zero[%d]=%" PRIu64 " result[%d]=%" PRIu64 "\n", i, zero[i], i, result[i]);
      return 1;
    }
  }
  printf("PASS\n");
  return 0;
}
I've pushed to the gist two updated binaries (which are now much smaller!) |
Ok, that looks like we almost have it. If I had a machine that the bug reproduced on, my approach at this point would be to aggressively inline and minimize the example that tells the difference between -O0 and -O1. I do it manually (while : inotifywait compile && test && bell...) but gcc has recommendations here. |
Out of the two binaries you posted, the -ok one has ubsan clutter all over it so I won't spend time comparing them now. Minimizing is likely a more productive avenue anyway, because if we need to file a bug report we should post an example. Edit: if we do need to look at binaries, they should be as similar as you can make them otherwise. |
I've pushed a smaller repro with less clutter. Is it still too big? |
The asm for the two sides looks very different (much more different than at first) due to different optimizations being applied. For example, the -ok file still has addcarryx and cmovznz as function calls instead of inlined. |
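For readers unfamiliar with the two primitives named above, here is a rough, portable sketch of what an add-with-carry and a branch-free conditional move compute. This is an illustration under assumed names and signatures, not the code fiat-crypto generates (the argument order and helper types of the real functions may differ).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical add-with-carry: *out = a + b + carry_in (mod 2^64),
   *carry_out = 1 if the true sum does not fit in 64 bits, else 0. */
void sketch_addcarry_u64(uint64_t *out, uint8_t *carry_out,
                         uint8_t carry_in, uint64_t a, uint64_t b) {
    uint64_t s = a + b;                          /* wraps mod 2^64 */
    uint8_t c1 = (uint8_t)(s < a);               /* wrapped on a + b? */
    uint64_t t = s + (uint64_t)(carry_in & 1u);
    uint8_t c2 = (uint8_t)(t < s);               /* wrapped on + carry? */
    *out = t;
    *carry_out = (uint8_t)(c1 | c2);             /* at most one can be set */
}

/* Hypothetical conditional move: *out = (cond == 0) ? z : nz, written
   with a mask instead of a branch at the source level. */
void sketch_cmovznz_u64(uint64_t *out, uint8_t cond,
                        uint64_t z, uint64_t nz) {
    uint64_t mask = (uint64_t)0 - (uint64_t)(!!cond); /* 0 or all ones */
    *out = (z & ~mask) | (nz & mask);
}

/* Small self-check of the carry chain and the select. */
void check_primitives(void) {
    uint64_t o;
    uint8_t c;
    sketch_addcarry_u64(&o, &c, 0, UINT64_MAX, 1);
    assert(o == 0 && c == 1);
    sketch_cmovznz_u64(&o, 0, 5, 7);
    assert(o == 5);
}
```

Whether a compiler inlines such helpers or leaves them as calls changes the surrounding assembly considerably, which is one reason the two binaries are hard to compare side by side.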
Apologies for the painful process. I still haven't managed to find enough time to inline all of this properly and I do realise it doesn't help you very much. But it's now clear something is fishy with
Unfortunately, this iteration still leads to very different assembly outputs. I'll try to dig into this more later this weekend and manually inline functions. |
Perhaps it's better to report to LLVM already now, even with a somewhat large source file. I can imagine that they have more experience and/or good suggestions on how to narrow down the issue further. |
Yep, I think it's a fine time to report. Hopefully breaking as late as instcombine will dodge the question of why we believe a program so large to be devoid of undefined behavior. |
I have updated to clang version 16.0.4 (the one shipped with Homebrew and not Xcode) and the bug disappeared - not totally sure what to do now. I have pushed another version of the binaries compiled with the same optimisation flags but with 2 versions of the compiler. |
Ok, I looked at the new ok as well and here's a hypothesis. Looking at the last |
How far is the asm equivalence checker from being able to ingest these compilations and report on differences? (And would it be useful?) |
This is AArch64 so quite far. |
@samoht Would you be up to filing a report for this? Even though the issue has stopped appearing with the latest compiler, it'd be good to make sure that the bad rule was actually fixed or removed instead of just being no longer triggered due to other optimizations changing. (Or at the very least, there should be a regression test in their repo.) |
https://patchew.org/QEMU/[email protected]/ has been brought to my attention wrt the clang14 issue. I have not dug deep enough to confirm it is the same root cause, but the version matches. Just FYI, the linked patch says "There is not currently a version of Apple Clang which has the bug fix". |
…age, mirage-crypto-rng-lwt, mirage-crypto-rng-eio, mirage-crypto-rng-async, mirage-crypto-pk and mirage-crypto-ec (0.11.2) CHANGES: * mirage-crypto-rng-eio: improve portability by using eio 0.7's monotonic clock interface instead of mtime.clock.os. (mirage/mirage-crypto#176 @TheLortex) * mirage-crypto-rng-eio: update to eio 0.12 (mirage/mirage-crypto#182 @talex5) * mirage-crypto-rng: fix typo in RNG setup (mirage/mirage-crypto#179 @samueldurantes) * macOS: on arm64 with clang 14.0.3, avoid instcombine (due to miscompilations) reported by @samoht mit-plv/fiat-crypto#1606 (comment) re-reported in ulrikstrid/ocaml-jose#63 and mirleft/ocaml-tls#478 (mirage/mirage-crypto#185 @hannesm @kit-ty-kate) * avoid "stringop-overflow" warning on PPC64 and S390x (spurious warnings) when in devel mode (mirage/mirage-crypto#178 mirage/mirage-crypto#184 @avsm @hannesm) * stricter C prototypes, unsigned/signed integers (mirage/mirage-crypto#175 @MisterDA @haesbaert @avsm @hannesm) * support DragonFlyBSD (mirage/mirage-crypto#181 @movepointsolutions) * support GNU/Hurd (mirage/mirage-crypto#174 @pinotree)
An update on the miscompilation described in this issue (on macOS/arm64 with clang 14.0.3, starting at #1606 (comment)): since it is unclear when Apple will update their compiler toolchain, we shipped "mirage-crypto" using "-mllvm --instcombine-max-iterations=0" on that specific platform and C compiler. Thanks a lot for the investigations and remarks in here. |
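As an illustration of what "that specific platform and C compiler" could mean at the source level, the sketch below probes clang's predefined version macros. It is an assumption-laden example, not how mirage-crypto selects the flag (that happens in its build configuration), and the function name is made up.

```c
#include <assert.h>

/* Returns 1 when this translation unit is compiled by Apple clang 14.0.3
   targeting AArch64, else 0. Relies only on predefined macros; any
   identifier that is not defined evaluates to 0 inside #if. */
int sketch_is_apple_clang_14_0_3_arm64(void) {
#if defined(__apple_build_version__) && defined(__aarch64__) && \
    defined(__clang_major__) && __clang_major__ == 14 && \
    __clang_minor__ == 0 && __clang_patchlevel__ == 3
    return 1;
#else
    return 0;
#endif
}

/* The probe is a plain boolean; a caller could use it to print a warning
   or skip a test on the affected toolchain. */
void check_probe(void) {
    int r = sketch_is_apple_clang_14_0_3_arm64();
    assert(r == 0 || r == 1);
}
```

A build system would more likely make this decision before compilation, since the workaround is a compiler flag rather than a source change.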
From #1605
Currently, variables for which range analysis determines the range [0, 1] are assigned the type uint1. That type is then printed to C, and perhaps to other languages, as a typedef for unsigned char. That's no good: casting to and adding in these two types produce different results.
We should stop printing uint1. If we really want to output code for which it is locally evident that a variable is in the range [0, 1], referencing it as x&1 (which is what we were also often doing until recently) is less bad. We can probably do even better.
Our use of signed types also warrants some review.
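To make the difference concrete, here is a small sketch contrasting a cast to an unsigned char typedef with a true one-bit truncation, for both casting and addition. The typedef and function names are hypothetical stand-ins for the generated code, chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a "uint1" printed as a typedef. */
typedef unsigned char sketch_uint1;

/* A C cast to unsigned char reduces modulo 2^CHAR_BIT (256 on common
   platforms), so the value 2 survives unchanged. */
unsigned cast_via_typedef(unsigned x) { return (sketch_uint1)x; }

/* Truncation to one bit keeps only the least significant bit. */
unsigned truncate_to_one_bit(unsigned x) { return x & 1u; }

/* Adding two unsigned char values promotes both to int first,
   so 1 + 1 is 2 rather than wrapping to 0. */
unsigned add_via_typedef(sketch_uint1 a, sketch_uint1 b) {
    return (unsigned)(a + b);
}

/* One-bit addition wraps modulo 2. */
unsigned add_one_bit(unsigned a, unsigned b) { return (a + b) & 1u; }

/* Self-check showing where the two interpretations disagree. */
void check_uint1_differences(void) {
    assert(cast_via_typedef(2) == 2);
    assert(truncate_to_one_bit(2) == 0);
    assert(add_via_typedef(1, 1) == 2);
    assert(add_one_bit(1, 1) == 0);
}
```

On inputs that really are in [0, 1] the cast and the mask agree, which is why the typedef reads as a promise about the range rather than as an operation that enforces it.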