[BACKEND] Update LLVM version to llvm/llvm-project@9ddfe62 #4338
ThomasRaoux wants to merge 5 commits into main
|
@ThomasRaoux we can continue the discussion from #4212 here. I'm trying to figure out at which point in our internally-known good LLVM commits the problem started happening. So I rebased the I can also try to get the latest LLVM version we have internally in the hopes that the issue got fixed in the meantime (will do that once all the builds are done). In the meantime, I found where OpenXLA (which uses Triton as a dependency) updated to this exact LLVM version that we know is causing problems in Triton, and I'm trying to see if we applied any patches on our side to make it work: openxla/xla@2b7f566 |
Wow this is awesome, thanks a lot! I tried to debug it a bit locally, I realized it happens when I build locally on linux but on my mac when I build locally all the lit tests pass. So it may be some undefined behavior or something that shows up only on some build config :( Hopefully narrowing down the LLVM hash will help figure it out. |
|
@gflegar sorry for jumping between different PRs for discussion :) So if I understand correctly this version fails and the version in between wouldn't fail llvm build on ARM? |
No worries, it's a bit complicated with all the different PRs :) This version passes LLVM builds, but fails on tests. Also, I created #4367, which is currently basically a copy of this PR, but from a separate, locked branch, since the contents of this PR would change any time we try to do something on the Also @reichlfl FYI take a look at discussions here, since you're our next Triton integrator at Google. The TL;DR is that we had CI issues for the last few weeks upstream, and now are trying to catch up, but the latest version of LLVM is causing test failures upstream, but not internally at Google. We're trying to root-cause it. You should still do the integration as normal, and create a PR for the latest LLVM we have internally, so we can check if the issue has maybe already been resolved upstream. |
|
Alright, it seems that #4366 is even more broken - plenty of tests failing with different errors, one among those is the Let's first see if all of this is fixed in the latest version of LLVM we'll roll out this week and save ourselves some time if it is :)
My pleasure! Interesting, maybe it is UB. FWIW we do run Triton's lit tests internally through undefined behavior sanitizer, and that has not detected any issues with this LLVM version (but we do probably run a slightly different build configuration, since we don't use Triton's CMake build system and roll our own Bazel-like thing instead). |
7e2f45b to 1f110e2
|
The build starts to fail with the "<< UNKNOWN SSA VALUE >>" error after this commit: llvm/llvm-project@ce80c80. It's quite a simple commit; maybe you have a better idea of why using a non-deterministic seed could be causing the issue for you. |
My first guess is that this header is included in both Triton and LLVM, but the preprocessor takes a different path in each of them, so the seeds would not match depending on whether the function is built with Triton or with LLVM. |
|
That would explain why we're not seeing it internally. We build both LLVM and Triton as one big package, rather than separate binaries, so we wouldn't have this mismatch. |
Update LLVM version to llvm/llvm-project@dd7d81e
Included the use of the non-deprecated version of createMCObjectStreamer (needed after llvm/llvm-project@f1422a8).
559090b to 61665fd
|
BTW, looking at LLVM's CMake file, it seems it will override the preprocessor definition, so the right way is to set this CMake option. Here is the CMake logic: |
695ffec to 9267087
Yes, thanks for pointing it out. Your suggestion works and the tests pass when setting the flag. I did not have a lot of time this week, but next week I will look into how to fix it properly. |
Awesome! Thanks |
|
Say my llvm-project build is at
If there are Triton failures when the LLVM_ABI_BREAKING_CHECKS==1 code path is used, you might want to check code/tests overly depending on the determinism of llvm/include/llvm/ADT/Hashing.h, which is not guaranteed. |
This PR sets -DLLVM_ABI_BREAKING_CHECKS=FORCE_OFF, similar to how we do it inside Google, to be able to update the LLVM version in Triton. The culprit LLVM commit is llvm/llvm-project@ce80c80, which uses a non-deterministic seed when that flag is enabled. There was some discussion in triton-lang#4338; the other option would be to check or replace code and tests that depend too heavily on the determinism of llvm/include/llvm/ADT/Hashing.h (mostly DenseMap and DenseSet). I think the priority is to be able to update the LLVM version first, but if OpenAI is interested I can work on replacing them with MapVector to make the unit tests pass. Let me know.
…version This PR sets LLVM_ABI_BREAKING_CHECKS=FORCE_OFF, similar to how we do it inside Google. This forces the structures that rely on Hashing.h to use a deterministic seed. We had not been able to update the LLVM version. The culprit LLVM commit is llvm/llvm-project@ce80c80, which uses a non-deterministic seed when that flag is enabled. There was some discussion in triton-lang#4338; the other option would be to check or replace code and tests that depend too heavily on the determinism of llvm/include/llvm/ADT/Hashing.h (mostly DenseMap and DenseSet). I think the priority is to be able to update the LLVM version first, but if OpenAI is interested I could work on replacing them with MapVector in order to fix all the failing unit tests, and then we can set LLVM_ABI_BREAKING_CHECKS back to "WITH_ASSERTS".
|
Created #4512 to set LLVM_ABI_BREAKING_CHECKS=FORCE_OFF, as was suggested here. It's the same thing we do inside Google, and I think that being able to update LLVM is a priority. If @ThomasRaoux is interested, in parallel we can replace some DenseMap (and DenseSet) uses with MapVector (and SetVector) to fix the failing unit tests. These structures come with performance overhead, so let me know if you are interested in doing that refactor. Alternatively, we can investigate what the issues are in the current uses of DenseMap and DenseSet. |
As I mentioned on the other PR I don't understand what cases should be changed and why |
|
@karupayun which PR do you want to land for the LLVM upgrade? |
…version (#4512) This PR sets LLVM_ABI_BREAKING_CHECKS=FORCE_OFF, similar to how we do it inside Google. This forces the structures that rely on Hashing.h to use a deterministic seed. We had not been able to update the LLVM version. The culprit LLVM commit is llvm/llvm-project@ce80c80, which uses a non-deterministic seed when that flag is enabled. There was some discussion in #4338; the other option would be to check or replace code and tests that depend too heavily on the determinism of llvm/include/llvm/ADT/Hashing.h (mostly DenseMap and DenseSet). I think the priority is to be able to update the LLVM version first, but if OpenAI is interested I could work on replacing them with MapVector in order to fix all the failing unit tests, and then we can set LLVM_ABI_BREAKING_CHECKS back to "WITH_ASSERTS".
- [x] I am not making a trivial change, such as fixing a typo in a comment.
- [x] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- [x] This PR does not need a test because it's just setting a variable for LLVM.
- [x] I have not added any `lit` tests.
Co-authored-by: Tori Baker <vwbaker@google.com>
|
Closing this, we can start a fresh LLVM upgrade now that the problem is fixed. |