misc/cgo/testsanitizers: enabling misc/cgo/testsanitizers testcases for ppc64le result in TLS relocation link errors and signal handling errors for cgo #45040
Comments
Sounds like the thread sanitizer doesn't really work on ppc64le. Let's either fix up |
@pmur thinks he has a linker fix for the relocation error. I also got all but one of the tsan tests to work by building them with a shared std and -linkshared. With the linker error fixed, tsan8 still fails because it doesn't forward the Go signal. But I agree that checkCSanitizer should be fixed.
How does this compare to using the -race option, other than that one uses LLVM and the other GCC?
Even if we just skip this test, I think this is showing a problem that could happen for other cases where Go and C code is linked together in this way. |
Change https://golang.org/cl/302209 mentions this issue: |
It's not related to LLVM vs. GCC; both LLVM and GCC support These tests are relevant because people can compile their C code with |
We were looking at those files yesterday and saw that some files are only built for amd64 and arm64, so this has never worked on ppc64le. It looks like we need a callCgoSigaction, like the one on arm64, that calls _cgo_sigaction. I tried to add that, but then it seemed that there were calls back and forth between the Go code and the tsan functions that need to handle the different register conventions. That's as far as I got in my debugging.
We now have the issue that was reported in #31827 for arm64 and possibly other architectures. The nonvolatile registers are currently not being saved for Power in sigtramp. I tried adding that code, but it didn't fix the problem. The failure is a SEGV in the tsan library because it tries to store at an address off R31, but R31 points to code. The address in R31 has the same value as LR, which is interesting since Go code does a mflr to R31. If I run with cpu=1 it works.
This failure started happening when the testsanitizers tests in misc/cgo were enabled for any system where gcc allows -fsanitize=thread. On some ppc64le systems, the initial test to determine whether -fsanitize=thread is allowed on the system might pass, but the link step then fails when building the test. The link failure is not limited to tsan libraries: this TLS error message could occur at link time when linking against any C shared library with a large enough amount of TLS.
Change https://golang.org/cl/306369 mentions this issue: |
@Helflym My change does not change the way signals are done on AIX. I will leave that for you to change later if you feel it is needed. |
Change https://golang.org/cl/315430 mentions this issue: |
A while back in this release the sanitizer tests were enabled for ppc64le, where previously they were never run. This uncovered some errors in these tests on ppc64le. One linker fix was made but there are still bugs in how tsan is made to work within the code, especially in how signals are enabled with cgo. Some attempts were made to make this work but intermittent failures continue to happen with the Trybots so I am just going to disable this test for ppc64le within cmd/dist.
Updates #45040
Change-Id: I5392368ccecd4079ef568d0c645c9f7c94016d99
Reviewed-on: https://go-review.googlesource.com/c/go/+/315430
Run-TryBot: Lynn Boger <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
Trust: Cherry Zhang <[email protected]>
I talked to the person who does the testing for tsan on ppc64le to ask about the intermittent failures on the trybots related to address ranges. He told me they are most likely due to the small amount of memory available on those systems, which is why they only happen there. Because of ASLR, things can end up at different addresses from run to run, which is why the failures are intermittent.
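For readers unfamiliar with how a test gets switched off per architecture, here is a minimal sketch of that kind of gate. `skipSanitizerTests` is a hypothetical name, and this is not the actual code in cmd/dist; it only shows the shape of the check the change describes.

```go
package main

import (
	"fmt"
	"runtime"
)

// skipSanitizerTests reports whether the sanitizer tests should be skipped
// for the given GOARCH. (Hypothetical helper; the only fact taken from the
// thread is that ppc64le is disabled pending issue #45040.)
func skipSanitizerTests(goarch string) bool {
	return goarch == "ppc64le"
}

func main() {
	if skipSanitizerTests(runtime.GOARCH) {
		fmt.Println("testsanitizers: skipped on", runtime.GOARCH)
		return
	}
	fmt.Println("testsanitizers: would run on", runtime.GOARCH)
}
```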
Recently some tsan tests were enabled on ppc64le which had not been enabled before. This resulted in failures on systems with tsan available, and while debugging it was determined that there were other issues related to the use of signals with cgo.
Signals were not being forwarded within programs linked against libtsan because the nocgo sigaction was being called for ppc64le with or without cgo. Adding callCgoSigaction and calling that allows signals to be registered so that signal forwarding works.
For linux-ppc64 and aix-ppc64, this won't change. On linux-ppc64 there is no cgo. I can't test aix-ppc64 so those owners can enable it if they want.
In reviewing comments about sigtramp in sys_linux_arm64 it was noted that a previous issue in arm64 due to missing callee save registers could also be a problem on ppc64x, so code was added to save and restore those.
Also, the use of R31 as a temp register in some cases caused an issue since it is a nonvolatile register in C and was being clobbered in cases where the C code expected it to be valid. The code sequences to load these addresses were changed to avoid the use of R31 when loading such an address.
To get around a vet error, the stubs_ppc64x.go file in runtime was split into stubs_ppc64.go and stubs_ppc64le.go.
Updates #45040
Change-Id: Ia4ecff950613cbe1b89471790b1d3819d5b5cfb9
Reviewed-on: https://go-review.googlesource.com/c/go/+/306369
Trust: Lynn Boger <[email protected]>
Run-TryBot: Lynn Boger <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Carlos Eduardo Seo <[email protected]>
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes, but the errors are different depending on the system. On a system where the verification of tsan fails in a certain way the test is skipped so the error doesn't occur and the test seems to pass.
This started with CL 297774 since these tests were not run on ppc64le before that.
What operating system and processor architecture are you using (go env)?
ppc64le
Fails consistently on Linux SMP Debian 4.19.160-2
What did you do?
This first appeared on the build dashboard: https://build.golang.org/log/a07cf12a0667dc38c330f06dc5c457841154ffc4 and then also appeared on a slowbot run: https://storage.googleapis.com/go-build-log/9fc3f56e/linux-ppc64le-power9osu_74ad5e7f.log.
I don't know why it works on some runs but not on others, as has happened on the build dashboard.
What did you expect to see?
PASS
What did you see instead?
Various errors, including those shown in the logs above. As mentioned above, it does pass sometimes: if the test decides that it shouldn't test tsan, it is skipped and so appears to pass.