Skip to content

Conversation

@kubkon
Copy link
Member

@kubkon kubkon commented Jan 20, 2021

This commit brings back stack trace printing on ARM Darwin OSes by
avoiding the use of threadlocal for keeping track of panic stages per
thread. Essentially, LLVM/LLD fails to properly emit thread-local variables
which causes a segfault on attempting access. Since the default panic
handler relies on a threadlocal to keep track of panic stages per
thread using a thread-local variable, we ensure at compilation time
that ARM Darwin platforms explicitly opt-out from this in favour of
storing the panic stages per thread in a hash-map behind a mutex.
This is a temporary solution until the underlying issue is fixed
but much welcome one since this brings back stack traces printouts
on ARM macOS without the necessity to resort to the system linker and
still get broken stack traces due to addressing differences between the
system linker and LLD. For a more complete overview of the problem with
threadlocal in LLVM, see #7527.


EDIT:

This temporary patch fixes a segfault caused by miscompilation
by the LLD when generating stubs for initialization of thread local
storage. We effectively bypass TLS in the default panic handler
so that no segfault is generated and the stack trace is correctly
reported back to the user.

Note that, this is linked directly to a bigger issue with LLD
#7527 and when resolved, we only need to remove the
comptime code path introduced with this patch to use the default
panic handler that relies on TLS.

@kubkon kubkon requested a review from andrewrk January 20, 2021 16:51
@kubkon
Copy link
Member Author

kubkon commented Jan 20, 2021

@LemonBoy why thumbs down if you don't mind me asking? Are you against this change, or angry at the fact that LLVM doesn't play well with Apple's ARM target? 😁

@LemonBoy
Copy link
Contributor

(I wrote this comment a while ago but forgot to press the green button)

The workaround is really ugly.

We now have the scaffolding for emulated TLS in place, I think force-enabling it on M1 targets is a good workaround until the new LLD mach-o backend is promoted.

@kubkon
Copy link
Member Author

kubkon commented Jan 20, 2021

(I wrote this comment a while ago but forgot to press the green button)

The workaround is really ugly.

I agree, so if you have a better idea, please fire away!

We now have the scaffolding for emulated TLS in place, I think force-enabling it on M1 targets is a good workaround until the new LLD mach-o backend is promoted.

Oh, what scaffolding do you mean? Did I miss something?

@LemonBoy
Copy link
Contributor

Oh, what scaffolding do you mean? Did I miss something?

OpenBSD (and Android) does not support the usual hardware-backed TLS model and LLVM emits calls to a library call __emutls_get_address to emulate it.
#7577 introduced a pure-zig implementation that works well on OpenBSD and Linux, ideally it should work on osx with minimal or no changes. I've dropped some instructions on how to force-enable it in #7380, let me know if this solves your problem too.

@kubkon
Copy link
Member Author

kubkon commented Jan 20, 2021

Oh, what scaffolding do you mean? Did I miss something?

OpenBSD (and Android) does not support the usual hardware-backed TLS model and LLVM emits calls to a library call __emutls_get_address to emulate it.
#7577 introduced a pure-zig implementation that works well on OpenBSD and Linux, ideally it should work on osx with minimal or no changes. I've dropped some instructions on how to force-enable it in #7380, let me know if this solves your problem too.

Oh cool, I was actually not aware of this. Thanks @LemonBoy. Lemme have a stab at this and report back.

@kubkon
Copy link
Member Author

kubkon commented Jan 20, 2021

@LemonBoy Hmm, I can't seem to get it to work for some reason. Here's my diff of changes to get the emulated TLS to work:

diff --git a/src/zig_llvm.cpp b/src/zig_llvm.cpp
index 51af5e06d..8a0501f2c 100644
--- a/src/zig_llvm.cpp
+++ b/src/zig_llvm.cpp
@@ -104,7 +104,7 @@ static const bool assertions_on = true;
 static const bool assertions_on = false;
 #endif

-LLVMTargetMachineRef ZigLLVMCreateTargetMachine(LLVMTargetRef T, const char *Triple,
+LLVMTargetMachineRef ZigLLVMCreateTargetMachine(LLVMTargetRef T, const char *TripleName,
     const char *CPU, const char *Features, LLVMCodeGenOptLevel Level, LLVMRelocMode Reloc,
     LLVMCodeModel CodeModel, bool function_sections, ZigLLVMABIType float_abi, const char *abi_name)
 {
@@ -169,7 +169,12 @@ LLVMTargetMachineRef ZigLLVMCreateTargetMachine(LLVMTargetRef T, const char *Tri
         opt.MCOptions.ABIName = abi_name;
     }

-    TargetMachine *TM = reinterpret_cast<Target*>(T)->createTargetMachine(Triple, CPU, Features, opt, RM, CM,
+    Triple triple(TripleName);
+    if ((ZigLLVM_OSType)triple.getOS() == ZigLLVM_MacOSX) {
+      opt.ExplicitEmulatedTLS = true;
+      opt.EmulatedTLS = true;
+    }
+    TargetMachine *TM = reinterpret_cast<Target*>(T)->createTargetMachine(TripleName, CPU, Features, opt, RM, CM,
             OL, JIT);
     return reinterpret_cast<LLVMTargetMachineRef>(TM);
 }

And here's the linker error I'm seeing:

[  2%] Built target opt_c_util
[ 79%] Built target embedded_softfloat
[ 83%] Built target zigcpp
[ 97%] Built target zigstage1
[ 99%] Built target zig0
[ 99%] Building self-hosted component /Users/kubkon/dev/zig/build/zig1.o
[ 99%] Linking CXX executable zig
Undefined symbols for architecture arm64:
  "_panic_stage", referenced from:
      _std.debug.panicExtra in zig1.o
      _std.debug.panicExtra.4302 in zig1.o
      _std.debug.panicExtra.4304 in zig1.o
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [zig] Error 1
make[1]: *** [CMakeFiles/zig.dir/all] Error 2
make: *** [all] Error 2

Any ideas?

@LemonBoy
Copy link
Contributor

Well well well, LLVM seems to support emitting the emulated tls sequence for aarch64 targets.
Are you sure the branch is always taken? Are you building zig1.o from a clean slate?

The alternative workaround is to rename panicExtra to panicExtraGeneric and introduce your own panicExtraM1 where you do nasty things to please the machine. Once the workaround isn't needed anymore we can safely chuck the new code in the bin and forget it even existed 👉 😎 👉

This temporary patch fixes a segfault caused by miscompilation
by the LLD when generating stubs for initialization of thread local
storage. We effectively bypass TLS in the default panic handler
so that no segfault is generated and the stack trace is correctly
reported back to the user.

Note that, this is linked directly to a bigger issue with LLD
ziglang#7527 and when resolved, we only need to remove the
`comptime` code path introduced with this patch to use the default
panic handler that relies on TLS.

Co-authored-by: Andrew Kelley <[email protected]>
@kubkon kubkon force-pushed the apple-silicon-stack-traces branch from 4cb2f4b to 457449a Compare January 21, 2021 19:43
@kubkon
Copy link
Member Author

kubkon commented Jan 21, 2021

@LemonBoy pushed a commit that should be cleaner to remove when LLD fixes the miscompilation with TLS.

@kubkon
Copy link
Member Author

kubkon commented Jan 21, 2021

Also, since I know this is right up your alley @LemonBoy, here's what is actually going on: Thread-local storage, LLVM, and Apple Silicon: What can go wrong?.

@kubkon kubkon merged commit 843d91e into ziglang:master Jan 21, 2021
@kubkon kubkon deleted the apple-silicon-stack-traces branch January 21, 2021 22:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants