runtime: intermittent os/exec.Command.Start() Hang on Darwin in Presence of "plugin" Package #38824
Comments
Background: I maintain a fork/branch of Joker (a Clojure interpreter/clone written in Go) that "autowraps" much of the Go standard library for easy access by Joker code. A couple of weeks ago, I first noticed that the (small) test suite occasionally hung. Digging further, I posted this to get some feedback: https://stackoverflow.com/questions/61342000/why-might-exec-command-start-hang-on-darwin As suggested there, I ran Later, I boiled the pertinent Joker code down to a small test program and was able to reproduce that. The first such version pulled in all the Go standard library that my fork of Joker did. I then performed manual bisections of that set and narrowed them down to just the This issue is being tracked (on my end) via jcburley/joker#19. It's Closed due to a workaround being implemented such that A slightly "fuller" version of the sample program, along with instructions on another approach to run it, is here: https://github.com/jcburley/hangme/blob/master/README.md |
cc @odeke-em re MacOS |
Hi, the same thing in
|
I took a quick look at this today and can at the very least confirm that the bug still exists with 1.14.3: (Here's my very slightly modified code, where all I did was add some more STDOUT output to figure out what's going on.) As we can see here, I ran it the same way as James and indeed it locked after only 3 attempts. At the point this screenshot was taken, the same program had been hanging for 7 minutes, 35 seconds. So I looked at Activity Monitor.app to see if I can get some more information: Here's the text of that stack trace if you can't view the image for some reason:
That last line caught my eye. Stuck in a mutex wait in the kernel? I went for a
It's just flat hung on the last line, no progress beyond that point at all. My theory at this point is that the process isn't returning because one of the executions of This is about the depth of my ability to debug at this point (I'm always learning, in fact that's why I'm looking at this issue!). I don't know how much help it'll be, but I certainly hope this aids in tracking things down. |
Hi, I'm so glad to find a fellow sufferer from this :) From that stack trace, and stack traces of my own issues, it looks like the We are also importing |
Glad to know someone else might have been saved some time by this report! Hope they fix it someday (though it's not blocking me, and I don't expect it will for some time). |
I cannot reproduce this locally, using hangme, with go1.16.5 or go1.16.7, which are the versions we are using in our production builds. There's some other ingredient here, as well. Do you happen to have a recollection of other details of the environment where you saw this fail? OS X version, go version, etc.? |
^^ I take that back, right after posting the comment, it failed (go1.16.7). |
I've adjusted I've built a custom Go1.16.7 which panics in |
One difference I can see is that, when importing
but when not importing
but I don't know what would cause that. I think the Looking at disassembly of I tried adding the cgo comment directly in |
I've narrowed this further, slowly making changes to plugin_dlopen.go. Here's where I'm at:
--- fails.go 2021-12-15 15:56:19.000000000 -0500
+++ works.go 2021-12-15 15:42:11.000000000 -0500
@@ -7,10 +7,6 @@
package plugin
-/*
- */
-import "C"
-
import (
"sync"
)
or in words, if |
What if you have some source file outside of the standard library that does |
For that matter, what if you take your working version, and build with |
I'll try those things. The pattern of
appears to have held up. I'm not sure what that means, though -- any ideas? |
The two also seem to be drastically different executables:
|
Definitions:
|
From the above chart and reading https://cs.opensource.google/go/go/+/refs/tags/go1.17.5:src/cmd/cgo/doc.go it seems that the "Don't call
Obviously this isn't a great answer! What other variables should I investigate? |
I just tried copying the stripped-down |
I don't know what is happening here. Just a note that in the linker there is special handling of the case in which the plugin case is imported and uses cgo: https://go.dev/src/cmd/link/internal/ld/lib.go?#L551. I don't see any obvious reason why that would cause a hang, though. |
Oh, thanks! I was grepping for Referring back to earlier in this issue, the hang is because the dyld global lock is held when At this point, I think that's because of something that |
CC @cherrymui @thanm |
(just confirmed this is still an issue with go1.17.5 and go1.18beta1) |
Thanks. I can reproduce with Go tip locally as well. I'll look into it. |
The difference when plugin is used is that it passes If I remove the import of the plugin package but pass |
It seems it hangs if only the parent uses plugin. I can reproduce if hangme just shells out /usr/bin/true. It does not hang if only the child uses plugin, but parent does not. |
With async preemption disabled it can still hang, albeit with seemingly lower frequency. So #41702 or any workaround related to that probably would not help. |
Change https://golang.org/cl/372798 mentions this issue: |
As mentioned above (thanks @djmitche !) this may be related to the dynamic linker resolving bindings. Forcing early binding resolution ( I'll look further into it to see how exactly this happens and if it is a bug of the dynamic linker. |
From dyld source
So it is indeed that the use of flat namespace makes it acquire the global lock (which could deadlock if we're forking while the lock is held and the child needs to resolve a binding before exec). Resolving bindings ahead of time seems like the right workaround to me. Another possibility is to resolve just the set of symbols that the child may use before exec (e.g. by making dummy syscalls before forking), but that is probably not worth it. |
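The deadlock described above can be illustrated with a toy model. This is not dyld's implementation; the `lock` and `process` types and the thread IDs are hypothetical. It only models the one property that matters here: fork copies the lock state into the child but not the thread that holds the lock.

```go
package main

import "fmt"

// lock is a toy stand-in for a process-wide lock. It is not dyld's
// data structure.
type lock struct {
	held  bool
	owner int // ID of the holding thread; meaningful only when held
}

// process is a toy process: a set of live thread IDs plus one global lock.
type process struct {
	threads map[int]bool
	global  lock
}

// fork models fork(2): memory, including the lock state, is copied into
// the child, but only the calling thread exists there.
func (p *process) fork(caller int) *process {
	return &process{
		threads: map[int]bool{caller: true},
		global:  p.global,
	}
}

// canAcquire reports whether the lock is free, or is held by a thread
// that still exists in this process and could therefore release it.
func (p *process) canAcquire() bool {
	return !p.global.held || p.threads[p.global.owner]
}

func main() {
	// Thread 2 holds the lock (say, while resolving a lazy binding)
	// at the moment thread 1 forks.
	parent := &process{
		threads: map[int]bool{1: true, 2: true},
		global:  lock{held: true, owner: 2},
	}
	child := parent.fork(1)

	fmt.Println("parent can eventually take the lock:", parent.canAcquire())
	fmt.Println("child can eventually take the lock:", child.canAcquire())
}
```

In this model the child inherits a lock whose owner does not exist in the child, so any lazy binding it needs before exec would wait forever. If every binding is resolved before the fork, the child never needs the lock at all.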
Awesome, what teamwork! Knowing what's wrong, and that there's a fix in a future Go, means we can work around it for the time being. |
@gopherbot please backport this to previous releases. This can cause programs that use plugins to hang on macOS. |
Backport issue(s) opened: #50245 (for 1.16), #50246 (for 1.17). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases. |
Change https://golang.org/cl/373094 mentions this issue: |
Change https://golang.org/cl/373095 mentions this issue: |
…ins on darwin

When building/using plugins on darwin, we need to use flat namespace so the same symbol from the main executable and the plugin can be resolved to the same address. Apparently, when using flat namespace the dynamic linker can hang at forkExec when resolving a lazy binding. Work around it by forcing early bindings.

Updates #38824.
Fixes #50245.

Change-Id: I983aa0a0960b15bf3f7871382e8231ee244655f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/372798
Trust: Cherry Mui
Reviewed-by: Than McIntosh
Reviewed-by: Ian Lance Taylor
Run-TryBot: Cherry Mui
TryBot-Result: Gopher Robot
(cherry picked from commit c5fee93)
Reviewed-on: https://go-review.googlesource.com/c/go/+/373095
…ins on darwin

When building/using plugins on darwin, we need to use flat namespace so the same symbol from the main executable and the plugin can be resolved to the same address. Apparently, when using flat namespace the dynamic linker can hang at forkExec when resolving a lazy binding. Work around it by forcing early bindings.

Updates #38824.
Fixes #50246.

Change-Id: I983aa0a0960b15bf3f7871382e8231ee244655f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/372798
Trust: Cherry Mui
Reviewed-by: Than McIntosh
Reviewed-by: Ian Lance Taylor
Run-TryBot: Cherry Mui
TryBot-Result: Gopher Robot
(cherry picked from commit c5fee93)
Reviewed-on: https://go-review.googlesource.com/c/go/+/373094
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes. Also with go version devel +be08e10b3b Fri May 1 21:57:29 2020 +0000 darwin/amd6.
It does not reproduce with Go 1.13.10.
What operating system and processor architecture are you using (go env)?
go env Output
Note that, in a different shell, GOROOT=/usr/local/go, corresponding to Go 1.14.2, where the same problem occurs (though, anecdotally, less frequently) compared to the recent commit, identified above, on master.
.What did you do?
Built a simple Go program, ran it repeatedly via:
What did you expect to see?
Repeated runs ad infinitum, no hang.
What did you see instead?
Occasionally (intermittently), the program hangs.
A SIGQUIT stack dump shows it usually hangs in the Command.Start() receiver in os/exec, which is not supposed to hang at all. (I've seen somewhat different stack traces across different programs and builds with different versions of Golang, but they all hang at that call or soon after.)
Such hangs have been observed (by me) only in programs that import plugin (even though they don't use it at all; only its initialization code should run). Comment out the _ "plugin" import line in the above program, rebuild (via go build), and rerun the ever-looping command, and it runs until manually stopped.