runtime: infinite loop in ./src/runtime/proc1.go lockextra() on both Linux & FreeBSD #13926
Does your program do a lot of callbacks from C to Go? Doing a web search turned up this: http://mpc.lists.freebsd.bugs.narkive.com/IiDB406k/bug-192926-new-patch-sched-yield-is-a-no-op. Is it possible that that bug applies to your system?
Yes. That's 90% of the input for my application. I believe line 5 of the stack trace refers to this function, which is invoked from a C library (the sendmail milter library): https://github.com/Freeaqingme/gomilter/blob/master/gomilter.go#L211
I did see that issue, but hadn't related it to this one because the fix was applied well before this kernel was compiled. I'll check whether this particular build includes the fix for that bug report. For the record, I'm not entirely sure where the bug would be. In theory it could be a locking issue in my application code (or so I was told on IRC), but I'm really not able to find it there. It could of course also be an issue with the kernel, but determining where exactly still goes a little over my head.
If that sched_yield bug does exist in your kernel, then you are in trouble because the spin lock in lockextra will burn all your CPU, which I think is approximately what you are seeing. If your kernel does not have that sched_yield bug, then I think the next step is to verify that this particular loop is the one that is generating all the sched_yield calls.
The kernel was built with commit r274401, which includes the fix for the FreeBSD bug linked above.
Any idea on how to do so?
Maybe it's good to mention that the C library (libmilter) spawns a new thread per incoming connection and then does a callback into my Go code. The machine this happens on handles a few connections at a time, max. However, running 'info threads' on the core dump shows that at the moment the machine exhibits the high load, there are well over 300 threads:
Backtraces all look very similar:
I don't know what is happening. The fact that it is happening for you on FreeBSD but not on GNU/Linux suggests a FreeBSD specific problem, but I don't know what it could be other than surprising behaviour of sched_yield.

Here is how it is supposed to work. There is a linked list of extra M structures to use for a callback from C to Go on a thread started by C (the structures are not needed on threads started by Go). That linked list starts out with one entry on it. When there is a callback from C to Go, the code calls needm, which calls lockextra(false). The lockextra function spins waiting for an M structure to be available. When one is available, if it is the only one available, the code sets the needextram flag. It then starts the callback. When the thread gets to Go proper, but before it actually calls the Go callback function, it checks the needextram flag. If it is set, the thread calls newextram. The newextram function allocates a new M structure and adds it to the list (using another call to lockextra, this time passing true). When the Go callback function completes, the runtime calls dropm. This adds the M structure back on the list, using another call to lockextra(true).

If a bunch of callbacks happen simultaneously, they will all collide getting an M structure. The first one to succeed will create a new M structure, handing it over to the next thread. When the callback completes, the M structure will be put back on the list. So the behaviour we are likely to see is that the first time there are a bunch of parallel callbacks, there is contention in lockextra until all the M structures are allocated. When all the callbacks complete, the linked list will have a bunch of M structures, so there should be no more issues until there is again a bunch of even more parallel callbacks. Is that the behaviour you see? An initial set of collisions that resolve, followed by normal behaviour?

The parallel callbacks vying for an M can theoretically starve the callback adding a new M to the list. This is not supposed to happen because of the call to osyield, and because getting the lock and failing to find an M to use will call usleep(1), giving time for the new M to appear. The call to usleep does add another possibility for OS-specific differences. The FreeBSD/AMD64 specific code looks fine to me, though. Does your truss listing show any calls to nanosleep?
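As a concrete illustration of the hand-off described above, here is a small, self-contained simulation of the protocol. It is not the real runtime code: the names are approximations, and runtime.Gosched and time.Sleep stand in for the runtime's osyield and usleep.

```go
// A simulation of the extra-M hand-off protocol described above. The real
// implementation lives in runtime/proc1.go (Go 1.5) and uses runtime-internal
// primitives; runtime.Gosched stands in for osyield (sched_yield) and
// time.Sleep for usleep(1). The list head is an index into a slice rather
// than a raw pointer so the sketch stays GC-safe.
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
	"time"
)

type extraM struct {
	id   int64
	next int64 // index of the next free M, or 0 for "none"
}

const lockedSentinel = -1

var (
	extraMs  []extraM // all Ms ever created; index 0 is unused
	listHead int64    // 0 = empty, -1 = locked, otherwise an index into extraMs
)

// lockExtra spins until it owns the list head. With nilOK=false it also
// insists on a non-empty list, sleeping briefly while the list is empty.
// This spin is what shows up as an endless stream of sched_yield calls.
func lockExtra(nilOK bool) int64 {
	for {
		old := atomic.LoadInt64(&listHead)
		if old == lockedSentinel {
			runtime.Gosched() // osyield in the real code
			continue
		}
		if old == 0 && !nilOK {
			time.Sleep(time.Microsecond) // usleep(1) in the real code
			continue
		}
		if atomic.CompareAndSwapInt64(&listHead, old, lockedSentinel) {
			return old // caller now owns the (possibly empty) list
		}
		runtime.Gosched()
	}
}

// unlockExtra publishes a new list head, releasing the lock.
func unlockExtra(head int64) {
	atomic.StoreInt64(&listHead, head)
}

func main() {
	// Seed the list with a single M, as the runtime does at startup.
	extraMs = []extraM{{}, {id: 1}}
	unlockExtra(1)

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ { // eight simulated C-to-Go callbacks contending
		wg.Add(1)
		go func() {
			defer wg.Done()
			head := lockExtra(false) // take an M, as needm does
			unlockExtra(extraMs[head].next)
			time.Sleep(time.Millisecond) // "run the callback"
			tail := lockExtra(true)      // put the M back, as dropm does
			extraMs[head].next = tail
			unlockExtra(head)
		}()
	}
	wg.Wait()
	fmt.Println("all simulated callbacks completed")
}
```

The key property is that a waiter which finds the head locked has nothing to block on; it can only yield and retry, which is why heavy contention shows up as a stream of sched_yield calls.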
I agree. But I haven't been able to reproduce on Linux 100% of the input these FreeBSD boxes are getting, so the fact that I haven't seen it on Linux is not entirely conclusive. I'm planning to add a Linux box to the set of boxes that have been demonstrating this behavior to see if it occurs there as well.
Not entirely. I think we can agree that there's contention somewhere, resulting in the 300+ threads listed above. However, once the load begins to rise, the application doesn't recover. Even when we wait 20 minutes before killing it, no cgo callback has been processed since the beginning of the outage. Other goroutines, including ones that invoke os/exec.Command().Run(), continue running as they should. I think this last bit must be related somehow: I run the application on 10 machines, 3 of those machines run the os/exec.Command().Run() code path, and somehow those are the only instances exhibiting any problems.
The truss output that we captured is only about 100 lines long; because 90 of those lines showed the exact same call, we didn't bother capturing any more. However, among those 100 lines there are no nanosleep() calls.
How long does it take this program to run on your FreeBSD system? On your GNU/Linux system?
Below are some results. Beware, though, that the Linux machine has 8 cores (including hyperthreads), while the FreeBSD machine has only two:
I've repeated the above a dozen times or so, all yielding similar results. One time, however, I again got a load that skyrocketed on FreeBSD. To stress the system a little more, I upped the THREADS constant to 650.
All results are similar to the ones posted above; 1.5s is the highest number observed (even with GOMAXPROCS set to 1, results are not too dissimilar, averaging ~650ms). Doing the same on FreeBSD, though:
Every few runs the load spikes. During the run that took >3 minutes, the load went up to 600. The >3 minute run is not an exception but rather a regular phenomenon. Even though the symptoms are the same (the load spikes to semi-infinite values), internally it's a little different. Truss shows:
A random backtrace taken during such an event:
According to https://golang.org/cl/18455, a sleep on FreeBSD always sleeps for the remainder of the time slice, which I think increases the sleep time from 1 microsecond to 1 millisecond. That might explain why FreeBSD tends to be slower than GNU/Linux when there is a lot of contention among threads waiting for an M structure.
@ianlancetaylor Thanks for the link, I hadn't found that one yet. That could surely explain why it's a little slower on FreeBSD than on Linux. But if performance were a little worse, that's something I could live with (in this particular instance performance is hardly a concern). However, the fact that usleep() gives up the rest of the time slice is a more or less constant factor, I reckon. That wouldn't yet explain why every once in a while it locks up on sched_yield(), right? Or, for that matter, why your test program takes 25 times as long on some runs compared to others?
Yes, there is something here I don't understand. But I'm thinking that I might understand why it's so much worse on FreeBSD. As you say, it's only a constant factor, but it's a constant factor of up to 1000. Now that you point it out, I also see a wide variance running my program on GNU/Linux. In 100 runs, the minimum is 18.227649ms, the maximum is 723.168546ms, the median is 53.283891ms, and the mean is 108.02260827ms.
I was just able to come up with this little snippet. I ran it ~10 times, and in two of those runs it showed the same problem almost right away: load soaring, and truss showing predominantly sched_yield() calls but also some nanosleep() calls. A representative snippet:
Code:
Using a larger number of threads (e.g. 250 instead of 50) triggers the issue almost immediately.
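The snippet itself is not preserved in this extract. As a rough reconstruction of the kind of program being described (many C-created threads repeatedly calling back into Go), a sketch might look like the two files below; the names, the thread count, and the callback body are assumptions, not the original code. Every C-to-Go callback on a C-created thread goes through needm/dropm and therefore through lockextra.

```go
// file: cthreads.go — C side: spawn N detached pthreads that repeatedly call
// back into Go. The C definitions live here because a file that uses
// //export may only contain declarations in its cgo preamble.
package main

/*
#cgo CFLAGS: -pthread
#cgo LDFLAGS: -pthread
#include <pthread.h>

extern void goCallback(int id); // exported from callback.go

static void *worker(void *arg) {
	int id = (int)(long)arg;
	for (int i = 0; i < 100000; i++) {
		goCallback(id); // forces needm/dropm (and lockextra) on each call
	}
	return NULL;
}

static void spawn(int n) {
	for (int i = 0; i < n; i++) {
		pthread_t t;
		pthread_create(&t, NULL, worker, (void *)(long)i);
		pthread_detach(t);
	}
}
*/
import "C"

const threads = 50 // raising this (e.g. to 250) reportedly triggers the issue faster

func spawnThreads() { C.spawn(C.int(threads)) }
```

```go
// file: callback.go — Go side: the exported callback plus a main that starts
// the C threads and reports progress.
package main

import "C"

import (
	"fmt"
	"sync/atomic"
	"time"
)

var calls int64

//export goCallback
func goCallback(id C.int) {
	atomic.AddInt64(&calls, 1) // a trivial amount of Go work per callback
}

func main() {
	spawnThreads()
	for {
		time.Sleep(time.Second)
		fmt.Println("callbacks so far:", atomic.LoadInt64(&calls))
	}
}
```

Building this with cgo enabled and watching it under truss or strace should show whether the sched_yield rate climbs with the thread count.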
At this point in the release cycle I think we have to postpone to 1.7.
Yeah, makes sense. For what it's worth, we migrated this application to Linux 6 days ago and the problem hasn't shown up since, under a load identical to the one that previously triggered this behavior within 12 hours. So far so good. Knocks on wood.
@ianlancetaylor What shall we do with this issue? Personally I have no interest in pursuing it any further. It appears the exact same set of syscalls is performed on FreeBSD as on Linux, and it works perfectly fine on the latter. My conclusion would be that it's some sort of race condition in the FreeBSD scheduler.
@ianlancetaylor Scratch what I said above about FreeBSD. I just witnessed the very same issue on Linux. The difference from FreeBSD is that there we saw the issue every day, whereas so far (3 to 4 months in) this is the first time we've seen it on Linux. Strace output simply showed lots of sched_yield() calls; I also have a pprof dump. The only thing I wasn't able to obtain was a core dump (at least I was able to install gdb with a load of 1000...).
I think I'm hitting this issue too, on Linux 3.16. I eventually see all of my threads that call Go code from C start spending 100% of their time in runtime.osyield. Strace also shows a ton of sched_yield calls. I'm currently checking whether using fewer threads makes this less likely to happen. If it happens again I'll see if I can repro with a smaller example.
/cc @dvyukov |
@adamfaulkner What version of Go? Can you provide a repro?
This was postponed to 1.7 from 1.6. Will it miss the boat again? We only have a week or so to formulate a fix here.
100% CPU consumption is an expected effect of a global spin mutex. If more than one thread is doing a cgo callback on an external thread, all but one wait. If the average rate of calls is higher than 1/num_threads, the process locks up as described. Shared library support added another global mutex on that path, which is now also used by C traceback support. If we resolve lockextra, the bottleneck will just move to another place.
@dvyukov I'm not sure I follow you entirely. I don't think having some sort of bottleneck is a problem, but the fact that everything locks up, and then stays locked up, is a problem. If we can prevent the latter I'd be a happy man.
The description says "It appears as if the application keeps running as it should".
That's a valid point. What I meant to say is that the load keeps on rising even though my load balancers stop feeding the application new data, so eventually the entire machine becomes unusable and the application needs to be killed. I wanted to see if things would settle automatically, but they didn't. By that time, however, the load had grown to >1000, so eventually the entire machine 'locks up' under the load.
I don't think a high loadavg will cause an operating system to lock up. What is the resource that causes your machine to lock up? My guess is ...
I've tried to reproduce the issue on tip on linux/amd64 using the program from #13926 (comment), both as is and with various changes that might increase the probability of hitting bugs in extram.
We clearly aren't going to do anything here for 1.7.
For 1.8 perhaps we should consider storing extram on the C side, not the Go side. Then we can use C mutexes and condition variables, and avoid the busy wait. Of course better still would be to avoid the global lock, but it's not clear to me how to do that.
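To illustrate the shape of that suggestion: the busy-wait gets replaced by a free list that waiters can block on. The sketch below uses Go's sync.Cond purely for illustration; the actual proposal is to keep the list on the C side, guarded by pthread mutexes and condition variables, since this code runs on threads the Go runtime does not yet own. The names and structure here are assumptions, not the eventual fix.

```go
// Illustrative only: a blocking free list of extra Ms guarded by a mutex and
// condition variable. Waiters sleep inside Wait() instead of burning CPU in a
// sched_yield loop.
package extram

import "sync"

// M is a placeholder for the runtime's extra-M record.
type M struct {
	next *M
}

var (
	mu   sync.Mutex
	cond = sync.NewCond(&mu)
	free *M // head of the free list
)

// getExtraM blocks until an M is available, without spinning.
func getExtraM() *M {
	mu.Lock()
	for free == nil {
		cond.Wait() // releases mu while asleep; no busy-wait
	}
	mp := free
	free = mp.next
	mu.Unlock()
	return mp
}

// putExtraM returns an M to the list and wakes one waiter.
func putExtraM(mp *M) {
	mu.Lock()
	mp.next = free
	free = mp
	mu.Unlock()
	cond.Signal()
}
```

The design point is that a thread which cannot get an M has something to block on, so a burst of simultaneous callbacks queues up quietly instead of saturating the CPU with sched_yield.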
CL https://golang.org/cl/25047 mentions this issue.
Howdy,
We've been seeing an issue recently on a couple of FreeBSD boxes. It appears as if the application keeps running as it should, but CPU usage rises to 100% and the load grows without bound.
Truss showed that the application was doing a lot of sched_yield syscalls. The following keeps repeating:
49270: sched_yield(0x6aa450,0x7fffd96cbdc8,0xbcbcb0,0xfffffffffffff000,0x0,0x2e) = 0 (0x0)
49270: sched_yield(0x6aa450,0x7fffd9ccedc8,0xbcbcb0,0xfffffffffffff000,0x0,0x2e) = 0 (0x0)
49270: sched_yield(0x6aa450,0x7fffd98ccdc8,0xbcbcb0,0xfffffffffffff000,0x0,0x2e) = 0 (0x0)
Stack trace from the core dump:
Based on the core dump that I have, I'm not 100% sure whether this thread is also the one causing all the problems, but given that lockextra() calls osyield() in a for loop it seems very plausible.
Having said that, I also run this app on a few other, very similar boxes where this problem does not show up. The difference in configuration is that on the systems that do show these symptoms I use cmd := os/exec.Command(); cmd.Run(). But I have no indication that that particular code is having any issues (https://github.com/Freeaqingme/ClueGetter/blob/61f0089/src/cluegetter/mailQueue.go#L107).
So far I have not been able to reproduce or trigger this behavior on Linux. The binary is compiled with Go 1.5.1, although the code in proc.go (proc.go & proc1.go have since been merged) and sys_freebsd_amd64.s haven't been modified since. FreeBSD version: 10.1-RELEASE #0 r274401.