Detach execution context scheduler from running thread during blocking syscall #15871
This changes the runtime behavior of threads: we no longer start a thread to run a specific scheduler run loop that never terminates in practice (except for the isolated context). Each thread now has its own inner loop that switches to a scheduler loop fiber (the scheduler's main fiber), then switches back to its inner loop (the thread's main fiber) to sleep for a while, and eventually terminates.

The benefit of the global thread pool is that threads are kept around instead of being created and thrown away. This is for example helpful for #15871, which will allow moving a scheduler to another thread, as well as for applications that regularly start an isolated fiber: they can keep reusing a pending thread instead of having to create one every time. Threads still eventually terminate after some configurable inactive time, except for the main thread, because we need to keep the main fiber's stack alive.

A future improvement could park MT threads back into the thread pool, instead of keeping them tied to the MT context. They could then be reused by any context that needs parallelism, or to boot a new isolated fiber or ST context, instead of sitting around.
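The inner loop described above could be sketched roughly as follows (a hedged sketch with invented names, not the PR's actual code):

```crystal
# Hypothetical sketch of a pool thread's inner loop (names are invented).
def thread_inner_loop(pool, keepalive : Time::Span)
  loop do
    if scheduler = pool.wait_for_scheduler(timeout: keepalive)
      # Switch to the scheduler's main fiber; it runs its run loop until
      # the scheduler terminates or is detached from this thread, then
      # switches back to this thread's main fiber.
      scheduler.main_fiber.resume
    else
      # No scheduler was attached within the keepalive delay:
      # let the thread terminate (except for the main thread).
      break
    end
  end
end
```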
Marks the scheduler as running a blocking syscall for the duration of the block. The monitor thread now ticks every 10ms to check whether any scheduler in any concurrent or parallel context is blocked on a syscall and, if so, tries to detach the scheduler from its thread. On success, the scheduler is moved to another thread, taken from the thread pool. The fiber doing the blocking syscall will still be blocked, but the other fibers can be resumed by the scheduler. When the blocking syscall returns, the thread tries to unmark the scheduler as running a blocking syscall. On success, the scheduler is still attached to the thread, so it simply continues. On failure, the scheduler has been moved to another thread, so the thread enqueues the current fiber into its execution context and returns itself to the thread pool.
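The mark/detach/unmark handshake can be modeled with a single atomic state per scheduler (an illustrative sketch with invented names, not the PR's actual implementation):

```crystal
# Illustrative model of the handshake (not the PR's code).
enum SyscallState
  None      # scheduler is running normally
  Syscall   # the thread is inside a (potentially) blocking syscall
  Detached  # SYSMON moved the scheduler to another thread
end

class SchedulerSketch
  @syscall_state = Atomic(SyscallState).new(SyscallState::None)

  # Called by the running thread around a potentially blocking syscall.
  def syscall(&)
    @syscall_state.set(SyscallState::Syscall) # atomic STORE
    ret = yield                               # the blocking syscall itself
    _, success = @syscall_state.compare_and_set(SyscallState::Syscall, SyscallState::None)
    unless success
      # SYSMON won the race: this scheduler now runs on another thread.
      # Re-enqueue the current fiber into the execution context and
      # return this thread to the pool (elided in this sketch).
    end
    ret
  end

  # Called by the monitor thread on each tick (every ~10ms).
  def try_detach? : Bool
    _, success = @syscall_state.compare_and_set(SyscallState::Syscall, SyscallState::Detached)
    # On success, move the scheduler to a thread taken from the pool.
    success
  end
end
```

This is where the "atomic STORE + atomic CAS per syscall" cost mentioned below comes from: the common case is that the syscall returns before SYSMON ticks, and the CAS succeeds.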
Rebased from master to bring #15885 along with a few fixes. Ready for review.
I finally got a trace in […]. The monitor thread detached the scheduler from the thread (it noticed it was doing a syscall) and the thread should now be a bare thread. We call […]. The thing is, the current thread is now a bare thread, so the condition […].

Theory A: […]

Theory B: the thread is being detached by the monitor at the same time the not-yet-bare thread tries to enqueue the fiber... and that's super dangerous: we must ALWAYS do a safe enqueue to the global queue 💣 💥 🤦
The thread may have long been detached, but it may also have just lost the atomic CAS against the monitor thread that hasn't detached it yet. Let's imagine the monitor thread gets preempted; the thread then checks that the current context + scheduler match, but is preempted before it can actually enqueue (ticking bomb); then the monitor moves the scheduler to another thread (oops); finally the thread tries a local enqueue (boom).
Theory B appears to have been correct. CI doesn't reproduce anymore.
…] (#16679) We must access `Errno.value` within the `Fiber.syscall(&)` block, not after the block has returned, because there can be a context switch before the method returns, and `errno` would no longer be valid.
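For illustration, this means capturing `errno` inside the block, along these lines (a hedged sketch; `path` and the returned tuple shape are invented for the example):

```crystal
# Sketch: errno must be read inside the syscall block. After the block
# returns there may have been a context switch, and errno could have
# been overwritten while another fiber was running.
fd, error = Fiber.syscall do
  fd = LibC.open(path, LibC::O_RDONLY)
  {fd, fd < 0 ? Errno.value : Errno::NONE}
end
```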
Some syscalls can block the current thread in certain circumstances, for example:

- `open(2)` when opening a FIFO, pipe or character device, until another end is connected (from another thread or process);
- `getaddrinfo(3)` until a DNS response (or error, or timeout) is received.

This patch introduces a mechanism to declare the scheduler as "doing a syscall", which the monitor thread (SYSMON) can detect on its next iteration; it will then try to move the scheduler to another thread, so that only the fiber doing the syscall is blocked, and the other fibers can be resumed.
Usually, the syscall terminates before the monitor thread notices (for example when opening a regular file), so the performance impact is an atomic STORE + an atomic CAS per syscall. At worst, a thread will be blocked for 10ms (the SYSMON frequency). For example, the updated FIFO-opening spec takes ~11ms to complete.
It works for the MT execution contexts and the ST context. It doesn't invalidate the ST guarantee that fibers in the context never run in parallel: the blocked fiber is blocked on a syscall and will be re-enqueued immediately after the syscall has completed; also, the syscalls don't invoke callbacks that would execute Crystal code, so AFAICT fibers still won't run in parallel (please correct me if I'm wrong).
NOTES
The isolated context expects to block, so the `#syscall(&)` method is a no-op there.

There are probably other blocking syscalls that we might want to consider. For example, reading from STDIN on Windows could be greatly simplified.
Another example is `flock`, which is currently retried every 100ms so that it doesn't block the current thread. We might want to be able to actively detach a scheduler when calling `#syscall(&)`, so we could try once (non-blocking), then on failure detach the scheduler and try again (blocking) without waiting for SYSMON to notice.

EXAMPLE
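The example code described in the next paragraph was not captured in this transcript; a hedged reconstruction might look like this (assuming the `Fiber.syscall(&)` API from this PR; the FIFO path and details are invented):

```crystal
# Hypothetical reconstruction, not the exact snippet from the PR.
spawn do
  loop do
    puts "tick"
    sleep 1.second
  end
end

# open(2) on a FIFO blocks until a writer connects. Wrapped in
# Fiber.syscall, SYSMON detaches the scheduler from this thread after
# ~10ms, so the ticking fiber keeps running despite the blocked thread.
fd = Fiber.syscall do
  LibC.open("my.fifo", LibC::O_RDONLY)
end
```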
The following example blocks the current thread, yet the spawned fiber keeps ticking every second. Remove the `Fiber.syscall` wrapper, and the fiber won't even start!

FOLLOW UP
We plan to use this in the future to rework and simplify use cases in the stdlib. For example:
- Polling event loops could support blocking file descriptors, so we could stop setting `O_NONBLOCK` on standard descriptors (shared), including pipes to spawned processes (Crystal's duplicated stdios are broken #16353).
- Same on Windows, where console streams don't support OVERLAPPED (Console streams are blocking on Windows #14576, Emulate non-blocking `STDIN` console on Windows #14947).

THREAD POOL
This PoC also introduces a pool of threads. It changes the behavior of threads: we don't start a thread to run a specific scheduler run loop; instead, each thread has its own inner loop that basically switches to a scheduler loop, then switches back to its inner loop to sleep.

The benefit of the global thread pool is that threads are kept around instead of being created and thrown away. If you regularly spawn an isolated fiber, it will likely keep reusing the same thread(s). Threads still eventually shut down after some inactive time (configurable), except for the main thread (we need to keep the main fiber alive).

A potential evolution would park MT threads into the thread pool, instead of keeping them tied to the MT context, so they can be reused by any context that needs parallelism, or to boot a new isolated fiber or ST context.

Extracted to #15885.
POTENTIAL ISSUES
I got one segfault in a GC call nested in a libxml2 callback in one early run of the std specs (with `-Dpreview_mt -Dexecution_context`), but I couldn't reproduce it after fixing different issues in the PR. Maybe it was a fluke (because of the bugs), or maybe it was just a regular MT issue with libxml2, or maybe SYSMON moved the scheduler from the main thread to another thread, then resumed a fiber doing something in libxml2, and the global thread-local state couldn't be found?

This is the already known MT issue we have with libxml2. What's new is that the segfault might start happening in a ST environment 😢
☝️ expected to have been fixed by #15899 (and #15906).