Add io_uring event loop (linux) #16264
ysbaddaden wants to merge 19 commits into crystal-lang:master
# immediately if the CQ lock couldn't be acquired.
def cq_trylock?(&)
  {% if flag?(:execution_context) %}
    if @cq_lock.try_lock
Meta: Would a try_synchronize method make sense?
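To make the question concrete, here is a minimal sketch of what such a helper could look like. Everything in it is an assumption for illustration: `TryLock` and `try_synchronize` are hypothetical names, and the lock is a toy built on `Atomic(Int32)` rather than the lock type used in this PR.

```crystal
# Hypothetical names throughout; this is not the PR's code.
class TryLock
  def initialize
    @held = Atomic(Int32).new(0)
  end

  # Acquires the lock without waiting; returns false if it is already held.
  def try_lock : Bool
    _, acquired = @held.compare_and_set(0, 1)
    acquired
  end

  def unlock : Nil
    @held.set(0)
  end

  # Runs the block only when the lock was free and always releases it
  # afterwards. Returns whether the block ran.
  def try_synchronize(&) : Bool
    return false unless try_lock
    begin
      yield
    ensure
      unlock
    end
    true
  end
end

lock = TryLock.new
outer = lock.try_synchronize do
  # The lock is held here, so a nested attempt backs off immediately.
  nested = lock.try_synchronize { }
  puts "nested acquired: #{nested}"
end
puts "outer acquired: #{outer}"
```

The `ensure` keeps the unlock paired with the successful `try_lock` even if the block raises, which is the main thing a dedicated method would buy over an open-coded `if try_lock`.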
Force-pushed from bf095d5 to 3940272.
Fixed a bug with execution contexts, and rebased from master. Ready for review!
Wow, ysbaddaden is such an amazing developer!
Force-pushed from a482b2d to 7684b4f.
Force-pushed from f2bb940 to 0ea44ff.
spec/std/http/spec_helper.cr (outdated)
if exc = done.receive
  raise exc
end
raise exception if exception
This helper seems to be dependent on fiber context switches. Weirdly it only fails on the OAuth2::Client specs.
With the patch EC + io_uring passes but all other cases hang.
Without the patch EC + io_uring fails but all other cases pass.
Force-pushed from 653017e to 24d5e4a.
straight-shoota left a comment:
I must admit that I'm not super familiar with the details of io_uring and did not scrutinize all the implementations in depth. So there's a good chance I would miss a logic bug if there is one.
But the code looks good overall and it has been more or less working for months now. So I'm pretty confident about merging it. Exposing it more easily for people to try out with their code should help find any issues that might still be there somewhere.
This is amazing work 🚀
- c/errno: EBADR
- c/poll: POLLIN, POLLERR, ...
- c/sys/mman: MAP_POPULATE, MADV_DONTFORK
- c/uio: iovec struct
- linux/io_uring: structs, constants and syscalls
Mostly boilerplate around creating the ring, detecting available features, mapping the buffers from kernel to user land, handling the SQ and CQ rings, ...
Force-pushed from 24d5e4a to ad1e647.
Rebased from master to bring the refactored Linux CI workflow, remove the custom changes, and merely add the io_uring jobs to the test stdlib matrix (4 in total).
# which will block the current thread until the ring has been fully drained (all
# the SQE have completed), at which point it will be unregistered from the event
# loop that will nillify the entry in the rings array.
class Crystal::EventLoop::IoUring < Crystal::EventLoop
- class Crystal::EventLoop::IoUring < Crystal::EventLoop
+ @[Experimental]
+ class Crystal::EventLoop::IoUring < Crystal::EventLoop
Implements an event loop that leverages io_uring on Linux targets.
Requirements
The event loop relies on several features that were added in different versions of the kernel. At a minimum Linux 5.19 is required, while the recent Linux 6.13 is recommended. It is thus compatible with Linux 6.1 SLTS but not with previous (S)LTS kernels.
The io_uring event loop is disabled by default. It must be enabled manually at compile time with the -Devloop=io_uring flag.
The SQPOLL feature is supported but disabled by default. It avoids syscalls on submissions & completions, which is very cool... but it uses lots of CPU 🔥. It can be enabled at compile time with the IORING_SQ_THREAD_IDLE environment variable (in milliseconds) that sets the idle time for the SQPOLL thread.
Implementation details
The basic implementation was straightforward. It's basically an async framework: submit an operation, suspend the fiber, and resume it when the operation has completed.
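The submit/suspend/resume cycle can be illustrated with a toy model. This is only a sketch under loud assumptions: `ToyRing`, `Operation` and `submit_and_wait` are hypothetical names, a plain fiber stands in for the kernel, and channels stand in for the submission and completion queues. It does not touch io_uring at all.

```crystal
# Toy model with hypothetical names; the "kernel" is just another fiber.
record Operation, payload : String, done : Channel(Int32)

class ToyRing
  def initialize
    @submissions = Channel(Operation).new(16)
    # Stand-in for the kernel: completes each submitted operation.
    spawn do
      while op = @submissions.receive?
        op.done.send(op.payload.bytesize) # the "result" of the operation
      end
    end
  end

  # Submits the operation, suspends the calling fiber, and returns the result
  # once the completion arrives.
  def submit_and_wait(payload : String) : Int32
    done = Channel(Int32).new
    @submissions.send(Operation.new(payload, done))
    done.receive # the calling fiber is suspended here until completion
  end

  def close : Nil
    @submissions.close
  end
end

ring = ToyRing.new
result = ring.submit_and_wait("hello")
puts result
ring.close
```

From the caller's point of view `submit_and_wait` looks like a blocking call, while only the fiber is parked and the thread stays free to run other fibers.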
This is also the second event loop that uses blocking IO after IOCP on Windows, and the first one on UNIX.
The main issue is a Linux limitation where close doesn't interrupt pending operations in the kernel, so we must, for example, shut down sockets and cancel pending operations on files.
Threads Support & Safety
The MT-safe implementation (preview_mt, execution_context) was much more complex. Unlike the other event loops, we can't have a single ring: it would require a lock on every submit, and with multiple threads that would create contention and would likely require syscalls (which would defeat the point). So we need one ring per thread (all sharing the same kernel resources).
There's thus a new API to register execution context schedulers with the event loop, so we can create/close rings as needed. Since a scheduler can shut down (e.g. after a resize down), the execution context must also drain its ring before the scheduler can stop: all the pending operations must have completed and all the pending fibers must have been enqueued.
We need cross-ring communication for a couple of scenarios: to interrupt a thread waiting on the event loop, and to cancel pending read/write file operations (the serial R/W of #16209 is required). At worst, this communication needs a lock on submit (which is avoided on Linux 6.13+). Unlike with a single ring, the lock should usually not be contended in practice (unless you open lots of files, read/write from many fibers to the same file and close from whatever fiber).
Unlike the other event loops, there isn't a single system instance for the whole event loop (e.g. one epoll, kqueue or IOCP), and each scheduler is responsible for its own completion queue... which means that we're back to the "a busy thread can block runnable fibers" problem: fibers sit in its completion queue while other threads might be starving. A busy thread can be running a CPU-bound fiber, or a pair of fibers that keep re-enqueueing each other.
To avoid this situation, once in a while, and every time a scheduler would wait on the event loop (starving), the event loop will instead iterate the completion rings and try to steal runnable fibers from other threads. That requires a lock on the completion queue, which should also usually not be contended (it is only taken once in a while).
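The stealing step can be sketched roughly as follows. This is a toy model, not the PR's implementation: `ToyCompletionQueue`, `trylock?` and `steal` are hypothetical names, strings stand in for runnable fibers, and the per-queue lock is a simple `Atomic(Int32)` flag. The point it illustrates is that a contended queue is skipped rather than waited on.

```crystal
# Toy model with hypothetical names; strings stand in for runnable fibers.
class ToyCompletionQueue
  def initialize(@ready : Array(String))
    @locked = Atomic(Int32).new(0)
  end

  # Yields the ready list only if the lock was free; returns nil when the
  # lock is already held by someone else.
  def trylock?(&)
    _, acquired = @locked.compare_and_set(0, 1)
    return nil unless acquired
    begin
      yield @ready
    ensure
      @locked.set(0)
    end
  end
end

# Called by a starving scheduler: visit the other queues and take at most
# `max` entries from each, skipping any queue whose lock is currently held.
def steal(queues : Array(ToyCompletionQueue), max : Int32) : Array(String)
  stolen = [] of String
  queues.each do |queue|
    queue.trylock? do |ready|
      stolen.concat(ready.shift(max))
    end
  end
  stolen
end

queues = [
  ToyCompletionQueue.new(["fiber-a", "fiber-b", "fiber-c"]),
  ToyCompletionQueue.new(["fiber-d"]),
]
# Takes fiber-a and fiber-b from the first queue and fiber-d from the second;
# fiber-c stays queued for its owner.
puts steal(queues, 2).join(", ")
```

Capping the number of entries taken per queue is a choice made for this sketch only; the source does not say how many fibers are stolen at a time.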
TODO
- segfault with musl-libc when initializing ~STDERR:const_read (doesn't happen with glibc) (it's actually raise that tries to initialize STDERR that depends on evloop that's not available).
- ENV["IORING_SQ_THREAD_IDLE"] with -Dio_uring_sq_thread_idle=200: let's use comptime flag values!
- *-linux-* with -Devloop=io_uring: that's a bunch more targets but std specs are quick (and compiler specs are irrelevant).
MAYBE
- IORING_SQ_THREAD_CPU ENV variable (sadly, it can't be changed after the ring is created).
- IORING_SQ_THREAD_IDLE ENV variable (sadly, it can't be changed after the ring is created).
Obsoletes #15634
Depends on #16209
Closes #10740