> - Go stores all timers into a min-heap (4-ary) but allocates timers in the GC HEAP and merely marks cancelled timers on delete. I didn't investigate how it deals with the tombstones.
This sounds worth investigating further to me. I actually wrote a message above describing exactly this, but I deleted it when I read this part. Tombstones can probably be kept around until dequeue, though there may be other opportune times to delete them if scanning/moving entries anyway.
It could be interesting to understand, but I wonder about the benefit.
Keeping the tombstones means they might stay for seconds, minutes or hours despite having been cancelled. The occupancy would no longer be how many active timers there are now, but the total number of timers created in the last N seconds/minutes/hours.
They also increase the cost of delete-min: it must be repeated multiple times until we reach a non-cancelled timer (not cool).
We'd have to allocate the event in the GC HEAP (we currently allocate events on the stack) and they'd stay allocated until they finally leave the 4-heap.
We can probably clear the tombstones we meet as we swap items (cool), but that means dereferencing each pointer, which reduces the CPU cache benefit of the flat array...
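To make the trade-off concrete, here is a minimal sketch (in Python, purely illustrative; neither Go's nor Crystal's actual code) of the tombstone approach under discussion: cancel is O(1) because it only flips a flag, but the entry lingers in the heap and delete-min has to skip over tombstones, repeating the pop until it reaches a live timer.

```python
import heapq
import itertools

class TimerHeap:
    """Min-heap of (deadline, id, cancelled) with lazy deletion (tombstones)."""

    def __init__(self):
        self._heap = []                 # flat array backing the binary heap
        self._entries = {}              # timer id -> entry, for O(1) cancel
        self._counter = itertools.count()

    def add(self, deadline):
        timer_id = next(self._counter)
        entry = [deadline, timer_id, False]   # last field: cancelled flag
        self._entries[timer_id] = entry
        heapq.heappush(self._heap, entry)
        return timer_id

    def cancel(self, timer_id):
        # O(1): just mark the tombstone; the entry stays in the heap
        # until it reaches the top, which may take seconds/minutes/hours.
        entry = self._entries.pop(timer_id, None)
        if entry is not None:
            entry[2] = True

    def pop_next(self):
        # Delete-min must skip tombstones, so one call may perform
        # several sift-downs -- exactly the cost pointed out above.
        while self._heap:
            deadline, timer_id, cancelled = heapq.heappop(self._heap)
            if not cancelled:
                self._entries.pop(timer_id, None)
                return deadline
        return None
```

Note that the occupancy effect described above falls out directly: `self._heap` holds every timer created since the last time its tombstones cycled out, not just the active ones.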
Yes, the practicalities of the solution might outweigh any performance benefit, but Go going that way is a signal to me that it's worth doing the benchmarking.
It could be a cleanup during a GC collection 🤔
Related to [RFC #12](crystal-lang/rfcs#12). Replaces the `Deque` used in #14996 with a min [Pairing Heap], which is a kind of [Mergeable Heap] and one of the best performing heaps in practical tests when arbitrary deletions are required (think cancelling a timeout); otherwise a [D-ary Heap] (e.g. a 4-heap) will usually perform better. See the [A Nearly-Tight Analysis of Multipass Pairing Heaps](https://epubs.siam.org/doi/epdf/10.1137/1.9781611973068.52) paper or the Wikipedia page for more details.

The implementation itself is based on the [Pairing Heaps: Experiments and Analysis](https://dl.acm.org/doi/pdf/10.1145/214748.214759) paper, and merely implements the recursive twopass algorithm (the auxiliary twopass might perform even better).

The `Crystal::PointerPairingList(T)` type is generic and relies on intrusive nodes (the links are into `T`) to avoid extra allocations for the nodes (same as `Crystal::PointerLinkedList(T)`). It also requires a `T#heap_compare` method, so we can use the same type for a min or max heap, or to build a more complex comparison.

Note: I also tried a 4-heap, and while it performs very well and only needs a flat array, arbitrary deletion (e.g. cancelling a timeout) needs a linear scan; its performance quickly plummets even at low occupancy, and becomes painfully slow at higher occupancy (tens of microseconds on _each_ delete, while the pairing heap does it in tens of nanoseconds).

Follow up to #14996

[Mergeable Heap]: https://en.wikipedia.org/wiki/Mergeable_heap
[Pairing Heap]: https://en.wikipedia.org/wiki/Pairing_heap
[D-ary Heap]: https://en.wikipedia.org/wiki/D-ary_heap

Co-authored-by: Linus Sellberg <linus.sellberg@nj.se>
Co-authored-by: Johannes Müller <straightshoota@gmail.com>
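For readers unfamiliar with the structure, here is a minimal sketch of a pairing heap with the two-pass delete-min, in Python for illustration only (the actual Crystal implementation uses intrusive links stored in `T` and a `#heap_compare` method; the `Node`, `meld`, `insert` and `delete_min` names below are mine):

```python
class Node:
    __slots__ = ("key", "child", "sibling")

    def __init__(self, key):
        self.key = key
        self.child = None    # leftmost child
        self.sibling = None  # next sibling in the child list

def meld(a, b):
    """Merge two heap roots in O(1): the larger root becomes
    the leftmost child of the smaller one."""
    if a is None:
        return b
    if b is None:
        return a
    if b.key < a.key:
        a, b = b, a
    b.sibling = a.child
    a.child = b
    return a

def insert(root, key):
    # Insertion is just a meld with a singleton node: O(1).
    return meld(root, Node(key))

def delete_min(root):
    """Two-pass: meld children in pairs left to right,
    then meld the pairs right to left."""
    pairs = []
    node = root.child
    while node is not None:
        a, b = node, node.sibling
        node = b.sibling if b else None
        if b:
            b.sibling = None
        a.sibling = None
        pairs.append(meld(a, b))
    new_root = None
    for subtree in reversed(pairs):
        new_root = meld(new_root, subtree)
    return root.key, new_root
```

The point the description makes about arbitrary deletion follows from the shape: because every element is a node with links (rather than a slot in a flat array), a cancelled timer can be unlinked from its parent/sibling chain directly, with no linear scan.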
Co-authored-by: Vlad Zarakovsky <vlad.zar@gmail.com>
> - determine the next timer to expire, so we can decide for how long a process or thread can be suspended (usually when there is nothing to do).
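The quoted point can be sketched in a few lines (Python, illustrative only; `next_timeout` is a hypothetical helper, and the list is assumed to be a `heapq`-style min-heap whose first element is the earliest deadline):

```python
def next_timeout(timer_deadlines, now):
    """How long the event loop may block: until the earliest deadline,
    or indefinitely (None) when no timer is armed."""
    if not timer_deadlines:
        return None                      # nothing scheduled: block until an event arrives
    earliest = timer_deadlines[0]        # heap min: next timer to expire
    return max(0.0, earliest - now)      # clamp: an expired timer means "don't block"
```

The returned value would be fed to whatever blocking wait the loop uses (epoll, kqueue, IOCP, ...).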
The need for this depends on the event loop. Is there any need at all for all this complexity if the event loop is driven by io_uring? Just emit a timeout event for each fiber that is waiting, and that's it. I'm all for shared code between underlying event loops that are limited in what they can do, but is there any reason to lock event loops into more structure?
It also supports timing out IO operations.
io_uring supporting timeouts on IO operations => ❤️
Each eventloop can use whatever it pleases, yet... there are still "select action timeouts" that can be cancelled, so even io_uring will need to support an arbitrary dequeue of timeouts.
Even with events to notify the blocking waits (which we use for epoll, kqueue and IOCP), we still need to rearm the timer after it triggered (for example) and need to know when the next timer is expiring. I don't think io_uring will be treated differently.
So far, my naive vision is for io_uring to notify an eventfd registered to an epoll instance, along with a timerfd (for precise timers), waiting on arbitrary fds (#wait_readable and #wait_writable) and eventually more niceties (e.g. signalfd and pidfd).
> Each eventloop can use whatever it pleases, yet... there are still "select action timeouts" that can be cancelled,
Yes. For example using the uring op TIMEOUT_REMOVE.
That said, it was considering what actually goes on in a select action loop that made me really dislike it in general. So much pointless teardown and rearming..
> so even io_uring will need to support an arbitrary dequeue of timeouts.
No, that does not follow. It may be an issue if we are not OK waiting for the response to the timeout removal, I guess, and it also needs to handle the race condition where the timer is already triggering and executes before the actual timeout removal. But it is definitely doable without.
> timerfd (for precise timers),
FWIW, the uring timeout op also takes timespec structs as arguments, with the same precision as timerfd. What uring doesn't seem to support is the periodic part of the argument, but instead there is a MULTISHOT flag if you want repeating triggers.
> So far, my naive vision is for io_uring to notify an eventfd registered to an epoll instance
I'd suggest not using epoll at all and instead use the uring POLL op, which does more or less the same but a lot simpler.
But in any case I guess it doesn't matter too much as it doesn't really impact the public interfaces so in the end it can be changed when the need arises..
Thanks for all the information! I'll have to dig much deeper into io_uring capabilities.
AFAIK all timeouts in the Linux kernel go into the timer wheel (tick based, low precision, and loses precision over time) while timers go into hrtimer (high precision, no ticks, nanosecond clock).
I'd expect io_uring timeouts to end up into the timer wheel, which is fine for timeouts, but I'd like to keep timerfd for sleep(seconds).