Skip to content

Fix: raise on manual fiber resume from sleep#15744

Merged
straight-shoota merged 4 commits intocrystal-lang:masterfrom
ysbaddaden:fix/evloop-raise-on-manual-resume-from-sleep
May 8, 2025
Merged

Fix: raise on manual fiber resume from sleep#15744
straight-shoota merged 4 commits intocrystal-lang:masterfrom
ysbaddaden:fix/evloop-raise-on-manual-resume-from-sleep

Conversation

@ysbaddaden
Copy link
Collaborator

We can suspend a fiber by calling sleep(time) and we expect the fiber to only be resumed when the sleep time expires, but there is nothing preventing to enqueue the fiber (because of errors), which will unexpectedly resume the fiber early.

This leads to at best errors (wake_at can't be nil) as well as segfaults because the event, allocated on the stack, is still in timers the sleep method returned, or to double resumes, ...

This should usually not happen, unless there is a programming error in the runtime, or someone wants to use sleep as a timeout mechanism (it's not).

I discovered this the hard way while implementing sync primitives. Having an explicit exception is much nicer than segfaults and timer related errors because Fiber.yield got manually resumed.

We can suspend a fiber by calling `sleep(time)` and we expect the fiber
to only be resumed when the sleep time expires, but there is nothing
preventing to enqueue the fiber (because of errors), which will
unexpectedly resume the fiber early.

This leads to at best errors (`wake_at` can't be nil) as well as
segfaults because the event, allocated on the stack, is still in timers
the sleep method returned, or to double resumes, ...

This should usually not happen, unless there is a programming error in
the runtime, but someone wants to use `sleep` as a timeout mechanism
(it's not).
@ysbaddaden
Copy link
Collaborator Author

Related: I think the Crystal::EventLoop::IOCP#timeout method ain't MT (execution context) safe: we try to cleanup after the fiber has resumed and before it returns (fine) but another thread might be running the evloop, and have dequeued the timer and can enqueue the fiber when it shouldn't.

This is missing a resume-once safety mechanism, as for the timeout action. Maybe we should introduce a general timeout mechanism for a sleep that can be safely resumed early, and it would allow to implement timeouts in sync primitives 🤔

@straight-shoota straight-shoota added this to the 1.17.0 milestone May 6, 2025
@straight-shoota straight-shoota merged commit 20f67be into crystal-lang:master May 8, 2025
38 of 39 checks passed
@ysbaddaden ysbaddaden deleted the fix/evloop-raise-on-manual-resume-from-sleep branch May 10, 2025 12:55
@straight-shoota straight-shoota added kind:bug A bug in the code. Does not apply to documentation, specs, etc. and removed kind:refactor labels Jun 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:runtime

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants