Add macos-latest to CI's os matrix #228
Conversation
Nice. Huh, so somewhat as expected, the hangs start during the cancellation tests (our most aggressive of the bunch).
@guilledk would you mind also putting a 5m timeout on the actions run?
Yeah, looks like we're gonna need a mac peep to check these out.
@goodboy Timeout done!
.github/workflows/ci.yml (outdated)

```diff
@@ -24,12 +24,12 @@ jobs:
   testing:
     name: '${{ matrix.os }} Python ${{ matrix.python }} - ${{ matrix.spawn_backend }}'
-    timeout-minutes: 10
+    timeout-minutes: 5
```
You'll probably need to bump this again (I think) due to increases in the test set.
Every subactor in the tree now receives the socket (or whatever the mailbox type ends up being) during startup and can call the new `tractor._discovery.get_root()` function to get a portal to the current root actor in their tree. The main reason for adding this atm is to support nested child actors gaining access to the root's tty lock for debugging. Also, when a channel disconnects from a message loop, might as well kill all its rpc tasks.
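A minimal sketch of how a task inside a subactor might use this, assuming `get_root()` is an async context manager yielding a portal (the exact calling convention here is an assumption):

```python
import tractor

async def use_root_portal() -> None:
    # runs inside any subactor in the tree; this sketch assumes
    # `get_root()` is an async context manager yielding a portal
    # to the current root actor
    async with tractor._discovery.get_root() as root_portal:
        # the portal can now be used to talk to the root, e.g. to
        # coordinate access to the root's tty lock while debugging
        ...
```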
This appears to demonstrate the same bug found in #156. It looks like cancelling a subactor with a child, while that child is running sync code, can result in the child never getting cancelled due to some strange condition where the internal nurseries aren't being torn down as expected when a `trio.Cancelled` is raised.
For reliable remote cancellation we need to "report" `trio.Cancelled`s (just like any other error) when exhausting a portal such that the caller can make decisions about cancelling the respective actor if need be. Resolves #156
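A rough sketch of that reporting approach, assuming a `portal.result()`-style call for retrieving the final value (the helper name and return convention are illustrative, not necessarily tractor's actual code):

```python
import trio

async def exhaust_portal(portal):
    # illustrative helper: wait for the far end's final result and hand
    # back *any* outcome, including cancellation, to the caller
    try:
        final = await portal.result()  # assumed portal API
    except (Exception, trio.Cancelled) as err:
        # report the error/cancellation instead of swallowing it so the
        # caller can decide whether to cancel the corresponding actor
        return err
    else:
        return final
```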
Add `Actor._cancel_called` and `._cancel_complete`, making it possible to determine whether the actor has started the cancellation sequence and whether that sequence has fully completed. This allows internal machinery tasks to block as necessary. Also, always trigger the end of ongoing rpc tasks even if the last task errors; trio's cancellation semantics alone won't guarantee us a nice internal "state" without this.
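A minimal sketch of the two-flag pattern, under the assumption that the "complete" flag is an event machinery tasks can wait on (types and surrounding structure are illustrative, not tractor's actual implementation):

```python
import trio

class Actor:
    def __init__(self) -> None:
        # set as soon as the cancellation sequence is requested
        self._cancel_called: bool = False
        # set only once that sequence has fully completed
        self._cancel_complete: trio.Event = trio.Event()

    async def cancel(self) -> None:
        self._cancel_called = True
        # ... cancel rpc tasks, tear down nurseries, etc. ...
        self._cancel_complete.set()

    async def wait_cancelled(self) -> None:
        # internal machinery tasks can block here until teardown is done
        await self._cancel_complete.wait()
```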
This reverts commit 2b53c74.
This reverts commit 1710b64.
If the root calls `trio.Process.kill()` during immediate child proc teardown while the child is using pdb, we can get stdstream clobbering that results in a pdb++ repl where the user can't see what's been typed. Not killing such children on cancellation / error seems to resolve this issue whilst still giving reliable termination. For now, keep that special path coded in until it becomes a problem for ensuring zombie reaps.
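A hedged sketch of that special-cased teardown; `proc` and `child_in_debug` are illustrative names, not the actual variables:

```python
import trio

async def reap_child(proc: trio.Process, child_in_debug: bool) -> None:
    # skip the hard kill when the child holds the debugger/tty so its
    # pdb++ repl and stdstreams stay usable
    if not child_in_debug:
        proc.kill()
    # either way, wait for the process to actually exit (zombie reap)
    await proc.wait()
```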
Finally, this makes a cancelled root actor nursery not clobber child tasks which request and lock the root's tty for the debugger repl. Using an edge-triggered event which is set after all fifo-lock-queued tasks are complete, we can be sure that no lingering child tasks are going to get interrupted during pdb use and tty lock acquisition. Further, even if new tasks do queue up to get the lock, the root will incrementally send cancel msgs to each sub-actor only once the tty is not locked by a (set of) child request task(s). Add shielding around all the critical sections where the child attempts to allocate the lock from the root, such that it won't be disrupted by cancel messages from the root after the lock-acquire transaction has started.
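A minimal sketch of the shielding idea on the child side, assuming a hypothetical `request_root_tty_lock()` coroutine that performs the lock transaction with the root:

```python
import trio

async def acquire_debug_lock(request_root_tty_lock) -> None:
    # shield the whole acquire transaction so a cancel message arriving
    # from the root can't interrupt it once it has started
    with trio.CancelScope(shield=True):
        await request_root_tty_lock()  # hypothetical lock-request call
```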
We may get multiple re-entries to the debugger by the `bp_forever` sub-actor now, since the root will incrementally try to cancel it only when the tty lock is not held.
Woah I wrecked this PR, let me close it and re-do it
@guilledk lol can we get a new version?
This simply adds `macos-latest` to the testing environments. There seem to be a few different failures as well as hangs using the `trio` backend. You can see the actions run on my fork: https://github.com/guilledk/tractor/actions/runs/1094209479
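For reference, a hedged sketch of what the os matrix might look like with this change; apart from `macos-latest`, the values below are assumptions rather than the repo's actual workflow contents:

```yaml
# illustrative excerpt of .github/workflows/ci.yml, not the real file
jobs:
  testing:
    name: '${{ matrix.os }} Python ${{ matrix.python }} - ${{ matrix.spawn_backend }}'
    timeout-minutes: 10
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest]   # macos-latest added by this PR
        python: ['3.9']                     # assumed version
        spawn_backend: ['trio', 'mp']       # assumed backend names
```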