Add macos-latest to CI's os matrix #228

Closed
wants to merge 337 commits into from
Conversation

guilledk (Collaborator) commented Aug 3, 2021

This simply adds macos-latest to the testing environments.

There seem to be a few different failures as well as hangs when using the trio backend.

You can see the actions run on my fork: https://github.com/guilledk/tractor/actions/runs/1094209479

goodboy (Owner) commented Aug 3, 2021

Nice.

Huh, so somewhat as expected, the hangs start during the cancellation tests (our most aggressive of the bunch).

tests/test_cancellation.py FF.F.....F......F.

@guilledk would you mind also putting a 5m timeout on the actions run?
I don't think I've seen a clean job that takes longer than that.

goodboy (Owner) commented Aug 3, 2021

Yeah looks like we're gonna need a mac peep to check these out.
I'm frankly somewhat surprised this happens on the trio/subprocess spawner of all places.

guilledk (Collaborator, Author) commented Aug 3, 2021

@goodboy Timeout done!

goodboy (Owner) commented Oct 23, 2021

@guilledk rebase this onto master and let's see if #245 fixed some of these failures yah?

@@ -24,12 +24,12 @@ jobs:
   testing:
     name: '${{ matrix.os }} Python ${{ matrix.python }} - ${{ matrix.spawn_backend }}'
-    timeout-minutes: 10
+    timeout-minutes: 5
goodboy (Owner) commented on the diff:

You'll probably need to bump this again (I think) due to increases in the test set.

Every subactor in the tree now receives the socket (or whatever the
mailbox type ends up being) during startup and can call the new
`tractor._discovery.get_root()` function to get a portal to the current
root actor in their tree. The main reason for adding this atm is to
support nested child actors gaining access to the root's tty lock for
debugging.

Also, when a channel disconnects from a message loop, might as well kill
all its rpc tasks.
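
A minimal usage sketch of what this enables, assuming `get_root()` can be used as an async context manager yielding a `Portal` to the root actor (the exact signature in `tractor._discovery` may differ, and `some_root_side_fn` plus the portal-call style are hypothetical here):

```python
from tractor._discovery import get_root


async def some_root_side_fn() -> str:
    # hypothetical function that would run inside the root actor
    return 'hi from the root'


async def child_side_task() -> None:
    # any subactor in the tree can now reach back up to the root, e.g.
    # to coordinate access to the root's tty lock while debugging
    async with get_root() as portal:
        # `portal` is assumed to behave like any other `tractor.Portal`
        print(await portal.run(some_root_side_fn))
```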
This appears to demonstrate the same bug found in #156. It looks like
cancelling a subactor with a child, while that child is running sync code,
can result in the child never getting cancelled due to some strange
condition where the internal nurseries aren't being torn down as
expected when a `trio.Cancelled` is raised.
For reliable remote cancellation we need to "report" `trio.Cancelled`s
(just like any other error) when exhausting a portal such that the
caller can make decisions about cancelling the respective actor if need
be.

Resolves #156
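
For concreteness, a rough reproduction sketch of the scenario just described; the spawn/cancel calls used here (`tractor.open_nursery()`, `run_in_actor()`, `Portal.cancel_actor()`) are assumed from tractor's public API and may not exactly match the version under test:

```python
import time

import trio
import tractor


async def block_in_sync_code() -> None:
    # the grandchild "runs sync code": it can't observe a
    # `trio.Cancelled` until this blocking call returns
    time.sleep(999)


async def spawn_blocking_child() -> None:
    # the middle subactor spawns its own child
    async with tractor.open_nursery() as n:
        await n.run_in_actor(block_in_sync_code)


async def main() -> None:
    async with tractor.open_nursery() as n:
        portal = await n.run_in_actor(spawn_blocking_child)
        await trio.sleep(1)
        # per the report, this cancel can hang: the grandchild is never
        # torn down because its internal nurseries don't unwind on
        # `trio.Cancelled` as expected
        await portal.cancel_actor()


if __name__ == '__main__':
    trio.run(main)
```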
Add `Actor._cancel_called` and `._cancel_complete` making it possible to
determine whether the actor has started the cancellation sequence and
whether that sequence has fully completed. This allows for blocking in
internal machinery tasks as necessary. Also, always trigger the end of
ongoing rpc tasks even if the last task errors; trio's cancellation
semantics alone won't guarantee us a nice internal "state" without this.
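
A rough sketch (not tractor's actual implementation) of the two-flag pattern this describes, using `trio.Event`s so internal machinery tasks can block until cancellation has started or fully completed:

```python
import trio


class Actor:
    def __init__(self) -> None:
        # set once the cancellation sequence has been requested
        self._cancel_called = trio.Event()
        # set only after that sequence has fully completed
        self._cancel_complete = trio.Event()

    async def cancel(self) -> None:
        self._cancel_called.set()
        try:
            await self._cancel_rpc_tasks()
        finally:
            # always signal completion, even if the last rpc task errored
            self._cancel_complete.set()

    async def _cancel_rpc_tasks(self) -> None:
        ...  # tear down any ongoing rpc tasks here
```

Internal tasks can then `await actor._cancel_complete.wait()` before touching any shared teardown state.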
goodboy and others added 26 commits October 24, 2021 18:39
If the root calls `trio.Process.kill()` on immediate child proc teardown
when the child is using pdb, we can get stdstream clobbering that
results in a pdb++ repl where the user can't see what's been typed. Not
killing such children on cancellation / error seems to resolve this
issue whilst still giving reliable termination. For now, keep that
special path in place until it becomes a problem for ensuring zombie
reaps.
Finally this makes a cancelled root actor nursery not clobber child
tasks which request and lock the root's tty for the debugger repl.
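
A tiny sketch (with an assumed `child_in_debug` flag, not tractor's actual spawner code) of the special-cased teardown path described above:

```python
import trio


async def reap(proc: trio.Process, child_in_debug: bool) -> None:
    if not child_in_debug:
        # normal path: hard-kill the child process immediately
        proc.kill()
    # in the debug case just wait for the child to exit on its own so
    # its (and pdb++'s) stdstreams aren't clobbered mid-session
    await proc.wait()
```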

Using an edge-triggered event which is set after all fifo-lock-queued
tasks are complete, we can be sure that no lingering child tasks are
going to get interrupted during pdb use and tty lock acquisition.
Further, even if new tasks do queue up to get the lock, the root will
incrementally send cancel msgs to each sub-actor only once the tty is
not locked by a (set of) child request task(s). Add shielding around all
the critical sections where the child attempts to acquire the lock from
the root such that it won't be disrupted by cancel messages from the
root after the lock-acquire transaction has started.
We may get multiple re-entries into the debugger by the `bp_forever`
sub-actor now since the root will incrementally try to cancel it only
when the tty lock is not held.
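
A simplified sketch of that combination (assumed names, not tractor's internals): a fifo lock, a "no waiters left" event, and shielding around the child-side acquire so a root-sent cancel can't interrupt the transaction once started:

```python
import trio

_debug_lock = trio.StrictFIFOLock()
_no_tty_waiters = trio.Event()   # edge triggered: set once the queue drains
_waiter_count = 0


async def with_tty_lock(repl_fn) -> None:
    global _no_tty_waiters, _waiter_count
    _waiter_count += 1
    _no_tty_waiters = trio.Event()  # a new waiter re-arms the event
    try:
        # shield so a cancel msg from the root mid-acquire can't leave the
        # lock (or an in-progress repl session) half torn down
        with trio.CancelScope(shield=True):
            async with _debug_lock:
                await repl_fn()
    finally:
        _waiter_count -= 1
        if _waiter_count == 0:
            # only now is it safe for the root to start cancelling sub-actors
            _no_tty_waiters.set()
```

The root side would then `await _no_tty_waiters.wait()` before incrementally sending cancel msgs to each sub-actor.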
guilledk (Collaborator, Author) commented:

Woah I wrecked this PR, let me close it and re-do it

goodboy (Owner) commented Dec 7, 2021

@guilledk lol can we get a new version?
