Add macos-latest to CI's os matrix #228

Closed
wants to merge 337 commits into from
Conversation

guilledk (Collaborator) commented Aug 3, 2021

This simply adds macos-latest to the testing environments.

There seem to be a few different failures as well as hangs when using the trio backend.

You can see the actions run on my fork: https://github.com/guilledk/tractor/actions/runs/1094209479

goodboy (Owner) commented Aug 3, 2021

Nice.

Huh, so somewhat as expected, the hangs start during the cancellation tests (our most aggressive of the bunch).

tests/test_cancellation.py FF.F.....F......F.

@guilledk would you mind also putting a 5m timeout on the actions run?
I don't think I've seen a clean job that takes longer than that.

goodboy (Owner) commented Aug 3, 2021

Yeah looks like we're gonna need a mac peep to check these out.
I'm frankly somewhat surprised this happens on the trio/subprocess spawner of all places.

guilledk (Collaborator, Author) commented Aug 3, 2021

@goodboy Timeout done!

goodboy (Owner) commented Oct 23, 2021

@guilledk rebase this onto master and let's see if #245 fixed some of these failures yah?

@@ -24,12 +24,12 @@ jobs:
   testing:
     name: '${{ matrix.os }} Python ${{ matrix.python }} - ${{ matrix.spawn_backend }}'
-    timeout-minutes: 10
+    timeout-minutes: 5
goodboy (Owner) commented on the diff:

You'll probably need to bump this again (I think) due to increases in the test set.

Every subactor in the tree now receives the socket (or whatever the
mailbox type ends up being) during startup and can call the new
`tractor._discovery.get_root()` function to get a portal to the current
root actor in their tree. The main reason for adding this atm is to
support nested child actors gaining access to the root's tty lock for
debugging.

Also, when a channel disconnects from a message loop, might as well kill
all its rpc tasks.
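
A minimal usage sketch of what this enables, assuming `get_root()` can be used as an async context manager yielding a `Portal` to the root actor (the exact signature in `tractor._discovery` may differ, and `some_root_side_fn` plus the portal-call style are hypothetical here):

```python
from tractor._discovery import get_root


async def some_root_side_fn() -> str:
    # hypothetical function that would run inside the root actor
    return 'hi from the root'


async def child_side_task() -> None:
    # any subactor in the tree can now reach back up to the root, e.g.
    # to coordinate access to the root's tty lock while debugging
    async with get_root() as portal:
        # `portal` is assumed to behave like any other `tractor.Portal`
        print(await portal.run(some_root_side_fn))
```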
This appears to demonstrate the same bug found in #156. It looks like
cancelling a subactor with a child, while that child is running sync code,
can result in the child never getting cancelled due to some strange
condition where the internal nurseries aren't being torn down as
expected when a `trio.Cancelled` is raised.
For reliable remote cancellation we need to "report" `trio.Cancelled`s
(just like any other error) when exhausting a portal such that the
caller can make decisions about cancelling the respective actor if need
be.

Resolves #156
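
For concreteness, a rough reproduction sketch of the scenario just described; the spawn/cancel calls used here (`tractor.open_nursery()`, `run_in_actor()`, `Portal.cancel_actor()`) are assumed from tractor's public API and may not exactly match the version under test:

```python
import time

import trio
import tractor


async def block_in_sync_code() -> None:
    # the grandchild "runs sync code": it can't observe a
    # `trio.Cancelled` until this blocking call returns
    time.sleep(999)


async def spawn_blocking_child() -> None:
    # the middle subactor spawns its own child
    async with tractor.open_nursery() as n:
        await n.run_in_actor(block_in_sync_code)


async def main() -> None:
    async with tractor.open_nursery() as n:
        portal = await n.run_in_actor(spawn_blocking_child)
        await trio.sleep(1)
        # per the report, this cancel can hang: the grandchild is never
        # torn down because its internal nurseries don't unwind on
        # `trio.Cancelled` as expected
        await portal.cancel_actor()


if __name__ == '__main__':
    trio.run(main)
```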
Add `Actor._cancel_called` and `._cancel_complete` making it possible to
determine whether the actor has started the cancellation sequence and
whether that sequence has fully completed. This allows for blocking in
internal machinery tasks as necessary. Also, always trigger the end of
ongoing rpc tasks even if the last task errors; trio's cancellation
semantics alone won't guarantee us a nice internal "state" without this.
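
A rough sketch (not tractor's actual implementation) of the two-flag pattern this describes, using `trio.Event`s so internal machinery tasks can block until cancellation has started or fully completed:

```python
import trio


class Actor:
    def __init__(self) -> None:
        # set once the cancellation sequence has been requested
        self._cancel_called = trio.Event()
        # set only after that sequence has fully completed
        self._cancel_complete = trio.Event()

    async def cancel(self) -> None:
        self._cancel_called.set()
        try:
            await self._cancel_rpc_tasks()
        finally:
            # always signal completion, even if the last rpc task errored
            self._cancel_complete.set()

    async def _cancel_rpc_tasks(self) -> None:
        ...  # tear down any ongoing rpc tasks here
```

Internal tasks can then `await actor._cancel_complete.wait()` before touching any shared teardown state.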
goodboy and others added 26 commits October 24, 2021 18:39
If the root calls `trio.Process.kill()` on immediate child proc teardown
when the child is using pdb, we can get stdstream clobbering that
results in a pdb++ repl where the user can't see what's been typed. Not
killing such children on cancellation / error seems to resolve this
issue whilst still giving reliable termination. For now, keep that
special path in place until it becomes a problem for ensuring zombie
reaps.
Finally this makes a cancelled root actor nursery not clobber child
tasks which request and lock the root's tty for the debugger repl.
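
A tiny sketch (with an assumed `child_in_debug` flag, not tractor's actual spawner code) of the special-cased teardown path described above:

```python
import trio


async def reap(proc: trio.Process, child_in_debug: bool) -> None:
    if not child_in_debug:
        # normal path: hard-kill the child process immediately
        proc.kill()
    # in the debug case just wait for the child to exit on its own so
    # its (and pdb++'s) stdstreams aren't clobbered mid-session
    await proc.wait()
```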

Using an edge-triggered event which is set after all fifo-lock-queued
tasks are complete, we can be sure that no lingering child tasks are
going to get interrupted during pdb use and tty lock acquisition.
Further, even if new tasks do queue up to get the lock, the root will
incrementally send cancel msgs to each sub-actor only once the tty is
not locked by a (set of) child request task(s). Add shielding around all
the critical sections where the child attempts to acquire the lock from
the root such that it won't be disrupted by cancel messages from the
root after the lock-acquire transaction has started.
We may get multiple re-entries into the debugger by the `bp_forever`
sub-actor now since the root will incrementally try to cancel it only
when the tty lock is not held.
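
A simplified sketch of that combination (assumed names, not tractor's internals): a fifo lock, a "no waiters left" event, and shielding around the child-side acquire so a root-sent cancel can't interrupt the transaction once started:

```python
import trio

_debug_lock = trio.StrictFIFOLock()
_no_tty_waiters = trio.Event()   # edge triggered: set once the queue drains
_waiter_count = 0


async def with_tty_lock(repl_fn) -> None:
    global _no_tty_waiters, _waiter_count
    _waiter_count += 1
    _no_tty_waiters = trio.Event()  # a new waiter re-arms the event
    try:
        # shield so a cancel msg from the root mid-acquire can't leave the
        # lock (or an in-progress repl session) half torn down
        with trio.CancelScope(shield=True):
            async with _debug_lock:
                await repl_fn()
    finally:
        _waiter_count -= 1
        if _waiter_count == 0:
            # only now is it safe for the root to start cancelling sub-actors
            _no_tty_waiters.set()
```

The root side would then `await _no_tty_waiters.wait()` before incrementally sending cancel msgs to each sub-actor.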
guilledk (Collaborator, Author) commented:

Woah I wrecked this PR, let me close it and re-do it

goodboy (Owner) commented Dec 7, 2021

@guilledk lol can we get a new version?
