Inconsistent behaviour when child coroutine attaches to the parent during "completing" -> "completed" transition #3893

Closed
qwwdfsad opened this issue Sep 19, 2023 · 0 comments
qwwdfsad commented Sep 19, 2023

Steps to reproduce:

// Add this test to JobChildStressTest

@Test
fun testFailingChildIsAddedWhenJobFinalizesItsState() {
    // All exceptions should get aggregated here
    repeat(N_ITERATIONS) {
        runBlocking {
            val rogueJob = AtomicReference<Job?>()
            println(it)
            val deferred = CompletableDeferred<Unit>()
            launch(pool + deferred) {
                deferred.complete(Unit) // Transition deferred into "completing" state waiting for current child
                // **Asynchronously** submit task that launches a child so it races with completion
                pool.executor.execute {
                    rogueJob.set(launch(pool + deferred) {
                        println("isCancelled: " + coroutineContext.job.isCancelled)
                        throw TestException()
                    })
                }
            }

            deferred.join()
            if (rogueJob.get()?.isActive == true) {
                val rogue = rogueJob.get()!!
                println("Rogue job with parent " + rogue.parent + " and children list: " + rogue.parent?.children?.toList())
            }
        }
    }
}

What happens here:

  • The deferred is in the "completing" state, waiting for the ChildCompletion handler of the first launch (1) to finalize its state
  • ChildCompletion invokes continueCompleting
  • In parallel, the second launch (2) attaches to the deferred

Three outcomes are possible, depending on how the race resolves:

  1. Happy path: 2 attaches to the parent successfully, 1 detects this in continueCompleting and starts waiting for it. This is indistinguishable from the deferred having had two children all along.

  2. Unhappy path #1: 1 finds no children and invokes finalizeFinishingState.
    Then 2 attaches itself to the parent. finalizeFinishingState reaches completeStateFinalization -> notifyCompletion and cancels the child, which may already have been running for some time.
    This behaviour is observable and counter-intuitive, because nothing actually failed or was cancelled explicitly.
    Also, if 2 fails with an exception, the exception is reported to the global exception handler.

  3. Unhappy path #2: the same as above, but 2 attaches itself to the parent after the parent has completely finalized its state.
    This leaves a completed deferred with no children and an active, non-cancelled coroutine whose parent points to the deferred.

Note that outcome 2 roughly emulates the existing behaviour "an attempt to attach as a child to an already completed job immediately cancels the attaching job".
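
For comparison, the non-racy baseline mentioned in the note can be observed in isolation. The following is a minimal sketch, assuming kotlinx.coroutines is on the classpath; the helper name `attachToCompletedParent` is made up for this illustration and is not part of the library or of the test suite. It completes a `CompletableDeferred` that has no children, and only then launches a coroutine with that deferred as its parent.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper for illustration: launches a child into a parent
// that has already completed and reports what happened to the child.
// Returns (child.isCancelled, whether the child's body ran).
fun attachToCompletedParent(): Pair<Boolean, Boolean> = runBlocking {
    val parent = CompletableDeferred<Unit>()
    parent.complete(Unit) // no children yet, so the parent completes right away

    var bodyRan = false
    // The parent is already completed when the child tries to attach.
    val child = launch(parent) { bodyRan = true }
    child.join()

    child.isCancelled to bodyRan
}

fun main() {
    val (cancelled, bodyRan) = attachToCompletedParent()
    println("child.isCancelled = $cancelled, bodyRan = $bodyRan")
}
```

In this sequential case the child should be cancelled on attachment and its body should never run. The unhappy paths above differ in that the child attaches while the parent is still finalizing (so an already running child gets cancelled), or is never cancelled at all.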

dkhalanskyjb added a commit that referenced this issue May 22, 2024
The internal implementation of `JobSupport` no longer uses
the Double-Compare Single-Swap algorithm.
Instead, the signal for the list to stop accepting this or that
kind of elements is provided explicitly.
In addition to simplifying the implementation somewhat,
this change allowed us to more precisely define when child
nodes should stop being accepted into the list, fixing a bug.

Fixes #3893

Additionally, new stress tests are added to ensure the correct
behavior.
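
The idea from the commit message, a list that is told explicitly when to stop accepting a given kind of element, can be illustrated with a deliberately simplified sketch. This is not the `JobSupport` implementation: the class `ClosableChildList` and its methods are hypothetical, and a plain lock is used here for brevity where the real code is presumably lock-free.

```kotlin
// Hypothetical illustration only; none of these names come from kotlinx.coroutines.
class ClosableChildList<T> {
    private val lock = Any()
    private val children = mutableListOf<T>()
    private var closedForChildren = false

    /** Adds [child] unless the list was already closed; returns whether it was added. */
    fun tryAdd(child: T): Boolean = synchronized(lock) {
        if (closedForChildren) {
            false
        } else {
            children.add(child)
            true
        }
    }

    /** Stops accepting children and returns a snapshot of those added so far. */
    fun close(): List<T> = synchronized(lock) {
        closedForChildren = true
        children.toList()
    }
}

fun main() {
    val list = ClosableChildList<String>()
    println(list.tryAdd("child-1")) // added before the list is closed
    val snapshot = list.close()     // parent decides to finalize its state
    println(list.tryAdd("child-2")) // rejected: the caller must handle the late child itself
    println(snapshot)
}
```

Because closing and adding are serialized, a late child either appears in the snapshot the parent waits on, or learns from the `false` result that it arrived too late. Under this sketch there is no window in which a child is attached but unnoticed, which is the gap behind the unhappy paths described in the issue.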