Stronger `close()` semantics for `TaskTracker` #6759

conradludgate · 2024-08-08T10:15:05Z

Is your feature request related to a problem? Please describe.

TaskTracker is a great solution for waiting for tasks to shutdown, but there's a race condition if used incorrectly. Take this code

// routine is not cancel safe - we must wait for it to complete before
// exiting the application.
async fn routine() {}

fn spawn(tracker: &TaskTracker) -> Result<(), Error> {
    if tracker.is_closed() { return Err(Error); }
    
    tracker.spawn(routine());
    Ok(())
}

async fn shutdown(tracker: TaskTracker) {
    // stop spawning
    tracker.close();
    tracker.wait().await;
}

#[tokio::main]
async fn main() {
    let tracker = TaskTracker::new();

    let tracker2 = tracker.clone();
    tokio::spawn(async move {
        loop {
            tokio::time::sleep(Duration::from_secs(10)).await;
            spawn(&tracker2);
        }
    });
    
    crtl_c().await;
    shutdown(tracker).await;
}

If we're unlucky, this program might experience the ordering

tracker.is_closed() // false
tracker.close() // no tasks, so trigger wait notify
tracker.wait().await // notify triggered, no wait
tracker.spawn(...) // task started, will be cancelled after main exists

The close() name suggests slightly stronger semantics, similar to channel closure,
which might suggest why this might be incorrectly relied upon.

Describe the solution you'd like

A stronger token_if_not_closed() API that returns a Result<TaskTrackerToken, TrackerCloseError>.

Internally it could perform:

pub fn token_if_not_closed(&self) -> Result<TaskTrackerToken, TrackerCloseError> {
    let token = self.token();
    if self.is_closed() {
        return Err(TrackerCloseError(()));
    }
    Ok(token);
}

This might cause some extra notifications to be triggered, but that seems OK to me.

Downsides here is that it would cause confusion with the 6 existing spawn_... family of functions. Unclear whether we should create a second TaskTracker with this api, or use the same TaskTracker with two sets of the spawn functions.

Describe alternatives you've considered

We could use RwLock<TaskTracker> ourselves to have much stronger semantics. Only close and wait on write lock. Only spawn on try_read lock. Alternatively, we could use the token dance suggested above, but we miss out on the spawn convenience functions.

Additional context

I noticed this desired while reviewing this commit: neondatabase/neon@d79da8f
Our internal discussions are available in the associated PR.

The text was updated successfully, but these errors were encountered:

conradludgate · 2024-08-08T10:30:43Z

Alternative solution: Split API like a channel. TaskSpawner and TaskWaiter. TaskWaiter has 2 functions: async fn wait(self) -> () and fn spawner(&self) -> TaskSpawner. Waiting on the TaskWaiter immediately marks it for shutdown and any spawn attempts on the TaskSpawner will error - much like a mpsc::Sender::send().

Darksonn · 2024-08-17T17:10:55Z

Thanks for raising this. I'm wondering if this is highlighting a design mistake in TaskTracker. I think it would be worth to think about what it would look like if we redesign it from the ground up. Perhaps the method named spawn should be fallible? Should we have two different kinds of trackers, where only one supports closing it? Whatever design we pick, the goal should be to make it really difficult to accidentally use it incorrectly.

If there's a sufficiently compelling design that isn't backwards-compatible, we could potentially make a 0.8.0 release using it. A breaking release of tokio-util is overdue anyway.

hmaka · 2024-09-10T04:25:44Z

Thanks again for raising this. There seems to be two important problems with TaskTracker current implementation.

spawn is possible on a closed TaskTracker
uninitiated users might call wait on a TaskTracker before closing--resulting in wait sleeping forever.

Proposal:
We change TaskTracker to have the following behavior:

spawn will be fallible, returning result ok or spawnError (or some error).
close will stop the ability to spawn on taskTracker. All further spawn calls will result in error.
wait will wait until all tasks on TaskTracker complete, then close the TaskTracker before returning.

I also wonder if an additional method such as the one below would be helpful:
-tasks_done will wait until all tasks on TaskTracker complete, then return without closing the TaskTracker

Welcome suggestions, and am happy to implement whatever is decided.
Also thanks to @Darksonn for helping me think through this.

Darksonn · 2024-09-23T16:03:07Z

One concern I have is what happens to existing users that upgrade without changing their code. Right now, they might call close, then spawn the tasks, and then call wait. This would break with this change. Though it would also result in a warning.

hmaka · 2024-09-23T16:31:45Z

Hmm I guess it would be a breaking change that breaks in a not nice way.
An alternative I could think of is making spawn panic when a TaskTracker is closed and making a new method try_spawn which would be the fallible one.

I guess another benefit of this distinction might be clear syntax that would indicate how TaskTracker is being used? For example seeing try_spawn would indicate that there's a high chance TaskTracker was cloned and it's unclear it's still open, while spawn would indicate a strong belief that TaskTracker is still open.

Darksonn · 2024-09-26T11:45:58Z

One possibility is to get rid of close completely. That way, we only have spawn and wait methods, and waiting waits for it to become empty.

If we want to support closing the tracker, we could have a separate type that does support closing with your proposal.

Or if we only want one type, maybe we rename the type to something else, or maybe we rename close or wait.

Basically, if we change the behavior, I'd like to make a change that breaks the compilation of existing users.

conradludgate added A-tokio Area: The main tokio crate C-feature-request Category: A feature request. labels Aug 8, 2024

Darksonn added A-tokio-util Area: The tokio-util crate M-task Module: tokio/task and removed A-tokio Area: The main tokio crate labels Aug 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stronger `close()` semantics for `TaskTracker` #6759

Stronger `close()` semantics for `TaskTracker` #6759

conradludgate commented Aug 8, 2024

conradludgate commented Aug 8, 2024

Darksonn commented Aug 17, 2024 •

edited

Loading

hmaka commented Sep 10, 2024

Darksonn commented Sep 23, 2024

hmaka commented Sep 23, 2024

Darksonn commented Sep 26, 2024

Stronger close() semantics for TaskTracker #6759

Stronger close() semantics for TaskTracker #6759

Comments

conradludgate commented Aug 8, 2024

conradludgate commented Aug 8, 2024

Darksonn commented Aug 17, 2024 • edited Loading

hmaka commented Sep 10, 2024

Darksonn commented Sep 23, 2024

hmaka commented Sep 23, 2024

Darksonn commented Sep 26, 2024

Stronger `close()` semantics for `TaskTracker` #6759

Stronger `close()` semantics for `TaskTracker` #6759

Darksonn commented Aug 17, 2024 •

edited

Loading