Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stronger close() semantics for TaskTracker #6759

Open
conradludgate opened this issue Aug 8, 2024 · 6 comments
Open

Stronger close() semantics for TaskTracker #6759

conradludgate opened this issue Aug 8, 2024 · 6 comments
Labels
A-tokio-util Area: The tokio-util crate C-feature-request Category: A feature request. M-task Module: tokio/task

Comments

@conradludgate
Copy link
Contributor

Is your feature request related to a problem? Please describe.

TaskTracker is a great solution for waiting for tasks to shutdown, but there's a race condition if used incorrectly. Take this code

// routine is not cancel safe - we must wait for it to complete before
// exiting the application.
async fn routine() {}

fn spawn(tracker: &TaskTracker) -> Result<(), Error> {
    if tracker.is_closed() { return Err(Error); }
    
    tracker.spawn(routine());
    Ok(())
}

async fn shutdown(tracker: TaskTracker) {
    // stop spawning
    tracker.close();
    tracker.wait().await;
}

#[tokio::main]
async fn main() {
    let tracker = TaskTracker::new();

    let tracker2 = tracker.clone();
    tokio::spawn(async move {
        loop {
            tokio::time::sleep(Duration::from_secs(10)).await;
            spawn(&tracker2);
        }
    });
    
    crtl_c().await;
    shutdown(tracker).await;
}

If we're unlucky, this program might experience the ordering

tracker.is_closed() // false
tracker.close() // no tasks, so trigger wait notify
tracker.wait().await // notify triggered, no wait
tracker.spawn(...) // task started, will be cancelled after main exists

The close() name suggests slightly stronger semantics, similar to channel closure,
which might suggest why this might be incorrectly relied upon.

Describe the solution you'd like

A stronger token_if_not_closed() API that returns a Result<TaskTrackerToken, TrackerCloseError>.

Internally it could perform:

pub fn token_if_not_closed(&self) -> Result<TaskTrackerToken, TrackerCloseError> {
    let token = self.token();
    if self.is_closed() {
        return Err(TrackerCloseError(()));
    }
    Ok(token);
}

This might cause some extra notifications to be triggered, but that seems OK to me.

Downsides here is that it would cause confusion with the 6 existing spawn_... family of functions. Unclear whether we should create a second TaskTracker with this api, or use the same TaskTracker with two sets of the spawn functions.

Describe alternatives you've considered

We could use RwLock<TaskTracker> ourselves to have much stronger semantics. Only close and wait on write lock. Only spawn on try_read lock. Alternatively, we could use the token dance suggested above, but we miss out on the spawn convenience functions.

Additional context

I noticed this desired while reviewing this commit: neondatabase/neon@d79da8f
Our internal discussions are available in the associated PR.

@conradludgate conradludgate added A-tokio Area: The main tokio crate C-feature-request Category: A feature request. labels Aug 8, 2024
@Darksonn Darksonn added A-tokio-util Area: The tokio-util crate M-task Module: tokio/task and removed A-tokio Area: The main tokio crate labels Aug 8, 2024
@conradludgate
Copy link
Contributor Author

Alternative solution: Split API like a channel. TaskSpawner and TaskWaiter. TaskWaiter has 2 functions: async fn wait(self) -> () and fn spawner(&self) -> TaskSpawner. Waiting on the TaskWaiter immediately marks it for shutdown and any spawn attempts on the TaskSpawner will error - much like a mpsc::Sender::send().

@Darksonn
Copy link
Contributor

Darksonn commented Aug 17, 2024

Thanks for raising this. I'm wondering if this is highlighting a design mistake in TaskTracker. I think it would be worth to think about what it would look like if we redesign it from the ground up. Perhaps the method named spawn should be fallible? Should we have two different kinds of trackers, where only one supports closing it? Whatever design we pick, the goal should be to make it really difficult to accidentally use it incorrectly.

If there's a sufficiently compelling design that isn't backwards-compatible, we could potentially make a 0.8.0 release using it. A breaking release of tokio-util is overdue anyway.

@hmaka
Copy link
Contributor

hmaka commented Sep 10, 2024

Thanks again for raising this. There seems to be two important problems with TaskTracker current implementation.

  1. spawn is possible on a closed TaskTracker
  2. uninitiated users might call wait on a TaskTracker before closing--resulting in wait sleeping forever.

Proposal:
We change TaskTracker to have the following behavior:

  • spawn will be fallible, returning result ok or spawnError (or some error).
  • close will stop the ability to spawn on taskTracker. All further spawn calls will result in error.
  • wait will wait until all tasks on TaskTracker complete, then close the TaskTracker before returning.

I also wonder if an additional method such as the one below would be helpful:
-tasks_done will wait until all tasks on TaskTracker complete, then return without closing the TaskTracker

Welcome suggestions, and am happy to implement whatever is decided.
Also thanks to @Darksonn for helping me think through this.

@Darksonn
Copy link
Contributor

One concern I have is what happens to existing users that upgrade without changing their code. Right now, they might call close, then spawn the tasks, and then call wait. This would break with this change. Though it would also result in a warning.

@hmaka
Copy link
Contributor

hmaka commented Sep 23, 2024

Hmm I guess it would be a breaking change that breaks in a not nice way.
An alternative I could think of is making spawn panic when a TaskTracker is closed and making a new method try_spawn which would be the fallible one.

I guess another benefit of this distinction might be clear syntax that would indicate how TaskTracker is being used? For example seeing try_spawn would indicate that there's a high chance TaskTracker was cloned and it's unclear it's still open, while spawn would indicate a strong belief that TaskTracker is still open.

@Darksonn
Copy link
Contributor

One possibility is to get rid of close completely. That way, we only have spawn and wait methods, and waiting waits for it to become empty.

If we want to support closing the tracker, we could have a separate type that does support closing with your proposal.

Or if we only want one type, maybe we rename the type to something else, or maybe we rename close or wait.

Basically, if we change the behavior, I'd like to make a change that breaks the compilation of existing users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio-util Area: The tokio-util crate C-feature-request Category: A feature request. M-task Module: tokio/task
Projects
None yet
Development

No branches or pull requests

3 participants