Use TaskTracker in streamer for graceful exit and avoiding flaky tests #8066
The localnet failure is due to an exit timeout. I suspect the cause is that we rely on the runtime to shut down instead of properly cancelling all the tasks, which matters now that we wait for all the tasks to finish first. This is fixed in #8025.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #8066 +/- ##
=========================================
- Coverage 83.0% 83.0% -0.1%
=========================================
Files 826 826
Lines 362561 362581 +20
=========================================
+ Hits 301086 301100 +14
- Misses 61475 61481 +6
@lijunwangs @alexpyattaev this PR is ready for review.
lijunwangs
left a comment
The changes themselves look good. Do we have any performance benchmark with and without this change, to see whether there is any difference we should worry about, given that this touches the core of task tracking?
alexpyattaev
left a comment
LGTM. TaskTracker is an AtomicUsize, it should not impact perf.
It is
struct TaskTrackerInner {
    state: AtomicUsize,
    on_last_exit: Notify,
}
So it is really lightweight and is unlikely to have any observable performance effect.
Problem
When the streamer service is launched, it spawns a task and returns a handle for this task.
In tests we typically do the following:
Yet inside this root task, the streamer creates many other tasks to handle connections and streams, and the root task does not keep track of them.
This means that when the root task has stopped, we have no idea whether its child tasks have completed.
This leads to flaky tests. For example, in
nonblocking::quic::test::test_quic_server_multiple_streams we stop the streamer and after that check stats.total_connections.load(Ordering::Relaxed). This counter tracks how many tasks are handling connections at a given moment. These tasks might not have finished before we do this check, so all the checks below task_handle.await are potentially flaky.
Summary of Changes
Use TaskTracker, as the tokio documentation suggests.
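To make the race concrete, here is a minimal sketch of the same idea using only std threads. It is not the streamer code and not tokio's implementation: `Tracker`, `Guard`, `run_server`, and `stop_and_count` are hypothetical names, and a `Mutex` plus `Condvar` stands in for the async `Notify`. The real `TaskTracker` additionally has to be closed before `wait()` completes, which this sketch leaves out.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a task tracker: a counter of live children
/// plus a notification fired when the last one exits.
#[derive(Default)]
struct Tracker {
    live: AtomicUsize,
    lock: Mutex<()>,
    all_done: Condvar,
}

/// Registered on creation, unregistered on drop.
struct Guard(Arc<Tracker>);

impl Drop for Guard {
    fn drop(&mut self) {
        // The last child to exit wakes up anyone blocked in `wait`.
        if self.0.live.fetch_sub(1, Ordering::SeqCst) == 1 {
            let _lock = self.0.lock.lock().unwrap();
            self.0.all_done.notify_all();
        }
    }
}

impl Tracker {
    fn enter(this: &Arc<Tracker>) -> Guard {
        this.live.fetch_add(1, Ordering::SeqCst);
        Guard(Arc::clone(this))
    }

    /// Blocks until every registered child has dropped its guard.
    fn wait(&self) {
        let mut lock = self.lock.lock().unwrap();
        while self.live.load(Ordering::SeqCst) != 0 {
            lock = self.all_done.wait(lock).unwrap();
        }
    }
}

/// Root "task": spawns children and returns without joining them.
fn run_server(tracker: Arc<Tracker>, open: Arc<AtomicUsize>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for _ in 0..4 {
            open.fetch_add(1, Ordering::SeqCst);
            // Register in the root, so the child is counted before the root exits.
            let guard = Tracker::enter(&tracker);
            let open = Arc::clone(&open);
            thread::spawn(move || {
                let _guard = guard;
                thread::sleep(Duration::from_millis(20)); // simulated connection work
                open.fetch_sub(1, Ordering::SeqCst);
            });
        }
    })
}

fn stop_and_count() -> usize {
    let tracker = Arc::new(Tracker::default());
    let open = Arc::new(AtomicUsize::new(0));
    let root = run_server(Arc::clone(&tracker), Arc::clone(&open));
    root.join().unwrap();
    // Without this wait, children may still be running and the counter non-zero.
    tracker.wait();
    open.load(Ordering::SeqCst)
}

fn main() {
    assert_eq!(stop_and_count(), 0);
}
```

Joining the root alone only proves that the spawning loop has finished. Waiting on the tracker is what makes the counter check deterministic.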