make tracing a task public so self-tracing is possible #6972

arielb1 · 2024-11-13T16:31:52Z

There is some desire to make it possible for tasks to trace themselves to discover slow wakeups, something similar to the test I added.

This PR makes some functions public (but unstable) to make that easier.

I currently shared it with someone I'm working with to see whether it helps them debug their "slow task" problems. I'll make this PR less WIP after I get feedback from them.

WIP: actually add docs.

arielb1 · 2024-11-13T17:07:20Z

cc @carllerche

also for seeing how to make this pass CI without removing the test

arielb1 · 2024-11-18T11:49:42Z

[PR status: I'm having some people check out this PR to see if it helps them debug some "slow task" problems. After I see the lessons learned I'll make it less WIP]

jlizen · 2024-11-21T01:14:42Z

Chiming in, @arielb1 asked me to poke at this implementation in a somewhat realistic scenario where it might be useful.

One such case that I've encountered is while operating a service that uses a Redis cluster as a backend for distributed precision/adaptive throttling, with various multi-step state operations and varying key cardinality. Redis's single threaded event loop and key space partitioning makes it prone to bottlenecks if code is poorly optimized. Debugging performance issues can be complex and require a decent amount of context on Redis's underlying behaviors. Since, Redis-perspective per-command logs typically don't include time spent while waiting in the event loop queue, metrics tend to be heavily aggregated, and performance issues are difficult to directly trigger without large-scale load testing (and with realistic traffic shapes).

A bit of magic to systematically trace the slower futures without a lot of manual instrumentation by end users, and without introducing new bottlenecks due to blanket log/metric outputs, would be quite handy. This change would open up that door, and then other libraries could wrap their futures with self-tracing logic.

I threw together a crude simulation where I ran a redis cluster locally (8 nodes, key space evenly distributed). I then simulated a big (and fairly hot) key alongside a bunch of normal keys. My wrapper future takes a trace + dumps that trace as well as a human-readable key name, in case the duration of the total lookup is > 500ms.

This specific implementation is pretty heavy-handed for the use case above compared to a plain timer + simple event output. But, I can imagine end user scenarios where there are more complex futures that contain multiple i/o calls, different function contexts, etc, where the task trace might be handy. Ariel has another open PR that will also probably make access to the backtrace more useful.

Probably the larger benefit is for libraries rather than usage directly by end users. It might be nice for @arielb1 to look at how this API feels in e.g. a tower layer, which seems like a good use case for this.

My code and output are below. You'll see a subset of my bigkey calls tripping the threshold, as well as certain regular keys that are routed to the same node and stuck behind the bigkeys in line.

Here is my code:

use std::{future::Future, pin::Pin, task::{Context, Poll}, time::{Duration, Instant}};

use itertools::Itertools;
use rand::distributions::{Alphanumeric, DistString};
use redis::{cluster::ClusterClient, cluster_async::ClusterConnection, AsyncCommands};
use tokio::{runtime::dump::{Root, Trace}, task::JoinSet};
use tracing::{error, info_span, warn, Instrument};
use tracing_subscriber;

#[tokio::main]
async fn main() {
    tracing_subscriber::fmt::init(); 
    let client = ClusterClient::new(vec!["redis://127.0.0.1:6379"]).unwrap();
    let connection = client.get_async_connection().await.unwrap();

    let elements: Vec<(&'static str, &'static str)> = get_good_keys().into_iter().interleave(get_bad_keys().into_iter()).collect();

    let mut set = JoinSet::new();
    for (key, value) in elements.iter() {
        set.spawn(maybe_traced_redis_set(key, value, connection.clone()));
    }

    set.join_all().await;
}


async fn maybe_traced_redis_set(key: &'static str, value: &'static str, mut connection: ClusterConnection) {
    let span = info_span!("redis_set", key = key);
    let future = connection.set::<&str, &str, String>(key, value);
    let maybe_traced_future = SelfTracingFuture::new(future).instrument(span);

    if let Err(e) = maybe_traced_future.await {
        error!("error setting {key}: {e}");
    }
}


fn get_good_keys() -> Vec<(&'static str, &'static str)> {
    (0..200).map(|el| {
        let leaked: &'static str = el.to_string().leak();
        (leaked, leaked)
    }).collect()
}

fn get_bad_keys() -> Vec<(&'static str, &'static str)> {
    let big_key: &'static str = Alphanumeric.sample_string(&mut rand::thread_rng(), 5_000_000).leak();

    vec![("big_key", big_key); 50]
}


pin_project_lite::pin_project! { pub struct SelfTracingFuture<F: Future> {
        #[pin]
        f: Root<F>,
        t_last: State,
    }
}

enum State {
    NotStarted,
    Running { since: Instant },
    Alerted,
}


impl<F: Future> SelfTracingFuture<F> {
    fn new(f: F) -> Self {
        Self {
            f: Trace::root(f),
            t_last: State::NotStarted,
        }
    }
}


impl<F: Future> Future for SelfTracingFuture<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        let mut this = self.project();
        let now = Instant::now();
        let t_last = match this.t_last {
            State::Running { since } => Some(*since),
            State::NotStarted => {
                *this.t_last = State::Running { since: now };
                None
            }
            State::Alerted => {
                // don't double-alert for the same future
                None
            }
        };
        if t_last.is_some_and(|t_last| now.duration_since(t_last) > Duration::from_millis(500)) {
            let (res, trace) = tokio::runtime::dump::Trace::capture(|| this.f.as_mut().poll(cx));
            *this.t_last = State::Alerted;
            warn!("future ran long: {trace}");
            
            return res;
        }
        this.f.poll(cx)
    }
}

And here is my output:

     Running `build/private/cargo-target/debug/test_trace`
2024-11-21T02:20:21.795154Z  WARN redis_set{key="big_key"}: test_trace: future ran long: ╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
  └╼ redis::commands::AsyncCommands::set::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/commands/macros.rs:169:69
     └╼ redis::cmd::Cmd::query_async::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cmd.rs:434:48
        └╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
           └╼ redis::cluster_async::ClusterConnection<C>::route_command::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cluster_async/mod.rs:107:14
              └╼ <tokio::sync::oneshot::Receiver<T> as core::future::future::Future>::poll at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1112:30
                 └╼ tokio::sync::oneshot::Inner<T>::poll_recv at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1143:16
2024-11-21T02:20:21.985727Z  WARN redis_set{key="big_key"}: test_trace: future ran long: ╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
  └╼ redis::commands::AsyncCommands::set::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/commands/macros.rs:169:69
     └╼ redis::cmd::Cmd::query_async::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cmd.rs:434:48
        └╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
           └╼ redis::cluster_async::ClusterConnection<C>::route_command::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cluster_async/mod.rs:107:14
              └╼ <tokio::sync::oneshot::Receiver<T> as core::future::future::Future>::poll at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1112:30
                 └╼ tokio::sync::oneshot::Inner<T>::poll_recv at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1143:16
2024-11-21T02:20:21.985690Z  WARN redis_set{key="big_key"}: test_trace: future ran long: ╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
  └╼ redis::commands::AsyncCommands::set::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/commands/macros.rs:169:69
     └╼ redis::cmd::Cmd::query_async::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cmd.rs:434:48
        └╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
           └╼ redis::cluster_async::ClusterConnection<C>::route_command::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cluster_async/mod.rs:107:14
              └╼ <tokio::sync::oneshot::Receiver<T> as core::future::future::Future>::poll at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1112:30
                 └╼ tokio::sync::oneshot::Inner<T>::poll_recv at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1143:16
2024-11-21T02:20:21.985997Z  WARN redis_set{key="big_key"}: test_trace: future ran long: ╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
  └╼ redis::commands::AsyncCommands::set::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/commands/macros.rs:169:69
     └╼ redis::cmd::Cmd::query_async::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cmd.rs:434:48
        └╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
           └╼ redis::cluster_async::ClusterConnection<C>::route_command::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cluster_async/mod.rs:107:14
              └╼ <tokio::sync::oneshot::Receiver<T> as core::future::future::Future>::poll at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1112:30
                 └╼ tokio::sync::oneshot::Inner<T>::poll_recv at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1143:16
2024-11-21T02:20:21.986514Z  WARN redis_set{key="big_key"}: test_trace: future ran long: ╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
  └╼ redis::commands::AsyncCommands::set::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/commands/macros.rs:169:69
     └╼ redis::cmd::Cmd::query_async::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cmd.rs:434:48
        └╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
           └╼ redis::cluster_async::ClusterConnection<C>::route_command::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cluster_async/mod.rs:107:14
              └╼ <tokio::sync::oneshot::Receiver<T> as core::future::future::Future>::poll at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1112:30
                 └╼ tokio::sync::oneshot::Inner<T>::poll_recv at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1143:16
2024-11-21T02:20:21.986774Z  WARN redis_set{key="191"}: test_trace: future ran long: ╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
  └╼ redis::commands::AsyncCommands::set::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/commands/macros.rs:169:69
     └╼ redis::cmd::Cmd::query_async::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cmd.rs:434:48
        └╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
           └╼ redis::cluster_async::ClusterConnection<C>::route_command::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cluster_async/mod.rs:107:14
              └╼ <tokio::sync::oneshot::Receiver<T> as core::future::future::Future>::poll at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1112:30
                 └╼ tokio::sync::oneshot::Inner<T>::poll_recv at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1143:16
2024-11-21T02:20:21.986933Z  WARN redis_set{key="big_key"}: test_trace: future ran long: ╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
  └╼ redis::commands::AsyncCommands::set::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/commands/macros.rs:169:69
     └╼ redis::cmd::Cmd::query_async::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cmd.rs:434:48
        └╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
           └╼ redis::cluster_async::ClusterConnection<C>::route_command::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cluster_async/mod.rs:107:14
              └╼ <tokio::sync::oneshot::Receiver<T> as core::future::future::Future>::poll at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1112:30
                 └╼ tokio::sync::oneshot::Inner<T>::poll_recv at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1143:16
2024-11-21T02:20:21.986032Z  WARN redis_set{key="big_key"}: test_trace: future ran long: ╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
  └╼ redis::commands::AsyncCommands::set::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/commands/macros.rs:169:69
     └╼ redis::cmd::Cmd::query_async::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cmd.rs:434:48
        └╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
           └╼ redis::cluster_async::ClusterConnection<C>::route_command::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cluster_async/mod.rs:107:14
              └╼ <tokio::sync::oneshot::Receiver<T> as core::future::future::Future>::poll at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1112:30
                 └╼ tokio::sync::oneshot::Inner<T>::poll_recv at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1143:16
2024-11-21T02:20:21.986290Z  WARN redis_set{key="big_key"}: test_trace: future ran long: ╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
  └╼ redis::commands::AsyncCommands::set::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/commands/macros.rs:169:69
     └╼ redis::cmd::Cmd::query_async::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cmd.rs:434:48
        └╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
           └╼ redis::cluster_async::ClusterConnection<C>::route_command::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cluster_async/mod.rs:107:14
              └╼ <tokio::sync::oneshot::Receiver<T> as core::future::future::Future>::poll at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1112:30
                 └╼ tokio::sync::oneshot::Inner<T>::poll_recv at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1143:16
2024-11-21T02:20:21.986339Z  WARN redis_set{key="big_key"}: test_trace: future ran long: ╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
  └╼ redis::commands::AsyncCommands::set::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/commands/macros.rs:169:69
     └╼ redis::cmd::Cmd::query_async::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cmd.rs:434:48
        └╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
           └╼ redis::cluster_async::ClusterConnection<C>::route_command::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cluster_async/mod.rs:107:14
              └╼ <tokio::sync::oneshot::Receiver<T> as core::future::future::Future>::poll at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1112:30
                 └╼ tokio::sync::oneshot::Inner<T>::poll_recv at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1143:16
2024-11-21T02:20:21.985800Z  WARN redis_set{key="big_key"}: test_trace: future ran long: ╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
  └╼ redis::commands::AsyncCommands::set::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/commands/macros.rs:169:69
     └╼ redis::cmd::Cmd::query_async::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cmd.rs:434:48
        └╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
           └╼ redis::cluster_async::ClusterConnection<C>::route_command::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cluster_async/mod.rs:107:14
              └╼ <tokio::sync::oneshot::Receiver<T> as core::future::future::Future>::poll at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1112:30
                 └╼ tokio::sync::oneshot::Inner<T>::poll_recv at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1143:16
2024-11-21T02:20:21.986253Z  WARN redis_set{key="big_key"}: test_trace: future ran long: ╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
  └╼ redis::commands::AsyncCommands::set::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/commands/macros.rs:169:69
     └╼ redis::cmd::Cmd::query_async::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cmd.rs:434:48
        └╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
           └╼ redis::cluster_async::ClusterConnection<C>::route_command::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cluster_async/mod.rs:107:14
              └╼ <tokio::sync::oneshot::Receiver<T> as core::future::future::Future>::poll at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1112:30
                 └╼ tokio::sync::oneshot::Inner<T>::poll_recv at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1143:16
2024-11-21T02:20:21.986078Z  WARN redis_set{key="big_key"}: test_trace: future ran long: ╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
  └╼ redis::commands::AsyncCommands::set::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/commands/macros.rs:169:69
     └╼ redis::cmd::Cmd::query_async::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cmd.rs:434:48
        └╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
           └╼ redis::cluster_async::ClusterConnection<C>::route_command::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cluster_async/mod.rs:107:14
              └╼ <tokio::sync::oneshot::Receiver<T> as core::future::future::Future>::poll at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1112:30
                 └╼ tokio::sync::oneshot::Inner<T>::poll_recv at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1143:16
2024-11-21T02:20:21.987240Z  WARN redis_set{key="big_key"}: test_trace: future ran long: ╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
  └╼ redis::commands::AsyncCommands::set::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/commands/macros.rs:169:69
     └╼ redis::cmd::Cmd::query_async::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cmd.rs:434:48
        └╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
           └╼ redis::cluster_async::ClusterConnection<C>::route_command::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cluster_async/mod.rs:107:14
              └╼ <tokio::sync::oneshot::Receiver<T> as core::future::future::Future>::poll at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1112:30
                 └╼ tokio::sync::oneshot::Inner<T>::poll_recv at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1143:16
2024-11-21T02:20:21.986810Z  WARN redis_set{key="big_key"}: test_trace: future ran long: ╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
  └╼ redis::commands::AsyncCommands::set::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/commands/macros.rs:169:69
     └╼ redis::cmd::Cmd::query_async::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cmd.rs:434:48
        └╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
           └╼ redis::cluster_async::ClusterConnection<C>::route_command::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cluster_async/mod.rs:107:14
              └╼ <tokio::sync::oneshot::Receiver<T> as core::future::future::Future>::poll at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1112:30
                 └╼ tokio::sync::oneshot::Inner<T>::poll_recv at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1143:16
2024-11-21T02:20:21.988358Z  WARN redis_set{key="70"}: test_trace: future ran long: ╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
  └╼ redis::commands::AsyncCommands::set::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/commands/macros.rs:169:69
     └╼ redis::cmd::Cmd::query_async::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cmd.rs:434:48
        └╼ <core::pin::Pin<P> as core::future::future::Future>::poll at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/future/future.rs:123:9
           └╼ redis::cluster_async::ClusterConnection<C>::route_command::{{closure}} at /workplace/jlizen/GokuBotsAtpAggregator/src/GokuRustCommons/build/private/cargo-home/registry/src/-6353be53fb99b376/redis-0.25.4/src/cluster_async/mod.rs:107:14
              └╼ <tokio::sync::oneshot::Receiver<T> as core::future::future::Future>::poll at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1112:30
                 └╼ tokio::sync::oneshot::Inner<T>::poll_recv at /workplace/jlizen/GokuBotsAtpAggregator/src/tokio/tokio/src/sync/oneshot.rs:1143:16
<snip, maybe half the bigkey calls were >500 ms and a scattering of other calls>

And then for completeness, here you can see the overlap of the bigkey + regular key futures that were running long - all were routed to node 4:

bash-5.2# redis-cli -c
127.0.0.1:6379> cluster keyslot big_key
(integer) 11045
127.0.0.1:6379> cluster keyslot 191
(integer) 11242
127.0.0.1:6379> cluster keyslot 70
(integer) 10833
127.0.0.1:6379> cluster nodes
8fd9761e7124eb3572428f6955a0d666ad804751 240.10.3.4:7001@17001 master - 0 1732152536000 7 connected 4096-6143
91490ff14cf53f07b17044a480162072bf3f6a3e 240.10.3.4:6379@16379 myself,master - 0 1732152535000 1 connected 0-2047
aef5d6bebe5d2a05cb1932657f7352bfc3e913ba 240.10.3.4:7000@17000 master - 0 1732152536782 0 connected 2048-4095
5d55504cde8d478a3006b14ead7097676d768546 240.10.3.4:7004@17004 master - 0 1732152535000 4 connected 10240-12287
04fcec03a691deb0d6dc65d83e1b4e647f5a113f 240.10.3.4:7006@17006 master - 0 1732152535000 5 connected 14336-16383
014709135cc1968a32a0f97d729a3a709d333eb1 240.10.3.4:7002@17002 master - 0 1732152537833 3 connected 6144-8191
6da488511ca9434ddd0c88f026c37741b059648a 240.10.3.4:7003@17003 master - 0 1732152535000 6 connected 8192-10239
218d002c72b76b9255056d990f09e11d82756227 240.10.3.4:7005@17005 master - 0 1732152534074 2 connected 12288-14335

jlizen · 2024-11-21T15:45:33Z

Going to also throw in a test against a more realistic application that has more complicated usage of the tokio runtime to see if anything rattles loose.

jlizen · 2024-12-16T18:56:23Z

I did some more testing in a real application. It is an axum server that handles user flows across a series of endpoints that span vending javascript to the client, vending inputs for a hashcash proof of work challenge, then various crypto operations to validate the solution and vend an encrypted token.

I injected the self-tracing task functionality in an outer tower middleware layer. It dumps a trace in case a request takes longer than 500ms. I then injected a 5% chance of a sleep inside a place where we make a remote call to AWS DynamoDB. This mimics a real performance bottleneck we encountered, due to overlarge table size, that was annoying to debug due to lack of specific metrics wrapping that remote call at the time.

I then simulated user realistic user flows that hit all endpoints in the browser, acquired tokens, etc. I didn't test at load but did send enough traffic to hit a bit of concurrency. I didn't see any signs of strange behavior with regard to the executor or otherwise. Seemed to behave as expected.

The stack trace was useful and pointed directly to the line of code that had the sleep added. See abbreviated output below:

warn 2024-12-16T18:45:51.813062959Z long running task: ╼ 
<snip> 
 └╼ test_app[5d607e4097ef04f4]::routing::apis::<test_app_backends[6d9dd0500a3162d0]::cp_datastore::backends::api_gateway_proxy::ApiGatewayProxy>::{closure#4} at /TestApp/test_app/src/routing.rs:269:22
    └╼ test_app[5d607e4097ef04f4]::handlers::verify::verify_handler::<test_app_backends[6d9dd0500a3162d0]::cp_datastore::backends::api_gateway_proxy::ApiGatewayProxy> at /TestApp/test_app/src/handlers/verify.rs:305:1
       └╼ test_app[5d607e4097ef04f4]::handlers::verify::verify_handler::<test_app_backends[6d9dd0500a3162d0]::cp_datastore::backends::api_gateway_proxy::ApiGatewayProxy>::{closure#0} at /TestApp/test_app/src/handlers/verify.rs:365:6
          └╼ test_app[5d607e4097ef04f4]::handlers::verify::verify_handler_inner::<test_app_backends[6d9dd0500a3162d0]::cp_datastore::backends::api_gateway_proxy::ApiGatewayProxy> at /TestApp/test_app/src/handlers/verify.rs:652:85
             └╼ test_app[5d607e4097ef04f4]::handlers::verify::check_and_insert_with_maybe_sleep at /TestApp/test_app/src/handlers/verify.rs:727:67
                └╼ <tokio[b5f4bbd0e64ee820]::time::sleep::Sleep as core[23aac720c248f827]::future::future::Future> at /TestApp/src/tokio/tokio/src/time/sleep.rs:448:22
                   └╼ <tokio[b5f4bbd0e64ee820]::time::sleep::Sleep> at /TestApp/src/tokio/tokio/src/time/sleep.rs:404:16

arielb1 · 2025-01-09T15:50:28Z

Now ready for review.

Darksonn

Looks reasonable enough.

tokio/src/runtime/dump.rs

tokio/tests/task_trace_self.rs

Darksonn · 2025-01-10T08:36:48Z

Seems like the doc-tests fail to compile.

arielb1 · 2025-01-10T23:39:41Z

fixed doctests

jswrenn

A few nits, but this mostly LGTM.

tokio/src/runtime/dump.rs

jswrenn

LGTM

Comments addressed.

arielb1 · 2025-01-14T10:23:55Z

Waiting for you @Noah-Kennedy

Bumps tokio from 1.43.0 to 1.44.0. Release notes Sourced from tokio's releases. Tokio v1.44.0 1.44.0 (March 7th, 2025) This release changes the from_std method on sockets to panic if a blocking socket is provided. We determined this change is not a breaking change as Tokio is not intended to operate using blocking sockets. Doing so results in runtime hangs and should be considered a bug. Accidentally passing a blocking socket to Tokio is one of the most common user mistakes. If this change causes an issue for you, please comment on #7172. Added coop: add task::coop module (#7116) process: add Command::get_kill_on_drop() (#7086) sync: add broadcast::Sender::closed (#6685, #7090) sync: add broadcast::WeakSender (#7100) sync: add oneshot::Receiver::is_empty() (#7153) sync: add oneshot::Receiver::is_terminated() (#7152) Fixed fs: empty reads on File should not start a background read (#7139) process: calling start_kill on exited child should not fail (#7160) signal: fix CTRL_CLOSE, CTRL_LOGOFF, CTRL_SHUTDOWN on windows (#7122) sync: properly handle panic during mpsc drop (#7094) Changes runtime: clean up magic number in registration set (#7112) coop: make coop yield using waker defer strategy (#7185) macros: make select! budget-aware (#7164) net: panic when passing a blocking socket to from_std (#7166) io: clean up buffer casts (#7142) Changes to unstable APIs rt: add before and after task poll callbacks (#7120) tracing: make the task tracing API unstable public (#6972) Documented docs: fix nesting of sections in top-level docs (#7159) fs: rename symlink and hardlink parameter names (#7143) io: swap reader/writer in simplex doc test (#7176) macros: docs about select! alternatives (#7110) net: rename the argument for send_to (#7146) process: add example for reading Child stdout (#7141) process: clarify Child::kill behavior (#7162) process: fix grammar of the ChildStdin struct doc comment (#7192) runtime: consistently use worker_threads instead of core_threads (#7186) #6685: tokio-rs/tokio#6685 #6972: tokio-rs/tokio#6972 #7086: tokio-rs/tokio#7086 #7090: tokio-rs/tokio#7090 ... (truncated) Commits 8182ecf chore: prepare Tokio v1.44.0 (#7202) a258bff ci: enable printing in multi thread loom tests (#7200) e076d21 process: clarify Child::kill behavior (#7162) 042433c net: debug_assert on creating a tokio socket from a blocking one (#7166) 0284d1b macros: make select! budget-aware (#7164) 710bc80 rt: coop should yield using waker defer strategy (#7185) a2b12bd readme: adjust release schedule to once per month (#7191) e7b593c process: fix grammar of the ChildStdin struct doc comment (#7192) 3aaf4a5 coop: adjust grammar in tests/coop_budget.rs (#7173) 8e741c1 tokio: mark 1.43 as LTS (#7189) Additional commits viewable in compare view Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase. Dependabot commands and options You can trigger Dependabot actions by commenting on this PR: @dependabot rebase will rebase this PR @dependabot recreate will recreate this PR, overwriting any edits that have been made to it @dependabot merge will merge this PR after your CI passes on it @dependabot squash and merge will squash and merge this PR after your CI passes on it @dependabot cancel merge will cancel a previously requested merge and block automerging @dependabot reopen will reopen this PR if it is closed @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

github-actions bot added R-loom-current-thread Run loom current-thread tests on this PR R-loom-multi-thread Run loom multi-thread tests on this PR R-loom-multi-thread-alt Run loom multi-thread alt tests on this PR labels Nov 13, 2024

arielb1 force-pushed the self-trace branch 3 times, most recently from 35d68a7 to 1716248 Compare November 13, 2024 17:02

arielb1 force-pushed the self-trace branch 5 times, most recently from d93b7cf to 5375c24 Compare November 14, 2024 19:16

Darksonn marked this pull request as draft November 18, 2024 11:58

Darksonn added A-tokio Area: The main tokio crate M-taskdump --cfg tokio_taskdump labels Nov 18, 2024

arielb1 force-pushed the self-trace branch 4 times, most recently from 24f51f0 to 048978f Compare January 9, 2025 15:47

arielb1 changed the title ~~[WIP] make tracing a task public so self-tracing is possible~~ make tracing a task public so self-tracing is possible Jan 9, 2025

arielb1 marked this pull request as ready for review January 9, 2025 15:47

arielb1 force-pushed the self-trace branch from 048978f to 6e64f61 Compare January 9, 2025 15:50

Darksonn reviewed Jan 9, 2025

View reviewed changes

tokio/src/runtime/dump.rs Outdated Show resolved Hide resolved

tokio/tests/task_trace_self.rs Outdated Show resolved Hide resolved

tokio/tests/task_trace_self.rs Outdated Show resolved Hide resolved

Darksonn requested review from jswrenn and Noah-Kennedy January 9, 2025 16:07

arielb1 force-pushed the self-trace branch 2 times, most recently from 49e2cbd to ba0f34d Compare January 9, 2025 19:45

Ariel Ben-Yehuda added 2 commits January 10, 2025 23:16

make self-tracing public

fbb5aa0

address review comments

6972b7b

arielb1 force-pushed the self-trace branch from ba0f34d to 8d8af2a Compare January 10, 2025 23:16

try to fix doctest

8eedfd6

arielb1 force-pushed the self-trace branch from 8d8af2a to 8eedfd6 Compare January 10, 2025 23:24

adjust imports to fit standard

1c6c605

jswrenn previously requested changes Jan 13, 2025

View reviewed changes

tokio/src/runtime/dump.rs Outdated Show resolved Hide resolved

tokio/src/runtime/dump.rs Outdated Show resolved Hide resolved

tokio/src/runtime/dump.rs Show resolved Hide resolved

tokio/src/runtime/dump.rs Show resolved Hide resolved

more documentation

03b481a

jswrenn reviewed Jan 13, 2025

View reviewed changes

carllerche approved these changes Jan 27, 2025

View reviewed changes

carllerche merged commit 2671ffb into tokio-rs:master Jan 27, 2025
90 checks passed

Darksonn mentioned this pull request Mar 7, 2025

chore: prepare Tokio v1.44.0 #7202

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

make tracing a task public so self-tracing is possible #6972

make tracing a task public so self-tracing is possible #6972

arielb1 commented Nov 13, 2024 •

edited

Loading

arielb1 commented Nov 13, 2024

arielb1 commented Nov 18, 2024

jlizen commented Nov 21, 2024 •

edited

Loading

jlizen commented Nov 21, 2024

jlizen commented Dec 16, 2024 •

edited

Loading

arielb1 commented Jan 9, 2025

Darksonn left a comment

Darksonn commented Jan 10, 2025

arielb1 commented Jan 10, 2025

jswrenn left a comment

jswrenn left a comment

arielb1 commented Jan 14, 2025

make tracing a task public so self-tracing is possible #6972

make tracing a task public so self-tracing is possible #6972

Conversation

arielb1 commented Nov 13, 2024 • edited Loading

arielb1 commented Nov 13, 2024

arielb1 commented Nov 18, 2024

jlizen commented Nov 21, 2024 • edited Loading

jlizen commented Nov 21, 2024

jlizen commented Dec 16, 2024 • edited Loading

arielb1 commented Jan 9, 2025

Darksonn left a comment

Choose a reason for hiding this comment

Darksonn commented Jan 10, 2025

arielb1 commented Jan 10, 2025

jswrenn left a comment

Choose a reason for hiding this comment

jswrenn left a comment

Choose a reason for hiding this comment

arielb1 commented Jan 14, 2025

arielb1 commented Nov 13, 2024 •

edited

Loading

jlizen commented Nov 21, 2024 •

edited

Loading

jlizen commented Dec 16, 2024 •

edited

Loading