Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing & tracing-subscriber have a design defect I believe #3223

Open
vlovich opened this issue Feb 24, 2025 · 2 comments
Open

tracing & tracing-subscriber have a design defect I believe #3223

vlovich opened this issue Feb 24, 2025 · 2 comments
Labels
crate/subscriber Related to the `tracing-subscriber` crate kind/bug Something isn't working

Comments

@vlovich
Copy link

vlovich commented Feb 24, 2025

Version

tracing 0.1.41
tracing-subscriber 0.3.19

Platform

I'm pretty sure all of them, but definitely verified on Linux 6.12.9-arch1-1

Description

Using with_subscriber will trigger a panic if the async block creates a span & enters it on a different thread that will target the default subscriber. I believe both the with_subscriber AND the enter off-thread scenario are idiomatic. The reason for the crash is that the span.enter() try's recording the span exit into the default subscriber which isn't aware of it since it's registered by the scoped subscriber.

I encountered this with hyper which in GaiResolver has a block like:

fn call(&mut self, name: Name) -> Self::Future {
    let span = debug_span!("resolve", host = %name.host);
    let blocking = tokio::task::spawn_blocking(move || { 
        let _enter = span.enter();
        (&*name.host, 0)
            .to_socket_addrs()
            .map(|i| SocketAddrs { iter: i })
    });

    GaiFuture { inner: blocking }
}

This pattern seems pretty idiomatic: https://docs.rs/tracing/0.1.41/tracing/struct.Span.html#method.or_current

A simple repro on Linux (which uses GaiResolver) is:

#[tokio::main]
async fn main() {
    tracing_subscriber::registry()
        .with(tracing_subscriber::fmt::layer())
        .init();

    let override_default_registry =
        tracing_subscriber::Registry::default().with(tracing_subscriber::fmt::layer());
    async move {
        tracing::info!("Starting request");
        reqwest::Client::new()
            .post("http://localhost:8787/ping")
            .body("")
            .send()
            .await
           .expect("HTTP request failed");
        tracing::info!("Finished request");
    }
    .with_subscriber(override_default_registry)
    .await;
}

Tried filing upstream but the author there confirmed it's likely an issue in this crate: hyperium/hyper#3848 (comment)

Either with_subscriber has a fatal flaw or the idiomatic pattern needs to be clarified on how to actually make it safe for the above snippet.

@hawkw
Copy link
Member

hawkw commented Feb 24, 2025

Hmm, that doesn't seem right. The guard returned by Span::enter should call the span's do_exit() method when it's dropped:

tracing/tracing/src/span.rs

Lines 1558 to 1570 in bfca546

impl Drop for Entered<'_> {
#[inline(always)]
fn drop(&mut self) {
self.span.do_exit()
}
}
impl Drop for EnteredSpan {
#[inline(always)]
fn drop(&mut self) {
self.span.do_exit()
}
}

The Span::do_exit() method calls exit on the subscriber that was current when the span was created (which is stored by the Span), not the current subscriber on the thread when the guard was dropped:

tracing/tracing/src/span.rs

Lines 1051 to 1054 in bfca546

fn do_exit(&self) {
if let Some(inner) = self.inner.as_ref() {
inner.subscriber.exit(&inner.id);
}

This is designed to prevent precisely the situation you're describing in this issue, so I'm not sure what's gone wrong.

@hawkw
Copy link
Member

hawkw commented Feb 24, 2025

Looking at the panic in hyperium/hyper#3848, I see that and the stack indicates that we are in the span's try_close method:

stack backtrace:
   0: rust_begin_unwind
             at /rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/std/src/panicking.rs:692:5
   1: core::panicking::panic_fmt
             at /rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/core/src/panicking.rs:75:14
   2: <tracing_subscriber::registry::sharded::Registry as tracing_core::subscriber::Subscriber>::try_close
             at /home/me/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tracing-subscriber-0.3.19/src/registry/sharded.rs:350:21
   3: <tracing_subscriber::layer::layered::Layered<L,S> as tracing_core::subscriber::Subscriber>::try_close
             at /home/me/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tracing-subscriber-0.3.19/src/layer/layered.rs:186:12
   4: tracing_core::dispatcher::Dispatch::try_close
             at /home/me/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tracing-core-0.1.33/src/dispatcher.rs:703:9
   5: <tracing_subscriber::registry::sharded::Registry as tracing_core::subscriber::Subscriber>::exit::{{closure}}
             at /home/me/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tracing-subscriber-0.3.19/src/registry/sharded.rs:303:52
   6: tracing_core::dispatcher::get_default

It looks like the problem is this code in tracing-subscriber's Registry implementation:

if spans.borrow_mut().pop(id) {
dispatcher::get_default(|dispatch| dispatch.try_close(id.clone()));
}

Note that here, in order to call the exit callback on every subscriber Layer above the Registry, we call get_default to get the current default subscriber. The registry implementation assumes that that will return the same subscriber that's processing the exit, but this isn't the case.

One solution, although it has unfortunate performance implications, would be for the Span type's enter and exit code to set its captured subscriber as the thread's default for the duration of the call to enter, exit, or close it, rather than calling those methods directly on the captured Dispatch reference to the subscriber. That way, any code that runs inside the subscriber callbacks will see itself as the current default on that thread.

@hawkw hawkw added kind/bug Something isn't working crate/subscriber Related to the `tracing-subscriber` crate labels Feb 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
crate/subscriber Related to the `tracing-subscriber` crate kind/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants