Epoll deadlock on tokio's test #3911
Comments
Currently looking into this. @rustbot claim
I know you are still working on epoll, so I thought I would just ask this here. Last month this code was added to epoll_create1. It calls Epoll::default() a second time in the function, sets that value's ready_list to what I would think is the default anyway, and then lets the variable go out of scope without further use. Maybe a merge artifact?
miri/src/shims/unix/linux/epoll.rs Lines 210 to 214 in 9efab21
Apologies if I am misreading something, and for using this issue thread. This would not be the cause of a deadlock.
Huh yea, that looks like a rebase artifact, good find. The first two lines should just not exist anymore at all.
Wow, nice catch!
Reproducible:
use tokio_util::task;

#[tokio::main]
async fn main() {
    let pool = task::LocalPoolHandle::new(1);
    pool.spawn_pinned(|| async {}).await;
    pool.spawn_pinned(|| async {}).await;
}

Full trace:
error: deadlock: the evaluated program deadlocked
--> /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/sys/unix/selector/epoll.rs:56:9
|
56 | / syscall!(epoll_wait(
57 | | self.ep.as_raw_fd(),
58 | | events.as_mut_ptr(),
59 | | events.capacity() as i32,
60 | | timeout,
61 | | ))
| |__________^ the evaluated program deadlocked
|
= note: BACKTRACE on thread `unnamed-2`:
= note: inside `mio::sys::unix::selector::Selector::select` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/sys/unix/mod.rs:8:48: 8:49
= note: inside `mio::poll::Poll::poll` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/poll.rs:435:9: 435:61
= note: inside `tokio::runtime::io::driver::Driver::turn` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/io/driver.rs:149:15: 149:47
= note: inside `tokio::runtime::io::driver::Driver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/io/driver.rs:122:9: 122:32
= note: inside `tokio::runtime::signal::Driver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/signal/mod.rs:92:9: 92:29
= note: inside `tokio::runtime::process::Driver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/process.rs:32:9: 32:31
= note: inside `tokio::runtime::driver::IoStack::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/driver.rs:175:40: 175:54
= note: inside `tokio::runtime::time::Driver::park_internal` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/time/mod.rs:247:21: 247:46
= note: inside `tokio::runtime::time::Driver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/time/mod.rs:173:9: 173:41
= note: inside `tokio::runtime::driver::TimeDriver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/driver.rs:332:55: 332:74
= note: inside `tokio::runtime::driver::Driver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/driver.rs:71:9: 71:32
= note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/current_thread/mod.rs:382:17: 382:44
= note: inside `tokio::runtime::scheduler::current_thread::Context::enter::<(), {closure@tokio::runtime::scheduler::current_thread::Context::park::{closure#1}}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/current_thread/mod.rs:423:19: 423:22
= note: inside `tokio::runtime::scheduler::current_thread::Context::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/current_thread/mod.rs:381:27: 384:15
= note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/current_thread/mod.rs:724:33: 724:59
= note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/current_thread/mod.rs:774:68: 774:84
= note: inside `tokio::runtime::context::scoped::Scoped::<tokio::runtime::scheduler::Context>::set::<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::enter<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut {async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>>::{closure#0}}, std::option::Option<()>>::{closure#0}}, (std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>)>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/scoped.rs:40:9: 40:12
= note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context.rs:180:26: 180:47
= note: inside `std::thread::LocalKey::<tokio::runtime::context::Context>::try_with::<{closure@tokio::runtime::context::set_scheduler<(std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>), {closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::enter<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut {async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>>::{closure#0}}, std::option::Option<()>>::{closure#0}}>::{closure#0}}, (std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>)>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:283:12: 283:27
= note: inside `std::thread::LocalKey::<tokio::runtime::context::Context>::with::<{closure@tokio::runtime::context::set_scheduler<(std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>), {closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::enter<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut {async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>>::{closure#0}}, std::option::Option<()>>::{closure#0}}>::{closure#0}}, (std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>)>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:260:9: 260:25
= note: inside `tokio::runtime::context::set_scheduler::<(std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>), {closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::enter<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut {async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>>::{closure#0}}, std::option::Option<()>>::{closure#0}}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context.rs:180:9: 180:48
= note: inside `tokio::runtime::scheduler::current_thread::CoreGuard::<'_>::enter::<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut {async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>>::{closure#0}}, std::option::Option<()>>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/current_thread/mod.rs:774:27: 774:85
= note: inside `tokio::runtime::scheduler::current_thread::CoreGuard::<'_>::block_on::<std::pin::Pin<&mut {async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/current_thread/mod.rs:683:19: 751:11
= note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/current_thread/mod.rs:191:28: 191:49
= note: inside `tokio::runtime::context::runtime::enter_runtime::<{closure@tokio::runtime::scheduler::current_thread::CurrentThread::block_on<{async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>::{closure#0}}, ()>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/runtime.rs:65:16: 65:38
= note: inside `tokio::runtime::scheduler::current_thread::CurrentThread::block_on::<{async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/current_thread/mod.rs:179:9: 214:11
= note: inside `tokio::runtime::Runtime::block_on_inner::<{async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/runtime.rs:361:47: 361:88
= note: inside `tokio::runtime::Runtime::block_on::<{async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/runtime.rs:335:13: 335:40
= note: inside `tokio::task::LocalSet::block_on::<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/task/local.rs:596:9: 596:44
= note: inside `tokio_util::task::spawn_pinned::LocalWorkerHandle::run` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-util-0.7.12/src/task/spawn_pinned.rs:396:9: 401:11
= note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-util-0.7.12/src/task/spawn_pinned.rs:381:31: 381:77
= note: inside `std::sys::backtrace::__rust_begin_short_backtrace::<{closure@tokio_util::task::spawn_pinned::LocalWorkerHandle::new_worker::{closure#0}}, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/backtrace.rs:154:18: 154:21
= note: inside closure at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:522:17: 522:71
= note: inside `<std::panic::AssertUnwindSafe<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@tokio_util::task::spawn_pinned::LocalWorkerHandle::new_worker::{closure#0}}, ()>::{closure#1}::{closure#0}}> as std::ops::FnOnce<()>>::call_once` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9: 272:19
= note: inside `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@tokio_util::task::spawn_pinned::LocalWorkerHandle::new_worker::{closure#0}}, ()>::{closure#1}::{closure#0}}>, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:557:40: 557:43
= note: inside `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@tokio_util::task::spawn_pinned::LocalWorkerHandle::new_worker::{closure#0}}, ()>::{closure#1}::{closure#0}}>>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:520:19: 520:88
= note: inside `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@tokio_util::task::spawn_pinned::LocalWorkerHandle::new_worker::{closure#0}}, ()>::{closure#1}::{closure#0}}>, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:348:14: 348:33
= note: inside closure at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:521:30: 523:16
= note: inside `<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@tokio_util::task::spawn_pinned::LocalWorkerHandle::new_worker::{closure#0}}, ()>::{closure#1}} as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5: 250:71
= note: inside `<std::boxed::Box<dyn std::ops::FnOnce()> as std::ops::FnOnce<()>>::call_once` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2453:9: 2453:52
= note: inside `<std::boxed::Box<std::boxed::Box<dyn std::ops::FnOnce()>> as std::ops::FnOnce<()>>::call_once` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2453:9: 2453:52
= note: inside `std::sys::pal::unix::thread::Thread::new::thread_start` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/pal/unix/thread.rs:105:17: 105:64
= note: this error originates in the macro `syscall` (in Nightly builds, run with -Z macro-backtrace for more info)
error: deadlock: the evaluated program deadlocked
--> /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/pal/unix/futex.rs:75:21
|
75 | )
| ^ the evaluated program deadlocked
|
= note: BACKTRACE:
= note: inside `std::sys::pal::unix::futex::futex_wait` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/pal/unix/futex.rs:75:21: 75:22
= note: inside `std::sys::sync::condvar::futex::Condvar::wait_optional_timeout` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/sync/condvar/futex.rs:50:17: 50:62
= note: inside `std::sys::sync::condvar::futex::Condvar::wait` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/sync/condvar/futex.rs:34:9: 34:48
= note: inside `std::sync::Condvar::wait::<()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/condvar.rs:192:13: 192:34
= note: inside `tokio::runtime::park::Inner::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:116:17: 116:37
= note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:254:41: 254:65
= note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:268:41: 268:49
= note: inside `std::thread::LocalKey::<tokio::runtime::park::ParkThread>::try_with::<{closure@tokio::runtime::park::CachedParkThread::with_current<{closure@tokio::runtime::park::CachedParkThread::park::{closure#0}}, ()>::{closure#0}}, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:283:12: 283:27
= note: inside `tokio::runtime::park::CachedParkThread::with_current::<{closure@tokio::runtime::park::CachedParkThread::park::{closure#0}}, ()>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:268:9: 268:50
= note: inside `tokio::runtime::park::CachedParkThread::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:254:9: 254:66
= note: inside `tokio::runtime::park::CachedParkThread::block_on::<{async block@src/main.rs:3:1: 3:15}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:285:13: 285:24
= note: inside `tokio::runtime::context::blocking::BlockingRegionGuard::block_on::<{async block@src/main.rs:3:1: 3:15}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/blocking.rs:66:9: 66:25
= note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/mod.rs:87:13: 87:38
= note: inside `tokio::runtime::scheduler::multi_thread::MultiThread::block_on::<{async block@src/main.rs:3:1: 3:15}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/mod.rs:86:9: 88:11
note: inside `main`
--> src/main.rs:9:5
|
9 | pool.spawn_pinned(|| async {}).await;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
error: deadlock: the evaluated program deadlocked
--> /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/sys/unix/selector/epoll.rs:56:9
|
56 | / syscall!(epoll_wait(
57 | | self.ep.as_raw_fd(),
58 | | events.as_mut_ptr(),
59 | | events.capacity() as i32,
60 | | timeout,
61 | | ))
| |__________^ the evaluated program deadlocked
|
= note: BACKTRACE on thread `tokio-runtime-w`:
= note: inside `mio::sys::unix::selector::Selector::select` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/sys/unix/mod.rs:8:48: 8:49
= note: inside `mio::poll::Poll::poll` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/poll.rs:435:9: 435:61
= note: inside `tokio::runtime::io::driver::Driver::turn` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/io/driver.rs:149:15: 149:47
= note: inside `tokio::runtime::io::driver::Driver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/io/driver.rs:122:9: 122:32
= note: inside `tokio::runtime::signal::Driver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/signal/mod.rs:92:9: 92:29
= note: inside `tokio::runtime::process::Driver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/process.rs:32:9: 32:31
= note: inside `tokio::runtime::driver::IoStack::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/driver.rs:175:40: 175:54
= note: inside `tokio::runtime::time::Driver::park_internal` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/time/mod.rs:247:21: 247:46
= note: inside `tokio::runtime::time::Driver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/time/mod.rs:173:9: 173:41
= note: inside `tokio::runtime::driver::TimeDriver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/driver.rs:332:55: 332:74
= note: inside `tokio::runtime::driver::Driver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/driver.rs:71:9: 71:32
= note: inside `tokio::runtime::scheduler::multi_thread::park::Inner::park_driver` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/park.rs:194:9: 194:28
= note: inside `tokio::runtime::scheduler::multi_thread::park::Inner::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/park.rs:127:13: 127:50
= note: inside `tokio::runtime::scheduler::multi_thread::park::Parker::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/park.rs:70:9: 70:32
= note: inside `tokio::runtime::scheduler::multi_thread::worker::Context::park_timeout` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/worker.rs:750:13: 750:50
= note: inside `tokio::runtime::scheduler::multi_thread::worker::Context::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/worker.rs:718:24: 718:53
= note: inside `tokio::runtime::scheduler::multi_thread::worker::Context::run` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/worker.rs:566:21: 566:36
= note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/worker.rs:513:21: 513:33
= note: inside `tokio::runtime::context::scoped::Scoped::<tokio::runtime::scheduler::Context>::set::<{closure@tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure#0}}, ()>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/scoped.rs:40:9: 40:12
= note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context.rs:180:26: 180:47
= note: inside `std::thread::LocalKey::<tokio::runtime::context::Context>::try_with::<{closure@tokio::runtime::context::set_scheduler<(), {closure@tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure#0}}>::{closure#0}}, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:283:12: 283:27
= note: inside `std::thread::LocalKey::<tokio::runtime::context::Context>::with::<{closure@tokio::runtime::context::set_scheduler<(), {closure@tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure#0}}>::{closure#0}}, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:260:9: 260:25
= note: inside `tokio::runtime::context::set_scheduler::<(), {closure@tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure#0}}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context.rs:180:9: 180:48
= note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/worker.rs:508:9: 519:11
= note: inside `tokio::runtime::context::runtime::enter_runtime::<{closure@tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}}, ()>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/runtime.rs:65:16: 65:38
= note: inside `tokio::runtime::scheduler::multi_thread::worker::run` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/worker.rs:500:5: 520:7
= note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/worker.rs:466:45: 466:56
= note: inside `<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}> as std::future::Future>::poll` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/blocking/task.rs:42:21: 42:27
= note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/core.rs:331:17: 331:37
= note: inside `tokio::loom::std::unsafe_cell::UnsafeCell::<tokio::runtime::task::core::Stage<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>>>::with_mut::<std::task::Poll<()>, {closure@tokio::runtime::task::core::Core<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>::poll::{closure#0}}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/loom/std/unsafe_cell.rs:16:9: 16:24
= note: inside `tokio::runtime::task::core::Core::<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>::poll` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/core.rs:320:13: 332:15
= note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/harness.rs:500:19: 500:38
= note: inside `<std::panic::AssertUnwindSafe<{closure@tokio::runtime::task::harness::poll_future<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>::{closure#0}}> as std::ops::FnOnce<()>>::call_once` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9: 272:19
= note: inside `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<{closure@tokio::runtime::task::harness::poll_future<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>::{closure#0}}>, std::task::Poll<()>>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:557:40: 557:43
= note: inside `std::panicking::r#try::<std::task::Poll<()>, std::panic::AssertUnwindSafe<{closure@tokio::runtime::task::harness::poll_future<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>::{closure#0}}>>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:520:19: 520:88
= note: inside `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<{closure@tokio::runtime::task::harness::poll_future<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>::{closure#0}}>, std::task::Poll<()>>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:348:14: 348:33
= note: inside `tokio::runtime::task::harness::poll_future::<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/harness.rs:488:18: 503:8
= note: inside `tokio::runtime::task::harness::Harness::<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>::poll_inner` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/harness.rs:209:27: 209:55
= note: inside `tokio::runtime::task::harness::Harness::<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>::poll` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/harness.rs:154:15: 154:32
= note: inside `tokio::runtime::task::raw::poll::<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/raw.rs:271:5: 271:19
= note: inside `tokio::runtime::task::raw::RawTask::poll` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/raw.rs:201:18: 201:41
= note: inside `tokio::runtime::task::UnownedTask::<tokio::runtime::blocking::schedule::BlockingSchedule>::run` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/mod.rs:473:9: 473:19
= note: inside `tokio::runtime::blocking::pool::Task::run` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/blocking/pool.rs:160:9: 160:24
= note: inside `tokio::runtime::blocking::pool::Inner::run` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/blocking/pool.rs:518:17: 518:27
= note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/blocking/pool.rs:476:13: 476:54
= note: inside `std::sys::backtrace::__rust_begin_short_backtrace::<{closure@tokio::runtime::blocking::pool::Spawner::spawn_thread::{closure#0}}, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/backtrace.rs:154:18: 154:21
= note: inside closure at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:522:17: 522:71
= note: inside `<std::panic::AssertUnwindSafe<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@tokio::runtime::blocking::pool::Spawner::spawn_thread::{closure#0}}, ()>::{closure#1}::{closure#0}}> as std::ops::FnOnce<()>>::call_once` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9: 272:19
= note: inside `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@tokio::runtime::blocking::pool::Spawner::spawn_thread::{closure#0}}, ()>::{closure#1}::{closure#0}}>, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:557:40: 557:43
= note: inside `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@tokio::runtime::blocking::pool::Spawner::spawn_thread::{closure#0}}, ()>::{closure#1}::{closure#0}}>>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:520:19: 520:88
= note: inside `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@tokio::runtime::blocking::pool::Spawner::spawn_thread::{closure#0}}, ()>::{closure#1}::{closure#0}}>, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:348:14: 348:33
= note: inside closure at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:521:30: 523:16
= note: inside `<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@tokio::runtime::blocking::pool::Spawner::spawn_thread::{closure#0}}, ()>::{closure#1}} as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5: 250:71
= note: inside `<std::boxed::Box<dyn std::ops::FnOnce()> as std::ops::FnOnce<()>>::call_once` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2453:9: 2453:52
= note: inside `<std::boxed::Box<std::boxed::Box<dyn std::ops::FnOnce()>> as std::ops::FnOnce<()>>::call_once` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2453:9: 2453:52
= note: inside `std::sys::pal::unix::thread::Thread::new::thread_start` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/pal/unix/thread.rs:105:17: 105:64
= note: this error originates in the macro `syscall` (in Nightly builds, run with -Z macro-backtrace for more info)
note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace
error: aborting due to 3 previous errors; 2 warnings emitted |
Current progress: As miri reported deadlock on both Also, when running this test with @Darksonn do you have any idea on what could possibly happen in the reproducible? |
I took a quick look but wasn't able to see the issue. But be aware that the test has two runtimes, so you have two epoll instances in that test. Not sure what effect that has. |
The reproducer is helpful, but it can be simplified. The tokio function only needs to be called once, although it then took about twice as many seeds before the deadlock was reported. In my build, a seed value of 9 triggered the problem. I started from 1, so eight seeds had passed before that. |
Nice! I think we can start minimizing then by manually inlining aggressively |
Maybe this helps understand the failing code a little better. There are three threads being reported as deadlocked. I think this means there are three threads with no chance of making further progress in the simulator. Likely they aren't actually deadlocked with each other. But the design would have called for a clean exit on the main thread after the nop future was polled on the worker thread. So it appears an event was lost or an event didn't get created.
Of the three threads, two are blocked on epoll, and one is blocked on a futex. The two epolls being blocked on are actually in different tokio runtimes. There is the multi_thread tokio runtime/scheduler that was responsible for the futex-blocked thread and the main thread. And there is a current_thread (aka single-threaded tokio) scheduler that is also blocked on an epoll.
The output I'm looking at shows the three thread backtraces. I'll number them 1, 2 and 3, in the order they are printed:
1 - shows a thread name of "unnamed-2" and is the current_thread tokio blocked on epoll.
2 - shows no thread name, is blocked on a futex and condvar, and deeper down the stack shows it is part of the tokio multi_thread runtime.
3 - shows a thread name of "tokio-runtime-w", and is the multi-thread runtime blocked on epoll.
I'm going to go out on a limb and guess that thread 1 above is the sole worker thread that was created for the pool and it is in its normal state of having nothing to do so it waits. The logic in Miri should have signaled that the work it was given has completed. |
That means it is the main thread. Miri doesn't know which thread is blocked "on which other thread", so when it reports a deadlock that means all threads are blocked on something, but the actual deadlocked cycle might not involve all the mentioned threads. |
But maybe deadlock is a misnomer. I take deadlock to mean a mutual embrace, while a simulator like Miri is probably only reporting that there is no simulated thread able to make further progress. None is technically blocked waiting for another. And to my naive eye, it is interesting that the main thread is reported blocked on the futex. I always thought that tokio would run its scheduler on the main thread, but probably that was just a bad assumption. |
I think deadlock is the correct term. If no thread can make progress, that means there's a subset of threads that are in a dependency cycle. |
The thread blocked on the futex is interesting. I suspect it was started by the multi_thread scheduler to watch for when the worker thread wants to signal that it completed the work given to it. If the futex/condvar is blocked, I think that points to either the worker thread never seeing the work it was supposed to do, or it did the work and signaled the completion, but the simulated futex didn't match the send with the receive. I think the worker's work was received and done, as evidenced by other steps I took in narrowing down the possibilities. But that analysis also ran into new problems not worth mentioning here so everything here needs to be taken with a grain of salt. |
Generally, each Tokio runtime will have one thread blocked on epoll, and all other threads blocked on a futex/condvar. In the case of the main thread, that's a |
Maybe we need to do a log dump of all FFI calls and their args, as well as thread switches, in a successful run and in a run that deadlocks, and compare them. That may give some insights. |
One thing @tiif noticed was that epoll_wait was never invoked on a real system, even though miri ends up with a deadlock waiting for such a call to finish |
I'm not convinced that this is really true. When the extra threads become idle, they would use epoll_wait to sleep. Of course, there's a possibility that the main thread causes the process to abort before that happens. |
Thank you. I also took that as a red herring. And I'm running into my own fair share of those. Glad I don't post something every time I think a new interesting pattern has emerged. :) By looking at the tokio-util code, I see where the current_thread runtime comes from. So that makes sense. I don't know how to dump FFI calls, I'm presuming those are simulated FFI calls. But I would like to see how Futexes are simulated in Miri anyway so this is good. |
This strace of a real execution does not contain a single epoll_wait: https://rust-lang.zulipchat.com/user_uploads/4715/MR22oZW2Mj8PVXHkh_XNk--8/log
We don't have a good way of logging emulated FFI calls. A full trace may be too noisy, so it may require some log filtering or modifying miri to log just the interesting parts. |
The interplay of the code running in the worker that would use condvar to indicate the work is done and then presumably call notify_one, and the condvar loop running in the main thread which is waiting to see some shared state change is where I want to get further visibility. I'm still looking for the tokio call that does these two sides, and then I'll look at the Miri shim for those pieces. (I think shim is the right term but I could be wrong.) |
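The two sides described above follow the usual condvar handshake. A minimal sketch using only std is shown here; it is not tokio's actual code, and `run_handshake` is a hypothetical name chosen for illustration. The worker publishes a state change under the mutex and calls notify_one, while the waiting thread loops on the shared flag.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Runs one worker that flips a shared flag and notifies; returns the flag
/// value observed by the waiting thread once it stops waiting.
/// (Hypothetical helper, not taken from tokio.)
fn run_handshake() -> bool {
    let pair = Arc::new((Mutex::new(false), Condvar::new()));
    let worker_pair = Arc::clone(&pair);

    let worker = thread::spawn(move || {
        let (lock, cvar) = &*worker_pair;
        // Publish the state change under the lock, then wake one waiter.
        *lock.lock().unwrap() = true;
        cvar.notify_one();
    });

    let (lock, cvar) = &*pair;
    let mut done = lock.lock().unwrap();
    // Loop on the shared state: this tolerates spurious wakeups and a
    // notification that was sent before this thread started waiting.
    while !*done {
        done = cvar.wait(done).unwrap();
    }
    let observed = *done;
    drop(done);
    worker.join().unwrap();
    observed
}

fn main() {
    println!("waiter observed done = {}", run_handshake());
}
```

If the notification or the state change were lost, the waiting side would stay in the loop forever, which is the kind of stuck futex/condvar thread seen in the backtraces.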
The reproducer can be made even smaller. The tokio current_thread scheduler can be used for main and then the stack dumps show only two threads were active. Not sure this makes analyzing the problem easier though.
use tokio_util::task;
#[tokio::main(flavor = "current_thread")]
async fn main() {
    let pool = task::LocalPoolHandle::new(1);
    let _ = pool.spawn_pinned_by_idx(|| async {}, 0).await;
} |
Just an update - I am making progress in understanding where the passing path diverges from the failing path in the "best" case, with best defined by the longest common path created by the Miri PRNG. Thank goodness for the INFO messages from the miri::machine module.
I clearly see a difference in the miri shim use of the first futex. On the failing side, futex_wake is never called because the tokio code seems to have already decided there is no-one left on the other side of an unbounded mpsc channel. (Funny, I'm not sure yet if the sender or the receiver was thought to be gone - I just see on the passing path that tokio::sync::mpsc::UnboundedSender code kicks in while on the failing side, std::rt::panic_count::count_is_zero kicks in.) But even this, take with a grain of salt. I need to reproduce the results, and hopefully somehow reduce the diff noise that came before.
I haven't looked at how Miri uses its PRNG from the seed - maybe there is already a way to do this, but it would be nice if there were a second level of seed that could be kicked in after 'n' calls to the first PRNG. 'n' could then be determined manually with another binary search. So when either a good seed or a bad seed was found, one could play with when to diverge down a new path.
As it is, I have a passing seed of 167 and a failing seed of 178 that show remarkable congruency for well over a quarter of the extensive trace outputs (some 33,000 lines are the same). But before interesting things happen around line 105,000, there are lots of small and medium differences, and I haven't figured out the significance of those differences ahead of the large divergence.
I also have ideas for reducing the noise by running tokio with fewer features enabled and even hard-wiring some cfg macros, like those for coop and MetricsBatch, that one wouldn't expect to affect the final results but that, because of their use of the cell module, certainly cause a lot of variation possibilities for Miri from seed to seed.
If someone finds a quicker solution, I won't mind but this investigation is certainly fun so far. |
It sounds like you want exactly what loom does, just under MIRI :) |
I think a seed fuel for breaking the rng after n rng invocations is a great idea. All we'd need to do is track the number of RNG invocations and interleave a single rand bool after N has been reached. |
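The "second-level seed after n draws" idea can be sketched as below. This is only an illustration under stated assumptions: `SplitMix64` stands in for whatever generator Miri really uses, and `FueledRng`, `fuel` and `second_seed` are hypothetical names, not existing Miri options. The variant suggested in the comment above (consuming one extra random bool once N is reached) would shift the stream in a similar way without needing a second seed.

```rust
/// Tiny deterministic generator (SplitMix64); a stand-in for the real PRNG.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical wrapper: follows the primary stream for `fuel` draws, then
/// re-seeds from `second_seed` so the execution diverges from that point on.
struct FueledRng {
    rng: SplitMix64,
    fuel: Option<u64>,
    second_seed: u64,
}

impl FueledRng {
    fn new(seed: u64, fuel: Option<u64>, second_seed: u64) -> Self {
        FueledRng { rng: SplitMix64(seed), fuel, second_seed }
    }

    fn next_u64(&mut self) -> u64 {
        if let Some(remaining) = self.fuel {
            if remaining == 0 {
                // Fuel exhausted: switch to the secondary stream exactly once.
                self.rng = SplitMix64(self.second_seed);
                self.fuel = None;
            } else {
                self.fuel = Some(remaining - 1);
            }
        }
        self.rng.next_u64()
    }
}

/// Collects the next `n` draws, for comparing two runs.
fn draws(rng: &mut FueledRng, n: usize) -> Vec<u64> {
    (0..n).map(|_| rng.next_u64()).collect()
}

fn main() {
    let mut plain = FueledRng::new(1, None, 0);
    let mut fueled = FueledRng::new(1, Some(3), 99);
    println!("plain:  {:?}", draws(&mut plain, 4));
    println!("fueled: {:?}", draws(&mut fueled, 4));
}
```

With a passing seed and a failing seed in hand, one could binary-search on `fuel` to find the first draw at which the outcome flips.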
@Darksonn Question, does the "net" feature on tokio and on tokio-util have anything in common? In my new try at reducing the tracing output, I started to run with less than "full" features for tokio and tokio-util and to my surprise, found lots of ways to get the failing test for epoll to pass. I wouldn't be surprised if without "net", there is no epoll, but then how does the main, mentioned above, even run?
Anyway, after binary searches on the features for the two crates, I found that adding "net" to either dependency caused the failure path. So in case it helps anyone figure this out faster: running with a Cargo.toml where tokio just uses "rt-multi-thread" and "macros", and tokio-util just uses "rt", I didn't see that test fail with a hundred different seeds, while usually about one in five would fail. With "net" added to either, it would fail basically immediately. So big difference.
Again, I wouldn't be surprised if, without "net" on one or the other, I'm not testing what I want to be testing. But I have to go out for a while. Later, I can modify a local tokio to see which branches from the "net" feature are significant to this. |
I observed that adding certain dependencies will invoke Also, thanks for helping to narrow down the cause. I will keep following this thread while trying different stuff on my side. |
Well, okay. The "net" feature to tokio-util passes "net" to tokio. Should have looked at its Cargo.toml before sounding like I didn't know how to figure anything! So I'll focus on what aspect of "net" to tokio makes the difference. |
A bit more narrowing. The tokio/net feature causes feature mio/os-poll to be pulled in. I've gone to running the test with the current_thread runtime as that only brings two threads into play instead of three and still displays the failure about 20% of the seeded tests - so fewer differences to compare. I've narrowed it down to the PRNG in the miri concurrency::weak_memory module. Interesting that it is not the PRNG responsible for the thread switching or any of the other ten places a PRNG is used by miri. Here are some notes I made along the way.
|
@RalfJung That actually makes a lot of sense. Thanks for spelling out what probably seemed obvious to you. Let me see if I can follow the bread crumbs in eventfd and the test you laid out. --Edit So eventfd has a read and a write and they join their VClock to the thread, or from the thread, respectively. |
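The read/write clock handling described above can be pictured with a toy vector clock. This is a rough sketch, not Miri's implementation: the `VClock` and `EventFd` types here are simplified, hypothetical stand-ins written just for this example.

```rust
/// Minimal vector clock: one logical timestamp per thread index.
#[derive(Clone, Debug, Default, PartialEq)]
struct VClock(Vec<u64>);

impl VClock {
    /// Pointwise maximum; afterwards `self` is ordered after `other`.
    fn join(&mut self, other: &VClock) {
        if self.0.len() < other.0.len() {
            self.0.resize(other.0.len(), 0);
        }
        for (mine, theirs) in self.0.iter_mut().zip(&other.0) {
            *mine = (*mine).max(*theirs);
        }
    }

    /// Advances the timestamp of the given thread by one step.
    fn tick(&mut self, thread: usize) {
        if self.0.len() <= thread {
            self.0.resize(thread + 1, 0);
        }
        self.0[thread] += 1;
    }

    /// True if every timestamp in `self` is covered by `other`.
    fn happens_before_or_eq(&self, other: &VClock) -> bool {
        self.0
            .iter()
            .enumerate()
            .all(|(i, &t)| t <= other.0.get(i).copied().unwrap_or(0))
    }
}

/// Toy eventfd: a counter plus the clock of everyone who wrote to it.
#[derive(Default)]
struct EventFd {
    counter: u64,
    clock: VClock,
}

impl EventFd {
    /// Write acts as a release: the writer's clock is joined into the fd.
    fn write(&mut self, writer: &VClock, value: u64) {
        self.clock.join(writer);
        self.counter += value;
    }

    /// Read acts as an acquire: the fd's clock is joined into the reader.
    fn read(&mut self, reader: &mut VClock) -> u64 {
        reader.join(&self.clock);
        std::mem::take(&mut self.counter)
    }
}

/// Returns (ordered before the read, ordered after the read, value read).
fn demo() -> (bool, bool, u64) {
    let mut writer = VClock::default();
    writer.tick(0);
    let mut reader = VClock::default();
    reader.tick(1);

    let before = writer.happens_before_or_eq(&reader);
    let mut fd = EventFd::default();
    fd.write(&writer, 1);
    let value = fd.read(&mut reader);
    let after = writer.happens_before_or_eq(&reader);
    (before, after, value)
}

fn main() {
    let (before, after, value) = demo();
    println!("before read: {before}, after read: {after}, value: {value}");
}
```

Only after the read does the reader's clock cover the writer's, which is what lets a data race detector accept the reader's later access to memory the writer touched.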
Thanks again for all the help here!
A thread is woken up only if there is at least one How it works is (with a lot of details omitted):
|
Sorry, yes, I meant the ready list whenever I said "interest list".
|
There are two ready_list defined in the epoll module. Still wrapping my head around the fact there are two things holding the same type of BTreeMap. Is there really one but through cloning, the two types share it? If so, then it sounds like a new struct to wrap the BTreeMap, or essentially turn it into a tuple of BTreeMap and VClock. |
They are the same thing. If an |
As expected, I managed to produce a data race error (and sadly not a deadlock) with the test below in
fn test_epoll_race() {
    // Create an epoll instance.
    let epfd = unsafe { libc::epoll_create1(0) };
    assert_ne!(epfd, -1);
    // Create an eventfd instance.
    let flags = libc::EFD_NONBLOCK | libc::EFD_CLOEXEC;
    let fd = unsafe { libc::eventfd(0, flags) };
    // Register eventfd with the epoll instance.
    let mut ev = libc::epoll_event { events: EPOLL_IN_OUT_ET, u64: fd as u64 };
    let res = unsafe { libc::epoll_ctl(epfd, libc::EPOLL_CTL_ADD, fd, &mut ev) };
    assert_eq!(res, 0);
    static mut VAL: u8 = 0;
    let thread1 = thread::spawn(move || {
        // Write to the static mut variable.
        unsafe { VAL = 1 };
        // Write to the eventfd instance.
        let sized_8_data: [u8; 8] = 1_u64.to_ne_bytes();
        let res = unsafe { libc::write(fd, sized_8_data.as_ptr() as *const libc::c_void, 8) };
        // write returns the number of bytes written, which is always 8.
        assert_eq!(res, 8);
    });
    thread::yield_now();
    // epoll_wait for the event to happen.
    let expected_event = u32::try_from(libc::EPOLLIN | libc::EPOLLOUT).unwrap();
    let expected_value = u64::try_from(fd).unwrap();
    check_epoll_wait::<8>(epfd, &[(expected_event, expected_value)], -1);
    // Read from the static mut variable.
    unsafe { assert_eq!(VAL, 1) };
    thread1.join().unwrap();
}
I kind of hope I can reproduce the deadlock issue purely using syscalls to check whether it is fixed. |
Should the VClock be per ready_list, or per entry in the ready_list, the EpollEventInterest? The latter is more fine-grained, I think, and would only cause a clock sync from the thread resolving the event to the thread waiting for the event. Otherwise there seems to be more clock syncing to the thread being woken than is called for. |
Isn't the ready list already per-epoll-instance? But I guess per event is indeed better, since a call to |
Yes. I have a concern here: if we were to support level-triggered epoll in the future, one event would be able to wake up more than one thread at once. I assume a clock can only be acquired once, so would this cause a problem? So for level-triggered epoll, something like this might happen:
What we currently support is edge-triggered epoll, and one event will only wake up one thread, but this doesn't apply to level-triggered epoll. |
That assumption is wrong. :) It can be acquired any number of times. |
@tiif Thank you for putting my latest confusion so clearly. I'm looking for where a fd event triggers the epoll mechanism, and wondering how the epoll event hooks up to the ready lists. Probably there is one global for all epoll events and that can be mapped back to the epolls? |
As far as I am concerned, the above is a better test. But if you want to keep looking I won't stop you. :) |
Nice to hear that! :D
Whenever an event happened, We have a global map that maps target file description to its associated But I think my explanation is vague here, if you need more explanation you can always open a thread in zulip or ping me directly :) |
@tiif Perfectly clear. I was going down that path but hadn't noticed the this.machine.epoll_interests. So I think the clock can be per ready list. I should be able to test this in a few minutes now but you may want to fix it all up to your liking anyway, I totally understand. |
Just go for it :), It'd be nice if you can help to test the tokio reproducible above at the same time and make sure it can pass (but you don't need to add it into the PR). |
Yes, the tokio reproducible is one of my goals. Just getting it to compile is satisfying but not that satisfying. |
Adding VClock to the epoll ready_list fixes the tokio reproducer. |
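As a rough picture of what "a ready list with a clock" can look like, here is a simplified sketch. It is not the code from the PR: `ReadyList`, `push`, `drain` and the map-based `VClock` are hypothetical and reduced to the bare idea. The thread that makes an fd ready releases its clock into the list, and the thread returning from epoll_wait acquires it. Because acquiring only reads the stored clock, any number of waiters can acquire it.

```rust
use std::collections::BTreeMap;

type ThreadId = usize;
/// Simplified vector clock: thread id -> logical timestamp.
type VClock = BTreeMap<ThreadId, u64>;

/// Pointwise maximum of `from` into `into`.
fn join(into: &mut VClock, from: &VClock) {
    for (&thread, &ts) in from {
        let slot = into.entry(thread).or_insert(0);
        *slot = (*slot).max(ts);
    }
}

/// Hypothetical ready list: the ready events plus the clock of whoever
/// made them ready.
#[derive(Default)]
struct ReadyList {
    /// fd -> ready event bits.
    events: BTreeMap<i32, u32>,
    clock: VClock,
}

impl ReadyList {
    /// Called for the thread that makes `fd` ready (release).
    fn push(&mut self, fd: i32, bits: u32, notifier: &VClock) {
        *self.events.entry(fd).or_insert(0) |= bits;
        join(&mut self.clock, notifier);
    }

    /// Called for the thread returning from the wait (acquire).
    /// The stored clock is only read, so later waiters can acquire it too.
    fn drain(&mut self, waiter: &mut VClock) -> Vec<(i32, u32)> {
        join(waiter, &self.clock);
        std::mem::take(&mut self.events).into_iter().collect()
    }
}

fn main() {
    let mut list = ReadyList::default();
    let mut notifier = VClock::new();
    notifier.insert(1, 5);
    list.push(3, 0x1, &notifier);

    let mut waiter = VClock::new();
    waiter.insert(0, 2);
    let events = list.drain(&mut waiter);
    println!("events: {events:?}, waiter clock: {waiter:?}");
}
```

A per-entry clock, as discussed earlier in the thread, would move the `clock` field into each map value instead of keeping one per list.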
Awesome. :) Nice job identifying which part of the non-determinism is the culprit here. |
For the PR, I was asked to include the test case described above. I could use some help in figuring out how to get the test either run or compiled. When I try to run it, it won't compile but maybe there is a different way the test should be run? I tried
but that fails because the test harness says the function didn't compile and it shows a reasonable compiler error:
|
Follow-up question. I found another way to run the test so now it's just a warning but then an assertion fails.
and here you can see what line 171 in that file is:
If I run it as Is this the normal way to run tests and should I just commit this to the PR as is? The PR's CI isn't going to flag the warning, the assertion, or the compile error? |
you can grep for the warnings can be disabled with the attribute mentioned in the diagnostic message |
Under what conditions do the So why does putting the flag on the command line to miri work but not the comment in the header? I find dozens of test files with similar lines in their header. Here is what the file already has:
I found the ui_test docs talk about this comment flag, but it doesn't help me understand why passing the -Z flag on the test command line makes it pass but not having it on the command line but in the test file, like dozens of others, and only this new test function fails? |
Well, as soon as I posted that, I thought of something else to try. Just But when the test is run with a FILTER, either as a test or a run, the flag has to be provided on the command-line. Was this advice somewhere I should have been reading first - it would probably be in the ui_test docs but I don't see it? Or am I still misinterpreting things and the results are explained some other way? 😕 |
There are two ways to run the test in miri, one is There is a brief introduction about testing in https://github.com/rust-lang/miri/blob/master/CONTRIBUTING.md#building-and-testing-miri If you are unsure, you can push the change to the PR first, and we can help to check what's wrong by reading the CI log. |
Thanks for debugging this, finding a fix and opening a PR! This was definitely very unexpected from just looking at the original issue |
I am wondering if there are API things we can do to reduce the likelihood of this. Thread unblocking almost always occurs because of some other event on another thread that we should register a causal link with, so maybe the thread unblocking function should take a clock. It's just that we've so far kept those subsystems largely orthogonal. |
Description
When running one of the tokio tests with
cargo +nightly miri test --features full --test spawn_pinned
miri reported a deadlock on the epoll_wait syscall. This is likely caused by the epoll shim not receiving an epoll event notification to unblock the thread when it would have received one on a real-world system.
Version
Rustc:
Tokio:
Tokio repo master branch commit 27539ae3
Full trace