
Epoll deadlock on tokio's test #3911

Closed
tiif opened this issue Sep 24, 2024 · 60 comments · Fixed by #3932

@tiif
Contributor

tiif commented Sep 24, 2024

Description

When running one of the tokio tests with cargo +nightly miri test --features full --test spawn_pinned, Miri reported a deadlock on the epoll_wait syscall. This is likely caused by the epoll shim not delivering an epoll event notification to unblock the thread when it would have received one on a real-world system.

Version

Rustc:

rustc 1.83.0-nightly (7042c269c 2024-09-23)
binary: rustc
commit-hash: 7042c269c166191cd5d8daf0409890903df7af57
commit-date: 2024-09-23
host: x86_64-unknown-linux-gnu
release: 1.83.0-nightly
LLVM version: 19.1.0

Tokio:
Tokio repo master branch commit 27539ae3

Full trace
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.06s
     Running tests/spawn_pinned.rs (target/miri/x86_64-unknown-linux-gnu/debug/deps/spawn_pinned-5d9d1b31504e33c5)

running 9 tests
test callback_panic_does_not_kill_worker ... error: deadlock: the evaluated program deadlocked
   --> /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/sys/unix/selector/epoll.rs:56:9
    |
56  | /         syscall!(epoll_wait(
57  | |             self.ep.as_raw_fd(),
58  | |             events.as_mut_ptr(),
59  | |             events.capacity() as i32,
60  | |             timeout,
61  | |         ))
    | |__________^ the evaluated program deadlocked
    |
    = note: BACKTRACE on thread `unnamed-2`:
    = note: inside `mio::sys::unix::selector::Selector::select` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/sys/unix/mod.rs:8:48: 8:49
    = note: inside `mio::poll::Poll::poll` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/poll.rs:435:9: 435:61
note: inside `tokio::runtime::io::driver::Driver::turn`
   --> /home/byt/Documents/tokio/tokio/src/runtime/io/driver.rs:149:15
    |
149 |         match self.poll.poll(events, max_wait) {
    |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `tokio::runtime::io::driver::Driver::park`
   --> /home/byt/Documents/tokio/tokio/src/runtime/io/driver.rs:122:9
    |
122 |         self.turn(handle, None);
    |         ^^^^^^^^^^^^^^^^^^^^^^^
note: inside `tokio::runtime::signal::Driver::park`
   --> /home/byt/Documents/tokio/tokio/src/runtime/signal/mod.rs:92:9
    |
92  |         self.io.park(handle);
    |         ^^^^^^^^^^^^^^^^^^^^
note: inside `tokio::runtime::process::Driver::park`
   --> /home/byt/Documents/tokio/tokio/src/runtime/process.rs:32:9
    |
32  |         self.park.park(handle);
    |         ^^^^^^^^^^^^^^^^^^^^^^
note: inside `tokio::runtime::driver::IoStack::park`
   --> /home/byt/Documents/tokio/tokio/src/runtime/driver.rs:175:40
    |
175 |                 IoStack::Enabled(v) => v.park(handle),
    |                                        ^^^^^^^^^^^^^^
note: inside `tokio::runtime::time::Driver::park_internal`
   --> /home/byt/Documents/tokio/tokio/src/runtime/time/mod.rs:247:21
    |
247 |                     self.park.park(rt_handle);
    |                     ^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `tokio::runtime::time::Driver::park`
   --> /home/byt/Documents/tokio/tokio/src/runtime/time/mod.rs:173:9
    |
173 |         self.park_internal(handle, None);
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `tokio::runtime::driver::TimeDriver::park`
   --> /home/byt/Documents/tokio/tokio/src/runtime/driver.rs:332:55
    |
332 |                 TimeDriver::Enabled { driver, .. } => driver.park(handle),
    |                                                       ^^^^^^^^^^^^^^^^^^^
note: inside `tokio::runtime::driver::Driver::park`
   --> /home/byt/Documents/tokio/tokio/src/runtime/driver.rs:71:9
    |
71  |         self.inner.park(handle);
    |         ^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure
   --> /home/byt/Documents/tokio/tokio/src/runtime/scheduler/current_thread/mod.rs:382:17
    |
382 |                 driver.park(&handle.driver);
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `tokio::runtime::scheduler::current_thread::Context::enter::<(), {closure@tokio::runtime::scheduler::current_thread::Context::park::{closure#1}}>`
   --> /home/byt/Documents/tokio/tokio/src/runtime/scheduler/current_thread/mod.rs:423:19
    |
423 |         let ret = f();
    |                   ^^^
note: inside `tokio::runtime::scheduler::current_thread::Context::park`
   --> /home/byt/Documents/tokio/tokio/src/runtime/scheduler/current_thread/mod.rs:381:27
    |
381 |               let (c, ()) = self.enter(core, || {
    |  ___________________________^
382 | |                 driver.park(&handle.driver);
383 | |                 self.defer.wake();
384 | |             });
    | |______________^
note: inside closure
   --> /home/byt/Documents/tokio/tokio/src/runtime/scheduler/current_thread/mod.rs:724:33
    |
724 | ...                   context.park(core, handle)
    |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure
   --> /home/byt/Documents/tokio/tokio/src/runtime/scheduler/current_thread/mod.rs:774:68
    |
774 |         let (core, ret) = context::set_scheduler(&self.context, || f(core, context));
    |                                                                    ^^^^^^^^^^^^^^^^
note: inside `tokio::runtime::context::scoped::Scoped::<tokio::runtime::scheduler::Context>::set::<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::enter<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut {async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>>::{closure#0}}, std::option::Option<()>>::{closure#0}}, (std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>)>`
   --> /home/byt/Documents/tokio/tokio/src/runtime/context/scoped.rs:40:9
    |
40  |         f()
    |         ^^^
note: inside closure
   --> /home/byt/Documents/tokio/tokio/src/runtime/context.rs:180:26
    |
180 |         CONTEXT.with(|c| c.scheduler.set(v, f))
    |                          ^^^^^^^^^^^^^^^^^^^^^
    = note: inside `std::thread::LocalKey::<tokio::runtime::context::Context>::try_with::<{closure@tokio::runtime::context::set_scheduler<(std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>), {closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::enter<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut {async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>>::{closure#0}}, std::option::Option<()>>::{closure#0}}>::{closure#0}}, (std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>)>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:283:12: 283:27
    = note: inside `std::thread::LocalKey::<tokio::runtime::context::Context>::with::<{closure@tokio::runtime::context::set_scheduler<(std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>), {closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::enter<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut {async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>>::{closure#0}}, std::option::Option<()>>::{closure#0}}>::{closure#0}}, (std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>)>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:260:9: 260:25
note: inside `tokio::runtime::context::set_scheduler::<(std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>), {closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::enter<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut {async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>>::{closure#0}}, std::option::Option<()>>::{closure#0}}>`
   --> /home/byt/Documents/tokio/tokio/src/runtime/context.rs:180:9
    |
180 |         CONTEXT.with(|c| c.scheduler.set(v, f))
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `tokio::runtime::scheduler::current_thread::CoreGuard::<'_>::enter::<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut {async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>>::{closure#0}}, std::option::Option<()>>`
   --> /home/byt/Documents/tokio/tokio/src/runtime/scheduler/current_thread/mod.rs:774:27
    |
774 |         let (core, ret) = context::set_scheduler(&self.context, || f(core, context));
    |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure
   --> /home/byt/Documents/tokio/tokio/src/runtime/scheduler/current_thread/mod.rs:191:28
    |
191 |                     return core.block_on(future);
    |                            ^^^^^^^^^^^^^^^^^^^^^
note: inside `tokio_util::task::spawn_pinned::LocalWorkerHandle::run`
   --> /home/byt/Documents/tokio/tokio-util/src/task/spawn_pinned.rs:396:9
    |
396 | /         local_set.block_on(&runtime, async {
397 | |             while let Some(spawn_task) = task_receiver.recv().await {
398 | |                 // Calls spawn_local(future)
399 | |                 (spawn_task)();
400 | |             }
401 | |         });
    | |__________^
note: inside closure
   --> /home/byt/Documents/tokio/tokio-util/src/task/spawn_pinned.rs:381:31
    |
381 |         std::thread::spawn(|| Self::run(runtime, receiver, task_count_clone));
    |                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    = note: this error originates in the macro `syscall` (in Nightly builds, run with -Z macro-backtrace for more info)

error: deadlock: the evaluated program deadlocked
  --> /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/pal/unix/futex.rs:75:21
   |
75 |                     )
   |                     ^ the evaluated program deadlocked
   |
   = note: BACKTRACE:
   = note: inside `std::sys::pal::unix::futex::futex_wait` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/pal/unix/futex.rs:75:21: 75:22
   = note: inside `std::sys::sync::thread_parking::futex::Parker::park` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/sync/thread_parking/futex.rs:55:13: 55:50
   = note: inside `std::thread::Thread::park` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:1404:18: 1404:53
   = note: inside `std::thread::park` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:1110:9: 1110:25
   = note: inside `std::sync::mpmc::context::Context::wait_until` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/mpmc/context.rs:138:17: 138:31
   = note: inside closure at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/mpmc/list.rs:447:27: 447:50
   = note: inside closure at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/mpmc/context.rs:49:13: 49:18
   = note: inside closure at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/mpmc/context.rs:57:31: 57:37
   = note: inside `std::thread::LocalKey::<std::cell::Cell<std::option::Option<std::sync::mpmc::context::Context>>>::try_with::<{closure@std::sync::mpmc::context::Context::with<{closure@std::sync::mpmc::list::Channel<test::event::CompletedTest>::recv::{closure#1}}, ()>::{closure#1}}, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:283:12: 283:27
   = note: inside `std::sync::mpmc::context::Context::with::<{closure@std::sync::mpmc::list::Channel<test::event::CompletedTest>::recv::{closure#1}}, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/mpmc/context.rs:52:9: 61:15
   = note: inside `std::sync::mpmc::list::Channel::<test::event::CompletedTest>::recv` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/mpmc/list.rs:437:13: 458:15
   = note: inside `std::sync::mpmc::Receiver::<test::event::CompletedTest>::recv` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/mpmc/mod.rs:308:43: 308:58
   = note: inside `std::sync::mpsc::Receiver::<test::event::CompletedTest>::recv` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/mpsc/mod.rs:848:9: 848:26
   = note: inside `test::run_tests::<{closure@test::run_tests_console::{closure#2}}>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:379:38: 379:47
   = note: inside `test::run_tests_console` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/console.rs:322:5: 322:70
   = note: inside `test::test_main` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:137:19: 137:59
   = note: inside `test::test_main_static` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:159:5: 159:40
   = note: inside `main`

error: deadlock: the evaluated program deadlocked
   --> /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/sys/unix/selector/epoll.rs:56:9
    |
56  | /         syscall!(epoll_wait(
57  | |             self.ep.as_raw_fd(),
58  | |             events.as_mut_ptr(),
59  | |             events.capacity() as i32,
60  | |             timeout,
61  | |         ))
    | |__________^ the evaluated program deadlocked
    |
    = note: BACKTRACE on thread `callback_panic_`:
    = note: inside `mio::sys::unix::selector::Selector::select` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/sys/unix/mod.rs:8:48: 8:49
    = note: inside `mio::poll::Poll::poll` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/poll.rs:435:9: 435:61
note: inside `tokio::runtime::io::driver::Driver::turn`
   --> /home/byt/Documents/tokio/tokio/src/runtime/io/driver.rs:149:15
    |
149 |         match self.poll.poll(events, max_wait) {
    |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `tokio::runtime::io::driver::Driver::park`
   --> /home/byt/Documents/tokio/tokio/src/runtime/io/driver.rs:122:9
    |
122 |         self.turn(handle, None);
    |         ^^^^^^^^^^^^^^^^^^^^^^^
note: inside `tokio::runtime::signal::Driver::park`
   --> /home/byt/Documents/tokio/tokio/src/runtime/signal/mod.rs:92:9
    |
92  |         self.io.park(handle);
    |         ^^^^^^^^^^^^^^^^^^^^
note: inside `tokio::runtime::process::Driver::park`
   --> /home/byt/Documents/tokio/tokio/src/runtime/process.rs:32:9
    |
32  |         self.park.park(handle);
    |         ^^^^^^^^^^^^^^^^^^^^^^
note: inside `tokio::runtime::driver::IoStack::park`
   --> /home/byt/Documents/tokio/tokio/src/runtime/driver.rs:175:40
    |
175 |                 IoStack::Enabled(v) => v.park(handle),
    |                                        ^^^^^^^^^^^^^^
note: inside `tokio::runtime::time::Driver::park_internal`
   --> /home/byt/Documents/tokio/tokio/src/runtime/time/mod.rs:247:21
    |
247 |                     self.park.park(rt_handle);
    |                     ^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `tokio::runtime::time::Driver::park`
   --> /home/byt/Documents/tokio/tokio/src/runtime/time/mod.rs:173:9
    |
173 |         self.park_internal(handle, None);
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `tokio::runtime::driver::TimeDriver::park`
   --> /home/byt/Documents/tokio/tokio/src/runtime/driver.rs:332:55
    |
332 |                 TimeDriver::Enabled { driver, .. } => driver.park(handle),
    |                                                       ^^^^^^^^^^^^^^^^^^^
note: inside `tokio::runtime::driver::Driver::park`
   --> /home/byt/Documents/tokio/tokio/src/runtime/driver.rs:71:9
    |
71  |         self.inner.park(handle);
    |         ^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure
   --> /home/byt/Documents/tokio/tokio/src/runtime/scheduler/current_thread/mod.rs:382:17
    |
382 |                 driver.park(&handle.driver);
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `tokio::runtime::scheduler::current_thread::Context::enter::<(), {closure@tokio::runtime::scheduler::current_thread::Context::park::{closure#1}}>`
   --> /home/byt/Documents/tokio/tokio/src/runtime/scheduler/current_thread/mod.rs:423:19
    |
423 |         let ret = f();
    |                   ^^^
note: inside `tokio::runtime::scheduler::current_thread::Context::park`
   --> /home/byt/Documents/tokio/tokio/src/runtime/scheduler/current_thread/mod.rs:381:27
    |
381 |               let (c, ()) = self.enter(core, || {
    |  ___________________________^
382 | |                 driver.park(&handle.driver);
383 | |                 self.defer.wake();
384 | |             });
    | |______________^
note: inside closure
   --> /home/byt/Documents/tokio/tokio/src/runtime/scheduler/current_thread/mod.rs:724:33
    |
724 | ...                   context.park(core, handle)
    |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure
   --> /home/byt/Documents/tokio/tokio/src/runtime/scheduler/current_thread/mod.rs:774:68
    |
774 |         let (core, ret) = context::set_scheduler(&self.context, || f(core, context));
    |                                                                    ^^^^^^^^^^^^^^^^
note: inside `tokio::runtime::context::scoped::Scoped::<tokio::runtime::scheduler::Context>::set::<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::enter<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut std::pin::Pin<&mut dyn futures::Future<Output = ()>>>>::{closure#0}}, std::option::Option<()>>::{closure#0}}, (std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>)>`
   --> /home/byt/Documents/tokio/tokio/src/runtime/context/scoped.rs:40:9
    |
40  |         f()
    |         ^^^
note: inside closure
   --> /home/byt/Documents/tokio/tokio/src/runtime/context.rs:180:26
    |
180 |         CONTEXT.with(|c| c.scheduler.set(v, f))
    |                          ^^^^^^^^^^^^^^^^^^^^^
    = note: inside `std::thread::LocalKey::<tokio::runtime::context::Context>::try_with::<{closure@tokio::runtime::context::set_scheduler<(std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>), {closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::enter<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut std::pin::Pin<&mut dyn futures::Future<Output = ()>>>>::{closure#0}}, std::option::Option<()>>::{closure#0}}>::{closure#0}}, (std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>)>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:283:12: 283:27
    = note: inside `std::thread::LocalKey::<tokio::runtime::context::Context>::with::<{closure@tokio::runtime::context::set_scheduler<(std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>), {closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::enter<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut std::pin::Pin<&mut dyn futures::Future<Output = ()>>>>::{closure#0}}, std::option::Option<()>>::{closure#0}}>::{closure#0}}, (std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>)>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:260:9: 260:25
note: inside `tokio::runtime::context::set_scheduler::<(std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>), {closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::enter<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut std::pin::Pin<&mut dyn futures::Future<Output = ()>>>>::{closure#0}}, std::option::Option<()>>::{closure#0}}>`
   --> /home/byt/Documents/tokio/tokio/src/runtime/context.rs:180:9
    |
180 |         CONTEXT.with(|c| c.scheduler.set(v, f))
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `tokio::runtime::scheduler::current_thread::CoreGuard::<'_>::enter::<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut std::pin::Pin<&mut dyn futures::Future<Output = ()>>>>::{closure#0}}, std::option::Option<()>>`
   --> /home/byt/Documents/tokio/tokio/src/runtime/scheduler/current_thread/mod.rs:774:27
    |
774 |         let (core, ret) = context::set_scheduler(&self.context, || f(core, context));
    |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure
   --> /home/byt/Documents/tokio/tokio/src/runtime/scheduler/current_thread/mod.rs:191:28
    |
191 |                     return core.block_on(future);
    |                            ^^^^^^^^^^^^^^^^^^^^^
note: inside `callback_panic_does_not_kill_worker`
   --> tokio-util/tests/spawn_pinned.rs:120:5
    |
120 |     assert_eq!(result.unwrap(), "test");
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure
   --> tokio-util/tests/spawn_pinned.rs:100:47
    |
98  | #[tokio::test]
    | -------------- in this procedural macro expansion
99  | #[cfg(panic = "unwind")]
100 | async fn callback_panic_does_not_kill_worker() {
    |                                               ^
    = note: this error originates in the macro `syscall` which comes from the expansion of the attribute macro `tokio::test` (in Nightly builds, run with -Z macro-backtrace for more info)

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to 3 previous errors

error: test failed, to rerun pass `-p tokio-util --test spawn_pinned`

Caused by:
  process didn't exit successfully: `/home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri runner /home/byt/Documents/tokio/target/miri/x86_64-unknown-linux-gnu/debug/deps/spawn_pinned-5d9d1b31504e33c5` (exit status: 1)
note: test exited abnormally; to see the full output pass --nocapture to the harness.
@tiif
Contributor Author

tiif commented Sep 24, 2024

Currently looking into this.

@rustbot claim

@FrankReh
Contributor

I know you are still working on epoll, so I thought I would just ask this here. Last month this code was added to epoll_create1. It calls Epoll::default() a second time in the function, sets its ready_list to what I would think is the default anyway, and then lets the variable go out of scope without further use. Maybe a merge artifact?

let mut epoll_instance = Epoll::default();
epoll_instance.ready_list = Rc::new(RefCell::new(BTreeMap::new()));
let fd = this.machine.fds.insert_new(Epoll::default());
Ok(Scalar::from_i32(fd))

Apologies if I am misreading something. And for using this issue thread. This would not be the cause of a deadlock.

@oli-obk
Contributor

oli-obk commented Sep 25, 2024

Huh, yeah, that looks like a rebase artifact, good find. The first two lines should just not exist anymore at all.
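
For concreteness, a hedged sketch of what the epoll_create1 body presumably reduces to once those two artifact lines are dropped (Miri-internal code, inferred only from the snippet quoted above, not checked against the actual fix):

// The artifact lines created and configured a second Epoll instance that
// was never used; only the inserted one matters.
let fd = this.machine.fds.insert_new(Epoll::default());
Ok(Scalar::from_i32(fd))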

@tiif
Contributor Author

tiif commented Sep 25, 2024

Wow, nice catch!

@tiif
Contributor Author

tiif commented Sep 26, 2024

Reproducer:

use tokio_util::task;

#[tokio::main]
async fn main() {
    let pool = task::LocalPoolHandle::new(1);

    pool.spawn_pinned(|| async {}).await;

    pool.spawn_pinned(|| async {}).await;
}
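
As a hedged aside, the trace below was presumably produced by an invocation along these lines, where -Zmiri-seed selects Miri's RNG seed (the exact seed used here is not recorded, and as noted further down, only some seeds deadlock):

MIRIFLAGS="-Zmiri-seed=<N>" cargo +nightly miri run
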
Full trace
error: deadlock: the evaluated program deadlocked
  --> /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/sys/unix/selector/epoll.rs:56:9
   |
56 | /         syscall!(epoll_wait(
57 | |             self.ep.as_raw_fd(),
58 | |             events.as_mut_ptr(),
59 | |             events.capacity() as i32,
60 | |             timeout,
61 | |         ))
   | |__________^ the evaluated program deadlocked
   |
   = note: BACKTRACE on thread `unnamed-2`:
   = note: inside `mio::sys::unix::selector::Selector::select` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/sys/unix/mod.rs:8:48: 8:49
   = note: inside `mio::poll::Poll::poll` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/poll.rs:435:9: 435:61
   = note: inside `tokio::runtime::io::driver::Driver::turn` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/io/driver.rs:149:15: 149:47
   = note: inside `tokio::runtime::io::driver::Driver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/io/driver.rs:122:9: 122:32
   = note: inside `tokio::runtime::signal::Driver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/signal/mod.rs:92:9: 92:29
   = note: inside `tokio::runtime::process::Driver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/process.rs:32:9: 32:31
   = note: inside `tokio::runtime::driver::IoStack::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/driver.rs:175:40: 175:54
   = note: inside `tokio::runtime::time::Driver::park_internal` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/time/mod.rs:247:21: 247:46
   = note: inside `tokio::runtime::time::Driver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/time/mod.rs:173:9: 173:41
   = note: inside `tokio::runtime::driver::TimeDriver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/driver.rs:332:55: 332:74
   = note: inside `tokio::runtime::driver::Driver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/driver.rs:71:9: 71:32
   = note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/current_thread/mod.rs:382:17: 382:44
   = note: inside `tokio::runtime::scheduler::current_thread::Context::enter::<(), {closure@tokio::runtime::scheduler::current_thread::Context::park::{closure#1}}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/current_thread/mod.rs:423:19: 423:22
   = note: inside `tokio::runtime::scheduler::current_thread::Context::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/current_thread/mod.rs:381:27: 384:15
   = note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/current_thread/mod.rs:724:33: 724:59
   = note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/current_thread/mod.rs:774:68: 774:84
   = note: inside `tokio::runtime::context::scoped::Scoped::<tokio::runtime::scheduler::Context>::set::<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::enter<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut {async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>>::{closure#0}}, std::option::Option<()>>::{closure#0}}, (std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>)>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/scoped.rs:40:9: 40:12
   = note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context.rs:180:26: 180:47
   = note: inside `std::thread::LocalKey::<tokio::runtime::context::Context>::try_with::<{closure@tokio::runtime::context::set_scheduler<(std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>), {closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::enter<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut {async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>>::{closure#0}}, std::option::Option<()>>::{closure#0}}>::{closure#0}}, (std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>)>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:283:12: 283:27
   = note: inside `std::thread::LocalKey::<tokio::runtime::context::Context>::with::<{closure@tokio::runtime::context::set_scheduler<(std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>), {closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::enter<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut {async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>>::{closure#0}}, std::option::Option<()>>::{closure#0}}>::{closure#0}}, (std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>)>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:260:9: 260:25
   = note: inside `tokio::runtime::context::set_scheduler::<(std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>, std::option::Option<()>), {closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::enter<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut {async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>>::{closure#0}}, std::option::Option<()>>::{closure#0}}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context.rs:180:9: 180:48
   = note: inside `tokio::runtime::scheduler::current_thread::CoreGuard::<'_>::enter::<{closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut {async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>>::{closure#0}}, std::option::Option<()>>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/current_thread/mod.rs:774:27: 774:85
   = note: inside `tokio::runtime::scheduler::current_thread::CoreGuard::<'_>::block_on::<std::pin::Pin<&mut {async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/current_thread/mod.rs:683:19: 751:11
   = note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/current_thread/mod.rs:191:28: 191:49
   = note: inside `tokio::runtime::context::runtime::enter_runtime::<{closure@tokio::runtime::scheduler::current_thread::CurrentThread::block_on<{async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>::{closure#0}}, ()>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/runtime.rs:65:16: 65:38
   = note: inside `tokio::runtime::scheduler::current_thread::CurrentThread::block_on::<{async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/current_thread/mod.rs:179:9: 214:11
   = note: inside `tokio::runtime::Runtime::block_on_inner::<{async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/runtime.rs:361:47: 361:88
   = note: inside `tokio::runtime::Runtime::block_on::<{async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/runtime.rs:335:13: 335:40
   = note: inside `tokio::task::LocalSet::block_on::<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/task/local.rs:596:9: 596:44
   = note: inside `tokio_util::task::spawn_pinned::LocalWorkerHandle::run` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-util-0.7.12/src/task/spawn_pinned.rs:396:9: 401:11
   = note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-util-0.7.12/src/task/spawn_pinned.rs:381:31: 381:77
   = note: inside `std::sys::backtrace::__rust_begin_short_backtrace::<{closure@tokio_util::task::spawn_pinned::LocalWorkerHandle::new_worker::{closure#0}}, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/backtrace.rs:154:18: 154:21
   = note: inside closure at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:522:17: 522:71
   = note: inside `<std::panic::AssertUnwindSafe<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@tokio_util::task::spawn_pinned::LocalWorkerHandle::new_worker::{closure#0}}, ()>::{closure#1}::{closure#0}}> as std::ops::FnOnce<()>>::call_once` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9: 272:19
   = note: inside `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@tokio_util::task::spawn_pinned::LocalWorkerHandle::new_worker::{closure#0}}, ()>::{closure#1}::{closure#0}}>, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:557:40: 557:43
   = note: inside `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@tokio_util::task::spawn_pinned::LocalWorkerHandle::new_worker::{closure#0}}, ()>::{closure#1}::{closure#0}}>>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:520:19: 520:88
   = note: inside `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@tokio_util::task::spawn_pinned::LocalWorkerHandle::new_worker::{closure#0}}, ()>::{closure#1}::{closure#0}}>, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:348:14: 348:33
   = note: inside closure at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:521:30: 523:16
   = note: inside `<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@tokio_util::task::spawn_pinned::LocalWorkerHandle::new_worker::{closure#0}}, ()>::{closure#1}} as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5: 250:71
   = note: inside `<std::boxed::Box<dyn std::ops::FnOnce()> as std::ops::FnOnce<()>>::call_once` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2453:9: 2453:52
   = note: inside `<std::boxed::Box<std::boxed::Box<dyn std::ops::FnOnce()>> as std::ops::FnOnce<()>>::call_once` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2453:9: 2453:52
   = note: inside `std::sys::pal::unix::thread::Thread::new::thread_start` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/pal/unix/thread.rs:105:17: 105:64
   = note: this error originates in the macro `syscall` (in Nightly builds, run with -Z macro-backtrace for more info)

error: deadlock: the evaluated program deadlocked
  --> /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/pal/unix/futex.rs:75:21
   |
75 |                     )
   |                     ^ the evaluated program deadlocked
   |
   = note: BACKTRACE:
   = note: inside `std::sys::pal::unix::futex::futex_wait` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/pal/unix/futex.rs:75:21: 75:22
   = note: inside `std::sys::sync::condvar::futex::Condvar::wait_optional_timeout` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/sync/condvar/futex.rs:50:17: 50:62
   = note: inside `std::sys::sync::condvar::futex::Condvar::wait` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/sync/condvar/futex.rs:34:9: 34:48
   = note: inside `std::sync::Condvar::wait::<()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/condvar.rs:192:13: 192:34
   = note: inside `tokio::runtime::park::Inner::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:116:17: 116:37
   = note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:254:41: 254:65
   = note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:268:41: 268:49
   = note: inside `std::thread::LocalKey::<tokio::runtime::park::ParkThread>::try_with::<{closure@tokio::runtime::park::CachedParkThread::with_current<{closure@tokio::runtime::park::CachedParkThread::park::{closure#0}}, ()>::{closure#0}}, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:283:12: 283:27
   = note: inside `tokio::runtime::park::CachedParkThread::with_current::<{closure@tokio::runtime::park::CachedParkThread::park::{closure#0}}, ()>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:268:9: 268:50
   = note: inside `tokio::runtime::park::CachedParkThread::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:254:9: 254:66
   = note: inside `tokio::runtime::park::CachedParkThread::block_on::<{async block@src/main.rs:3:1: 3:15}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:285:13: 285:24
   = note: inside `tokio::runtime::context::blocking::BlockingRegionGuard::block_on::<{async block@src/main.rs:3:1: 3:15}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/blocking.rs:66:9: 66:25
   = note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/mod.rs:87:13: 87:38
   = note: inside `tokio::runtime::scheduler::multi_thread::MultiThread::block_on::<{async block@src/main.rs:3:1: 3:15}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/mod.rs:86:9: 88:11
note: inside `main`
  --> src/main.rs:9:5
   |
9  |     pool.spawn_pinned(|| async {}).await;
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error: deadlock: the evaluated program deadlocked
  --> /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/sys/unix/selector/epoll.rs:56:9
   |
56 | /         syscall!(epoll_wait(
57 | |             self.ep.as_raw_fd(),
58 | |             events.as_mut_ptr(),
59 | |             events.capacity() as i32,
60 | |             timeout,
61 | |         ))
   | |__________^ the evaluated program deadlocked
   |
   = note: BACKTRACE on thread `tokio-runtime-w`:
   = note: inside `mio::sys::unix::selector::Selector::select` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/sys/unix/mod.rs:8:48: 8:49
   = note: inside `mio::poll::Poll::poll` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-1.0.2/src/poll.rs:435:9: 435:61
   = note: inside `tokio::runtime::io::driver::Driver::turn` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/io/driver.rs:149:15: 149:47
   = note: inside `tokio::runtime::io::driver::Driver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/io/driver.rs:122:9: 122:32
   = note: inside `tokio::runtime::signal::Driver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/signal/mod.rs:92:9: 92:29
   = note: inside `tokio::runtime::process::Driver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/process.rs:32:9: 32:31
   = note: inside `tokio::runtime::driver::IoStack::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/driver.rs:175:40: 175:54
   = note: inside `tokio::runtime::time::Driver::park_internal` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/time/mod.rs:247:21: 247:46
   = note: inside `tokio::runtime::time::Driver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/time/mod.rs:173:9: 173:41
   = note: inside `tokio::runtime::driver::TimeDriver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/driver.rs:332:55: 332:74
   = note: inside `tokio::runtime::driver::Driver::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/driver.rs:71:9: 71:32
   = note: inside `tokio::runtime::scheduler::multi_thread::park::Inner::park_driver` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/park.rs:194:9: 194:28
   = note: inside `tokio::runtime::scheduler::multi_thread::park::Inner::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/park.rs:127:13: 127:50
   = note: inside `tokio::runtime::scheduler::multi_thread::park::Parker::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/park.rs:70:9: 70:32
   = note: inside `tokio::runtime::scheduler::multi_thread::worker::Context::park_timeout` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/worker.rs:750:13: 750:50
   = note: inside `tokio::runtime::scheduler::multi_thread::worker::Context::park` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/worker.rs:718:24: 718:53
   = note: inside `tokio::runtime::scheduler::multi_thread::worker::Context::run` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/worker.rs:566:21: 566:36
   = note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/worker.rs:513:21: 513:33
   = note: inside `tokio::runtime::context::scoped::Scoped::<tokio::runtime::scheduler::Context>::set::<{closure@tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure#0}}, ()>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/scoped.rs:40:9: 40:12
   = note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context.rs:180:26: 180:47
   = note: inside `std::thread::LocalKey::<tokio::runtime::context::Context>::try_with::<{closure@tokio::runtime::context::set_scheduler<(), {closure@tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure#0}}>::{closure#0}}, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:283:12: 283:27
   = note: inside `std::thread::LocalKey::<tokio::runtime::context::Context>::with::<{closure@tokio::runtime::context::set_scheduler<(), {closure@tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure#0}}>::{closure#0}}, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:260:9: 260:25
   = note: inside `tokio::runtime::context::set_scheduler::<(), {closure@tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure#0}}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context.rs:180:9: 180:48
   = note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/worker.rs:508:9: 519:11
   = note: inside `tokio::runtime::context::runtime::enter_runtime::<{closure@tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}}, ()>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/runtime.rs:65:16: 65:38
   = note: inside `tokio::runtime::scheduler::multi_thread::worker::run` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/worker.rs:500:5: 520:7
   = note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/worker.rs:466:45: 466:56
   = note: inside `<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}> as std::future::Future>::poll` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/blocking/task.rs:42:21: 42:27
   = note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/core.rs:331:17: 331:37
   = note: inside `tokio::loom::std::unsafe_cell::UnsafeCell::<tokio::runtime::task::core::Stage<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>>>::with_mut::<std::task::Poll<()>, {closure@tokio::runtime::task::core::Core<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>::poll::{closure#0}}>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/loom/std/unsafe_cell.rs:16:9: 16:24
   = note: inside `tokio::runtime::task::core::Core::<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>::poll` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/core.rs:320:13: 332:15
   = note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/harness.rs:500:19: 500:38
   = note: inside `<std::panic::AssertUnwindSafe<{closure@tokio::runtime::task::harness::poll_future<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>::{closure#0}}> as std::ops::FnOnce<()>>::call_once` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9: 272:19
   = note: inside `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<{closure@tokio::runtime::task::harness::poll_future<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>::{closure#0}}>, std::task::Poll<()>>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:557:40: 557:43
   = note: inside `std::panicking::r#try::<std::task::Poll<()>, std::panic::AssertUnwindSafe<{closure@tokio::runtime::task::harness::poll_future<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>::{closure#0}}>>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:520:19: 520:88
   = note: inside `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<{closure@tokio::runtime::task::harness::poll_future<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>::{closure#0}}>, std::task::Poll<()>>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:348:14: 348:33
   = note: inside `tokio::runtime::task::harness::poll_future::<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/harness.rs:488:18: 503:8
   = note: inside `tokio::runtime::task::harness::Harness::<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>::poll_inner` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/harness.rs:209:27: 209:55
   = note: inside `tokio::runtime::task::harness::Harness::<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>::poll` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/harness.rs:154:15: 154:32
   = note: inside `tokio::runtime::task::raw::poll::<tokio::runtime::blocking::task::BlockingTask<{closure@tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{closure#0}}>, tokio::runtime::blocking::schedule::BlockingSchedule>` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/raw.rs:271:5: 271:19
   = note: inside `tokio::runtime::task::raw::RawTask::poll` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/raw.rs:201:18: 201:41
   = note: inside `tokio::runtime::task::UnownedTask::<tokio::runtime::blocking::schedule::BlockingSchedule>::run` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/mod.rs:473:9: 473:19
   = note: inside `tokio::runtime::blocking::pool::Task::run` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/blocking/pool.rs:160:9: 160:24
   = note: inside `tokio::runtime::blocking::pool::Inner::run` at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/blocking/pool.rs:518:17: 518:27
   = note: inside closure at /home/byt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/blocking/pool.rs:476:13: 476:54
   = note: inside `std::sys::backtrace::__rust_begin_short_backtrace::<{closure@tokio::runtime::blocking::pool::Spawner::spawn_thread::{closure#0}}, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/backtrace.rs:154:18: 154:21
   = note: inside closure at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:522:17: 522:71
   = note: inside `<std::panic::AssertUnwindSafe<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@tokio::runtime::blocking::pool::Spawner::spawn_thread::{closure#0}}, ()>::{closure#1}::{closure#0}}> as std::ops::FnOnce<()>>::call_once` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9: 272:19
   = note: inside `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@tokio::runtime::blocking::pool::Spawner::spawn_thread::{closure#0}}, ()>::{closure#1}::{closure#0}}>, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:557:40: 557:43
   = note: inside `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@tokio::runtime::blocking::pool::Spawner::spawn_thread::{closure#0}}, ()>::{closure#1}::{closure#0}}>>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:520:19: 520:88
   = note: inside `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@tokio::runtime::blocking::pool::Spawner::spawn_thread::{closure#0}}, ()>::{closure#1}::{closure#0}}>, ()>` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:348:14: 348:33
   = note: inside closure at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:521:30: 523:16
   = note: inside `<{closure@std::thread::Builder::spawn_unchecked_<'_, {closure@tokio::runtime::blocking::pool::Spawner::spawn_thread::{closure#0}}, ()>::{closure#1}} as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5: 250:71
   = note: inside `<std::boxed::Box<dyn std::ops::FnOnce()> as std::ops::FnOnce<()>>::call_once` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2453:9: 2453:52
   = note: inside `<std::boxed::Box<std::boxed::Box<dyn std::ops::FnOnce()>> as std::ops::FnOnce<()>>::call_once` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2453:9: 2453:52
   = note: inside `std::sys::pal::unix::thread::Thread::new::thread_start` at /home/byt/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/pal/unix/thread.rs:105:17: 105:64
   = note: this error originates in the macro `syscall` (in Nightly builds, run with -Z macro-backtrace for more info)

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to 3 previous errors; 2 warnings emitted

@tiif
Contributor Author

tiif commented Sep 26, 2024

Current progress:

As Miri reported deadlocks on both epoll and futex, this might be an issue with either epoll, futex, or both.

Also, when running this test with --many-seeds, not every seed reported a deadlock. So @oli-obk and I are unsure whether this could be an actual deadlock.

@Darksonn do you have any idea what could be happening in the reproducer?

@Darksonn

I took a quick look but wasn't able to see the issue. But be aware that the test has two runtimes, so you have two epoll instances in that test. Not sure what effect that has.

@FrankReh
Contributor

The reproducer is helpful, but it can be simplified. The tokio function only needs to be called once, though it then took about twice as many seeds before the deadlock was reported. In my build, a seed value of 9 triggered the problem; I found that starting from 1, so eight seeds had passed first.

@oli-obk
Contributor

oli-obk commented Sep 27, 2024

Nice! I think we can start minimizing then by manually inlining aggressively

@FrankReh
Contributor

Maybe this helps understand the failing code a little better. There are three threads being reported as deadlocked. I think this means there are three threads with no chance of making further progress in the simulator. Likely they aren't actually deadlocked with each other. But the design would have called for a clean exit on the main thread after the nop future was polled on the worker thread.

So it appears an event was lost or an event didn't get created.

Of the three threads, two are blocked on epoll, and one is blocked on a futex.

The two epolls being blocked on are actually in different tokio runtimes. There is the multi_thread tokio runtime/scheduler that was responsible for the futex blocked thread and the main thread. And there is a current_thread (aka single threaded tokio) scheduler that is also blocked on an epoll.

The output I'm looking at shows the three thread backtraces; I'll number them 1, 2, and 3 in the order they are printed.

1 - shows a thread name of "unnamed-2" and is the current_thread tokio blocked on epoll.

2 - shows no thread name, is blocked on a futex and condvar, and deeper down the stack shows is part of the tokio multi_thread runtime.

3 - shows a thread name of "tokio-runtime-w", and is the multi-thread runtime blocked on epoll.

I'm going to go out on a limb and guess that thread 1 above is the sole worker thread that was created for the pool, and that it is in its normal state of having nothing to do, so it waits. The logic in Miri should have signaled that the work it was given has completed.

@RalfJung
Member

shows no thread name,

That means it is the main thread.

Miri doesn't know which thread is blocked "on which other thread", so when it reports a deadlock that means all threads are blocked on something, but the actual deadlocked cycle might not involve all the mentioned threads.

@FrankReh
Contributor

FrankReh commented Sep 27, 2024

But maybe deadlock is a misnomer. I take deadlock to mean a mutual embrace, while a simulator like Miri is probably only reporting that there is no simulated thread able to make further progress. None is technically blocked waiting for another.

And to my naive eye, it is interesting that the main thread is reported blocked on the futex. I always thought that tokio would run its scheduler on the main thread, but probably that was just a bad assumption.

@RalfJung
Member

I think deadlock is the correct term. If no thread can make progress, that means there's a subset of threads that are in a dependency cycle.

@FrankReh
Contributor

The thread blocked on the futex is interesting.

I suspect it was started by the multi_thread scheduler to watch for when the worker thread wants to signal that it completed the work given to it.

If the futex/condvar is blocked, I think that points to either the worker thread never seeing the work it was supposed to do, or it did the work and signaled the completion, but the simulated futex didn't match the send with the receive.

I think the worker's work was received and done, as evidenced by other steps I took in narrowing down the possibilities. But that analysis also ran into new problems not worth mentioning here so everything here needs to be taken with a grain of salt.

@Darksonn

Generally, each Tokio runtime will have one thread blocked on epoll, and all other threads blocked on a futex/condvar. In the case of the main thread, that's a block_on call which is always in the futex/condvar category.

@oli-obk
Contributor

oli-obk commented Sep 27, 2024

Maybe we need to do a log dump of all FFI calls and their args, as well as thread switches, in a successful run and in a run that deadlocks, and compare them. That may give some insights.

@oli-obk
Contributor

oli-obk commented Sep 27, 2024

One thing @tiif noticed was that epoll_wait was never invoked on a real system, even though Miri ends up with a deadlock waiting on such a call to finish.

@Darksonn

that epoll_wait was never invoked on a real system

I'm not convinced that this is really true. When the extra threads become idle, they would use epoll_wait to sleep. Of course, there's a possibility that the main thread causes the process to abort before that happens.

@FrankReh
Contributor

I'm not convinced that this is really true.

Thank you. I also took that as a red herring, and I'm running into my own fair share of those. Glad I don't post something every time I think a new interesting pattern has emerged. :)

By looking at the tokio-util code, I see where the current_thread runtime comes from. So that makes sense.

I don't know how to dump FFI calls; I'm presuming those are simulated FFI calls. But I would like to see how futexes are simulated in Miri anyway, so this is good.

@oli-obk
Contributor

oli-obk commented Sep 27, 2024

This strace of a real execution does not contain a single epoll_wait: https://rust-lang.zulipchat.com/user_uploads/4715/MR22oZW2Mj8PVXHkh_XNk--8/log

We don't have a good way of logging emulated FFI calls. A full trace may be too noisy, so it may require some log filtering or modifying Miri to just log the interesting parts.

@FrankReh
Contributor

I want further visibility into the interplay between the code running in the worker, which would use a condvar to indicate the work is done and then presumably call notify_one, and the condvar loop running in the main thread, which is waiting to see some shared state change. I'm still looking for the tokio code that implements these two sides, and then I'll look at the Miri shim for those pieces. (I think shim is the right term, but I could be wrong.)

@FrankReh
Contributor

The reproducer can be made even smaller. The tokio current_thread scheduler can be used for main, and then the stack dumps show only two threads were active. Not sure this makes analyzing the problem easier, though.

use tokio_util::task;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let pool = task::LocalPoolHandle::new(1);

    let _ = pool.spawn_pinned_by_idx(|| async {}, 0).await;
}

@FrankReh
Contributor

Just an update - I am making progress in understanding where the passing path diverges from the failing path in the "best" case, where "best" is defined by the longest common path created by the Miri PRNG. Thank goodness for the INFO messages from the miri::machine module. I clearly see a difference in the Miri shim's use of the first futex. On the failing side, futex_wake is never called, because the tokio code seems to have already decided there is no one left on the other side of an unbounded mpsc channel (funny, I'm not sure yet whether the sender or the receiver was thought to be gone - I just see that on the passing path the tokio::sync::maps::UnboundedSender code kicks in, while on the failing side std::rt::panic_count::count_is_zero kicks in). But even this, take with a grain of salt. I need to reproduce the results, and hopefully somehow reduce the diff noise that came before.

I haven't looked at how Miri uses its PRNG from the seed - maybe there is already a way to do this, but it would be nice if there were a second level of seed that could kick in after 'n' calls to the first PRNG; 'n' could then be determined manually with another binary search. So when either a good seed or a bad seed was found, one could play with when to diverge down a new path. As it is, I have a passing seed of 167 and a failing seed of 178 that show remarkable congruency for well over a quarter of the extensive trace outputs (some 33,000 lines are the same). But before the interesting things happen around line 105,000, there are lots of small and medium differences - and I haven't figured out the significance of those differences ahead of the large divergence.

I also have ideas for reducing the noise by running tokio with fewer features enabled, and even hard-wiring some cfg macros, like those for coop and MetricsBatch, that one wouldn't expect to affect the final results but that, because of their use of the cell module, certainly create a lot of variation possibilities for Miri from seed to seed.

If someone finds a quicker solution, I won't mind but this investigation is certainly fun so far.

@Darksonn

It sounds like you want exactly what loom does, just under MIRI :)

@oli-obk
Contributor

oli-obk commented Sep 28, 2024

I think a seed fuel for perturbing the RNG after n invocations is a great idea. All we'd need to do is track the number of RNG invocations and interleave a single random bool after N has been reached.
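
For illustration, here is a minimal sketch of such a wrapper, assuming rand 0.8's RngCore trait; FueledRng and all of its names are invented for illustration and are not Miri's actual API:

use rand::RngCore;

// Wrap the RNG, count invocations, and after `fuel` draws interleave one
// extra draw, so everything downstream sees a shifted random stream.
struct FueledRng<R: RngCore> {
    inner: R,
    calls: u64,
    fuel: u64,
    perturbed: bool,
}

impl<R: RngCore> FueledRng<R> {
    fn new(inner: R, fuel: u64) -> Self {
        FueledRng { inner, calls: 0, fuel, perturbed: false }
    }

    fn maybe_perturb(&mut self) {
        self.calls += 1;
        if !self.perturbed && self.calls > self.fuel {
            self.perturbed = true;
            // The single interleaved draw: the schedule diverges from here on.
            let _ = self.inner.next_u32();
        }
    }
}

impl<R: RngCore> RngCore for FueledRng<R> {
    fn next_u32(&mut self) -> u32 {
        self.maybe_perturb();
        self.inner.next_u32()
    }
    fn next_u64(&mut self) -> u64 {
        self.maybe_perturb();
        self.inner.next_u64()
    }
    fn fill_bytes(&mut self, dest: &mut [u8]) {
        self.maybe_perturb();
        self.inner.fill_bytes(dest)
    }
    fn try_fill_bytes(&mut self, dest: &mut [u8]) -> Result<(), rand::Error> {
        self.maybe_perturb();
        self.inner.try_fill_bytes(dest)
    }
}

With a known-good and a known-bad seed in hand, one could then binary-search on fuel to find the first draw where forcing a divergence flips the outcome.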

@FrankReh
Contributor

@Darksonn Question, does the "net" feature on tokio and on tokio-util have anything in common?

In my new attempt at reducing the tracing output, I started to run with fewer than the "full" features for tokio and tokio-util and, to my surprise, found lots of ways to get the failing epoll test to pass. I wouldn't be surprised if, without "net", there is no epoll, but then how does the main mentioned above even run?

Anyway, after binary searches on the features for the two crates, I found that adding "net" to either dependency caused the failure path.

So in case it helps anyone figure this out faster: running with a Cargo.toml where tokio just uses "rt-multi-thread" and "macros", and tokio-util just uses "rt", I didn't see the test fail across a hundred different seeds, while usually about one in five would fail. With "net" added to either, it would fail basically immediately. So, a big difference. A sketch of the Cargo.toml follows below.
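
For concreteness, the reduced dependency setup looks roughly like this (the version numbers are illustrative, not the exact ones I tested):

[dependencies]
# Passing configuration: no failure seen across a hundred seeds.
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
tokio-util = { version = "0.7", features = ["rt"] }
# Adding "net" to either features list made the deadlock appear almost immediately.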

Again, I wouldn't be surprised if, without "net" on one or the other, I'm not testing what I want to be testing. But I have to go out for a while.

Later, I can modify a local tokio to see which branches from the "net" feature are significant to this.

@tiif
Contributor Author

tiif commented Sep 28, 2024

I observed that adding certain dependencies will invoke epoll, which is pretty surprising to me too. For example, in
#3858 (comment), the ICE is only triggered after adding test-util.

Also, thanks for helping to narrow down the cause. I will keep following this thread while trying different stuff on my side.

@FrankReh
Contributor

Well, okay. The "net" feature of tokio-util passes "net" to tokio. Should have looked at its Cargo.toml before sounding like I didn't know how to figure anything out! So I'll focus on what aspect of tokio's "net" feature makes the difference.

@FrankReh
Contributor

A bit more narrowing. The tokio/net feature causes the mio/os-poll feature to be pulled in. I've gone to running the test with the current_thread runtime, as that only brings two threads into play instead of three and still displays the failure in about 20% of the seeded tests - so fewer differences to compare.

I've narrowed it down to the PRNG use in Miri's concurrency::weak_memory module. Interestingly, it is not the PRNG use responsible for thread switching, or any of the other ten places the PRNG is used by Miri.

Here are some notes I made along the way.

Three things I can say after further refining of the output:
 1. which Miri PRNG use is hitting the error: the rng used by fetch_store in the Miri
    concurrency::weak_memory module,
 2. the last commonality between the failing and passing main threads is evident,
 3. and the initial differences on the unnamed-1 thread are evident.

The TL;DR
    If someone wants to peruse this and throw out suggestions on whether Miri is
    doing something wrong or tokio is doing something wrong, I'm all ears.

1. I think I've narrowed down where the PRNG affects the weak memory, in fetch_store:
        (Miri code)
        ...
        let chosen = candidates.choose(rng).expect("store buffer cannot be empty");
        if std::ptr::eq(chosen, self.buffer.back().expect("store buffer cannot be empty")) {
            (chosen, LoadRecency::Latest)
        } else {
            (chosen, LoadRecency::Outdated)
        }

    If I change the Miri code to always return the first element of the candidates, the failures go away.


2. Main thread details:
    At the end of the failed main thread:

        INFO miri::machine Continuing in mio::sys::unix::selector::epoll::Selector::select
        INFO miri::concurrency::thread ---------- Now executing on thread `main` (previous: `unnamed-1`) ----------------------------------------
        [src/shims/unix/linux/epoll.rs:451:9] EpollInterestTable::epoll_wait
        [src/shims/unix/linux/epoll.rs:134:9] Epoll::get_ready_list

        And the main thread has nothing left to do, and then the deadlock error is reported.

    While on a passing main thread, the tracing continues from there:

        INFO miri::concurrency::thread ---------- Now executing on thread `main` (previous: `unnamed-1`) ----------------------------------------
        [src/shims/unix/linux/epoll.rs:657:5] blocking_epoll_callback
        [src/shims/unix/linux/epoll.rs:134:9] Epoll::get_ready_list
        [src/shims/unix/linux/epoll.rs:605:5] ready_list_next
        [src/shims/unix/linux/epoll.rs:181:9] EpollInterestTable::get_epoll_interest
        [src/shims/unix/linux/epoll.rs:605:5] ready_list_next

3. Unnamed-1 thread details (the thread created for the pool):

    There are a good number of differences between the passing version and the failing version.
    I'll just list the last line that the two versions would have in common with the hope that
    someone can point me to what to look at further in Miri and Tokio.

  Last common line before a difference:
     INFO miri::machine Continuing in tokio::runtime::scheduler::inject::Inject::<std::sync::Arc<tokio::runtime::scheduler::current_thread::Handle>>::pop

 Then the paths merge again and here is the next last line before a difference:
     INFO miri::machine Continuing in tokio::runtime::scheduler::current_thread::CoreGuard::<'_>::block_on::<std::pin::Pin<&mut {async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>>::{closure#0}

 Again (same description):
     INFO miri::machine Continuing in tokio::runtime::scheduler::current_thread::CoreGuard::<'_>::block_on::<std::pin::Pin<&mut {async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>>::{closure#0}

 And then finally things start to swing more wildly.
 The two paths had merged again but then instead of a common line followed by lots of Miri trace on
 one side and nothing on the other (where I take it the PRNG had kicked in),
 now there is a difference of one trace line:

    passing marked with a 'p', failing with an 'f':
      INFO miri::machine Leaving std::cell::RefCell::<std::option::Option<std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>>>::borrow_mut
    p INFO miri::machine Continuing in tokio::runtime::scheduler::current_thread::Context::enter::<(), {closure@tokio::runtime::scheduler::current_thread::Context::run_task<(), {closure@tokio::runtime::scheduler::current_thread::CoreGuard<'_>::block_on<std::pin::Pin<&mut {async fn body of tokio::task::LocalSet::run_until<{async block@tokio_util::task::spawn_pinned::LocalWorkerHandle::run::{closure#0}}>()}>>::{closure#0}::{closure#1}}>::{closure#0}}>
    f INFO miri::machine Continuing in tokio::runtime::scheduler::current_thread::Context::enter::<(), {closure@tokio::runtime::scheduler::current_thread::Context::park::{closure#1}}>
      INFO miri::machine Leaving std::ptr::NonNull::<std::option::Option<std::boxed::Box<tokio::runtime::scheduler::current_thread::Core>>>::as_ptr

@FrankReh
Contributor

FrankReh commented Sep 30, 2024

@RalfJung That actually makes a lot of sense. Thanks for spelling out what probably seemed obvious to you. Let me see if I can follow the bread crumbs in eventfd and the test you laid out.

--Edit

So eventfd has a read and a write, and they join their VClock into the thread's clock, or from the thread's clock, respectively.

@tiif
Contributor Author

tiif commented Sep 30, 2024

Thanks again for all the help here!

So likely every interest list also needs to track a vector clock, and every time an event gets added to the interest list, the current thread's clock is joined into the interest list clock. And then when a wakeup happens, the woken-up thread acquires the interest list clock.

A thread is woken up only if there is at least one EpollEventInstance in the ready_list. So would it be possible that you are talking about synchronising the ready_list with the woken-up thread?

How it works (with a lot of details omitted) is as follows; a sketch of where a clock could hook in comes after the list:

  1. epoll_wait blocks
  2. An event happens on the target file description
  3. An EpollEventInstance is inserted into the ready_list
  4. One thread is unblocked
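
To make that concrete, here is a minimal self-contained sketch of where a clock could hook into steps 3 and 4. All the types here are toy stand-ins invented for illustration, not Miri's real VClock or epoll structures:

use std::collections::BTreeMap;

// Toy vector clock: one logical timestamp slot per thread.
#[derive(Clone, Default)]
struct VClock(Vec<u64>);

impl VClock {
    fn join(&mut self, other: &VClock) {
        if self.0.len() < other.0.len() {
            self.0.resize(other.0.len(), 0);
        }
        for (mine, theirs) in self.0.iter_mut().zip(&other.0) {
            *mine = (*mine).max(*theirs);
        }
    }
}

struct EpollEventInstance; // event payload elided

#[derive(Default)]
struct ReadyList {
    events: BTreeMap<u64, EpollEventInstance>, // keyed by fd, simplified
    clock: VClock,
}

impl ReadyList {
    // Step 3: the thread that made the file description ready "releases"
    // its clock into the list alongside the event.
    fn push(&mut self, fd: u64, event: EpollEventInstance, producer: &VClock) {
        self.clock.join(producer);
        self.events.insert(fd, event);
    }

    // Step 4: the thread woken from epoll_wait "acquires" the list's clock,
    // establishing happens-before with every producer that contributed.
    fn drain_into(&mut self, waiter: &mut VClock) -> Vec<EpollEventInstance> {
        waiter.join(&self.clock);
        std::mem::take(&mut self.events).into_values().collect()
    }
}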

@RalfJung
Member

RalfJung commented Sep 30, 2024 via email

@FrankReh
Contributor

There are two ready_lists defined in the epoll module. Still wrapping my head around the fact that there are two things holding the same type of BTreeMap. Is there really one that, through cloning, the two types share? If so, then it sounds like we need a new struct to wrap the BTreeMap, or essentially to turn it into a tuple of BTreeMap and VClock.

@tiif
Contributor Author

tiif commented Sep 30, 2024

They are the same thing. If an EpollEventInterest is registered under an Epoll instance, then it will inherit the same ready_list.

@tiif
Contributor Author

tiif commented Sep 30, 2024

As expected, I managed to produce a data race error (and sadly not a deadlock) with the test below in libc-epoll-blocking.rs with ./miri run --dep ./tests/pass-dep/libc/libc-epoll-blocking.rs -Zmiri-preemption-rate=0:

fn test_epoll_race() {
    // Create an epoll instance.
    let epfd = unsafe { libc::epoll_create1(0) };
    assert_ne!(epfd, -1);

    // Create an eventfd instance.
    let flags = libc::EFD_NONBLOCK | libc::EFD_CLOEXEC;
    let fd = unsafe { libc::eventfd(0, flags) };

    // Register eventfd with the epoll instance.
    let mut ev = libc::epoll_event { events: EPOLL_IN_OUT_ET, u64: fd as u64 };
    let res = unsafe { libc::epoll_ctl(epfd, libc::EPOLL_CTL_ADD, fd, &mut ev) };
    assert_eq!(res, 0);

    static mut VAL: u8 = 0;
    let thread1 = thread::spawn(move || {
        // Write to the static mut variable.
        unsafe { VAL = 1 };
        // Write to the eventfd instance.
        let sized_8_data: [u8; 8] = 1_u64.to_ne_bytes();
        let res = unsafe { libc::write(fd, sized_8_data.as_ptr() as *const libc::c_void, 8) };
        // write returns the number of bytes written, which is always 8 for an eventfd.
        assert_eq!(res, 8);
    });
    thread::yield_now();
    // epoll_wait for the event to happen.
    let expected_event = u32::try_from(libc::EPOLLIN | libc::EPOLLOUT).unwrap();
    let expected_value = u64::try_from(fd).unwrap();
    check_epoll_wait::<8>(epfd, &[(expected_event, expected_value)], -1);
    // Read from the static mut variable.
    unsafe { assert_eq!(VAL, 1) };
    thread1.join().unwrap();
}

I kind of hope I can reproduce the deadlock issue purely using syscalls to check whether it is fixed.

@FrankReh
Contributor

Should the VClock be per ready_list, or per entry in the ready_list, the EpollEventInterest? The latter is more fine-grained, I think, and would only cause a clock sync from the thread resolving the event to the thread waiting for the event. Otherwise there seems to be more clock syncing to the thread being woken than is called for.

@RalfJung
Member

Isn't the ready list already per-epoll-instance?

But I guess per event is indeed better, since a call to epoll_wait may only consume some of the events in the list, and then only those should have their clock acquired.
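
A per-event variant could then look roughly like this, reusing the toy VClock and EpollEventInstance from the sketch above (again purely illustrative, not Miri's code):

struct ReadyEvent {
    instance: EpollEventInstance,
    clock: VClock, // clock of the thread that produced this event
}

// epoll_wait acquires only the clocks of the events it actually returns;
// events left in the list keep their clocks for a later wakeup.
fn take_ready(
    ready: &mut Vec<ReadyEvent>,
    max_events: usize,
    waiter: &mut VClock,
) -> Vec<EpollEventInstance> {
    let n = max_events.min(ready.len());
    ready
        .drain(..n)
        .map(|ev| {
            waiter.join(&ev.clock);
            ev.instance
        })
        .collect()
}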

@tiif
Contributor Author

tiif commented Sep 30, 2024

Isn't the ready list already per-epoll-instance?

Yes.

I have a concern here: if we were to support level-triggered epoll in the future, one event would be able to wake up more than one thread at once. I assume a clock can only be acquired once, so would this cause a problem?

So for level-triggered epoll, something like this might happen:

  1. thread1: block on epoll_wait
  2. thread2: block on epoll_wait
  3. an event happens on the target file description
  4. thread1 unblocks
  5. thread2 unblocks

What we currently support is edge-triggered epoll, and one event will only wake up one thread, but this doesn't apply to level-triggered epoll.

@RalfJung
Member

I assume a clock can only be acquire once

That assumption is wrong. :) It can be acquired any number of times.

@FrankReh
Contributor

@tiif Thank you for putting my latest confusion so clearly. I'm looking for where an fd event triggers the epoll mechanism, and wondering how the epoll event hooks up to the ready lists. Probably there is one global for all epoll events, and that can be mapped back to the epolls?

@RalfJung
Member

I kind of hope I can reproduce the deadlock issue purely using syscalls to check whether it is fixed.

As far as I am concerned, the above is a better test. But if you want to keep looking I won't stop you. :)

@tiif
Contributor Author

tiif commented Sep 30, 2024

That assumption is wrong. :) It can be acquired any number of times.

Nice to hear that! :D

I'm looking for where a fd event triggers the epoll mechanism

Whenever an event happens, check_and_update_readiness is invoked. From there, the event is added to the ready list, and all the thread-unblocking work happens.

We have a global map from a target file description to its associated EpollEventInterests, so inside check_and_update_readiness, we use that global map to retrieve all the EpollEventInterests related to the target file description and update the corresponding ready_lists.

But I think my explanation is vague here; if you need more detail, you can always open a thread on Zulip or ping me directly :)

@FrankReh
Contributor

@tiif Perfectly clear. I was going down that path but hadn't noticed this.machine.epoll_interests. So I think the clock can be per ready list. I should be able to test this in a few minutes now, but you may want to fix it all up to your liking anyway - I totally understand.

@tiif
Contributor Author

tiif commented Sep 30, 2024

Just go for it :) It'd be nice if you could also test the tokio reproducer above at the same time and make sure it passes (but you don't need to add it to the PR).

@FrankReh
Contributor

Yes, the tokio reproducer is one of my goals. Just getting it to compile is satisfying, but not that satisfying.

@FrankReh
Contributor

Adding VClock to the epoll ready_list fixes the tokio reproducer.

@RalfJung
Member

RalfJung commented Oct 1, 2024

Awesome. :) Nice job identifying which part of the non-determinism is the culprit here.

@FrankReh
Contributor

FrankReh commented Oct 1, 2024

For the PR, I was asked to include the test case described above. I could use some help in figuring out how to get the test to either run or compile. When I try to run it, it won't compile, but maybe there is a different way the test should be run?

I tried

./miri test libc-epoll-blocking

but that fails because the test harness says the function didn't compile and it shows a reasonable compiler error:

--- tests/pass-dep/libc/libc-epoll-blocking.stderr
+++ <stderr output>
+error: creating a shared reference to mutable static is discouraged
+  --> tests/pass-dep/libc/libc-epoll-blocking.rs:LL:CC
+   |
+LL |     unsafe { assert_eq!(VAL, 1) };
+   |                         ^^^ shared reference to mutable static
+   |
+   = note: for more information, see <https://doc.rust-lang.org/nightly/edition-guide/rust-2024/static-mut-references.html>
+   = note: shared references to mutable statics are dangerous; it's undefined behavior if the static is mutated or if a mutable reference is created for it while the shared reference lives
+   = note: `-D static-mut-refs` implied by `-D warnings`
+   = help: to override `-D warnings` add `#[allow(static_mut_refs)]`
+
+error: miri cannot be run on programs that fail compilation
+
+error: aborting due to 2 previous errors
+

@FrankReh
Contributor

FrankReh commented Oct 1, 2024

Follow-up question. I found another way to run the test, so now it's just a warning, but then an assertion fails.

$ ./miri run --dep tests/pass-dep/libc/libc-epoll-blocking.rs
warning: creating a shared reference to mutable static is discouraged
   --> tests/pass-dep/libc/libc-epoll-blocking.rs:173:25
    |
173 |     unsafe { assert_eq!(VAL, 1) };
    |                         ^^^ shared reference to mutable static
    |
    = note: for more information, see <https://doc.rust-lang.org/nightly/edition-guide/rust-2024/static-mut-references.html>
    = note: shared references to mutable statics are dangerous; it's undefined behavior if the static is mutated or if a mutable reference is created for it while the shared reference lives
    = note: `#[warn(static_mut_refs)]` on by default

thread 'main' panicked at tests/pass-dep/libc/libc-epoll-blocking.rs:171:5:
assertion `left == right` failed: got wrong events
  left: 4
 right: 5

and here you can see what line 171 in that file is:

   142	// This test shows a data_race before epoll had vector clocks added.
   143	fn test_epoll_race() {
   144	    // Create an epoll instance.
   145	    let epfd = unsafe { libc::epoll_create1(0) };
   146	    assert_ne!(epfd, -1);
   147	
   148	    // Create an eventfd instance.
   149	    let flags = libc::EFD_NONBLOCK | libc::EFD_CLOEXEC;
   150	    let fd = unsafe { libc::eventfd(0, flags) };
   151	
   152	    // Register eventfd with the epoll instance.
   153	    let mut ev = libc::epoll_event { events: EPOLL_IN_OUT_ET, u64: fd as u64 };
   154	    let res = unsafe { libc::epoll_ctl(epfd, libc::EPOLL_CTL_ADD, fd, &mut ev) };
   155	    assert_eq!(res, 0);
   156	
   157	    static mut VAL: u8 = 0;
   158	    let thread1 = thread::spawn(move || {
   159	        // Write to the static mut variable.
   160	        unsafe { VAL = 1 };
   161	        // Write to the eventfd instance.
   162	        let sized_8_data: [u8; 8] = 1_u64.to_ne_bytes();
   163	        let res = unsafe { libc::write(fd, sized_8_data.as_ptr() as *const libc::c_void, 8) };
   164	        // write returns the number of bytes written, which is always 8 for an eventfd.
   165	        assert_eq!(res, 8);
   166	    });
   167	    thread::yield_now();
   168	    // epoll_wait for the event to happen.
   169	    let expected_event = u32::try_from(libc::EPOLLIN | libc::EPOLLOUT).unwrap();
   170	    let expected_value = u64::try_from(fd).unwrap();
   171	    check_epoll_wait::<8>(epfd, &[(expected_event, expected_value)], -1);
   172	    // Read from the static mut variable.
   173	    unsafe { assert_eq!(VAL, 1) };
   174	    thread1.join().unwrap();
   175	}

If I run it as ./miri run --dep ./tests/pass-dep/libc/libc-epoll-blocking.rs -Zmiri-preemption-rate=0 then the test passes and there is just the warning about the shared reference to mutable static.

Is this the normal way to run tests and should I just commit this to the PR as is? The PR's CI isn't going to flag the warning, the assertion, or the compile error?

@oli-obk
Contributor

oli-obk commented Oct 1, 2024

you can grep for -Zmiri-preemption-rate to see how other tests add that flag.

the warnings can be disabled with the attribute mentioned in the diagnostic message

@FrankReh
Contributor

FrankReh commented Oct 1, 2024

Under what conditions does //@compile-flags: kick in? I see the relevant one already in the file under test. But I also see a comment on the line above it indicating which test function it is needed for, and if I remove those lines completely, only the new/last function hits an assertion failure, not the one the comment indicated the flag was there for.

So why does putting the flag on the command line to miri work, but not the comment in the header? I find dozens of test files with similar lines in their headers.

Here is what the file already has:

//@only-target: linux
// test_epoll_block_then_unblock depends on a deterministic schedule.
//@compile-flags: -Zmiri-preemption-rate=0

I found that the ui_test docs talk about this comment flag, but they don't help me understand why passing the -Z flag on the test command line makes the test pass, while having it only in the test file, like dozens of other tests do, leaves just this new test function failing.

@FrankReh
Contributor

FrankReh commented Oct 1, 2024

Well, as soon as I posted that, I thought of something else to try.

Just ./miri test and the tests pass - but only if that @compile-flags line is in the file.
Without that compile-flags line, the test run reports that the new epoll test function fails, but not the other function - maybe due to a different seed?

But when the test is run with a FILTER, either as a test or a run, the flag has to be provided on the command line. Was this advice written down somewhere I should have been reading first? It would probably be in the ui_test docs, but I don't see it. Or am I still misinterpreting things, and the results are explained some other way? 😕

@tiif
Contributor Author

tiif commented Oct 2, 2024

There are two ways to run the test in Miri: one is ./miri test, the other is ./miri run. ./miri test understands @compile-flags (probably because this command uses ui_test). With ./miri run, I need to pass all the flags myself, and it behaves like running Miri normally on a program: it won't give a hint to bless the test if it fails, it will just provide the normal output. Usually, if ./miri test passes, it will pass on CI.

There is a brief introduction about testing in https://github.com/rust-lang/miri/blob/master/CONTRIBUTING.md#building-and-testing-miri

If you are unsure, you can push the change to the PR first, and we can help to check what's wrong by reading the CI log.

@oli-obk
Contributor

oli-obk commented Oct 2, 2024

Thanks for debugging this, finding a fix and opening a PR! This was definitely very unexpected from just looking at the original issue

@RalfJung
Member

RalfJung commented Oct 3, 2024

I am wondering if there are API things we can do to reduce the likelihood of this. Thread unblocking almost always occurs because of some event on another thread that we should register a causal link with, so maybe the thread-unblocking function should take a clock. It's just that we've so far kept those subsystems largely orthogonal.
