Manually running a schedule in a sub-world hangs #10032

Closed
Bytekeeper opened this issue Oct 5, 2023 · 6 comments
Labels
A-ECS Entities, components, systems, and events C-Bug An unexpected or incorrect behavior P-Regression Functionality that used to work but no longer does. Add a test for this! S-Needs-Investigation This issue requires detective work to figure out what's going wrong

Comments

@Bytekeeper

Bevy version

0.11.3

What you did

I tried to run the Main schedule in a "sub-world". The following demo program should terminate, but as long as the "sub-world" contains a system, it hangs. (Note that in a more elaborate example, some systems might even run before it hangs.)

use bevy::app::{AppExit, MainSchedulePlugin};
use bevy::prelude::*;

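// Wraps the sub-app's World so the outer app can store it as a resource.
#[derive(Resource)]
pub struct SubWorld(World);

// System in the outer app that runs the sub-world's Main schedule.
// The reported hang occurs once this system runs.
fn call_sub_app(mut world: ResMut<SubWorld>) {
    world.0.run_schedule(Main);
}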

fn main() {
    let mut sub_app = App::empty();
    sub_app
        .add_plugins(MainSchedulePlugin)
        // Without systems, the app will exit immediately
        .add_systems(Update, || ());
    sub_app.finish();
    sub_app.cleanup();
    let world = sub_app.world;
    let mut app = App::empty();
    app.add_plugins(MainSchedulePlugin)
        .add_systems(Update, call_sub_app)
        .insert_resource(SubWorld(world));
    app.world.send_event(AppExit);
    app.run();
}

What went wrong

I expected the inner run_schedule to run and complete. After that, the outer system should continue (and terminate).
Instead it hangs. I assume there's a deadlock; here's an excerpt from GDB:

#0  0x00007fcd4b50ed6d in syscall () from /usr/lib/libc.so.6
#1  0x000055cabc4690f1 in std::sys::unix::futex::futex_wait ()
    at library/std/src/sys/unix/futex.rs:62
#2  std::sys::unix::locks::futex_condvar::Condvar::wait_optional_timeout ()
    at library/std/src/sys/unix/locks/futex_condvar.rs:49
#3  std::sys::unix::locks::futex_condvar::Condvar::wait ()
    at library/std/src/sys/unix/locks/futex_condvar.rs:33
#4  0x000055cabc414ba2 in std::sync::condvar::Condvar::wait<()> (
    self=0x55cabc5d88a0, guard=...)
    at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/sync/condvar.rs:191
#5  0x000055cabc417b8f in parking::Inner::park (self=0x55cabc5d8890, timeout=...)
    at src/lib.rs:358
#6  0x000055cabc4178d7 in parking::Parker::park (self=0x7fcd4b7f9c20)
    at src/lib.rs:119
#7  0x000055cabc31f50d in futures_lite::future::block_on::{closure#0}<alloc::vec::Vec<(), alloc::alloc::Global>, bevy_tasks::task_pool::{impl#2}::scope_with_executor_inner::{async_block_env#0}<bevy_ecs::schedule::executor::multi_threaded::{impl#2}::run::{closure_env#1}, ()>> (cache=0x7fcd4b7f9c18)
    at /home/dante/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-lite-1.13.0/src/future.rs:91
#8  0x000055cabc3344db in std::thread::local::LocalKey<core::cell::RefCell<(parking::Parker, core::task::wake::Waker)>>::try_with<core::cell::RefCell<(parking::Parker, core::task::wake::Waker)>, futures_lite::future::block_on::{closure_env#0}<alloc::vec::Vec<(), alloc::alloc::Global>, bevy_tasks::task_pool::{impl#2}::scope_with_executor_inner::{async_block_env#0}<bevy_ecs::schedule::executor::multi_threaded::{impl#2}::run::{closure_env#1}, ()>>, alloc::vec::Vec<(), alloc::alloc::Global>> (
    self=0x55cabc4f8590, f=...)
    at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/local.rs:270
#9  0x000055cabc333932 in std::thread::local::LocalKey<core::cell::RefCell<(parking::Parker, core::task::wake::Waker)>>::with<core::cell::RefCell<(parking::Parker, core::task::wake::Waker)>, futures_lite::future::block_on::{closure_env#0}<alloc::vec::Vec<(), alloc::alloc::Global>, bevy_tasks::task_pool::{impl#2}::scope_with_executor_inner::{async_block_env#0}<bevy_ecs::schedule::executor::multi_threaded::{impl#2}::run::{closure_env#1}, ()>>, alloc::vec::Vec<(), alloc::alloc::Global>> (
    self=0x55cabc4f8590, f=...)
    at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/local.rs:246
#10 0x000055cabc31f2a9 in futures_lite::future::block_on<alloc::vec::Vec<(), alloc::alloc::Global>, bevy_tasks::task_pool::{impl#2}::scope_with_executor_inner::{async_block_env#0}<bevy_ecs::schedule::executor::multi_threaded::{impl#2}::run::{closure_env#1}, ()>> (future=...)
@Bytekeeper Bytekeeper added C-Bug An unexpected or incorrect behavior S-Needs-Triage This issue needs to be labelled labels Oct 5, 2023
@hymm
Contributor

hymm commented Oct 5, 2023

Can you run cargo tree and check whether async-executor is at 1.5.4? If it's at that or 1.5.3, can you try pinning it to 1.5.1 with cargo update -p async-executor --precise 1.5.1?

@Bytekeeper
Author

Bytekeeper commented Oct 5, 2023

It's at 1.5.4:

cargo tree | rg async-executor
        │   │   │   │   ├── async-executor v1.5.4

Pinning it to 1.5.1 seems to resolve the issue!

@hymm
Contributor

hymm commented Oct 5, 2023

async-executor recently made some changes to improve performance, and those changes are likely what's causing problems.

It's also possible that the changes re-exposed a bug that we worked around before: #7825 (comment).

@hymm hymm added this to the 0.12 milestone Oct 5, 2023
@alice-i-cecile alice-i-cecile added A-ECS Entities, components, systems, and events P-Regression Functionality that used to work but no longer does. Add a test for this! and removed S-Needs-Triage This issue needs to be labelled labels Oct 9, 2023
@maniwani
Contributor

maniwani commented Oct 9, 2023

I encountered this in #9122. Examples with state transitions (like alien_cake_addict) just froze, but pinning async-executor to version 1.5.1 resolved it.

@hymm
Contributor

hymm commented Oct 17, 2023

async-executor rolled back the change that was causing this and released 1.6.0. So this should be fixed, but I haven't tested yet.

@alice-i-cecile alice-i-cecile removed this from the 0.12 milestone Oct 26, 2023
@alice-i-cecile alice-i-cecile added the S-Needs-Investigation This issue requires detective work to figure out what's going wrong label Oct 26, 2023
@hymm
Contributor

hymm commented Feb 17, 2024

The cause of this (queueing tasks to a local queue) was completely rolled back in async-executor.
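
Since this issue carries the P-Regression label ("Add a test for this!"), one way to guard against a hang like this without stalling a whole test run is to execute the reproduction on a worker thread and give up after a deadline. The following is only a minimal sketch using the Rust standard library. `run_with_timeout` is a hypothetical helper, not a Bevy or async-executor API, and the closures in `main` are stand-ins for building and running the app from the reproduction.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: runs `work` on a separate thread and waits at most
/// `timeout` for it to finish.
///
/// Returns `Some(result)` if `work` completes in time, and `None` if the
/// deadline passes first or if `work` panics. On timeout the worker thread
/// is not killed; it is simply left running in the background.
fn run_with_timeout<T, F>(timeout: Duration, work: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already have given up, so a send error is ignored.
        let _ = tx.send(work());
    });
    rx.recv_timeout(timeout).ok()
}

fn main() {
    // Stand-in for a closure that builds the app and calls `app.run()`.
    let finished = run_with_timeout(Duration::from_secs(5), || 42);
    assert_eq!(finished, Some(42));

    // Stand-in for a hang: the work outlives the deadline.
    let hung = run_with_timeout(Duration::from_millis(50), || {
        thread::sleep(Duration::from_secs(2));
    });
    assert!(hung.is_none());
}
```

In a real test, the closure would construct the outer and inner apps itself, so nothing has to be moved across threads, and the assertion would be that the result is `Some`. The deadline is a judgment call: too short and a slow CI machine produces false failures.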

@hymm hymm closed this as completed Feb 17, 2024