Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non send as thread marker + TLS #6657

Closed
wants to merge 6 commits into from

Conversation

hymm
Copy link
Contributor

@hymm hymm commented Nov 17, 2022

Objective

  • Keep world send by removing NonSend Resources

Solution

  • Add a thread local storage resource that can hold non send types. This uses the thread-local-object library which seems to have a security advisory in one of it's dependencies, so that will either need to be changed or the whole library needs to replaced.
  • Add a main thread marker that tells a system to run on the main thread
  • Delete all code related to non send resources

Changelog

  • replace uses of non send with a main thread marker system param and a tls resource.

Migration Guide

before:

struct MyResource(pub u32);

fn main() {
    App::new()
        .insert_non_send_resource(MyResource(0))
        .add_system(my_system);
}

fn my_system(res: NonSendMut<MyResource>) {
    res.0 = 1;
}

after:

fn main() {
    App::new()
       .insert_resouce(Tls::new(MyResource(0))
       .add_system(my_system);
}

fn my_system(
    _marker: MainThread,
    res: ResMut<Tls<MyResource>>,
) {
    res.get_mut(|res_inner| {
        res_inner.0 = 1;
    });
}

Notes

Keeping this in draft until a decision has been made on what solution we want. This needs to be cleaned up a bit more before being merged.

@maniwani
Copy link
Contributor

maniwani commented Nov 18, 2022

If Tls<T> is a resource in the World, how do you conditionally force commands and exclusive systems to run on the main thread?

It seems like exclusive systems can just have a non-send marker param, but I'm not sure about commands. Would those just be unable to access the data?

@alice-i-cecile alice-i-cecile added A-ECS Entities, components, systems, and events X-Controversial There is active debate or serious implications around merging this PR P-Unsound A bug that results in undefined compiler behavior D-Complex Quite challenging from either a design or technical perspective. Ask for help! labels Nov 18, 2022
@hymm
Copy link
Contributor Author

hymm commented Nov 18, 2022

exclusive systems

In the current PR + pipelined rendering you could do:

fn my_exclusive_system(world: &mut World) {
    let main_thread_executor = world.get_resource::<MainThreadExecutor>();
    ComputeTaskPool::get().scope_with_executor(false, main_thread_executor.clone(), |scope| {
        let tls = world.get_resource::<Tls<MyResource>>();
        scope.spawn_on_scope(async move {
            tls.get(|tls| {
                /* do something with the tls */
            });
        });
    });
}

This solution would work for normal systems too and is pretty close to your recommended solution.

commands

So this is a little weird. The way the lib works is that it's just a thread local hashmap and the thing that gets stored in the world is just the key into the hashmap. So if we called remove::<Tls>() what actually happens is that the key is dropped, but the actual data in the TLS would only get dropped when the thread exits.

The story around dropping is probably the biggest outstanding issue in this PR. I shimmed a drop impl onto Tls, but that only really cleans up the tls on the thread the Tls resource is dropped from. It doesn't help if there's resources on other threads. Another short term solution would be to have a main thread system that runs and cleans up the audio Tls resource on app exit. Long term is trickier and might require some type of generalized garbage cleanup system.

To be clear I think of this PR + the ThreadExecutor from pipelined rendering as a minimal set of tools to let us deal with non send data by shoving it into TLS. There should almost certainly be follow up PR's to provide more ergonomic API's and replace the thread-local-object library with our own abstractions. The lib has some annoying limitation that users would probably hit like not being able to have 2 get_mut's nested one inside the other and only allowing usize number of ThreadLocal::new()'s and not ever cleaning up the memory used in the hashmap.

@maniwani
Copy link
Contributor

maniwani commented Nov 18, 2022

So this is a little weird. The way the lib works is that it's just a thread local hashmap and the thing that gets stored in the world is just the key into the hashmap. So if we called remove::() what actually happens is that the key is dropped, but the actual data in the TLS would only get dropped when the thread exits.

I understand how thread_local_object works. What I was pointing out was that (1) exclusive systems did not receive a similar param to make them run on the main thread and (2) commands can't use the keys as-is. You would need to send a task that applies commands over to the main thread (like stageless does).

To be clear I think of this PR + the ThreadExecutor from pipelined rendering as a minimal set of tools to let us deal with non-send data by shoving it into thread-local storage.

There should almost certainly be follow up PR's to provide more ergonomic API's and replace the thread-local-object library with our own abstractions. The lib has some annoying limitations like not being able to nest get_mut calls and never cleaning up the memory used by the hash map.

(edited)
Using the existing API for resources the world doesn't actually own seems like the source the caveats you've mentioned, moreso than the weirdness of thread_local_object.

This solution would work for normal systems too and is pretty close to your recommended solution.

(edited)
As you say, the exclusive system example you described as a fallback for when ResMut<Tls<T>> can't be used as a param is fundamentally the same as the general solution I suggested here (except more verbose).

I really think that the ThreadExecutor or "ThreadExecutor scope maker" is the only resource we need. It's clearer about ownership, reduces impact on the main thread (only blocks if you actually run the closure), and gives you a way to drop those resources.

This PR feels like a longer route to that same conclusion.

fn my_system(main_thread: Res<ThreadLocal<Main>>) {
    // access specific resource
    main_thread.run(|res: &mut MyResource| {
        res.0 = 1;
    });

    // or access entire TLS
    main_thread.run(|tls: &mut TLS| {
        // drop a resource normally
        tls.remove_resource::<MyResource>();
    });
}

I don't think I'm describing this very well, so I'll try to submit a PR for comparison.

(edited)
Also, using Res/ResMut here won't detect changes properly. Since any world can mutate the TLS, it needs its own change counter (or we need one for the whole app).

@james7132
Copy link
Member

Closing this out as the alternative in checking the thread ID when accessing NonSend has been merged in #6534. Likewise, the shift to remove NonSend from the World entirely in #9122 may render this PR redundant.

@james7132 james7132 closed this Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events D-Complex Quite challenging from either a design or technical perspective. Ask for help! P-Unsound A bug that results in undefined compiler behavior X-Controversial There is active debate or serious implications around merging this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants