Switch the destructors implementation for thread locals on Windows to use FLS #148799
|
r? @ChrisDenton rustbot has assigned @ChrisDenton. Use |
|
Hi @ChrisDenton, any chance you can take a look at this? Tests are only failing because of missing support in Miri (which I have implemented in a branch). Thanks! |
|
Wouldn't this conflict with the use of fibers by user code? If you switch to a fiber, access a tls variable for the first time on a thread, switch back and destroy the fiber, the tls variable would get incorrectly deinitialized. And if you move a fiber to another thread and exit the original thread, the tls variable would get deallocated while the fiber still has a reference to it that will cause a use-after-free when destroying the fiber. |
No - The way the new code works is that Edit: fibers can't be moved between threads, so the text below is not really relevant. I also updated the code to match my next comment about order of fiber/thread exit. However, I now think you are technically right @bjorn3, but only if the user starts the executable outside of Rust and performs runtime initialization (or otherwise triggers I couldn’t find whether there is even a documented way to do this (since In theory, I could add a check when setting the destructor that However, it seems better to document that this usage is unsupported if that's something that's not currently guaranteed to work. |
That still doesn't account for multiple fibers running on the same thread or fibers migrating between threads, right? The TLS variables have unique storage per thread, while the fiber-local destructor runs once per fiber on whichever thread destroys the fiber in the end as I understand it.
|
I will add a comment in the code so it's more clear, but no - because we only execute the My understanding is that there isn't a safe way to exit a fiber without terminating the thread anyway (to use fibers, a thread must always start by calling ConvertThreadToFiber and DeleteFiber says "If the currently running fiber calls DeleteFiber, its thread calls ExitThread and terminates. However, if a currently running fiber is deleted by another fiber, the thread running the deleted fiber is likely to terminate abnormally because the fiber stack has been freed.")
Hm, seems like |
6502684 to
0327bec
Compare
|
The Miri subtree was changed cc @rust-lang/miri |
|
I did see this before the holidays but didn't have time to investigate. Last time I considered this I had concerns because Rust does not manage threads, except those it spawns itself (and even then only to a degree). Which means an FLS destructor may run before the OS thread finishes whereas TLS is expected to be valid for the duration of the OS thread. Maybe those concerns are unfounded or mitigated, but I'd want to be very sure before switching to it. I'll ping the windows group in case anyone has reasons we should/shouldn't do this. @rustbot ping windows |
|
Hey Windows Group! This bug has been identified as a good "Windows candidate". cc @albertlarsan68 @arlosi @ChrisDenton @danielframpton @dpaoliello @gdr-at-ms @kennykerr @luqmana @nico-abram @retep998 @sivadeilra @wesleywiser |
Overall, I'm in favor of this approach: it replaces undocumented features that have caused issues with a documented feature.
Is it possible to delete a fiber that isn't running and will never run again? |
Assuming that Fiber destruction is linked to thread destruction (or that we can somehow only run this code when the last fiber is destroyed), then is the gap between the fibers being destroyed and the thread being destroyed observable? I think it only matters to code running in DLL Detach, which shouldn't be messing with TLS stuff in Rust anyway... |
|
We have no way of ensuring that

A very quick sketch

    use std::ffi::c_void;
    use windows::Win32::System::Threading::{
        ConvertThreadToFiber, CreateFiber, DeleteFiber, FlsAlloc, FlsSetValue, SwitchToFiber, ConvertFiberToThread,
    };
    fn main() {
        unsafe {
            let main = ConvertThreadToFiber(None);
            let fiber = CreateFiber(0, Some(fiber_start), Some(main));
            println!("switching to another fiber");
            SwitchToFiber(fiber);
            DeleteFiber(fiber); // Invokes the FLS callback.
            println!("end of main fiber");
        }
    }
    unsafe extern "system" fn fls_callback(_param: *const c_void) {
        println!("fls dealloc");
    }
    extern "system" fn fiber_start(main: *mut c_void) {
        println!("fiber started");
        unsafe {
            let index = FlsAlloc(Some(fls_callback));
            let _ = FlsSetValue(index, Some(1234 as _));
            SwitchToFiber(main);
        };
    }

|
I totally missed that! Thanks @ChrisDenton for the sketch 🙏
I think That's a bit of a hack (which might defeat the purpose of replacing the current |
ab9eddd to
456fa3b
Compare
25cf05e to
e559e78
Compare
Thanks a lot! The Miri code largely LGTM, though I have some nits.
0179ba3 to
e14adff
Compare
e14adff to
4a4d0dd
Compare
r=me on the Miri parts (modulo the last assertion nit). Thanks a lot!
4a4d0dd to
0c3496f
Compare
|
@RalfJung sorry but I added another small change because I missed something 🙈 : turns out Windows zeros the key's value after the dtor is run (also in Wine: https://github.com/wine-mirror/wine/blob/wine-11.0/dlls/ntdll/thread.c#L715). |
I don't know about Wasmtime specifically but I'm not particularly keen on that in general. We can't guarantee that Rust's main is run. For example, if it's compiled as a DLL or the entry point of another language is used (e.g. C/C++). I'm also not sure we should stably guarantee that |
|
For Wasmtime we do expect entrypoints outside of Rust (e.g. using the C API of Wasmtime), so we can't rely on Rust's runtime initialization. With this change I'd probably go the route of thread-local-with-dtor-that-does-nothing and hit that whenever we fiber switch to ensure it's initialized for the current thread. The main worry for me is that we allow arbitrary Rust code (the embedder of Wasmtime) to run within a Windows fiber, and that's the risk of breakage here. It sounds like we can mitigate that with thread-local-with-dtor-that-does-nothing, however. Otherwise though, right, we don't let anything abnormally kill the fiber -- or at least not baked into Wasmtime. Fibers always exit "cleanly" from the perspective of the fiber itself. Put another way, if host code panics or wasm traps, we always catch that within the context of the fiber, exit the fiber, then do whatever's necessary when we're back on a thread. |
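The "thread-local-with-dtor-that-does-nothing" idea mentioned above could look roughly like the sketch below. This is only an illustration, not code from Wasmtime or from this PR: the names `NoopOnDrop`, `KEEP_DTORS_REGISTERED` and `before_fiber_switch` are hypothetical, and it assumes that the first access to a thread local whose type needs dropping is what makes the standard library register its destructor machinery for the current thread.

```rust
// Hypothetical marker type: it exists only so that the thread local below
// "needs drop", even though dropping it does nothing.
struct NoopOnDrop;

impl Drop for NoopOnDrop {
    fn drop(&mut self) {
        // Intentionally empty.
    }
}

thread_local! {
    // Hypothetical thread local that is touched before switching fibers.
    static KEEP_DTORS_REGISTERED: NoopOnDrop = const { NoopOnDrop };
}

/// Hypothetical hook, meant to be called on the thread (not inside a fiber)
/// before switching to a fiber.
fn before_fiber_switch() {
    // Accessing the value is the whole point; the closure has nothing to do.
    KEEP_DTORS_REGISTERED.with(|_| {});
}

fn main() {
    before_fiber_switch();
    // Calling it again is harmless: the thread local is already initialized.
    before_fiber_switch();
    println!("thread local touched before any fiber switch");
}
```

The design choice here is to pay for one cheap thread-local access per fiber switch, so that the registration always happens on the owning thread and not from inside a fiber.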
90d305b to
cf421a4
Compare
cf421a4 to
c72a0fb
Compare
|
@alexcrichton Should I open a PR to add the TLS+Drop access somewhere before the function you mentioned?
Suggested addition to `resume`
Should I also add another test like |
|
Thanks for testing! And yeah no worries about the debug tests, they're a bit finicky with precise versions of installed tools anyway. By no means feel obligated to send a PR to Wasmtime, but if you're willing it'd be much appreciated! The change you propose is what I was thinking as well, so looks good to me 👍 |
c72a0fb to
66af8be
Compare
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
Summary
Switch the thread local destructors implementation on Windows to use the Fiber Local Storage APIs, which provide native support for setting a callback to be called on thread termination, replacing the current `tls_callback` symbol-based implementation.

Except for some spellchecking, no LLMs were used to produce code / comments / text in this PR.
Current Implementation
On Windows, in order to support thread locals with destructors, the standard library uses a special `tls_callback` symbol that is used to call the `destructors::run()` hook on thread termination.

This has two downsides:
LocalKey's documentation.

As an example of point 2, this code, which uses `JoinHandle::join` in a thread local Drop impl, will deadlock on stable:
Join-on-Drop Deadlock Example
Proposed Change
We can use the `Fls{Alloc,Set,Get,Free}` functions (see https://devblogs.microsoft.com/oldnewthing/20191011-00/?p=102989) to implement the dtor callback needed for thread locals that have a Drop implementation.
We allocate a single key, and use its destructor callback to run all the registered destructors when a thread is shutting down.
With this implementation, the above code sample will not deadlock (but it still might not be a good idea to do this!).
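As a rough, portable model of the "single key, one callback runs every registered destructor" idea, the sketch below keeps a per-thread list of destructors and drains it from one function. It does not call the Windows FLS APIs and is not the code in this PR: `DTORS`, `register`, `run_all` and `drop_boxed_u32` are hypothetical names, and `run_all` merely stands in for the callback that the OS would invoke.

```rust
use std::cell::{Cell, RefCell};

thread_local! {
    // Hypothetical per-thread registry of (value pointer, drop function) pairs.
    static DTORS: RefCell<Vec<(*mut u8, unsafe fn(*mut u8))>> = RefCell::new(Vec::new());
    // Counts how many destructors have run on this thread (demo only).
    static RAN: Cell<usize> = Cell::new(0);
}

/// Remembers that `dtor(ptr)` must be called when this thread shuts down.
fn register(ptr: *mut u8, dtor: unsafe fn(*mut u8)) {
    DTORS.with(|list| list.borrow_mut().push((ptr, dtor)));
}

/// Stand-in for the single callback: drains the registry.
fn run_all() {
    loop {
        // Pop first and release the borrow, so that a destructor may call
        // `register` again without hitting a double borrow.
        let next = DTORS.with(|list| list.borrow_mut().pop());
        match next {
            Some((ptr, dtor)) => unsafe { dtor(ptr) },
            None => break,
        }
    }
}

/// Example destructor: frees a `Box<u32>` and records that it ran.
unsafe fn drop_boxed_u32(ptr: *mut u8) {
    // SAFETY: the caller must pass a pointer obtained from `Box::<u32>::into_raw`.
    drop(unsafe { Box::from_raw(ptr as *mut u32) });
    RAN.with(|n| n.set(n.get() + 1));
}

/// Number of destructors that have run on the current thread.
fn destructors_run() -> usize {
    RAN.with(|n| n.get())
}

fn main() {
    for value in [1u32, 2, 3] {
        register(Box::into_raw(Box::new(value)) as *mut u8, drop_boxed_u32);
    }
    run_all();
    assert_eq!(destructors_run(), 3);
    // A second run finds an empty list, so nothing is dropped twice.
    run_all();
    assert_eq!(destructors_run(), 3);
}
```

Draining with `pop` in a loop, instead of iterating over the list, is what lets a destructor register further destructors while the list is being processed.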
Safety and Compatibility
- Destructors will only run once: we use the common `thread_local` + atomic pattern to only set the Fls marker value once. The destructor callback is only called when that value is non-zero, so we are guaranteed that it will only be called once.
- Destructors will only run at thread exit: we verify that we are not running in a fiber during the destructors callback. This means that using fibers (which is very rare) will result in thread locals being leaked, unless the fiber is converted back to a thread using `ConvertFiberToThread` before thread termination. This is not ideal, but should be OK as destructors are not guaranteed to run, but it needs to be documented.
- `rt` module). It might be possible for the user to use something like the current `tls_callback` to observe already-freed thread locals, which is something that can also happen in the current implementation.
- Destructors will only run on the correct thread: Fibers cannot be moved between threads.
- Destructors will only run on the correct thread: they are registered to a thread_local list, so fiber movement between threads does not matter.
- Users cannot observe different locals because they are using fibers: because we only use an Fls local marker to trigger the destructors callback, we don't change anything about how users interact with "normal" thread locals and fiber locals.
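The "run once" argument can be modelled in portable Rust as below. This is a simplified sketch under stated assumptions, not the PR's implementation: `lazy_key`, `enable`, `simulate_thread_exit` and the statics are hypothetical names, `NEXT_SLOT` stands in for the OS slot allocator, and `simulate_thread_exit` imitates the behaviour described in this thread (the callback is invoked only for a non-zero value, and the value is zeroed afterwards).

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

// 0 means "no key allocated yet" in this model.
static KEY: AtomicUsize = AtomicUsize::new(0);
// Stand-in for the OS slot allocator.
static NEXT_SLOT: AtomicUsize = AtomicUsize::new(1);

thread_local! {
    // Models this thread's value in the slot; 0 means "never set".
    static SLOT_VALUE: Cell<usize> = Cell::new(0);
    // Counts callback invocations on this thread (demo only).
    static CALLBACK_RUNS: Cell<usize> = Cell::new(0);
}

/// Returns the process-wide key, allocating it on first use.
fn lazy_key() -> usize {
    let current = KEY.load(Ordering::Acquire);
    if current != 0 {
        return current;
    }
    let fresh = NEXT_SLOT.fetch_add(1, Ordering::Relaxed);
    match KEY.compare_exchange(0, fresh, Ordering::AcqRel, Ordering::Acquire) {
        Ok(_) => fresh,
        // Another thread won the race; a real implementation would free `fresh`.
        Err(winner) => winner,
    }
}

/// Marks the current thread as needing the callback.
/// Returns `true` only the first time it is called on a thread.
fn enable() -> bool {
    let _key = lazy_key();
    SLOT_VALUE.with(|v| {
        if v.get() != 0 {
            false
        } else {
            v.set(1);
            true
        }
    })
}

/// Models thread exit: the callback runs only if the marker is non-zero,
/// and the marker is zeroed afterwards.
fn simulate_thread_exit() {
    SLOT_VALUE.with(|v| {
        if v.get() != 0 {
            CALLBACK_RUNS.with(|n| n.set(n.get() + 1));
            v.set(0);
        }
    });
}

/// Number of callback invocations on the current thread.
fn callback_runs() -> usize {
    CALLBACK_RUNS.with(|n| n.get())
}

fn main() {
    assert!(enable());
    assert!(!enable()); // second call is a no-op
    simulate_thread_exit();
    simulate_thread_exit(); // marker is already zero, callback is skipped
    assert_eq!(callback_runs(), 1);
    assert_eq!(lazy_key(), lazy_key()); // the key is allocated only once
}
```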
Other Notes
The implementation is based on the `key::racy` and `guard::apple` code, because we need a `LazyKey`-like racy static and an `enable` function.

While TLS slots are limited to 1088, FLS slots are currently limited to 4000 per process.
Miri
Because Miri is aware of the thread local implementation, I also implemented these functions and support for them in the interpreter here:
https://github.com/rust-lang/miri/compare/master...ohadravid:miri:windows-fls-support?expand=1
I guess that this will need to be merged before this PR (if this is accepted) - let me know and I'll open that PR as well.
Targets without `target_thread_local`
In `*-gnu` Windows targets, the `target_thread_local` feature is unavailable.
We could also change the "key" (non-`target_thread_local`) Windows impl at `library\std\src\sys\thread_local\key\windows.rs` to be based on the Fls functions. I can add it to this PR, or as a separate PR, if you think this is preferable.

Also, I used a `Cell` in a `#[thread_local]` to store the resulting key, like the other implementations. This works, but I'm not sure if this is 100% OK given that we have these targets as well.