[NativeAOT-LLVM] Threads and GC Poll #2806
The general idea is that a poll needs to be present on all control paths that are "reasonably long". In the upstream suspension scheme, that's:
The most difficult part here is the fully interruptible case - the general algorithm would need to guarantee timely suspension for all kinds of loops, including irreducible control flow. It is a problem that has been solved elsewhere, though; I just don't have the best algorithm for it on hand.

What is 'partially' and 'fully' interruptible code?

In the general case, code can contain loops, which can starve the GC, so we need to be able to suspend the code in the middle of such loops. This is where "full interruptibility" comes in: GC info is recorded for each instruction, and the suspending thread, if it sees that the current IP is in such a region, leaves the suspended thread stopped as-is. GC info for fully interruptible methods is (relatively) large, so it is only emitted in cases where this granular control is needed, i.e. with loops that lack any GC safe points (managed calls). See https://github.com/dotnet/runtime/blob/f1332ab0d82ee0e21ca387cbd1c8a87c5dfa4906/src/coreclr/jit/flowgraph.cpp#L3980C40-L3984.

For other methods (the majority of them), only GC info at call sites is recorded, the assumption being that all callees, save a few helpers, are themselves interruptible and thus serve as implicit polls. This assumption is then realized via return address hijacking, where the return address on the stack of the executing method is overwritten by the suspending thread to point to a routine that will wait for GC to complete, thus "catching" the thread once it executes the return.
It should be done in lowering. I am not sure what would be the best place for it in lowering. Given the interplay with LSSA and the need to predict how poll insertion will affect how much will need to be spilled (for which liveness information may be required), it may be best to do it just before LSSA, or even as part of it.
There is another interesting question here that I haven't come across an answer to so far: does WASM even allow explicit polls? That is, if we have a loop:
Is the WASM compiler allowed to rewrite it to:
E.g. in the IL memory model that's allowed - you need to use a stronger volatile (acquire) load to circumvent it, while all that is actually needed is a relaxed load.
Sounds like we need to establish this first.
Some Silverlight configurations used explicit GC polls. The code to insert the explicit GC polls was deleted in dotnet/runtime#42664. It inserted explicit GC polls on back edges and returns.
It is not allowed. An infinite or long-running loop on one thread should not prevent other threads from making progress.
Hmm, I am not sure what you mean. I meant: can the WASM -> native code compiler perform this transformation (like the IL -> native code one can)? I.e. the question is what the semantics of ordinary loads in WASM are (as opposed to atomic ones, which are always sequentially consistent).
Wouldn't we use the wasm atomics here?
I don't see how that would work performance-wise - inserting atomic loads at each poll site would be costly.
Why do we need to use them?
It is not a question of IL -> WASM emission, but of WASM -> native code emission. A WASM atomic load is sequentially consistent, like a CompareExchange.
A sequentially consistent load does not require a full CompareExchange (in the C/C++ definition of sequential consistency, at least): https://godbolt.org/z/YqGGY4zK3 . Is wasm different for some reason?
No, my bad. But it will still require a stronger barrier: https://godbolt.org/z/GaET7znrP. Another question I forgot about initially is the "process-wide barrier" that is used in a few places (in suspension, at least). WASM doesn't have an equivalent.
The process-wide barrier allows Preemptive<->Cooperative transitions without any memory barriers. If the process-wide barrier is not available, Preemptive<->Cooperative transitions will have to include memory barriers. The current thread suspension algorithm works like this:
In the absence of a process-wide barrier, I do not see a way to make this algorithm thread-safe by adding memory barriers around the relevant reads.
Note that the full interruptibility check was generalized into a simple cycle check in dotnet/runtime#95299. It should not be hard to adapt it to break all cycles it can find by placing GC polls into any block in those cycles. The code deleted in dotnet/runtime#42664 looks unnecessarily complicated to me.
I'd like to move threads forward a little bit. We need to insert calls to RhpGcPoll (currently a NOP) into the generated code. I think we are going to do this when making managed calls - is that enough places? Are we concerned with a method that has a long-running loop that makes no calls?

Secondly, should we be doing this in llvmlower.cpp or in fgInsertGCPolls, or... ?

Thanks.