Disclaimer
This issue is a bit of a summary of some discussion I had with @vtjnash and @gbaraldi this morning. There are some semantic decisions to be made, but I don't think any of it is urgent, so the purpose of this issue is more to have a coherent place where the issue is written up. As usual, errors, omissions and representations are my own understanding and may not be accurate.
Overview
When LLVM performs optimization and code generation, it may learn more about the behavior of the piece of code that it generates. Examples of such pieces of information that may be of interest to us are:
- Various LLVM function attributes that may let us optimize better (e.g. memory access information; see also Add attribute inference passes to the LLVM pipeline. #58424)
- The (non-)presence of safepoints (gc state transition support in codegen #33097)
- Vectorization cost model / legality (Proposed semantics for implicit vectorization of primitives #56481)
- Escape analysis
- Task stack sizes (to make tiny tasks much faster)
- Cancellation safety (Very WIP: Architecture for robust cancellation #60281)
- The size of variable-sized allocations (we don't have very many of these at the moment, but may in the future; this also covers some of the concerns in WIP/RFC: Add `await` mechanism #58532)
The general issue with all of these is that LLVM is not particularly careful about a problem we call "IPO safety" or "IPO soundness" (see llvm/llvm-project#27148 and linked discussions for the original report, although there were several later instances of similar issues as well). In an interprocedural setting, you may only rely on attributes of a called function that are true of all possible executions of that function (even if the function was optimized differently, was left unoptimized, ran in the interpreter, etc.). At the Julia IR level we perform extensive IPO, but we are very careful to only infer IPO-safe attributes.
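To make the hazard concrete, here is a small illustrative sketch (plain Python, not Julia/LLVM internals; all names are hypothetical): an attribute like :consistent-cy inferred by inspecting one particular optimized copy of a function is not an interprocedural fact, because a different execution may run a different copy with different observable behavior.

```python
# Sketch: why an attribute is only IPO-safe if it holds for *every*
# possible execution of the callee, including unoptimized/interpreted ones.

call_count = 0

def f_interpreted(x):
    # Semantics of `f` as written: the result depends on ambient state,
    # so `f` is not :consistent.
    global call_count
    call_count += 1
    return x + (call_count % 2)

def f_optimized(x):
    # One hypothetical optimized compilation in which the state-dependent
    # part happened to be eliminated.
    return x + 1

# Inferring ":consistent" by inspecting f_optimized alone is unsound as
# an interprocedural fact: another execution may run f_interpreted.
assert f_optimized(10) == 11
assert f_interpreted(10) == 11  # call_count == 1
assert f_interpreted(10) == 10  # call_count == 2: contradicts the "fact"
```

The unsound step would be attaching "consistent" to the *abstract* function `f` based on `f_optimized` alone; any caller that later reaches `f_interpreted` observes the contradiction.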
Note that it is in general fine to use non-IPO information to synthesize a different execution in the current function, provided that you are past the point where IPO safety must be maintained and that you fully replace the call. For example, if LLVM infers non-IPO :consistent-cy for a particular optimized copy of a function, it would be legal to use that fact for constant folding (ref #49353).
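The "fully replace the call" condition can be sketched as follows (again an illustrative Python analogy with hypothetical names, not the actual compiler): the folded value is sound because it stands in for the specific copy the fact was derived from, rather than annotating an abstract call that some other implementation might end up servicing.

```python
# Sketch: using a non-IPO fact is fine when the call site is replaced
# wholesale by the result of the specific copy the fact came from.

def g_optimized(x):
    # A specific optimized copy; deterministic, so facts derived from
    # *this copy* (e.g. "g_optimized(2) == 4") are true of this copy.
    return x * x

def caller_folded(x):
    # Sound: for x == 2 the call was replaced entirely with the value
    # constant-folded from g_optimized; no abstract call to `g` remains
    # that a differently-compiled copy could answer.
    if x == 2:
        return 4  # constant-folded from g_optimized(2)
    return g_optimized(x)

assert caller_folded(2) == g_optimized(2)
assert caller_folded(3) == 9
```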
However, the question we're asking here is somewhat different, namely: if we are in codegen.cpp and we're codegen'ing a specific :invoke by emitting a direct call to a specific piece of code (i.e. no dynamic dispatch, etc.), can we peek at the call target and use non-IPO-safe information about it?
In practice this is actually two questions:
- Is it semantically legal for us to do this?
- Do we have any way to actually propagate this information?
Right now the answer to both is essentially no. LLVM's OrcJIT has the ability to do arbitrary interposition, which can be used for various things (delaying compilation until first invocation, deoptimization for debugging, etc.). However, the only thing we are using this ability for at the moment is threaded compilation. For this use case, we definitely could impose an ordering requirement to address both of those issues; however, that would restrict us from using this feature for other things in the future.
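One way to picture the ordering requirement is as a rule that per-copy (non-IPO-safe) attributes of a callee only become visible once its machine code is pinned, i.e. once interposition can no longer swap it out. This is a minimal sketch under that assumption; the class and method names are invented for illustration and do not correspond to actual OrcJIT APIs.

```python
# Hypothetical sketch: a JIT entry that exposes non-IPO-safe attributes
# of the emitted code only after the code is finalized (pinned).

class JITEntry:
    def __init__(self):
        self.pinned = False
        self.attrs = {}  # facts about the *emitted* code; non-IPO-safe

    def finalize(self, attrs):
        # After this point, interposition may no longer replace the code.
        self.attrs = attrs
        self.pinned = True

    def peek_attrs(self):
        # Callers may consult these facts only when no interposition can
        # replace the code they were derived from.
        if not self.pinned:
            return None  # must fall back to IPO-safe information only
        return self.attrs

entry = JITEntry()
assert entry.peek_attrs() is None      # too early: still interposable
entry.finalize({"memory": "readnone"})
assert entry.peek_attrs() == {"memory": "readnone"}
```

The cost of such a rule is exactly the restriction noted above: any future use of interposition (lazy compilation, deoptimization) would have to respect, or invalidate, previously published attributes.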
There is some disagreement about whether such a restriction is a good idea. In general, @gbaraldi and I thought it probably was; @vtjnash was more skeptical. The argument against is that we'd generally like to do interposition (e.g. for debugging) at the Julia level and re-codegen whole call trees rather than silently swapping out function pointers. However, there was general agreement that when generating code in contexts where the JIT won't exist at runtime (i.e. --trim), these optimizations should be legal.