WIP/RFC: Add await mechanism
#58532
base: master
# Introduction
This PR adds a new control flow mechanism called `await`. In this PR,
it is only exposed by the macro `@Base.Experimental.oc_await`, which
has the following docstring:
```
Base.Experimental.@oc_await [argt] [C->retblock]
Capture the current function's execution context for later resumption.
By default, immediately returns to the caller, returning an `OpaqueClosure` that
may be invoked to continue execution. If the optional `C->retblock` argument is
provided, `retblock` is executed in the context of the current function, with the
continuation bound to `C`. If `argt` is provided, the continuation will further
expect arguments `argt` to be provided when invoked.
```
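The macro only exists on this PR's branch (and only runs under `--compile=min`), so the sketch below models the two documented forms with ordinary Julia closures instead: the code after the await point is split out by hand into a closure that captures the live locals. The names `double_then_await` and `await_with_retblock` are hypothetical, and what `retblock` does here (invoke `C` twice and return) is an assumption chosen for illustration; a real `@oc_await` would return an `OpaqueClosure` and perform the split in the compiler.

```julia
# Hand-written model of the two `@oc_await` forms using ordinary closures.
# Hypothetical names throughout; this is not how the macro is implemented.

# Default form with `argt = Tuple{Int}`: at the await point the function
# returns a continuation, which later expects one `Int` argument.
function double_then_await(x::Int)
    y = x * 2
    # --- await point: capture the live local `y` and return to the caller ---
    return function (z::Int)   # continuation; `z` stands in for the `argt` argument
        return y + z           # "resumed" code sees the captured `y`
    end
end

# `C->retblock` form: rather than returning, run `retblock` in the current
# function with the continuation bound to `C`. Assumption: this `retblock`
# simply invokes `C` twice and returns the combined result.
function await_with_retblock(x::Int)
    y = x + 1
    C = () -> y * 10           # continuation capturing `y`
    return C() + C()           # retblock
end
```

For example, `double_then_await(5)` returns a continuation `k` with `k(1) == 11`, and `await_with_retblock(1) == 40`.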
Adding a feature like this was part of the original design of opaque closures,
but was never fully implemented for lack of immediate need. There are several
ways to think of this feature:
1. As an alternative representation for `:new_opaque_closure` that is
more friendly to other optimization passes.
2. As an implementation of a particular kind of delimited continuation.
3. As an implementation of C++20-style coroutines.
The key feature of this design is that the decision of which
values go in the capture list/residual/etc. is deferred all the way through
to the last possible moment in LLVM. As a result, all the ordinary optimizations
(DCE, SROA, various AD transforms, etc.) can be applied as usual across the
suspension boundary.
## Implementation status
As of the writing of this commit message, the implementation is minimal.
I've added lowering support and the new IR node type, as well as support
in the interpreter to play with the semantics. However, there is no compiler
support or optimization support yet (so you need to run `julia --compile=min`
to play with it).
Semantic TODO:
- [ ] How does this mix with try/catch?
- [ ] Does `await` capture other task-bound state?
  - [ ] `scope` (yes?)
  - [ ] locks? (no?)
  - [ ] timing? (no?)
  - [ ] rng? (no?)
Representational TODO:
- [ ] How does `argt` get represented in the continuation
Inference TODO:
- [ ] Implement AwaitNode inference support
Codegen TODO:
- [ ] Define and implement `julia.coro` intrinsics to lower this to
- [ ] Implement the appropriate lowering
Runtime TODO:
- [ ] Allow the OC captures to be allocated inline with a GC descriptor for pointers
## Detailed semantic discussion
### General semantic details
There are some semantic similarities with try/catch (in that they're both kinds of
continuations). However, the semantics are quite different:
1. Try/catch always jumps up the stack; `await` makes no assumptions (but copies
the state of the topmost stack frame, so there are two independent copies of it).
2. `await` is always delimited by `return` (which terminates the continuation).
3. `await` is multi-shot. However, I think single-shot is useful, so there is a
currently unused `flags` argument that might be used to ask for a single-shot
continuation.
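The frame-copy and multi-shot semantics can be modeled in plain Julia by keeping the slot values in a mutable struct and snapshotting it at the await point. This is a hypothetical sketch (`Slots`, `run_with_await`, and `oneshot` are illustrative names, and only immutable `Int` slot values are used); it also shows what a single-shot mode requested through the currently unused `flags` argument might look like, which is an assumption rather than anything the PR defines.

```julia
# Model of the frame-copy semantics: slot values live in a mutable struct,
# and "awaiting" snapshots them so caller and continuation diverge.
# Hypothetical names; plain Int slots only.
mutable struct Slots
    i::Int
    acc::Int
end

function run_with_await()
    s = Slots(1, 10)
    saved = Slots(s.i, s.acc)              # await: independent copy of the frame
    cont = function ()
        t = Slots(saved.i, saved.acc)      # fresh copy per call: multi-shot
        t.i += 1
        t.acc += t.i
        return t.acc
    end
    s.i = 100                              # caller-side writes after the await
    s.acc = 0                              # are invisible to `cont`
    return s, cont
end

# Hypothetical single-shot wrapper (cf. the unused `flags` argument).
function oneshot(k)
    used = Ref(false)
    return function ()
        used[] && error("single-shot continuation resumed twice")
        used[] = true
        return k()
    end
end
```

Calling the continuation twice yields `12` both times, because each call resumes from the same snapshot, while the caller's own copy ends with `i == 100`.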
### Syntax level
This adds a new syntax form `(symbolicawait continue_at argt flags)`.
`continue_at` is a label name created with `symboliclabel`. The semantics
are that the execution of `symbolicawait` captures all local slots and
ssa values and returns an opaque closure that, when-called, restores
all local slots and ssa values and resumes the execution at the label `continue_at`.
Regular execution continues as usual at the next statement after `symbolicawait`.
Modifications to slots (or ssavalues) after `symbolicawait` do not affect
the value of said slots/ssavlues in the continuation.
### IR level
This adds a new `AwaitNode`. It is structurally similar in some ways to
`EnterNode`, in that it has a non-local successor that may later be jumped to.
The non-local successor in both `AwaitNode` (i.e. the continuation) and `EnterNode`
(i.e. the catch block) is a statement/bb index integer inside the struct. However,
there are also some differences:
1. `AwaitNode` is always delimited by `ReturnNode`; there are no equivalent `:leave` or
`:pop_exception` statements.
2. `AwaitNode` returns a regular value (an opaque closure), not a token. `AwaitNode`
may be DCE'd if there are no uses.
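To make the IR-level behavior concrete, here is a toy interpreter over a tiny statement language. It is a hypothetical sketch: `ToyAssign`, `ToyAwait`, and `ToyReturn` are made-up stand-ins, not the real `Core.IR` types. `ToyAwait` stores its non-local successor as a plain statement index, produces an ordinary value (a closure), and the continuation runs until it hits its own `ToyReturn`.

```julia
# Toy statement types loosely mirroring the IR shapes described above.
# All names are hypothetical; these are not the Core.IR definitions.
struct ToyAssign; slot::Int; f::Function; end        # slots[slot] = f(slots)
struct ToyAwait;  dest::Int; continue_at::Int; end   # dest = continuation
struct ToyReturn; slot::Int; end                     # return slots[slot]

# Execute `code` starting at statement `pc` with local slots `slots`.
function interpret(code::Vector{Any}, slots::Vector{Any}, pc::Int = 1)
    while true
        stmt = code[pc]
        if stmt isa ToyAssign
            slots[stmt.slot] = stmt.f(slots)
        elseif stmt isa ToyAwait
            # Capture every slot now; each call resumes at `continue_at`
            # on a fresh copy of that snapshot (multi-shot).
            let saved = copy(slots), target = stmt.continue_at
                slots[stmt.dest] = () -> interpret(code, copy(saved), target)
            end
        elseif stmt isa ToyReturn
            return slots[stmt.slot]   # `return` delimits the continuation
        end
        pc += 1                       # regular execution falls through
    end
end

# Slots: 1 = k, 2 = x, 3 = r
const toy_program = Any[
    ToyAssign(2, s -> 5),               # 1: x = 5
    ToyAwait(1, 6),                     # 2: k = await, continuation starts at 6
    ToyAssign(2, s -> s[2] + 1),        # 3: x = x + 1   (after the await)
    ToyAssign(3, s -> (s[2], s[1])),    # 4: r = (x, k)
    ToyReturn(3),                       # 5: return r
    ToyAssign(3, s -> s[2] * 100),      # 6: r = x * 100 (sees captured x == 5)
    ToyReturn(3),                       # 7: return r
]
```

Running `interpret(toy_program, Any[nothing, nothing, nothing])` returns `(6, k)`; `k()` then returns `500` on every call, since the write to `x` on the regular path happened after the capture.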
### LLVM level [unimplemented]
The rough plan is to implement something similar to `llvm.coro`, although we cannot
use it directly, since we need special handling for our GC-tracked pointers. However,
we may be able to borrow some code.
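As a rough illustration of the `llvm.coro` style (its "switched-resume" lowering), the following hand-lowers a coroutine equivalent to `while n > 0; suspend yielding n; n -= 1; end`. The frame holds a resume index plus only the value live across the suspension point, and `resume!` dispatches on that index. `CountdownFrame` and `resume!` are hypothetical names written by hand for this sketch; they are not the proposed `julia.coro` intrinsics.

```julia
# Hand-lowered coroutine in the switched-resume style: the frame stores a
# resume index plus only the values live across the suspension point.
mutable struct CountdownFrame
    resume_index::Int   # 0 = start, 1 = after suspension, -1 = done
    n::Int              # the only value live across the suspension
end

# Resume the frame; returns the yielded value, or `nothing` once finished.
function resume!(f::CountdownFrame)
    if f.resume_index == 0
        # entry: fall through to the loop condition
    elseif f.resume_index == 1
        f.n -= 1                       # code following the suspension point
    else
        return nothing                 # coroutine already returned
    end
    if f.n > 0
        f.resume_index = 1             # record where to continue
        return f.n                     # suspend, handing `n` to the caller
    end
    f.resume_index = -1                # `return` terminates the coroutine
    return nothing
end
```

Because this frame is mutated in place, it is naturally single-shot; a multi-shot continuation would need to copy the frame before each resume.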
## Potential users
I have the following potential use cases in mind immediately, although the
mechanism is of course quite general.
In Base:
1. `Task`
2. The futures mechanism in `Compiler`
In downstream packages:
1. The carried residual in reverse-mode AD packages like Diffractor or Enzyme
(I have no direct insight into Enzyme, but since the plan is to expose this
down to the LLVM level, I imagine it could use it).
2. Carried state between torn partitions in DAECompiler.
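For the `Task`/generator-style users, one way to picture the pattern is to thread a continuation through Julia's iteration protocol: each step returns a value together with a closure that resumes the producer. This is a hypothetical model with hand-split code (`ContGenerator`, `squares_from`, and `squares` are illustrative names); with `await`, `squares_from` would instead be an ordinary loop containing a suspension point.

```julia
# Generator modeled with continuations: each step returns `nothing` when
# done, or `(value, k)` where calling `k()` resumes the producer.
struct ContGenerator
    start::Function     # () -> nothing | (value, continuation)
end

Base.iterate(g::ContGenerator) = g.start()
Base.iterate(::ContGenerator, k) = k()
Base.IteratorSize(::Type{ContGenerator}) = Base.SizeUnknown()

# Hand-split version of "for i in 1:n; yield i^2; end".
function squares_from(i::Int, n::Int)
    i > n && return nothing
    return (i^2, () -> squares_from(i + 1, n))   # continuation captures i and n
end

squares(n) = ContGenerator(() -> squares_from(1, n))
```

`collect(squares(4))` produces `[1, 4, 9, 16]`. Note that this model allocates a fresh closure for every element.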
## Conversation
Summarizing some design questions from this morning:

1. Isn't this too many allocations?

Q: In the generator use case, the PR as is would allocate one OC per iteration. Isn't this too many?

A: Yes, it's too many. My proposal is to have a … The optimizer propagates the necessary information backwards to allocate the state, and the old state is dead on entry (after the oneshot check) and can be reused. This reduces the total number of allocations to 1. To get to zero, we'd need a way to query LLVM for the size in the callee so that we can allocate an appropriate stack size. We do not currently have such a facility, but there would be several clients for it, so we should discuss it separately.

2. Which scope does …
I'm inclined to say it's disallowed inside try/catch(/finally) for the time being. I think it would be surprising if: … Didn't catch exceptions from print twice? Probably not. I think it would be reasonable semantics for the try/catch to last until an exit from the try/catch region on the …
Per discussion this morning, there are snakes down the path of compile-time queries of optimizer properties (which I'll do a separate writeup on). The proposed fix is the following: … It's a little bit more complicated because of the magic …
FWIW, I'm very interested in this and hope it could still be made to happen. Just about everything that (mis-)uses …