WIP/RFC: Add `await` mechanism #58532

Keno · 2025-05-27T04:47:32Z

Introduction

This PR adds a new control flow mechanism called `await`. In this PR, it is only exposed by the macro `@Base.Experimental.oc_await`, which has the following docstring:

```
    Base.Experimental.@oc_await [argt] [C->retblock]

Capture the current function's execution context for later resumption.
By default, immediately returns to the caller, returning an `OpaqueClosure` that
may be invoked to continue execution. If the optional `C->retblock` argument is
provided, `retblock` is executed in the context of the current function, with the
continuation bound to `C`. If `argt` is provided, the continuation will further
expect arguments `argt` to be provided when invoked.
```

Adding a feature like this was part of the original design of opaque closures, but was never fully implemented for lack of immediate need. There are several ways to think of this feature:

1. As an alternative representation for `:new_opaque_closure` that is more friendly to other optimization passes.
2. As an implementation of a particular kind of delimited continuation.
3. As an implementation of C++20-style coroutines.

The key feature of this design is that the decision of which values go in the capture list/residual/etc. is deferred all the way to the last possible moment in LLVM. As a result, all the ordinary optimizations (DCE, SROA, various AD transforms, etc.) can be applied as usual across the suspension boundary.

Implementation status

As of the writing of this commit message, the implementation is minimal. I've added lowering support and the new IR node type, as well as support in the interpreter to play with the semantics. However, there is no compiler support or optimization support yet (so you need to run `julia --compile=min` to play with it).

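Semantic TODO:

- [ ] How does this mix with try/catch
- [ ] Does `await` capture other task-bound state:
  - [ ] `scope` (yes?)
  - [ ] locks? (no?)
  - [ ] timing? (no?)
  - [ ] rng? (no?)

Representational TODO:

- [ ] How does `argt` get represented in the continuation

Inference TODO:

- [ ] Implement `AwaitNode` inference support

Codegen TODO:

- [ ] Define and implement `julia.coro` intrinsics to lower this to
- [ ] Implement the appropriate lowering

Runtime TODO:

- [ ] Allow the OC captures to be allocated inline with a GC descriptor for pointers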

Detailed semantic discussion

General semantic details

There are some semantic similarities with try/catch (in that they're both kinds of continuations). However, the semantics are quite different:

1. Try/catch always jumps up the stack; `await` makes no assumptions (but copies the state of the topmost stack frame, so there are two independent copies of it).
2. `await` is always delimited by `return` (which terminates the continuation).
3. `await` is multi-shot. However, I think single-shot is useful, so there is a currently unused `flags` argument that might be used to ask for a single-shot continuation.
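The multi-shot versus single-shot distinction can be illustrated with ordinary closures. This is only an emulation under stated assumptions: `await` itself is not used, `frame_snapshot_demo` and `one_shot` are hypothetical names that do not appear in the PR, and the real mechanism would capture the frame in the compiler rather than by hand.

```julia
# Emulation only: an ordinary closure stands in for the continuation.
function frame_snapshot_demo()
    i = 10
    # `let i = i` gives the closure its own binding: the "forked" frame state.
    return let i = i
        function ()
            j = i        # every invocation restarts from the captured value
            j += 1
            return j     # `return` delimits (ends) this run of the continuation
        end
    end
end

# Hypothetical wrapper (not part of the PR): permit a single invocation only.
function one_shot(cont)
    used = Ref(false)
    return function (args...)
        used[] && error("one-shot continuation invoked twice")
        used[] = true
        return cont(args...)
    end
end

k = frame_snapshot_demo()
println((k(), k()))      # multi-shot: prints (11, 11)
```

Wrapping `k` with `one_shot` makes the second call throw, which is roughly the behavior a single-shot flag might request.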

Syntax level

This adds a new syntax form `(symbolicawait continue_at argt flags)`. `continue_at` is a label name created with `symboliclabel`. The semantics are that the execution of `symbolicawait` captures all local slots and SSA values and returns an opaque closure that, when called, restores all local slots and SSA values and resumes execution at the label `continue_at`. Regular execution continues as usual at the next statement after `symbolicawait`. Modifications to slots (or SSA values) after `symbolicawait` do not affect the value of said slots/SSA values in the continuation.

IR level

This adds a new `AwaitNode`. It is in some ways structurally similar to `EnterNode` in that it has a non-local successor that may later be jumped to. The non-local successor in both `AwaitNode` (i.e. the continuation) and `EnterNode` (i.e. the catch block) is a statement/bb index integer inside the struct. However, there are also some differences:

1. `AwaitNode` is always delimited by `ReturnNode`; there are no equivalent `:leave` or `:pop_exception` statements.
2. `AwaitNode` returns a regular value (an opaque closure), not a token. `AwaitNode` may be DCE'd if there are no uses.

LLVM level [unimplemented]

The rough plan is to implement something similar to `llvm.coro`, although we cannot use it directly, since we need special handling for our GC-tracked pointers. However, we may be able to borrow some code.

Potential users

I have the following potential use cases in mind immediately, although the mechanism is of course quite general.

In Base:

1. `Task`
2. The futures mechanism in `Compiler`

In downstream packages:

1. The carried residual in reverse-mode AD packages like Diffractor or Enzyme (I have no direct insight into Enzyme, but since the plan is to expose this down to the LLVM level, I imagine it could use it).
2. Carried state between torn partitions in DAECompiler.
3. A faster, more reliable implementation of ResumableFunctions.jl


Keno · 2025-05-27T22:12:44Z

Summarizing some design questions from this morning:

1. Isn't this too many allocations?

Q: In the generator use case

```julia
function foo()
    i = 0
    while true
        @yield i
        i += 1
    end
end
```

the PR as is would allocate one OC per iteration. Isn't this too many?

A: Yes, it's too many. My proposal is to have a `prealloc_await_state()` intrinsic that's used (in combination with one-shot `await`) like:

```julia
function foo()
    i = 0
    state = prealloc_await_state()
    while true
        oc = await(state, AWAIT_ONE_SHOT) continue at #cont
        return (oc, i)
        @label cont
        i += 1
    end
end
```

The optimizer propagates the necessary information backwards to allocate the state, and the old state is dead on entry (after the one-shot check) and can be re-used. This reduces the total number of allocations to 1. To get to zero, we'd need a way to query LLVM for the size in the callee so that we can allocate an appropriate stack size. We do not currently have such a facility, but there would be several clients for this, so we should discuss it separately.
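The "allocate once, reuse on every resume" idea can be sketched in plain Julia with a mutable state object. This is an emulation, not the proposed intrinsic: `GenState`, `start_foo`, and `resume_foo!` are hypothetical names, and the state layout is chosen by hand here, whereas the proposal leaves it to the optimizer.

```julia
# Hypothetical stand-in for the buffer that `prealloc_await_state()` would return.
mutable struct GenState
    i::Int
end

# First call: the only allocation. Corresponds to entering `foo`.
start_foo() = GenState(0)

# Resume: yield the current `i`, then advance to the next suspension point,
# overwriting the old state in place instead of allocating a new closure.
function resume_foo!(state::GenState)
    value = state.i
    state.i += 1
    return value
end

state = start_foo()
println([resume_foo!(state) for _ in 1:3])   # prints [0, 1, 2]
```

Because each resume invalidates the previous state, reuse like this is only safe for a one-shot continuation; a multi-shot one would need its state preserved.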

2. Which scope does `return` return from?

In

```
#1 %1 = await Tuple{Int64} resuming #2
   %2 = (%1)(1)
   %3 = add_int(10, %2)
        return %3

#2      return 100
```

Does this return 100 or 110?

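A: `return` returns from the most recent invocation, so this returns 110. The `return 100` in block #2 only ends the invocation of the continuation, making `%2` equal to 100; the original invocation then computes `add_int(10, 100)`.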
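The same control flow can be mimicked with an ordinary closure in place of the `await` node. This is a sketch for intuition only; `return_scope_demo` is a hypothetical name and the closure merely stands in for the opaque closure that `await` would produce.

```julia
# Emulation only: `cont` plays the role of %1, the continuation resuming at #2.
function return_scope_demo()
    cont = (x::Int64) -> 100   # block #2: `return 100` ends only this invocation
    r = cont(1)                # %2 = (%1)(1)
    return 10 + r              # %3 = add_int(10, %2); return %3
end

println(return_scope_demo())   # prints 110
```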

3. What are the slot capture semantics?

In

```julia
x = 1
f = $(Expr(:await, :cont))
f()
return x
@label cont
x = 2
return
```

Does the original invocation return 1 or 2?

A: Slot state is forked at the await point. The original invocation returns 1.
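For comparison, ordinary Julia closures share a captured variable with the enclosing function unless a fresh binding is introduced. The sketch below contrasts the two behaviors; both function names are hypothetical, and `let x = x` is only a manual stand-in for the fork that `await` would perform automatically.

```julia
# Forked state, as described for `await`: the closure gets its own `x`.
function slot_fork_demo()
    x = 1
    f = let x = x              # fresh binding, initialized from the outer `x`
        function ()
            x = 2              # mutates only the closure's copy
            return nothing
        end
    end
    f()
    return x                   # the original invocation still sees 1
end

# Shared state, which is what a plain closure does (and what `await` does not do).
function slot_shared_demo()
    x = 1
    f = function ()
        x = 2                  # assigns to the enclosing function's `x`
        return nothing
    end
    f()
    return x
end

println((slot_fork_demo(), slot_shared_demo()))   # prints (1, 2)
```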

4. Does AwaitNode need the rest of the new_opaque_closure arguments

I think probably yes on `isva` and `nargs` (although `isva` could be folded into `flags`). For the rt lb/ub, I was thinking of splitting them into a new intrinsic:

constrain_opaque_closure(oc, lb, ub)

and moving the arguments out of new_opaque_closure also.

Keno · 2025-05-27T22:17:28Z

> How does this mix with try/catch

I'm inclined to say it's disallowed inside try/catch(/finally) for the time being. I think it would be surprising if:

```julia
try
    a()
    @await
    b()
catch
end
```

didn't catch exceptions from `b`. Also, should:

```julia
try
    a()
    @await
    b()
finally
    println("Finally")
end
```

print twice? Probably not.

I think a reasonable semantics would be for the try/catch to last until an exit from the try/catch region on the continue side of the `await`, but that can be arbitrarily later of course (so the `finally` would need to be registered as a finalizer on the OpaqueClosure?). I think this is too complicated for unclear benefit at this time, so my inclination is to disallow this.
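The surprising case can be reproduced today with a plain closure that escapes a `try` block: once the region has been exited, exceptions raised by the closure are no longer caught by it. This is an emulation only; `try_region_demo` is a hypothetical name and the closure stands in for the code after `@await`.

```julia
# Emulation only: the closure stands in for the continuation created inside `try`.
function try_region_demo()
    caught = Ref(false)
    cont = try
        # a() would run here, inside the try region
        () -> error("raised by b")
    catch
        caught[] = true
        nothing
    end
    return cont, caught
end

cont, caught = try_region_demo()
escaped = try
    cont()           # runs after the try region has already been exited
    false
catch
    true
end
println((caught[], escaped))   # prints (false, true)
```

The original handler never fires, so the exception from the `b` stand-in propagates to whoever invokes the continuation.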

Keno · 2025-05-28T17:19:28Z

> To get to zero, we'd need a way to query LLVM for the size in the callee so that we can allocate an appropriate stack size.

Per discussion this morning, there are snakes down the path of compile-time queries of optimizer properties (which I'll do a separate writeup on). The proposed fix is the following:

```julia
struct AwaitBuffer
    inuse::Bool #= probably sunk into one of the following integers =#
    npointers::UInt
    nbytes::UInt
    #= [ npointers * Any ] =#
    #= [ nbytes * UInt8 ] =#
end

@noinline function foo()
    size = await_size()
    oc = await(nothing, Tuple{AwaitBuffer}, AWAIT_NO_STATE)
    return (size, oc)
    state = $(Expr(:await_acquire, Argument(1), size))
    i = 0
    while true
        oc = await(state, Tuple{}, AWAIT_ONE_SHOT) continue at #cont
        return (i, oc)
        @label cont
        i += 1
    end
end

@inline function iterate_resumable(f)
    (size, oc) = f()
    oc(AwaitBuffer(size))
end
iterate_resumable(f, oc) = oc()
```

It's a little bit more complicated because of the magic await_acquire, but I think it does what's needed.

MasonProtter · 2025-11-30T12:46:14Z

FWIW, I'm very interested in this and hope it could still be made to happen. Just about everything that (mis-)uses try/catch as a non-local control-flow construct should really be using this sort of mechanism, and potentially even a lot of 'legitimate' uses of try/catch could be replaced with it.

Keno force-pushed the kf/awaitnode branch from 76ecb3e to e72eb6d Compare May 27, 2025 05:03

Keno mentioned this pull request May 29, 2025

Decide on usability of non-IPO information of directly invoked CodeInstances #58556
