Merged
220 changes: 220 additions & 0 deletions SetjmpLongjmp.md
@@ -0,0 +1,220 @@
# C setjmp/longjmp in WebAssembly

## Overview

This document describes a convention to implement C setjmp/longjmp via
Member

My understanding is that this convention is now the default for LLVM, right? I think this document should say that.

Contributor Author

it's documented in the "Implementations" section.

[WebAssembly exception-handling proposal].

This document also briefly mentions another convention based on JavaScript
exceptions.

[WebAssembly exception-handling proposal]: https://github.com/WebAssembly/exception-handling

## Runtime ABI

### Linear memory structures

This convention uses a few structures on the WebAssembly linear memory.

#### Reserved area in jmp_buf

The first 6 words of C jmp_buf are reserved for use by the runtime.
Member

Could you add a note here making it explicit that the contents of these 6 words are not public?

Contributor Author

ok.

It should also be aligned sufficiently to store C pointers.
Member

You use the word "word" in the previous sentence and "C pointer" in this sentence, but are we not talking about the same thing? IIUC normally when folks say "word" they mean "pointer sized thing", so should we just stick to using one or other in this document?

Member

(or am I misunderstanding, and are you using pointer and word to mean different-sized things?)

Contributor Author

> You use the word "word" in the previous sentence and "C pointer" in this sentence, but are we not talking about the same thing? IIUC normally when folks say "word" they mean "pointer sized thing", so should we just stick to using one or other in this document?

i added explanation.

The contents of this area are private to the runtime implementation.

##### Notes about the size of reserved area in jmp_buf

Emscripten has been using 6 words. (`unsigned long [6]`)

GCC and Clang use `intptr_t [5]` for their [setjmp/longjmp builtins].
It isn't relevant right now though, because LLVM's WebAssembly target
doesn't provide these builtins.

[setjmp/longjmp builtins]: https://gcc.gnu.org/onlinedocs/gcc/Nonlocal-Gotos.html

#### __WasmLongjmpArgs

An equivalent of the following structure is used to associate the necessary
data with the WebAssembly exception.

```c
struct __WasmLongjmpArgs {
    void *env; // a pointer to jmp_buf
    int val;
};
```

The lifetime of this structure is rather short. It lives only during a
single longjmp execution.
A runtime can use a part of `jmp_buf` for this structure. It's also ok to use
a separate thread-local storage to place this structure. A runtime without
multi-threading support can simply place this structure in a global variable.
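
The thread-local option could look roughly like the sketch below. It assumes a C11 toolchain with `_Thread_local`; `struct sketch_longjmp_args`, `sketch_args` and `sketch_fill_args` are hypothetical names used only for illustration and are not part of the ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in with the same members as __WasmLongjmpArgs. */
struct sketch_longjmp_args {
    void *env; /* a pointer to jmp_buf */
    int val;
};

/* One instance per thread. It only has to stay valid while a single
 * longjmp is in flight. A runtime without threads could drop
 * _Thread_local and use a plain global instead. */
static _Thread_local struct sketch_longjmp_args sketch_args;

/* Fill the per-thread structure and return its address, which is the
 * value the exception would carry as its parameter. */
static struct sketch_longjmp_args *sketch_fill_args(void *env, int val)
{
    assert(env != NULL);
    sketch_args.env = env;
    sketch_args.val = val;
    return &sketch_args;
}
```

Because the structure is reused for every longjmp on the same thread, the catching code has to read `env` and `val` before anything else on that thread can start another longjmp.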
Member

Is this __WasmLongjmpArgs struct exposed as part of the ABI between libc and the compiler? If it isn't, would it make sense to remove this section, or at least add a caveat that this section is non-normative?

Contributor Author

it's a part of the ABI. the compiler-generated code needs to know how to read members of this structure.

Member

Ah, I misunderstood that part then.

In that case, I wonder if it would make sense to further simplify the ABI, from this:

      pop __WasmLongjmpArgs pointer from the operand stack
      $env = __WasmLongjmpArgs.env
      $val = __WasmLongjmpArgs.val
      $label = $__wasm_setjmp_test($env, $func_invocation_id)
      if ($label == 0) {
         ;; not for us. rethrow.
         call $__wasm_longjmp($env, $val)
      }

to this:

      $args = pop __WasmLongJmpArgs pointer from the operand stack
      $label = $__wasm_setjmp_test($args, $func_invocation_id)

doing the loading of $val and $env, as well as the if ($label == 0) and the __wasm_longjmp call while we're at it, all within the __wasm_setjmp_test call.

That way, we'd have less code inline. Would that make sense?

Contributor Author

well, at least $val needs to be visible to the catching logic as it's the return value of setjmp()
while we can make __wasm_setjmp_test somehow return $val as well, it seems like the opposite (at least incompatible) direction from multivalue TODO, where we can make $env and $val direct parameters of the exception.
personally i'm ok with either ways.
@aheejin do you have any preference? (asking because i guess the multivalue TODO comment is yours.)

Member

As @yamt said, val needs to be accessible from the catching code, so I'm not sure what's the point of passing arg to __wasm_setjmp_test and making it return val. Also

if ($label == 0) {
  ;; not for us. rethrow.
  call $__wasm_longjmp($env, $val)
}

this part cannot go into __wasm_setjmp_test, no? It doesn't look like we can save a ton here, regardless of whether we do multivalue or not.

Contributor Author

> this part cannot go into __wasm_setjmp_test, no?

i guess it can. why not?
__wasm_setjmp_test can just call __wasm_longjmp internally. or throw the exception directly.

Member

Hmm, yeah, you're right. Would you like to submit a PR to the LLVM repo doing this?

Contributor Author

> Hmm, yeah, you're right. Would you like to submit a PR to the LLVM repo doing this?

do you only mean "make __wasm_setjmp_test rethrow"?
or what sunfishcode suggested?
or both?

i added them to the "Future directions" section for now.


### Exception

This convention uses a WebAssembly exception to perform a non-local jump
for C `longjmp`.

The name of the exception tag is `__c_longjmp`.
Member

Where does this name come from and what meaning does this name have?

We have this enum
https://github.com/llvm/llvm-project/blob/299b636a8f1c9cb2382f9dce4cdf6ec6330a79c6/llvm/include/llvm/CodeGen/WasmEHFuncInfo.h#L27
but this is just the name of an enum and doesn't have any meaning outside of LLVM. (Also the enum name is C_LONGJMP)

Should we instead say we assume the tag index 1 to be the longjmp index in C-based toolchain like LLVM?

Contributor Author

@yamt, Apr 10, 2024

> Where does this name come from and what meaning does this name have?

see https://github.com/llvm/llvm-project/blob/b7a93bc1f230fe01f38f3648437cee74f339c5ac/llvm/lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp#L110

> We have this enum
> https://github.com/llvm/llvm-project/blob/299b636a8f1c9cb2382f9dce4cdf6ec6330a79c6/llvm/include/llvm/CodeGen/WasmEHFuncInfo.h#L27
> but this is just the name of an enum and doesn't have any meaning outside of LLVM. (Also the enum name is C_LONGJMP)
>
> Should we instead say we assume the tag index 1 to be the longjmp index in C-based toolchain like LLVM?

the name is exposed to the outside for dynamic-linking modules.

Member

The C symbol name used for the exception tag is ..

At the very least mention that it is a tag that we are talking about?

Contributor Author

i improved exception vs tag distinction a bit.

The type of the exception is `(param i32)` (or `(param i64)` for [memory64]).
The parameter of the exception is the address of the `__WasmLongjmpArgs`
structure in linear memory.

[memory64]: https://github.com/WebAssembly/memory64

### Functions

```c
void __wasm_setjmp(jmp_buf env, uint32_t label, void *func_invocation_id);
uint32_t __wasm_setjmp_test(jmp_buf env, void *func_invocation_id);
void __wasm_longjmp(jmp_buf env, int val);
```

`__wasm_setjmp` records the necessary data in the `env` so that it can be
used by `__wasm_longjmp` later.
`label` is a non-zero identifier to distinguish setjmp call-sites within
the function. Note that a C function can contain multiple setjmp() calls.
`func_invocation_id` is the identifier to distinguish invocations of this
C function. Note that, when a C function which calls setjmp() is invoked
recursively, setjmp/longjmp needs to distinguish them.

`__wasm_setjmp_test` tests if the longjmp target belongs to the current
function invocation. If it does, this function returns the `label` value
saved by `__wasm_setjmp`. Otherwise, it returns 0.
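
The bookkeeping behind these two functions can be sketched in portable C as follows. This is only one possible layout, not the one any runtime is required to use: `struct sketch_jmp_buf`, `sketch_setjmp` and `sketch_setjmp_test` are hypothetical names, and the two fields are assumed to fit within the reserved area of `jmp_buf`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for the data kept in jmp_buf's reserved area. */
struct sketch_jmp_buf {
    void *func_invocation_id; /* which invocation called setjmp */
    uint32_t label;           /* which setjmp call site; never 0 */
};

/* Models __wasm_setjmp: remember the call site and the invocation. */
static void sketch_setjmp(struct sketch_jmp_buf *env, uint32_t label,
                          void *func_invocation_id)
{
    assert(label != 0);
    assert(func_invocation_id != NULL);
    env->func_invocation_id = func_invocation_id;
    env->label = label;
}

/* Models __wasm_setjmp_test: return the saved label if env was set up
 * by the given invocation, otherwise 0. */
static uint32_t sketch_setjmp_test(const struct sketch_jmp_buf *env,
                                   void *func_invocation_id)
{
    if (env->func_invocation_id == func_invocation_id)
        return env->label;
    return 0;
}
```

Since 0 is reserved to mean "not for this invocation", the sketch rejects a zero label when it is recorded rather than when it is tested.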

`__wasm_longjmp` is similar to C `longjmp`.
If `val` is 0, it's `__wasm_longjmp`'s responsibility to convert it to 1.
It performs a long jump by filling a `__WasmLongjmpArgs` structure and
throwing a `__c_longjmp` exception with its address.
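
As a small illustration of the value handling described above, the following portable sketch fills the argument structure the way `__wasm_longjmp` is described to, including the conversion of 0 to 1. The throw itself is target-specific and is deliberately left out; `struct sketch_longjmp_args` and `sketch_prepare_longjmp` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in with the same members as __WasmLongjmpArgs. */
struct sketch_longjmp_args {
    void *env; /* a pointer to jmp_buf */
    int val;
};

/* Prepare the data for a long jump. A real runtime would throw the
 * exception with the address of *args right after this step. */
static void sketch_prepare_longjmp(struct sketch_longjmp_args *args,
                                   void *env, int val)
{
    assert(args != NULL);
    assert(env != NULL);
    if (val == 0)
        val = 1; /* setjmp must not appear to return 0 after a longjmp */
    args->env = env;
    args->val = val;
}
```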
Member

How about mentioning that we throw an exception with a tag C_LONGJMP? It can be hard to associate what this function does with the 'Exception' section above.

Also, given that most of this doc is not exclusive to Wasm EH, we can mention we have emscripten_longjmp JS-based EH in which we throw a JS exception.

Contributor Author

i added some explanation about JS exception based convention.
i'm not sure how far it should be explained in this doc though.


## Code conversion

The C compiler detects `setjmp` and `longjmp` calls in a program and
converts them into the corresponding WebAssembly exception-handling
instructions and calls to the above-mentioned runtime ABI.

### Functions calling setjmp()

On function entry, the compiler generates the logic to create
the identifier of this function invocation, typically by performing an
equivalent of `alloca(1)`. Note that the alloca size is not important
because the pointer is merely used as an identifier and is never dereferenced.
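
To see why a stack address works as an invocation identifier, consider the portable sketch below. It uses the address of a local `char` in place of `alloca(1)` (an assumption made for portability), and `sketch_all_distinct` is a hypothetical helper. Every invocation that is still live has its own local object, so the addresses recorded by nested invocations are all different.

```c
#include <assert.h>
#include <stddef.h>

/* Each invocation records its identifier in ids[depth] and recurses.
 * The deepest invocation checks that all recorded identifiers differ.
 * Returns 1 if they are pairwise distinct, 0 otherwise. */
static int sketch_all_distinct(void **ids, int depth, int max_depth)
{
    char marker; /* stands in for alloca(1); never read or written */
    int i, j;

    assert(depth >= 0 && depth < max_depth);
    ids[depth] = &marker; /* the func_invocation_id of this invocation */

    if (depth + 1 < max_depth)
        return sketch_all_distinct(ids, depth + 1, max_depth);

    /* Every invocation that recorded an identifier is still live here. */
    for (i = 0; i < max_depth; i++)
        for (j = i + 1; j < max_depth; j++)
            if (ids[i] == ids[j])
                return 0;
    return 1;
}
```

An identifier is only meaningful while its invocation is live. After the function returns, a later invocation may reuse the same address.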

Also, the compiler converts C `setjmp` calls to `__wasm_setjmp` calls.

For each setjmp call site, the compiler allocates a non-zero identifier called
a "label". The label value passed to `__wasm_setjmp` is recorded by the
runtime and returned later by `__wasm_setjmp_test` when processing a longjmp
to the corresponding jmp_buf.

Also, for code blocks which possibly call `longjmp` directly or indirectly,
the compiler generates instructions to catch and process the
`__c_longjmp` exception accordingly.

When catching the exception, the compiler-generated logic calls
`__wasm_setjmp_test` to see if the exception is for this invocation
of this function.
If it is, `__wasm_setjmp_test` returns the non-zero label value recorded by
the last `__wasm_setjmp` call for the jmp_buf. The compiler-generated logic
can use the label value to simulate a return from the corresponding setjmp.
Otherwise, `__wasm_setjmp_test` returns 0. In that case, the
compiler-generated logic should rethrow the exception by calling
Contributor Author

probably it's also ok to rethrow with delegate/throw_ref?

Member

Hmm, yes, I think we can rethrow (or throw_ref) it. Not sure whether we need delegate though. But given that it is discouraged to use the current rethrow instruction in LLVM because it is replaced by the new throw_ref, and the new throw_ref has not been implemented in LLVM yet, we can probably try this later.

Contributor Author

yea, i meant rethrow, not delegate.

`__wasm_longjmp` so that it can be eventually caught by the right function.

For example, a C function like this would be converted into something like
the following pseudo code.
```c
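#include <setjmp.h>

/* Defined elsewhere; may call longjmp(env, ...). */
void might_call_longjmp(jmp_buf env);

void f(void) {
    jmp_buf env;
    if (!setjmp(env)) {
        might_call_longjmp(env);
    }
}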
```

```wat
$func_invocation_id = alloca(1)

;; 100 is a label generated by the compiler
call $__wasm_setjmp($env, 100, $func_invocation_id)

block
  block (result i32)
    try_table (catch $__c_longjmp 0)
      call $might_call_longjmp
    end
    ;; might_call_longjmp didn't call longjmp
    br 1
  end
  ;; might_call_longjmp called longjmp
  pop __WasmLongjmpArgs pointer from the operand stack
  $env = __WasmLongjmpArgs.env
  $val = __WasmLongjmpArgs.val
  $label = $__wasm_setjmp_test($env, $func_invocation_id)
  if ($label == 0) {
    ;; not for us. rethrow.
    call $__wasm_longjmp($env, $val)
  }
  ;; ours.
  ;; somehow jump to the block corresponding to the $label
  ...
  ...
end
```
```
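
The control flow of the pseudo code can also be modelled in portable C. The sketch below is a simulation, not what a compiler emits: throwing the exception is modelled as returning a non-NULL pointer to the argument structure, and catching it as checking that return value. All `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_jmp_buf { void *func_invocation_id; uint32_t label; };
struct sketch_longjmp_args { struct sketch_jmp_buf *env; int val; };
enum sketch_mode { SKETCH_NO_JUMP, SKETCH_JUMP_TO_F, SKETCH_JUMP_ELSEWHERE };

static struct sketch_longjmp_args sketch_args;

static void sketch_setjmp(struct sketch_jmp_buf *env, uint32_t label, void *id)
{
    assert(label != 0 && id != NULL);
    env->func_invocation_id = id;
    env->label = label;
}

static uint32_t sketch_setjmp_test(const struct sketch_jmp_buf *env, void *id)
{
    return env->func_invocation_id == id ? env->label : 0;
}

/* Models __wasm_longjmp. The returned pointer plays the role of the
 * thrown exception; the caller "throws" by returning it. */
static struct sketch_longjmp_args *sketch_longjmp(struct sketch_jmp_buf *env,
                                                  int val)
{
    sketch_args.env = env;
    sketch_args.val = val ? val : 1;
    return &sketch_args;
}

static struct sketch_longjmp_args *
sketch_might_call_longjmp(struct sketch_jmp_buf *env,
                          struct sketch_jmp_buf *other, enum sketch_mode mode)
{
    switch (mode) {
    case SKETCH_JUMP_TO_F:      return sketch_longjmp(env, 0);
    case SKETCH_JUMP_ELSEWHERE: return sketch_longjmp(other, 5);
    default:                    return NULL; /* returned normally */
    }
}

/* Models f(). Returns NULL when f completes, or the in-flight
 * "exception" when the longjmp targets someone else's jmp_buf.
 * *setjmp_result is the value setjmp appeared to return last. */
static struct sketch_longjmp_args *
sketch_f(struct sketch_jmp_buf *other, enum sketch_mode mode,
         int *setjmp_result)
{
    char marker; /* stands in for alloca(1) */
    void *func_invocation_id = &marker;
    struct sketch_jmp_buf env;
    struct sketch_longjmp_args *thrown;
    uint32_t label;

    sketch_setjmp(&env, 100, func_invocation_id);
    *setjmp_result = 0; /* direct return from setjmp */

    thrown = sketch_might_call_longjmp(&env, other, mode);
    if (thrown == NULL)
        return NULL; /* might_call_longjmp didn't call longjmp */

    label = sketch_setjmp_test(thrown->env, func_invocation_id);
    if (label == 0) /* not for us. rethrow. */
        return sketch_longjmp(thrown->env, thrown->val);

    assert(label == 100); /* ours: resume at the setjmp with this label */
    *setjmp_result = thrown->val;
    return NULL;
}
```

With a single setjmp call site there is only one label to dispatch on. A function with several call sites would switch on the label to pick the continuation.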

### longjmp calls

The compiler converts C `longjmp` calls to `__wasm_longjmp` calls.
Member

Again, maybe we could mention emscripten_longjmp too


## Dynamic-linking consideration

In the case of [dynamic-linking], it's the dynamic linker's responsibility
to provide the exception tag for this convention with the name
"env.__c_longjmp". Modules should import the tag so that cross-module
longjmp works.

[dynamic-linking]: DynamicLinking.md

## Emscripten JavaScript-based exceptions

Emscripten has a mode to use JavaScript-based exceptions instead of
WebAssembly exceptions. In that mode, the `emscripten_longjmp` function,
which throws a JavaScript exception, is used instead of `__wasm_longjmp`.

```c
void emscripten_longjmp(uintptr_t env, int val);
```

The compiler translates C function calls that may end up
calling `longjmp` into indirect calls via a JavaScript wrapper which
catches the JavaScript exception.

## Implementations

* LLVM (19 and later) has a pass ([WebAssemblyLowerEmscriptenEHSjLj.cpp])
  to perform the conversion mentioned above. It can be enabled with the
  `-mllvm -wasm-enable-sjlj` option.

  Note: as of this writing, LLVM produces an older version of the
  exception-handling instructions (`try`, `delegate`, etc.).
  Binaryen can convert the old instructions to the latest
  instructions (`try_table` etc.).

* Emscripten (3.1.57 or later) has the runtime support ([emscripten_setjmp.c])
for the convention documented above.

* wasi-libc has the runtime support ([wasi-libc rt.c]) for the convention
documented above.

[WebAssemblyLowerEmscriptenEHSjLj.cpp]: https://github.com/llvm/llvm-project/blob/70deb7bfe90af91c68454b70683fbe98feaea87d/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp
Member

Suggested change
[WebAssemblyLowerEmscriptenEHSjLj.cpp]: https://github.com/llvm/llvm-project/blob/70deb7bfe90af91c68454b70683fbe98feaea87d/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp
[WebAssemblyLowerEmscriptenEHSjLj.cpp]: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp

Not sure if this needs to be from a specific commit version, given that we are not specifying specific lines

Contributor Author

i prefer to always use permalink as the files can be renamed/removed in future.

Member

I think in that case we should fix the link, because it means the doc is pointing to a wrong file.

Contributor Author

well, after all, what this doc refers to is today's version of the file.


[emscripten_setjmp.c]: https://github.com/emscripten-core/emscripten/blob/7d66497d96cdcffa394ad67d87f7118137edf9ab/system/lib/compiler-rt/emscripten_setjmp.c

[wasi-libc rt.c]: https://github.com/WebAssembly/wasi-libc/blob/d03829489904d38c624f6de9983190f1e5e7c9c5/libc-top-half/musl/src/setjmp/wasm32/rt.c

## Future directions

* `__WasmLongjmpArgs` can be replaced with WebAssembly multivalue.
Member

If I understand correctly, this is just an internal implementation detail, right? If so, would it make sense to omit it in this document?

Contributor Author

no. as mentioned above, it's a part of the current ABI. (thus making it use multivalue is unfortunately another ABI change.)


* If/when WebAssembly exceptions become more widespread, we might want to move
  the runtime to compiler-rt.