When WebAssembly was first introduced to the world (the "MVP" of WebAssembly) functions could have any number of parameters but could have at most one result. Recently, in the last year, WebAssembly has stabilized the multi-value proposal which means that functions are now capable of returning multiple return values. While purportedly supported by all modern engines adoption of multiple returns has not yet reached C/Rust.
This feature of returning multiple values is not the default everywhere and notably is not on by default in LLVM (but there is an LLVM +multivalue
feature for it). Additionally the original C ABI for WebAssembly was defined without multi-value support. There has been a small amount of dicussion about a new C ABI which leverages multiple return values, but I'm at least not personally aware of much progress on this front or what will be happening here in the future.
Another important piece of background information (which is orthogonal to the multi-value above) is that Rust's wasm32-unknown-unknown
target does not implement the same C ABI that clang implements. When the wasm32-unknown-unknown
target was first introduced I didn't really understand what I was doing and then accidentally started relying on what was implemented. The disagreement between the wasm32-unknown-unknown
ABI and the clang definition of an ABI largely boils down to how aggregates are passed as parameters. For example:
#[repr(C)]
pub struct A(i32, i32);
#[no_mangle]
pub extern fn foo(_: A) {}
will differ on wasm32-unknown-unknown
and other wasm targets:
$ rustc foo.rs --crate-type cdylib --target wasm32-unknown-unknown
$ wasm2wat foo.wasm | grep foo
(func $foo (type 0) (param i32 i32)
(export "foo" (func $foo))
$ rustc foo.rs --crate-type cdylib --target wasm32-wasi
$ wasm2wat foo.wasm | grep foo
(func $foo (type 0) (param i32)
(export "foo" (func $foo))
Notably here the wasm32-unknown-unknown
target passes A
as two i32
parameters, where the wasm32-wasi
target (and other wasm targets like wasm32-unknown-emscripten
) will pass A
by reference indirectly.
The consequence of this ABI mismatch is that it is more difficult to mix C and Rust code in the same wasm project. The same signature won't agree when languages call each other which can produce linking errors. The reason that this has not been fixed is that the wasm-bindgen
project relies on the behavior of wasm32-unknown-unknown
and is not easily fixed.
It's expected that at some point in the future the canonical definition of the C ABI will be updated to incorporate multiple return-values into it. Users of C/C++ will, presumably, have the desire to write functions that have multiple return values and this'll get implemented. Some uncertainty, though, about this still remains:
- It's not clear when this will happen. I don't really follow the C side of things too too closely and it's mostly Googlers who have been maintaining and evolving LLVM and the impression I get is that this isn't super high on their priority list.
- It's not clear what a new C ABI might look like. Historically decisions in clang (I believe) have been evaluated by benchmarking Emscripten output in v8. Given this it's not obvious (to me at least) that a new C ABI will suit the needs of all wasm users if it's driven by performance rather than expressibility.
Overall I am personally pretty uncertain about what will happen with the C ABI in the future. The resolution of a "wasm" ABI for Rust might be "Alex just go talk to people and figure out what C will do first".
If, for a moment, we assume that the C ABI will either move too slowly or will not suit Rust's wasm needs in the future, then this leaves an expressibility gap that needs to be filled. This was the original motivation for the wasm
ABI in Rust. Notably:
- The "wasm" ABI would be defined exclusively in terms of WebAssembly signatures. There would be a definition from types in Rust and how they would appear in the WebAssembly function signature when the function is compiled (both for multiple parameters and multiple results)
- The "wasm" ABI would enable switching the
wasm32-unknown-unknown
target to match C's default ABI without breaking projects likewasm-bindgen
. Thewasm-bindgen
project would migrate to use "wasm" for its purposes and then by switching the target's defaults it would hopefully fix more than it would break in the ecosystem.
As mentioned above in the "future of extern "C"
", however, this motivation is all moot if the C ABI magically changes tomorrow to incorporate multiple returns as well as changes to suit wasm-bindgen
.
The actual literal definition of extern "wasm"
is a bit up in the air. Right now it's a bit hand-wavy and simply says "what if all values were splatted into their components and then that was all passed by value". This means that Rust would still have to define things like what the ABI for u128
is, and the definition would be Rust-specific. It's expected that this could largely lift from the existing ABI for C in terms of definition but would likely need Rust-specific bits as well.
Another unclear aspect of a hypothetical "wasm" ABI would be whether any other language would ever want to use it. C users might be content in waiting for whatever the multi-result support looks like in C, and other languages may just match that rather than doing something specific like Rust.
-
What makes wasm-bindgen reliant on the existing C ABI?
- Wasm-bindgen is all about communication between JS
- JS and tool need to agree
- The way this works today:
- There is a macro that lowers Rust types down to simple types that JS understands
- Code assumes that we can use traits to map rust types to simpler Rust types that the ABI natively understands
- Changing the compiler will mean that the bindgen version and the compiler version must be tightly coordinated
- Why can't you use
#[cfg(version)]
?- Because the JS code is produced by the CLI
- The wasm-bindgen CLI would have to understand what version of rust made the binary
- Goal: to have a version of Rust that supports both old and new so that we can make the transition happen
-
Rough plan:
- We would make wasm-bindgen use extern "wasm" once it hits stable; this would only be compatible with newer compilers, but you'd get a nice clean error.
-
Would you change back to the C ABI later, once it is updated to match clang?
- No, clang's behavior is not desirable.
-
Regarding breaking the ecosystem:
- only Alex cares ;)
- changing the C ABI would be a bug fix
-
pnkfelix: To be concrete, is the proposal that the code example from the top would change to this:
#[repr(C)] pub struct A(i32, i32); #[no_mangle] pub extern "wasm" fn foo(_: A) {}
in order to get the same effect for the wasm32-unknown-unknown
target?
Will the compiler actually then have the same output on both wasm32-unknown-unknown
and wasm32-wasi
, both with extern "wasm"
and the way it was written up above? Or will wasm32-wasi
just not support extern "wasm"?
- Yes! That would be how the ABI works.
- Felix: Isn't that what extern C would do?
- Alex: clang passes all structs by reference
- Alex: but rustc presently does splat for wasm32-unknown-unknown
- every other wasm target matches clang (e.g., wasm32-emscripten and friends)
- Precise proposal:
- "C" would always match clang
- "wasm" would do the thing Alex wants (splat, multiple return values)
- pnkfelix: I’ll claim ignorance: Can you summarize different purposes of wasm32-unknown-unknown vs wasm32-wasi targets?
- Only difference is the stdlib
- unknown-unknown ignores prints, returns errors, etc
- wasm32-wasi is built against the WASI APIs for interacting with the "operating system"
- Josh: Is the alternative of a different target (that changes
extern "C"
) worth exploring? That would allow one Rust version to support both ABIs, as well. And that would have different stability guarantees, in that we can deprecate targets.- e.g.,
wasmwasm-unknown-unknown
could make "C" use the proposed "wasm" abi - Alex: Disadvantage would be that C and Rust would not agree
- So projects that want to intermix code generated by clang and Rust would be unable to do so
- Josh: So goal is not just to be able to have both ABIs be around for a transition
- It's that we want both ABIs for interop between C and wasm-bindgen-generated code
- e.g.,
- Josh: Do we expect that
extern "wasm"
can continue to evolve with WebAssembly as WebAssembly gains more features, or will it become static as people use it? For instance, if WebAssembly gains native support for passingu128
, can we evolveextern "wasm"
safely over time, without breaking existing users? How can we make sureextern "wasm"
doesn't become stale just asextern "C"
has?- extern wasm ABI could be limited to things that translate directly to WASM, along with aggregates
- things like u128 for example would not be supported
- this way, when extern wasm adds u128, we can add support for those
- if we were to stabilize extern wasm, this would be a conservative and suitable approach
- scottmcm: like the ctypes lint, but different rules, hard error?
- alex: yes.
- josh: so you can add new items when wasm/Rust/wasm-bindgen agree?
- Alex: new items would probably not be added unless there is no future path to support it
- e.g., we might permit
u8
because wasm will never add a type for that - but we would avoid
u128
since wasm might do it
- niko: and you would expect rust users to map e.g.
u128
tou64
- alex: yes. but aggregates would work.
- cramertj: you'd be checking the field types of aggregates too, then
- alex: yes. I imagine we'd require
repr(C)
, though we could addrepr(wasm)
- cramertj:
repr(wasm)
would be good because it's an add'l semver requirement that one cannot add non-wasm-enabled types to your type - alex: field order matters, so
repr(C)
makes sense, but I'm not sure ifrepr(wasm)
is needed; we would be checking generics post-monomorphization - cramertj: my point is that changing a "C" struct would cause wasm things to just plain stop compiling, since you're using a type that is not wasm-compatible
- alex: that's true. If you have e.g. a generic function that has a WASM ABI, then some instantiations will error out, but others won't. That's funky.
- alex: yes. I imagine we'd require
- Niko: is it an accurate summary that the motivation to add extern "wasm" is:
- extern "C" is unlikely to be updated on a reasonable timeframe
- we can migrate folks over gradually to "extern wasm" and then modify or deprecate the old behavior of the C ABI to match clang
- Alex: I don't really know how likely it is for "C" to change
- pnkfelix: Even if it did change, would it change how we think is best?
- Alex: unclear. But if C does have an ABI that supports multi-value, wasm-bindgen should probably match it eventually.
- Taylor: what types would we expect to see allowed in the signatures of
extern "wasm"
functions?#[repr(C)]
types?#[repr(wasm)
types? Is there potential for mismatch of aggregate representations for any#[repr(C)]
types?- already answered
- Taylor: what level of documentation and stability guarantees do we want to offer for our
extern "wasm"
ABI? Is it "something that magically works with bindgen", and bindgen is lifted to some kind of "priveleged" library? Or do we provide a full specification of the ABI that other folks could rely on?- Alex: The goal of extern wasm is that, given a Rust signature, you can precisely predict the wasm signature. Therefore I would expect us to document this.
- Alex: This would be Rust specific.
- cramertj: so e.g. someone else could write code against our definition of extern wasm and we would expect it to work going forward?
- Alex: Yes. This is why it's useful to wasm-bindgen.
- pnkfelix: to clarify, things like
u8
would be supported- but some things like
u128
that may be supported in the future would be rejected? - alex: maybe. maybe that doesn't work.
- alex: goal is still to have most Rust code generally work, hence the support for
u8
. Things likeu128
are much more unusual.
- but some things like
- Mark: Are we expecting "C" and "wasm" to eventually become synonymous? Will there be repr(wasm) of some kind on structs, alongside lints?
- Answered. no.
- Josh: Will
extern "wasm"
exist on other targets? Might it make sense to use it as a "portable safe ABI" in the future, to handle things like slices and strings safely?- No.
- Josh: I can imagine that people might want to have two different targets where the only difference is the ABI. That's kind of a pain.
- Alex: I've done this with macros, but that would be a downside.
- Josh: we should document the expected pattern.
- Scott: Is there a use for making this be
extern "splatted"
and available on things other than WASM?- Alex: I don't want to be the one to add a new ABI just because I want splatting... but if it really is useful for others, we could do that.
- Mark: the predominant use case is bridging javascript, right? how often would you want this for functions that might be C on other platforms?
- Alex: Well, if interface types makes progress, then you might define native plugins using it, and the same ABI you're using for wasm might be used for native modules too. So you might want the same function to be compiled both to wasm and to native code.
- Alex: so e.g. you might want to write a plugin that can be compiled to either wasm or to C ABI. But if we make it an error to use wasm on other targets to start
- Emmanuel: Will the C ABI and wasm ABI merge in the future (like when the C ABI gets support for multiple returns)?
- Discussed, but likely no.
- If they do merge we can get rid of wasm.
- Multiple returns: second motivation for the wasm ABI, unsupported by C
- Would be a way for us to roll out multiple return support in Rust w/o new targets, ABIs, etc
- We have arm targets for "softfloat" and "hardfloat" for example, because whole world has to agree
- But 'wasm' and 'wasm-multivalue' doesn't make sense as everyone will eventually use the latter
- It's hard to know how clang will approach this; they might add a codegen flag
- pnkfelix: isn't the analogous situation to soft/hard-float what version of the VM you are using?
- Alex: yes, it's just that we expect all VMs to support multivalue return
- e.g., if we bind WASI and WASI has functions that return multiple things
- we would use wasm to match that
- Scott: when rust does multiple return values, what happens?
- Alex: if you return a Rust tuple in a "C" ABI, you get a warning
- LLVM supports it by returning an aggregate by value, which LLVM translates to a multivalue return
- We could say that we define returning a tuple like
-> (T, U)
is how we translate multivalue returns
- Josh: Alternative solutions?
- Alex: the main one is to push the C ABI to match what we want and support multivalue
- Mark: doesn't fit the desire for a simple translation that hard errors on weird stuff
- Alex: true. It gives us less flexibility when wasm evolves in the future.
- answer from the C side is "it breaks"
- by separating wasm and C, we insulate users from some amount of breakage
- Scott: Would there be a use for
repr(wasm)
to go with this? Are there any cases whererepr(C)
would have a different definition of which things are aggregates than what wasm would want? Would it make sense to define that to allowenum
s to be able to be passed, like we definedrepr(C, u8)
onenum
s?- Alex: maybe. Discussed previously.
- Sounds like it's more
repr(linear)
than needing anything particularly wasm-specific, if it's up to the lint/monomorphization anyway to catch the problems? - pnkfelix: without repr(wasm) we could get earlier errors...
- niko: ...but then if you have a struct that wants to pass over an FFI boundary you will have to know which target you're using
- pnkfelix: but with repr attribute you have more flexibility, not as hard, you can use
cfg_attr
- cramertj: it's kind of an extra set of requirements, a subset of C
- cramertj: I do like that
repr(wasm)
means libraries exposing types with private fields are promising to support wasm compilation- alex: it's already an ABI breaking change
- cramertj: right, but now it's a compilation error as well, so recompiling the world will cause problems