-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improvements to the Emscripten / WebAssembly API #7649
Comments
Thanks for writing this up, @rianhunter! Also, your project is very exciting - incredible you can run nginx that way :)
Minor terminology note, those are wasm imports. I guess we can say the host exports them to the wasm, but it may confuse people.
Side note, that should not be needed any more (it helps emulate i64s, but wasm has them natively).
Agreed we should make this easy. I'm fine with a new metadata segment for this. Another option is that the dynamic linking section already has something similar, and we support that already when you build with In other words, if you use a dynamic library this should already be emitted. However, dynamic libraries have some downsides, like they are relocatable - they don't know where their static memory will be (which adds overhead; a dynamic library without relocation may be possible, though). Another issue is dynamic libraries do not link in system libraries (but we can add an option for that too in principle). But on the other hand dynamic libraries solve several of the other issues you raise, see comments below, so I do wonder if maybe that isn't the right direction (to improve dynamic linking towards what you want).
Interesting. I think the deeper issue here is that emscripten minifying the import names is a valid optimization only because it knows the JS + wasm combination it emits will be used together. So perhaps we should add an option that says the wasm will be loaded without the JS. If that flag is on, we'd disable optimizations like this one (and metadce). But that is already true for dynamic libraries, which we only emit wasm for.
Also something already handled by dynamic libraries, as just mentioned - we emit only wasm for them.
Also present in dynamic libraries (see link above), the table size is in the dylink section.
Agreed, we should fix this. We used 32 because in the early JS days of emscripten 64-bit ints were very slow, but today there is no reason not to.
Sounds fine to me.
This is a complex question. We picked musl as our libc for convenience, and it uses linux syscalls since that's all it supports, but it definitely is not the optimal API for what you're doing here, where you do want portability to other OSes. I think what you want is maybe to not link in musl at all - then you'll get libc and POSIX calls as the interface you need to implement, instead of syscalls. (Note that the headers may cause portability issues - if you use musl's, stuff may not match the local system, etc.) Dynamic libraries actually do that for you right now, as they don't link in system libraries at all.
Seems ok to me, but we'd need to make sure a new ABI is efficient. We did optimize musl's a bit, mainly by making syscalls static, but sounds like what you propose would not interfere with that, so this might be fine.
Do we not follow the POSIX or other relevant APIs when doing so? Maybe we have some bugs there. |
Emscripten is a huge part of that story. Thanks for WebAssembly and Emscripten. Just doing my part :)
Yes yes very true. I was being loose with my language. The "env" host module exports those objects and the generated "asm" module names those imports.
Cool, do you have a preference of serialization format for something like that? I would default to JSON but maybe it should be a in binary encoding similar to how WASM is encoded itself.
I see. So dynamic modules are intended to be compiled as standalone WASM files (with embedded metadata) that Emscripten knows how to load? Do you think extending the existing dynamic library generation code to support standalone WASM makes the most sense?
Yeah maybe the right thing to do here is not minify the import names when the output file ends in
Okay, when I get around to implementing this, I'll do it on a struct by struct basis and whichever ones are appropriate can be determined at code review time.
My guess is that some musl code will still be necessary, even if most of the POSIX functions are delegated to the host implementation. How comfortable are you with diverging from upstream musl? Do you often pull in code from upstream musl? Or do you consider it mostly done?
I don't think it's a bug. For certain syscalls like Seems like you're mostly okay with these changes. How should I go about starting the process? Should I just go ahead and submit PRs for these things (along with tests). What about documenting the API, should I just host that anywhere for now until the implementation is more firmed up? I can also submit doc changes to Emscripten. |
I think that might be best, yes. Then we are using an existing standard, and don't need to define anything new. It's a good starting point since it should work out of the box right now, actually, if you build with
Note how it emits just a
We don't pull often, but would like to keep the possibility of doing so. However, if it's a matter of using some musl functions and not others, we already do that quite a bit. The thing we should avoid is modifying musl code, but I hope that's not necessary here? The
Sure!
Yeah, would be good to document this. In general we want to document the wasm/JS "ABI" that we currently have, so those two might be the same document or related documents. |
After looking into this a bit more, I see very clearly what you're saying. What I'm looking for does share a lot with the dynamic loading use-case. Ultimately my WebAssembly runtime will have to support dynamic linking anyway as well. At the same time, a minimal amount of code should be implemented by the runtime, e.g. I don't think I think the right thing to do is add the dylink section to the main module, and potentially extend it to indicate that the module isn't relocatable. I would expect that using
Nope, no musl edits should be necessary. It seems to mostly be a matter of not linking certain function implementations (e.g. |
I opened #7733 which adds an option to emit a wasm by itself (no JS) that isn't a side module. It will prevent minification of the imports and exports in that case, as discussed earlier.
Interesting idea. Might be worth opening an issue here to discuss that: https://github.com/WebAssembly/tool-conventions I'm curious what other people think - maybe it should be called something else when not in a dynamic library, in particular. But it does seem to contain what we want for non-dynamic libraries too, yeah. |
I thought about it a bit more, and I think dylink section alone is not sufficient. I think there should also be version metadata so the runtime knows which ABI to use (e.g. in cases like 32bit vs 64bit off_t). |
Yeah, some ABI metadata may make sense. Perhaps we should define an emscripten metadata section that includes whatever we need, and have an option to emit it (it would increase code size by default). As an emscripten internal thing it wouldn't need to be discussed on tool-conventions, unless we can figure out a way to generalize it, but I'm not sure that's practical - thoughts? |
Since it will be versioning the OS/runtime interface expected by the binary, and I think we may want to generalize the POSIX ABI so other tools can conform, I think eventually some aspect of the Emscripten metadata section will need to be generalized. I think a good start for now would be to keep it specific to Emscripten while we work on the details.
|
When we get further along with other tools that want to be ABI-compatible, I recommend just making This is an area that I'm hoping the reference-sysroot can help bridge. It currently specifies a 64-bit off_t, for example. |
A lot of these changes will probably happen incrementally, so versioning would allow seamless compatibility along the way. I don't expect every ABI version to be considered stable, and I would only expect third party implementors to support stable ABI versions. The ABI versioning should probably conform to semantic versioning conventions, where major version 0 promises no compatibility with other versions one way or another. I agree that the first stable version ("1.0") should address everything critical here, like off_t being 64-bit. |
Until now we allowed either emitting JS+wasm, or for shared modules only, a wasm by itself (a side module, which would be loaded by other code that is responsible for any JS support code). There are other use cases than shared modules, though: * #7649 - running wasm in non-JS environments * Users that prefer to write their own JS support code, see https://github.com/kripken/emscripten/wiki/WebAssembly-Standalone This PR is a step in helping those use cases, by allowing emcc input.cpp -o output.wasm, which was previously an error. What it does is * Not emit the JS file. * Note that we are not emitting the JS file, which prevents some optimizations that only make sense when emcc emits both and can coordinate them, like minifying names between the two (as came up in the discussion in #7649).
This is related to issue emscripten-core#7649 (specifically point 7). The getdents64() API is not easily portable across different platforms due to the fact that there is no standardized way to read dirents from a raw file descriptor or seek a raw file descriptor to a specific dirent, and in practice each major platform does it differently with different contraints. Platforms tend to only support an opendir()/readdir()-based directory traversal API. To make it easier to run Emscripten-generated WASM files in non-browser contexts, this change makes generated WASM files import opendir()/readdir() functions from the host environment instead of the Linuxy getdents64() syscall. This makes the host functions that support those calls a lot less hacky. Note that this does not change the dirent ABI, which continues to have the same structure in memory as before. This allows us to continue using higher level musl code that interacts with directories, like scandir().
Related to emscripten-core#7649 (point 5) and subset of emscripten-core#7799 * Update python.bc * Update output wasm sizes * Update wasm backend tests
This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 7 days. Feel free to re-open at any time if this issue is still relevant. |
Related to emscripten-core#7649 (point 5) and subset of emscripten-core#7799 * Update python.bc * Update output wasm sizes * Update wasm backend tests
Proposed Emscripten / WebAssembly API Changes
Since late September 2018, I've been implementing a straightforward translation layer from host calls from Emscripten-generated WebAssembly to the C library of the underlying host. The result today is a WebAssembly runtime that fully implements all the required functionality to run an unmodified version of Nginx compiled with Emscripten. While it works, there were a few modifications I would make to the existing Emscripten<->WebAssembly API to correct some inadequacies, simplify the translation logic, and optimize the translation logic.
These changes are proposed as minor edits to the existing API but long term it would be good if we could fully document the existing API with the intention of publishing it as a POSIX API standard for WebAssembly. I'm looking for feedback on these proposals before I potentially start working on a patch for Emscripten.
1. Include memory layout information in WASM binary (Critical) (DONE 0d83546)
The WASM module produced by Emscripten requires a few exports from the
host to run successfully:
memoryBase
/__memory_base
__table_base
tempDouble
DYNAMICTOP_PTR
STACKTOP
STACK_MAX
The value of these exports depend on the target
STATIC_BUMP
value. This value is not encoded in the generated WASM file itself, and must be parsed out from the accompanying.js
file.I propose the
STATIC_BUMP
value be encoded in a custom "Emscripten Metadata" section in the generated WASM file.2. Compiling with
-O3
shouldn't minify WebAssembly import names (critical) (DONE 2852283)This was recently changed in Emscripten.
-O3
now by default mangles import names. I propose the default-O3
doesn't mangle the import names, and an extra option be added to do that extra minification if desired. Alternatively maintaining the current behavior and adding an option to disable the name mangling is another options.3. Compile straight to
.wasm
file (recommended) (DONE 2852283)It should be possible to specify a
.wasm
file as the output file. Right now you must specify a file ending in.js
, when the.js
file isn't always necessary.4. Include table layout information in WASM binary (Critical) (DONE 0d83546)
The WASM module generated by Emscripten imports its index-zero table from the host. The expected size of this table is not generic and must be parsed out from the accompanying
.js
file as well.I propose either:
generated WASM file.
5.
off_t
andino_t
should be 64-bits (Critical) (DONE 4e4a794)Most file systems support 64-bit
off_t
andino_t
. The WASM generated by Emscripten expects a 32-bitoff_t
andino_t
. This is a problem when delegating the host functions required by the generated WASM file directly to the host's C library since it's not possible to return the full value produced by the host to the hosted WASM.I propose we convert
off_t
andino_t
to 64-bits.6. More POSIX types should be 64-bits (Recommended)
Emscripten ends up using 32-bit versions of many POSIX structures, whereas the host implementation usually uses 64-bit values in its corresponding structures. While pointers might always necessarily be
32-bits in value and don't make sense in host context without conversion to host pointers first anyway, there are a few POSIX structures that in theory could be directly forwarded to the host
implementation. Examples include
struct timespec
,struct stat
,struct tm
, and more. If Emscripten encoded these using 64-bit integer values, then no conversion step would be required on the most popular data model on POSIX systems for the foreseeable future, LP64.7. POSIX API, not System Call API (Recommended)
At first, it seemed like the API between Emscripten-generated WebAssembly and the Emscripten host should be based on system calls implied by POSIX. This doesn't work in the case of higher-level APIs that have implementation-defined behavior. Examples include: readdir() and friends, getgrent and friends, getpwent and friends, sem_post() and friends, and more.
Emscripten currently uses musl library code to implement readdir() and requires the host to implement the
getdents64()
system call. Only Linux provides an API that hasgetdents64()
semantics, other POSIX systems like OpenBSD and macOS provide slightly similar but incompatible APIs and furthermore those APIs are unsupported. POSIX only requires that implementations providereaddir()
and friends. It would be better for Emscripten-generated WebAssembly to generate calls to host-definedreaddir()
and friends instead ofgetdents64()
to more easily implement the API on all POSIX systems.This requires some modification to the Emscripten's version of musl which adds extra complexity but I think is overall worth it.
8. Better function signatures for system calls (Recommended)
This is related to the previous recommendation: right now the system call API names the system call by number (e.g. _syscall220). Also the arguments to the system calls are passed in a buffer instead of as WebAssembly function arguments, this requires extra work on the host implement to decode the arguments. I propose all APIs use descriptive names and pass arguments normally instead of in an indirect buffer.
9. API function calls are responsible for setting Errno (Recommended)
Right now some Emscripten APIs expect the system call to return a negative errno number to signal error, and some return the error in the
errno
value. I think all APIs should behave similarly for consistency.The text was updated successfully, but these errors were encountered: