Emscripten & WASI & POSIX #9479
Comments
We are trying to move emscripten in the direction of WASI where possible to avoid supporting two different ABIs. This means emscripten's wasm files will start to look more WASI compatible over time. @kripken even recently added a It sounds like you are saying that it would be hard for you to integrate part of the WASI implementation into your emscripten ABI support? But in the long run, if you could do this, wouldn't it actually reduce your code complexity? I admit I didn't look at your source yet, so I don't fully understand why this would be a problem. |
Also, what do you think this Supporting only |
Thanks for raising the issue @syrusakbary ! There are multiple things happening here. One is that, regardless of Over time we hope to remove the other But another issue is that we don't think it's practical to have two modes, 100% wasi and 100% non-wasi, because wasi is still a work in progress, as is our support for it. In the meantime, custom embedders may use wasi support but also add extra support for non-wasi stuff. I don't see a way around that (even if we assume wasi will eventually have graphics and audio etc., it will take many years). We do have the That's the big picture from our perspective. I'd like to understand more what specifically is inconvenient for wasmer, and work to find a solution. |
Wasmer Emscripten implementation and WASI implementation don't share the same context data.
Because of that, it's hard to use Emscripten and WASI together for the same underlying filesystem.
Why is it hard?
Based on the example that I provided before, the set of imports is:
(import "env" "__cxa_uncaught_exceptions" (func $env.__cxa_uncaught_exceptions (type $t16)))
(import "env" "__cxa_atexit" (func $env.__cxa_atexit (type $t1)))
(import "env" "__syscall6" (func $env.__syscall6 (type $t2)))
(import "env" "__syscall145" (func $env.__syscall145 (type $t2)))
(import "env" "__syscall140" (func $env.__syscall140 (type $t2)))
(import "wasi_unstable" "fd_write" (func $wasi_unstable.fd_write (type $t11)))
(import "env" "getenv" (func $env.getenv (type $t0)))
(import "env" "__map_file" (func $env.__map_file (type $t2)))
(import "env" "__syscall91" (func $env.__syscall91 (type $t2)))
(import "env" "strftime_l" (func $env.strftime_l (type $t7)))
(import "env" "__cxa_pure_virtual" (func $env.__cxa_pure_virtual (type $t12)))
(import "env" "pthread_cond_broadcast" (func $env.pthread_cond_broadcast (type $t0)))
(import "env" "pthread_cond_wait" (func $env.pthread_cond_wait (type $t2)))
(import "wasi_unstable" "proc_exit" (func $wasi_unstable.proc_exit (type $t10)))
But let's analyze them in more detail:
Cpp calls (separate from the fs) 👍:
# This is a Cpp method 👍
(import "env" "__cxa_uncaught_exceptions" (func $env.__cxa_uncaught_exceptions (type $t16)))
# This is a Cpp method 👍
(import "env" "__cxa_atexit" (func $env.__cxa_atexit (type $t1)))
# This is a Cpp method 👍
(import "env" "__cxa_pure_virtual" (func $env.__cxa_pure_virtual (type $t12)))
Filesystem calls (non-wasi) ❗️
# This is a call to close a file ❗️ (it should be WASI fd_close)
(import "env" "__syscall6" (func $env.__syscall6 (type $t2)))
# This is a call to read a file descriptor ❗️ (it should be WASI fd_read)
(import "env" "__syscall145" (func $env.__syscall145 (type $t2)))
# This is a call to seek a file descriptor ❗️ (it should be WASI fd_seek)
(import "env" "__syscall140" (func $env.__syscall140 (type $t2)))
WASI filesystem calls:
# This is a WASI call 👍
(import "wasi_unstable" "fd_write" (func $wasi_unstable.fd_write (type $t11)))
Other calls:
# This could be a WASI call, but doesn't interact with the fs so it's also ok to not use WASI 👍
(import "env" "getenv" (func $env.getenv (type $t0)))
# This map_file can be a bit challenging
(import "env" "__map_file" (func $env.__map_file (type $t2)))
# This is munmap, which can also be a bit challenging
(import "env" "__syscall91" (func $env.__syscall91 (type $t2)))
# This could be a WASI call, but doesn't interact with the fs so it's also ok to not use WASI 👍
(import "env" "strftime_l" (func $env.strftime_l (type $t7)))
# This is a non-filesystem method 👍
(import "env" "pthread_cond_broadcast" (func $env.pthread_cond_broadcast (type $t0)))
# This is a non-filesystem method 👍
(import "env" "pthread_cond_wait" (func $env.pthread_cond_wait (type $t2)))
# This is a WASI call 👍
(import "wasi_unstable" "proc_exit" (func $wasi_unstable.proc_exit (type $t10)))
Ideal solution
Right now, the filesystem calls are intermixed between WASI ( Our WASI implementation relies on different sandboxed filesystem descriptors that can't be reused in the normal Emscripten context. It would be awesome if, for the filesystem, we could rely completely on WASI (so it can be easily decoupled from the other Emscripten syscalls).
Adding also @AndrewScheidecker to the thread as he might have some extra input / ideas |
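The manual breakdown above could be automated roughly as follows. This is only a sketch: `classify_imports` and the `FS_ENV_IMPORTS` set are hypothetical names, and the set simply mirrors the imports this comment flags as filesystem-related or challenging. It is not a list taken from Emscripten or Wasmer.

```python
import re

# Matches the module and field name of one WAT import line as listed above.
IMPORT_RE = re.compile(r'\(import\s+"([^"]+)"\s+"([^"]+)"')

# Assumption: the "env" imports this comment marks as filesystem-related
# or as challenging. Hypothetical grouping for illustration only.
FS_ENV_IMPORTS = {
    "__syscall6", "__syscall145", "__syscall140", "__syscall91", "__map_file",
}


def classify_imports(wat_text):
    """Group import names into WASI, Emscripten fs, and other Emscripten."""
    groups = {"wasi": [], "env_fs": [], "env_other": []}
    for module, name in IMPORT_RE.findall(wat_text):
        if module.startswith("wasi_"):
            groups["wasi"].append(name)
        elif name in FS_ENV_IMPORTS:
            groups["env_fs"].append(name)
        else:
            groups["env_other"].append(name)
    return groups


SAMPLE = '''
(import "env" "__syscall6" (func $env.__syscall6 (type $t2)))
(import "wasi_unstable" "fd_write" (func $wasi_unstable.fd_write (type $t11)))
(import "env" "getenv" (func $env.getenv (type $t0)))
'''

groups = classify_imports(SAMPLE)
# The filesystem ABI is "mixed" when both WASI and env fs imports are present.
mixed_fs = bool(groups["wasi"]) and bool(groups["env_fs"])
```

A runtime could use a check like `mixed_fs` to decide whether a module needs both contexts wired up, which is the situation described as hard here.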
The plan is to move as many of the emscripten syscalls as possible to WASI syscalls. So the ones you mention will be moving to WASI very soon I imagine. However there will inevitably be syscalls in emscripten that don't map to WASI syscalls. This might well include filesystem syscalls that take the same file descriptors as the WASI syscalls. I imagine we will get to a place where the emscripten syscalls represent a superset of the WASI syscalls. For the time being, since you have two different sandbox models, I imagine you will need to continue to maintain two different versions of the WASI syscalls (i.e. fd_write for emscripten will not be the same function as fd_write for WASI). However in the long term I would hope that both WASI and emscripten runtimes would use the same sandboxed implementation. Is there any reason you wouldn't want the same filesystem sandbox when running emscripten-built binaries? |
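To make the "same sandboxed implementation" idea concrete, here is a toy sketch in which both import namespaces share one file descriptor table. Everything in it is an assumption for illustration: `SandboxedFs` and `build_imports` are made-up names, and the call signatures are heavily simplified compared to the real `fd_write` and `__syscall6` ABIs.

```python
class SandboxedFs:
    """Toy in-memory fd table standing in for a runtime's sandboxed filesystem."""

    def __init__(self):
        self.open_files = {1: bytearray()}  # fd 1 plays the role of stdout

    def write(self, fd, data):
        if fd not in self.open_files:
            return -1  # simplified error value
        self.open_files[fd].extend(data)
        return len(data)

    def close(self, fd):
        return 0 if self.open_files.pop(fd, None) is not None else -1


def build_imports(fs):
    """Both namespaces close over the same SandboxedFs instance."""
    return {
        # Simplified stand-in for the WASI write call.
        "wasi_unstable": {"fd_write": lambda fd, data: fs.write(fd, data)},
        # Simplified stand-in for the Emscripten close syscall.
        "env": {"__syscall6": lambda fd: fs.close(fd)},
    }


fs = SandboxedFs()
imports = build_imports(fs)
written = imports["wasi_unstable"]["fd_write"](1, b"hello")
closed = imports["env"]["__syscall6"](1)
# After the Emscripten-style close, the WASI-style write sees the fd as gone.
after_close = imports["wasi_unstable"]["fd_write"](1, b"again")
```

The point of the sketch is only that a descriptor closed through one ABI is also closed for the other, which is not the case when the two implementations keep separate context data.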
Yes, as @sbc100 said, we intend to fix many of those soon. E.g. I do understand that the intermediate state may be harder to support for your embedding. But we have to move incrementally in Emscripten. One option might be to say that wasmer doesn't support Emscripten versions in that intermediate state - so wasmer would support older versions, and eventually new-enough versions once that work is done, but not versions X-Y in the middle. But also as @sbc100 said, I expect we'll always be a superset of wasi in some form or other. Hopefully not for filesystem I/O, but definitely for other stuff (graphics, audio, etc.). |
We would love to have the same sandbox in Emscripten. But it's just quite hard if the filesystem calls are intermixed between emscripten and WASI (meaning: it's hard if we use
I think that would be awesome ❤️
Yeah, I think that's the good approach. I just wished all WASI (regarding the fs) was implemented at once so we could reuse the WASI sandboxed code that we have :) |
@syrusakbary what do you think about the STANDALONE_WASM mode? Do you want to be able to run arbitrary emscripten binaries or would you be ok requiring they be built with STANDALONE_WASM ? (i.e. do you build all your binaries yourself or do you want to run emscripten binaries from the web?) |
I think it's a good move to use the WASI ABI for as much of the Emscripten functionality as possible. The ABI changes are a short-term burden for WAVM, but in the long-term it will be much less of a burden if the WASI and Emscripten environments can share code.
Don't rule out adding graphics and audio APIs to WASI! I would implement them in WAVM if they existed.
I think it's ok to have a flag to produce binaries that work in non-browser environments. It needs to be possible to unambiguously detect a binary compiled without it and produce a nice error message, though. |
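One possible way to produce such an error message is to inspect the import list before instantiation. The thread does not say how detection would actually work, so treat this as a heuristic sketch: `SUPPORTED_MODULES`, `UnsupportedBinaryError`, and `check_imports` are hypothetical names, and the host is assumed to implement only `wasi_unstable`.

```python
# Assumption: this host provides only the WASI import namespace.
SUPPORTED_MODULES = {"wasi_unstable"}


class UnsupportedBinaryError(Exception):
    """Raised when a module needs imports this host does not provide."""


def check_imports(imports):
    """imports is an iterable of (module, name) pairs taken from the binary."""
    missing = sorted(
        f"{module}.{name}"
        for module, name in imports
        if module not in SUPPORTED_MODULES
    )
    if missing:
        raise UnsupportedBinaryError(
            "This module imports functions this runtime does not provide: "
            + ", ".join(missing)
            + ". It may have been built for a browser/JS environment."
        )
```

Checking imports only tells the host what is missing; an unambiguous marker in the binary, as asked for above, would still need support from the toolchain.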
That's true, I'm also very excited about these in WASI, but I think/hope WASI won't have the same graphics APIs that Emscripten has. I spent some time experimenting with the SDL and OpenGL stuff and got it kind of working in Wasmer, but it had some serious issues. The two most prominent are the main loop and the security. The main loop is inverted and works with callbacks, which isn't that natural outside the web or JS. Securely executing OpenGL is really tricky and we have to care about and handle the differences between OpenGL, OpenGL ES, and WebGL. There's been some discussion about using WebGPU in WASI, which apparently solves some or all of these problems. To the general topic: it's tricky. Being able to execute Wasm from the web directly is really neat. However there are already some issues with this because of versioning and how often Emscripten changes. Emscripten compiles to a complete working solution because it can generate the relevant JS and Wasm that work together. The issue is that supporting arbitrary Emscripten Wasm from the web is therefore non-trivial, because it's a moving target. I'm not sure if Emscripten stores its version info in the Wasm anywhere, but we're currently not detecting it or using it at Wasmer, we just vaguely target the latest version. If Emscripten can move all its FS calls into WASI eventually, that would be a good change in my opinion. However, if it can't then things may start to get really complex. In the future WASI fds will be opaque references so Emscripten's will have to be too, or you'll need extra layers of abstraction to keep track of the relationship between Emscripten file handles and WASI file handles and you'll have to sync metadata between them. Any calls outside the WASI ones need to interact with the sandbox appropriately. I think this bad-case scenario of partial migration will introduce the complexity primarily on the Emscripten compiler side, which will need to be reimplemented on the host side. |
Specifically regarding WASI using reference types: if/when that happens, both emscripten libc and wasi-libc will need to convert between ref types and integer fds anyway, because libc is based on file descriptors. I don't see a problem doing this. If anything I see an opportunity to one day share libc code between wasi-libc and emscripten's libc. |
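The conversion between integer fds and opaque references could look something like the table below. This is a minimal sketch with made-up names (`OpaqueHandle`, `FdTable`); it is not how wasi-libc or emscripten's libc are implemented.

```python
class OpaqueHandle:
    """Stand-in for an opaque reference handed out by the host."""

    def __init__(self, label):
        self.label = label


class FdTable:
    """Maps libc-style integer fds to opaque handles and back."""

    def __init__(self):
        self._by_fd = {}
        self._next_fd = 3  # 0-2 are reserved for stdio by convention

    def insert(self, handle):
        """Register a handle and return the integer fd libc will use for it."""
        fd = self._next_fd
        self._next_fd += 1
        self._by_fd[fd] = handle
        return fd

    def lookup(self, fd):
        """Return the handle for fd, or None (the role EBADF would play)."""
        return self._by_fd.get(fd)

    def remove(self, fd):
        """Drop the mapping for fd and return its handle, or None."""
        return self._by_fd.pop(fd, None)


table = FdTable()
handle = OpaqueHandle("data.txt")
fd = table.insert(handle)
```

Since libc code only ever sees the integer, a table like this could in principle live in code shared by both libcs.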
We do have an option (
Yeah, we definitely don't plan to do a partial migration - the goal is to have something simple at the end, hopefully just using wasi for filesystem stuff. However, there is some chance of encountering problems with using the wasi API. |
After doing more work here, I realized that Emscripten switching to 100% wasi for our ABI would preclude full POSIX support. One example: In NODERAWFS mode we literally propagate file operations to node's FS API directly. For example, if we ask to create a file with mode Likewise, there are plenty of I don't think we want to regress this, as it's useful to compile to js+wasm and run in node with full POSIX powers! As a result, I think we may want to think about something like this:
In practice, I think the majority of programs doing pure computation, and maybe some printf logging, would not need Maybe there's a better solution that I'm missing, though? |
Your approach sounds reasonable. My only question would be what is the value in the PURE_WASI mode? It seems that any reasonably sized app that targets the web would not be able to use it anyway. And small or non-web codebases might as well use |
One advantage would be that users with a web port don't need a new toolchain to get a wasi build, they can just flip a flag. Another advantage is that Emscripten could support wasi + other stuff, like say OpenGL, which is not in the wasi SDK. That might make sense in say a game engine plugin, if their runtime already has wasi support for printing and files. But, yeah, I wish we could do better here - there would be more value for users if Emscripten emitted wasi by default. But abandoning POSIX support doesn't seem worth that. Curious if others feel otherwise though! |
I'm not totally against a PURE_WASI option if there are users out there who would want it. In terms of giving up POSIX compatibility and/or accepting regressions in size and/or performance, I agree that we should not sacrifice those things for WASI compatibility. We could make compromises here and there of course if the loss is negligible, and we can continue to push the WASI standard in places where we think it makes sense (as you have started to do already). |
Thanks for the updates and for keeping us in the loop!
Agreed, we are now investigating more ways to compile projects easily to WASI. Perhaps it will be tricky for Emscripten to adopt WASI fully, given possible regressions in size/performance. You probably have much more context than I do on the feasibility of this :) |
What do you think about the POSIX issue mentioned earlier? I'm curious if wasmer is interested to run applications that use more POSIX than WASI can support. As a concrete example, you can't implement the commandline tool
Similarly, you can't port something like Python 100% in WASI, because people can use it to look at those permissions. That stuff does work in Emscripten's POSIX support currently. How important do you think it is? |
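As an illustration of the kind of POSIX metadata in question, this is how a Python program reads permission bits on a POSIX host. It is a sketch using only the standard library; `permission_string` is a helper name introduced here. As far as I know, the filestat structure in `wasi_unstable` carries file type, size, and timestamps but no permission bits, which is why such code cannot be served by WASI alone.

```python
import os
import stat
import tempfile


def permission_string(path):
    """Return the ls-style mode string (e.g. '-rw-r-----') for a path."""
    return stat.filemode(os.stat(path).st_mode)


# Create a scratch file, set explicit permissions, and read them back.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    path = tmp.name
os.chmod(path, 0o640)
mode_bits = stat.S_IMODE(os.stat(path).st_mode)
mode_text = permission_string(path)
os.remove(path)
```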
I'd be curious to hear your thoughts on how important POSIX support is in WAVM, specifically POSIX stuff that doesn't fit into WASI, see the above example. |
I do want to support as much of POSIX as I can in WAVM, one way or another. That would ideally be through some standardized ABI like WASI. I don't see why WASI wouldn't eventually support all of POSIX. |
Thanks @AndrewScheidecker! Good to know. |
Yeah, I think WASI would eventually support all of POSIX. Or, at least, that's where we would like to move towards :)
We actually got |
Oh, interesting! How did you do it? |
Here are the commits into Rust's wapm-packages/coreutils@b76e18d#diff-e0d0d10a53bd466a68dc6509c9057367R209 |
Oh, but doesn't that commit just skip the POSIX stuff that WASI can't do? Or did I misunderstand it? |
I think this would be a preferable approach IMHO. |
Is there a way to run emscripten-produced wasm with wasmer today? I get ImportNotFound errors both with and without -s STANDALONE_WASM=1. |
Most non-trivial programs that build with emscripten contain a lot of non-wasi imports. Only programs with extremely minimal requirements will end up depending on wasi alone. For wasmer, I believe they maintain some amount of support for emscripten's custom syscall layer, so you might have some success there. What is the list of imports that are missing? |
List is shorter for STANDALONE_WASM=0:
|
This is CoreRT BTW (a .Net compiler/runtime) so it's going to fit in the "non-trivial" category. |
I would have thought the list would be shorter for Depending on your program's dependencies, you will most likely need to implement a lot of extra stuff on the embedder side to support using emscripten-generated wasm files in such environments. |
I wonder what people are doing for CI environments for emscripten built wasm. Running firefox --headless was not totally reliable for us. |
Node works pretty well. The emscripten-generated JS works fine there, and it even supports threads these days. |
It would be good to see the list for |
@kripken I guess you mean =1. =0 is posted above. The list for =1 is
Which includes a bunch of symbols that are undefined in my code base e.g. @sbc100 Thanks for node tip! |
As you say, that list contains mostly symbols related to your project. It would be useful if you could filter out all those symbols so we could see the resulting list of emscripten-internal symbols. At first glance it doesn't look that big. A few Out of interest, how do you plan on injecting symbols like |
Tried to format the list better, removing my stuff
Things |
Much better. Thanks. So all the things that start with "wasi_unstable" should be implemented by the wasi runtime already (so I don't know why they are showing up in that list). All the pthread symbols I guess are there because your program depends on threads? We should probably be stubbing those out inside the wasm binary when you are not building with So that leaves:
For some of those you will probably want to modify your code to remove the dependency. For others we can work on removing the dependency in the STANDALONE_WASM builds. We would need to look at them on a case-by-case basis. BTW, I'm working on a change that will make the syscall symbols appear as names rather than numbers to make that a little more readable. |
Sorry for the delay replying @sbc100, I just read your message. It's awesome that Emscripten supports all WASI syscalls. This will unblock the Wasmer Emscripten integration, which we paused because of security concerns. But now that Emscripten is using the bulk of WASI we should be able to revisit it! Edit: perhaps I misread the message. Just to make sure: does Emscripten use WASI syscalls everywhere it is possible? (excluding things that are not supported by WASI, of course) |
This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant. |
A follow-up on this issue. We have created https://wasix.org/ as a way to accomplish this: it currently supports threads, sockets, signals, longjmp/setjmp, and many more. Hope this makes it easier to compile to Wasm in the future! |
After compiling this simple example with emscripten (using tot-upstream, with em++ issue_577.cpp -s WASM=1 -o issue_577.wasm):
The WebAssembly file has the following imports:
Note that the wasi_unstable imports are mixed with the emscripten env ones.
(See attached generated wasm: issue_577.wasm.zip)
This makes it a bit hard for standalone-wasm implementors, as now we have to mix two different ABIs (WASI and Emscripten POSIX-like) and make sure they both run properly. This is quite challenging, as WASI and Emscripten use different data structures in the VM context (in Wasmer, the struct holding the VM WASI data is different from the one holding the Emscripten data).
I think Emscripten should adopt only WASI when all the imports are WASI-like, otherwise use the already existing ABI.
Thoughts @kripken?