Serialize a compiled module #618
Comments
I guess what we are really talking about here is externalizing the compilation cache, plus possibly some guard to make sure the cache is invalidated if the structure of the code changes. The cache in question is: codes map[wasm.ModuleID][]*code // guarded by mutex. I think we can look into this after SIMD is done, wdyt @mathetake? PS: related, but not exactly the same as this: #179
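For readers unfamiliar with the field mentioned above, here is a minimal sketch of an in-memory compilation cache of that shape. The names moduleID, code, codeCache and idOf are hypothetical stand-ins, and keying by the SHA-256 of the module binary is an assumption of this sketch, not something the thread confirms. With such a key, any change to the binary produces a different ID, so a stale entry is simply never found.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// moduleID identifies a module by a hash of its binary (assumption of this sketch).
type moduleID [sha256.Size]byte

// code is a hypothetical stand-in for one function's compiled native code.
type code struct {
	native []byte
}

// codeCache maps a module to its compiled functions.
type codeCache struct {
	mu    sync.RWMutex
	codes map[moduleID][]*code // guarded by mu
}

func newCodeCache() *codeCache {
	return &codeCache{codes: map[moduleID][]*code{}}
}

// idOf derives the cache key from the raw module bytes.
func idOf(wasmBinary []byte) moduleID {
	return moduleID(sha256.Sum256(wasmBinary))
}

// get returns the compiled functions for id, if present.
func (c *codeCache) get(id moduleID) ([]*code, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	cs, ok := c.codes[id]
	return cs, ok
}

// put stores the compiled functions for id, replacing any earlier entry.
func (c *codeCache) put(id moduleID, cs []*code) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.codes[id] = cs
}

func main() {
	cache := newCodeCache()
	wasmBinary := []byte("\x00asm\x01\x00\x00\x00") // header of an empty module
	id := idOf(wasmBinary)
	if _, ok := cache.get(id); !ok {
		// A real engine would compile here; the single byte is a placeholder.
		cache.put(id, []*code{{native: []byte{0xc3}}})
	}
	cs, _ := cache.get(id)
	fmt.Println("cached functions:", len(cs))
}
```

Externalizing the cache would mean persisting these entries somewhere that outlives the process, which is where the versioning questions discussed in this thread come in.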
I think this will also help us improve Currently, spinning up a new worker using
Oops, I must have misread. This feature would still potentially be useful for us. We're currently shipping a
Note that this comes with severe security concerns -- as this means that we allow users to directly execute any native code without validation passes (which are applied during Regardless, I will work on this after finishing SIMD instructions (== completion of Wasm 2.0 Draft). Stay tuned!
Thanks for the feedback folks, this is great. Don't worry too much about issue classification as we can sort it out.
I think all is set to implement the externalization of compiled native code...
One thing we have to figure out is how to do the versioning/invalidation: if the internals of the compilers change, the compiled module won't work with the latest version of wazero. Therefore, we have to know whether the version of wazero that compiled the module binary matches the version of the running wazero. Embedding the commit hash into a global variable (via Go's linker flag) helps in general when wazero is the executable, but here wazero is a library.
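One possible way for a library to learn its own version without a linker flag is the build information that the Go toolchain embeds in module-built binaries, readable through runtime/debug. The sketch below is an illustration under that assumption, not the approach the maintainers settled on. dependencyVersion and exampleInfo are hypothetical helpers, and example.com/lib is a placeholder module path.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// dependencyVersion returns the version recorded for the module at path,
// or "" when the build info is missing or does not list that module.
func dependencyVersion(info *debug.BuildInfo, path string) string {
	if info == nil {
		return ""
	}
	if info.Main.Path == path {
		return info.Main.Version
	}
	for _, dep := range info.Deps {
		if dep.Path == path {
			return dep.Version
		}
	}
	return ""
}

// exampleInfo builds a hand-written BuildInfo so the sketch runs on its own.
// Real code would call debug.ReadBuildInfo() instead.
func exampleInfo() *debug.BuildInfo {
	return &debug.BuildInfo{
		Deps: []*debug.Module{
			{Path: "example.com/lib", Version: "v1.2.3"},
		},
	}
}

func main() {
	fmt.Println(dependencyVersion(exampleInfo(), "example.com/lib"))

	// In a real binary the information comes from the toolchain.
	if info, ok := debug.ReadBuildInfo(); ok {
		fmt.Printf("main module: %q\n", info.Main.Path)
	}
}
```

For untagged dependencies the recorded version is a pseudo-version that includes a commit hash prefix, so this runs into the same churn concern raised about commit hashes unless stable tags are in use.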
Ack, we need to look at existing VMs, how they do bytecode validation, and to what degree this applies to us. In particular, we shouldn't commit to a serialization mechanism prior to Aug 31, which is our first beta, but that doesn't prevent experimentation before then.
@mathetake the challenge with the commit hash approach is that it will change, requiring re-serialization even if the compiler internals did not change. This project is currently pushing several commits a day. Changes to the ARM compiler, for example, would not need to affect modules compiled for x86 (not sure if that's even a legitimate example, but you get the idea). Obviously, the harder problem is how you actually version optimistically. I briefly looked at Wasmer and it looks like they have a dedicated serialization format that is manually adjusted whenever breaking changes are made. See wasmerio/wasmer#2747, CosmWasm/cosmwasm#1223. The downside of this approach is that it requires someone who is involved enough in the project to bump that version whenever a change to the project would break the serialization format. The other option (and I don't know what this would actually look like in practice yet) is to hand-craft a wasm module that gets as close as possible to 100% test coverage of the underlying interpreter, and then have the CI/CD indicate that we need to bump the serialization format if running that module fails.
Good points, @clarkmcc. I also don't believe a commit hash approach would work unless stable tags are in use (ex monthly tags starting end of August), and even then I think we may end up needing something more fine-tuned than that. I plan to add some research here as well, as the problem isn't unlike that of other tools with a compilation cache, regardless of WebAssembly. I'll also have a look at the links you mentioned while Takeshi is out.
To be transparent about current thinking and to not block others doing it :) First, it would help if someone can profile what's taking the longest for their code, as there are multiple stages. For example, the code is parsed and converted to wazeroir (which should become more stable since we have all 2.0 features now). That is an easier thing to cache if it is a problem, though it varies with the feature flags used. Next, the code isn't organized for cache externalization right now. For example, some organization has been done to support concurrent compilation and some caching aspects (ex compiler.engine.codes). This doesn't really imply we can externalize yet, as the inputs aren't explicitly organized in a way that allows heuristic invalidation. Ex the inputs are the module and the feature flags; specifically, the table and global elements of the module can affect how functions are compiled. I expect that there's more organization work to do before attempting to externalize, iotw, because the focus so far has been on completing compiler features. Related art I was thinking about are tools that also deal with varied inputs and produce an externalized, but validated cache. We can think about Go's code cache, and things in other environments with a lot of mileage, such as Java's validation or Gradle's layered validation approaches. Not trying to over-analyze here, but the very least we should deep dive into is how other wasm tools and how Go work. I do think it is worth looking at at least one outside tool (ex Java or GraalVM), as that often gives perspective. This sort of research can run concurrently with code organization and other sorts of things. I think something like this is tricky enough to take several weeks of work to pull off in a sustainable way. Since not all consumers update wazero commit-by-commit, we may be able to cheat and make some iterative progress in an experimental area that indeed punts some of this to a commit hash to buy time. However, I think by the end of research, organization and design, the commit hash will either be tentative, or a cheaper validation option available alongside a heuristic one. Meanwhile, to set expectations, Takeshi's out for 10 days. I don't plan to re-organize the compiler while he's away, so what could help if anyone wants to contribute is more background on alternative compilation caches, and/or profiling of the wasm they use to see where things are hurting most. Profiling might also show a way to progress meanwhile (ex parallel function compilation), which might be good enough for a bit. Hope this helps!
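To make the idea of explicitly organized inputs concrete, here is a rough sketch of deriving a cache key from everything that can affect compilation. The cacheInputs type, its fields and the bit-flag encoding of features are assumptions for illustration only. The real set of inputs would have to come out of the organization work described above.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"runtime"
)

// cacheInputs lists everything assumed to influence the compiled output.
type cacheInputs struct {
	compilerVersion string // hypothetical version of the compiler internals
	arch            string // target architecture, ex "amd64" or "arm64"
	features        uint64 // hypothetical bit set of enabled feature flags
	module          []byte // the wasm binary, including tables and globals
}

// key hashes all inputs into a hex string usable as a cache file name.
// Each field is length-prefixed so adjacent fields cannot be confused.
func (in cacheInputs) key() string {
	h := sha256.New()
	writeField := func(b []byte) {
		var n [8]byte
		binary.LittleEndian.PutUint64(n[:], uint64(len(b)))
		h.Write(n[:])
		h.Write(b)
	}
	var f [8]byte
	binary.LittleEndian.PutUint64(f[:], in.features)

	writeField([]byte(in.compilerVersion))
	writeField([]byte(in.arch))
	writeField(f[:])
	writeField(in.module)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	in := cacheInputs{
		compilerVersion: "1",
		arch:            runtime.GOARCH,
		features:        0b11,
		module:          []byte("\x00asm\x01\x00\x00\x00"),
	}
	fmt.Println("key:", in.key())

	in.features = 0b01 // toggling a feature flag yields a different key
	fmt.Println("key:", in.key())
}
```

Including the architecture in the key is one way to keep a change in one backend from invalidating entries compiled for another, provided the version field is also tracked per backend.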
This adds experimental support for a file system compilation cache. Notably, experimental.WithCompilationCacheDirName allows users to configure the directory the compiler writes the cache into. Versioning/validation of binary compatibility is done via the release tag (which will be created from the end of this month). More specifically, the cache file starts with a header containing the hardcoded wazero version. Fixes #618 Signed-off-by: Takeshi Yoneda Co-authored-by: Crypt Keeper
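The header check described in that change can be sketched as follows. This is an illustration only: the magic string, the one-byte length prefix and the function names are assumptions, and the actual on-disk layout used by wazero is not stated in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// magic marks the start of a cache entry (hypothetical value).
const magic = "CACHE"

// errStale reports an entry written by a different version.
var errStale = errors.New("cache entry was written by a different version")

// encodeEntry prepends a header of magic, version length and version.
func encodeEntry(version string, payload []byte) ([]byte, error) {
	if len(version) > 255 {
		return nil, errors.New("version string too long")
	}
	out := append([]byte(magic), byte(len(version)))
	out = append(out, version...)
	return append(out, payload...), nil
}

// decodeEntry validates the header and returns the payload only when the
// stored version equals wantVersion.
func decodeEntry(data []byte, wantVersion string) ([]byte, error) {
	if len(data) < len(magic)+1 || string(data[:len(magic)]) != magic {
		return nil, errors.New("not a cache entry")
	}
	n := int(data[len(magic)])
	rest := data[len(magic)+1:]
	if len(rest) < n {
		return nil, errors.New("truncated header")
	}
	if string(rest[:n]) != wantVersion {
		return nil, errStale
	}
	return rest[n:], nil
}

func main() {
	entry, err := encodeEntry("v1.0.0", []byte{0xc3})
	if err != nil {
		panic(err)
	}
	if _, err := decodeEntry(entry, "v1.0.1"); errors.Is(err, errStale) {
		// A stale entry is treated as a miss: recompile and overwrite.
		fmt.Println("stale entry, recompiling")
	}
	payload, err := decodeEntry(entry, "v1.0.0")
	fmt.Println(len(payload), err)
}
```

Treating a version mismatch as a cache miss rather than a hard error keeps upgrades transparent to callers, at the cost of one recompilation per module after each release.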
Describe the solution you'd like
Is it possible to replicate this feature from wasmer-go? Compiling a module results in a significant number of allocations, so I'd like to compile a batch of modules ahead of time to reduce the runtime memory overhead.