Serialize a compiled module #618

clarkmcc · 2022-06-03T01:56:40Z

Describe the solution you'd like
Is it possible to replicate this feature from wasmer-go? Compiling a module results in a significant number of allocations, so I'd like to compile a batch of modules ahead of time to reduce the runtime memory overhead.

codefromthecrypt · 2022-06-03T02:04:08Z

I guess what we are really talking about here is externalizing the compilation cache, plus possibly some guard to make sure the cache is invalidated if the structure of the code changes.

		codes           map[wasm.ModuleID][]*code // guarded by mutex.

I think we can look into this after SIMD is done, wdyt @mathetake?

PS, related, but not exactly the same as this: #179

F21 · 2022-06-03T06:04:16Z

I think this will also help us improve mjml-go performance: https://github.com/Boostport/mjml-go#benchmarks

Currently, spinning up a new worker using InstantiateModule() is quite expensive, so if we can clone a module, it would drive a lot of performance improvements.

codefromthecrypt · 2022-06-03T06:16:48Z

@F21 I think you already have an issue about InstantiateModule(), #602, which is unrelated to this because Compile happens before that. Just setting expectations.

F21 · 2022-06-03T06:28:25Z

Oops, I must have misread. This feature would still potentially be useful for us. We're currently shipping a .wasm compressed with brotli, which is decompressed and compiled in init(). If the file size from serializing a compiled module is acceptable, we can just ship that directly in our library and remove the .wasm completely.

mathetake · 2022-06-03T06:35:41Z

Note that this comes with severe security concerns, as it means we allow users to directly execute any native code without the validation passes (which are applied during wazero.CompileBinary). That means we should also provide some way to do binary signing or something like that, e.g. https://github.com/wasm-signatures/design

Regardless, I will work on this after finishing SIMD instructions (== completion of Wasm 2.0 Draft). Stay tuned!

codefromthecrypt · 2022-06-03T06:51:22Z

thanks for the feedback folks, this is great. don't worry too much about issue classification as we can sort it out.

mathetake · 2022-06-30T05:35:44Z

I think all is set to implement the externalization of compiled native code...

mathetake · 2022-06-30T05:44:25Z

One thing we have to figure out is how to do the versioning/invalidation: if the internals of the compilers change, the compiled module won't work with the latest version of wazero. Therefore, we have to know whether the compiled module binary's version (the version of the wazero that compiled it) matches the version of the running wazero. Maybe embedding the commit hash into a global variable (via Go's linker flag) helps in general if wazero is the executable, but here wazero is a library.

codefromthecrypt · 2022-06-30T05:48:19Z

ack, we need to look at existing VMs and how they do bytecode validation, and to what degree this applies to us. In particular, we shouldn't commit to a serialization mechanism prior to aug-31, which is our first beta, but that doesn't prevent experimentation before that.

clarkmcc · 2022-07-01T13:07:38Z

@mathetake the challenge with the commit hash approach is that it will change, requiring re-serialization even if the compiler internals did not change. This project is currently pushing several commits a day. Changes to the ARM compiler for example would not need to impact modules compiled for x86 (not sure if that's even a legitimate example, but you get the idea).

Obviously, the harder problem is how do you actually version optimistically. I briefly looked at Wasmer and it looks like they have a dedicated serialization format that is manually adjusted whenever breaking changes are made. See wasmerio/wasmer#2747, CosmWasm/cosmwasm#1223. The downside of this approach is that it requires someone who is involved enough in the project to bump that version if a change to the project would break the serialization format.
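A manually bumped format version can also be kept per architecture, so that a change to one backend does not invalidate artifacts compiled for another. The sketch below is only an illustration of that idea: formatVersions, its values, and compatible are hypothetical and do not describe Wasmer's or wazero's actual scheme.

```go
package main

import (
	"fmt"
	"runtime"
)

// formatVersions is a hypothetical table, bumped by hand per architecture
// whenever a compiler change would break previously serialized code.
// The numbers are illustrative only.
var formatVersions = map[string]uint32{
	"amd64": 3,
	"arm64": 5,
}

// compatible reports whether an artifact serialized for artifactArch at
// artifactVersion can be loaded by a build running on hostArch.
func compatible(artifactArch string, artifactVersion uint32, hostArch string) bool {
	current, ok := formatVersions[hostArch]
	return ok && artifactArch == hostArch && artifactVersion == current
}

func main() {
	// Check a made-up artifact against the architecture we are running on.
	fmt.Println(compatible("amd64", 3, runtime.GOARCH))
}
```

With this layout, bumping the "arm64" entry leaves "amd64" artifacts loadable, which is the property described in the ARM versus x86 example above. It still relies on a maintainer remembering to bump the number.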

The other option (and I don't know what this would actually look like in practice yet) is to hand-craft a wasm module that is able to get as close to 100% test coverage of the underlying interpreter as possible, and then the CI/CD indicates that we need to bump the serialization format if running that module fails.

codefromthecrypt · 2022-07-01T23:02:48Z

good points, @clarkmcc. I also don't believe a commit hash approach would work unless stable tags are in use (ex monthly tags starting end of august), and even then I think we may end up needing something more fine tuned than that

I plan to add some research here as well as the problem isn't unlike other tools with compilation cache regardless of webassembly. I'll also have a look at the links you mentioned while takeshi is out.

codefromthecrypt · 2022-07-01T23:50:58Z

To be transparent about current thinking and to not block others doing it :)

First, it would help if someone could profile what's taking the longest for their code, as there are multiple stages. For example, the code is parsed and converted to wazeroir (which should become more stable since we have all 2.0 features now). That is an easier thing to cache if it is a problem, though it varies with the feature flags used.

Next, the code isn't organized for cache externalization right now. For example, some organization has been done to support concurrent compilation and some caching aspects (ex compiler.engine.codes). This doesn't imply we can externalize yet, because the inputs aren't explicitly organized in a way that lets us invalidate heuristically. Ex the inputs are the module and the feature flags; specifically, the table and global elements of the module can affect how functions are compiled. In other words, I expect there's more organization work to do before attempting to externalize, because the focus so far has been on completing compiler features.
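To make the "explicit inputs" point concrete, here is a minimal sketch of a cache key derived from every input assumed to affect the generated code. The features type, its flag names, and cacheKey are hypothetical stand-ins, not wazero's API; the target architecture is included as an additional assumed input.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"runtime"
)

// features is a hypothetical bit set standing in for the feature flags.
type features uint64

const (
	featureSIMD features = 1 << iota
	featureBulkMemory
)

// cacheKey hashes the feature flags, the target architecture, and the wasm
// binary. Tables and globals are part of the binary, so they are covered by
// hashing it whole. The arch length is written first so the fields cannot
// run into each other.
func cacheKey(wasmBinary []byte, f features, arch string) string {
	h := sha256.New()
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], uint64(f))
	h.Write(buf[:])
	binary.LittleEndian.PutUint64(buf[:], uint64(len(arch)))
	h.Write(buf[:])
	h.Write([]byte(arch))
	h.Write(wasmBinary)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	bin := []byte("\x00asm\x01\x00\x00\x00") // header of an empty module
	fmt.Println(cacheKey(bin, featureSIMD, runtime.GOARCH))
	fmt.Println(cacheKey(bin, featureSIMD|featureBulkMemory, runtime.GOARCH))
}
```

A key like this only covers inputs supplied by the caller. It says nothing about changes inside the compiler itself, which is the separate versioning problem discussed earlier in the thread.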

Related art I was thinking about is tools that also deal with varied inputs and produce an externalized, but validated, cache. We can think about Go's build cache, and things in other environments with a lot of mileage, such as Java's validation or Gradle's layered validation approaches. Not trying to over-analyze here, but the very least we should deep dive into is how other wasm tools and Go work. I do think it is worth looking at at least one outside tool (ex Java or GraalVM), as that often gives perspective. This sort of research can run concurrently with code organization and other sorts of things.

I think something like this is tricky enough to take several weeks of work to pull off in a sustainable way. Since not all consumers update wazero commit-by-commit, we may be able to cheat and make some iterative progress in an experimental area that punts some of this to a commit hash to buy time. However, I think by the end of research, organization and design, the commit hash will either be tentative, or a cheaper validation option available alongside a heuristic one.

Meanwhile to set expectations, takeshi's out for 10 days. I don't plan to re-organize the compiler while he's away, so the thing that could help if anyone wants to contribute is more background on alternative compilation caches, and/or profiling of wasm they use to see where things are hurting most. Profiling might also show a way to progress meanwhile (ex parallel function compilation), which might be good enough for a bit.

Hope this helps!

This adds experimental support for a file system compilation cache. Notably, experimental.WithCompilationCacheDirName allows users to configure where the compiler writes the cache. Versioning/validation of binary compatibility is done via the release tag (which will be created from the end of this month). More specifically, the cache file starts with a header containing the hardcoded wazero version.

Fixes #618

Signed-off-by: Takeshi Yoneda <[email protected]>
Co-authored-by: Crypt Keeper <[email protected]>

clarkmcc added the enhancement New feature or request label Jun 3, 2022

This was referenced Jun 22, 2022

amd64: removes embeddings of pointers of bit masks for FP arithmetics #648

Merged

compiler: remove embedding of pointers of jump tables #650

Merged

Make typeID deterministic #461

Closed

codefromthecrypt mentioned this issue Aug 16, 2022

Move wasm, wasi_snapshot_preview1, ieee754 and similar folders out of internal into utils or helpers #745

Closed

mathetake mentioned this issue Aug 18, 2022

Externalize compilation cache by compilers #747

Merged

mathetake closed this as completed in #747 Aug 18, 2022
