Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rkyv read native artifact from archive #2190

Merged
merged 45 commits into from
Apr 21, 2021

Conversation

ailisp
Copy link
Contributor

@ailisp ailisp commented Mar 17, 2021

Description

As mentioned in #2180, load a native artifact is slow because of deserialization ModuleMetadata. This draft PR show how fast it can be when serialize it archive to a [u8] that can be directly read back and interprete as nested struct/HashMap/PrimaryMap without a deserialization. Run

cargo bench -p wasmer-cache --features singlepass

from root dir, will see a lot of timeing in the end:

...
3.447994ms // bincode::deserialize
rkyv 22ns
rkyv deser 386.539µs

This PR use a forked rkyv, for impl rkyv traits on PrimaryMap and IndexMap.
TODO:

  • in fork rkyv, impl rkyv traits on IndexMap is not guanranteed keep indexmap's order, just a quick hack to see timing (reuse ArchivedHashMap). need to impl a real ArchivedIndexMap
  • [ ] replace each use of ModuleMetadata and children to ArchivedModuleMetadata or something like:
enum ModuleMetadataWrap {
    Original(ModuleMetadata),
    Archived(ArchivedModuleMetadata)
}

and field access to ModuleMetadataWrap method calls

  • rkyv deserialize is not as fast as rkyv archive, but fast enough and advantage is minimum change to wasmer's struct field access code, so I plan to use rkyv deserialize
  • add a test to ensure rkyv does deserialize correctly. I added assert to see deserialized obj is Eq to original, need to turn it to a unit test.
  • move impl rkyv::* for PrimaryMap/IndexMap inside this PR, instead of using a fork of rkyv, using a remote-derive technique

Review

  • Add a short description of the the change to the CHANGELOG.md file

@ailisp ailisp marked this pull request as draft March 17, 2021 00:23
@ailisp
Copy link
Contributor Author

ailisp commented Mar 18, 2021

I'm in question about whether we should prefer rkyv deserialize over archive. The deser approach require minimum change (just the change in this PR, PR is almost done except for moving impl rkyv in PrimaryMap/IndexMap inside wasmer). The later is fast, but more importantly, just found its speed is consistent. 386.539µs deser is a good enough time, but test multiple times it become volatile, and worst case takes more than 1ms, which is bad. While archive approach is always <30ns.
Archive need to replace ModuleMetadata with

enum ModuleMetadataWrap {
    Original(ModuleMetadata),
    Archived(ArchivedModuleMetadata)
}

And all access of ModuleMetadata as method calls. When access ModuleMetadata's field of PrimaryMap/IndexMap type, also need implement entire ArchivedPrimaryMap/ArchivedIndexMap methods that mirror all used methods of PrimaryMap/IndexMap.

@syrusakbary
Copy link
Member

syrusakbary commented Mar 18, 2021

@ailisp we just merged #2195, which moves PrimaryMap and SecondaryMap into Wasmer codebase. That way we can implement our own serializers and customize more our needs.
If you adapt this PR to latest master probably it would not require a fork of rkyv (but just the crates.io one).

Also, I think is critical that serialization/deserialization remains constant regardless of the platform/architecture.

That means that serializing the same content from macOS (with the m1 / arm64 chip) should give the same result as doing it from Linux (with a typical x86_64 chip) - and it should be the same for 32 bit architectures. This is critical to allow cross-compilation from other devices. That means compiling the file in one system, and running it in another.
Serde/bincode assures this. And I'm sure that rkyv does as well... but I'm not that sure about archive. Could you give some light on this?

Note: if we can assure constant serialization from different architectures with the archive approach, I'm happy to choose that one!

Thanks!

@ailisp
Copy link
Contributor Author

ailisp commented Mar 18, 2021

we just merged #2195, which moves PrimaryMap and SecondaryMap into Wasmer codebase

That's great to hear! would you also plan to move IndexMap in it? It's okay if not, I can Implement traits on some WasmerIndexMapWrapper trait.

Also, I think is critical that serialization/deserialization remains constant regardless of the platform/architecture.
That means that serializing the same content from macOS (with the m1 / arm64 chip) should give the same result as doing it from Linux (with a typical x86_64 chip) - and it should be the same for 32 bit architectures. This is critical to allow cross-compilation from other devices. That means compiling the file in one system, and running it in another.

Good point! I'll do some test. author mentioned major caveat is it is using platform native endian and didn't fully test for cross platform. Other than it theoratically work. Archive is internally encode all pointers to a relative offset as u64, that indicate where object is located in the resulting [u8]. I consider little endian is common enough (mac/linux/windows x86 and ARM) and author also gives an solution to handle cross platform endian: https://davidkoloski.me/rkyv/faq.html#how-does-rkyv-handle-endianness

Also I suppose the constant serialization means "the deserialized back object is Eq" when serialize from any platform and deserialize in any platform, instead of "serialized bytes is same byte by byte". Because in test I saw bincode doesn't give same byte when serialize the same ModuleMetadata twice.

@syrusakbary
Copy link
Member

Just thought of this.
It might be useful to research the usage of flex buffers (schema less iteration of flat buffers): https://crates.io/crates/flexbuffers

@ailisp
Copy link
Contributor Author

ailisp commented Mar 19, 2021

@syrusakbary I discovered you have some commented lines of using flexbuffers in wasmer, how is that going?
I also discovered this lib but it doesn't support fast access hashmap and indexmap, and you have to deserialize to access, but deserialize it won't be very fast because the overhead bring by serde (we benched any serde based deserialization is slower than borsh, and rkyv deserialize is faster than borsh)

@ailisp ailisp marked this pull request as ready for review March 20, 2021 02:05
lib/vm/src/module.rs Outdated Show resolved Hide resolved
lib/vm/Cargo.toml Outdated Show resolved Hide resolved
@bors
Copy link
Contributor

bors bot commented Apr 21, 2021

Build failed:

@syrusakbary
Copy link
Member

It seems like cargo deny check is failing

lib/vm/src/module.rs Outdated Show resolved Hide resolved
@syrusakbary
Copy link
Member

bors r+

@syrusakbary
Copy link
Member

bors r-

@bors
Copy link
Contributor

bors bot commented Apr 21, 2021

Canceled.

@syrusakbary
Copy link
Member

@ailisp we need to merge this suggestion #2190 (comment)

Otherwise tests will fail:
https://github.com/wasmerio/wasmer/runs/2396189543

@ailisp
Copy link
Contributor Author

ailisp commented Apr 21, 2021

Btw do you use some script / tool to run all tests locally? Sorry to have you start test back and forth

@syrusakbary
Copy link
Member

Btw do you use some script / tool to run all tests locally? Sorry to have you start test back and forth

This are the main tests run:

make test
make test-capi
make test-integration

@syrusakbary
Copy link
Member

bors r+

bors bot added a commit that referenced this pull request Apr 21, 2021
2190: Rkyv read native artifact from archive r=syrusakbary a=ailisp

<!-- 
Prior to submitting a PR, review the CONTRIBUTING.md document for recommendations on how to test:
https://github.com/wasmerio/wasmer/blob/master/CONTRIBUTING.md#pull-requests

-->

# Description
<!-- 
Provide details regarding the change including motivation,
links to related issues, and the context of the PR.
-->

As mentioned in #2180, load a native artifact is slow because of deserialization ModuleMetadata. This draft PR show how fast it can be when serialize it archive to a [u8] that can be directly read back and interprete as nested struct/HashMap/PrimaryMap without a deserialization. Run
```
cargo bench -p wasmer-cache --features singlepass
```
from root dir, will see a lot of timeing in the end:
```
...
3.447994ms // bincode::deserialize
rkyv 22ns
rkyv deser 386.539µs
```

This PR use a forked rkyv, for impl rkyv traits on PrimaryMap and IndexMap. 
TODO:
- [x] in fork rkyv, impl rkyv traits on IndexMap is not guanranteed keep indexmap's order, just a quick hack to see timing (reuse ArchivedHashMap). need to impl a real ArchivedIndexMap
- ~[ ] replace each use of `ModuleMetadata` and children to `ArchivedModuleMetadata` or something like:~
```
enum ModuleMetadataWrap {
    Original(ModuleMetadata),
    Archived(ArchivedModuleMetadata)
}
```
~and field access to `ModuleMetadataWrap` method calls~
- [x] rkyv deserialize is not as fast as rkyv archive, but fast enough and advantage is minimum change to wasmer's struct field access code, so I plan to use rkyv deserialize
- [x] add a test to ensure rkyv does deserialize correctly. I added assert to see deserialized obj is Eq to original, need to turn it to a unit test.
- [x] move impl rkyv::* for PrimaryMap/IndexMap inside this PR, instead of using a fork of rkyv, using a remote-derive technique

# Review

- [ ] Add a short description of the the change to the CHANGELOG.md file


Co-authored-by: Bo Yao <[email protected]>
@bors
Copy link
Contributor

bors bot commented Apr 21, 2021

Build failed:

@ailisp
Copy link
Contributor Author

ailisp commented Apr 21, 2021

@syrusakbary I run cargo fmt --all on two computers give me different format result. So there are two reasons. One of it is I had a ~/.config/rustfmt/rustfmt.toml on one computer, so i delete one, but it still differs. So the other reason is I had one rustc 1.49, anther 1.51. After both use latest 1.51 cargo fmt gives same result.
Do you consider have a fixed rustfmt.toml and rust-toolchain file like we did in nearcore?

https://github.com/near/nearcore/blob/master/rustfmt.toml
https://github.com/near/nearcore/blob/master/rust-toolchain

Or, it is helpful to ask contributors to always upgrade to latest stable rust (as used in ci), before run

make lint
make test
make test-capi
make test-integration

@ailisp
Copy link
Contributor Author

ailisp commented Apr 21, 2021

now make lint, test and test-capi pass, make test-integration fail at a same reason as master, looks unrevelant to this PR:

failures:

---- object_file_engine_works stdout ----
Error: Failed to compile wasm with Wasmer

Caused by:
    No such file or directory (os error 2)
thread 'object_file_engine_works' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /home/bo/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:194:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@syrusakbary
Copy link
Member

Do you consider have a fixed rustfmt.toml and rust-toolchain file like we did in nearcore?

Let's open a PR for rust-toolchain (we don't need a custom rustfmt.toml)

@syrusakbary
Copy link
Member

I've looked and for some reasons tests are failing in Aarch64, but I believe is completely unrelated to this PR.

Merging manually

@syrusakbary
Copy link
Member

@ailisp if you can also open another PR integrating rkyv into the JIT engine that would be great

@syrusakbary syrusakbary merged commit e4fd3c0 into wasmerio:master Apr 21, 2021
@ailisp
Copy link
Contributor Author

ailisp commented Apr 22, 2021

@ailisp if you can also open another PR integrating rkyv into the JIT engine that would be great

Sounds good. We decide to use JIT engine now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants