
Fastest way to transfer data between WASM and Rust? #1249

Closed · spease opened this issue Feb 26, 2020 · 16 comments
Labels: 📦 lib-types (About wasmer-types) · priority-medium (Medium priority issue) · ❓ question (I've a question!) · 🏚 stale (Inactive issues or PR)

@spease commented Feb 26, 2020

Currently I'm using memory.view() to transfer byte data between WASM and Rust. This data is generally arbitrarily placed and sized. It may be possible for the size to be a compile-time constant, but it is currently set at runtime.

I've been disappointed that to get or set the data I have to call cell.get() or cell.set() for every single byte. This seems likely to slow things down needlessly unless significant optimizations kick in. In addition, it prevents me from using the WASM memory directly, e.g. passing it to read as a buffer.

I've noticed there's an atomically() function. This changes the type of each byte to AtomicU8 rather than Cell.

Which of these is closer to the native representation?

Which is expected to generally be faster?

Why isn't there a way to directly access the bytes of the WASM memory space?

Thanks.

@CryZe commented Feb 26, 2020

The problem is that you need some amount of interior mutability to represent the wasm memory, as the wasm code could change any of these bytes at any time without properly communicating that back to Rust. So if the wasm is single-threaded, the best way to represent that interior mutability is via &[Cell<u8>], and in multi-threaded wasm via &[AtomicU8]. However, I believe that as long as raw pointers are used you are free to alias data anyway, so some form of *mut [u8] plus careful usage should be fine too.
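
To make the two representations concrete, here's a minimal standalone sketch (illustrative only; these helpers are not wasmer's API):

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU8, Ordering};

// Single-threaded wasm: interior mutability via Cell.
fn read_byte(mem: &[Cell<u8>], offset: usize) -> u8 {
    mem[offset].get()
}

// Multi-threaded (shared) wasm: interior mutability via atomics.
fn read_byte_shared(mem: &[AtomicU8], offset: usize) -> u8 {
    mem[offset].load(Ordering::Relaxed)
}

fn main() {
    let mem = [Cell::new(0u8), Cell::new(42)];
    assert_eq!(read_byte(&mem, 1), 42);

    let shared = [AtomicU8::new(0), AtomicU8::new(42)];
    assert_eq!(read_byte_shared(&shared, 1), 42);
}
```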

@MarkMcCaskey commented Feb 26, 2020

I'm not 100% sure how to avoid undefined behavior here, but UnsafeCell is the primitive that Rust uses for shared mutability through immutable pointers, and Cell uses UnsafeCell. Cell is a compile-time construct that tells the compiler the data can be mutated, so there should be very little overhead (ideally none). In fact, the transmute of &[Cell<u8>] into &[u8], and probably into *mut [u8], is safe (well, only kind of: the Rust stdlib source code does it but reserves the right to break the behavior in the future, so use at your own risk).

Unfortunately, from my own tests, some uses of Cell can actually lead to worse code being generated. This is partially the point of Cell (certain optimizations around immutable data don't apply), but it also seems that doing it one u8 at a time can be slower. I haven't looked at this recently, but I'd expect setting with &[Cell<u64>] to generate pretty decent code.

Perhaps we should have a memcpy API on memory though for large data transfers 🤔
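
Such a helper might look roughly like this (a sketch; copy_into_view is a made-up name, not a wasmer API):

```rust
use std::cell::Cell;

// Hypothetical memcpy-style helper for bulk writes into a memory view.
fn copy_into_view(view: &[Cell<u8>], offset: usize, src: &[u8]) {
    let dst = &view[offset..offset + src.len()]; // panics if out of bounds
    for (cell, byte) in dst.iter().zip(src) {
        cell.set(*byte);
    }
}

fn main() {
    // Stand-in for a wasm memory view; in real code this would come from
    // something like memory.view().
    let backing: Vec<Cell<u8>> = vec![Cell::new(0); 16];
    copy_into_view(&backing, 4, b"hello");
    assert_eq!(backing[4].get(), b'h');
}
```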

@CryZe commented Feb 26, 2020

> but reserves the right to break the behavior in the future

I don't think that's true. Raw pointers are defined not to follow the aliasing model; otherwise any C FFI wouldn't work. So the stacked borrows model explicitly opts raw pointers out of the no-aliasing rules while they are in use.

So if you run this example, which mutably aliases the same slice twice, miri won't complain about it:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=bd388f82eec9f8a31f3c96526d27b77c
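
The playground contents aren't reproduced here, but a minimal reconstruction of the pattern (my sketch, not the original code) that miri accepts looks like this:

```rust
use std::slice;

fn main() {
    let mut data = [0u8; 8];
    let len = data.len();
    let ptr = data.as_mut_ptr();
    unsafe {
        // First mutable slice over the buffer, derived from the raw pointer.
        let a = slice::from_raw_parts_mut(ptr, len);
        a[0] = 1;
        // Second mutable slice over the *same* buffer, created from the same
        // raw pointer once the first is no longer used. Safe references
        // would forbid this aliasing; raw pointers opt out of those rules.
        let b = slice::from_raw_parts_mut(ptr, len);
        b[0] = 2;
    }
    assert_eq!(data[0], 2);
}
```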

@MarkMcCaskey commented:

That's good to know! Yeah, I think the line is between mutable and immutable data. It's fine to cast *mut T to *const T and then use unsafe to mutate it, but it's not okay to cast *const T to *mut T and mutate it: the thing that matters is where the memory came from. If it came from Rust and was marked mut, then it's safe to mutate; if it wasn't marked mut, then it's UB. In this case the memory is coming from outside of Rust, so *mut should be fine to use here!

Hmm that implies that we can even have a &Memory -> *mut u8 and be safe. Though I'd prefer an API that doesn't require users to do bounds checking 🤔

It wasn't super clear in my comment but by "reserves the right to break this in the future" I was referring to the cast between Cell<T> and T. From https://doc.rust-lang.org/src/core/cell.rs.html#1557-1562 :

```rust
// We can just cast the pointer from `UnsafeCell<T>` to `T` because of
// #[repr(transparent)]. This exploits libstd's special status, there is
// no guarantee for user code that this will work in future versions of the compiler!
```

@CryZe commented Feb 27, 2020

Yeah, that's mostly because UnsafeCell is the one and only way to obtain mutable data from immutably borrowed memory. So you can't replicate this without involving UnsafeCell somewhere within your type to get "interior mutability". So what could likely happen here is that Memory is internally an UnsafeCell<[u8]> or so, to obtain *mut [u8], and then any kind of modification should be good to go. Though you may need to be careful when getting "normal" &[u8] and &mut [u8] slices from that; I'm not entirely sure how the stacked borrows model treats that (I believe some temporary ones are fine).

This would probably require lots of experimentation against miri.
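
A minimal sketch of that shape (illustrative only; wasmer's actual Memory type is more involved):

```rust
use std::cell::UnsafeCell;

// The bytes live behind an UnsafeCell, so a *mut u8 can be handed out
// from a shared &Memory without lying to the compiler.
struct Memory {
    data: UnsafeCell<Vec<u8>>,
}

impl Memory {
    fn as_mut_ptr(&self) -> *mut u8 {
        unsafe { (*self.data.get()).as_mut_ptr() }
    }
}

fn main() {
    let mem = Memory { data: UnsafeCell::new(vec![0u8; 8]) };
    unsafe { *mem.as_mut_ptr().add(3) = 42 };
    unsafe { assert_eq!(*mem.as_mut_ptr().add(3), 42) };
}
```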

@pepyakin commented:

On a somewhat related note, the documentation for view states:

> This method is safe (as in, it won't cause the host to crash or have UB), but it doesn't obey rust's rules involving data races, especially concurrent ones. Therefore, if this memory is shared between multiple threads, a single memory location can be mutated concurrently without synchronization.

However, AFAIK, data races are UB in Rust, and FWIW in C/C++ too.

@MarkMcCaskey commented:

@pepyakin I agree, that doc comment seems wrong. There are complications about where the data races happen, though: if two Wasm modules run concurrently and in parallel and both access the same shared memory without atomic memory operations, then data races are the "correct" result. The Wasm spec allows this situation to happen, though I believe we could force all uses of shared memory to be atomic to mitigate some of it and still be in compliance with the Wasm spec. Data races involving the host are no good, though, and I'm not sure what a safe API for that looks like. If a Wasm module is accessing the same memory at the same time without atomic memory accesses, it seems like we're still in a bad place even if the host is using atomic memory operations.

It probably depends on the level of security guarantees that we want to make. We could probably, for example, create global locks on memory if we really needed to, but that doesn't seem like a good solution in general. Or perhaps we can expose an API to configure/implement memory to allow users to do whatever they need for their specific use case.

@CryZe I recently read that it's the source mutability that matters for actually mutating... I can't remember exactly where, but I became convinced in the past week or two that as long as it originally came from a mutable location, then it can be cast to immutable and back to mutable and nothing bad will happen when mutating it. If I can remember where I read that I'll search for it and post a link here!

> Though you may need to be careful when getting "normal" &[u8] and &mut [u8] slices from that

That's true; any mapping of high-level language types with guarantees, like & and &mut, onto something this low-level seems bound to cause undefined behavior in general. We'll probably need to handle different use cases separately (shared vs. unshared, shared with guarantees of atomic access vs. shared without, etc.). The challenge then becomes making a nice API for something with a ton of inherent complexity.

@spease commented Feb 28, 2020

I don’t want to interrupt the discussion, just wanted to say-

From a user perspective, the confusing thing to me is that it looks like access is locked/synchronized for each byte individually, which means the conceptual data structure I'm accessing (a data buffer, e.g. 16 KiB+ for streaming chunks of data) is still subject to data races. I saw that the view function is generic so I can specify a different type, but then I don't know how I would index it if the address allocated for the buffer ends up not being a multiple of 16 Ki. I've never heard of something being aligned based on the size of the entire buffer; usually it's 4, 8, or 16 bytes.

What I thought maybe I was supposed to do then was specify a byte offset somewhere and then access the view using the type, but I didn’t see an option for that.

In practice, for multiple instances, I'd expect you'd want a small amount of memory to be atomic (e.g. for a mutex) and the rest of the data buffer to be unsynchronized, with the exception of some fence to make sure everything has been written through to RAM before the mutex is unlocked and the next wasm instance has a go. Even for data structures I'd expect this to be the most common model; it'd mostly be counters or flags where people would want integer-sized atomic accesses.

I'm not sure how you would do this interoperability, though, since I'd expect that transmuting some memory to a Mutex type on the x86_64 side would not map exactly to a wasm mutex type. In this case it would be nice if wasmer provided an API, but synchronization primitives that can be used on both the wasm and x86_64 sides would make sense to me as well.

In my case right now, I’m only running wasm on a single thread so entirely unsynchronized should be ok.

@MarkMcCaskey commented:

Yeah, the array-style indexing has been around for a while; it'd be good to have an API that makes it easier to access properly aligned types that don't fit that model. Rust doesn't like unaligned memory accesses, but we can probably provide a copying unaligned-access function too (perhaps the aligned access should copy as well). It's possible to hold references to always-valid Rust data types in Wasm memory, though not in shared memory, and in unshared memory the lifetime of the reference must end before the next time the Wasm can execute; this should work today with the current API.

> From a user perspective, the confusing thing to me is that it looks like access is locked/synchronized for each byte individually, which means the conceptual data structure I'm accessing (a data buffer, e.g. 16 KiB+ for streaming chunks of data) is still subject to data races.

There's no locking when memory is accessed currently; it's all direct access, with the exception of things like memory.grow, which does acquire internal locks on dynamic memories.

Traditionally, a data race is when, through the lack of atomics in a shared setting, data is partially updated or observable in an invalid state. When dealing with types that are larger than the atomics your system supports, you'd normally have to use a higher-level synchronization object like a mutex. The problem is that if a participant in the memory access is untrusted (as Wasm modules are in general), then there is no way to prevent logical race conditions at this level. However, because we control the Wasm execution, we can force a lock on its memory/execution at the host level and get safe access to its memory if that matters. But it shouldn't be an issue in general, to be honest: the host should treat memory from the guest as malicious and possibly invalid (it may be corrupt even if you trust the Wasm module). To put it more directly: it's the responsibility of the host to handle bad input, so given proper atomic access to memory it shouldn't matter if the guest is attempting to maliciously edit that same memory (with the one caveat that I don't fully understand the interaction between atomic and non-atomic memory accesses, so there could be an issue there).

However, even if the guest could partially corrupt its own data, the host shouldn't trust any types from the guest that aren't valid in all possible bit patterns. This is what the unsafe ValueType trait that we provide is for: if something is a ValueType, it can be copied and it's always valid (for example, bool is not valid for all bit patterns in Rust and is not safe to transfer from guest to host, but an i32 is). So from that perspective, even if there are data races and you get half of one i32 and half of another, the host is fully in control and it just sees a number. If that number is valid it can accept it; otherwise it can return an error code to the guest or terminate the guest's execution.
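
To illustrate the idea (a standalone sketch, deliberately not named ValueType since it isn't wasmer's actual definition):

```rust
// Only types valid under every bit pattern may be copied across the
// guest/host boundary.
unsafe trait PodValue: Copy {}

unsafe impl PodValue for i32 {} // any 4 bytes form a valid i32
unsafe impl PodValue for u8 {}  // any byte is a valid u8
// No impl for bool: only the bit patterns 0 and 1 are valid, so
// reinterpreting arbitrary guest bytes as bool would be UB.

fn copy_from_guest<T: PodValue>(raw: &[u8]) -> Option<T> {
    if raw.len() < std::mem::size_of::<T>() {
        return None;
    }
    // read_unaligned copies the bytes out; sound because T is PodValue.
    Some(unsafe { (raw.as_ptr() as *const T).read_unaligned() })
}

fn main() {
    // 0x2a = 42 in the low byte (little-endian host assumed).
    let guest_bytes = [0x2a, 0, 0, 0, 0xff];
    assert_eq!(copy_from_guest::<i32>(&guest_bytes), Some(42));
}
```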

> In practice, for multiple instances, I'd expect you'd want a small amount of memory to be atomic (e.g. for a mutex) and the rest of the data buffer to be unsynchronized, with the exception of some fence to make sure everything has been written through to RAM before the mutex is unlocked and the next wasm instance has a go. Even for data structures I'd expect this to be the most common model; it'd mostly be counters or flags where people would want integer-sized atomic accesses.

> I'm not sure how you would do this interoperability, though, since I'd expect that transmuting some memory to a Mutex type on the x86_64 side would not map exactly to a wasm mutex type. In this case it would be nice if wasmer provided an API, but synchronization primitives that can be used on both the wasm and x86_64 sides would make sense to me as well.

In general, hosts should not communicate through Wasm memory, due to malicious Wasm modules and possible internal corruption. Doing so would be in the realm of using unsafe in Rust: it's not safe in general, but it may be in your case if you're careful. So it may make sense to have unsafe APIs for that if it's a common enough use case!

@spease commented Feb 28, 2020

Just to be clear, this is either the host streaming data into the guest or the guest streaming data into the host. Is there a better way to do this than Ctx.memory()?

@MarkMcCaskey commented Mar 3, 2020

@spease I was referring to getting data from the guest as the host, mostly.

So it depends. If you look at the design of something like wasi::fd_readdir, you'll see something that looks like an iterator, where the guest supplies pointers to write the data to and the host returns a value that tells the guest what to pass back to the syscall to keep reading that data.

Abstractly that's a pretty nice design for most types of bulk data transfer unless the data will exist on the guest as a linear array, in which case it can be transferred to the host in one go.
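
A rough sketch of that protocol shape (all names here are illustrative, not a real WASI or wasmer API):

```rust
struct ReadResult {
    bytes_written: u32,
    next_cookie: u64, // the guest passes this back to keep reading
}

// Host-side view: fill the guest-supplied buffer starting at `cookie`.
fn read_chunk(source: &[u8], cookie: u64, buf: &mut [u8]) -> ReadResult {
    let start = (cookie as usize).min(source.len());
    let end = (start + buf.len()).min(source.len());
    buf[..end - start].copy_from_slice(&source[start..end]);
    ReadResult {
        bytes_written: (end - start) as u32,
        next_cookie: end as u64,
    }
}

fn main() {
    let data: Vec<u8> = (0u8..100).collect();
    let mut buf = [0u8; 32];
    let mut cookie = 0u64;
    // The guest would loop like this until no more bytes are returned.
    loop {
        let res = read_chunk(&data, cookie, &mut buf);
        if res.bytes_written == 0 {
            break;
        }
        cookie = res.next_cookie;
    }
    assert_eq!(cookie, 100);
}
```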

I'd recommend not worrying about memory in general right now and just writing for the single-threaded use case.

I forgot to mention WasmPtr before! This is my preferred way to deal with Wasm linear memory from the host side. You can use it directly in your host function arguments too (e.g. fn my_host_call(ctx: &mut Ctx, data: WasmPtr<MyDataType, Array>, data_len: u32), as long as MyDataType implements ValueType).
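
Putting that together, a hedged sketch of what such a host function looked like in the Ctx-era (wasmer 0.x) API; module paths, the exact deref signature, and the names MyDataType/my_host_call are illustrative and should be checked against the version in use:

```rust
use wasmer_runtime::{Array, Ctx, WasmPtr};
use wasmer_runtime::types::ValueType;

#[derive(Copy, Clone)]
#[repr(C)]
struct MyDataType {
    value: i32,
}

// Unsafe to implement: asserts every bit pattern is a valid MyDataType.
unsafe impl ValueType for MyDataType {}

fn my_host_call(ctx: &mut Ctx, data: WasmPtr<MyDataType, Array>, data_len: u32) {
    let memory = ctx.memory(0);
    // deref returns None if pointer + length would fall out of bounds.
    if let Some(cells) = data.deref(memory, 0, data_len) {
        for cell in cells {
            let item = cell.get(); // copy the element out of wasm memory
            let _ = item.value;
        }
    }
}
```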

One thing to note: I just realized that WasmPtr::deref_mut is definitely unsound. Edit: never mind, it's already marked as unsafe!

Also apologies about the poor documentation on the WasmPtr docs page, I'll make a PR now to at least partially fix it!

stale bot commented Jul 14, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@Hywan commented Jul 15, 2021

In #2442, new methods have been added to MemoryView, such as copy_from or subarray:

```rust
/// Copy the contents of the source slice into this `MemoryView`.
///
/// This function will efficiently copy the memory from within the wasm
/// module’s own linear memory to this typed array.
///
/// # Safety
///
/// This method is unsafe because the caller will need to make sure
/// there are no data races when copying memory into the view.
pub unsafe fn copy_from(&self, src: &[T]) {
    // We cap at a max length
    let sliced_src = &src[..self.length];
    for (i, byte) in sliced_src.iter().enumerate() {
        *self.ptr.offset(i as isize) = *byte;
    }
}
```

@spease Do you need more API, or is your problem solved?

stale bot commented Oct 20, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@spease commented Oct 20, 2022

I've moved on from the company I was working with when I filed this issue, but copy_from does look like it would have solved my use case of moving more than one byte of data at a time. Thanks for implementing it; I'm sure it will be useful for someone. (I was surprised there didn't seem to be something like it already, given how ubiquitous I'd expect the need to move data in and out of a wasm sandbox to be.)

spease closed this as completed Oct 20, 2022
@webmaster128 commented:

From Wasmer 3 onwards, WasmSlice seems to be available to do this directly.
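
For instance, something along these lines should work with Wasmer 3 (a sketch based on my reading of the docs; read_guest_bytes is my name, and the exact signatures should be verified against the version in use):

```rust
use wasmer::{MemoryAccessError, MemoryView, WasmPtr};

// Bulk-copy `len` bytes out of guest memory starting at `offset`.
fn read_guest_bytes(
    view: &MemoryView,
    offset: u32,
    len: u32,
) -> Result<Vec<u8>, MemoryAccessError> {
    // A WasmSlice is built from a typed WasmPtr plus a length...
    let slice = WasmPtr::<u8>::new(offset).slice(view, len)?;
    // ...and can copy the whole range out in one call.
    slice.read_to_vec()
}
```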
