Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a buffer abstraction for files in virtual_fs::mem_fs and Module::deserialize() #3875

Open
Michael-F-Bryan opened this issue May 17, 2023 · 2 comments
Labels
🎉 enhancement New feature! priority-medium Medium priority issue
Milestone

Comments

@Michael-F-Bryan
Copy link
Contributor

Motivation

The virtual_fs::mem_fs::file::ReadOnlyFile struct currently uses a Cow<'static, [u8]> for its backing bytes.

/// Read only file that uses copy-on-write
#[derive(Debug)]
pub(super) struct ReadOnlyFile {
buffer: Cow<'static, [u8]>,
}

On its own this is fine, however, we uses a lot of memory-mapped files in order to keep RAM usage down, and to deal with the fact that any slices coming from a memory-mapped file will have a non-'static lifetime, we transmute() the lifetime away and are "really careful" that the Mmap object they came from will outlive the virtual_fs::mem_fs::FileSystem object they are stored in.

// FIXME(Michael-F-Bryan): This is pretty sketchy.
// We should be using some sort of reference-counted
// pointer to some bytes that are either on the heap
// or from a memory-mapped file. However, that's not
// possible here because things like memfs and
// WasiEnv are expecting a Cow<'static, [u8]>. It's
// too hard to refactor those at the moment, and we
// were pulling the same trick before by storing an
// "ownership" object in the BinaryPackageCommand,
// so as long as packages aren't removed from the
// module cache it should be fine.
let atom: &'static [u8] =
unsafe { std::mem::transmute(command.atom()) };

Possible Solutions

The simplest solution is to take the &[u8] we get from a BinaryPackageCommand and create a copy of it with into_owned(). That's the approach originally taken in #3777, but it was reverted in 02b8e42 after comments from @theduke and @john-sharratt (#3777 (comment), #3777 (comment)).

A smarter solution would be to introduce something that is either backed by a memory-mapped file or an in-memory object. That's the approach taken by SharedBytes in the webc crate.

pub struct SharedBytes(Repr);

impl Deref for SharedBytes { /* delegate to Repr::as_slice() */ }

enum Repr {
  InMemory(bytes::Bytes),
  #[cfg(feature = "mmap")]
  Mmap {
    mmap: memmap2::Mmap,
    range: Range<usize>,
  }
}

impl Repr {
    fn as_slice(&self) -> &[u8] {
        match self {
            Repr::Bytes(b) => b,
            #[cfg(feature = "mmap")]
            Repr::Mapped { mmap, range } => &mmap[range.clone()],
        }
    }
}

Alternatively, we could add an extra level of indirection and use something like Arc<dyn AsRef<[u8]>> (or a custom trait). That way we'd be able to use Arc<[u8]> or Arc<MmapBackedSlice>, where MmapBackedSlice does that Repr::as_slice() trick.

@Michael-F-Bryan
Copy link
Contributor Author

I abstracted the SharedBytes type from webc into a shared-buffers crate that we can use. The entire dependency tree is 1-2 packages depending on the platform, and it was relatively trivial to update the webc crate's webc::v2::read::OwnedReader so it automatically uses memory-mapped files where possible (https://github.com/wasmerio/pirita/pull/114).

Michael-F-Bryan pushed a commit that referenced this issue May 19, 2023
Michael-F-Bryan pushed a commit that referenced this issue May 22, 2023
@ptitSeb ptitSeb added this to the v4.0 milestone May 23, 2023
@ptitSeb ptitSeb added the priority-high High priority issue label May 23, 2023
@Michael-F-Bryan Michael-F-Bryan changed the title Use a buffer abstraction for files in virtual_fs::mem_fs Use a buffer abstraction for files in virtual_fs::mem_fs and Module::deserialize() May 23, 2023
@Michael-F-Bryan
Copy link
Contributor Author

Michael-F-Bryan commented May 23, 2023

@john-sharratt also suggested using something like this when deserializing modules on Slack (#3882).

@ptitSeb ptitSeb modified the milestones: v4.0, v4.1 May 23, 2023
@ptitSeb ptitSeb added priority-medium Medium priority issue and removed priority-high High priority issue labels May 23, 2023
Michael-F-Bryan pushed a commit that referenced this issue May 24, 2023
Michael-F-Bryan pushed a commit that referenced this issue May 25, 2023
theduke pushed a commit that referenced this issue May 25, 2023
Michael-F-Bryan pushed a commit that referenced this issue May 25, 2023
theduke pushed a commit that referenced this issue May 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🎉 enhancement New feature! priority-medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

2 participants