Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up core WASI FS code #2247

Merged
merged 2 commits into from
May 19, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ Looking for changes that affect our C API? See the [C API Changelog](lib/c-api/C
- [#2149](https://github.com/wasmerio/wasmer/pull/2144) `wasmer-engine-native` looks for clang-11 instead of clang-10.

### Fixed
- [#2247](https://github.com/wasmerio/wasmer/pull/2247) Internal WasiFS logic updated to be closer to what WASI libc does when finding a preopened fd for a path.
- [#2208](https://github.com/wasmerio/wasmer/pull/2208) Fix ownership in Wasm C API of `wasm_extern_as_func`, `wasm_extern_as_memory`, `wasm_extern_as_table`, `wasm_extern_as_global`, `wasm_func_as_extern`, `wasm_memory_as_extern`, `wasm_table_as_extern`, and `wasm_global_as_extern`. These functions no longer allocate memory and thus their results should not be freed. This is a breaking change to align more closely with the Wasm C API's stated ownership.
- [#2210](https://github.com/wasmerio/wasmer/pull/2210) Fix a memory leak in the Wasm C API in the strings used to identify imports and exports coming from user code.
- [#2108](https://github.com/wasmerio/wasmer/pull/2108) The Object Native Engine generates code that now compiles correctly with C++.
Expand Down
65 changes: 20 additions & 45 deletions lib/wasi/src/state/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -784,7 +784,7 @@ impl WasiFs {
let file_type = metadata.file_type();
// we want to insert newly opened dirs and files, but not transient symlinks
// TODO: explain why (think about this deeply when well rested)
let mut should_insert = false;
let should_insert;

let kind = if file_type.is_dir() {
should_insert = true;
Expand All @@ -803,6 +803,7 @@ impl WasiFs {
fd: None,
}
} else if file_type.is_symlink() {
should_insert = false;
let link_value = file.read_link().ok().ok_or(__WASI_EIO)?;
debug!("attempting to decompose path {:?}", link_value);

Expand Down Expand Up @@ -957,18 +958,15 @@ impl WasiFs {
Ok(cur_inode)
}

/// Splits a path into the first preopened directory that is a parent of it,
/// if such a preopened directory exists, and the rest of the path.
///
/// NOTE: this behavior seems to be not the same as what libpreopen is
/// doing in WASI.
///
/// TODO: evaluate users of this function and explain why this behavior is
/// not the same as libpreopen or update its behavior to be the same.
/// Splits a path into the preopened directory that has the largest prefix of
/// the given path, if such a preopened directory exists, and the rest of the path.
Copy link
Contributor

@Earthmark Earthmark Apr 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wording of this comment confused me for a bit, this is just my perspective and additional data points should be gathered if this actually matters.

Either way it's a massive nitpick, feel free to ignore this.

With this though, my mind interperted the second sentence fragment "if such a preopened directory exists" to be the final part of the sentence, and then the "and the rest of the path" was confusing and didn't correctly link in my mind that it was talking about the second tuple value.

Instead, can I request that the topics of the "path of the preopened directory" and the "rest of the path" be located near eachother in the comment, with the error case being near the end of the line? That way it mirrors the structure of the result value, so reading the return value and comment is a similar read.

fn path_into_pre_open_and_relative_path(
&self,
path: &Path,
) -> Result<(__wasi_fd_t, PathBuf), __wasi_errno_t> {
let mut max_prefix_len = 0;
let mut max_stripped_path = None;
let mut max_po_fd = None;
// for each preopened directory
for po_fd in &self.preopen_fds {
let po_inode = self.fd_map[po_fd].inode;
Expand All @@ -978,46 +976,23 @@ impl WasiFs {
_ => unreachable!("Preopened FD that's not a directory or the root"),
};
// stem path based on it
if let Ok(rest) = path.strip_prefix(po_path) {
// if any path meets this criteria
// (verify that all remaining components are not symlinks except for maybe last? (or do the more complex logic of resolving intermediary symlinks))
// return preopened dir and the rest of the path

return Ok((*po_fd, rest.to_owned()));
}
}
Err(__WASI_EINVAL) // this may not make sense
}

// if this is still dead code and the year is 2020 or later, please delete this function
#[allow(dead_code)]
pub(crate) fn path_relative_to_fd(
&self,
fd: __wasi_fd_t,
inode: Inode,
) -> Result<PathBuf, __wasi_errno_t> {
let mut stack = vec![];
let base_fd = self.get_fd(fd)?;
let base_inode = base_fd.inode;
let mut cur_inode = inode;

while cur_inode != base_inode {
stack.push(self.inodes[cur_inode].name.clone());
match &self.inodes[cur_inode].kind {
Kind::Dir { parent, .. } => {
if let Some(p) = parent {
cur_inode = *p;
}
if let Ok(stripped_path) = path.strip_prefix(po_path) {
// find the max
let new_prefix_len = po_path.as_os_str().len();
// we use >= to favor later preopens because we iterate in order
// whereas WASI libc iterates in reverse to get this behavior.
if new_prefix_len >= max_prefix_len {
max_prefix_len = new_prefix_len;
max_stripped_path = Some(stripped_path);
max_po_fd = Some(*po_fd);
}
_ => return Err(__WASI_EINVAL),
}
}

let mut out = PathBuf::new();
for p in stack.iter().rev() {
out.push(p);
if max_prefix_len == 0 {
Err(__WASI_EINVAL) // this may not make sense depending on where it's called
} else {
Ok((max_po_fd.unwrap(), max_stripped_path.unwrap().to_owned()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I am very new, but I figure I should introduce myself by bein useful with a code review!)

I wonder if it is possible to structure this so the unwraps are not required. It seems like this function pivots once a candidate is found, would it be possible to fold the three parameters into a single Result value, and have the last step here be over the existing result methods instead of using unwrap?

The unwrap was a red flag for me, and I almost posted about the empty-list case, however these unwraps are safe because the max_prefix_len check and that if it's not null, these two are populated. Recognizing that requires me to mull over the state machine to figure out that the three are correlated, where as I feel the code could directly tell me that.

Either way, still works the same, just a clarity for me!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a good observation! I agree, but it's a bit tricky here.

Sorry by the way -- I missed these comments before -- reviewing them now, I'll make a new PR with any updates I make. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, we can solve this problem with a typed state machine, I'm just not sure it's really worth it given the size of this function. If it were larger, I'd definitely be in favor though. The thing is it feels like I'll at least double the amount of code to do it that way...

Like

enum BaseFdAndRelPath {
  None,
  BestMatch  {
    fd: __wasi_fd_t,
    rel_path: PathBuf,
    max_seen: usize,
  }
}

Impl  BaseFdAndRelPath {
  fn max_seen(&self) -> usize {
     match self {
       Self::None => 0,
       Self::BestMatch { max_seen, .. } => max_seen,
     }
  }
}

Actually, that's not too bad with the method... I'll try it out

Copy link
Contributor Author

@MarkMcCaskey MarkMcCaskey May 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like https://doc.rust-lang.org/std/cmp/fn.max_by.html could also work here, unfortunately it's not stable. That would lead to simplified code too. I'm pretty happy with the result I have, I do think the code is more clear now! Thanks!

The above Enum that I posted could definitely be made generic and reused with generic data for finding the most X of a collection of things, could probably even move control flow into a method on this type.

Edit: but really for this code if we wanted to do it properly we'd probably just use a proper prefix-search data structure like a trie of the preopen FD paths to upgrade the complexity from M * N (where M is the length of the user given path and N is the number of preopens) to just M.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed it up to #2325

(I am very new, but I figure I should introduce myself by bein useful with a code review!)

Nice to meet you and thanks for the suggestions! We have a community slack as well where you can ping us if we ever miss something like this. If you're interested in hacking on some Wasm / Wasmer stuff, feel free to reach out!

}
Ok(out)
}

/// finds the number of directories between the fd and the inode if they're connected
Expand Down