Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add function to get nul-terminated strings from memory #1092

Merged

Conversation

srenatus
Copy link
Contributor

@srenatus srenatus commented Dec 20, 2019

Description

This is meant to fix #1086.

❓ I'm a bit new to this -- is this how you'd do it? Happy to take directions! πŸ˜ƒ

I've added a note regarding the special case in the comment.

Are there existing tests to expand?

Review

  • Add a short description of the the change to the CHANGELOG.md file

@srenatus srenatus force-pushed the sr/null-terminated-utf8-string-from-wasmptr branch from c43b96f to a7e10be Compare December 20, 2019 15:59
@srenatus
Copy link
Contributor Author

πŸ€” Not sure what happens on travis: https://travis-ci.org/wasmerio/wasmer/builds/627783425#L333 -- when I flip this locally, I get the opposite error, expected i8, found u8.

@srenatus srenatus force-pushed the sr/null-terminated-utf8-string-from-wasmptr branch from a7e10be to 4e6b605 Compare December 20, 2019 16:25
Copy link
Contributor

@MarkMcCaskey MarkMcCaskey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! It looks good, but there is a bit of an issue with this implementation:

CStr::from_ptr requires that there's a null-terminator which we don't check here. That means that the entire function has to be unsafe -- an unsafe function means that the caller has to check the documentation and make sure to up-hold certain invariants, if those invariants are broken then bad things may happen. Leaking unsafety by calling unsafe without guaranteeing all the invariants are up-held makes the API "unsound", which just means that through normal Rust use someone could cause their program to crash.

Looking at the docs of https://doc.rust-lang.org/std/ffi/struct.CStr.html#method.from_ptr ; there are a few things we need to make sure we handle:

  • There is no guarantee to the validity of ptr.
  • The returned lifetime is not guaranteed to be the actual lifetime of ptr.
  • There is no guarantee that the memory pointed to by ptr contains a valid nul terminator byte at the end of the string.
  • It is not guaranteed that the memory pointed by ptr won't change before the CStr has been destroyed.

The first one is almost guaranteed but we need to do bounds checking on memory ourselves here. As-is this function can access out of bounds memory, both at the start and end of the string (e.g. if there is no nul byte in wasm memory, it will continue to read past the end of Wasm memory).

The second one is good because of the lifetime on get_utf8_string_null_terminated<'a>(self, memory: &'a Memory) -> Option<&'a str> this says that the string is valid for as long as &Memory is valid.

The third one is not checked by us or anything we call.

The fourth one is guaranteed for the sake of this function. I'm realizing now (and a little bit before), that our implementation of memory access is unsound. It's possible to mutate Wasm memory with a &Memory. I'll add it to my todo list to look into how invasive fixing that is.

A common pattern in Rust is to provide 2 versions of methods like this, one is unsafe and does minimal or no error checking, the other is safe and cannot be used incorrectly. It's appealing to add an unsafe version of this function, but the worst case for mistakes with this kind of function is very, very bad, so I think we should wait until someone identifies this as a bottleneck in their system and reports it to us before exposing something that dangerous.


Okay lets take a look at what kinds of changes we can make to improve this:

So it looks like based on the functions that CStr offers, that we'll have to find the length ourselves by looking for the nul byte. If that's the case, then it seems like we can split this new function into two logical parts:

  1. find the length by scanning for nul
  2. call get_utf8_string with our length

So something like

memory.view::<u8>()[self.offset..].iter().enumerate().find(|(_, byte)| *byte == 0).and_then(|(length, _)| self.get_utf8_string(memory, length))

Should do it! And without any unsafe, too!

What this does is get a view of memory starting at offset, iterates through it enumerating each byte (so it says this is byte 0, byte 1, byte 2...), finds the first byte that's 0 and returns an Option of item that matches (our (length, byte_value) pair) if it found it or None if it couldn't find it. We then use and_then to take the value out of our Option, if it exists, and call another function which could fail (get_utf8_string), and_then then combines the Option, so if the second function fails, we get a final result of None and if it succeeds then we get Some(&our str)!

Let me know if you'd like more explanation on the specifics of the implementation or if you think you've found a better way to do it!

One other minor point is that the 0-value character itself is usually called nul, null is usually used when referring to pointers. It's an old convention, but the rust docs for CStr also use nul instead of null, so I think it'd be good to use it too.

@MarkMcCaskey
Copy link
Contributor

Oh, one note about our linting/formatting: we stay 1 stable release behind the latest stable, so you'll have to format it with Rust 1.39 (rustup install 1.39.0 to install it and then run rustup override set 1.39.0 in the wasmer directory so the tools know to always use that version in this folder)

@srenatus
Copy link
Contributor Author

@MarkMcCaskey thanks for the insightful response! I'll try to get to it shortly. πŸ˜ƒ

@srenatus srenatus force-pushed the sr/null-terminated-utf8-string-from-wasmptr branch 3 times, most recently from 74c6445 to fa61976 Compare December 20, 2019 18:49
@srenatus
Copy link
Contributor Author

@MarkMcCaskey Can you have another look? I've landed on a slight alteration of your proposal, and renamed the null to nul. Thanks again for taking the time to lay out how unsafe is to be handled in Rust, I've learnt something today! πŸ’‘

Let's see what travis thinks of this. (The error befor wasn't linting related, was it? But it seems like I'm unable to run make spectests-singlepass locally πŸ€”)

@srenatus srenatus force-pushed the sr/null-terminated-utf8-string-from-wasmptr branch from fa61976 to 36226fa Compare December 20, 2019 18:55
@MarkMcCaskey
Copy link
Contributor

@srenatus Singlepass requires nightly! everything else works with stable 1.39! Sounds good, I'll review it!

Copy link
Contributor

@MarkMcCaskey MarkMcCaskey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that's great! I wasn't familiar with the position method on iterators, but that's exactly what we needed here.

Great work!

One last, thing can you add an entry into the changelog for this as it's a new public API?

Once these last few points are addressed we can ship it!

/// Get a UTF-8 string representation of this `WasmPtr`, where the string is nul-terminated.
/// Note that this does not account for UTF-8 strings that _contain_ nul themselves,
/// [`get_utf8_string`] has to be used for those.
pub fn get_utf8_string_nul_terminated<'a>(self, memory: &'a Memory) -> Option<&'a str> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change the name of this to get_utf8_string_with_nul to follow the naming conventions of the methods on CStr?

.iter()
.map(|cell| cell.get())
.position(|byte| byte == 0)
.and_then(|length| self.get_utf8_string(memory, length.try_into().unwrap()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛳️
length.try_into().unwrap() -> length as u32

try_into is better for being super generic but Wasmer is written in a way that there's no chance that it will work on systems with pointers smaller than 32 bits, so it's safe to just do the cast directly! If we want to run on 16bit machines in the future, we'll have to do a ton of refactoring anyways!

@srenatus srenatus changed the title Add function to get NULL-terminated strings from memory Add function to get nul-terminated strings from memory Dec 20, 2019
@srenatus srenatus force-pushed the sr/null-terminated-utf8-string-from-wasmptr branch from 36226fa to 782be5b Compare December 20, 2019 20:53
@srenatus
Copy link
Contributor Author

Done! πŸ˜ƒ

Copy link
Contributor

@MarkMcCaskey MarkMcCaskey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks perfect -- thanks!!

@MarkMcCaskey
Copy link
Contributor

bors r+

bors bot added a commit that referenced this pull request Dec 20, 2019
1092: Add function to get nul-terminated strings from memory r=MarkMcCaskey a=srenatus

## Description

This is meant to fix #1086.

❓ I'm a bit new to this -- is this how you'd do it? Happy to take directions! πŸ˜ƒ 

I've added a note regarding the special case in the comment.

Are there existing tests to expand?

## Review

- [x] Add a short description of the the change to the CHANGELOG.md file


Co-authored-by: Stephan Renatus <[email protected]>
@bors
Copy link
Contributor

bors bot commented Dec 20, 2019

Build succeeded

@bors bors bot merged commit 782be5b into wasmerio:master Dec 20, 2019
@srenatus srenatus deleted the sr/null-terminated-utf8-string-from-wasmptr branch December 21, 2019 07:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add method for reading C-style strings from WasmPtr
2 participants