-
Notifications
You must be signed in to change notification settings - Fork 824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add function to get nul-terminated strings from memory #1092
Add function to get nul-terminated strings from memory #1092
Conversation
c43b96f
to
a7e10be
Compare
π€ Not sure what happens on travis: https://travis-ci.org/wasmerio/wasmer/builds/627783425#L333 -- when I flip this locally, I get the opposite error, expected |
a7e10be
to
4e6b605
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! It looks good, but there is a bit of an issue with this implementation:
CStr::from_ptr
requires that there's a null-terminator which we don't check here. That means that the entire function has to be unsafe -- an unsafe
function means that the caller has to check the documentation and make sure to up-hold certain invariants, if those invariants are broken then bad things may happen. Leaking unsafety by calling unsafe
without guaranteeing all the invariants are up-held makes the API "unsound", which just means that through normal Rust use someone could cause their program to crash.
Looking at the docs of https://doc.rust-lang.org/std/ffi/struct.CStr.html#method.from_ptr ; there are a few things we need to make sure we handle:
- There is no guarantee to the validity of ptr.
- The returned lifetime is not guaranteed to be the actual lifetime of ptr.
- There is no guarantee that the memory pointed to by ptr contains a valid nul terminator byte at the end of the string.
- It is not guaranteed that the memory pointed by ptr won't change before the CStr has been destroyed.
The first one is almost guaranteed but we need to do bounds checking on memory ourselves here. As-is this function can access out of bounds memory, both at the start and end of the string (e.g. if there is no nul byte in wasm memory, it will continue to read past the end of Wasm memory).
The second one is good because of the lifetime on get_utf8_string_null_terminated<'a>(self, memory: &'a Memory) -> Option<&'a str>
this says that the string is valid for as long as &Memory
is valid.
The third one is not checked by us or anything we call.
The fourth one is guaranteed for the sake of this function. I'm realizing now (and a little bit before), that our implementation of memory access is unsound. It's possible to mutate Wasm memory with a &Memory
. I'll add it to my todo list to look into how invasive fixing that is.
A common pattern in Rust is to provide 2 versions of methods like this, one is unsafe
and does minimal or no error checking, the other is safe and cannot be used incorrectly. It's appealing to add an unsafe
version of this function, but the worst case for mistakes with this kind of function is very, very bad, so I think we should wait until someone identifies this as a bottleneck in their system and reports it to us before exposing something that dangerous.
Okay lets take a look at what kinds of changes we can make to improve this:
So it looks like based on the functions that CStr
offers, that we'll have to find the length ourselves by looking for the nul
byte. If that's the case, then it seems like we can split this new function into two logical parts:
- find the length by scanning for
nul
- call
get_utf8_string
with our length
So something like
memory.view::<u8>()[self.offset..].iter().enumerate().find(|(_, byte)| *byte == 0).and_then(|(length, _)| self.get_utf8_string(memory, length))
Should do it! And without any unsafe
, too!
What this does is get a view of memory starting at offset, iterates through it enumerating each byte (so it says this is byte 0, byte 1, byte 2...), finds the first byte that's 0 and returns an Option of item that matches (our (length, byte_value)
pair) if it found it or None
if it couldn't find it. We then use and_then
to take the value out of our Option, if it exists, and call another function which could fail (get_utf8_string), and_then
then combines the Option
, so if the second function fails, we get a final result of None
and if it succeeds then we get Some(&our str)
!
Let me know if you'd like more explanation on the specifics of the implementation or if you think you've found a better way to do it!
One other minor point is that the 0-value character itself is usually called nul
, null
is usually used when referring to pointers. It's an old convention, but the rust docs for CStr
also use nul
instead of null
, so I think it'd be good to use it too.
Oh, one note about our linting/formatting: we stay 1 stable release behind the latest stable, so you'll have to format it with Rust 1.39 ( |
@MarkMcCaskey thanks for the insightful response! I'll try to get to it shortly. π |
74c6445
to
fa61976
Compare
@MarkMcCaskey Can you have another look? I've landed on a slight alteration of your proposal, and renamed the null to nul. Thanks again for taking the time to lay out how Let's see what travis thinks of this. (The error befor wasn't linting related, was it? But it seems like I'm unable to run |
fa61976
to
36226fa
Compare
@srenatus Singlepass requires nightly! everything else works with stable 1.39! Sounds good, I'll review it! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, that's great! I wasn't familiar with the position
method on iterators, but that's exactly what we needed here.
Great work!
One last, thing can you add an entry into the changelog for this as it's a new public API?
Once these last few points are addressed we can ship it!
lib/runtime-core/src/memory/ptr.rs
Outdated
/// Get a UTF-8 string representation of this `WasmPtr`, where the string is nul-terminated. | ||
/// Note that this does not account for UTF-8 strings that _contain_ nul themselves, | ||
/// [`get_utf8_string`] has to be used for those. | ||
pub fn get_utf8_string_nul_terminated<'a>(self, memory: &'a Memory) -> Option<&'a str> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change the name of this to get_utf8_string_with_nul
to follow the naming conventions of the methods on CStr?
lib/runtime-core/src/memory/ptr.rs
Outdated
.iter() | ||
.map(|cell| cell.get()) | ||
.position(|byte| byte == 0) | ||
.and_then(|length| self.get_utf8_string(memory, length.try_into().unwrap())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
β³οΈ
length.try_into().unwrap()
-> length as u32
try_into
is better for being super generic but Wasmer is written in a way that there's no chance that it will work on systems with pointers smaller than 32 bits, so it's safe to just do the cast directly! If we want to run on 16bit machines in the future, we'll have to do a ton of refactoring anyways!
Fixes wasmerio#1086. Signed-off-by: Stephan Renatus <[email protected]>
36226fa
to
782be5b
Compare
Done! π |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks perfect -- thanks!!
bors r+ |
1092: Add function to get nul-terminated strings from memory r=MarkMcCaskey a=srenatus ## Description This is meant to fix #1086. β I'm a bit new to this -- is this how you'd do it? Happy to take directions! π I've added a note regarding the special case in the comment. Are there existing tests to expand? ## Review - [x] Add a short description of the the change to the CHANGELOG.md file Co-authored-by: Stephan Renatus <[email protected]>
Build succeeded
|
Description
This is meant to fix #1086.
β I'm a bit new to this -- is this how you'd do it? Happy to take directions! π
I've added a note regarding the special case in the comment.
Are there existing tests to expand?
Review