Should functions that return a buffer containing a string include the trailing 0 or not? #492

Closed
ptitSeb opened this issue Aug 4, 2022 · 9 comments


@ptitSeb

ptitSeb commented Aug 4, 2022

There are a few functions, like fd_prestat_dir_name or fd_readdir, that fill a buffer with a string representing a file name. The spec doesn't specify whether the buffer includes the trailing zero or not. What should it be: with or without the trailing 0?

@syrusakbary
Contributor

syrusakbary commented Aug 12, 2022

Can this be added as a point of discussion for the next WASI meeting @linclark @lukewagner?

We'd like to know which way is the proper one, to make sure the Wasmer WASI implementation respects the spec. Right now it is not clear what the official way is.

@linclark
Member

@syrusakbary please follow the procedure for adding a discussion topic (adding a PR to the agenda in the meetings repo)

@syrusakbary
Contributor

Here we go: WebAssembly/meetings#1094

@sunfishcode
Member

wasi-libc has code for fd_prestat_dir_name and fd_readdir to insert trailing NULs in the places where it needs them to be.

In languages other than C, strings aren't usually NUL-terminated, so their use of fd_readdir doesn't need a trailing 0.

Consequently, I propose Wasmtime's current behavior in these two instances be considered the correct behavior.
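To illustrate the first point, a C caller that receives a length-delimited name can append the terminator itself. This is only a minimal sketch, not wasi-libc's actual code: `copy_name_with_nul` is a hypothetical helper, and the name bytes and length are assumed to have come from a call such as fd_prestat_dir_name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of wasi-libc): copy `len` bytes of a
 * length-delimited name into a freshly allocated buffer and append the
 * NUL terminator that C string functions expect.
 * Returns NULL on overflow or allocation failure; the caller frees it. */
char *copy_name_with_nul(const char *name, size_t len) {
    if (len == SIZE_MAX) {
        return NULL; /* len + 1 would overflow */
    }
    char *out = malloc(len + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, name, len);
    out[len] = '\0'; /* the terminator is added on the guest side */
    return out;
}
```

With this approach the host only ever writes the name bytes, and each guest language decides for itself whether it needs a terminator.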

@ptitSeb
Author

ptitSeb commented Aug 19, 2022

Well, the terminating 0 needs to be set, either on the wasm side or the WASI side. In my opinion, the definition of "string" should be unified across the API. At no point is the string defined, and because there isn't any hint of a size-based string definition (like the one found in the Pascal language), the assumption is that strings are C-like and 0-terminated.

It would be good to have clarification about all the string buffers.

@linclark
Member

@ptitSeb It might be good to have some context on what you're trying to do, as that will help understand what exact information you need.

You're right that strings aren't defined in WASI. That is by design. Instead of having a concrete definition of strings in WASI, there is an abstract string type, and that is defined in a different part of the WebAssembly standards, the component model.

As stated in the README, we are currently in the process of switching from the initial witx to wit, which is what the component model is defining.

WASI is transitioning away from the witx format and its early experimental ABI. We are transitioning to Interface Types using the wit format and the canonical ABI.

If you want to learn more about the thinking behind these types, you can read this post or dig into the component model repo.

@ptitSeb
Author

ptitSeb commented Aug 22, 2022

I'm trying to maintain a WASI implementation.

But the spec is still not completely consistent. For example, the args_get and environ_get functions do specify that the returned string buffers are 0-terminated, but for fd_prestat_dir_name or fd_readdir it's not specified.
Similarly, for the path_create_directory function, the string is not defined as 0-terminated, and the function arguments are in fact a string pointer and a string length?
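For contrast, here is a minimal sketch of how a consumer might walk a buffer of 0-terminated strings packed back to back, which is the shape described above for args_get and environ_get. `count_packed_strings` is a hypothetical helper, and the buffer contents in the usage note are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the NUL-terminated strings packed back to
 * back in buf[0..size). A trailing run of bytes with no terminator is
 * not counted as a string. */
size_t count_packed_strings(const char *buf, size_t size) {
    size_t count = 0;
    size_t off = 0;
    while (off < size) {
        /* Search only within the bytes that remain in the buffer. */
        const char *end = memchr(buf + off, '\0', size - off);
        if (end == NULL) {
            break; /* unterminated tail */
        }
        count++;
        off = (size_t)(end - buf) + 1; /* skip past the terminator */
    }
    return count;
}
```

For an 8-byte buffer holding `prog`, a 0 byte, `-v`, and a final 0 byte, this counts two strings. Here the terminator is the only delimiter, whereas path_create_directory carries the length explicitly.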

I dug a bit into the component model, but the only thing I found about strings is "list of char", which is still not precise about how you delimit the end of the string (sized or 0-terminated).

@programmerjake
Contributor

programmerjake commented Aug 22, 2022

In the MVP of the component model, in memory, a string is two i32s: a pointer to the buffer and a length expressed as the number of code units (bytes for UTF-8 -- this is oversimplified slightly, see the code for details). I'd assume that means it is not NUL-terminated, since the length is given explicitly.

see the definition of store_string in https://github.com/WebAssembly/component-model/blob/b9be93e6311873ba8234e073203c9e27f2412c71/design/mvp/CanonicalABI.md#storing
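As a rough picture of what a pointer-plus-length string looks like from C, the sketch below uses an illustrative `sized_str` type. The type and helper names are hypothetical and not taken from any WASI or component-model header, and `size_t` stands in for the i32 length.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative type: a string described by a pointer and an explicit
 * byte length, with no terminator. */
typedef struct {
    const char *ptr;
    size_t len;
} sized_str;

/* Build a view over the first `len` bytes at `ptr`. */
sized_str sized_str_from(const char *ptr, size_t len) {
    sized_str s = { ptr, len };
    return s;
}

/* Compare by length and content; never reads past `len` bytes,
 * so whatever follows the string in memory is irrelevant. */
bool sized_str_eq(sized_str a, sized_str b) {
    return a.len == b.len &&
           (a.len == 0 || memcmp(a.ptr, b.ptr, a.len) == 0);
}
```

Because the length travels with the pointer, a view over the first 7 bytes of a longer buffer compares equal to `dirname` regardless of what bytes follow it.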

@ptitSeb
Author

ptitSeb commented Aug 22, 2022

Yeah, so basically, if I sum up:

  • Strings are a tuple: pointer + size.
  • Unless stated otherwise, input strings are NOT 0-terminated (because they are sized).
  • Unless stated otherwise, returned string buffers don't contain a 0 char at the end of the string.
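Applied to fd_readdir, that summary means each name in the returned buffer is delimited only by its recorded length. The sketch below walks such a buffer. It assumes the preview1 dirent layout of a 24-byte header with the name length stored as a little-endian u32 at offset 16, so check that against the spec before relying on it. `count_entries` and the two constants are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define DIRENT_HEADER_SIZE 24u   /* assumed size of the dirent header */
#define DIRENT_NAMLEN_OFFSET 16u /* assumed offset of the name length */

/* Read a little-endian u32 from `p`. */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Count the complete entries in buf[0..used) and report the total
 * number of name bytes. Each name is exactly `namlen` bytes long and
 * the next header starts right after it, with no 0 byte in between. */
size_t count_entries(const uint8_t *buf, size_t used, size_t *name_bytes) {
    size_t off = 0;
    size_t count = 0;
    *name_bytes = 0;
    while (used - off >= DIRENT_HEADER_SIZE) {
        uint32_t namlen = read_u32_le(buf + off + DIRENT_NAMLEN_OFFSET);
        size_t rest = used - off - DIRENT_HEADER_SIZE;
        if (namlen > rest) {
            break; /* final entry was truncated by the buffer size */
        }
        *name_bytes += namlen;
        off += DIRENT_HEADER_SIZE + namlen;
        count++;
    }
    return count;
}
```

A C caller that wants C strings would copy each name out and append the terminator itself. Callers in other languages can use the length directly.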

@ptitSeb ptitSeb closed this as completed Oct 14, 2022
5 participants