Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WasmFS JS API: Implement truncate #19543

Merged
merged 26 commits into from
Jun 13, 2023

Conversation

jameshu15869
Copy link
Contributor

This PR implements truncate and ftruncate. Tests were also added to test_fs_js_api.

One thing I noticed was that both the man pages and doTruncate syscall note that truncate and ftruncate take in size as type off_t, but __syscall_truncate64 and __syscall_ftruncate64 take in size as a uint64_t. This means that passing in negative numbers directly to __syscall_truncate64 and __syscall_ftruncate64 gives a weird number and no error, so I had to add a manual check in library_wasmfs.js, which I didn't do for the other methods. Was there a specific reason that the two syscalls take in uint64_t instead of off_t?

assert(s.st_size == 8);

EM_ASM(
FS.truncate('truncatetest', 2);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with all these changes it would be good to look for other tests that already use this API, if any? If we were previously disabling, or not running, those tests under WASMFS it would be good to enable them now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see anything explicitly calling FS.truncate() in the test dir, so I added them to test_fs_js_api for now. Are there any places besides the test directory that could potentially contain disabled tests?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it would only be in the test dir. I guess we had not tests for that API. Thanks for looking into it. Hopefully after we are done with this we will have much better test coverage for the JS API!

truncate: (path, len) => {
if (len < 0) {
throw new FS.ErrnoError({{{ cDefs.EINVAL }}});
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If one calls truncate("foo", -1) I would probably expect it to try to resize of MAX_INT or something like that.. so maybe we don't need this check and we can just let the underlying call fail?

Copy link
Contributor Author

@jameshu15869 jameshu15869 Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I was thinking, but I had one test that tried calling truncate with -10. I tried printing out the value the syscall receives, which was 4294867286. But instead of setting the return value to -1 and setting errno accordingly, I think the syscall aborts.

Aborted()
RuntimeError: Aborted(). Build with -sASSERTIONS for more info.
    at abort (/home/ubuntu/Documents/emsdk/emscripten/out/test/test_fs_js_api.js:1:5674)
    at _abort (/home/ubuntu/Documents/emsdk/emscripten/out/test/test_fs_js_api.js:1:14749)
    at <anonymous>:wasm-function[187]:0xd2a9
    at <anonymous>:wasm-function[197]:0xe13e
    at <anonymous>:wasm-function[172]:0xba9a
    at <anonymous>:wasm-function[178]:0xce81
    at _wasmfs_truncate (<anonymous>:wasm-function[261]:0x13101)

This is the error. It doesn't seem to match the pattern of how we've been returning JS API errors before (Normally the syscall returns -1 and I can return errno to JS, then throw an error from JS with an error name and code). Is this close to what you were saying by letting the call fail?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we don't want to syscall aborting like that. If you build with -g or --profiling-funcs you should get functions names for those wasm functions in the backtrace BTW.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is strange because doTruncate does return -EINVAL when its input is negative. It would be good to get to the bottom of this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that doTruncate takes in off_t size, but truncate64 and ftruncate64 take in size as a uint64_t, so is it possible that the size is always positive in doTruncate?

Copy link
Collaborator

@kleisauke kleisauke Jun 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is needed for the Node backend, as it silently accepts negative offsets rather than failing with EINVAL, see: nodejs/node#35632.

Though, it looks like setSize() is currently not implemented for the Node backend:

int setSize(off_t size) override {
WASMFS_UNREACHABLE("TODO: implement NodeFile::setSize");
}

(i.e. this PR won't work for -sNODERAWFS + -sWASMFS)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's really interesting. I don't think I've been testing with NODEFS or NODERAWFS. Should I start adding those flags to test_fs_js_api?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If setSize() (and insertMove() due to FS.rename()) is supported in WasmFS' Node backend (which is probably better as a follow-up), you could do:

--- a/test/test_core.py
+++ b/test/test_core.py
@@ -6017,6 +6017,7 @@ Module = {
   def test_fs_writeFile(self):
     self.do_run_in_out_file_test('fs/test_writeFile.cpp')
 
+  @also_with_noderawfs
   def test_fs_js_api(self):
     self.set_setting("FORCE_FILESYSTEM")
     self.do_runf(test_file('fs/test_fs_js_api.c'), 'success')

After that, you can verify with:

$ ./test/runner wasmfs.test_fs_js_api* core2.test_fs_js_api*

Whether the test passes on both memory and Node backend, either with or without WasmFS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After going back to the problem this morning, I realized I never checked that the right value was getting passed to js_api.cpp. I originally had _wamsfs_truncate in js_api.cpp take in size as type off_t, and for some reason negative values would not be passed in correctly. After changing off_t to long, the tests pass without making any changes to syscalls.cpp. I also tried passing in an int64_t to js_api.cpp, but that didn't seem to work. I initially wanted to use off_t to match the POSIX API, but could there be a potential problem with using that type in js_api.cpp?

Out of curiosity, I tried printing the values in syscalls.cpp, and I could print uint64_t as a signed integer using %lld. When the uint64_t then gets passed to doTruncate (Which takes in size as type off_t), does C automatically force the uint64_t to become signed? What if the unsigned value is larger than the signed max?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

off_t is defined here:

#if defined(__NEED_off_t) && !defined(__DEFINED_off_t)
typedef _Int64 off_t;
#define __DEFINED_off_t
#endif

So, it corresponds to int64_t. I just opened PR #19559 for this.

@sbc100
Copy link
Collaborator

sbc100 commented Jun 6, 2023

Regarding the syscall interface, I think for consistency all the native syscalls take primitive types.. see system/lib/libc/musl/arch/emscripten/syscall_arch.h. If changing the types of those functions to off_t doesn't break anything and causes your code to be simpler that seems reasonable to me.

Comment on lines 142 to 144
if (err == -1) {
return errno;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Syscalls don't set errno, so this doesn't look right. Syscalls return negative error codes and libc handles those by setting errno appropriately. I would expect this code to be checking whether err is a negative error code (not necessarily -1) instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that errno is being set by libc behind the scenes? I've been doing this for the other js_api.cpp functions and the error codes match with the legacy API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, no, all those places should be fixed. For example, in _wasmfs_mknod, it checks whether __syscall_mknodat returns -1, and if so returns errno. But __syscall_mknodat never sets errno and doesn't return -1 on errors; instead, it either returns -EINVAL or -EPERM or other negative error codes that doOpen can return.

Copy link
Contributor Author

@jameshu15869 jameshu15869 Jun 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what went wrong now. The whole time I thought the syscalls were returning -1, so the errno values were returning correctly - turns out the if statement is always false, so the methods were just returning the syscall return value and I never checked the actual return values of the syscalls before, which made all the tests pass. Those issues have all been addressed in this PR.

Copy link
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % separable comment.

Comment on lines 140 to 147
int _wasmfs_truncate(char *path, long size) {
return __syscall_truncate64((intptr_t)path, size);
}

int _wasmfs_ftruncate(int fd, long size) {
return __syscall_ftruncate64(fd, size);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there's no extra logic necessary here, is it possible to skip these wrappers and just call the underlying syscall directly from the JS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, although I'm having trouble figuring out how to import the syscalls to library_wasmfs.js. I've tried using __syscall_truncate64 and ___syscall_truncate64 (Three underscores) and neither worked. Normally, I would add _wasmfs_truncate to emcc.py, but I don't think it worked. How should I go about making the syscall available to call?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe that could be a followup.

When calling __syscall_truncate64 from JS you would use an extra underscore... but when adding it to the settings.REQUIRED_EXPORTS list in emcc.py you would not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing this as a follow-up sounds good to me.

@jameshu15869
Copy link
Contributor Author

As a side note, should we worry about the issue with using long in place of some 64 bit integer (Either uint64_t, int64_t, or off_t)? Would it make sense to wait for #19559 and try truncate and ftruncate with those types?

@sbc100
Copy link
Collaborator

sbc100 commented Jun 8, 2023

I'm hoping we can land #19559 today BTW.. since that tests already passed.

emcc.py Outdated
@@ -2338,6 +2338,8 @@ def phase_linker_setup(options, state, newargs):
'_wasmfs_read',
'_wasmfs_pread',
'_wasmfs_symlink',
'__syscall_truncate64',
'__syscall_ftruncate64',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there a reason to use the raw syscalls and not a _wasmfs_* wrapper? Wrappers have the benefit that changes to the syscalls don't affect them.

Copy link
Contributor Author

@jameshu15869 jameshu15869 Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thomas suggested a little earlier that since truncate doesn't have any additional processing, it might make sense to directly use the raw syscall. However, I do see your point that a wrapper would be beneficial. Would it be preferable to go back to calling the wrapper?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now you should continue to use wrapper functions and we can separately/later consider weather they are actually needed since (IIUC) these are not the only once that are simple pass-through functions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't block landing anything on this conversation, but I still think pass-through wrappers aren't that useful. If we update the syscalls (which should be extremely rare), then we would still have to update their callers. Whether those callers are in C or JS doesn't make much of a difference as far as I can tell.

return -errno;
}
return err;
return __syscall_openat(AT_FDCWD, (intptr_t)path, flags, mode);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these bugfixes?

Copy link
Contributor Author

@jameshu15869 jameshu15869 Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We realized that this check isn't needed, since the syscalls do not actually set errno. The tests were passing before because err was never equal to -1, so the js_api.cpp function directly returned the error without using errno, giving the desired effect.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make a separate PR for this cleanup stuff?

@kripken kripken merged commit 90858c8 into emscripten-core:main Jun 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants