Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

usage of vfork() is non-compliant #22

Open
green-nsk opened this issue Aug 1, 2022 · 11 comments
Open

usage of vfork() is non-compliant #22

green-nsk opened this issue Aug 1, 2022 · 11 comments

Comments

@green-nsk
Copy link

green-nsk commented Aug 1, 2022

  • Version: julia-uv2-1.44.1
  • Platform: linux

Julia version of libuv uses vfork() for uv_spawn() implementation, but contrary to vfork() manpage it manages file descriptors and signal handlers prior to calling exec(). This might work fine with vanilla kernel/libc, but breaks in our environment. We use custom code for accelerated network sockets, which breaks when the child process shares address space with the parent process.

https://github.com/JuliaLang/libuv/blob/julia-uv2-1.44.1/src/unix/process.c#L990

Standard description
       (From POSIX.1) The vfork() function has the same effect as
       fork(2), except that the behavior is undefined if the process
       created by vfork() either modifies any data other than a variable
       of type pid_t used to store the return value from vfork(), or
       returns from the function in which vfork() was called, or calls
       any other function before successfully calling _exit(2) or one of
       the exec(3) family of functions.
@Keno
Copy link
Member

Keno commented Aug 1, 2022

The code in the linux backend is assuming linux/glibc semantics not POSIX. It's unlikely we'll move away from this model. Can you elaborate what breaks? We may be able to suggest workarounds.

@green-nsk
Copy link
Author

There're a lot of things that break, so I think the simplest workaround for us would be to patch uv_spawn() and replace vfork() with fork() (haven't tried it yet, but I expect it should work, with maybe some minor modifications).

In broad strokes, the implementation of accelerated sockets relies on keeping extra information about file descriptors and signals in userspace (and intercepting some system calls). When the child process shares userspace with the parent, those don't work as expected.

@green-nsk
Copy link
Author

green-nsk commented Aug 1, 2022

Take for example signal()/sigaction() calls. Supplied signal handler is stored in user-space. When a parent has a signal handler stored, and child process attempts to reset signal handler back to SIG_DFL here, it ends up "resetting" parent signal handler as well. I haven't investigated other calls in much detail, but I expect the general scenario to be similar.

We're hitting this problem with exasock network stack from Cisco/ex-Exablaze. Here's relevant signal()/sigaction() code. But there're others with a similar architecture that may be hit by similar problems, e.g. openonload from Xilinx/ex-SolarFlare

@Keno
Copy link
Member

Keno commented Aug 1, 2022

Yes, I'm familiar with those techniques - it's usually a huge pile of hacks, that make assumptions that don't hold in any even reasonably complicated code base. Our recommendation to vendors we've worked with on these kinds of issues is to make sure to have an API that does not rely on any interception tricks that runtime systems that manage more complicated state can opt into.

@ancapdev
Copy link

ancapdev commented Aug 1, 2022

Yes, I'm familiar with those techniques - it's usually a huge pile of hacks, that make assumptions that don't hold in any even reasonably complicated code base. Our recommendation to vendors we've worked with on these kinds of issues is to make sure to have an API that does not rely on any interception tricks that runtime systems that manage more complicated state can opt into.

Hi @Keno. Agreed on the ideal state and what ought to be. However, vendors cater to customers, and a large fraction of customers want a drop in preload library that transparently runs their network stack faster. This has been the de facto solution to user space networking for quite some time, and while it often comes with some gotchas and limitations, I've also seen plenty complicated code bases run perfectly fine over it.

Out of interest, for this specific issue, what was the rationale for using vfork on linux?

@Keno
Copy link
Member

Keno commented Aug 1, 2022

Out of interest, for this specific issue, what was the rationale for using vfork on linux?

vfork is significantly faster, particularly on applications that have a lot of memory mappings (as julia applications often do), because the page tables need not be copied.

@ancapdev
Copy link

ancapdev commented Aug 1, 2022

vfork is significantly faster, particularly on applications that have a lot of memory mappings (as julia applications often do), because the page tables need not be copied.

Thanks, that makes sense.

@vtjnash
Copy link
Member

vtjnash commented Aug 1, 2022

I have some code that switches to using posix_spawn from glibc, but note that also requires vfork, and the implementation looks pretty similar to our implementation here. But perhaps it would avoid the symbol imposition from happening with the signal code?

@green-nsk
Copy link
Author

I would expect system calls happening within posix_spawn() won't be intercepted by the exasock. That way, the parent process memory will not be mutated. In that sense, using posix_spawn() would be better for our case.

That said, the state of signals/file descriptors tracked by exasock inside the child process may diverge from their "true" state. This may backfire in more subtle ways. but realistically I don't expect that to be a problem.

In the meanwhile, switching from vfork() to fork() has indeed fixed all the issues we're seeing.

@rshpount
Copy link

rshpount commented Mar 12, 2023

I think some of the process initialization code looks unstable in relation to vfork. Specifically, https://github.com/JuliaLang/libuv/blob/julia-uv2-1.44.2/src/unix/process.c#L450 and https://github.com/JuliaLang/libuv/blob/julia-uv2-1.44.2/src/unix/process.c#L452. These two lines modify the pipes array in the parent process memory and replace the file handles with new handles created by F_DUPFD in the child process. These handles do not exist in the parent space, so uv__process_open_stream will fail when attempting to close the non-existing handle in https://github.com/JuliaLang/libuv/blob/julia-uv2-1.44.2/src/unix/process.c#L221.

@Keno
Copy link
Member

Keno commented Mar 12, 2023

I agree, that looks like a bug

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants