SIGCHLD can arrive much later after waitpid() returned #1480
Comments
I would vote for using a single message for the child's death, which would then cause SIGCHLD to be delivered if needed (this would happen on the receiver's side, i.e. in the parent).
I'm confused. #1297 talks about a race between SIGUSR1 and waitpid. The issue here talks about a race between a special-case SIGCHLD and waitpid. Do we want to solve only the "SIGCHLD vs waitpid" race or the broader issue of "any signal vs waitpid"? The first one should be easier to solve; the second one may involve some special logic (solution 3 in Isaku's list).
In this PR, I have the "SIGCHLD vs waitpid" race in mind.
I think if we send this "child died" message via the same channel as all signals, it would fix both issues, provided we handle these messages in order (since a child cannot send any more signals after its death).
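The single-channel idea from the comments above can be pictured with a toy model. This is only a sketch: the message kinds, the `ipc_msg` and `parent_state` structures, and the `handle_msg` / `drain_channel` helpers are hypothetical names made up for illustration and do not correspond to graphene's actual IPC code. It shows that if the parent processes one ordered stream, any signal the child sent before dying is queued first, and SIGCHLD is queued no later than the point where the child is marked as exited.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

#define MAX_PENDING 16

/* Hypothetical message kinds carried over ONE ordered channel. */
enum msg_type { MSG_SIGNAL, MSG_CHILD_DIED };

struct ipc_msg {
    enum msg_type type;
    int child_pid;
    int value; /* signal number for MSG_SIGNAL, exit status for MSG_CHILD_DIED */
};

/* Hypothetical parent-side bookkeeping. */
struct parent_state {
    int pending[MAX_PENDING]; /* signals queued for delivery, in order */
    size_t n_pending;
    int child_exited;         /* what a waitpid() emulation would check */
    int exit_status;
};

static void queue_signal(struct parent_state *st, int signo) {
    assert(st->n_pending < MAX_PENDING);
    st->pending[st->n_pending++] = signo;
}

/* Handle one message. For a child's death, SIGCHLD is queued before the
 * exit becomes visible, so a waiter can never observe the exit without
 * SIGCHLD already pending. */
void handle_msg(struct parent_state *st, const struct ipc_msg *m) {
    switch (m->type) {
    case MSG_SIGNAL:
        queue_signal(st, m->value);
        break;
    case MSG_CHILD_DIED:
        queue_signal(st, SIGCHLD);
        st->exit_status = m->value;
        st->child_exited = 1;
        break;
    }
}

/* Process messages strictly in arrival order. */
void drain_channel(struct parent_state *st, const struct ipc_msg *msgs, size_t n) {
    for (size_t i = 0; i < n; i++)
        handle_msg(st, &msgs[i]);
}
```

In this model, a SIGUSR1 sent by the child followed by its death always ends up as SIGUSR1, then SIGCHLD, then the visible exit, which is the ordering both races need.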
The current implementation of graphene may deliver SIGCHLD only after waitpid() has already returned.
Related to #1297.
Description of the problem
On the Linux kernel, the return of waitpid() and the delivery of SIGCHLD happen close together.
When waitpid() returns, the application expects that SIGCHLD has already been received, provided SIGCHLD is unmasked.
This assumption is reasonable because the return of waitpid() and the delivery of SIGCHLD are results of the same event, the death of the child process.
The current graphene implementation emulates waitpid() and Unix signals with its own IPC mechanism (LibOS/shim/src/ipc/), and the communication channels (named pipes on the host) used for waitpid() and for Unix signals are different.
So IPC messages on the two channels can be reordered or delayed relative to each other.
As a result, SIGCHLD may arrive much later than waitpid() returns, which is not what the application expects.
LTP expects the following execution flow:
-- waitpid() returns -EINTR because of SIGCHLD,
-- the signal handler for SIGCHLD is invoked,
-- waitpid() is restarted because of SA_RESTART,
-- the restarted waitpid() finds the dead child and returns successfully.
However, on graphene SIGCHLD can arrive after waitpid() has returned, so the signal handler is invoked only at that later point. The signal handler for SIGCHLD then causes SEGV.
There are several options to address this.
Steps to reproduce
LTP test cases as reported in #1297
Expected results
Actual results