This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

SIGCHLD can arrive much later after waitpid() returned #1480

Closed
yamahata opened this issue May 8, 2020 · 4 comments · Fixed by #1949

@yamahata
Contributor

yamahata commented May 8, 2020

The current implementation of Graphene may deliver SIGCHLD long after waitpid() has returned.
Related to #1297.

Description of the problem

On the Linux kernel, waitpid() returning and SIGCHLD delivery happen close together.
When waitpid() returns, the application expects SIGCHLD to have already been received if SIGCHLD is unmasked.
This assumption is reasonable because waitpid() returning and SIGCHLD delivery are results of the same event: the death of the child process.

The current Graphene implementation emulates waitpid() and Unix signals with its own IPC mechanism (LibOS/shim/src/ipc/), and waitpid() and Unix signals use different communication channels (named pipes on the host).
So IPC messages can be reordered or delayed relative to each other,
and SIGCHLD may arrive much later than waitpid() returns. This is not what applications expect.
LTP expects the following execution flow:

  • Set up a signal handler for SIGCHLD with SA_RESTART.
  • Call fork(), then block in waitpid().
  • The child exits.
  • The signal handler is invoked and waitpid() returns.
    -- More specifically, waitpid() is interrupted with -EINTR because of SIGCHLD,
    -- the signal handler for SIGCHLD is invoked,
    -- waitpid() is then restarted because of SA_RESTART,
    -- and the restarted waitpid() finds the dead child and returns successfully.
  • Clean up, assuming the signal handler won't be invoked after waitpid() returns.
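The expected flow above can be checked natively with a small C sketch. It installs a SIGCHLD handler with SA_RESTART, forks a child that exits immediately, waits for it, and reports whether the handler had already run when waitpid() returned. It is not the LTP test itself, the function name is made up for illustration, and it relies on the Linux behaviour described in this issue rather than on an explicit POSIX guarantee.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the SIGCHLD handler; volatile sig_atomic_t keeps the write async-signal-safe. */
static volatile sig_atomic_t got_sigchld = 0;

static void on_sigchld(int signo) {
    (void)signo;
    got_sigchld = 1;
}

/* Hypothetical helper: returns 1 if the SIGCHLD handler had already run when
 * waitpid() returned, 0 if it had not, and -1 on error. */
static int sigchld_seen_before_wait_returns(void) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;          /* restart waitpid() after the handler */
    if (sigaction(SIGCHLD, &sa, &old) != 0)
        return -1;

    got_sigchld = 0;
    pid_t pid = fork();
    if (pid < 0) {
        sigaction(SIGCHLD, &old, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(0);                      /* child: exit immediately */

    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);  /* loop in case EINTR leaks through */
    } while (r < 0 && errno == EINTR);

    /* LTP-style cleanup assumes the handler cannot run after this point. */
    int seen = got_sigchld;
    sigaction(SIGCHLD, &old, NULL);    /* restore the previous disposition */
    return r == pid ? seen : -1;
}
```

On Linux, `sigchld_seen_before_wait_returns()` is expected to return 1; a return of 0 corresponds to the symptom reported here for Graphene.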

However, on Graphene, SIGCHLD can arrive after waitpid() has returned and the cleanup has run. The SIGCHLD handler is then invoked and causes a SEGV.

There are several options to address this.

  • Use the same communication channel for waitpid() and SIGCHLD so that messages won't be reordered or delayed.
  • Use a single message to notify both waitpid() and SIGCHLD, so the receiver can process them at once.
  • Enforce the ordering in some other way, e.g. with sequence numbers or special-case logic.
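For the third option, one possible shape is a per-sender sequence number plus a small reorder buffer on the receiver, so that messages arriving over different pipes are still processed in the order the child sent them. This is a hypothetical sketch, not Graphene's IPC code: `struct ipc_msg`, `reorder_receive()` and the window size are illustrative assumptions.

```c
#include <stdbool.h>

/* Hypothetical message kinds; not Graphene's real IPC message types. */
enum msg_kind { MSG_SIGNAL, MSG_CHILD_EXIT };

struct ipc_msg {
    unsigned seq;              /* assigned by the sender, starting at 0 */
    enum msg_kind kind;
    int value;                 /* signal number or exit status */
};

#define WINDOW 16              /* max distance a message may run ahead (power of two) */

/* Per-sender receive state. */
struct reorder_buf {
    unsigned next_seq;         /* next sequence number to hand out */
    bool present[WINDOW];
    struct ipc_msg slot[WINDOW];
};

/* Accepts one message, whichever pipe it came from, and copies every message
 * that is now in order into out[] (which must hold WINDOW entries).
 * Returns the number copied, or -1 if seq is stale or too far ahead. */
static int reorder_receive(struct reorder_buf *rb, const struct ipc_msg *msg,
                           struct ipc_msg out[WINDOW]) {
    if (msg->seq - rb->next_seq >= WINDOW)   /* unsigned wrap also rejects stale seqs */
        return -1;
    rb->slot[msg->seq % WINDOW] = *msg;
    rb->present[msg->seq % WINDOW] = true;

    int n = 0;
    while (rb->present[rb->next_seq % WINDOW]) {
        unsigned i = rb->next_seq % WINDOW;
        rb->present[i] = false;
        out[n++] = rb->slot[i];
        rb->next_seq++;
    }
    return n;
}
```

If the child's exit message (seq 1) overtakes an earlier signal (seq 0), the exit message is held back until the signal arrives, and then both are handed out in sender order.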

Steps to reproduce

LTP test cases as reported in #1297

Expected results

Actual results

@yamahata yamahata changed the title waitpid() and SIGCHLD ordering SIGCHLD can arrive much later after waitpid() return May 8, 2020
@yamahata yamahata changed the title SIGCHLD can arrive much later after waitpid() return SIGCHLD can arrive much later after waitpid() returned May 8, 2020
@boryspoplawski
Contributor

boryspoplawski commented May 10, 2020

I would vote for using a single message for the child's death, which would then cause SIGCHLD to be delivered if needed (this would happen on the receiver's side, i.e. in the parent).
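A toy, single-threaded model of the single-message approach: the parent handles one hypothetical CHILD_DIED message by queueing SIGCHLD and publishing the exit status together, and the emulated waitpid() delivers pending signals before it returns. All names are invented for illustration; a real LibOS would also need a lock around `on_child_died()` and a blocking wait.

```c
#include <stdbool.h>

/* Toy model of a parent's emulated state; all names are hypothetical. */
struct parent_state {
    bool sigchld_pending;      /* SIGCHLD queued for the parent */
    bool sigchld_blocked;      /* SIGCHLD masked by the application */
    bool child_zombie;         /* child's exit visible to waitpid() */
    int  child_status;
    void (*sigchld_handler)(struct parent_state *);
    int  handler_calls;
};

/* Handles the single CHILD_DIED message: queue SIGCHLD and publish the exit
 * status in one step, so no waiter can observe one without the other. */
static void on_child_died(struct parent_state *p, int status) {
    p->sigchld_pending = true;
    p->child_status = status;
    p->child_zombie = true;
}

/* Delivers pending, unblocked signals, as on return from any emulated syscall. */
static void deliver_pending_signals(struct parent_state *p) {
    if (p->sigchld_pending && !p->sigchld_blocked) {
        p->sigchld_pending = false;
        if (p->sigchld_handler)
            p->sigchld_handler(p);
    }
}

/* Non-blocking emulated waitpid(): returns 1 and stores the status if the
 * child has exited, 0 otherwise. */
static int emulated_waitpid(struct parent_state *p, int *status) {
    int reaped = 0;
    if (p->child_zombie) {
        *status = p->child_status;
        p->child_zombie = false;
        reaped = 1;
    }
    deliver_pending_signals(p);    /* handler runs before waitpid() "returns" */
    return reaped;
}

/* Example handler: counts SIGCHLD deliveries. */
static void count_sigchld(struct parent_state *p) { p->handler_calls++; }
```

Because the signal and the exit status come from the same message, there is no second message that could arrive after waitpid() has already returned.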

@dimakuv
Contributor

dimakuv commented May 11, 2020

I'm confused. #1297 talks about a race between SIGUSR1 and waitpid. The issue here talks about a race between a special-case SIGCHLD and waitpid.

Do we want to solve only the "SIGCHLD vs waitpid" race or a broader issue of "any signal vs waitpid"? The first one should be easier to solve, the second one may involve some special logic (solution 3 in Isaku's list).

@yamahata
Contributor Author

In this issue, I have the "SIGCHLD vs waitpid" race in mind.
I agree that "any signal vs waitpid" is also still an issue.

@boryspoplawski
Contributor

I think that if we send this "child died" message over the same channel as all signals and handle these messages in order, it would fix both issues (since a child cannot send any more signals after its death).
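The in-order variant can be pictured as one FIFO per child that carries both signals and the death notification. This is a hypothetical sketch (`struct channel`, `chan_drain()` and the capacity are assumptions, not Graphene's pipe code); it only shows why FIFO handling means every signal the child sent is processed before its death becomes visible to waitpid().

```c
#include <stdbool.h>

/* Hypothetical message kinds carried on one per-child channel. */
enum chan_msg_kind { CHAN_SIGNAL, CHAN_CHILD_DIED };

struct chan_msg {
    enum chan_msg_kind kind;
    int value;                 /* signal number, or exit status for CHAN_CHILD_DIED */
};

#define CHAN_CAP 32            /* power of two, so indices stay valid across wraparound */

/* Fixed-size FIFO: messages leave in the order the child sent them. */
struct channel {
    struct chan_msg buf[CHAN_CAP];
    unsigned head, tail;       /* head: next to read, tail: next free slot */
};

static bool chan_send(struct channel *c, struct chan_msg m) {
    if (c->tail - c->head == CHAN_CAP)
        return false;          /* channel full */
    c->buf[c->tail++ % CHAN_CAP] = m;
    return true;
}

/* Parent-side view of one child. */
struct child_view {
    int signals_seen;          /* signals from this child handled so far */
    int signals_at_death;      /* signals handled when the death was recorded */
    bool dead;
    int exit_status;
};

/* Drains the channel in FIFO order, so every signal sent before the child died
 * is handled before its death becomes visible to the emulated waitpid(). */
static void chan_drain(struct channel *c, struct child_view *v) {
    while (c->head != c->tail) {
        struct chan_msg m = c->buf[c->head++ % CHAN_CAP];
        if (v->dead)
            continue;          /* defensive: a dead child sends nothing more */
        if (m.kind == CHAN_SIGNAL) {
            v->signals_seen++;
        } else {
            v->dead = true;
            v->exit_status = m.value;
            v->signals_at_death = v->signals_seen;
        }
    }
}
```

The single queue is what provides the ordering here: since the death message is always the last one a child sends, nothing from that child can be processed after waitpid() sees it.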
