This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

try_process_exit() should wait for the async helper thread to exit #440

Closed
yamahata opened this issue Jan 26, 2019 · 4 comments · Fixed by gramineproject/gramine#1992

Comments

@yamahata
Contributor

This issue tracks a problem found by #321:
try_process_exit() should wait for the async helper thread to exit and only then release its reference count for the final resource release.
Currently there is only a TODO comment for this, because doing it properly requires a new PAL interface for waiting for another thread to exit.

  • define a PAL interface to wait for a thread to exit (a sketch of such an interface follows this list)
  • implement it in the PAL (at least Pal/Linux and Pal/Linux-SGX)
    • in the remaining PALs, a stub no-op is sufficient
  • fix try_process_exit()
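
A minimal, self-contained sketch of this flow, using pthreads as a stand-in for the PAL layer; pal_thread_wait_for_exit(), helper_thread, and try_process_exit_sketch() are hypothetical names made up for illustration and do not come from the Graphene sources:

/* Illustration only: the proposed ordering is "wait for the helper thread to
 * exit, then release its reference count and free its resources". A real fix
 * would expose the wait as a new PAL call instead of pthread_join(). */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct helper_thread {
    pthread_t tid;
    int refcount;                 /* reference held by the "master" thread */
};

/* Hypothetical PAL-level primitive: block until the given thread exits.
 * Pal/Linux could implement it with a futex wait (here: pthread_join());
 * Pal/Linux-SGX needs its own implementation; other PALs may stub it. */
static int pal_thread_wait_for_exit(pthread_t tid) {
    return pthread_join(tid, NULL);
}

static void* async_helper_main(void* arg) {
    (void)arg;
    /* ... serve async events until told to shut down ... */
    return NULL;
}

/* Sketch of the fixed try_process_exit() ordering. */
static void try_process_exit_sketch(struct helper_thread* helper) {
    if (pal_thread_wait_for_exit(helper->tid) != 0) {
        fprintf(stderr, "waiting for the async helper thread failed\n");
        exit(1);
    }
    if (--helper->refcount == 0)
        free(helper);             /* final resource release after the exit */
}

int main(void) {
    struct helper_thread* helper = calloc(1, sizeof(*helper));
    helper->refcount = 1;
    pthread_create(&helper->tid, NULL, async_helper_main, NULL);
    try_process_exit_sketch(helper);
    puts("helper exited and its resources were released");
    return 0;
}
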
yamahata added a commit to yamahata/graphene that referenced this issue Jan 26, 2019
In the final squashed commit log, gramineproject#440 should be mentioned.

Signed-off-by: Isaku Yamahata <[email protected]>
@donporter
Contributor

+1

mkow pushed a commit to yamahata/graphene that referenced this issue Feb 5, 2019
Having a thread free its own resources via put_thread(self) is problematic because the exiting thread is still using them. In particular, the put_thread(self) logic in the IPC/async helpers can cause a SEGV depending on which thread is the last one to call put_thread(): if the helper itself is the last one to free the thread area, the lock/unlock performed after the thread area is freed (for a debug message) causes a SEGV.

The solution is to have another (usually the master) thread hold a reference to each of the two helper threads, so that this thread is the last one to release the reference count. This way self-destruction is avoided.

To completely fix this issue, the helper threads need to be waited on until they exit.
That part is tracked by gramineproject#440.

Signed-off-by: Isaku Yamahata <[email protected]>
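
A generic, self-contained sketch of the hand-over pattern described in the commit message above; thread_obj, get_ref() and put_ref() are stand-ins for the put_thread()-style reference counting and are not taken from the codebase:

#include <stdatomic.h>
#include <stdlib.h>

struct thread_obj {
    atomic_int refcount;
    /* ... thread area, locks, etc. ... */
};

static struct thread_obj* get_ref(struct thread_obj* t) {
    atomic_fetch_add(&t->refcount, 1);
    return t;
}

static void put_ref(struct thread_obj* t) {
    if (atomic_fetch_sub(&t->refcount, 1) == 1) {
        /* Last reference dropped. This is safe only because the holder of
         * this last reference is NOT the thread described by `t` itself. */
        free(t);
    }
}

int main(void) {
    /* The "master" thread creates the helper's descriptor and keeps its own
     * reference, guaranteeing the helper never frees its own thread area. */
    struct thread_obj* helper = calloc(1, sizeof(*helper));
    atomic_init(&helper->refcount, 1);   /* the master's reference */

    get_ref(helper);    /* reference used by the helper thread itself */
    put_ref(helper);    /* dropped by the helper on exit: not the last one */

    put_ref(helper);    /* the master releases last, after the helper exited */
    return 0;
}
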
@mkow
Member

mkow commented Feb 5, 2019

FYI: #321 was just merged to master.

@dimakuv

dimakuv commented Jun 5, 2019

Some additional info on a related issue in the code.

The termination path of the IPC helper thread checks for the state HELPER_HANDEDOVER (which would be better named HELPER_PENDING_SHUTDOWN). This state is set only via try_process_exit() -> exit_with_ipc_helper() when the process is supposed to exit/shut down/terminate (think SIGKILL or the end of execve).

The benign issue is that the termination functionality is unnecessarily duplicated:

int try_process_exit (int error_code, int term_signal) {
... /* code below simplified */
    int ret = exit_with_ipc_helper(true, &ipc_thread);
    shim_clean(ret);
}

static void shim_ipc_helper (void * arg) {  /* in newer commits may be called shim_ipc_helper_end */
...
end:
    COMPILER_BARRIER();
    if (ipc_helper_state == HELPER_HANDEDOVER) {
        debug("ipc helper thread is the last thread, process exiting\n");
        shim_terminate(0); // Same as shim_clean(), but this is the official termination function
    }
...
}

So, try_process_exit() forces the state of the IPC helper thread to HELPER_HANDEDOVER, which makes the helper thread perform shim_terminate(). But try_process_exit() itself is supposed to perform shim_clean(), so the two functions race on who exits the process.

It's a benign issue, but it should be fixed so that there is only a single point of process termination. (See also PR #648.)
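
One possible shape of the fix, sketched with self-contained pthreads stubs; it is not the actual change from PR #648, and shim_clean_stub()/try_process_exit_stub()/shim_ipc_helper_stub() are illustrative stand-ins. The idea: try_process_exit() waits for the IPC helper to exit and then is the only caller of the cleanup routine, while the helper itself never terminates the process:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

enum helper_state { HELPER_ALIVE, HELPER_PENDING_SHUTDOWN };

static enum helper_state ipc_helper_state = HELPER_ALIVE;
static pthread_t ipc_helper_tid;

static void shim_clean_stub(int ret) {       /* stand-in for shim_clean() */
    printf("single termination point, ret=%d\n", ret);
    exit(ret);
}

static void* shim_ipc_helper_stub(void* arg) {
    (void)arg;
    /* ... serve IPC until asked to shut down; unlike the snippet above,
     * the helper only returns here and never calls shim_terminate() ... */
    return NULL;
}

static void try_process_exit_stub(int error_code, int term_signal) {
    (void)term_signal;
    ipc_helper_state = HELPER_PENDING_SHUTDOWN;  /* ask the helper to stop */
    pthread_join(ipc_helper_tid, NULL);          /* wait until it has exited */
    shim_clean_stub(error_code);                 /* the only cleanup caller */
}

int main(void) {
    pthread_create(&ipc_helper_tid, NULL, shim_ipc_helper_stub, NULL);
    try_process_exit_stub(0, 0);
    return 0;
}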

@boryspoplawski
Contributor

Update on this issue w.r.t. the current state of Graphene:
the issue is still present, but it can be fixed in the same manner as #2313 fixed it for the IPC worker thread. The problem is that this requires rewriting the async thread, which needs to be done but is not a high priority.

mwkmwkmwk pushed a commit to mwkmwkmwk/gramine that referenced this issue Sep 3, 2024
The async worker thread is currently spawned on demand and terminated
after an idle timeout. This is entirely unnecessary: the cost of keeping
a persistent async thread is near zero, and it simplifies the logic.

Fixes gramineproject/graphene#440.
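
For illustration, a generic version of the persistent-worker pattern this commit describes (spawn the thread once, wake it on demand, join it exactly once at shutdown). It uses plain pthreads; none of the names below are Gramine's:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool shutdown_requested = false;

static void* async_worker(void* arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!shutdown_requested) {
        /* ... handle pending async events here ... */
        pthread_cond_wait(&cond, &lock);   /* sleep until woken again */
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t worker;
    pthread_create(&worker, NULL, async_worker, NULL);   /* spawned once */

    /* ... normal process lifetime: wake the worker whenever events arrive ... */

    pthread_mutex_lock(&lock);                 /* process shutdown: */
    shutdown_requested = true;
    pthread_cond_signal(&cond);                /* wake the worker one last time */
    pthread_mutex_unlock(&lock);
    pthread_join(worker, NULL);                /* wait for it to exit */

    puts("async worker joined; safe to finish process cleanup");
    return 0;
}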