Skip to content
This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

[LibOS] Deliver signal to only one thread #1510

Merged
merged 2 commits into from
Jun 8, 2020
Merged

Conversation

boryspoplawski
Copy link
Contributor

@boryspoplawski boryspoplawski commented May 15, 2020

Description of the changes

Previously Graphene delivered each signal to all threads instead of
one. This was additionally abused to quit the application. Now each
signal is delivered to only one arbitrary thread. This commit
additionally cleans the signal and exiting code.

How to test this PR?

Added new test.


This change is Reviewable

Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I know this PR is still work-in-progress...

Reviewable status: 0 of 15 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @boryspoplawski)


LibOS/shim/src/bookkeep/shim_signal.c, line 639 at r1 (raw file):

    if (tcb->tp->time_to_die) {
        thread_exit(0, 0);

This is dangerous because thread can be executing libos or PAL with some resource(e.g. spinlock) acquired.


LibOS/shim/src/sys/shim_sigaction.c, line 313 at r1 (raw file):

        .cmp_type = TGID,
    };
    int ret = walk_thread_list(_signal_one_thread, &arg, /*one_shot=*/true);

When TGID case, this logic delivers signal to the first found thread irrelevant of its signal mask.
It should search the thread with given pgid with unmasked if possible.
If all threads with pgid masked the signal, pick one.

Maybe such change make this PR more complex. So we can attack after this PR. So I don't block it.

Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 20 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @yamahata)


LibOS/shim/src/bookkeep/shim_signal.c, line 639 at r1 (raw file):

Previously, yamahata wrote…

This is dangerous because thread can be executing libos or PAL with some resource(e.g. spinlock) acquired.

Done. Moved to the same place signals are handled so this should not be an issue now.


LibOS/shim/src/sys/shim_sigaction.c, line 313 at r1 (raw file):

Previously, yamahata wrote…

When TGID case, this logic delivers signal to the first found thread irrelevant of its signal mask.
It should search the thread with given pgid with unmasked if possible.
If all threads with pgid masked the signal, pick one.

Maybe such change make this PR more complex. So we can attack after this PR. So I don't block it.

This was WIP and now is reworked (added per process signal queue).

@boryspoplawski
Copy link
Contributor Author

Jenkins, test Jenkins-Debug-18.04 please (mbedcrypto download failed)

@boryspoplawski
Copy link
Contributor Author

Jenkins, test Jenkins-Debug-18.04 please (ltp:fork11 failed, it received a SIGPIPE on writing to a pipe with remote end dead; in proc_write called by broadcast_ipc to announce that this process is exiting; this seems unrelated to the patch; I was able to trigger it even in gdb on ubuntu16, but I don't have time to debug it now)

@boryspoplawski
Copy link
Contributor Author

The above error is unrelated and I was able to trigger it on current master (see issue #1534)

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 14 files at r2, 5 of 13 files at r3.
Reviewable status: 8 of 24 files reviewed, 21 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @yamahata)


Documentation/devel/signal-handling.rst, line 746 at r3 (raw file):

* BUG? Graphene LibOS performs `DkThreadYieldExecution()` in `__handle_signal()` (i.e., yield
  thread execution after handling one pending signal). Looks useless.

FYI: This whole file can be retired now. It was written by me ~2 years ago, just to wrap my head around the flows. It's not useful to anyone except a couple of developers.


LibOS/shim/include/shim_internal.h, line 187 at r3 (raw file):

        int64_t preempt = get_cur_preempt();                \
        __UNUSED(preempt);                                  \
        /* handle_signals(); */                             \

What about we just remove it from here? I see no reason we ever switch to "handle signals before a syscall".


LibOS/shim/include/shim_thread.h, line 46 at r3 (raw file):

struct shim_signal_queue {
    struct shim_signal* standard_signal_queues[SIGRTMIN - 1];

This name confused me later. Cannot we rename to simply standard_signals? There is no concept of a queue for 1..31 really.


LibOS/shim/src/shim_parser.c, line 428 at r3 (raw file):

    return str;
}

Please refactor and code-style this function.


LibOS/shim/src/shim_parser.c, line 842 at r3 (raw file):

    else
        PRINTF("[SIG %d]", signum);
}

Why not simply:

int signum = va_arg(ap, int);
PUTS(signal_name(signum));

LibOS/shim/src/bookkeep/shim_signal.c, line 55 at r3 (raw file):

static struct shim_signal_queue process_signal_queue = { 0 };
static uint64_t process_pending_signals = 0;

Do I understand correctly that on fork, these two vars are zeroed out?


LibOS/shim/src/bookkeep/shim_signal.c, line 57 at r3 (raw file):

static uint64_t process_pending_signals = 0;

/* This checks are racy, but we can't do better anyway: signal can be delivered in any moment. */

This -> These


LibOS/shim/src/bookkeep/shim_signal.c, line 99 at r3 (raw file):

                                          __ATOMIC_RELAXED, __ATOMIC_RELAXED));
    return true;
}

I don't understand what this function tries to achieve. We replace the queue (signal at the head of the queue) with a signal only if there is no old signal. Why?

Update: Ok, now I get it. The queue is actually a single standard signal, not a queue of 31 standard signals. So we simply put with a new one (if there was no old signal). What confused me is the argument name queue. Could we rename to smth like signal_slot?


LibOS/shim/src/bookkeep/shim_signal.c, line 120 at r3 (raw file):

    queue->queue[put_idx % ARRAY_SIZE(queue->queue)] = signal;
    return true;
}

You're assuming that get_idx/put_idx never overflow during lifetime of an app, right?


LibOS/shim/src/bookkeep/shim_signal.c, line 139 at r3 (raw file):

        (void)__atomic_add_fetch(&thread->pending_signals, 1, __ATOMIC_RELAXED);
    }
    return ret;

Is it normal that we return false if we wanted to add a standard signal (e.g. SIGINT) but there was already one pending? The usage of this function seems to print a warning that the signal queue is full. Is this the correct behavior?


LibOS/shim/src/bookkeep/shim_signal.c, line 150 at r3 (raw file):

}

static struct shim_signal* queue_consume_standard_signal(struct shim_signal** queue) {

Ditto (rename queue)


LibOS/shim/src/bookkeep/shim_signal.c, line 158 at r3 (raw file):

        }
    } while (!__atomic_compare_exchange_n(queue, &old, NULL, /*weak=*/true,
                                          __ATOMIC_RELAXED, __ATOMIC_RELAXED));

Cannot you replace the whole load + cmxhg with one __atomic_exchange_n? You're checking for NULL and replacing with NULL anyway.


LibOS/shim/src/bookkeep/shim_signal.c, line 169 at r3 (raw file):

        assert(put_idx >= get_idx);

        if (put_idx <= get_idx) {

Why <= and not ==? You already have an assert above.


LibOS/shim/src/bookkeep/shim_signal.c, line 178 at r3 (raw file):

}

static struct shim_signal* queue_consume_signal(int sig, struct shim_signal_queue* queue) {

Could you switch the order of two arguments, for consistency with the rest of the code


LibOS/shim/src/bookkeep/shim_signal.c, line 248 at r3 (raw file):

    } else {
        __handle_signal(tcb, sig);
        __handle_one_signal(tcb, sig, signal);

Why did you change this old logic? Previously, if the signal was delivered and Graphene was able to handle it immediately (not nested and not blocked), then Graphene handles this one signal. Now you changed it to "handle all signals" and then "handle this new signal".


LibOS/shim/src/bookkeep/shim_signal.c, line 275 at r3 (raw file):

    if (preempt > 1 || __sigismember(&cur_thread->signal_mask, sig)) {
        signal = malloc_copy(signal,sizeof(struct shim_signal));
        if (!signal || !append_thread_signal(cur_thread, signal)) {

These are two different cases, please separate into two IFs.


LibOS/shim/src/bookkeep/shim_signal.c, line 276 at r3 (raw file):

        signal = malloc_copy(signal,sizeof(struct shim_signal));
        if (!signal || !append_thread_signal(cur_thread, signal)) {
            debug("signal queue is full (TID = %u, SIG = %d)\n", tcb->tid, sig);

See my other comment: you will print this message when two standard signals arrive (the second one will be skipped). Do we still want to print this message (it sounds kinda scary)?


LibOS/shim/src/bookkeep/shim_signal.c, line 790 at r3 (raw file):

        __handle_one_signal(tcb, sig, signal);
        free(signal);
        DkThreadYieldExecution();

I never understood the need for this yield. Do you have any other insights why it was added here?


LibOS/shim/src/bookkeep/shim_signal.c, line 944 at r3 (raw file):

    __UNUSED(obj);
    __UNUSED(size);
    __UNUSED(objp);

Did you intentionally skip the typical logic in CP functions of "first check if this was already checkpointed, then do the checkpoint if not"? I believe this check is not needed, so this code should be correct.


LibOS/shim/src/bookkeep/shim_signal.c, line 995 at r3 (raw file):

        if (!infos[i].si_signo) {
            continue;
        }

How is this possible that infos[i].si_signo == 0? This can't happen, no? Either remove the whole IF or add an assert.


LibOS/shim/src/bookkeep/shim_thread.c, line 48 at r3 (raw file):

//#define DEBUG_REF

#ifdef DEBUG_REF

TODO(dimakuv): Continue review from here.

Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 8 of 24 files reviewed, 22 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @yamahata)


Documentation/devel/signal-handling.rst, line 746 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

FYI: This whole file can be retired now. It was written by me ~2 years ago, just to wrap my head around the flows. It's not useful to anyone except a couple of developers.

As I'm not familiar with its content I'll let you decide: should I remove it in this PR?


LibOS/shim/include/shim_internal.h, line 187 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

What about we just remove it from here? I see no reason we ever switch to "handle signals before a syscall".

Done.


LibOS/shim/include/shim_thread.h, line 46 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This name confused me later. Cannot we rename to simply standard_signals? There is no concept of a queue for 1..31 really.

I just named it so for consistency (as this can be thought of as one entry queue), but you are right, it's more confusing this way. Done.


LibOS/shim/src/shim_parser.c, line 428 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please refactor and code-style this function.

Done. Previously this was just moved from another place/file.


LibOS/shim/src/shim_parser.c, line 842 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why not simply:

int signum = va_arg(ap, int);
PUTS(signal_name(signum));

Don't ask me :)
Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 55 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Do I understand correctly that on fork, these two vars are zeroed out?

Yes (if they are not, this is a bug).


LibOS/shim/src/bookkeep/shim_signal.c, line 57 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This -> These

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 99 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I don't understand what this function tries to achieve. We replace the queue (signal at the head of the queue) with a signal only if there is no old signal. Why?

Update: Ok, now I get it. The queue is actually a single standard signal, not a queue of 31 standard signals. So we simply put with a new one (if there was no old signal). What confused me is the argument name queue. Could we rename to smth like signal_slot?

Renamed it according to a comment near its definition, let me know if it suits.


LibOS/shim/src/bookkeep/shim_signal.c, line 120 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

You're assuming that get_idx/put_idx never overflow during lifetime of an app, right?

Yeah, adding signals with 1GHz (10**9 signals per second) gives a 544 years running time before overflow. Looks like a safe margin, but I would gladly change it to a better solution, if you have one (added a comment for now).


LibOS/shim/src/bookkeep/shim_signal.c, line 139 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Is it normal that we return false if we wanted to add a standard signal (e.g. SIGINT) but there was already one pending? The usage of this function seems to print a warning that the signal queue is full. Is this the correct behavior?

This is exactly what return value of this function is supposed to mean: whether a signal was added to the queue (even if queue size is 1).
What do you mean with "correct behavior"?


LibOS/shim/src/bookkeep/shim_signal.c, line 150 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Ditto (rename queue)

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 158 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Cannot you replace the whole load + cmxhg with one __atomic_exchange_n? You're checking for NULL and replacing with NULL anyway.

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 169 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why <= and not ==? You already have an assert above.

Done. Just a thing I got used to (that's not necessarily good).


LibOS/shim/src/bookkeep/shim_signal.c, line 178 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Could you switch the order of two arguments, for consistency with the rest of the code

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 248 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why did you change this old logic? Previously, if the signal was delivered and Graphene was able to handle it immediately (not nested and not blocked), then Graphene handles this one signal. Now you changed it to "handle all signals" and then "handle this new signal".

I didn't change any of these. Could you mark exact lines where you think such change has happened?


LibOS/shim/src/bookkeep/shim_signal.c, line 275 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

These are two different cases, please separate into two IFs.

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 276 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

See my other comment: you will print this message when two standard signals arrive (the second one will be skipped). Do we still want to print this message (it sounds kinda scary)?

I think it's a valid debug message saying that we got a second signal either while handling the first instance or that we have it blocked and it's pending. If you think it should be changed in any way please let me know the specifics.


LibOS/shim/src/bookkeep/shim_signal.c, line 790 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I never understood the need for this yield. Do you have any other insights why it was added here?

No idea.


LibOS/shim/src/bookkeep/shim_signal.c, line 944 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Did you intentionally skip the typical logic in CP functions of "first check if this was already checkpointed, then do the checkpoint if not"? I believe this check is not needed, so this code should be correct.

Short answer: yes.
I believe that check is only needed for object that are checkpointed indirectly (e.g. handles where there could be multiple references to one handle), not for global one-instance objects.


LibOS/shim/src/bookkeep/shim_signal.c, line 995 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

How is this possible that infos[i].si_signo == 0? This can't happen, no? Either remove the whole IF or add an assert.

Assuming there are possible (unfortunately unavoidable) races in CP_FUNC (while traversing queues) I added it as additional sanity check.
Now I think of it: this code is ran only in case of an execve checkpoint, which should only have one thread running so all multi-thread races should not happen. Only thing to care is asynchronous signal arriving and being added to queue while we are in this checkpoiting loop. I'll try to make this as robust as possible.

Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 24 files reviewed, 23 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @yamahata)

a discussion (no related file):
This PR uncovered an issue (#1534) which is non-deterministic but triggers a lot here. Blocking so this won't get merged until the aforementioned issue gets fixed.


Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 15 files at r1, 2 of 14 files at r2, 8 of 13 files at r3, 4 of 4 files at r4.
Reviewable status: all files reviewed, 40 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @yamahata)

a discussion (no related file):

Previously, boryspoplawski (Borys Popławski) wrote…

This PR uncovered an issue (#1534) which is non-deterministic but triggers a lot here. Blocking so this won't get merged until the aforementioned issue gets fixed.

Let me block as well.



Documentation/devel/signal-handling.rst, line 746 at r3 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

As I'm not familiar with its content I'll let you decide: should I remove it in this PR?

No. I'll do it in a separate PR.


LibOS/shim/src/shim_parser.c, line 416 at r4 (raw file):

    if (sig > NUM_SIGS)
        return "BAD SIGNAL";

This and above could be joined into one IF.


LibOS/shim/src/bookkeep/shim_signal.c, line 99 at r3 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Renamed it according to a comment near its definition, let me know if it suits.

But you still have this name queue here. Could we change to signal_slot? Or queue_signal? UPDATE: I mean the arg name, like this: queue_produce_standard_signal(struct shim_signal** queue_signal, struct shim_signal* signal)


LibOS/shim/src/bookkeep/shim_signal.c, line 120 at r3 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Yeah, adding signals with 1GHz (10**9 signals per second) gives a 544 years running time before overflow. Looks like a safe margin, but I would gladly change it to a better solution, if you have one (added a comment for now).

Thanks for comment, looks good. I wonder if anyone started this experiment of how much real time it takes for some laptop to overflow a uint64 integer :)


LibOS/shim/src/bookkeep/shim_signal.c, line 139 at r3 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

This is exactly what return value of this function is supposed to mean: whether a signal was added to the queue (even if queue size is 1).
What do you mean with "correct behavior"?

OK, see my other comment below.


LibOS/shim/src/bookkeep/shim_signal.c, line 150 at r3 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Done.

Sorry, I meant the arg name, like this: queue_consume_standard_signal(struct shim_signal** queue_signal)


LibOS/shim/src/bookkeep/shim_signal.c, line 248 at r3 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

I didn't change any of these. Could you mark exact lines where you think such change has happened?

I mean this change: __handle_signal(tcb, sig); -> __handle_signals(tcb);. The old code would only handle say pending SIGINT signals, the new code handles all pending signals.

I'm not arguing that the old code was correct. I just want to make sure this particular change was intentional.


LibOS/shim/src/bookkeep/shim_signal.c, line 276 at r3 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

I think it's a valid debug message saying that we got a second signal either while handling the first instance or that we have it blocked and it's pending. If you think it should be changed in any way please let me know the specifics.

What about modifying the message to: Signal queue for thread %u and signal %d is full, dropping the incoming signal. This makes it less scary and more informative.


LibOS/shim/src/bookkeep/shim_signal.c, line 995 at r3 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Assuming there are possible (unfortunately unavoidable) races in CP_FUNC (while traversing queues) I added it as additional sanity check.
Now I think of it: this code is ran only in case of an execve checkpoint, which should only have one thread running so all multi-thread races should not happen. Only thing to care is asynchronous signal arriving and being added to queue while we are in this checkpoiting loop. I'll try to make this as robust as possible.

Ok, I understand now. We can keep this, but let's move it four lines above (before malloc). And maybe add a comment /* signo == 0 can happen on a race between this code and adding a new signal */.


LibOS/shim/src/bookkeep/shim_signal.c, line 101 at r4 (raw file):

}

/* In teory `get_idx` and `put_idx` could overflow, but adding signals with 1GHz (10**9 signals

theory


LibOS/shim/src/bookkeep/shim_signal.c, line 177 at r4 (raw file):

        return NULL;
    } else if (sig < SIGRTMIN) {
        return consume_standard_signal(&queue->standard_signals[sig - 1]);

As mentioned above, I didn't want to change the function name but rather the argument name.


LibOS/shim/src/bookkeep/shim_signal.c, line 876 at r4 (raw file):

    }

    debug("signal queue is full (TID = %u%s, SIG = %d)\n",

Please change this debug message in a similar way as I commented above.


LibOS/shim/src/bookkeep/shim_thread.c, line 456 at r4 (raw file):

 * Searches for any alive thread excluding `self`.
 * Returns tid of the first found alive thread or 0 if there are no alive threads.
 * If `self` is NULL, all threads are checked. If `makr_self_dead` is true, atomically marks `self`

makr_self_dead -> mark_self_dead


LibOS/shim/src/bookkeep/shim_thread.c, line 501 at r4 (raw file):

    /* clean up the thread itself */
    lock(&thread_list_lock);
    thread->is_alive = false;

Understand: why we don't need to unset is_alive.


LibOS/shim/src/ipc/shim_ipc_pid.c, line 161 at r4 (raw file):

};

int check_thread(struct shim_thread* thread, void* arg) {

Please add static.


LibOS/shim/src/sys/shim_alarm.c, line 33 at r4 (raw file):

static void signal_alarm(IDTYPE caller, void* arg) {
    __UNUSED(caller);
    (void)do_kill_proc(get_cur_thread()->tid, (IDTYPE)(uintptr_t)arg, SIGALRM, /*use_ipc=*/false);

signal_alarm() will be called by the Async helper thread. Thus, get_cur_thread()->tid will contain a TID of a Graphene-internal thread. I see that you actually have a comment about this in do_kill_proc(), but maybe also add a comment here? Smth like /* this function is invoked by Async helper thread (an internal Graphene thread) */


LibOS/shim/src/sys/shim_exit.c, line 33 at r4 (raw file):

#include <shim_utils.h>

void release_robust_list (struct robust_list_head * head);

Could you code-style this?


LibOS/shim/src/sys/shim_exit.c, line 76 at r4 (raw file):

        if (!thread->in_vm) {
            debug("deliver SIGCHLD (thread = %d, exitval = %d)\n",
                  thread->tid, exit_code);

I guess it is impossible to have two remote threads of the same child process, right? So we are guaranteed to enter this code path only once per child process.


LibOS/shim/src/sys/shim_exit.c, line 87 at r4 (raw file):

            if (append_signal(parent, &info)) {
                thread_wakeup(thread);
                DkThreadResume(thread->pal_handle);

Why are we waking up the remote thread? What does it even mean? Cannot we just remove these two lines?


LibOS/shim/src/sys/shim_exit.c, line 172 at r4 (raw file):

    thread_wakeup(thread);
    DkThreadResume(thread->pal_handle);

Ok, here I see why this is needed. We marked thread as to-die, and now we kick it so that it goes into __handle_signals() and there notices that it needs to die, so it exits. Maybe a small comment that /* this will kick the thread to invoke __handle_signals() and exit from there */?


LibOS/shim/src/sys/shim_exit.c, line 187 at r4 (raw file):

    }

    /* Tell other threads to exit. We can't do anything on failuers. */

failures


LibOS/shim/src/sys/shim_ioctl.c, line 294 at r4 (raw file):

void signal_io(IDTYPE caller, void* arg) {
    __UNUSED(caller);
    (void)do_kill_proc(get_cur_thread()->tid, (IDTYPE)(uintptr_t)arg, SIGIO, /*use_ipc=*/false);

ditto (add comment that this is invoked through Async helper thread)


LibOS/shim/src/sys/shim_sigaction.c, line 230 at r4 (raw file):

    return append_signal(thread, &info);
}

It's confusing that _append_signal() calls append_signal(), in the sense that we usually do it the other way around (the latter is usually the wrapper around the former). Also, this function is used in only two places and it simply creates info on the stack and calls append_signal(). Could we just merge its code into the callers?


LibOS/shim/src/sys/shim_sigaction.c, line 281 at r4 (raw file):

            /* We are ending this walk anyway, lets reuse sent field to mark that current thread
             * needs to handle a signal. */
            arg->sent = false;

Why couldn't we call handle_signals() here immediately? Why do we need this level of indirection?


LibOS/shim/src/sys/shim_sigaction.c, line 295 at r4 (raw file):

int do_kill_proc(IDTYPE sender, IDTYPE tgid, int sig, bool use_ipc) {
    /* This might be called by internal thread like IPC, need to do a search. */

How is the first part connected to the second? We should always do the search (walk the thread list), no?

(Also, it's not only IPC thread, it's also Async thread)


LibOS/shim/src/sys/shim_sigaction.c, line 308 at r4 (raw file):

    }

    /* We delivered the signal to self, now need to handle it. */

I'd prefer to move this comment under the two IFs, close to handle_signals()


LibOS/shim/src/sys/shim_sigaction.c, line 324 at r4 (raw file):

    if (use_ipc) {
        return ipc_pid_kill_send(sender, tgid, KILL_PROCESS, sig);

That's a bit weird. We land here even if we found a thread in our process that could handle the signal but we were not able to append it (because no memory or signal queue is full). Shouldn't we distinguish this case?


LibOS/shim/src/sys/shim_sigaction.c, line 345 at r4 (raw file):

    }

    /* This might be called by internal thread like IPC, need to do a search. */

ditto


LibOS/shim/src/sys/shim_sigaction.c, line 357 at r4 (raw file):

    /* We delivered the signal to self, now need to handle it. */
    if (ret == 0 && !arg.sent) {
        handle_signals();

Move comment inside the IF.


LibOS/shim/src/sys/shim_sleep.c, line 63 at r4 (raw file):

int shim_do_pause(void) {
    if (signal_pending())
        return -EINTR;

Why is everything removed? Because it is an impossible situation? (In the sense that pause() doesn't change signal mask and all pending-but-unblocked signals were already consumed.)


LibOS/shim/src/sys/shim_sleep.c, line 81 at r4 (raw file):

        }
        return -EINTR;
    }

ditto


LibOS/shim/test/regression/signal_multithread.c, line 59 at r4 (raw file):

    }

    usleep(1000);

What's the point of this sleep? Can you add a comment? Is that to make it more obvious/frequent that the child thread can also handle the signal?


LibOS/shim/test/regression/sigprocmask_pending.c, line 33 at r4 (raw file):

}

static void set_signalm_handler(int sig, void* handler) {

signalm? Why m?


LibOS/shim/test/regression/sigprocmask_pending.c, line 62 at r4 (raw file):

    ignore_signal(0);
    __atomic_store_n(&seen_signal, 0, __ATOMIC_RELAXED);
}

Why does this function clean pending signals? I cannot find in man pages what will force this behavior.


LibOS/shim/test/regression/sigprocmask_pending.c, line 71 at r4 (raw file):

    CHECK(kill(getpid(), SIGALRM) < 0);
    CHECK(kill(getpid(), SIGALRM) < 0);

Maybe add a check here that seen_signal == 0?


LibOS/shim/test/regression/sigprocmask_pending.c, line 75 at r4 (raw file):

    if (__atomic_load_n(&seen_signal, __ATOMIC_RELAXED) != 1) {
        printf("Multiple or none instances of simple signal were queued!\n");

simple -> standard?


LibOS/shim/test/regression/sigprocmask_pending.c, line 88 at r4 (raw file):

    set_signalm_handler(sig, signal_handler);

Maybe add a check here that seen_signal == 0?


LibOS/shim/test/regression/sigprocmask_pending.c, line 107 at r4 (raw file):

    CHECK(p < 0);
    if (p == 0) {
        set_signalm_handler(SIGALRM, signal_handler);

Why not install signal_handler in the parent and remove from here? This way you'll also test that the handler was inherited by the child.


LibOS/shim/test/regression/sigprocmask_pending.c, line 136 at r4 (raw file):

static void test_execve_continue(void) {
    set_signalm_handler(SIGALRM, signal_handler);

Maybe add a check here that seen_signal == 0?


LibOS/shim/test/regression/test_libos.py, line 148 at r4 (raw file):

    def test_404_sigprocmask_pending(self):
        stdout, _ = self.run_binary(['sigprocmask_pending'])

Could you increase timeout to 60? This test does one fork test and one execve test; it takes time on SGX.

Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 42 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @dimakuv)


LibOS/shim/src/bookkeep/shim_signal.c, line 248 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I mean this change: __handle_signal(tcb, sig); -> __handle_signals(tcb);. The old code would only handle say pending SIGINT signals, the new code handles all pending signals.

I'm not arguing that the old code was correct. I just want to make sure this particular change was intentional.

It's subtle. What if the current(sig) is synchronous signal(e.g. SIGSEGV) and the pending signal is async. In that case, we'd like to trigger sync first and then async.
Anyway that's very unlikely and the existing logic is somewhat broken, I'm fine with either way for now.


LibOS/shim/src/bookkeep/shim_signal.c, line 788 at r4 (raw file):

    (*handler) (sig, &signal->info, &signal->context);

    __atomic_store_n(&thread->signal_handled, true, __ATOMIC_RELAXED);

we'd like to add DkEventSet(thread->sched_event) for nanosleep, pause, and sigsuspend.
The implementaiton of DkEventSet() seems async signal safe.


LibOS/shim/src/sys/shim_sleep.c, line 36 at r4 (raw file):

int shim_do_pause(void) {
    /* ~0ULL micro sec ~= 805675 years */
    DkThreadDelayExecution(~((PAL_NUM)0));

This can be DkSynchronizationObjectWait(scheduler_event, max-timeout)
and deliver_signal can call DkEventSet().


LibOS/shim/src/sys/shim_sleep.c, line 45 at r4 (raw file):

    unsigned long time = rqtp->tv_sec * 1000000L + rqtp->tv_nsec / 1000;
    unsigned long ret = DkThreadDelayExecution(time);

ditto.

Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 9 of 25 files reviewed, 40 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @yamahata)


LibOS/shim/src/shim_parser.c, line 416 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This and above could be joined into one IF.

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 99 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

But you still have this name queue here. Could we change to signal_slot? Or queue_signal? UPDATE: I mean the arg name, like this: queue_produce_standard_signal(struct shim_signal** queue_signal, struct shim_signal* signal)

Oh, sorry, the naming was confusing indeed. Hope I've really changed it where appropriate now :)


LibOS/shim/src/bookkeep/shim_signal.c, line 120 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Thanks for comment, looks good. I wonder if anyone started this experiment of how much real time it takes for some laptop to overflow a uint64 integer :)

~200 years, at least on my laptop (simple loop increasing a register) ;)


LibOS/shim/src/bookkeep/shim_signal.c, line 150 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Sorry, I meant the arg name, like this: queue_consume_standard_signal(struct shim_signal** queue_signal)

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 248 at r3 (raw file):

Previously, yamahata wrote…

It's subtle. What if the current(sig) is synchronous signal(e.g. SIGSEGV) and the pending signal is async. In that case, we'd like to trigger sync first and then async.
Anyway that's very unlikely and the existing logic is somewhat broken, I'm fine with either way for now.

Oh, I've misunderstood what the old code was doing (for some reason I thought that sig argument was just the starting index in searching loop). So the change was not intentional before, but now is:)
@yamahata you are right, I've changed the order: now synchronous signal is handled first.


LibOS/shim/src/bookkeep/shim_signal.c, line 276 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

What about modifying the message to: Signal queue for thread %u and signal %d is full, dropping the incoming signal. This makes it less scary and more informative.

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 995 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Ok, I understand now. We can keep this, but let's move it four lines above (before malloc). And maybe add a comment /* signo == 0 can happen on a race between this code and adding a new signal */.

Well, this actually cannot happen, so changed that to an assert.


LibOS/shim/src/bookkeep/shim_signal.c, line 101 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

theory

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 177 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

As mentioned above, I didn't want to change the function name but rather the argument name.

Done. Should I restore the old name? (I'm fine with this version)


LibOS/shim/src/bookkeep/shim_signal.c, line 788 at r4 (raw file):

Previously, yamahata wrote…

we'd like to add DkEventSet(thread->sched_event) for nanosleep, pause, and sigsuspend.
The implementaiton of DkEventSet() seems async signal safe.

Why is it needed? Seems like pause, sigsuspend and sleep are correctly handling wake-ups (at least in practice, maybe it's just coincidence).


LibOS/shim/src/bookkeep/shim_signal.c, line 876 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please change this debug message in a similar way as I commented above.

Done.


LibOS/shim/src/bookkeep/shim_thread.c, line 456 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

makr_self_dead -> mark_self_dead

Done.


LibOS/shim/src/bookkeep/shim_thread.c, line 501 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Understand: why we don't need to unset is_alive.

Maybe I redefined meaning of that field?
Now it means that a thread is not running (a Graphene thread, the real host thread might be still alive and inside LibOS/PAL exit code path), which mostly means that the thread cannot handle any signals.
This field is set atomically (wrt. to other threads reading it) in LibOS exiting path.


LibOS/shim/src/ipc/shim_ipc_pid.c, line 161 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please add static.

Done.


LibOS/shim/src/sys/shim_alarm.c, line 33 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

signal_alarm() will be called by the Async helper thread. Thus, get_cur_thread()->tid will contain a TID of a Graphene-internal thread. I see that you actually have a comment about this in do_kill_proc(), but maybe also add a comment here? Smth like /* this function is invoked by Async helper thread (an internal Graphene thread) */

Actually here it makes sense that caller (caller of shim_do_alarm) will be forwarded as sender.


LibOS/shim/src/sys/shim_exit.c, line 33 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Could you code-style this?

Removed it, no idea what it did here in the first place ...


LibOS/shim/src/sys/shim_exit.c, line 76 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I guess it is impossible to have two remote threads of the same child process, right? So we are guaranteed to enter this code path only once per child process.

Generally speaking this should not exists at all. A process should not be aware of any particular threads of another process, but of a mere existence of a process as a whole, which renders this in_vm field useless. I.e. on thread_list there should be no "remote" threads.
I'll remove it when I have the time.
As to answer your question: I have no idea, but I suspect it is possible. Is there any issue here I should address in this PR?


LibOS/shim/src/sys/shim_exit.c, line 87 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why are we waking up the remote thread? What does it even mean? Cannot we just remove these two lines?

I have no idea, but the previous code did that (via last argument to old append_signal).
See the comment above.


LibOS/shim/src/sys/shim_exit.c, line 172 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Ok, here I see why this is needed. We marked thread as to-die, and now we kick it so that it goes into __handle_signals() and there notices that it needs to die, so it exits. Maybe a small comment that /* this will kick the thread to invoke __handle_signals() and exit from there */?

Done.


LibOS/shim/src/sys/shim_exit.c, line 187 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

failures

Done.


LibOS/shim/src/sys/shim_ioctl.c, line 294 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

ditto (add comment that this is invoked through Async helper thread)

Done.


LibOS/shim/src/sys/shim_sigaction.c, line 230 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

It's confusing that _append_signal() calls append_signal(), in the sense that we usually do it the other way around (the latter is usually the wrapper around the former). Also, this function is used in only two places and it simply creates info on the stack and calls append_signal(). Could we just merge its code into the callers?

Yeah, makes sense. Done.


LibOS/shim/src/sys/shim_sigaction.c, line 281 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why couldn't we call handle_signals() here immediately? Why do we need this level of indirection?

Because we are in the middle of walking the threads list and are holding some locks (related to that walk).


LibOS/shim/src/sys/shim_sigaction.c, line 295 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

How is the first part connected to the second? We should always do the search (walk the thread list), no?

(Also, it's not only IPC thread, it's also Async thread)

Expanded the comment a bit.
If there was a way to find the process id (tgid) we wouldn't need to use all those cmp_type, just wake any thread which has this signal unblocked (but yes, search would be needed anyway).


LibOS/shim/src/sys/shim_sigaction.c, line 308 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I'd prefer to move this comment under the two IFs, close to handle_signals()

Done.


LibOS/shim/src/sys/shim_sigaction.c, line 324 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

That's a bit weird. We land here even if we found a thread in our process that could handle the signal but we were not able to append it (because no memory or signal queue is full). Shouldn't we distinguish this case?

Not delivering due to full signal queue actually counts as success (quite counter-intuitive), but you are right that we do not distinguish errors from not finding the signal target.
Changed return type of append_signal so now it clearly returns -ENOMEM in case of failure (oom).


LibOS/shim/src/sys/shim_sigaction.c, line 345 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

ditto

Done.


LibOS/shim/src/sys/shim_sigaction.c, line 357 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Move comment inside the IF.

Done.


LibOS/shim/src/sys/shim_sleep.c, line 63 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why is everything removed? Because it is an impossible situation? (In the sense that pause() doesn't change signal mask and all pending-but-unblocked signals were already consumed.)

This was some weird special handling of a situation that needs to be handled even after this if (signal_pending()) so it was completely superfluous. All it did was check if we have a pending, non-blocked signal. But once such signal arrive, we will get poked (via those wake_threads and DkThreadSomething and handle it once preempt counter allows.


LibOS/shim/src/sys/shim_sleep.c, line 81 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

ditto

ditto


LibOS/shim/test/regression/signal_multithread.c, line 59 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

What's the point of this sleep? Can you add a comment? Is that to make it more obvious/frequent that the child thread can also handle the signal?

Done.


LibOS/shim/test/regression/sigprocmask_pending.c, line 33 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

signalm? Why m?

Because I changed it from alarm to signal, made a typo and then copy-pasted it everywhere ;)


LibOS/shim/test/regression/sigprocmask_pending.c, line 62 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why does this function clean pending signals? I cannot find in man pages what will force this behavior.

See the added comments.


LibOS/shim/test/regression/sigprocmask_pending.c, line 71 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Maybe add a check here that seen_signal == 0?

Done.


LibOS/shim/test/regression/sigprocmask_pending.c, line 75 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

simple -> standard?

Done.


LibOS/shim/test/regression/sigprocmask_pending.c, line 88 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Maybe add a check here that seen_signal == 0?

Done.


LibOS/shim/test/regression/sigprocmask_pending.c, line 107 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why not install signal_handler in the parent and remove from here? This way you'll also test that the handler was inherited by the child.

Done.


LibOS/shim/test/regression/sigprocmask_pending.c, line 136 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Maybe add a check here that seen_signal == 0?

Done.


LibOS/shim/test/regression/test_libos.py, line 148 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Could you increase timeout to 60? This test does one fork test and one execve test; it takes time on SGX.

Done. Though I think if this test takes more than a couple of seconds on SGX we should be aware of it.

@boryspoplawski
Copy link
Contributor Author

Somethings wrong with exec_same now, debugging in progress ...

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 16 of 16 files at r5.
Reviewable status: all files reviewed, 11 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @yamahata)


LibOS/shim/src/bookkeep/shim_signal.c, line 177 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Done. Should I restore the old name? (I'm fine with this version)

No, I'm also fine with this version. Resolved.


LibOS/shim/src/bookkeep/shim_signal.c, line 955 at r5 (raw file):

    for (int sig = 1; sig < SIGRTMIN && i < n; sig++) {
        struct shim_signal** q = &process_signal_queue.standard_signals[sig - 1];
        /* This load might look racy, but only scenario that this signal gets taken of the queue

gets taken of the queue -> is removed from the queue


LibOS/shim/src/bookkeep/shim_thread.c, line 501 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Maybe I redefined meaning of that field?
Now it means that a thread is not running (a Graphene thread, the real host thread might be still alive and inside LibOS/PAL exit code path), which mostly means that the thread cannot handle any signals.
This field is set atomically (wrt. to other threads reading it) in LibOS exiting path.

OK. This was a note to myself, and I forgot to return to it :) Indeed, when we are in this function, the thread is "not alive" from the Graphene perspective.


LibOS/shim/src/sys/shim_alarm.c, line 33 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Actually here it makes sense that caller (caller of shim_do_alarm) will be forwarded as sender.

Then please remove __UNUSED(caller).


LibOS/shim/src/sys/shim_exec.c, line 376 at r5 (raw file):

    kill_other_threads();

    /* All other threds are dead. Restoring initial value in case we stay inside same process

threads


LibOS/shim/src/sys/shim_exec.c, line 381 at r5 (raw file):

    bool use_same_process = true;
    if (use_same_process && !strcmp_static(PAL_CB(host_type), "Linux-SGX")) {

No need for user_same_process check in this IF now.


LibOS/shim/src/sys/shim_exit.c, line 76 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Generally speaking this should not exists at all. A process should not be aware of any particular threads of another process, but of a mere existence of a process as a whole, which renders this in_vm field useless. I.e. on thread_list there should be no "remote" threads.
I'll remove it when I have the time.
As to answer your question: I have no idea, but I suspect it is possible. Is there any issue here I should address in this PR?

OK, let's deal with these "remote threads" in another PR.


LibOS/shim/src/sys/shim_exit.c, line 87 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

I have no idea, but the previous code did that (via last argument to old append_signal).
See the comment above.

OK, let's deal with these "remote threads" in another PR.


LibOS/shim/src/sys/shim_sleep.c, line 63 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

This was some weird special handling of a situation that needs to be handled even after this if (signal_pending()) so it was completely superfluous. All it did was check if we have a pending, non-blocked signal. But once such signal arrive, we will get poked (via those wake_threads and DkThreadSomething and handle it once preempt counter allows.

OK, understood. This seems to be vestiges from the times when we had bugs such that some unblocked signals were not handled.


LibOS/shim/test/regression/exec_same.c, line 37 at r5 (raw file):

    pthread_t th;

    /* Creating another thread and doing a race on execve. Only on thread should survive. */

Only on -> Only one.

Hard to imagine a real-world app that does this but cool :)


LibOS/shim/test/regression/sigprocmask_pending.c, line 62 at r5 (raw file):

    /* We should not have any pending signals other than SIGALRM. */
    set_signal_handler(SIGALRM, signal_handler);
    /* This assumes that unblocking a signal will cause its immediate delivery. Unfortunatey

Unfortunately


LibOS/shim/test/regression/test_libos.py, line 148 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Done. Though I think if this test takes more than a couple of seconds on SGX we should be aware of it.

On my old and weak NUC, a HelloWorld example with the default enclave size of 256MB takes ~5 seconds to load. So creating three enclaves will take ~15 seconds, which can easily exceed the default limit of 20 seconds if we're very unlucky. (And our Jenkins has similar old-and-weak NUCs.)

Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 11 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @yamahata)


LibOS/shim/src/bookkeep/shim_signal.c, line 955 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

gets taken of the queue -> is removed from the queue

Done.


LibOS/shim/src/sys/shim_alarm.c, line 33 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Then please remove __UNUSED(caller).

Done.


LibOS/shim/src/sys/shim_exec.c, line 376 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

threads

Done.


LibOS/shim/src/sys/shim_exec.c, line 381 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

No need for user_same_process check in this IF now.

Done.


LibOS/shim/test/regression/exec_same.c, line 37 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Only on -> Only one.

Hard to imagine a real-world app that does this but cool :)

Done.


LibOS/shim/test/regression/sigprocmask_pending.c, line 62 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Unfortunately

Done.

dimakuv
dimakuv previously approved these changes May 28, 2020
Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r6.
Reviewable status: all files reviewed, 4 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @yamahata)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Let me block as well.

I'm unblocking this since the proposed solution to issue #1534 requires this PR. I will work on the solution on top of this PR.


Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 21 of 26 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @yamahata)

a discussion (no related file):
This PR uncovered some UAF in Pal (triggered often in exec_same). Blocking untill I debug it.


Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 21 of 26 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @dimakuv)


LibOS/shim/src/bookkeep/shim_signal.c, line 788 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Why is it needed? Seems like pause, sigsuspend and sleep are correctly handling wake-ups (at least in practice, maybe it's just coincidence).

Ok, let leave it as is for now because it's long standing another issue and you said "in practice."

Just for record:
In shim_do_sigsuspend() case, async signal can arrive after thread_setwait() and signal_hancled check before thread_sleep().
In that case, thread_sleep() won't wake up.

Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 21 of 26 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @dimakuv)

a discussion (no related file):

Previously, boryspoplawski (Borys Popławski) wrote…

This PR uncovered some UAF in Pal (triggered often in exec_same). Blocking untill I debug it.

See the comment in exec_same test.



LibOS/shim/src/bookkeep/shim_signal.c, line 788 at r4 (raw file):

Previously, yamahata wrote…

Ok, let leave it as is for now because it's long standing another issue and you said "in practice."

Just for record:
In shim_do_sigsuspend() case, async signal can arrive after thread_setwait() and signal_hancled check before thread_sleep().
In that case, thread_sleep() won't wake up.

But if it's async signal, then whoever appended it to pending signals also called thread_wakeup on us, which does exactly that.
If a signal was delivered before thread->signal_handled check then this exact check will prevent sleeping, if it was delivered after -> then whoever have appended it will also set thread->scheduler_event.

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 5 files at r7, 4 of 4 files at r8.
Reviewable status: all files reviewed, 9 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski)


LibOS/shim/src/bookkeep/shim_thread.c, line 465 at r8 (raw file):

/*
 * Checks whether there are any other threds on `thread_list`.

threads


LibOS/shim/src/sys/shim_exit.c, line 71 at r8 (raw file):

        LISTP_DEL_INIT(thread, &parent->children, siblings);
        LISTP_ADD_TAIL(thread, &parent->exited_children, siblings);
        unlock(&parent->lock);

You moved this unlock just for sanity, right? There was no bug in the previous version, just unnecessary locking?


Pal/src/host/Linux-SGX/enclave_framework.c, line 383 at r8 (raw file):

    if (tf->stubs) {
        *stubptr = tf->stubs;
        *sizeptr = tf->size;

This seems to be a useless assignment because the size of trusted files never changes and we already assign it above. You can remove it if you want, but not blocking.


Pal/src/host/Linux-SGX/enclave_framework.c, line 468 at r8 (raw file):

    if (tf->stubs) {
        *stubptr = tf->stubs;
        *sizeptr = tf->size;

This seems to be a useless assignment because the size of trusted files never changes and we already assign it above. You can remove it if you want, but not blocking.


Pal/src/host/Linux-SGX/enclave_framework.c, line 471 at r8 (raw file):

        spinlock_unlock(&trusted_file_lock);
        free(stubs);
        return 0;

Ok, this looks like a fix for the following data race: my thread started calculating stubs but some other thread updated stubs on the same trusted file tf. The old code seems to disregard the updated-by-other-thread stubs, free them, and use our stubs. Your new code, however, disregards the calculated-by-my-thread stubs, free them, and user other-thread's stubs. Is my understanding correct?


Pal/src/host/Linux-SGX/enclave_framework.c, line 714 at r8 (raw file):

        }

        new->index = (++trusted_file_indexes);

This old code is just... yeah

yamahata
yamahata previously approved these changes May 30, 2020
Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 15 files at r1, 3 of 14 files at r2, 3 of 13 files at r3, 1 of 4 files at r4, 8 of 16 files at r5, 3 of 5 files at r6, 4 of 5 files at r7, 4 of 4 files at r8.
Reviewable status: all files reviewed, 9 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski)

Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @dimakuv)


LibOS/shim/src/bookkeep/shim_thread.c, line 465 at r8 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

threads

Done.


LibOS/shim/src/sys/shim_exit.c, line 71 at r8 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

You moved this unlock just for sanity, right? There was no bug in the previous version, just unnecessary locking?

Yes, but reasoning was: now you don't need to hold a thread lock to append a signal.


Pal/src/host/Linux-SGX/enclave_framework.c, line 383 at r8 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This seems to be a useless assignment because the size of trusted files never changes and we already assign it above. You can remove it if you want, but not blocking.

Done.
Yeah, I suspected that, but I was a bit frustrated with debugging the other race, so I decided to push smallest subset of only needed changes.


Pal/src/host/Linux-SGX/enclave_framework.c, line 468 at r8 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This seems to be a useless assignment because the size of trusted files never changes and we already assign it above. You can remove it if you want, but not blocking.

Done.


Pal/src/host/Linux-SGX/enclave_framework.c, line 471 at r8 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Ok, this looks like a fix for the following data race: my thread started calculating stubs but some other thread updated stubs on the same trusted file tf. The old code seems to disregard the updated-by-other-thread stubs, free them, and use our stubs. Your new code, however, disregards the calculated-by-my-thread stubs, free them, and user other-thread's stubs. Is my understanding correct?

Yes, but there is a bit more to the old version: another thread could be actively using those old stubs that the new thread free - UAF.

dimakuv
dimakuv previously approved these changes Jun 1, 2020
Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r9.
Reviewable status: all files reviewed, all discussions resolved, "fixup! " found in commit messages' one-liners

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 15 files at r1, 3 of 14 files at r2, 3 of 13 files at r3, 1 of 4 files at r4, 8 of 16 files at r5, 3 of 5 files at r6, 3 of 5 files at r7, 3 of 4 files at r8, 2 of 2 files at r9.
Reviewable status: all files reviewed, 24 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski)

a discussion (no related file):
Do we have any new LTP tests which started to work?



LibOS/shim/src/shim_parser.c, line 400 at r9 (raw file):

#define S(sig) #sig

const char* const siglist[SIGRTMIN - 1] = {

This table is arch-dependent. Please either wrap it in #ifdef or use indexed initializer:

const char* const siglist[SIGRTMIN - 1] = {
    [SIGHUP] = S(SIGHUP),
    [SIGINT] = S(SIGINT),
}

But: even if you use indexed initializer, this won't solve all the troubles, because not only the signal numbers differ between archs, but also their existence.
I see this was already here before, but IMO worth fixing while we are here.


LibOS/shim/src/shim_parser.c, line 401 at r9 (raw file):

const char* const siglist[SIGRTMIN - 1] = {
    S(SIGHUP),  S(SIGINT),    S(SIGQUIT), S(SIGILL),   S(SIGTRAP),   S(SIGABRT),

alignment is broken on this line


LibOS/shim/src/bookkeep/shim_signal.c, line 55 at r9 (raw file):

process_pending_signals

This is a count of them? If so, please add _cnt to the name.


LibOS/shim/src/bookkeep/shim_signal.c, line 57 at r9 (raw file):

static uint64_t process_pending_signals = 0;

/* These checks are racy, but we can't do better anyway: signal can be delivered in any moment. */

Could you comment here what is the worst-case scenario for this race? What will happen?


LibOS/shim/src/bookkeep/shim_signal.c, line 57 at r9 (raw file):

static uint64_t process_pending_signals = 0;

/* These checks are racy, but we can't do better anyway: signal can be delivered in any moment. */

As discussed in private, it seems that it's possible to implement this without races and additionally fix some security issues related to signals.
The idea is that we actually shouldn't forward any external signals to the app. Synchronous signals should be generated inside the trusted part based on information in SSA (in case of SGX PAL) and asynchronous should only be allowed to arrive from other processes running inside this Graphene namespace. Things like Ctrl+C should just kill Graphene without being able to trigger the signal inside the user app. Having signal implemented this way, it will be possible to just wrap this code in locks and make it race-safe.

This would require quite a huge rewrite of signal handling, so for now please just place a TODO here and describe this.


LibOS/shim/src/bookkeep/shim_signal.c, line 61 at r9 (raw file):

    return __atomic_load_n(&queue->get_idx, __ATOMIC_RELAXED)
            == __atomic_load_n(&queue->put_idx, __ATOMIC_RELAXED);
}

missing empty line here


LibOS/shim/src/bookkeep/shim_signal.c, line 89 at r9 (raw file):

}

static bool produce_standard_signal(struct shim_signal** signal_slot, struct shim_signal* signal) {

Why "produce" instead of e.g. "insert" or "append"? Sounds like we're creating the signal here, not just inserting the already existing structure to a queue.


LibOS/shim/src/bookkeep/shim_signal.c, line 134 at r9 (raw file):

    bool ret = queue_produce_signal(&thread->signal_queue, signal);
    if (ret) {
        (void)__atomic_add_fetch(&thread->pending_signals, 1, __ATOMIC_RELAXED);

Can't here be some problems from thread->pending_signals and thread->signal_queue being not-atomically changed? Also, you use __ATOMIC_RELAXED, so the increment may be visible before the insertion?


LibOS/shim/src/bookkeep/shim_signal.c, line 142 at r9 (raw file):

    bool ret = queue_produce_signal(&process_signal_queue, signal);
    if (ret) {
        (void)__atomic_add_fetch(&process_pending_signals, 1, __ATOMIC_RELAXED);

ditto


LibOS/shim/src/bookkeep/shim_signal.c, line 155 at r9 (raw file):

    uint64_t get_idx = __atomic_load_n(&queue->get_idx, __ATOMIC_RELAXED);
    do {
        put_idx = __atomic_load_n(&queue->put_idx, __ATOMIC_RELAXED);

Should this really be __ATOMIC_RELAXED?


LibOS/shim/src/bookkeep/shim_signal.c, line 167 at r9 (raw file):

}

static struct shim_signal* queue_consume_signal(struct shim_signal_queue* queue, int sig) {

I think pop is more standard name than consume + I think consume in this context mean "taking and working on a object from a queue", not just taking it (i.e. you have producers and consumers, but these terms describe more actions than just inserting/popping)


LibOS/shim/src/bookkeep/shim_signal.c, line 801 at r9 (raw file):

    if (thread->time_to_die) {
        thread_exit(0, 0);

please use named args here


LibOS/shim/src/bookkeep/shim_signal.c, line 941 at r9 (raw file):

    __UNUSED(objp);

    /* This is a bit racy, but we cannot do any better. If a app gets spamed with signals while

a app -> an app
spamed -> spammed


LibOS/shim/src/bookkeep/shim_signal.c, line 944 at r9 (raw file):

     * doing execve, some pending signals might not get checkpointed; we add an arbitrary number of
     * safe margin slots. */
#define SAFE_MARGIN_SLOTS 10

why not const size_t SAFE_MARGIN_SLOTS?


LibOS/shim/src/bookkeep/shim_signal.c, line 948 at r9 (raw file):

    uint64_t i = 0;
    assert(n <= SIGRTMIN - 1 + (NUM_SIGS - SIGRTMIN + 1) * MAX_SIGNAL_LOG + SAFE_MARGIN_SLOTS);
    siginfo_t infos[n];

could you allocate this on the heap?


LibOS/shim/src/bookkeep/shim_signal.c, line 954 at r9 (raw file):

    for (int sig = 1; sig < SIGRTMIN && i < n; sig++) {
        struct shim_signal** q = &process_signal_queue.standard_signals[sig - 1];
        /* This load might look racy, but only scenario that this signal is removed from the queue

but only scenario -> but the only scenario


LibOS/shim/src/bookkeep/shim_thread.c, line 504 at r9 (raw file):

    release_clear_child_tid(thread->clear_child_tid);

    /* Clean up the thread itself - it will be removed from `thread_list`. */

Do you mean that del_thread will remove it from the list or that we're calling del_thread because it will be later removed somewhere else?


LibOS/shim/src/sys/shim_exec.c, line 368 at r9 (raw file):

    }

    /* If `execve` is invoked concurrently by multiple threads, let only one succeed. */

What if the execve we let through fails? (e.g. because of a wrong path passed)
We end up will all threads killed + another parallel execves ignored.
Seems to be an exotic case, I'm not sure if it's worth fixing, but maybe it's worth commenting (if such a problem is really possible here).


LibOS/shim/src/sys/shim_exit.c, line 193 at r9 (raw file):

    /* Wait for all other threads to exit. */
    while (!check_last_thread()) {
        /* Tell other threads to exit again - the previous anouncement could be missed by threads

anouncement -> announcement


LibOS/shim/test/regression/signal_multithread.c, line 10 at r9 (raw file):

static int counter = 0;

static void handler(int signum) {

handler -> sigterm_handler


LibOS/shim/test/regression/signal_multithread.c, line 26 at r9 (raw file):

}

static void* f(void* x) {

f -> thread_func


LibOS/shim/test/regression/sigprocmask_pending.c, line 17 at r9 (raw file):

} while (0)

static int seen_signal = 0;

seen_signal -> seen_signals_cnt


LibOS/shim/test/regression/sigprocmask_pending.c, line 152 at r9 (raw file):

    if (__atomic_load_n(&seen_signal, __ATOMIC_RELAXED) != 0) {
        printf("Seen a unexpected signal!\n");

a -> an

Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 20 of 28 files reviewed, 24 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @yamahata)

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

Do we have any new LTP tests which started to work?

No idea, you are welcome to try (-ENOTIME here).



LibOS/shim/src/shim_parser.c, line 400 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This table is arch-dependent. Please either wrap it in #ifdef or use indexed initializer:

const char* const siglist[SIGRTMIN - 1] = {
    [SIGHUP] = S(SIGHUP),
    [SIGINT] = S(SIGINT),
}

But: even if you use indexed initializer, this won't solve all the troubles, because not only the signal numbers differ between archs, but also their existence.
I see this was already here before, but IMO worth fixing while we are here.

I would vote for just printing a signal number, but whatever, done.


LibOS/shim/src/shim_parser.c, line 401 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

alignment is broken on this line

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 55 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…
process_pending_signals

This is a count of them? If so, please add _cnt to the name.

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 57 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Could you comment here what is the worst-case scenario for this race? What will happen?

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 57 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

As discussed in private, it seems that it's possible to implement this without races and additionally fix some security issues related to signals.
The idea is that we actually shouldn't forward any external signals to the app. Synchronous signals should be generated inside the trusted part based on information in SSA (in case of SGX PAL) and asynchronous should only be allowed to arrive from other processes running inside this Graphene namespace. Things like Ctrl+C should just kill Graphene without being able to trigger the signal inside the user app. Having signal implemented this way, it will be possible to just wrap this code in locks and make it race-safe.

This would require quite a huge rewrite of signal handling, so for now please just place a TODO here and describe this.

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 61 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

missing empty line here

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 89 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Why "produce" instead of e.g. "insert" or "append"? Sounds like we're creating the signal here, not just inserting the already existing structure to a queue.

It was named so solely to match consume below. Moving the discussion there.


LibOS/shim/src/bookkeep/shim_signal.c, line 134 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Can't here be some problems from thread->pending_signals and thread->signal_queue being not-atomically changed? Also, you use __ATOMIC_RELAXED, so the increment may be visible before the insertion?

None that I can think of. If a signal might need being acted upon immediately (e.g. just handled), a thread will be woken up after this append, hence it will notice this signal, even if it is just in the process of exiting signal handling routine.
pending_signals is just an optimization not to have go through whole list. I can make those adds totally ordered, but I see a little value in that. Tho I might be missing something, so please give it a thought and approach my explanations skeptically :)


LibOS/shim/src/bookkeep/shim_signal.c, line 142 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

ditto


LibOS/shim/src/bookkeep/shim_signal.c, line 155 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Should this really be __ATOMIC_RELAXED?

I believe it could be, but now that I changed the comparison below, races seem to be possible (see the closed comment near put_idx == get_idx).
Should I restore the old comparison or make those sequential wrt. produce routines? The former should be a bit faster (less synchronization), the latter hm, I more conservative, I guess.
Just for the record: the issue might be that a produce call (1) and two consume calls (2, 3) happen at the same time. Now consider:
(1) increates put_idx <- this change is visible in (2) but NOT in (3)
(2) increases get_idx <- this is visible in (3)
(3) now sees get_idx > put_idx which allows it to successfully do the cmp&exchange below, permanently increasing get_idx past put_idx and therefore appending a signal past allowed space.
Such scenario is perfectly possible with __ATOMIC_RELAXED, tho I believe restoring the >= operator would fix this.
@dimakuv @mkow Thoughts?


LibOS/shim/src/bookkeep/shim_signal.c, line 167 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I think pop is more standard name than consume + I think consume in this context mean "taking and working on a object from a queue", not just taking it (i.e. you have producers and consumers, but these terms describe more actions than just inserting/popping)

This naming is kind of an artifact from previous version. Here consume actually does that: this signal is being handled right after calling this signal (tho not inside it). I agree a pop would be a better name, with current state of this code it should be possible to rename all of this.

Update: I did a rename in the end.


LibOS/shim/src/bookkeep/shim_signal.c, line 801 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

please use named args here

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 941 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

a app -> an app
spamed -> spammed

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 944 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

why not const size_t SAFE_MARGIN_SLOTS?

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 948 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

could you allocate this on the heap?

I deliberately allocated it on stack, because I have no idea how this checkpointing code interacts with memory allocations (I suspect that not very well). This size should be small enough not to cause any issues.


LibOS/shim/src/bookkeep/shim_signal.c, line 954 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

but only scenario -> but the only scenario

Done.


LibOS/shim/src/bookkeep/shim_thread.c, line 504 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Do you mean that del_thread will remove it from the list or that we're calling del_thread because it will be later removed somewhere else?

Should be cleaner now.


LibOS/shim/src/sys/shim_exec.c, line 261 at r9 (raw file):

    if (!fs->d_ops->open) {
        ret = -EACCES;
    err:

/me tries not to scream loudly


LibOS/shim/src/sys/shim_exec.c, line 368 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

What if the execve we let through fails? (e.g. because of a wrong path passed)
We end up will all threads killed + another parallel execves ignored.
Seems to be an exotic case, I'm not sure if it's worth fixing, but maybe it's worth commenting (if such a problem is really possible here).

  1. Arguments (including path opening) are checked before this.
  2. This is a case we cannot really recover from.
  3. From this point only failures that can happen are:
    • Graphene internal failures - this is probably fatal anyway
    • OOM - sad_frog.jpg
      Added a comment and actually fixed some missing process kills.

LibOS/shim/src/sys/shim_exit.c, line 193 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

anouncement -> announcement

Done.


LibOS/shim/test/regression/signal_multithread.c, line 10 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

handler -> sigterm_handler

Done.


LibOS/shim/test/regression/signal_multithread.c, line 26 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

f -> thread_func

Done.


LibOS/shim/test/regression/sigprocmask_pending.c, line 17 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

seen_signal -> seen_signals_cnt

Done.


LibOS/shim/test/regression/sigprocmask_pending.c, line 152 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

a -> an

Done.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 8 of 8 files at r10.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @mkow)

a discussion (no related file):

Previously, boryspoplawski (Borys Popławski) wrote…

No idea, you are welcome to try (-ENOTIME here).

Tests which I found to stably pass: (stably aka. 500 runs without failures)

  • rt_sigaction02 (worked already on master)
  • rt_sigprocmask01 (started to work after this PR)
  • signal06 (worked already on master)

There are also some which seem to almost pass, e.g. failing only on EFAULT (so, probably insufficient argument checking). But we can work on them later.

Could you enable the ones I listed?



LibOS/shim/src/bookkeep/shim_signal.c, line 134 at r9 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

None that I can think of. If a signal might need being acted upon immediately (e.g. just handled), a thread will be woken up after this append, hence it will notice this signal, even if it is just in the process of exiting signal handling routine.
pending_signals is just an optimization not to have go through whole list. I can make those adds totally ordered, but I see a little value in that. Tho I might be missing something, so please give it a thought and approach my explanations skeptically :)

Sounds good. Could you place this explanation about pending_signals in a comment?


LibOS/shim/src/bookkeep/shim_signal.c, line 155 at r9 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

I believe it could be, but now that I changed the comparison below, races seem to be possible (see the closed comment near put_idx == get_idx).
Should I restore the old comparison or make those sequential wrt. produce routines? The former should be a bit faster (less synchronization), the latter hm, I more conservative, I guess.
Just for the record: the issue might be that a produce call (1) and two consume calls (2, 3) happen at the same time. Now consider:
(1) increates put_idx <- this change is visible in (2) but NOT in (3)
(2) increases get_idx <- this is visible in (3)
(3) now sees get_idx > put_idx which allows it to successfully do the cmp&exchange below, permanently increasing get_idx past put_idx and therefore appending a signal past allowed space.
Such scenario is perfectly possible with __ATOMIC_RELAXED, tho I believe restoring the >= operator would fix this.
@dimakuv @mkow Thoughts?

I'm fine with both solutions, although I think I prefer the sequential solution more - am I right that this code is not performance critical? If someone's doing IPC using signals then they don't deserve good performance anyway ;) So, if I'm not missing something, then choosing more conservative solution wrt correctness sounds like a better choice.


LibOS/shim/src/bookkeep/shim_signal.c, line 167 at r9 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

This naming is kind of an artifact from previous version. Here consume actually does that: this signal is being handled right after calling this signal (tho not inside it). I agree a pop would be a better name, with current state of this code it should be possible to rename all of this.

Update: I did a rename in the end.

Could you also rename "produce" then?


LibOS/shim/src/bookkeep/shim_signal.c, line 948 at r9 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

I deliberately allocated it on stack, because I have no idea how this checkpointing code interacts with memory allocations (I suspect that not very well). This size should be small enough not to cause any issues.

Ok, fair enough.


LibOS/shim/src/sys/shim_exec.c, line 368 at r9 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…
  1. Arguments (including path opening) are checked before this.
  2. This is a case we cannot really recover from.
  3. From this point only failures that can happen are:
    • Graphene internal failures - this is probably fatal anyway
    • OOM - sad_frog.jpg
      Added a comment and actually fixed some missing process kills.

Looks good now, thanks!

@boryspoplawski
Copy link
Contributor Author

Jenkins, retest Jenkins please (fork11 failed, known issue)

Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 25 of 29 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @yamahata)

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

Tests which I found to stably pass: (stably aka. 500 runs without failures)

  • rt_sigaction02 (worked already on master)
  • rt_sigprocmask01 (started to work after this PR)
  • signal06 (worked already on master)

There are also some which seem to almost pass, e.g. failing only on EFAULT (so, probably insufficient argument checking). But we can work on them later.

Could you enable the ones I listed?

Sure. Which are the ones that almost pass?



LibOS/shim/src/bookkeep/shim_signal.c, line 134 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Sounds good. Could you place this explanation about pending_signals in a comment?

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 155 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I'm fine with both solutions, although I think I prefer the sequential solution more - am I right that this code is not performance critical? If someone's doing IPC using signals then they don't deserve good performance anyway ;) So, if I'm not missing something, then choosing more conservative solution wrt correctness sounds like a better choice.

Done. I revisited all usages of atomics in this PR, please validate the current approach.
I'm blocking here until someone explicitly accepts.
cc other reviewers @dimakuv @yamahata


LibOS/shim/src/bookkeep/shim_signal.c, line 167 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Could you also rename "produce" then?

Oops, forgot. Done.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r11.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @yamahata)

a discussion (no related file):

Previously, boryspoplawski (Borys Popławski) wrote…

Sure. Which are the ones that almost pass?

rt_sigprocmask02 - one success instead of failure
sigaction01 - 3 fails + 1 success, but looks like at least some of them could be easily fixed
sigpending02 - this one passes, but emits sigpending02.c:47: CONF: syscall(-1) __NR_sigpending not supported, so I'm not sure whether we should enable it?

disclaimer: I didn't debug them nor look into the code, just guessing that they may be easy to fix. Also, the problems there doesn't look like directly related to this PR.


Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 8 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @yamahata)


LibOS/shim/src/bookkeep/shim_signal.c, line 106 at r11 (raw file):

    struct shim_signal* old = NULL;
    return __atomic_compare_exchange_n(signal_slot, &old, signal, /*weak=*/false,
                                       __ATOMIC_RELEASE, __ATOMIC_ACQUIRE);

Not __ATOMIC_ACQ_REL on success instead of __ATOMIC_RELEASE?


LibOS/shim/src/bookkeep/shim_signal.c, line 127 at r11 (raw file):

        }
    } while (!__atomic_compare_exchange_n(&queue->put_idx, &put_idx, put_idx + 1, /*weak=*/true,
                                          __ATOMIC_RELEASE, __ATOMIC_ACQUIRE));

ditto


LibOS/shim/src/bookkeep/shim_signal.c, line 148 at r11 (raw file):

    bool ret = queue_append_signal(&thread->signal_queue, signal);
    if (ret) {
        (void)__atomic_add_fetch(&thread->pending_signals, 1, __ATOMIC_RELEASE);

Why not __ATOMIC_ACQ_REL?


LibOS/shim/src/bookkeep/shim_signal.c, line 156 at r11 (raw file):

    bool ret = queue_append_signal(&process_signal_queue, signal);
    if (ret) {
        (void)__atomic_add_fetch(&process_pending_signals_cnt, 1, __ATOMIC_RELEASE);

Why not __ATOMIC_ACQ_REL?


LibOS/shim/src/bookkeep/shim_signal.c, line 176 at r11 (raw file):

        }
    } while (!__atomic_compare_exchange_n(&queue->get_idx, &get_idx, get_idx + 1, /*weak=*/true,
                                          __ATOMIC_RELEASE, __ATOMIC_ACQUIRE));

ditto


LibOS/shim/src/bookkeep/shim_signal.c, line 194 at r11 (raw file):

    struct shim_signal* signal = queue_pop_signal(&thread->signal_queue, sig);
    if (signal) {
        (void)__atomic_sub_fetch(&thread->pending_signals, 1, __ATOMIC_ACQUIRE);

ditto


LibOS/shim/src/bookkeep/shim_signal.c, line 202 at r11 (raw file):

    struct shim_signal* signal = queue_pop_signal(&process_signal_queue, sig);
    if (signal) {
        (void)__atomic_sub_fetch(&process_pending_signals_cnt, 1, __ATOMIC_ACQUIRE);

ditto

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 13 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @mkow, and @yamahata)


LibOS/shim/src/bookkeep/shim_signal.c, line 155 at r9 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Done. I revisited all usages of atomics in this PR, please validate the current approach.
I'm blocking here until someone explicitly accepts.
cc other reviewers @dimakuv @yamahata

Agreed with Michal on tighter (more conservative) memory ordering here. I believe what you're doing is correct: all loads must have acquire semantics, all stores -- release semantics, and all read-modify-writes -- acquire-release semantics (not all places are fixed yet in this PR).

We all should be more careful with memory orderings now...


LibOS/shim/src/bookkeep/shim_signal.c, line 106 at r11 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Not __ATOMIC_ACQ_REL on success instead of __ATOMIC_RELEASE?

+1. Since it's a read-modify-write operation, it seems logical to use acquire-release semantics (it's both a load and a store).


LibOS/shim/src/bookkeep/shim_signal.c, line 127 at r11 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

+1


LibOS/shim/src/bookkeep/shim_signal.c, line 148 at r11 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Why not __ATOMIC_ACQ_REL?

+1


LibOS/shim/src/bookkeep/shim_signal.c, line 156 at r11 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Why not __ATOMIC_ACQ_REL?

+1


LibOS/shim/src/bookkeep/shim_signal.c, line 176 at r11 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

+1


LibOS/shim/src/bookkeep/shim_signal.c, line 194 at r11 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

+1


LibOS/shim/src/bookkeep/shim_signal.c, line 202 at r11 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

+1


LibOS/shim/src/bookkeep/shim_thread.c, line 497 at r11 (raw file):

    /* wait on clear_child_tid_pal; this signals that PAL layer exited child thread */
    while (__atomic_load_n(&thread->clear_child_tid_pal, __ATOMIC_RELAXED) != 0) {

Could we change to __ATOMIC_ACQUIRE? I know this is old code, and it looks benign, but would prefer to fix to a stricter ordering.


LibOS/shim/src/sys/shim_exec.c, line 370 at r11 (raw file):

    /* If `execve` is invoked concurrently by multiple threads, let only one succeed. */
    static unsigned int first = 0;
    if (__atomic_exchange_n(&first, 1, __ATOMIC_RELAXED) != 0) {

This should be __ATOMIC_ACQ_REL


LibOS/shim/src/sys/shim_exec.c, line 378 at r11 (raw file):

    /* All other threads are dead. Restoring initial value in case we stay inside same process
     * instance and call execve again. */
    __atomic_store_n(&first, 0, __ATOMIC_RELAXED);

Here it looks benign (we don't have any other threads), but I'd still prefer __ATOMIC_ACQUIRE


LibOS/shim/src/sys/shim_exit.c, line 211 at r11 (raw file):

     * point. */
    static int first = 0;
    if (__atomic_exchange_n(&first, 1, __ATOMIC_RELAXED) != 0) {

__ATOMIC_ACQ_REL


LibOS/shim/test/regression/sigprocmask_pending.c, line 20 at r11 (raw file):

static void signal_handler(int signal) {
    __atomic_add_fetch(&seen_signal_cnt, 1, __ATOMIC_RELAXED);

We are using __ATOMIC_RELAXED here because this test is single-threaded, right? So there is no need to enforce stricter, inter-process memory orderings?

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 9 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @mkow, and @yamahata)


LibOS/shim/src/bookkeep/shim_signal.c, line 106 at r11 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

+1. Since it's a read-modify-write operation, it seems logical to use acquire-release semantics (it's both a load and a store).

LGTM. I think Borys is correct here. We do not load the value of signal_slot on success (as in, we don't use the old value in that memory cell), so there is no need for additional ACQUIRE semantics.

Also see a long and winded discussion: https://users.rust-lang.org/t/why-does-rust-not-permit-cas-operations-with-release-acquire-ordering/40394/15.


LibOS/shim/src/bookkeep/shim_signal.c, line 127 at r11 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

+1

LGTM. I think Borys is correct here. We do not load the value of put_idx on success (as in, we don't use the old value in that memory cell), so there is no need for additional ACQUIRE semantics.


LibOS/shim/src/bookkeep/shim_signal.c, line 148 at r11 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

+1

LGTM. Borys is right, since it's an unconditional store to this variable, there is no need for additional ACQUIRE semantics.


LibOS/shim/src/bookkeep/shim_signal.c, line 156 at r11 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

+1

LGTM. Borys is right, since it's an unconditional store to this variable, there is no need for additional ACQUIRE semantics.


LibOS/shim/src/bookkeep/shim_signal.c, line 176 at r11 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

+1

LGTM. I think Borys is correct here. We do not load the value of get_idx on success (as in, we don't use the old value in that memory cell), so there is no need for additional ACQUIRE semantics.


LibOS/shim/src/bookkeep/shim_signal.c, line 194 at r11 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

+1

Why ACQUIRE? We are storing into the variable without reading. Shouldn't we have RELEASE semantics?


LibOS/shim/src/bookkeep/shim_signal.c, line 202 at r11 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

+1

Why ACQUIRE? We are storing into the variable without reading. Shouldn't we have RELEASE semantics?


LibOS/shim/src/sys/shim_exec.c, line 370 at r11 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This should be __ATOMIC_ACQ_REL

LGTM. We only operate on single variable, no need to enforce ordering.


LibOS/shim/src/sys/shim_exec.c, line 378 at r11 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Here it looks benign (we don't have any other threads), but I'd still prefer __ATOMIC_ACQUIRE

LGTM. We only operate on single variable, no need to enforce ordering.


LibOS/shim/src/sys/shim_exit.c, line 211 at r11 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

__ATOMIC_ACQ_REL

LGTM. We only operate on single variable, no need to enforce ordering.


LibOS/shim/test/regression/sigprocmask_pending.c, line 20 at r11 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

We are using __ATOMIC_RELAXED here because this test is single-threaded, right? So there is no need to enforce stricter, inter-process memory orderings?

LGTM. We only operate on single variable, no need to enforce ordering.

Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 9 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @yamahata)


LibOS/shim/src/bookkeep/shim_signal.c, line 106 at r11 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

LGTM. I think Borys is correct here. We do not load the value of signal_slot on success (as in, we don't use the old value in that memory cell), so there is no need for additional ACQUIRE semantics.

Also see a long and winded discussion: https://users.rust-lang.org/t/why-does-rust-not-permit-cas-operations-with-release-acquire-ordering/40394/15.

I'll put here what we've already talked about in priv: here we only care that all previous stores are visible when another thread executes a matching acquire load, hence no need for __ATOMIC_ACQ_REL. This appends a signal, so we need previous stores to be visible, but we do not care about stores that come after.
I'm not comfortable enough with memory orders to insist on current approach, so if we (as in substantial number of maintainers, which currently is 1, I guess) still have reasonable doubts, I can change all of these to SEQ_CST.gg


LibOS/shim/src/bookkeep/shim_signal.c, line 194 at r11 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why ACQUIRE? We are storing into the variable without reading. Shouldn't we have RELEASE semantics?

We need to match RELEASE in append signal.

mkow
mkow previously approved these changes Jun 7, 2020
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @yamahata)

dimakuv
dimakuv previously approved these changes Jun 8, 2020
Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion, "fixup! " found in commit messages' one-liners (waiting on @yamahata)


LibOS/shim/src/bookkeep/shim_signal.c, line 194 at r11 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

We need to match RELEASE in append signal.

OK. Yes, I was stupid. Here we pop one appended signal so we will start reading memory contents pertaining to this signal. So we need to synchronize these reads with writes to the same memory when appending the signal.

Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, all discussions resolved, "fixup! " found in commit messages' one-liners


LibOS/shim/src/bookkeep/shim_signal.c, line 194 at r11 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

OK. Yes, I was stupid. Here we pop one appended signal so we will start reading memory contents pertaining to this signal. So we need to synchronize these reads with writes to the same memory when appending the signal.

To be fair I think this could be __ATOMIC_RELAXED, since this is only optimization, real synchronization is in pop vs append.
But I don't want to think about it and start another discussion, so lets leave it as it is.


LibOS/shim/src/sys/shim_exec.c, line 370 at r11 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

LGTM. We only operate on single variable, no need to enforce ordering.

FYI: In theory we should place a compiler barrier after this __atomic, but otoh all we do later is exit the thread, so this shouldn't be any issue (even if those calls will get inlined with a LTO or something)

Previously Graphene delivered each signal to all threads instead of
one. This was additionally abused to quit the application. Now each
signal is delivered to only one arbitrary thread. This commit
additionally cleans the signal and exiting code.
Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 8 files at r10, 20 of 20 files at r12.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@dimakuv
Copy link
Contributor

dimakuv commented Jun 8, 2020

Jenkins, retest Jenkins-18.04 please (fork11 failed)

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 20 of 20 files at r12.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@boryspoplawski
Copy link
Contributor Author

Jenkins, retest Jenkins-SGX please (sighandler_reset timedout, I'm running it 100 times, locally, probably too low timeout set)

@boryspoplawski
Copy link
Contributor Author

Jenkins, test this please (3/5, sighandler_reset still in debug)

@boryspoplawski
Copy link
Contributor Author

Jenkins, test this please (4/5, sighandler_reset still in debug)

@mkow
Copy link
Member

mkow commented Jun 8, 2020

Jenkins, test this please (5/5, sighandler_reset still in debug)

@boryspoplawski
Copy link
Contributor Author

boryspoplawski commented Jun 8, 2020

Managed to trigger sighandler_reset issue on current master (162268e).

@mkow mkow merged commit 943153a into master Jun 8, 2020
@mkow mkow deleted the borys/sginal_one_proc branch June 8, 2020 22:53
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants