Skip to content
This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

[LibOS] Rework threads #1949

Merged
merged 7 commits into from
Nov 19, 2020
Merged

[LibOS] Rework threads #1949

merged 7 commits into from
Nov 19, 2020

Conversation

boryspoplawski
Copy link
Contributor

@boryspoplawski boryspoplawski commented Nov 7, 2020

Description of the changes

Summary of changes can be found in commit message.
Let me know which part requires more comments, generally I tried to be descriptive with the code (function names etc.)

Fixes #378, fixes #446 (this one was probably already fixed), fixes #969, fixes #1068, fixes #1480, fixes #1937, fixes #1957
Might fix (needs testing) #1297
Closes #1958

Currently there is some issue with IPC, which leads to hang on pipe_ocloexec test from LibOS regression (under SGX only). I'm not yet sure if this issue is introduced in this PR or is already present (e.g. this: #1948). fixes #1948


This change is Reviewable

Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 46 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @dimakuv, @mkow, and @yamahata)

a discussion (no related file):
TODO: add a TODO/issue to change threads list to a tree (probably as a follow up to this PR).


Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 46 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @dimakuv, @mkow, and @yamahata)

a discussion (no related file):
TODO: check if any LTP test can be enabled (probably after debugging IPC issue).


Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 36 of 46 files at r1.
Reviewable status: 36 of 46 files reviewed, 30 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @boryspoplawski, @dimakuv, @mkow, and @yamahata)


LibOS/shim/include/shim_checkpoint.h, line 350 at r1 (raw file):

 * The remaining arguments are passed into the migration function.
 *
 * \return             0 on success, negative POSIX error code on failure.

This indentation is now meaningless. Either no indentation, or align with above params.


LibOS/shim/include/shim_checkpoint.h, line 354 at r1 (raw file):

int create_process_and_send_checkpoint(migrate_func_t migrate_func, struct shim_handle* exec,
                                       struct shim_child_process* child_process,
                                       struct shim_process* process_description,

Why do you have two objects describing the child process (child_process and process_description)? Also, the child_process argument is not listen in the comment.

UPDATE: Ok, child_process is used in case of clone/fork (exec is used in a contrast case of execve), and is added to the list of child processes of the current process. Please update the comment.


LibOS/shim/include/shim_ipc.h, line 173 at r1 (raw file):

struct shim_ipc_cld_exit {
    IDTYPE ppid, pid;
    IDTYPE uid;

Is uid added for sanity/for future? Because currently I don't see any uses of it; it is always the same as the initial one (and seems to be zero?).


LibOS/shim/include/shim_process.h, line 23 at r1 (raw file):

    IDTYPE vmid;

    int death_notification_signal;

Why this name? Maybe child_termination_signal -- the term used in man pages?


LibOS/shim/include/shim_process.h, line 27 at r1 (raw file):

    int exit_code;
    int term_signal;
    IDTYPE uid;

Nitpick: move uid close to pid/vmid.


LibOS/shim/include/shim_process.h, line 46 at r1 (raw file):

    mode_t umask;

    /* Handle to the executable. Protected by `fs_lock`. */

This comment looks useless. Maybe at least add ...executable file?


LibOS/shim/include/shim_process.h, line 74 at r1 (raw file):

/*
 * These 2 functions mark a child as exited, moving it from `children` list to `zombies` list
 * and generate a child-death signal (if needed).

child-death -> child-termination


LibOS/shim/include/shim_thread.h, line 56 at r1 (raw file):

DEFINE_LISTP(shim_thread);
struct shim_thread {
    /* Field for inserting threads on global `thread_list`. */

Can we rename the global thread_list to g_thread_list (and its corresponding lock)?


LibOS/shim/include/shim_thread.h, line 63 at r1 (raw file):

    /* credentials */
    IDTYPE uid, gid, euid, egid;

Aren't these supposed to be per-process rather than per-thread?


LibOS/shim/include/shim_thread.h, line 82 at r1 (raw file):

     * `process_pending_signals_cnt`. */
    uint64_t pending_signals;
    unsigned char signal_handled;

Can you add comment describing three options: NOT_HANDLED, HANDLED, RESTART


LibOS/shim/include/shim_thread.h, line 107 at r1 (raw file):

    struct shim_thread_queue* next;
    struct shim_thread* thread;
    bool in_use;

Could you put a comment why this is needed?


LibOS/shim/src/shim_context-x86_64.c, line 146 at r1 (raw file):

    *(unsigned long*)(regs.rsp - RED_ZONE_SIZE - 8) = regs.rip;

    context->regs = NULL;

Is this just for sanity? Looks superfluous.


LibOS/shim/src/shim_init.c, line 623 at r1 (raw file):

        debug("exit status collides with Graphene-internal magic status; changed to 1\n");
        exit_code = 1;
    }

Don't we want to move this debug message to shim_do_exitgroup (or whatever is used now to exit at LibOS level)? Looks useful.


LibOS/shim/src/bookkeep/shim_handle.c, line 42 at r1 (raw file):

    /* XXX: Try getting the root FS from current thread? */
    assert(cur_thread);
    assert(cur_thread->root);

This change collides with my #1951. Will need to be rebased (it's the subset of what I changed).


LibOS/shim/src/bookkeep/shim_process.c, line 133 at r1 (raw file):

            parent_signal = child->death_notification_signal;
            child_pid = child->pid,

Wow, , -> ;


LibOS/shim/src/bookkeep/shim_process.c, line 136 at r1 (raw file):

            wait_queue = g_process.wait_queue;
            g_process.wait_queue = NULL;

Is this correct? Is the assumption that there is only one process we can wait on at any single time?


LibOS/shim/src/bookkeep/shim_process.c, line 167 at r1 (raw file):

        struct shim_thread* thread = wait_queue->thread;
        __atomic_store_n(&wait_queue->in_use, false, __ATOMIC_RELEASE);
        COMPILER_BARRIER();

Why do we need this barrier?


LibOS/shim/src/bookkeep/shim_process.c, line 171 at r1 (raw file):

        wait_queue = next;
    }

Please remove one line


LibOS/shim/src/bookkeep/shim_process.c, line 184 at r1 (raw file):

}

BEGIN_CP_FUNC(process_description) {

Why don't we checkpoint wait_queue? I guess it doesn't make sense because all non-main threads are destroyed... Then could you add an explicit ``process->wait_queue = NULL` in restore func and put a comment that there is no need to checkpoint/restore the queue?


LibOS/shim/src/bookkeep/shim_process.c, line 247 at r1 (raw file):

        INIT_LIST_HEAD(&zombies[i], list);
        i++;
    }

Could you rename child -> zombie in this loop?


LibOS/shim/src/bookkeep/shim_signal.c, line 722 at r1 (raw file):

    unsigned long sa_flags = sig_action->sa_flags;

    if (allow_reset && sig_action->sa_flags & SA_RESETHAND) {

You can simplify this now: sig_action->sa_flags -> sa_flags


LibOS/shim/src/bookkeep/shim_thread.c, line 28 at r1 (raw file):

static LISTP_TYPE(shim_thread) thread_list = LISTP_INIT;
struct shim_lock thread_list_lock;

Please prepend with g_ all these vars.


LibOS/shim/src/bookkeep/shim_thread.c, line 125 at r1 (raw file):

    if (!cur_thread->tid) {
        debug("Cannot allocate pid for the initial thread!\n");
        return -ESRCH;

I guess we don't care about proper memory release here because it's an init_* function? Otherwise please add put_thread(cur_thread).


LibOS/shim/src/bookkeep/shim_thread.c, line 173 at r1 (raw file):

}

static IDTYPE get_internal_pid(void) {

Ideally we should rename this to get_new_internal_tid().


LibOS/shim/src/bookkeep/shim_thread.c, line 200 at r1 (raw file):

    thread->gid       = cur_thread->gid;
    thread->euid      = cur_thread->euid;
    thread->egid      = cur_thread->egid;

See my other comment, but why are these per-thread?


LibOS/shim/src/bookkeep/shim_thread.c, line 229 at r1 (raw file):

    thread->tid = get_internal_pid();
    if (!thread->tid) {
        free(thread);

I think put_thread() is better.


LibOS/shim/src/bookkeep/shim_thread.c, line 319 at r1 (raw file):

/*
 * Checks whether there are any other threads on `thread_list` (i.e. if we are the last thread).
 * If `mark_self_dead` is true additionally takes us of the `thread_list`.

of -> off


LibOS/shim/src/sys/shim_futex.c, line 415 at r1 (raw file):

        thread = remove_futex_waiter(waiter, futex);
        add_thread_to_queue(queue, thread);
        put_thread(thread);

Was that a bug before? What happens now if the thread was already in some queue (in this case we don't do get_thread(thread) but still do put_thread(thread))? Won't ref count underflow?

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 46 files at r1.
Reviewable status: all files reviewed, 45 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @boryspoplawski, @mkow, and @yamahata)


LibOS/shim/src/ipc/shim_ipc_child.c, line 32 at r1 (raw file):

     */
    if (mark_child_exited_by_vmid(vmid, /*uid=*/0, /*exit_code=*/0, SIGKILL)) {
        debug("Child process (id: 0x%x) got disconnected\n", vmid);

Could you replace vmid -> vmid & 0xFFFF? We do this for readability of the debug output.`

Can you also change id to vmid in the debug message? Otherwise it is ambiguous: is it host-level pid (vmid) or Graphene-iternal pid (pid)?


LibOS/shim/src/ipc/shim_ipc_child.c, line 84 at r1 (raw file):

          msgin->ppid, msgin->pid, msgin->exitcode, msgin->term_signal);

    /* message cannot come from our own threads (from ourselves as process) */

/* Message cannot come from our own process. */


LibOS/shim/src/ipc/shim_ipc_helper.c, line 641 at r1 (raw file):

    if (needs_wait) {
        /* Wait untill the helper thread notices the stop event. */

Typo: until


LibOS/shim/src/sys/shim_clone.c, line 37 at r1 (raw file):

    PAL_HANDLE create_event;
    PAL_HANDLE initialize_event;
    struct shim_thread* parent;

parent seems to be useless now


LibOS/shim/src/sys/shim_clone.c, line 213 at r1 (raw file):

    get_dentry(process_description.root);
    get_dentry(process_description.cwd);

Shouldn't we check if root and cwd are not NULL, just like we do with exec?


LibOS/shim/src/sys/shim_clone.c, line 226 at r1 (raw file):

    clear_lock(&process_description.children_lock);

    child_process->pid = process_description.pid;

Can we add child_proces->vmid = 0 and add a comment that vmid is set to the host-pid after the process is actually created at the host level?


LibOS/shim/src/sys/shim_clone.c, line 242 at r1 (raw file):

    put_handle(process_description.exec);
    put_dentry(process_description.cwd);
    put_dentry(process_description.root);

Shouldn't we have if ([exec | cwd | root]) put_*(...)


LibOS/shim/src/sys/shim_clone.c, line 424 at r1 (raw file):

    get_thread(thread);
    new_args.thread  = thread;
    new_args.parent  = self;

Useless


LibOS/shim/src/sys/shim_exec.c, line 428 at r1 (raw file):

    pause_ipc_helper();

    lock(&cur_thread->lock);

Do we need this thread lock/unlock if we know that we are the only thread left?


LibOS/shim/src/sys/shim_exit.c, line 6 at r1 (raw file):

 *                    Borys Popławski <borysp@invisiblethingslab.com>
 * Copyright (C) 2020 Intel Corporation
 *                    Borys Popławski <borysp@invisiblethingslab.com>

Why are there two copyrights with your name?


LibOS/shim/src/sys/shim_exit.c, line 73 at r1 (raw file):

        lock(&cur_thread->lock);
        assert(cur_thread->shim_tcb->tp == cur_thread);
        cur_thread->shim_tcb->tp = NULL;

Why do we do this?


LibOS/shim/src/sys/shim_exit.c, line 92 at r1 (raw file):

    }

    /* Let parent know we exited. */

Can you prepend This is the last thread of the process; let parent ...


LibOS/shim/src/sys/shim_getpid.c, line 18 at r1 (raw file):

pid_t shim_do_gettid(void) {
    return get_cur_thread()->tid;

Why no lock?


LibOS/shim/src/sys/shim_getpid.c, line 28 at r1 (raw file):

    struct shim_thread* cur = get_cur_thread();
    cur->clear_child_tid    = tidptr;
    return cur->tid;

Why no lock?


LibOS/shim/src/sys/shim_getuid.c, line 39 at r1 (raw file):

    get_cur_thread()->egid = gid;
    return 0;
}

Lock of thread in all these funcs?


LibOS/shim/src/sys/shim_wait.c, line 60 at r1 (raw file):

}

static void remove_qnode_from_wait_queue(struct shim_thread_queue* qnode) {

I am confused by the logic of this function. How is this possible that seen can be false? We are setting ourselves to wait in the wait_queue, so there is noone else to remove us from the queue. So why do we need the additional in_use check?

Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 43 of 46 files at r3, 2 of 2 files at r4, 1 of 1 files at r5.
Reviewable status: 33 of 46 files reviewed, 45 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @yamahata)


LibOS/shim/include/shim_checkpoint.h, line 350 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This indentation is now meaningless. Either no indentation, or align with above params.

Done.


LibOS/shim/include/shim_checkpoint.h, line 354 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why do you have two objects describing the child process (child_process and process_description)? Also, the child_process argument is not listen in the comment.

UPDATE: Ok, child_process is used in case of clone/fork (exec is used in a contrast case of execve), and is added to the list of child processes of the current process. Please update the comment.

Done.


LibOS/shim/include/shim_ipc.h, line 173 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Is uid added for sanity/for future? Because currently I don't see any uses of it; it is always the same as the initial one (and seems to be zero?).

This type of message is generated when a process exits to inform the parent. This field is filled (together with rest of the struct) when sending such a message in ipc_cld_exit_send and is set to the current uid of the last thread.


LibOS/shim/include/shim_process.h, line 23 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why this name? Maybe child_termination_signal -- the term used in man pages?

Done.


LibOS/shim/include/shim_process.h, line 27 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Nitpick: move uid close to pid/vmid.

This was intentional because this field is set together with exit_code and term_signal - it comes in a IPC message. Added a comment.


LibOS/shim/include/shim_process.h, line 46 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This comment looks useless. Maybe at least add ...executable file?

Done. This was mainly to state that you need to take fs_lock to access this field.


LibOS/shim/include/shim_process.h, line 74 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

child-death -> child-termination

Done.


LibOS/shim/include/shim_thread.h, line 56 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Can we rename the global thread_list to g_thread_list (and its corresponding lock)?

Done.


LibOS/shim/include/shim_thread.h, line 63 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Aren't these supposed to be per-process rather than per-thread?

No, on Linux uids are per thread attribute.
Side note: threading libraries (like NPTL) will make sure all threads have the same credentials, since POSIX requires that, but on kernel level each thread have a separate set of them.


LibOS/shim/include/shim_thread.h, line 82 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Can you add comment describing three options: NOT_HANDLED, HANDLED, RESTART

Done.


LibOS/shim/include/shim_thread.h, line 107 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Could you put a comment why this is needed?

Done.


LibOS/shim/src/shim_context-x86_64.c, line 146 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Is this just for sanity? Looks superfluous.

This context is part of tcb (at least clone calls this functions in such a way). I have no idea what other parts of code use this field (seems like signals sometimes do?), but it's part of the old stack (in clone case), so we should not be using it after this function. Additionally the old version cleaned (set to 0) most of the context.


LibOS/shim/src/shim_init.c, line 623 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Don't we want to move this debug message to shim_do_exitgroup (or whatever is used now to exit at LibOS level)? Looks useful.

User application exit code is truncated to 0-255 (in shim_do_exitgroup) so there is no way it collides with PAL_WAIT_FOR_CHILDREN_EXIT (which is much higher).
UPDATE: seems I've left this message in libos_clean_and_exit (in LibOS/shim/src/sys/shim_exit.c).


LibOS/shim/src/bookkeep/shim_handle.c, line 42 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This change collides with my #1951. Will need to be rebased (it's the subset of what I changed).

Right, merging this will take longer for sure.


LibOS/shim/src/bookkeep/shim_process.c, line 133 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Wow, , -> ;

Oops, done.


LibOS/shim/src/bookkeep/shim_process.c, line 136 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Is this correct? Is the assumption that there is only one process we can wait on at any single time?

There is no such assumption, but indeed there is only one queue. When a thread wakes up (after waiting on this queue) it does wait checks again (see the main loop in waitid for details). Here we always wake all waiting threads - I didn't see a better way of implementing this given that you can wait for a single child, all children or a arbitrary subset of them (via pgid).


LibOS/shim/src/bookkeep/shim_process.c, line 167 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why do we need this barrier?

In theory thread_wakeup could be moved before the atomic store of in_use above (__ATOMIC_RELEASE only blocks compiler from moving code from before it to after), e.g. imagine thread_wakeup is just another __ATOMIC_RELAXED store and gets inlined - then what would be stopping the compiler from reordering them?


LibOS/shim/src/bookkeep/shim_process.c, line 171 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please remove one line

Done.


LibOS/shim/src/bookkeep/shim_process.c, line 184 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why don't we checkpoint wait_queue? I guess it doesn't make sense because all non-main threads are destroyed... Then could you add an explicit ``process->wait_queue = NULL` in restore func and put a comment that there is no need to checkpoint/restore the queue?

Done.


LibOS/shim/src/bookkeep/shim_process.c, line 247 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Could you rename child -> zombie in this loop?

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 722 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

You can simplify this now: sig_action->sa_flags -> sa_flags

Done.


LibOS/shim/src/bookkeep/shim_thread.c, line 28 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please prepend with g_ all these vars.

Done.


LibOS/shim/src/bookkeep/shim_thread.c, line 125 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I guess we don't care about proper memory release here because it's an init_* function? Otherwise please add put_thread(cur_thread).

Yes, I skipped cleaning up on failures in all init_* functions because there's noting else we can do but to kill the whole process and sometimes it would be hard/impossible to do proper cleanups.


LibOS/shim/src/bookkeep/shim_thread.c, line 173 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Ideally we should rename this to get_new_internal_tid().

Done.


LibOS/shim/src/bookkeep/shim_thread.c, line 200 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

See my other comment, but why are these per-thread?

Explained there.


LibOS/shim/src/bookkeep/shim_thread.c, line 229 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I think put_thread() is better.

Done.


LibOS/shim/src/bookkeep/shim_thread.c, line 319 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

of -> off

Done.


LibOS/shim/src/ipc/shim_ipc_child.c, line 32 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Could you replace vmid -> vmid & 0xFFFF? We do this for readability of the debug output.`

Can you also change id to vmid in the debug message? Otherwise it is ambiguous: is it host-level pid (vmid) or Graphene-iternal pid (pid)?

Done.
Is vmid really the host pid? Because then this truncation does not make much sense. Also wouldn't it be an issue if a malicious host (in case of SGX) would show same host-level pid to two processes, wouldn't ipc code get lost then?


LibOS/shim/src/ipc/shim_ipc_child.c, line 84 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

/* Message cannot come from our own process. */

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 641 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Typo: until

Done.


LibOS/shim/src/sys/shim_clone.c, line 37 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

parent seems to be useless now

Done.


LibOS/shim/src/sys/shim_clone.c, line 213 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Shouldn't we check if root and cwd are not NULL, just like we do with exec?

They are all always not NULL, idk why I checked that for exec ...


LibOS/shim/src/sys/shim_clone.c, line 226 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Can we add child_proces->vmid = 0 and add a comment that vmid is set to the host-pid after the process is actually created at the host level?

Added a comment. child_process is created using a special function which should set all values to appropriate defaults (well, it just does calloc which sets everything to 0).


LibOS/shim/src/sys/shim_clone.c, line 242 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Shouldn't we have if ([exec | cwd | root]) put_*(...)

As mentioned above they are all always not NULL.


LibOS/shim/src/sys/shim_clone.c, line 424 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Useless

Done.


LibOS/shim/src/sys/shim_exec.c, line 428 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Do we need this thread lock/unlock if we know that we are the only thread left?

Most likely we do not, I just didn't pay much attention to it as taking a free lock is, well, free.
Removed them and moved the comment about it here.


LibOS/shim/src/sys/shim_exit.c, line 6 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why are there two copyrights with your name?

The first one was missing from #1510 the second is for these changes.


LibOS/shim/src/sys/shim_exit.c, line 73 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why do we do this?

Because we delete that reference, so no reason to leave a dangling pointer (even if it won't be ever used in practice)


LibOS/shim/src/sys/shim_exit.c, line 92 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Can you prepend This is the last thread of the process; let parent ...

Done.


LibOS/shim/src/sys/shim_futex.c, line 415 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Was that a bug before? What happens now if the thread was already in some queue (in this case we don't do get_thread(thread) but still do put_thread(thread))? Won't ref count underflow?

No, there was no bug here before, I just changed add_thread_to_queue so that it increases thread refcount if it adds it to the queue. This put_thread is just for this thread reference we got from remove_futex_waiter. The previous version just omitted it if the thread was added to the queue, since old add_thread_to_queue did not do that (basically did nothing instead of get + put).


LibOS/shim/src/sys/shim_getpid.c, line 18 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why no lock?

tid cannot ever change (is set once in the struct creation) so we do not need a lock to access it.


LibOS/shim/src/sys/shim_getpid.c, line 28 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why no lock?

ditto


LibOS/shim/src/sys/shim_getuid.c, line 39 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Lock of thread in all these funcs?

Done. (yeah they were missing, I probably initially thought we don't need to lock here, since other threads cannot change our ids, but did not consider signals and such)


LibOS/shim/src/sys/shim_wait.c, line 60 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I am confused by the logic of this function. How is this possible that seen can be false? We are setting ourselves to wait in the wait_queue, so there is noone else to remove us from the queue. So why do we need the additional in_use check?

Why, one thread can remove another thread from this queue, in particular ipc helper thread does remove (and wake) threads on this queue.
We need in_use check since there is no other way of knowing if we are still on this queue and were woken up spuriously.

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 32 of 46 files at r3, 1 of 2 files at r4, 13 of 13 files at r6.
Reviewable status: all files reviewed, 7 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @yamahata)


LibOS/shim/include/shim_thread.h, line 63 at r1 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

No, on Linux uids are per thread attribute.
Side note: threading libraries (like NPTL) will make sure all threads have the same credentials, since POSIX requires that, but on kernel level each thread have a separate set of them.

OK, thanks for clarification.


LibOS/shim/include/shim_thread.h, line 84 at r6 (raw file):

    /*
     * This field is used for checking whether we handled a signal during blocking parts in LibOS

blocking parts is ambiguous. Maybe add an example in parenthesis? Like ...blocking parts (e.g. blocking read on a network socket) ...


LibOS/shim/include/shim_thread.h, line 118 at r6 (raw file):

    struct shim_thread* thread;
    /* We use this field to mark that this object is still in use (is on some queue). This is needed
     * to distinquish spurious wake-ups from real ones. */

distinquish -> distinguish


LibOS/shim/src/shim_init.c, line 623 at r1 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

User application exit code is truncated to 0-255 (in shim_do_exitgroup) so there is no way it collides with PAL_WAIT_FOR_CHILDREN_EXIT (which is much higher).
UPDATE: seems I've left this message in libos_clean_and_exit (in LibOS/shim/src/sys/shim_exit.c).

OK.


LibOS/shim/src/ipc/shim_ipc_child.c, line 32 at r1 (raw file):

Is vmid really the host pid? Because then this truncation does not make much sense.

No, it's a bit more complicated than just a host pid:

LibOS/shim/src/shim_init.c:    g_process_ipc_info.vmid = (IDTYPE)PAL_CB(process_id);
src/host/Linux-SGX/db_main.c:    g_linux_state.process_id = (start_time & (~0xffff)) | g_pal_sec.pid;

I don't know the reason behind throwing the timestamp in the vmid value...

Also wouldn't it be an issue if a malicious host (in case of SGX) would show same host-level pid to two processes, wouldn't ipc code get lost then?

Yes, I think it's a legit issue (the attacker currently also needs to adjust the start_time but that's doable). Though recall that all IPC is encrypted in SGX, and two processes with the same vmid will conflict on the TLS sequence numbers, so it looks only DoS attacks are possible with this.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 732 at r6 (raw file):

        if (clear_event(&install_new_event) < 0) {
            goto out_err_unlock;
        }

Not really related to your changes, but I think there is a data race in the IPC helper: install_new_event is used to unblock the IPC helper thread if it is in blocking poll (on new-port-added or old-port-deleted events).

When we do clear_event(), it clears all events, and it may happen after unlock(ipc_helper_lock) in the previous iteration of the while loop and before lock(ipc_helper_lock) in a new iteration of the while loop. So very rarely, new-port-added and old-port-deleted events may be lost.

The lost event of new-port-added seems fine -- this code re-adds the ports from the global port_list anyway, so the IPC helper will learn about the newly added port. However, the lost event of old-port-deleted seems bad -- the del_ipc_port_fini() callback may never be called on this deleted port (the IPC helper doesn't notice this event). I wonder if this can manifest in hangs of applications?

I can unblock this comment if this thinking is wrong or if this is out of scope of this PR.


LibOS/shim/src/sys/shim_getpid.c, line 28 at r1 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

ditto

But we also update cur->clear_child_tid in a non-atomic way. Could it lead theoretically to "partial updates" when the same thread calls set_tid_address() in the main execution and in the signal handler?


LibOS/shim/src/sys/shim_wait.c, line 60 at r1 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Why, one thread can remove another thread from this queue, in particular ipc helper thread does remove (and wake) threads on this queue.
We need in_use check since there is no other way of knowing if we are still on this queue and were woken up spuriously.

Ok, now I got it.

Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 32 of 46 files at r3, 1 of 2 files at r4, 13 of 13 files at r6.
Reviewable status: all files reviewed, 13 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @mkow)


LibOS/shim/src/shim_init.c, line 501 at r6 (raw file):

    RUN_INIT(init_process);
    RUN_INIT(init_thread);
    RUN_INIT(init_mount);

init_process() should be after init_mount().
Please see the comment in init_process().


LibOS/shim/src/bookkeep/shim_process.c, line 19 at r6 (raw file):

struct shim_child_process*

const struct shim_child_process*. Maybe a matter of preference.


LibOS/shim/src/bookkeep/shim_process.c, line 59 at r6 (raw file):

    if (root_config && get_config(root_config, "fs.start_dir", dir_cfg, sizeof(dir_cfg)) > 0) {
        dent = NULL;
        ret = path_lookupat(NULL, dir_cfg, 0, &dent, NULL);

This expects that all mount points are initialized.
So init_process() should be called after init_mount().


LibOS/shim/src/bookkeep/shim_process.c, line 201 at r6 (raw file):

struct shim_process

nitpick: sizeof(*process) and so on.
In graphene, variable name is preferred to type name.


LibOS/shim/src/bookkeep/shim_process.c, line 202 at r6 (raw file):

sizeof(struct shim_child_process)

ditto.


LibOS/shim/src/bookkeep/shim_process.c, line 204 at r6 (raw file):

sizeof(struct shim_child_process)

ditto.

Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 15 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @mkow)


LibOS/shim/src/sys/shim_wait.c, line 89 at r6 (raw file):

        }
        /* We cannot handle any errors here. */
        (void)thread_sleep(NO_TIMEOUT);

Why do we need to sleep here?


LibOS/shim/src/sys/shim_wait.c, line 198 at r6 (raw file):

            }

            DkEventClear(self->scheduler_event);

This seems race. Probably DkEventClear() should be done within lock or before queuing thread.

Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 42 of 46 files reviewed, 14 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @yamahata)


LibOS/shim/include/shim_thread.h, line 84 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

blocking parts is ambiguous. Maybe add an example in parenthesis? Like ...blocking parts (e.g. blocking read on a network socket) ...

Removed the part about blocking, check the comment now.


LibOS/shim/include/shim_thread.h, line 118 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

distinquish -> distinguish

Done.


LibOS/shim/src/bookkeep/shim_process.c, line 19 at r6 (raw file):

Previously, yamahata wrote…
struct shim_child_process*

const struct shim_child_process*. Maybe a matter of preference.

Done.


LibOS/shim/src/bookkeep/shim_process.c, line 59 at r6 (raw file):

Previously, yamahata wrote…

This expects that all mount points are initialized.
So init_process() should be called after init_mount().

I don't think we can change the order of inits, init_mount relies on g_process.root (hence fs_lock too) being initialized. Maybe we should temporarily set cwd to be the same as root and then change it after init_mount if needed?


LibOS/shim/src/bookkeep/shim_process.c, line 201 at r6 (raw file):

Previously, yamahata wrote…
struct shim_process

nitpick: sizeof(*process) and so on.
In graphene, variable name is preferred to type name.

That's true, but in this case it should be sizeof(*new_process), but new_process is defined and initialized based on off which is created in this line ...


LibOS/shim/src/bookkeep/shim_process.c, line 202 at r6 (raw file):

Previously, yamahata wrote…
sizeof(struct shim_child_process)

ditto.

Done.


LibOS/shim/src/bookkeep/shim_process.c, line 204 at r6 (raw file):

Previously, yamahata wrote…
sizeof(struct shim_child_process)

ditto.

Done.


LibOS/shim/src/ipc/shim_ipc_child.c, line 32 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Is vmid really the host pid? Because then this truncation does not make much sense.

No, it's a bit more complicated than just a host pid:

LibOS/shim/src/shim_init.c:    g_process_ipc_info.vmid = (IDTYPE)PAL_CB(process_id);
src/host/Linux-SGX/db_main.c:    g_linux_state.process_id = (start_time & (~0xffff)) | g_pal_sec.pid;

I don't know the reason behind throwing the timestamp in the vmid value...

Also wouldn't it be an issue if a malicious host (in case of SGX) would show same host-level pid to two processes, wouldn't ipc code get lost then?

Yes, I think it's a legit issue (the attacker currently also needs to adjust the start_time but that's doable). Though recall that all IPC is encrypted in SGX, and two processes with the same vmid will conflict on the TLS sequence numbers, so it looks only DoS attacks are possible with this.

Okey, if it is only DoS type then it's ok, trusting you on this :)


LibOS/shim/src/sys/shim_getpid.c, line 28 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

But we also update cur->clear_child_tid in a non-atomic way. Could it lead theoretically to "partial updates" when the same thread calls set_tid_address() in the main execution and in the signal handler?

Oops, you are right, I was talking only about tid read here, clear_child_tid needs a lock. Done.


LibOS/shim/src/sys/shim_wait.c, line 60 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Ok, now I got it.

Also while (!seen) {} is just a shortcut for if (!seen) while (1) {}


LibOS/shim/src/sys/shim_wait.c, line 89 at r6 (raw file):

Previously, yamahata wrote…

Why do we need to sleep here?

We could do a busy loop, but I guess sleeping is better? On the other hand if we got removed from the g_process.wait_queue then in_use should be cleared soon, so buys loop isn't that bad ... I don't know, I don't have any preference on either approach, let me know which one you think is better.


LibOS/shim/src/sys/shim_wait.c, line 198 at r6 (raw file):

Previously, yamahata wrote…

This seems race. Probably DkEventClear() should be done within lock or before queuing thread.

We check in_use before sleeping and wakers always set in_user before waking us up (setting scheduler_event) so all events order should be covered. Let me know if I missed something.

Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 42 of 46 files reviewed, 14 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @yamahata)


LibOS/shim/src/ipc/shim_ipc_helper.c, line 732 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Not really related to your changes, but I think there is a data race in the IPC helper: install_new_event is used to unblock the IPC helper thread if it is in blocking poll (on new-port-added or old-port-deleted events).

When we do clear_event(), it clears all events, and it may happen after unlock(ipc_helper_lock) in the previous iteration of the while loop and before lock(ipc_helper_lock) in a new iteration of the while loop. So very rarely, new-port-added and old-port-deleted events may be lost.

The lost event of new-port-added seems fine -- this code re-adds the ports from the global port_list anyway, so the IPC helper will learn about the newly added port. However, the lost event of old-port-deleted seems bad -- the del_ipc_port_fini() callback may never be called on this deleted port (the IPC helper doesn't notice this event). I wonder if this can manifest in hangs of applications?

I can unblock this comment if this thinking is wrong or if this is out of scope of this PR.

Yeah, this needs more consideration.
I only thought about case of adding/removing ports from that poll list, which is race free currently (all set_event and clear_event are done under ipc_herlper_lock; if such event appears before we do clear_event at the loop begning, it must have been done before we got the lock hence we will notice all port list changes).
If a port gets disconnected, shouldn't it trigger poll on its pal handle?

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r7, 3 of 3 files at r8.
Reviewable status: all files reviewed, 12 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @yamahata)


LibOS/shim/src/bookkeep/shim_process.c, line 59 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

I don't think we can change the order of inits, init_mount relies on g_process.root (hence fs_lock too) being initialized. Maybe we should temporarily set cwd to be the same as root and then change it after init_mount if needed?

Good catch from Isaku!

I agree with Borys that we should keep the order of inits as now (init_process, then init_mount). As Borys suggests, let's:

  1. Set g_process.cwd = g_process.root temporarily.
  2. Move the "init cwd from fs.start_dir" to the very end of init_mount(). This makes sense because only at that point Graphene mounted all available directories, and now Graphene can finally set CWD. (In a cleaner approach, if Borys really wants, he can introduce a new init_cwd() function that is called right after init_mount()).

LibOS/shim/src/ipc/shim_ipc_helper.c, line 732 at r6 (raw file):

If a port gets disconnected, shouldn't it trigger poll on its pal handle?

But I'm describing exactly this case when a port get disconnected in a time window (after poll, before clear_event). The port is already not in theport_list (so it won't be readded to the poll) and its disconnect wasn't triggered on poll because it happened after poll.

Or did you mean something else?

Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 12 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @yamahata)


LibOS/shim/src/ipc/shim_ipc_helper.c, line 732 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

If a port gets disconnected, shouldn't it trigger poll on its pal handle?

But I'm describing exactly this case when a port get disconnected in a time window (after poll, before clear_event). The port is already not in theport_list (so it won't be readded to the poll) and its disconnect wasn't triggered on poll because it happened after poll.

Or did you mean something else?

Isn't a disconnected port manually removed from the ports list in the loop after poll?

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 12 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @yamahata)


LibOS/shim/src/ipc/shim_ipc_helper.c, line 732 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Isn't a disconnected port manually removed from the ports list in the loop after poll?

Ok, I was over-thinking it. If the port got disconnected after the poll, it will not be removed from the port_list until the next poll (where it will be noticed in its revents that it got disconnected). So there is no data race on this. Ignore me.

Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r10.
Reviewable status: 46 of 49 files reviewed, 11 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @yamahata)


LibOS/shim/src/shim_init.c, line 501 at r6 (raw file):

Previously, yamahata wrote…

init_process() should be after init_mount().
Please see the comment in init_process().

Please check the other comment.


LibOS/shim/src/bookkeep/shim_process.c, line 59 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Good catch from Isaku!

I agree with Borys that we should keep the order of inits as now (init_process, then init_mount). As Borys suggests, let's:

  1. Set g_process.cwd = g_process.root temporarily.
  2. Move the "init cwd from fs.start_dir" to the very end of init_mount(). This makes sense because only at that point Graphene mounted all available directories, and now Graphene can finally set CWD. (In a cleaner approach, if Borys really wants, he can introduce a new init_cwd() function that is called right after init_mount()).

Done.

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r9, 2 of 2 files at r10.
Reviewable status: all files reviewed, 11 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @yamahata)

@dimakuv
Copy link
Contributor

dimakuv commented Nov 13, 2020

Jenkins, retest this please (now Jenkins is back again)

@boryspoplawski
Copy link
Contributor Author

Jenkins, retest this please (making sure, run 2/10)

@boryspoplawski
Copy link
Contributor Author

Jenkins, retest this please (making sure, run 3/10)

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 30 of 46 files at r3, 1 of 2 files at r4, 11 of 13 files at r6, 1 of 1 files at r7, 2 of 3 files at r8, 2 of 2 files at r9, 2 of 2 files at r10.
Reviewable status: all files reviewed, 52 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @yamahata)

a discussion (no related file):

Previously, boryspoplawski (Borys Popławski) wrote…

TODO: check if any LTP test can be enabled (probably after debugging IPC issue).

You fixed that pipes bug, I think you can check the LTPs now.



LibOS/shim/include/shim_internal.h, line 507 at r8 (raw file):

void delete_from_epoll_handles(struct shim_handle* handle);

void* allocate_stack(size_t size, size_t protect_size, bool user);

Could you document the arguments here?


LibOS/shim/include/shim_process.h, line 23 at r8 (raw file):

    IDTYPE vmid;

    int child_termination_signal;

Please explain this field in a comment (I know what it does, but it's not a well-known feature in Linux)


LibOS/shim/include/shim_process.h, line 68 at r9 (raw file):

/* Allocates a new child process structure, initializing all fields. */
struct shim_child_process* create_child_process(void);
/* Frees `child` with all acompanying resources. */

acompanying -> accompanying


LibOS/shim/include/shim_process.h, line 21 at r10 (raw file):

struct shim_child_process {
    IDTYPE pid;
    IDTYPE vmid;

Could you comment here what's this "vmid"?


LibOS/shim/include/shim_process.h, line 28 at r10 (raw file):

    int exit_code;
    int term_signal;
    IDTYPE uid;

Uid of what? I looks like it's not just the uid of the child process, because why would it be set only on the termination?


LibOS/shim/include/shim_thread.h, line 59 at r9 (raw file):

    LIST_TYPE(shim_thread) list;

    /* thread identifiers */

identifiers -> identifier


LibOS/shim/include/shim_thread.h, line 89 at r9 (raw file):

     * - `SIGNAL_NOT_HANDLED` - usually initialized to this - no signals were handled,
     * - `SIGNAL_HANDLED` - at least one signal was handled,
     * - `SIGNAL_HANDLED_RESTART` - same as above, but the signal had `SA_RESTART` flag.

Could you use an enum then?


LibOS/shim/include/shim_thread.h, line 143 at r9 (raw file):

void debug_setprefix(shim_tcb_t* tcb);

/* Set `debug_buf` for `tcb`. If `debug_buf` is `NULL`, then new one is allocated. IF `debug_buf`

IF -> If


LibOS/shim/src/bookkeep/shim_process.c, line 51 at r10 (raw file):

    int ret = path_lookupat(NULL, "/", 0, &dent, NULL);
    if (ret < 0) {
        debug("Could not setup dentry for \"/\", something is seriously broken.\n");

setup -> set up


LibOS/shim/src/bookkeep/shim_process.c, line 100 at r10 (raw file):

}

static bool mark_child_exited(child_cmp_t child_cmp, unsigned long arg, IDTYPE uid, int exit_code,

ditto - uid of what?


LibOS/shim/src/bookkeep/shim_process.c, line 157 at r10 (raw file):

        struct shim_thread* thread = wait_queue->thread;
        __atomic_store_n(&wait_queue->in_use, false, __ATOMIC_RELEASE);
        COMPILER_BARRIER();

Isn't that atomic operation above already a compiler barrier?


LibOS/shim/src/bookkeep/shim_process.c, line 215 at r10 (raw file):

    clear_lock(&new_process->children_lock);

    *(size_t*)((char*)new_process + sizeof(*new_process)) = children_count;

Could you maintain a pointer to the data and advance it, instead of calculating the offset again and again below?


LibOS/shim/src/bookkeep/shim_signal.c, line 645 at r10 (raw file):

    __UNUSED(context);
    siginfo_t info = {
        .si_signo = SIGINT,

umm, why is "suspend" function sending a SIGINT?


LibOS/shim/src/bookkeep/shim_thread.c, line 40 at r10 (raw file):

shim_signal_handles

Not handlers?


LibOS/shim/src/bookkeep/shim_thread.c, line 105 at r10 (raw file):

}

int init_thread(void) {

I'd prefer init_threading, now it sounds like a constructor for some thread structure.


LibOS/shim/src/bookkeep/shim_thread.c, line 175 at r10 (raw file):

static IDTYPE get_new_internal_tid(void) {
    IDTYPE idx = __atomic_add_fetch(&g_internal_tid_alloc_idx, 1, __ATOMIC_RELAXED);
    if (!is_internal_tid(idx)) {

How is this possible to ever fail in runtime?


LibOS/shim/src/bookkeep/shim_thread.c, line 204 at r10 (raw file):

    thread->stack     = cur_thread->stack;
    thread->stack_top = cur_thread->stack_top;
    thread->stack_red = cur_thread->stack_red;

same stack?


LibOS/shim/src/bookkeep/shim_thread.c, line 217 at r10 (raw file):

    unlock(&cur_thread->lock);

    thread->scheduler_event = DkNotificationEventCreate(PAL_TRUE);

missing error checking


LibOS/shim/src/bookkeep/shim_thread.c, line 277 at r10 (raw file):

        clear_signal_queue(&thread->signal_queue);

Isn't signal_altstack leaking here?


LibOS/shim/src/bookkeep/shim_thread.c, line 285 at r10 (raw file):

            DkObjectClose(thread->scheduler_event);
        }

what about wake_queue?


LibOS/shim/src/bookkeep/shim_thread.c, line 289 at r10 (raw file):

            release_ipc_id(thread->tid);
        }

is shim_tcb freed somewhere else?


LibOS/shim/src/bookkeep/shim_thread.c, line 508 at r10 (raw file):

    }

    thread->scheduler_event = DkNotificationEventCreate(PAL_TRUE);

missing error check


LibOS/shim/src/fs/shim_fs.c, line 240 at r10 (raw file):

        return ret;

double empty line?


LibOS/shim/src/fs/shim_fs.c, line 241 at r10 (raw file):

    char dir_cfg[CONFIG_MAX];

this will conflict with the current master (because of the change to TOML)


LibOS/shim/src/fs/shim_fs.c, line 254 at r10 (raw file):

        unlock(&g_process.fs_lock);
    }
    /* Otherwise `cwd` is already initialized.. */

double period?


LibOS/shim/src/fs/proc/thread.c, line 67 at r10 (raw file):

}

static int find_thread_link(const char* name, struct shim_qstr* link, struct shim_dentry** dentptr) {

please wrap


LibOS/shim/src/ipc/shim_ipc_child.c, line 40 at r10 (raw file):

int ipc_cld_exit_send(unsigned int exitcode, unsigned int term_signal) {
    if (!g_process.ppid) {
        /* We have no parent inside Graphene, so noone to notify. */

noone -> no one or none


LibOS/shim/src/ipc/shim_ipc_helper.c, line 647 at r10 (raw file):

/* Wake up the helper thread. */

Aren't we inside a function named "pause_ipc_helper"? Why are we waking anything up? Please clarify this comment.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 658 at r10 (raw file):

    if (needs_wait) {
        /* Wait until the helper thread notices the stop event. */
        return wait_event(&helper_stopped_event);

Will this still work if two threads call pause_ipc_helper() at the same time?


LibOS/shim/src/sys/shim_alarm.c, line 14 at r10 (raw file):

#include "shim_signal.h"
#include "shim_table.h"
#include "shim_process.h"

please update sorting


LibOS/shim/src/sys/shim_alarm.c, line 25 at r10 (raw file):

        .si_code = SI_USER,
    };
    (void)kill_current_proc(&info);

could you at least print a warning, if it's not possible to handle the failure


LibOS/shim/src/sys/shim_exec.c, line 435 at r10 (raw file):

    struct shim_handle* old_exec = g_process.exec;
    g_process.exec = exec;

Why do we do this temporary change?


LibOS/shim/src/sys/shim_exit.c, line 80 at r10 (raw file):

            debug("failed to set up async cleanup_thread (exiting without clear child tid),"
                  " return code: %ld\n", ret);
            COMPILER_BARRIER();

Why is this needed here?


LibOS/shim/src/sys/shim_sigaction.c, line 246 at r10 (raw file):

    }

    struct signal_thread_arg arg = {

Why is this a structure, not just two separate args? Now it just mixes in and out args.


LibOS/shim/src/sys/shim_sigaction.c, line 282 at r10 (raw file):

}

int do_kill_pgroup(IDTYPE sender, IDTYPE pgid, int sig, bool use_ipc) {

All callsites have use_ipc == true, do we need this option?


LibOS/shim/src/sys/shim_sigaction.c, line 297 at r10 (raw file):

    if (current_pgid != pgid) {
        /* `ret` might have been set if IPC message was sent. */

I don't understand this comment, "might"? So, it can also be uninitialized? Or you maybe mean that it can only be set to non-zero if a message was sent?


LibOS/shim/src/sys/shim_wait.c, line 6 at r10 (raw file):

 */

#include <stddef.h> // to get missing size_t definition

missing in which header? also, I think it's missing only on 16.04, but I'm not sure


LibOS/shim/src/sys/shim_wait.c, line 83 at r10 (raw file):

    while (!seen) {
        DkEventClear(get_cur_thread()->scheduler_event);
        COMPILER_BARRIER();

Why this thing here?


LibOS/shim/src/sys/shim_wait.c, line 108 at r10 (raw file):

    if (options & WSTOPPED) {
        debug("Ignoring unsuported WSTOPPED flag to wait4\n");

unsuported -> unsupported (ditto for this whole file)


LibOS/shim/src/sys/shim_wait.c, line 169 at r10 (raw file):

        }

        /* There are some children we can wait for. */

I'd stick this to the previous code block, not this if, it's a bit confusing, looks like a comment for this if below.


LibOS/shim/src/sys/shim_wait.c, line 199 at r10 (raw file):

            DkEventClear(self->scheduler_event);
            COMPILER_BARRIER();

ditto

Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 30 of 49 files reviewed, 52 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @yamahata)


LibOS/shim/include/shim_internal.h, line 507 at r8 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Could you document the arguments here?

I did not add these functions, just moved them from shim_thread.h, because they are implemented in shim_init.c. I could try to check them up and write something here ...


LibOS/shim/include/shim_process.h, line 23 at r8 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Please explain this field in a comment (I know what it does, but it's not a well-known feature in Linux)

Done.


LibOS/shim/include/shim_process.h, line 68 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

acompanying -> accompanying

Done.


LibOS/shim/include/shim_process.h, line 21 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Could you comment here what's this "vmid"?

I have no idea what that is. All I know is that uniquely identifies a process and IPC code sometimes uses it.


LibOS/shim/include/shim_process.h, line 28 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Uid of what? I looks like it's not just the uid of the child process, because why would it be set only on the termination?

This is uid of the child process and it's set on process termination, so that it can be reported in waitid.


LibOS/shim/include/shim_thread.h, line 59 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

identifiers -> identifier

Done.


LibOS/shim/include/shim_thread.h, line 89 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Could you use an enum then?

Done.


LibOS/shim/include/shim_thread.h, line 143 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

IF -> If

Done.


LibOS/shim/src/bookkeep/shim_process.c, line 51 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

setup -> set up

Done.


LibOS/shim/src/bookkeep/shim_process.c, line 100 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto - uid of what?

ditto


LibOS/shim/src/bookkeep/shim_process.c, line 157 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Isn't that atomic operation above already a compiler barrier?

Afaik release only prevents code before it from being moved after and gives no such guarantees on the other direction, so I've put the barrier to be sure (we rely on correct ordering here).


LibOS/shim/src/bookkeep/shim_process.c, line 215 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Could you maintain a pointer to the data and advance it, instead of calculating the offset again and again below?

Done.


LibOS/shim/src/bookkeep/shim_signal.c, line 645 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

umm, why is "suspend" function sending a SIGINT?

No idea?


LibOS/shim/src/bookkeep/shim_thread.c, line 40 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…
shim_signal_handles

Not handlers?

Idk, but I guess this struct has more data than just the handler.


LibOS/shim/src/bookkeep/shim_thread.c, line 105 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I'd prefer init_threading, now it sounds like a constructor for some thread structure.

But it is exactly that: a constructor of the struct shim_thread for the main thread (+ creates one global lock).


LibOS/shim/src/bookkeep/shim_thread.c, line 175 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

How is this possible to ever fail in runtime?

In theory we could overflow IDTYPE (internal thread ids differ from normal just by highest bit set).


LibOS/shim/src/bookkeep/shim_thread.c, line 204 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

same stack?

Yup.


LibOS/shim/src/bookkeep/shim_thread.c, line 217 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

missing error checking

Done.


LibOS/shim/src/bookkeep/shim_thread.c, line 277 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Isn't signal_altstack leaking here?

I think signal_altstack is allocated by user app, not Graphene.


LibOS/shim/src/bookkeep/shim_thread.c, line 285 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

what about wake_queue?

wake_queue is only meaningful when this thread is part of some wakeup queue (is just being woken up), which would imply ref_count > 0.


LibOS/shim/src/bookkeep/shim_thread.c, line 289 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

is shim_tcb freed somewhere else?

I think shim_tcb is provided by Pal. Generally speaking I tried not to touch shim_tcb as it could also use a cleanup and this PR grew quite a bit.


LibOS/shim/src/bookkeep/shim_thread.c, line 508 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

missing error check

Done.


LibOS/shim/src/fs/shim_fs.c, line 240 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

double empty line?

Done.


LibOS/shim/src/fs/shim_fs.c, line 241 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

this will conflict with the current master (because of the change to TOML)

And what should I do about it now? It will be fix it when rebasing.


LibOS/shim/src/fs/shim_fs.c, line 254 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

double period?

Done.


LibOS/shim/src/fs/proc/thread.c, line 67 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

please wrap

Done.


LibOS/shim/src/ipc/shim_ipc_child.c, line 40 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

noone -> no one or none

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 658 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Will this still work if two threads call pause_ipc_helper() at the same time?

That's why it's wrapped in if (needs_wait), which is set under a lock (together with the ipc_helper_state change, which will be noticed by the second thread).


LibOS/shim/src/sys/shim_alarm.c, line 14 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

please update sorting

Done.


LibOS/shim/src/sys/shim_alarm.c, line 25 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

could you at least print a warning, if it's not possible to handle the failure

Done.


LibOS/shim/src/sys/shim_exec.c, line 435 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Why do we do this temporary change?

Because g_process is passed as process description to create_process_and_send_checkpoint below.


LibOS/shim/src/sys/shim_exit.c, line 80 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Why is this needed here?

Hmm, it should not be here ... (maybe a leftover from a intermediate version?)


LibOS/shim/src/sys/shim_sigaction.c, line 246 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Why is this a structure, not just two separate args? Now it just mixes in and out args.

Because it's an argument to a callback in walk_thread_list.


LibOS/shim/src/sys/shim_sigaction.c, line 282 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

All callsites have use_ipc == true, do we need this option?

I guess not. Done.


LibOS/shim/src/sys/shim_sigaction.c, line 297 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I don't understand this comment, "might"? So, it can also be uninitialized? Or you maybe mean that it can only be set to non-zero if a message was sent?

I meant that we cannot just return -ESRCH here (as the condition would suggest), because ipc call above could set ret to 0, which I hope would mean that something got really killed (but that probably just means that the message was sent).
Anyway, removed it as it is not needed anymore.


LibOS/shim/src/sys/shim_wait.c, line 6 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

missing in which header? also, I think it's missing only on 16.04, but I'm not sure

linux/signal.h iirc. Even if only on 16.04, it's still supported.


LibOS/shim/src/sys/shim_wait.c, line 83 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Why this thing here?

The same reason as in shim_compiler.c, lets keep the discussion there.


LibOS/shim/src/sys/shim_wait.c, line 108 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

unsuported -> unsupported (ditto for this whole file)

Done.


LibOS/shim/src/sys/shim_wait.c, line 169 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I'd stick this to the previous code block, not this if, it's a bit confusing, looks like a comment for this if below.

Done.


LibOS/shim/src/sys/shim_wait.c, line 199 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

ditto

Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 30 of 49 files reviewed, 52 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @yamahata)


LibOS/shim/src/ipc/shim_ipc_helper.c, line 647 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…
/* Wake up the helper thread. */

Aren't we inside a function named "pause_ipc_helper"? Why are we waking anything up? Please clarify this comment.

Done.

yamahata
yamahata previously approved these changes Nov 16, 2020
Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r8, 2 of 2 files at r9, 18 of 18 files at r11, 1 of 1 files at r12.
Reviewable status: all files reviewed, 46 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @mkow)


LibOS/shim/src/bookkeep/shim_process.c, line 59 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Done.

Oh, that's complicated. Maybe we can defer that issue as follow up PR.


LibOS/shim/src/sys/shim_clone.c, line 173 at r12 (raw file):

     * even without any locks. Note this is a shallow copy, so `shim_tcb.context.regs` will be
     * shared with the parent. */
    shim_tcb.context.regs = self->shim_tcb->context.regs;

Do we want to copy regs twice?
Probably it's just micro-optimization, it wouldn't affect clone performance.
So I don't block it here.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 18 of 18 files at r11, 1 of 1 files at r12.
Reviewable status: all files reviewed, 14 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @dimakuv)


LibOS/shim/include/shim_internal.h, line 507 at r8 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

I did not add these functions, just moved them from shim_thread.h, because they are implemented in shim_init.c. I could try to check them up and write something here ...

Ok, fine then.


LibOS/shim/src/bookkeep/shim_process.c, line 100 at r10 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

ditto

Could you rename it to child_uid then? It's not obvious from the context here which uid is this.


LibOS/shim/src/bookkeep/shim_process.c, line 157 at r10 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Afaik release only prevents code before it from being moved after and gives no such guarantees on the other direction, so I've put the barrier to be sure (we rely on correct ordering here).

I rather thought it has function call semantics and can't be reordered with other calls. But I'm not sure.


LibOS/shim/src/bookkeep/shim_signal.c, line 645 at r10 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

No idea?

:/


LibOS/shim/src/bookkeep/shim_thread.c, line 40 at r10 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Idk, but I guess this struct has more data than just the handler.

Isn't it just a list of handlers with a lock+refcount? They aren't any kind of handles.


LibOS/shim/src/bookkeep/shim_thread.c, line 105 at r10 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

But it is exactly that: a constructor of the struct shim_thread for the main thread (+ creates one global lock).

Then it should be init_first_thread. The bare init_thread is confusing, because "which thread?".


LibOS/shim/src/bookkeep/shim_thread.c, line 204 at r10 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Yup.

Doesn't sound like this makes any sense. Could you explain it?


LibOS/shim/src/bookkeep/shim_thread.c, line 277 at r10 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

I think signal_altstack is allocated by user app, not Graphene.

Could you add a comment here explaining why it's skipped?


LibOS/shim/src/bookkeep/shim_thread.c, line 285 at r10 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

wake_queue is only meaningful when this thread is part of some wakeup queue (is just being woken up), which would imply ref_count > 0.

ok, but ditto - please put this into a comment


LibOS/shim/src/fs/shim_fs.c, line 241 at r10 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

And what should I do about it now? It will be fix it when rebasing.

Nothing yet, just a heads up + a note to me to carefully review this conflict afterwards.


LibOS/shim/src/sys/shim_exec.c, line 435 at r10 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Because g_process is passed as process description to create_process_and_send_checkpoint below.

Could you comment this? Was quite confusing for me.


LibOS/shim/src/sys/shim_wait.c, line 6 at r10 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

linux/signal.h iirc. Even if only on 16.04, it's still supported.

I meant, please clarify this in the comment so that we won't need to debug this again and just remove after we get rid of 16.04.

@dimakuv
Copy link
Contributor

dimakuv commented Nov 19, 2020

Jenkins, test this please (run 5/10; previous run failed only on Jenkins-Debug-18.04 with clock_gettime01 failure)

dimakuv
dimakuv previously approved these changes Nov 19, 2020
Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r17, 2 of 2 files at r18.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @boryspoplawski)

a discussion (no related file):

Previously, boryspoplawski (Borys Popławski) wrote…

TODO: reorder commits so that tty fix is before threads commit

Yes, the fix looks legit.



Pal/src/db_main.c, line 436 at r18 (raw file):

        if (!parent_process) {
            printf("WARNING: Using insecure argv source. Don't use this configuration in "
                   "production!\n");

Thanks, this is much better :)

@mkow
Copy link
Member

mkow commented Nov 19, 2020

Jenkins, retest this please (run 6/10; previous run failed on Jenkins-18.04 and Jenkins-Debug with LTP.fork07: -> FAIL (returncode=1))

@mkow
Copy link
Member

mkow commented Nov 19, 2020

Jenkins, retest this please (run 7/10; previous run failed on Jenkins-18.04 (truncate01, truncate01_64, truncate02 and truncate02_64 timed out) and Jenkins (futex_wait02: -> ERROR (Timed out after 30.0 s.)))

@mkow
Copy link
Member

mkow commented Nov 19, 2020

Jenkins, retest this please (run 8/10; previous run failed on Jenkins-Debug-18.04 with fork07: -> FAIL (returncode=1) and timeouts on sendfile08 and sendfile08_64)

@mkow
Copy link
Member

mkow commented Nov 19, 2020

Jenkins, retest this please (run 9/10; previous run failed on Jenkins with fork07: -> FAIL (returncode=1))

@boryspoplawski boryspoplawski dismissed stale reviews from dimakuv and mkow via cfc64bb November 19, 2020 16:35
Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r19.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski)

yamahata
yamahata previously approved these changes Nov 19, 2020
Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 7 files at r13, 1 of 5 files at r15, 13 of 15 files at r16, 4 of 4 files at r17, 1 of 2 files at r18, 1 of 1 files at r19.
Reviewable status: all files reviewed, 2 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski)

Sometimes we need to temporarily stop IPC helper thread from receiving
more messages, e.g. when doing execve just before migrating exited (but
not yet waited for) children list.
If a poll is done on a pipe handle and TLS handshake is still in
progress, poll might report that there is data available to be read from
the handle, while in reality only available data is TLS handshake, which
will not be visible to the caller of `pipe_attrquerybyhdl` (poll).
Currently the only supported device is tty, which is just an alias to
stdin and stdout. Graphene simply assigns host `0` and `1` fds on tty
open (without duplication), so we should not close.
Most important differences from the old version:
- strip global process information from the thread struct into a
  dedicated one,
- a parent is informed about the child death when the whole process
  (the last thread) dies (not on each thread exit),
- all threads have the same parent (spawning thread is NOT the parent of
  the spawned thread),
- a thread is able to wait on children created by another thread,
- a process is able to wait for exited children after execve,
- rewritten `waitid` implementation (no more gotos, supports __WCLONE
  and friends flags),
- added option for syscall restarting, for now used only in `waitid`.

Additionally various bugfixes, cleanups and missing locks added.
Copy link
Contributor Author

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 14 of 14 files at r20.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL)

Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 14 of 14 files at r20.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r19, 14 of 14 files at r20.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@boryspoplawski
Copy link
Contributor Author

Jenkins, retest Jenkins-Debug-18.04 please (clock_gettime01 failed, cannot trigger locally, will need to investigate further)

@boryspoplawski
Copy link
Contributor Author

boryspoplawski commented Nov 19, 2020

Added an issue to track flaky clock_gettime01: #1984

@boryspoplawski boryspoplawski merged commit 8ff7907 into master Nov 19, 2020
@boryspoplawski boryspoplawski deleted the borys/fun_with_threads branch November 19, 2020 19:58
@pwmarcz pwmarcz mentioned this pull request Nov 20, 2020
@mkow mkow mentioned this pull request Dec 3, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.