Live userspace process support for the Threads API #142

ghost · 2022-01-12T19:23:01Z

Building off of the work of #129, this PR (when completed) will implement and extend the Threads API outlined in #92 for live userspace processes. This will bring drgn one step closer to fulfilling the traditional role of a userspace debugger, most notably adding the ability to pause and resume threads at will during the execution of the program.

As this is still a work in progress, this PR is in draft mode.

ghost · 2022-01-21T00:18:10Z

@osandov This is now an MVP of the feature. The core interface works well, has test cases, and passes all of the existing tests. Due to the complexity and subtlety of ptrace, though (e.g. the 4 different types of ptrace-stop states, signal queuing, etc.), it's likely there are as-yet-undiscovered edge cases. Right now only x86 is supported, but adding other architectures should be relatively straightforward.

Previously, `fork_and_pause` was only used in the Linux helper test cases. However, as many of the test cases for the upcoming live userspace process support for the Threads API will likely need this same utility, I've moved it to the root of the tests module. Signed-off-by: Kevin Svetlitski <[email protected]>

PR osandov#129 introduced the Threads API to drgn, but did not include support for live userspace processes. This commit changes that, both adding support for the existing Threads API and extending it with a few additional methods which only make sense in the context of a live process, most notably `Thread.pause()` and `Thread.resume()`. At present, only x86 is supported, but the plan is to support other architectures in the near future. Signed-off-by: Kevin Svetlitski <[email protected]>

osandov · 2022-02-02T22:22:38Z

Sorry for the delay. I've been busy with other work, but this has been in the back of my head for a while as I thought about what the interface should look like. Thanks to your experimentation and MVP, I think we uncovered a few issues that we should consider:

ptrace hijacks threads exiting and receiving signals by default. I can imagine use cases for drgn that want this, and use cases that want the process to run undisturbed. So, I think that we should have interfaces to explicitly attach to and detach from individual threads, attach to all current and future threads, and detach from all threads. By default, we probably shouldn't attach to anything at the library level; we may or may not want the CLI to attach to everything by default.
ptrace is inherently event-based. This is pretty fundamental to debugging an external program: the program is going to asynchronously hit conditions that we'd like to react to. So, I think our API should be event-based as well. Specifically, I think we can have an interface that waits for the next event on a specific thread (example use case: I just sent a signal to a thread and I want to wait until that thread stops for that signal) and an interface that waits for the next event on any thread (example use case: I set a breakpoint and now I want the program to run until any thread hits it). It'd also be nice to have non-blocking versions of those (i.e., return an event immediately if one is available, otherwise bail).
ptrace supports various kinds of events, some of which are optional. We might want to have a way to enable/disable those optional ones, although I'm fine leaving that out for now. I think by default we want clone/fork/vfork, exec, and exit (which I believe is everything that ptrace supports other than seccomp).
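
For reference, the default event set described in the last point can be written as a bitmask. This is only a sketch: the `PTRACE_O_*` values are the ones from `<linux/ptrace.h>`, but `PtraceOption` and `DEFAULT_EVENT_OPTIONS` are names made up for illustration and are not part of drgn.

```python
import enum


class PtraceOption(enum.IntFlag):
    """PTRACE_O_* option bits from <linux/ptrace.h> (class name is illustrative)."""

    TRACESYSGOOD = 0x01
    TRACEFORK = 0x02
    TRACEVFORK = 0x04
    TRACECLONE = 0x08
    TRACEEXEC = 0x10
    TRACEVFORKDONE = 0x20
    TRACEEXIT = 0x40
    TRACESECCOMP = 0x80


# Hypothetical default: clone/fork/vfork, exec, and exit events enabled,
# seccomp left off, as suggested in the comment above.
DEFAULT_EVENT_OPTIONS = (
    PtraceOption.TRACECLONE
    | PtraceOption.TRACEFORK
    | PtraceOption.TRACEVFORK
    | PtraceOption.TRACEEXEC
    | PtraceOption.TRACEEXIT
)
```

A mask like this is what would be passed as the data argument of `PTRACE_SETOPTIONS` (or `PTRACE_SEIZE`); an enable/disable interface could add or remove bits from it.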

This is a rough sketch of what I'm picturing:

class Program:
    def attach_all_threads(self) -> None:
        """
        Attach to all threads in this program as well as any new threads that
        are created.
        """
        ...

    def detach_all_threads(self) -> None:
        """Detach from all threads in this program."""
        ...

    def get_thread_event(self, block: bool = True) -> ThreadEvent:
        """
        Get the next event for any thread in this program. If *block* is
        ``True``, wait for the next event. If *block* is ``False``, raises an
        exception if no event is available (TODO: which one? Non-blocking
        sockets seem to return an OSError with errno set appropriately).
        """
        ...


class Thread:
    def attach(self) -> bool:
        """
        Attach to this thread.

        :return: ``True`` on success, ``False`` if the thread was already attached
        """
        ...

    def detach(self) -> bool:
        """
        Detach from this thread.

        :return: ``True`` on success, ``False`` if the thread was not attached
        """
        ...

    def interrupt(self) -> None:
        """
        """
        ...

    def continue(self) -> bool:
        """
        If this thread is currently stopped, resume it.

        :return: ``True`` on success, ``False`` if the thread was not stopped
        """
        ...

    def get_event(self, block: bool = True) -> ThreadEvent:
        """
        Get the next event for this thread.
        """
        ...
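
On the TODO in `get_thread_event()`: in Python 3, a non-blocking socket with nothing to read raises `BlockingIOError`, which is a subclass of `OSError` with `errno` set to `EAGAIN`/`EWOULDBLOCK`. The snippet below only demonstrates that standard-library behavior; whether the non-blocking event interface should raise the same exception is still an open choice, and `nonblocking_recv_error` is a helper name invented for this example.

```python
import errno
import socket


def nonblocking_recv_error() -> OSError:
    """Return the exception raised by recv() on an empty non-blocking socket."""
    a, b = socket.socketpair()
    try:
        a.setblocking(False)
        try:
            a.recv(1)  # Nothing has been sent, so this cannot succeed.
        except BlockingIOError as e:
            return e
        raise AssertionError("recv() unexpectedly returned data")
    finally:
        a.close()
        b.close()


err = nonblocking_recv_error()
print(type(err).__name__, errno.errorcode[err.errno])
```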

Then, we can have a dumb ThreadEvent object for different types of events. Maybe something like:

ThreadEvent = Union[ThreadEventSignal, ThreadEventExit, etc...]

class ThreadEventSignal:
    signal: int

class ThreadEventExit:
    status: int

etc...
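
As a runnable version of the two event types above, here is a minimal sketch using dataclasses. Only `ThreadEventSignal` and `ThreadEventExit` come from the sketch above; the `describe()` helper is hypothetical and just shows how a caller might dispatch on the event type.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ThreadEventSignal:
    signal: int


@dataclass(frozen=True)
class ThreadEventExit:
    status: int


# The real union would list every event type; only two are sketched here.
ThreadEvent = Union[ThreadEventSignal, ThreadEventExit]


def describe(event: ThreadEvent) -> str:
    """Hypothetical helper: dispatch on the concrete event type."""
    if isinstance(event, ThreadEventSignal):
        return f"received signal {event.signal}"
    if isinstance(event, ThreadEventExit):
        return f"exited with status {event.status}"
    raise TypeError(f"unknown event type: {type(event).__name__}")
```

Keeping the events as plain data objects means callers can use `isinstance()` checks, as `describe()` does, without the events needing any behavior of their own.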

The above (along with the equivalent libdrgn interfaces) is more or less what I'd consider the MVP for this design. There are obviously lots of finer details to consider, as well as complications with ptrace that I may not have considered.

There are some "extras" that we should implement eventually, possibly in follow-up PRs:

We might still want a shortcut Thread.pause(), which I think would basically be:

def pause(self):
    self.interrupt()
    while True:
        event = self.get_event()
        if isinstance(event, ThreadEventStop):
            return
        # TODO: should we have a distinction between "stopping" events that
        # require a continue and "non-stopping" events?
        self.continue()
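
To check the control flow of this loop without ptrace, it can be run against a fake thread that replays queued events. Everything here is an assumption for illustration: `FakeThread` is a test double, `ThreadEventStop` is not defined anywhere yet, and the method is spelled `continue_()` because `continue` is a reserved word in Python and cannot be used as a method name.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ThreadEventStop:
    """Hypothetical event: the thread stopped in response to interrupt()."""


@dataclass
class ThreadEventSignal:
    signal: int


class FakeThread:
    """Test double that replays queued events instead of using ptrace."""

    def __init__(self, pending):
        self._events = deque(pending)
        self.interrupted = False
        self.resumes = 0

    def interrupt(self):
        # A real implementation would ask the kernel to stop the thread;
        # here the resulting stop event is simply queued behind the others.
        self.interrupted = True
        self._events.append(ThreadEventStop())

    def get_event(self):
        return self._events.popleft()

    def continue_(self):
        self.resumes += 1
        return True

    def pause(self):
        self.interrupt()
        while True:
            event = self.get_event()
            if isinstance(event, ThreadEventStop):
                return
            # Any earlier event is skipped over by resuming the thread.
            self.continue_()


thread = FakeThread([ThreadEventSignal(10), ThreadEventSignal(12)])
thread.pause()
print(thread.resumes)
```

With two signal events queued ahead of the stop, the loop resumes the thread twice before returning. Note that it also discards those events, which is one reason the stopping versus non-stopping distinction in the TODO matters.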

ptrace allows suppressing signals that the program received. Should we have a suppress_signal: bool = False parameter to continue()?
How would we expose PTRACE_SYSCALL and PTRACE_SINGLESTEP? More parameters for continue()?
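
One possible shape for these two questions, sketched under assumptions: a mode enum selects the ptrace request, and `suppress_signal` decides whether the pending signal is passed through. For `PTRACE_CONT`, `PTRACE_SYSCALL`, and `PTRACE_SINGLESTEP`, the ptrace data argument is the signal to deliver on resume, with 0 meaning none. The request numbers are the Linux values; `ResumeMode` and `resume_request()` are hypothetical names, not a proposed drgn API.

```python
import enum
from typing import Tuple


class ResumeMode(enum.IntEnum):
    """Linux ptrace request numbers for the three ways to resume a tracee."""

    CONT = 7         # PTRACE_CONT
    SINGLESTEP = 9   # PTRACE_SINGLESTEP
    SYSCALL = 24     # PTRACE_SYSCALL


def resume_request(
    mode: ResumeMode = ResumeMode.CONT,
    pending_signal: int = 0,
    suppress_signal: bool = False,
) -> Tuple[int, int]:
    """Return the (request, data) pair a ptrace() call would be given.

    pending_signal is the signal the thread stopped for, or 0 if none.
    """
    data = 0 if suppress_signal else pending_signal
    return int(mode), data
```

For example, `resume_request(ResumeMode.SYSCALL, pending_signal=10, suppress_signal=True)` gives `(24, 0)`: the thread runs to the next syscall stop and the signal is dropped.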

This is definitely larger in scope than the basic pause and resume API from before, but I'm hoping that it makes drgn useful for a much more diverse set of tools. I'd love to hear your thoughts.

This implements the existing thread API methods for live processes other than drgn_thread_stack_trace(). It also doesn't yet add support for full-blown tracing, but it at least brings live processes to feature parity. This is taken from the non-ptrace parts of Kevin Svetlitski's PR #142, with some modifications. Signed-off-by: Omar Sandoval <[email protected]>

osandov · 2022-04-28T19:21:49Z

I wrote this project up as an issue, so I'm closing this one. Thanks again for getting this started!

ghost force-pushed the live-userspace-threads branch from 4a162a4 to a405278 Compare January 13, 2022 19:47

osandov mentioned this pull request Jan 14, 2022

Add name to drgn.Thread #143

Closed

ghost force-pushed the live-userspace-threads branch 4 times, most recently from dd03f12 to df716f4 Compare January 14, 2022 17:42

osandov mentioned this pull request Jan 14, 2022

Add drgn.Program.main_thread() #144

Closed

ghost force-pushed the live-userspace-threads branch 7 times, most recently from 6d933a6 to 9ee902e Compare January 21, 2022 00:09

Svetlitski added 2 commits January 24, 2022 10:56

ghost force-pushed the live-userspace-threads branch from 9ee902e to eea2566 Compare January 24, 2022 18:57

osandov mentioned this pull request Apr 28, 2022

Add thread attach/detach/event operations with ptrace #171

Open

osandov closed this Apr 28, 2022
