ghost commented Jan 12, 2022

Building on the work of #129, this PR (when completed) will implement and extend the Threads API outlined in #92 for live userspace processes. This will bring drgn one step closer to fulfilling the traditional role of a userspace debugger, most notably by adding the ability to pause and resume threads at will while the program is running.

As this is still a work in progress, this PR is in draft mode.

ghost force-pushed the live-userspace-threads branch from 4a162a4 to a405278 January 13, 2022 19:47
osandov mentioned this pull request Jan 14, 2022
ghost force-pushed the live-userspace-threads branch 4 times, most recently from dd03f12 to df716f4 January 14, 2022 17:42
ghost force-pushed the live-userspace-threads branch 7 times, most recently from 6d933a6 to 9ee902e January 21, 2022 00:09
ghost (Author) commented Jan 21, 2022

@osandov This is now an MVP of the feature. The core interface works well, has test cases, and passes all of the existing tests. Due to the complexity and subtlety of ptrace, though (e.g., the four different kinds of ptrace-stop, signal queuing, etc.), there are likely edge cases that haven't been discovered yet. Right now only x86 is supported, but adding other architectures should be relatively straightforward.
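The four ptrace-stop kinds mentioned above are the ones ptrace(2) describes: signal-delivery-stop, group-stop, PTRACE_EVENT stops, and syscall-stop. As a rough illustration of why they are easy to confuse, the sketch below tells them apart from a raw waitpid() status. It assumes the tracee was attached with PTRACE_SEIZE and that PTRACE_O_TRACESYSGOOD is set (without PTRACE_SEIZE, a group-stop looks like a signal-delivery-stop and needs PTRACE_GETSIGINFO to disambiguate). The helper name is hypothetical and not part of drgn; the constants are copied from `<linux/ptrace.h>`.

```python
import os
import signal

# Values from <linux/ptrace.h>; Python's os module does not export them.
PTRACE_EVENT_CLONE = 3
PTRACE_EVENT_STOP = 128

# Signals whose default action stops the whole thread group.
STOP_SIGNALS = {signal.SIGSTOP, signal.SIGTSTP, signal.SIGTTIN, signal.SIGTTOU}


def classify_ptrace_stop(status: int) -> str:
    """Classify the waitpid() status of a stopped tracee (hypothetical helper).

    Assumes the tracee was attached with PTRACE_SEIZE and that
    PTRACE_O_TRACESYSGOOD is set.
    """
    if not os.WIFSTOPPED(status):
        raise ValueError("tracee is not in a ptrace-stop")
    sig = os.WSTOPSIG(status)
    event = status >> 16  # PTRACE_EVENT_* number, or 0 for none
    if sig == (signal.SIGTRAP | 0x80):
        # PTRACE_O_TRACESYSGOOD sets bit 7 of SIGTRAP for syscall-stops.
        return "syscall-stop"
    if event == PTRACE_EVENT_STOP and sig in STOP_SIGNALS:
        # Under PTRACE_SEIZE, group-stop is reported as PTRACE_EVENT_STOP.
        return "group-stop"
    if event != 0:
        # clone/fork/vfork/exec/exit events, plus the PTRACE_EVENT_STOP
        # (with SIGTRAP) that PTRACE_INTERRUPT produces.
        return "PTRACE_EVENT stop"
    return "signal-delivery-stop"
```

For example, `(PTRACE_EVENT_CLONE << 16) | (signal.SIGTRAP << 8) | 0x7F` is classified as a PTRACE_EVENT stop.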

Previously, `fork_and_pause` was only used in the Linux helper test
cases. However, as many of the test cases for the upcoming live
userspace process support for the Threads API will likely need this same
utility, I've moved it to the root of the tests module.

Signed-off-by: Kevin Svetlitski <[email protected]>
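For context, a helper of this kind typically forks a child that does nothing but sleep, so a test has a live process to inspect. A minimal sketch of the idea (hypothetical; the helper in drgn's test suite may differ in its details):

```python
import os
import signal


def fork_and_pause() -> int:
    """Fork a child that blocks in signal.pause() until it is killed.

    Hypothetical sketch; returns the child's PID to the parent.
    """
    pid = os.fork()
    if pid == 0:
        try:
            while True:
                signal.pause()  # sleep until any signal arrives
        finally:
            # Never let the child fall back into the parent's code path.
            os._exit(1)
    return pid
```

The caller owns the child and must kill and reap it, for example with `os.kill(pid, signal.SIGKILL)` followed by `os.waitpid(pid, 0)`.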
PR osandov#129 introduced the Threads API to drgn, but did not include support
for live userspace processes. This commit changes that, both adding
support for the existing Threads API, as well as extending it with a
few additional methods which only make sense in the context of a
live process, most notably `Thread.pause()` and `Thread.resume()`.

At present, only x86 is supported, but the plan is to support other
architectures in the near future.

Signed-off-by: Kevin Svetlitski <[email protected]>
ghost force-pushed the live-userspace-threads branch from 9ee902e to eea2566 January 24, 2022 18:57
osandov (Owner) commented Feb 2, 2022

Sorry for the delay. I've been busy with other work, but this has been in the back of my head for a while as I thought about what the interface should look like. Thanks to your experimentation and MVP, I think we've uncovered a few issues that we should consider:

  1. ptrace hijacks threads exiting and receiving signals by default. I can imagine use cases for drgn that want this, and use cases that want the process to run undisturbed. So, I think that we should have interfaces to explicitly attach to and detach from individual threads, attach to all current and future threads, and detach from all threads. By default, we probably shouldn't attach to anything at the library level; we may or may not want the CLI to attach to everything by default.
  2. ptrace is inherently event-based. This is pretty fundamental to debugging an external program: the program is going to asynchronously hit conditions that we'd like to react to. So, I think our API should be event-based as well. Specifically, I think we can have an interface that waits for the next event on a specific thread (example use case: I just sent a signal to a thread and I want to wait until that thread stops for that signal) and an interface that waits for the next event on any thread (example use case: I set a breakpoint and now I want the program to run until any thread hits it). It'd also be nice to have non-blocking versions of those (i.e., return an event immediately if one is available, otherwise bail).
  3. ptrace supports various kinds of events, some of which are optional. We might want to have a way to enable/disable those optional ones, although I'm fine leaving that out for now. I think by default we want clone/fork/vfork, exec, and exit (which I believe is everything that ptrace supports other than seccomp).
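Item 2's blocking and non-blocking variants map naturally onto waitpid(2) with and without WNOHANG. The sketch below illustrates only that mapping, on an ordinary untraced child process, with WUNTRACED job-control stops standing in for ptrace-stops; the event classes and `get_event()` are hypothetical. When nothing is pending it raises `BlockingIOError` (an `OSError` subclass carrying errno `EAGAIN`), which is what Python raises for would-block operations on non-blocking sockets. Passing `pid=-1` waits for any child, loosely analogous to "any thread".

```python
import errno
import os
import signal
from dataclasses import dataclass
from typing import Union


@dataclass
class ThreadEventSignal:
    signal: int  # signal that stopped the child


@dataclass
class ThreadEventExit:
    status: int  # exit code, or -signo if a signal killed the child (assumed encoding)


ThreadEvent = Union[ThreadEventSignal, ThreadEventExit]


def get_event(pid: int, block: bool = True) -> ThreadEvent:
    """Return the next stop or exit of child *pid* (-1 means any child).

    Hypothetical sketch built on plain waitpid(2), not ptrace.
    """
    options = os.WUNTRACED  # report job-control stops as well as exits
    if not block:
        options |= os.WNOHANG
    wpid, status = os.waitpid(pid, options)
    if wpid == 0:
        # WNOHANG and nothing pending: follow the non-blocking socket
        # convention of raising BlockingIOError with errno EAGAIN.
        raise BlockingIOError(errno.EAGAIN, os.strerror(errno.EAGAIN))
    if os.WIFSTOPPED(status):
        return ThreadEventSignal(os.WSTOPSIG(status))
    if os.WIFSIGNALED(status):
        return ThreadEventExit(-os.WTERMSIG(status))
    return ThreadEventExit(os.WEXITSTATUS(status))
```

Encoding a signal death as a negative exit status is purely for illustration; a real design would more likely use a separate event type.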

This is a rough sketch of what I'm picturing:

```python
from __future__ import annotations  # ThreadEvent is defined further down


class Program:
    def attach_all_threads(self) -> None:
        """
        Attach to all threads in this program as well as any new threads that
        are created.
        """
        ...

    def detach_all_threads(self) -> None:
        """Detach from all threads in this program."""
        ...

    def get_thread_event(self, block: bool = True) -> ThreadEvent:
        """
        Get the next event for any thread in this program. If *block* is
        ``True``, wait for the next event. If *block* is ``False``, raises an
        exception if no event is available (TODO: which one? Non-blocking
        sockets raise BlockingIOError, an OSError subclass with errno set to
        EAGAIN/EWOULDBLOCK).
        """
        ...


class Thread:
    def attach(self) -> bool:
        """
        Attach to this thread.

        :return: ``True`` on success, ``False`` if the thread was already attached
        """
        ...

    def detach(self) -> bool:
        """
        Detach from this thread.

        :return: ``True`` on success, ``False`` if the thread was not attached
        """
        ...

    def interrupt(self) -> None:
        """
        Ask this thread to stop. The stop is reported as an event, so use
        get_event() to wait for it.
        """
        ...

    # ``continue`` is a reserved word in Python, so the method needs a
    # different name; a trailing underscore is the PEP 8 convention.
    def continue_(self) -> bool:
        """
        If this thread is currently stopped, resume it.

        :return: ``True`` on success, ``False`` if the thread was not stopped
        """
        ...

    def get_event(self, block: bool = True) -> ThreadEvent:
        """
        Get the next event for this thread. *block* behaves as in
        Program.get_thread_event().
        """
        ...
```
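One approach to `attach_all_threads()` is to scan `/proc/<pid>/task` repeatedly until a full pass finds no thread that has not been attached yet, since threads can be created during the scan; threads created after that point would then be picked up through `PTRACE_O_TRACECLONE` events. A minimal sketch of the rescanning loop, where `attach` is a hypothetical stand-in for attaching to one TID (e.g., via `PTRACE_SEIZE`). It does not handle threads that exit mid-scan (`ESRCH`) or threads the kernel already auto-attached through `PTRACE_O_TRACECLONE`.

```python
import os
from typing import Callable, Set


def proc_tids(pid: int) -> Set[int]:
    """Thread IDs currently listed under /proc/<pid>/task (Linux only)."""
    return {int(name) for name in os.listdir(f"/proc/{pid}/task")}


def attach_all_threads(
    pid: int,
    attach: Callable[[int], None],
    list_tids: Callable[[int], Set[int]] = proc_tids,
) -> Set[int]:
    """Attach to every thread of *pid*, rescanning until nothing new appears.

    Hypothetical sketch: a thread may create another thread while we are
    attaching, so a single pass over /proc can miss it. Repeat until a full
    pass finds no unattached TIDs.
    """
    attached: Set[int] = set()
    while True:
        new = list_tids(pid) - attached
        if not new:
            return attached
        for tid in sorted(new):
            attach(tid)  # e.g. PTRACE_SEIZE on this TID
            attached.add(tid)
```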

Then, we can have simple, dumb ThreadEvent classes for the different types of events. Maybe something like:

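```python
from dataclasses import dataclass
from typing import Union


@dataclass
class ThreadEventSignal:
    signal: int


@dataclass
class ThreadEventExit:
    status: int


# etc.: one more class per remaining event type, added to the Union below
ThreadEvent = Union[ThreadEventSignal, ThreadEventExit]
```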

The above (along with the equivalent libdrgn interfaces) is more or less what I'd consider the MVP for this design. There are obviously lots of finer details to consider, as well as complications with ptrace that I may not have considered.

There are some "extras" that we should implement eventually, possibly in follow-up PRs:

  • We might still want a shortcut Thread.pause(), which I think would basically be:
    ```python
    def pause(self):
        self.interrupt()
        while True:
            event = self.get_event()
            if isinstance(event, ThreadEventStop):
                return
            # TODO: should we have a distinction between "stopping" events that
            # require a continue and "non-stopping" events?
            self.continue_()  # ``continue`` is a reserved word in Python
    ```
  • ptrace allows suppressing signals that the program received. Should we have a `suppress_signal: bool = False` parameter to `continue()`?
  • How would we expose `PTRACE_SYSCALL` and `PTRACE_SINGLESTEP`? More parameters for `continue()`?
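For the last two bullets, one option is to have the continue method choose the ptrace request and its data argument. ptrace(2) restarts a stopped tracee with `PTRACE_CONT`, `PTRACE_SYSCALL` (stop again at the next system call entry or exit), or `PTRACE_SINGLESTEP` (stop after one instruction), and the data argument names a signal to deliver, with 0 meaning none; so suppressing a signal amounts to passing 0. ptrace(2) also notes that restarting from stops other than signal-delivery-stop is not guaranteed to inject a signal. A minimal sketch of just that mapping; the function name and mode strings are hypothetical, and the request numbers are the Linux values from `<sys/ptrace.h>`:

```python
from typing import Tuple

# Request numbers from <sys/ptrace.h> on Linux.
PTRACE_CONT = 7
PTRACE_SINGLESTEP = 9
PTRACE_SYSCALL = 24

_REQUESTS = {
    "cont": PTRACE_CONT,  # run until the next ptrace-stop
    "syscall": PTRACE_SYSCALL,  # also stop at syscall entry/exit
    "singlestep": PTRACE_SINGLESTEP,  # stop after one instruction
}


def continue_request(
    pending_signal: int, suppress_signal: bool = False, mode: str = "cont"
) -> Tuple[int, int]:
    """Return (request, data) for restarting a tracee (hypothetical helper).

    *pending_signal* is the signal from a signal-delivery-stop, or 0 for
    other stops. ptrace delivers *data* as a signal; 0 delivers nothing.
    """
    try:
        request = _REQUESTS[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    data = 0 if suppress_signal else pending_signal
    return request, data
```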

This is definitely larger in scope than the basic pause and resume API from before, but I'm hoping that it makes drgn useful for a much more diverse set of tools. I'd love to hear your thoughts.

osandov added a commit that referenced this pull request Feb 12, 2022
This implements the existing thread API methods for live processes other
than drgn_thread_stack_trace(). It also doesn't yet add support for
full-blown tracing, but it at least brings live processes to feature
parity. This is taken from the non-ptrace parts of Kevin Svetlitski's
PR #142, with some modifications.

Signed-off-by: Omar Sandoval <[email protected]>
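The non-ptrace part of this, enumerating the threads of a live process, can be done from procfs alone: each thread has a directory under `/proc/<pid>/task`, with its name in the `comm` file. A minimal sketch assuming Linux procfs; `live_threads()` is a hypothetical name and this is not drgn's implementation:

```python
import os
from typing import Dict


def live_threads(pid: int) -> Dict[int, str]:
    """Map each thread ID of live process *pid* to its name (Linux /proc).

    Threads may exit while we iterate, so vanished entries are skipped.
    """
    threads: Dict[int, str] = {}
    for entry in os.listdir(f"/proc/{pid}/task"):
        tid = int(entry)
        try:
            with open(f"/proc/{pid}/task/{tid}/comm") as f:
                threads[tid] = f.read().rstrip("\n")
        except (FileNotFoundError, ProcessLookupError):
            continue  # thread exited between listdir() and open()
    return threads
```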
osandov (Owner) commented Apr 28, 2022

I wrote this project up as an issue, so I'm closing this one. Thanks again for getting this started!

osandov closed this Apr 28, 2022