Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gdbremote: Initial (and minimal) support for remote debugging #444

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

daniel-thompson
Copy link

Both usage and limitations are described in docs/advanced_usage.rst.

Testing is been but follows pretty much all the new code paths:

  1. the reported register values were compared between gdb and drgn
  2. x0 (argc) and x1 (argv) were checked and the pointers chased to verify that argv[0] contains the right value

At present I have not implemented any new automatic tests. A mock gdbserver that looks up replies from a python dictionary should be fairly easy to put togther but I wanted to make what I've got available for code review first.

@daniel-thompson daniel-thompson force-pushed the gdbremote branch 5 times, most recently from b81f461 to 2928020 Compare October 14, 2024 21:17
@daniel-thompson
Copy link
Author

At present I have not implemented any new automatic tests. A mock gdbserver that looks up replies from a python dictionary should be fairly easy to put togther but I wanted to make what I've got available for code review first.

The automatic tests are now implemented and are passing the CI tests (took me a couple of goes to get things running well on i686 but it's all good now).

@osandov
Copy link
Owner

osandov commented Oct 14, 2024

This is super cool, thank you! I'm happy with this functionality as a starting point. I only skimmed the code so far, so I'll have to give it a proper review in the next couple of days.

BTW, I have a new guy on my team taking a stab at implementing the vmcoreinfo query packet (https://github.com/osandov/drgn/wiki/gdbstub-protocol-proposal:-linux.vmcoreinfo-query-packet) that will make it possible to support KASLR, so hopefully you'll be seeing patches for that soon.

Copy link
Owner

@osandov osandov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really appreciate this! I left several comments, mostly questions to help think about how to extend this further.

Comment on lines +237 to +238
// gdbremote uses the same binary format as struct user_pt_reg
// so we can just reuse that code.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that true in general or only for some architectures?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit I'm not entirely sure. kgdb uses lookup tables to convert from gdb format to struct (plain, ordinary, not-user) pt_reg. Hopefully we do that for a reason although I haven't, as yet, gone digging to find out why.

"""
Set the program to the specificed elffile and connect to a gdbserver.

:param conn: gdb connection string (e.g. localhost:2345)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll likely also want to support serial devices and Unix socket domain sockets natively in the future. This interface should still be sufficient, since we could differentiate by:

  1. If conn starts with / or ., it must be a path, and we can check whether the path is a socket or a character device.
  2. If conn looks like a host:port, assume it is a TCP connection. If a user has a path that looks like a host:port, they can prefix it with ./ to disambiguate it.
  3. Otherwise, assume it is a path.

(Nothing for you to act on here, I'm just thinking ahead.)

Copy link
Author

@daniel-thompson daniel-thompson Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd have to double check but gdb's heuristic looks even simpler than that!

libdrgn/platform.h Outdated Show resolved Hide resolved
libdrgn/program.c Outdated Show resolved Hide resolved
libdrgn/program.c Outdated Show resolved Hide resolved
Comment on lines +1130 to +1163
prog->main_thread =
drgn_thread_set_search(&prog->thread_set, &thread.tid).entry;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the thread set is the best fit for this, since the available threads are dynamic based on whatever the remote server tells us, right?

Another thing I'm curious about, does the remote protocol have a notion of a main or current thread? From what I can tell, you're free to start and stop threads at will via protocol commands, right?

(I'm assuming you just did it this way to make it easy for the minimal initial support, so I'm mainly asking to think about how the drgn API may need to be extended to represent remote protocol concepts.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the thread set is the best fit for this, since the available threads are dynamic based on whatever the remote server tells us, right?

I think it would only be dynamic if we put the remote side into non-stop mode. Otherwise when we are talking to the remote then the whole system is stopped. In that case I figured using the thread_set would allow us to cache things (which will probably be desirable on a 115200 baud link). Having said that if we needed to cache more aggressively (including agressive lazyiness) we would probably need a different approach.

Another thing I'm curious about, does the remote protocol have a notion of a main or current thread? From what I can tell, you're free to start and stop threads at will via protocol commands, right?

Likewise, that can happen in non-stop mode but I'd suggest the initial development focus should be for all-stop mode, if only because that "feels" much like a core dump, only with specialist memory and register options.

@@ -1131,6 +1199,9 @@ drgn_thread_iterator_destroy(struct drgn_thread_iterator *it)
if (it->prog->flags & DRGN_PROGRAM_IS_LINUX_KERNEL) {
drgn_object_deinit(&it->entry.object);
linux_helper_task_iterator_deinit(&it->task_iter);
} else if (it->prog->flags & DRGN_PROGRAM_IS_GDBREMOTE) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The various combinations of flags are starting to get out of hand, but I'll leave that as a followup for me to clean up.

libdrgn/gdbremote.c Outdated Show resolved Hide resolved
@daniel-thompson
Copy link
Author

Thanks for the review. I think I've fixed all the actionable bits. Let me know if there is anything more!

@brenns10
Copy link
Contributor

Hi @daniel-thompson, I'm excited to see this after our discussion at LPC!

I have yet to dive too far into the code. I actually wanted to take a "testing-first" approach here, so I cherry-picked your changes onto the main branch and tested it out. I tried it on a rather ambitious first test case: the drgn vmtest kernel running within GDB.

I launched it like so:

python3.13 -m vmtest.vm --qemu-options '-gdb tcp::4000'

And then tried to attach to it with:

$ python3.13 -m drgn --gdbremote localhost:4000 -s build/vmtest/x86_64/kernel-6.12.0-rc5-vmtest34.1default/build/vmlinux
drgn 0.0.29+24.g2a8eff01 (using Python 3.13.0, elfutils 0.190, with libkdumpfile)
For help, type help(drgn).
>>> import drgn
>>> from drgn import FaultError, NULL, Object, alignof, cast, container_of, execscript, implicit_convert, offsetof, reinterpret, sizeof, stack_trace
>>> from drgn.helpers.common import *
>>> prog.read(0, 64)
Traceback (most recent call last):
  File "<python-input-0>", line 1, in <module>
    prog.read(0, 64)
    ~~~~~~~~~^^^^^^^
Exception: cannot read past end of gdbremote packet
>>>

I understand that KASLR is not supported, which is totally fine for now, but think I expected to see the ability to read from arbitrary virtual memory addresses (or physical, which I also tested to similar results).

I also went ahead and exited my drgn, only to find that the QEMU instance was still hung. I assume the VM was still halted?

I'll definitely take a look into the code soon as well, though I'm not nearly the reviewer that @osandov is. But I wanted to share the above so you could try it out or point out the error in my ways :)

@daniel-thompson
Copy link
Author

I understand that KASLR is not supported, which is totally fine for now, but think I expected to see the ability to read from arbitrary virtual memory addresses (or physical, which I also tested to similar results).

I'll take a look. I suspect what you are seeing is a memory fault that hasn't been translated well (since we don't yet decode error packets). However whatever it is I'll take a look, adding error packet handling was next on my list anyway. I'm deliberately trying to keep each step as small as the next (useful) step can possibly be since I have to work on this in fits and starts.

For physical access you should have received an exception since there is no support for physical memory access... but it shouldn't have been cryptic ("Cannot read from physical memory at 0").

I also went ahead and exited my drgn, only to find that the QEMU instance was still hung. I assume the VM was still halted?

drgn doesn't issue a detach packet on exit meaning QEMU should remain in the same run state it was when we connected (drgn also won't stop the target if it is running). Adding detach support would be trivial but it sits outside the minimum useful initial set I was aiming for (and IMHO it makes more sense once there is support for stopping the target during connect).

@brenns10
Copy link
Contributor

Sorry, you're right, I can read valid virtual memory addresses with no issues. And if I boot with nokaslr then drgn reads variables with no issues! Plus I get that using the kernel as a test is a bit silly given that so far there's no linux-specific support. If I had read the code first I would have seen that.

I do still see the behavior that as soon as drgn attaches to QEMU's GDB, the VM seems to be halted and it doesn't respond to any inputs. Even after I exit drgn. So I think QEMU may be automatically stopping the guest when a connection is accepted.

Introduce the basic infrastructure to communicate with remote debuggers
using the gdbremote protocol. See docs/advanced_usage.rst for both
instructions on usages and a summary of the current limitations.

Testing is been fairly modest but does cover pretty much all the new
code paths:

1. the reported register values were compared between gdb and drgn
2. frame pointer based (fallback) stack tracing
3. x0 (argc) and x1 (argv) were checked and the pointers chased
   to verify that argv[0] contains the right value

Signed-off-by: Daniel Thompson <[email protected]>
Add logic to detect error messages issued by the gdbserver and to
convert them into exceptions.

At the python prompt errors will be shown in the following way (this
example has verbose protocol debugging enabled to we get to see the
underlying gdbremote packets too):

~~~
>>> prog.read(0, 16)
=> $m0,16#30
<= $E01#a6
Traceback (most recent call last):
  File "<console>", line 1, in <module>
Exception: gdbserver reported error: 1
~~~

Signed-off-by: Daniel Thompson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants