Support for split debuginfo #287

philipc · 2020-02-13T07:59:43Z

As mentioned in rust-lang/rust#34651 (comment), it would be nice if we could generate backtraces when using split debuginfo.

Locating the split debuginfo is something that is useful for all tools that consume DWARF, so I think it makes sense to implement this in another crate. In particular, I think it makes sense to implement this in the object crate (perhaps in its SymbolMap), because the location of the object files is specified in the symbol table entries for Mach-O (some discussion in gimli-rs/addr2line#87). I haven't investigated split DWARF for ELF yet though, so maybe that will change things.

There is also the moria crate, but I think it is targeted at stripped debug info that has been installed in a separate location (post dsymutil), rather than the unlinked DWARF (prior to running dsymutil). Maybe we should be handling that too though?

The gimli support previously used the object crate, but was changed to goblin in d3fb904. Would it be okay for me to change it back to object again? Note that git versions of object no longer depend on goblin, so I don't think the original reason for the change applies anymore. The reason for preferring object over goblin is that it is part of the gimli-rs project, so it is primarily targeted at handling debug info, and many existing consumers of DWARF are already using it. Plus I'm more motivated to work on object than goblin :)

cc @fitzgen

alexcrichton · 2020-02-13T17:46:19Z

I'm all for anything needed to support this!

A while back I tried to prune the dependencies needed for the gimli-symbolize feature, but there was no real reason to prefer one crate over the other. In the hope of eventually turning this on in libstd I figured it'd just be easiest if there were as few deps as possible, but hey, if we need something from a dependency then we need it and should pull it in!

luser · 2020-07-22T13:34:20Z

> There is also the moria crate, but I think it is targeted at stripped debug info that has been installed in a separate location (post dsymutil), rather than the unlinked DWARF (prior to running dsymutil). Maybe we should be handling that too though?

You're correct that my initial intention with moria was to handle dSYMs on macOS + the historical "ELF debug info stripped into a separate file" case. Making it handle split debuginfo feels like it might not fit quite right though, since currently the API is "give me the path to the debuginfo file", and for split debuginfo it's likely to be the case that the binary itself has a minimal set of info and pointers to a number of other files.

philipc · 2020-10-09T09:07:15Z

I've started looking at the macOS support for this again, and have a rough proof of concept for the addr2line crate, which should port to this crate fairly simply.

One thing it requires (and which I haven't attempted yet) is the ability to read objects in archives. I'm not sure what sort of efficiency/complexity trade-offs we should be making for this. Here are some possible strategies:

- mmap the archive once for each object, even if this means an archive is mapped multiple times. These would be stored alongside the existing mmap for the executable or shared library, and freed when its LRU cache entry is removed.
- Create a hashmap of archive mmaps that can be reused, and use reference counting to determine when they can be freed.
- Read each object into a Vec.
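
To make the second strategy concrete, here is a minimal sketch of a path-keyed cache that hands out reference-counted archive data and lets it be freed once the last user is gone. All names here (`ArchiveData`, `ArchiveCache`, `get_or_load`, `prune`) are hypothetical, and a plain `Vec<u8>` stands in for a real mmap; this is not code from the backtrace, addr2line, or object crates.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::rc::{Rc, Weak};

/// Stand-in for a memory-mapped archive (assumption: a real implementation
/// would hold an mmap here instead of a `Vec<u8>`).
struct ArchiveData {
    bytes: Vec<u8>,
}

/// Cache of loaded archives keyed by path. The map only holds weak
/// references, so an archive is freed when the last object using it is dropped.
#[derive(Default)]
struct ArchiveCache {
    archives: HashMap<PathBuf, Weak<ArchiveData>>,
}

impl ArchiveCache {
    /// Returns the shared data for `path`, calling `load` only when no live
    /// copy of that archive exists.
    fn get_or_load<F>(&mut self, path: &Path, load: F) -> std::io::Result<Rc<ArchiveData>>
    where
        F: FnOnce(&Path) -> std::io::Result<Vec<u8>>,
    {
        if let Some(existing) = self.archives.get(path).and_then(Weak::upgrade) {
            return Ok(existing);
        }
        let data = Rc::new(ArchiveData { bytes: load(path)? });
        self.archives.insert(path.to_path_buf(), Rc::downgrade(&data));
        Ok(data)
    }

    /// Removes bookkeeping entries for archives that no longer have any users.
    fn prune(&mut self) {
        self.archives.retain(|_, weak| weak.strong_count() > 0);
    }
}
```

Each cached object would keep an `Rc<ArchiveData>` next to its parsed state, so evicting the object from the LRU cache releases the archive automatically once no other object shares it. A multi-threaded variant would presumably use `Arc` and a lock instead of `Rc`.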

For reading archives there is the ar crate, but it seems to only provide a streaming interface, so that would work better with reading into a Vec. This would be another dependency though. I think archive support is simple enough that it could be added to the object crate instead of adding another dependency.
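
As a rough illustration of why archive support is considered simple, the sketch below lists the members of an `ar` archive held in memory. It relies on the common layout: an 8-byte `!<arch>\n` magic, then 60-byte member headers with a 16-byte name, a 10-byte decimal size at offset 48, and a two-byte terminator, with member data padded to an even offset. It handles BSD-style `#1/<len>` names (as used on macOS) but does not resolve GNU extended name tables, so GNU special members come out with empty or `/<offset>` names. `Member` and `list_members` are hypothetical names, not the API that was later added to the object crate.

```rust
/// One member of an `ar` archive: its name and the byte range of its contents.
#[derive(Debug, PartialEq)]
struct Member {
    name: String,
    data: std::ops::Range<usize>,
}

const MAGIC: &[u8] = b"!<arch>\n";
const HEADER_LEN: usize = 60;

/// Parses a space-padded ASCII decimal field from a member header.
fn parse_decimal(field: &[u8]) -> Option<usize> {
    std::str::from_utf8(field).ok()?.trim_end().parse().ok()
}

/// Lists the members of an in-memory archive, or returns `None` if malformed.
fn list_members(archive: &[u8]) -> Option<Vec<Member>> {
    let mut offset = MAGIC.len();
    if archive.get(..offset)? != MAGIC {
        return None;
    }
    let mut members = Vec::new();
    while offset < archive.len() {
        let header = archive.get(offset..offset + HEADER_LEN)?;
        // Every member header ends with a backtick followed by a newline.
        if &header[58..60] != b"`\n" {
            return None;
        }
        let size = parse_decimal(&header[48..58])?;
        let mut start = offset + HEADER_LEN;
        let end = start.checked_add(size)?;
        if end > archive.len() {
            return None;
        }
        let raw_name = std::str::from_utf8(&header[..16]).ok()?.trim_end();
        let name = if let Some(len) = raw_name.strip_prefix("#1/") {
            // BSD variant: the name occupies the first `len` bytes of the data.
            let len: usize = len.parse().ok()?;
            let name_end = start.checked_add(len)?;
            if name_end > end {
                return None;
            }
            let bytes = &archive[start..name_end];
            start = name_end;
            String::from_utf8_lossy(bytes).trim_end_matches('\0').to_string()
        } else {
            // GNU variant terminates short names with '/'.
            raw_name.trim_end_matches('/').to_string()
        };
        members.push(Member { name, data: start..end });
        // Member data is padded to an even offset.
        offset = end + (end & 1);
    }
    Some(members)
}
```

Because the result is a list of byte ranges rather than copies, it works equally well over an mmap or over a `Vec` that was read from disk.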

alexcrichton · 2020-10-09T20:04:07Z

I agree that parsing ar archives is easy enough that it should be fine to add to object. For storing this in this crate though I'm not sure what the best strategy will be since I'm not super familiar with how it's all going to be used. For example will we need to mmap something per rlib? Per object?

In any case I think something is better than nothing so having at least a PR or feature to start with sounds good!

philipc · 2020-10-09T21:21:03Z

What we have is a list of symbols for each object that was linked in the executable/library. The object is specified by an N_OSO symbol that is either an object file name (/path/to/object.o), or an object within an archive (/path/to/archive.a(object.o)), and is followed by N_FUN etc. symbols for each symbol from that object that was linked in. The symbolication logic needs to be something like:

- determine which executable/library contains the address
- look up the symbol for that address (this lookup can find the object/archive path too)
- load the object (this is the step I was talking about above)
- look up the symbol in that object so that the address can be translated
- parse the DWARF in that object and look up the translated address
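
The "load the object" step first has to decide whether the N_OSO name refers to a plain object file or to an archive member. A minimal sketch of that split, assuming the two name shapes described above; `ObjectLocation` and `parse_oso_name` are hypothetical names, and a member name containing `(` is not handled:

```rust
/// Where the object named by an N_OSO symbol lives.
#[derive(Debug, PartialEq)]
enum ObjectLocation<'a> {
    /// A standalone object file, e.g. `/path/to/object.o`.
    File { path: &'a str },
    /// A member of an archive, e.g. `/path/to/archive.a(object.o)`.
    ArchiveMember { archive: &'a str, member: &'a str },
}

/// Splits an N_OSO name into an archive path and member name when it has the
/// form `archive(member)`, and otherwise treats it as a plain file path.
fn parse_oso_name(name: &str) -> ObjectLocation<'_> {
    if let Some(rest) = name.strip_suffix(')') {
        // Search from the end so that parentheses in directory names are kept
        // as part of the archive path.
        if let Some(open) = rest.rfind('(') {
            let (archive, member) = (&rest[..open], &rest[open + 1..]);
            if !archive.is_empty() && !member.is_empty() {
                return ObjectLocation::ArchiveMember { archive, member };
            }
        }
    }
    ObjectLocation::File { path: name }
}
```

For the two examples in the comment, this yields `File { path: "/path/to/object.o" }` and `ArchiveMember { archive: "/path/to/archive.a", member: "object.o" }` respectively.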

alexcrichton · 2020-10-11T17:28:44Z

I think it's fine to basically just throw things onto the LRU cache we already have. It may need to have its size tweaked if there's a huge number of object files in one build to avoid thrashing too much though.
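
For readers unfamiliar with the idea, the following is a minimal sketch of a small `Vec`-backed LRU cache with a tunable capacity, which is where the thrashing concern comes from: with more distinct objects than slots, every lookup becomes a reload. `LruCache` and `get_or_insert_with` are hypothetical names for illustration and this is not the crate's actual cache code.

```rust
/// A tiny least-recently-used cache backed by a `Vec`.
/// The most recently used entry is kept at the front.
struct LruCache<K, V> {
    capacity: usize,
    entries: Vec<(K, V)>,
}

impl<K: PartialEq, V> LruCache<K, V> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        LruCache { capacity, entries: Vec::with_capacity(capacity) }
    }

    /// Returns the cached value for `key`, creating it with `load` on a miss.
    /// Returns `None` only if `load` fails.
    fn get_or_insert_with<F>(&mut self, key: K, load: F) -> Option<&V>
    where
        F: FnOnce(&K) -> Option<V>,
    {
        if let Some(pos) = self.entries.iter().position(|(k, _)| *k == key) {
            // Hit: move the entry to the front.
            let entry = self.entries.remove(pos);
            self.entries.insert(0, entry);
        } else {
            // Miss: load the value, evicting the least recently used entry
            // (the last one) if the cache is full.
            let value = load(&key)?;
            if self.entries.len() == self.capacity {
                self.entries.pop();
            }
            self.entries.insert(0, (key, value));
        }
        self.entries.first().map(|(_, v)| v)
    }
}
```

A linear scan is reasonable only while the capacity stays small; a build with a very large number of object files would need either a larger capacity or a different structure.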

luser · 2020-10-12T15:28:28Z

> Create a hashmap of archive mmaps that can be reused, and use reference counting to determine when they can be freed.

I think this strategy, or something like it, to at least avoid mapping the same archive multiple times, is worthwhile.

> The object is specified by an N_OSO symbol

Is there already a crate that handles parsing stabs debug info? I can't recall seeing one, but it is an old and relatively unused format so that doesn't surprise me. It's also not terribly complicated and the set of stabs you actually need to support for handling the macOS case is not that large.

It is probably worthwhile to at least ensure that whatever you design here won't need major changes to work with DWARF split debuginfo, since that would be a valuable addition.

luser · 2020-10-12T15:45:32Z

As a procedural note, it might be helpful to exhaustively enumerate all the cases we expect this crate to be able to handle gracefully, what's currently supported / needs work, and where new functionality might want to exist. Off the top of my head I can think of:

ELF binaries with DWARF:
- The simple case, full debug info in the binary (-g)
- Traditional debug info in a separate file, where everything has been compiled and linked as usual, then objcopy is used to create a separate file containing only debug info, and either a .gnu_debuglink section is added or the ELF Build ID is used to locate the separate file
- Split DWARF, binary contains references to debug info in the .dwo files alongside the original object files (-gsplit-dwarf)
  - Utilizing the .gdb_index section, or the newer DWARF 5 .debug_names + .debug_str sections if present, as an optimization
- Split DWARF where the .dwo files have been packaged into a .dwp file
Mach-O binaries with DWARF:
- Debug info in the original object files with the stabs index in the binary (as discussed in this issue)
- Debug info in a separate file in a .dSYM bundle, the result of running dsymutil
Windows PE with PDB debug info:
- Standard PDB referenced in the PE debug headers containing full debug info
- PDB referenced in the PE debug headers but with most debug info as references to PDB files next to individual object files (/DEBUG:FASTLINK)
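
As one concrete example from the ELF cases above, separate debug files located by Build ID conventionally live under a debug directory such as `/usr/lib/debug`, at `.build-id/<first byte in hex>/<remaining bytes in hex>.debug`. The sketch below only computes that candidate path; `build_id_debug_path` is a hypothetical helper, and the debug directory is passed in because it varies between systems.

```rust
use std::path::PathBuf;

/// Builds the conventional path of a separate debug file from an ELF Build ID:
/// `<debug_dir>/.build-id/<first byte>/<remaining bytes>.debug`, all in
/// lowercase hex. Returns `None` if the ID is too short to split.
fn build_id_debug_path(debug_dir: &str, build_id: &[u8]) -> Option<PathBuf> {
    let (first, rest) = build_id.split_first()?;
    if rest.is_empty() {
        return None;
    }
    let rest_hex: String = rest.iter().map(|b| format!("{:02x}", b)).collect();
    let mut path = PathBuf::from(debug_dir);
    path.push(".build-id");
    path.push(format!("{:02x}", first));
    path.push(format!("{}.debug", rest_hex));
    Some(path)
}
```

A caller would still need to check that the file exists and fall back to the .gnu_debuglink name otherwise.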

philipc · 2020-10-12T22:35:09Z

> Is there already a crate that handles parsing stabs debug info?

They coexist with other symbols, so existing Mach-O parsers can read them, and it's simple enough to get the information we need without a full stabs debug info parser.

> It is probably worthwhile to at least ensure that whatever you design here won't need major changes to work with DWARF split debuginfo, since that would be a valuable addition.

My feeling is that there will be little in common beyond the DWARF parsing itself, which is already done. We currently only support native backtraces, and the parsing of each file format is already separated, so I don't expect much need or possibility for sharing.

Thanks for that list. The ELF/Mach-O cases cover everything I know about. I don't know much about PDB.

alexcrichton · 2020-10-20T21:43:20Z

FWIW it took time to get the gimli implementation in this crate up to par to include in libstd, so I think it's fine if this takes a while too. Something to play with is better than nothing at all in my opinion :).

I'd be happy to merge basically anything that works behind a feature flag, that way we should have plenty of room to experiment and optimize.

philipc · 2020-12-17T09:30:02Z

I've started looking at DWO/DWP support. This will need larger addr2line changes to handle the skeleton units.

philipc · 2020-12-31T07:04:45Z

The fixes in #401 seem to be enough to get backtraces for rustc's split DWARF support, since the skeleton units still contain location information. I assume we'll still need full DWO/DWP support eventually though.

alexcrichton · 2021-05-04T15:23:27Z

This has since been implemented and we're testing it on CI

philipc mentioned this issue Feb 14, 2020

[ELF] Include dynamic_symbols in symbol_map gimli-rs/object#188

Closed

scottlamb mentioned this issue Mar 21, 2020

shrink moonfire-nvr binary scottlamb/moonfire-nvr#70

Open

philipc mentioned this issue Apr 21, 2020

Switching the backtrace crate to using object gimli-rs/object#215

Closed

alexcrichton mentioned this issue May 14, 2020

Switch to gimli-symbolize by default #324

Merged

memoryruins mentioned this issue Jul 21, 2020

debuginfo: Add support for split-debuginfo on platforms that allow it rust-lang/rust#34651

Closed

philipc mentioned this issue Oct 16, 2020

read: add archive support gimli-rs/object#252

Merged

alexcrichton closed this as completed May 4, 2021

erratic-pattern mentioned this issue Oct 17, 2023

test archive does not include split-debuginfo files nextest-rs/nextest#1043

Open
