Support for split debuginfo #287

Closed
philipc opened this issue Feb 13, 2020 · 13 comments

@philipc
Contributor

philipc commented Feb 13, 2020

As mentioned in rust-lang/rust#34651 (comment), it would be nice if we could generate backtraces when using split debuginfo.

Locating the split debuginfo is something that is useful for all tools that consume DWARF, so I think it makes sense to implement this in another crate. In particular, I think it makes sense to implement this in the object crate (perhaps in its SymbolMap), because the location of the object files is specified in the symbol table entries for Mach-O (some discussion in gimli-rs/addr2line#87). I haven't investigated split DWARF for ELF yet though, so maybe that will change things.

There is also the moria crate, but I think it is targeted at stripped debug info that has been installed in a separate location (post dsymutil), rather than the unlinked DWARF (prior to running dsymutil). Maybe we should be handling that too though?

The gimli support previously used the object crate, but was changed to goblin in d3fb904. Would it be okay for me to change it back to object again? Note that git versions of object no longer depend on goblin, so I don't think the original reason for the change applies anymore. The reason for preferring object over goblin is that it is part of the gimli-rs project, so it is primarily targeted at handling debug info, and many existing consumers of DWARF are already using it. Plus I'm more motivated to work on object than goblin :)

cc @fitzgen

@alexcrichton
Member

I'm all for anything needed to support this!

A while back I tried to prune the dependencies needed for the gimli-symbolize feature, but there was no real reason to prefer one crate over the other. In the hope of eventually turning this on in libstd I figured it'd just be easiest if there were as few deps as possible, but hey, if we need something from a dependency then we need it and should pull it in!

@luser

luser commented Jul 22, 2020

> There is also the moria crate, but I think it is targeted at stripped debug info that has been installed in a separate location (post dsymutil), rather than the unlinked DWARF (prior to running dsymutil). Maybe we should be handling that too though?

You're correct that my initial intention with moria was to handle dSYMs on macOS + the historical "ELF debug info stripped into a separate file" case. Making it handle split debuginfo feels like it might not fit quite right though, since currently the API is "give me the path to the debuginfo file", and for split debuginfo it's likely to be the case that the binary itself has a minimal set of info and pointers to a number of other files.

@philipc
Contributor Author

philipc commented Oct 9, 2020

I've started looking at the macOS support for this again, and have a rough proof of concept for the addr2line crate, which should port to this crate fairly simply.

One thing it requires (and which I haven't attempted yet) is the ability to read objects in archives. I'm not sure what sort of efficiency/complexity trade-offs we should be making for this. Here are some possible strategies:

  • mmap the archive once for each object, even if this means an archive is mapped multiple times. These would be stored alongside the existing mmap for the executable or shared library, and freed when its LRU cache entry is removed.
  • Create a hashmap of archive mmaps that can be reused, and use reference counting to determine when they can be freed.
  • Read each object into a Vec.
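To make the second strategy concrete, here is a rough sketch of a reference-counted archive cache. Everything in it is hypothetical: `ArchiveCache`, `ArchiveData` and `get_or_load` are names invented for illustration, and a plain `Vec<u8>` stands in for a real mmap so the example has no dependencies.

```rust
use std::collections::HashMap;
use std::io;
use std::path::{Path, PathBuf};
use std::rc::{Rc, Weak};

/// Stand-in for a memory map (assumption): the archive bytes held in memory.
pub struct ArchiveData(pub Vec<u8>);

/// Hypothetical cache that shares one loaded copy of each archive.
#[derive(Default)]
pub struct ArchiveCache {
    // Weak handles: an archive is freed once every user has dropped its Rc.
    entries: HashMap<PathBuf, Weak<ArchiveData>>,
}

impl ArchiveCache {
    /// Returns a shared handle, calling `load` only if no live handle exists.
    pub fn get_or_load<F>(&mut self, path: &Path, load: F) -> io::Result<Rc<ArchiveData>>
    where
        F: FnOnce(&Path) -> io::Result<Vec<u8>>,
    {
        if let Some(live) = self.entries.get(path).and_then(Weak::upgrade) {
            return Ok(live);
        }
        let data = Rc::new(ArchiveData(load(path)?));
        self.entries.insert(path.to_path_buf(), Rc::downgrade(&data));
        Ok(data)
    }

    /// Drops bookkeeping for archives that no longer have any users.
    pub fn prune(&mut self) {
        self.entries.retain(|_, weak| weak.strong_count() > 0);
    }

    /// Number of tracked archives, including dead entries not yet pruned.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

A caller would pass something like `|p| std::fs::read(p)` as the loader, and each LRU entry would keep the returned `Rc` alive for as long as it needs the archive. A multi-threaded version would presumably swap `Rc`/`Weak` for their `Arc` equivalents.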

For reading archives there is the ar crate, but it seems to only provide a streaming interface, so that would work better with reading into a Vec. This would be another dependency though. I think archive support is simple enough that it could be added to the object crate instead of adding another dependency.
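As a rough illustration of how little is involved, here is a minimal sketch of walking the members of an `ar` archive that is already in memory. It is not the object crate's API: `Member` and `parse_members` are made-up names. It assumes the common 60-byte member header and handles BSD-style extended names (`#1/<len>`) only; the GNU long-name table is not handled.

```rust
/// One archive member, borrowing from the archive bytes (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Member<'a> {
    pub name: &'a [u8],
    pub data: &'a [u8],
}

const MAGIC: &[u8] = b"!<arch>\n";
const HEADER_LEN: usize = 60;

/// Strips trailing `pad` bytes from a fixed-width field.
fn trim_end(bytes: &[u8], pad: u8) -> &[u8] {
    let end = bytes.iter().rposition(|&b| b != pad).map_or(0, |i| i + 1);
    &bytes[..end]
}

/// Parses a decimal ASCII field that may be padded with trailing spaces.
fn parse_decimal(field: &[u8]) -> Option<usize> {
    std::str::from_utf8(field).ok()?.trim_end().parse().ok()
}

/// Returns every member of `archive`, or `None` if the bytes are malformed.
pub fn parse_members(archive: &[u8]) -> Option<Vec<Member<'_>>> {
    let mut rest = archive.strip_prefix(MAGIC)?;
    let mut members = Vec::new();
    while !rest.is_empty() {
        if rest.len() < HEADER_LEN {
            return None;
        }
        let (header, tail) = rest.split_at(HEADER_LEN);
        // Each header ends with the two-byte terminator "`\n".
        if !header.ends_with(b"`\n") {
            return None;
        }
        // Layout: name[0..16], then mtime, uid, gid, mode, and size[48..58].
        let size = parse_decimal(&header[48..58])?;
        if tail.len() < size {
            return None;
        }
        let (body, after) = tail.split_at(size);
        let mut name = trim_end(&header[..16], b' ');
        let mut data = body;
        // BSD extended name: the real name is stored at the start of the body.
        if let Some(len_field) = name.strip_prefix(b"#1/") {
            let name_len = parse_decimal(len_field)?;
            if body.len() < name_len {
                return None;
            }
            let (long_name, payload) = body.split_at(name_len);
            name = trim_end(long_name, 0);
            data = payload;
        }
        members.push(Member { name, data });
        // Member data is padded to an even length; tolerate a missing final pad.
        let pad = (size % 2).min(after.len());
        rest = &after[pad..];
    }
    Some(members)
}
```

Because the members borrow from the input slice, this shape works equally well over an mmap or over a `Vec` that was read from disk.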

@alexcrichton
Member

I agree that parsing ar archives is easy enough that it should be fine to add to object. As for storing this in this crate, though, I'm not sure what the best strategy will be since I'm not super familiar with how it's all going to be used. For example, will we need to mmap something per rlib? Per object?

In any case I think something is better than nothing so having at least a PR or feature to start with sounds good!

@philipc
Contributor Author

philipc commented Oct 9, 2020

What we have is a list of symbols for each object that was linked in the executable/library. The object is specified by an N_OSO symbol that is either an object file name (/path/to/object.o), or an object within an archive (/path/to/archive.a(object.o)), and is followed by N_FUN etc symbols for each symbol from that object which was linked in. The symbolication logic needs to be something like:

  • determine which executable/library contains the address
  • look up the symbol for that address (and this lookup can find the object/archive path too)
  • load the object (this is the step I was talking about above)
  • look up the symbol in that object so that the address can be translated
  • parse the DWARF in that object and look up the translated address
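The two N_OSO name forms described above can be told apart with a small helper. This is only a sketch: `ObjectRef` and `parse_oso_name` are hypothetical names, and it assumes the archive form is always exactly `path(member)` with a non-empty path and member.

```rust
/// Where an N_OSO entry points (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub enum ObjectRef<'a> {
    /// A standalone object file, e.g. `/path/to/object.o`.
    File(&'a str),
    /// An object inside an archive, e.g. `/path/to/archive.a(object.o)`.
    ArchiveMember { archive: &'a str, member: &'a str },
}

/// Splits an N_OSO name into an object path or an archive path plus member.
pub fn parse_oso_name(name: &str) -> ObjectRef<'_> {
    if let Some(inner) = name.strip_suffix(')') {
        // Split at the last '(' so that parentheses in directory names survive.
        if let Some((archive, member)) = inner.rsplit_once('(') {
            if !archive.is_empty() && !member.is_empty() {
                return ObjectRef::ArchiveMember { archive, member };
            }
        }
    }
    // Anything that does not match the archive form is treated as a plain path.
    ObjectRef::File(name)
}
```

For example, `parse_oso_name("/path/to/archive.a(object.o)")` yields the archive path `/path/to/archive.a` and the member `object.o`, which is what the "load the object" step needs in order to decide whether to open a file directly or go through an archive.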

@alexcrichton
Member

I think it's fine to basically just throw things onto the LRU cache we already have. It may need to have its size tweaked if there's a huge number of object files in one build to avoid thrashing too much though.

@luser

luser commented Oct 12, 2020

> Create a hashmap of archive mmaps that can be reused, and use reference counting to determine when they can be freed.

I think this strategy, or something like it, to at least avoid mapping the same archive multiple times, is worthwhile.

> The object is specified by an N_OSO symbol

Is there already a crate that handles parsing stabs debug info? I can't recall seeing one, but it is an old and relatively unused format so that doesn't surprise me. It's also not terribly complicated and the set of stabs you actually need to support for handling the macOS case is not that large.

It is probably worthwhile to at least ensure that whatever you design here won't need major changes to work with DWARF split debuginfo, since that would be a valuable addition.

@luser

luser commented Oct 12, 2020

As a procedural note, it might be helpful to exhaustively enumerate all the cases we expect this crate to be able to handle gracefully, what's currently supported / needs work, and where new functionality might want to exist. Off the top of my head I can think of:

  • ELF binaries with DWARF:
    • The simple case, full debug info in the binary (-g)
    • Traditional debug info in a separate file: everything is compiled and linked as usual, then objcopy is used to create a separate file containing only debug info, and the separate file is located either through an added .gnu_debuglink section or through the ELF Build ID
    • Split DWARF, binary contains references to debug info in the .dwo files alongside the original object files (-gsplit-dwarf)
      • Utilizing the .gdb_index section, or the newer DWARF 5 .debug_names + .debug_str sections if present, as an optimization
    • Split DWARF where the .dwo files have been packaged into a .dwp file
  • Mach-O binaries with DWARF:
    • Debug info in the original object files with the stabs index in the binary (as discussed in this issue)
    • Debug info in a separate file in a .dSYM bundle, the result of running dsymutil
  • Windows PE with PDB debug info:
    • Standard PDB referenced in the PE debug headers containing full debug info
    • PDB referenced in the PE debug headers but with most debug info as references to PDB files next to individual object files (/DEBUG:FASTLINK)
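For the Build ID case in the ELF list above, the lookup usually follows the GDB convention: the first byte of the ID, in hex, names a subdirectory of `.build-id`, and the remaining bytes name the file. A minimal sketch, where `build_id_debug_path` is a hypothetical helper and the debug directory (commonly `/usr/lib/debug`) is passed in as an assumption:

```rust
use std::path::{Path, PathBuf};

/// Builds `<debug_dir>/.build-id/<first byte>/<remaining bytes>.debug`.
/// Returns `None` if the build ID is too short to split.
pub fn build_id_debug_path(debug_dir: &Path, build_id: &[u8]) -> Option<PathBuf> {
    let (first, rest) = build_id.split_first()?;
    if rest.is_empty() {
        return None;
    }
    // Hex-encode the remaining bytes to form the file name.
    let mut file = String::with_capacity(rest.len() * 2 + 6);
    for byte in rest {
        file.push_str(&format!("{:02x}", byte));
    }
    file.push_str(".debug");
    Some(
        debug_dir
            .join(".build-id")
            .join(format!("{:02x}", first))
            .join(file),
    )
}
```

The `.gnu_debuglink` case is different: the section stores a file name and a CRC, and the consumer searches a few directories relative to the binary, so it would need its own helper.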

@philipc
Contributor Author

philipc commented Oct 12, 2020

> Is there already a crate that handles parsing stabs debug info?

They coexist with other symbols, so existing Mach-O parsers can read them, and it's simple enough to get the information we need without a full stabs debug info parser.

> It is probably worthwhile to at least ensure that whatever you design here won't need major changes to work with DWARF split debuginfo, since that would be a valuable addition.

My feeling is that there will be little in common beyond the DWARF parsing itself, which is already done. We currently only support native backtraces, and the parsing of each file format is already separated, so I don't expect much need or possibility for sharing.

Thanks for that list. The ELF/Mach-O cases cover everything I know about. I don't know much about PDB.

@alexcrichton
Member

FWIW it took time to get the gimli implementation in this crate up to par to include in libstd, so I think it's fine if this takes a while too. Something to play with is better than nothing at all in my opinion :).

I'd be happy to merge basically anything that works behind a feature flag, that way we should have plenty of room to experiment and optimize.

@philipc
Contributor Author

philipc commented Dec 17, 2020

I've started looking at DWO/DWP support. This will need larger addr2line changes to handle the skeleton units.

@philipc
Contributor Author

philipc commented Dec 31, 2020

The fixes in #401 seem to be enough to get backtraces for rustc's split DWARF support, since the skeleton units still contain location information. I assume we'll still need full DWO/DWP support eventually though.

@alexcrichton
Member

This has since been implemented and we're testing it on CI
