type level debuginfo is duplicated across codegen units #136059

Open
jyn514 opened this issue Jan 25, 2025 · 5 comments
Labels
A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) C-bug Category: This is a bug. I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

jyn514 (Member) commented Jan 25, 2025

I tried this code:

// inner.rs
  /// Byte order that is selectable at runtime.
  #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
  pub enum RunTimeEndian {
      /// Little endian byte order.
      Little,
      /// Big endian byte order.
      Big,
  }

// main.rs
  use inner::RunTimeEndian;

  fn main() {
      use RunTimeEndian::*;
      println!("{:?}{:?}{:?}{:?}{:?}", Little, Big, Little, Big, Little);
  }

I expected to see this happen: RunTimeEndian only appears once in the debuginfo.

Instead, this happened: RunTimeEndian appears twice:

; dwarfdump -a /home/jyn/.local/lib/cargo/target/debug/example | rg '\sRunTimeEndian$' -B5
< 1><0x00000329>    DW_TAG_namespace
                      DW_AT_name                  inner
< 2><0x0000032e>      DW_TAG_enumeration_type
                        DW_AT_type                  <0x00000322>
                        DW_AT_enum_class            yes(1)
                        DW_AT_name                  RunTimeEndian
--
< 1><0x0000002a>    DW_TAG_namespace
                      DW_AT_name                  inner
< 2><0x0000002f>      DW_TAG_enumeration_type
                        DW_AT_type                  <0x00000088>
                        DW_AT_enum_class            yes(1)
                        DW_AT_name                  RunTimeEndian

Note that the original (non-minimized) example was much worse; the correctness integration test for addr2line has this duplicated 59 times.

cc #129722, #115455. This is not the same as either of those, because the duplication only appears across codegen units (AFAICT).

Meta

rustc --version --verbose:

rustc 1.86.0-nightly (049355708 2025-01-18)
binary: rustc
commit-hash: 049355708383ab1b9a1046559b9d4230bdb3a5bc
commit-date: 2025-01-18
host: x86_64-unknown-linux-gnu
release: 1.86.0-nightly
LLVM version: 19.1.7
@jyn514 jyn514 added the C-bug Category: This is a bug. label Jan 25, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jan 25, 2025
@rustbot rustbot added A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 25, 2025
jyn514 (Member, Author) commented Jan 25, 2025

here is an idea of how often this occurs:

$ dwarfdump -a /home/jyn/.local/lib/cargo/target/debug/deps/correctness-d4e7520a3982659f | rg DW_AT_name | sed 's/.*DW_AT_name\s*//' | sort | uniq -c | rg -v '^\s*1\s' | sort -k1 -h > duplicates2.txt
$ dwarfdump -a /home/jyn/.local/lib/cargo/target/debug/deps/correctness-d4e7520a3982659f | rg DW_AT_name | wc -l
362272
$ awk '{s+=$1} END {printf "%.0f", s}' duplicates2.txt
343320
$ python -c 'print(343320/362272)'
0.9476857168094691

that's 94.8% of all DW_AT_name entries sharing their name with at least one other entry, i.e. debuginfo that is most likely just duplicated across codegen units.
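For readers without `rg`/`awk` at hand, the counting done by the pipeline above can be sketched in Rust. This is a minimal sketch, not the tooling used in this issue: it assumes the `dwarfdump -a` output is available as text, the function name `duplicate_counts` is hypothetical, and the sample input is hand-written (the `main` entry and its offset are made up for illustration).

```rust
use std::collections::HashMap;

/// Returns (total DW_AT_name entries, entries whose name occurs more than once).
/// Roughly mirrors `rg DW_AT_name | sed ... | sort | uniq -c | rg -v '^\s*1\s'`
/// followed by the `awk` sum.
fn duplicate_counts(dump: &str) -> (usize, usize) {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for line in dump.lines() {
        // Keep only the text after the last `DW_AT_name`, like the greedy `sed` step.
        if let Some((_, rest)) = line.rsplit_once("DW_AT_name") {
            *counts.entry(rest.trim()).or_insert(0) += 1;
        }
    }
    let total: usize = counts.values().sum();
    // Like the shell version, this counts every occurrence of a repeated
    // name, including the first one.
    let duplicated: usize = counts.values().filter(|&&n| n > 1).sum();
    (total, duplicated)
}

fn main() {
    // Hand-written sample in the shape of the dwarfdump output quoted earlier.
    let dump = "\
< 1><0x0000002a>    DW_TAG_namespace
                      DW_AT_name                  inner
< 2><0x0000002f>      DW_TAG_enumeration_type
                        DW_AT_name                  RunTimeEndian
< 1><0x00000329>    DW_TAG_namespace
                      DW_AT_name                  inner
< 2><0x0000032e>      DW_TAG_enumeration_type
                        DW_AT_name                  RunTimeEndian
< 1><0x00000400>    DW_TAG_subprogram
                      DW_AT_name                  main
";
    let (total, duplicated) = duplicate_counts(dump);
    println!("{duplicated}/{total} = {:.3}", duplicated as f64 / total as f64);
}
```

Like the shell version, this keys on the base name only, so it inherits the same overcounting for unrelated items that happen to share a name.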

duplicates2.txt

this overcounts somewhat, because DW_AT_name does not include the namespace, only the base name. so things like core::fmt::Result and std::io::Result are counted as duplicates of each other. but i don't think overlapping basenames alone would get us up to this order of magnitude.
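One way to rule out the basename overlap would be to count by namespace-qualified name instead. The following is a rough sketch under several assumptions: DIE headers look like `< 2><0x0000032e>  DW_TAG_...` as in the dwarfdump output quoted earlier, depth 0 is the compile unit, and each DIE has at most one `DW_AT_name` line. The function names, the `{unnamed}` placeholder, and the sample input are hypothetical.

```rust
use std::collections::HashMap;

/// Parses a DIE header such as `< 2><0x0000032e>      DW_TAG_enumeration_type`
/// and returns its nesting depth and tag.
fn parse_die_header(line: &str) -> Option<(usize, &str)> {
    let rest = line.trim_start().strip_prefix('<')?;
    let (depth, rest) = rest.split_once('>')?;
    let depth = depth.trim().parse::<usize>().ok()?;
    let tag = rest.split_whitespace().find(|w| w.starts_with("DW_TAG_"))?;
    Some((depth, tag))
}

/// Counts named, non-namespace DIEs by their `a::b::Name` path.
fn count_qualified_names(dump: &str) -> HashMap<String, usize> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    // scope[i] holds the name of the enclosing DIE at depth i + 1.
    let mut scope: Vec<String> = Vec::new();
    // Depth and tag of the most recent DIE header, until its name is seen.
    let mut pending: Option<(usize, &str)> = None;
    for line in dump.lines() {
        if let Some((depth, tag)) = parse_die_header(line) {
            // Drop siblings and deeper DIEs, then reserve a slot for this DIE.
            scope.truncate(depth.saturating_sub(1));
            scope.resize(depth, String::from("{unnamed}"));
            pending = Some((depth, tag));
        } else if let Some(name) = line.trim_start().strip_prefix("DW_AT_name") {
            if let Some((depth, tag)) = pending.take() {
                if depth == 0 {
                    continue; // compile unit name (a file path), not part of any path
                }
                scope[depth - 1] = name.trim().to_string();
                if tag != "DW_TAG_namespace" {
                    *counts.entry(scope.join("::")).or_insert(0) += 1;
                }
            }
        }
    }
    counts
}

fn main() {
    // Hand-written sample: two compile units that both describe
    // inner::RunTimeEndian, plus two unrelated `Result` items.
    let dump = "\
< 0><0x0000000b>  DW_TAG_compile_unit
< 1><0x0000002a>    DW_TAG_namespace
                      DW_AT_name                  inner
< 2><0x0000002f>      DW_TAG_enumeration_type
                        DW_AT_name                  RunTimeEndian
< 1><0x00000040>    DW_TAG_namespace
                      DW_AT_name                  core
< 2><0x00000045>      DW_TAG_namespace
                        DW_AT_name                  fmt
< 3><0x0000004a>        DW_TAG_structure_type
                          DW_AT_name                  Result
< 0><0x00000300>  DW_TAG_compile_unit
< 1><0x00000329>    DW_TAG_namespace
                      DW_AT_name                  inner
< 2><0x0000032e>      DW_TAG_enumeration_type
                        DW_AT_name                  RunTimeEndian
< 1><0x00000340>    DW_TAG_namespace
                      DW_AT_name                  std
< 2><0x00000345>      DW_TAG_namespace
                        DW_AT_name                  io
< 3><0x0000034a>        DW_TAG_structure_type
                          DW_AT_name                  Result
";
    let mut rows: Vec<_> = count_qualified_names(dump).into_iter().collect();
    rows.sort();
    for (name, n) in rows {
        println!("{n:>4} {name}");
    }
}
```

With this keying, the two `Result` items stay separate while `inner::RunTimeEndian` is still reported twice. Generic instantiations and functions would need more care than this sketch takes.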

bjorn3 (Member) commented Jan 25, 2025

DWARF v4 and up have type units for deduplicating type debuginfo between object files, but on macOS we are forced to use DWARF v3, which doesn't have type units. Also, type units are identified by MD5 hashes truncated to 64 bits.

jyn514 (Member, Author) commented Jan 25, 2025

i gathered this info on x64 linux gnu, so either we are not using type units or something else is going wrong.

bjorn3 (Member) commented Jan 25, 2025

We are indeed not using type units right now. On macOS we can't, and on other platforms I'm not sure it is the best idea given the potential for hash collisions.
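To put a rough number on accidental collisions between 64-bit type signatures, the birthday approximation can be evaluated directly. This is a back-of-the-envelope sketch, not a statement about what rustc or LLVM do: it assumes signatures behave like uniformly random 64-bit values, ignores deliberately constructed MD5 collisions, and the function name `collision_probability` is hypothetical.

```rust
/// Approximate probability that at least two of `n` distinct types share a
/// 64-bit signature, assuming uniformly distributed signatures.
/// Uses the birthday approximation p ≈ 1 - exp(-n(n-1) / 2^65).
fn collision_probability(n: u64) -> f64 {
    let n = n as f64;
    let space = 2f64.powi(64);
    // 1 - exp(-x) computed as -expm1(-x) to stay accurate for tiny x.
    -(-n * (n - 1.0) / (2.0 * space)).exp_m1()
}

fn main() {
    // 362272 is the DW_AT_name count quoted earlier in this thread; treating
    // every entry as a distinct type gives a pessimistic bound for that binary.
    for n in [362_272u64, 1_000_000, 1_000_000_000] {
        println!("{n:>13} types: p ≈ {:.3e}", collision_probability(n));
    }
}
```

Under these assumptions the probability is on the order of 1e-9 for the binary measured above and only reaches a few percent at around a billion distinct types. Whether that risk is acceptable is a separate judgment call.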

@jieyouxu jieyouxu removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jan 26, 2025