Do not print locations in anonymous tag names. #159592
rastogishubham wants to merge 1 commit into llvm:main
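@llvm/pr-subscribers-debuginfo @llvm/pr-subscribers-clang-codegen
Author: Shubham Sandeep Rastogi (rastogishubham)
Changes
If we use LTO as a driver, then even with lambdas in template parameters, two debug entries from different CUs get merged in LTO, but their human-readable DW_AT_name differs if we emit the locations in anonymous tag names. We would then have two DIEs with the same linkage name but different human-readable names, which probably should not happen. This change prevents that from happening.
In our example: with `AnonymousTagLocations` set to true, the dwarfdump output will be DW_AT_name ("Foo<(lambda at a.cpp:12:5){}>"). But with `AnonymousTagLocations` set to false, the dwarfdump output will be DW_AT_name ("Foo<(lambda){}>").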
Full diff: https://github.com/llvm/llvm-project/pull/159592.diff 2 Files Affected:
diff --git a/clang/lib/CodeGen/CGDebugInfo.cpp b/clang/lib/CodeGen/CGDebugInfo.cpp
index 578d09f7971d6..9df792be1f283 100644
--- a/clang/lib/CodeGen/CGDebugInfo.cpp
+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -397,6 +397,8 @@ PrintingPolicy CGDebugInfo::getPrintingPolicy() const {
   // Apply -fdebug-prefix-map.
   PP.Callbacks = &PrintCB;
+  // Disable printing of location of an anonymous tag name.
+  PP.AnonymousTagLocations = false;
   return PP;
 }
diff --git a/clang/test/DebugInfo/CXX/anonymous-locs.cpp b/clang/test/DebugInfo/CXX/anonymous-locs.cpp
new file mode 100644
index 0000000000000..48a8aff57d520
--- /dev/null
+++ b/clang/test/DebugInfo/CXX/anonymous-locs.cpp
@@ -0,0 +1,14 @@
+// RUN: %clang_cc1 -std=c++20 -emit-obj -debug-info-kind=standalone -dwarf-version=5 -triple x86_64-apple-darwin -o %t %s
+// RUN: llvm-dwarfdump %t | FileCheck %s
+
+// CHECK: DW_TAG_structure_type
+// CHECK-NEXT: DW_AT_calling_convention (DW_CC_pass_by_value)
+// CHECK-NEXT: DW_AT_name ("Foo<(lambda){}>")
+
+template<auto T>
+struct Foo {
+};
+
+Foo<[] {}> f;
+
+auto func() { return f; }
\ No newline at end of file
If we use LTO as a driver, then even with lambdas in template parameters, two debug entries from different CUs get merged in LTO, but their human-readable DW_AT_name differs if we emit the locations in anonymous tag names. We would then have two DIEs with the same linkage name but different human-readable names, which probably should not happen. This change prevents that from happening.
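The failure mode described here can be sketched with a toy model. `ToyDie` and `findNameConflicts` are hypothetical names invented for illustration, not LLVM APIs, and the linkage-name strings are placeholders rather than real manglings. The sketch only groups entries by linkage name and reports any group that carries more than one `DW_AT_name`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a type DIE: only the two names that matter here.
struct ToyDie {
  std::string linkageName; // placeholder string, not a real mangled name
  std::string name;        // models DW_AT_name
};

// Returns every linkage name that is paired with more than one DW_AT_name.
std::set<std::string> findNameConflicts(const std::vector<ToyDie> &dies) {
  std::map<std::string, std::set<std::string>> namesByLinkage;
  for (const ToyDie &die : dies)
    namesByLinkage[die.linkageName].insert(die.name);

  std::set<std::string> conflicts;
  for (const auto &entry : namesByLinkage)
    if (entry.second.size() > 1)
      conflicts.insert(entry.first);
  return conflicts;
}
```

Feeding it two entries with the same linkage name but the spellings `Foo<(lambda at ./a.h:12:5){}>` and `Foo<(lambda at a.h:12:5){}>` reports one conflict; with both spelled `Foo<(lambda){}>` it reports none.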
378959f to 549f627
Any chance we could use a naming that matches, or is similar to, the mangling? Including the lambda numbering could help ensure the names are unique-ish. Bonus points if the same name could be generated by just looking at the DWARF for the lambda type itself. I suspect you'd have to include the lambda numbering in the lambda to make that work nicely, without having to count lambdas in the enclosing scope or something. There's some old discussion I had in LLVM and dwarf-discuss about the problems with lambda naming. These naming problems are the reason I couldn't include templates with lambda parameters in simplified-template-names (they get unsimplified names instead). https://lists.dwarfstd.org/pipermail/dwarf-discuss/2022-January/002102.html
Also, maybe compare to GCC's naming? (It could be nice to converge, but I'm not sure that's possible; it might be in tension with matching the demangling.)
Oh, and did you hit some particular problem with this varied naming situation? It's not immediately obvious to me that this would be a problem. One version might be more or less confusing to the user than the other, but most of the differences I saw (in an admittedly very highly coherent build system) were "./foo.h" versus "foo.h", in which case either would've been fine for the user to read and understand what was being referenced.
Hi @dwblaikie
We actually did compare to what GCC does, and you can see with this compiler explorer link that GCC doesn't add the location to the DW_AT_name. cc @Michael137 for the example (thanks for providing that!)
Yes, we had an issue with dsymutil: we saw a verification error in the DWARFVerifier when trying to verify the contents of the accelerator tables, since dsymutil does type uniquing similar to LTO.
Correct me if I am wrong, but if we use the mangling, what's the difference between the DW_AT_name and the DW_AT_linkage_name?
nod So GCC does make some extra effort to make these things a bit more unique by including the scope (though it fails for the inline variable - I /think/ they should be scoped to the inline variable too, since the type does have an ODR/linkage name, the lambdas should be deduplicable across translation units, etc) and the argument types (wonder how that works if the argument types are template parameters... that can get complicated (hmm, "tfunc<<lambda(auto:1)> >" it seems, for a basic case)): https://godbolt.org/z/4M1nfT396 It'd be nice to do at least that, but I'd argue we should do a bit more - by including the mangling number in them too.
Ah, alrighty. New verification checks you're working on/adding, I take it? Good to know/hear about/commendable - I'm sure it's a lot of little things to track down/try to fix, but should make the DWARF more robust/reliable/deterministic/expected.
Ideally, not much should be different, imho. Eventually we should/could ask the question of whether the linkage name is redundant (ideally the linkage name would be the one we remove, though I realize that places a fairly substantial burden on consumers to reconstruct names in a non-trivial way - but the upside is that DWARF descriptions have a lot of redundancy elimination in them (structurally, think about an exponential template expansion "T<T<T<>, T<>>, T<T<>, T<>>>" - DWARF contains no duplication itself in the structural description - with simplified template names it contains no duplication in the DW_AT_name, and the mangled name can only deduplicate within itself - not across names (so in the DWARF where you have the linkage name for "T<>", and for "T<T<>, T<>>" and for the outer "T<...>" there's no ability for the latter to reuse the former - but the DWARF references back to those other DIEs just fine))). (@rnk for FYI on this thread in general, and some ideas here) But even without that somewhat fanciful future world without linkage names in DWARF - notionally the DW_AT_name is a name fragment that's immediately usable without demangling, etc. So I think it still makes sense to include a name that might match the demangling (most simple names do - we include the name of function "foo" despite the fact it's identical to the name you could get from demangling, even in more complex template examples that's often still true) and it seems important to end users that they have a good chance at identifying which lambda is being referenced - which means having /at least/ as much uniqueness as the mangling. And I agree with you that it's important that names don't have /more/ uniqueness than the mangling. 
So the conclusion is that it should have /exactly as much/ uniqueness as the mangling, which, to me, pushes pretty close to printing out something nearly identical to the demangling. (Note: I think there are demanglings that aren't unique, where multiple mangled names demangle to the same name, so even with the demangling-matching motivation we might not quite reach the "have at least as much uniqueness as the mangling" property I described above.)
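The remark about the exponential template expansion "T<T<T<>, T<>>, T<T<>, T<>>>" can be made concrete with a small calculation. This is only an illustrative sketch: `expandedName` and `distinctTypes` are hypothetical helpers, and the model assumes each nesting level instantiates `T` with two copies of the level below.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Fully spelled-out name at a given nesting depth:
// depth 0 is "T<>", depth d is "T<" + previous + ", " + previous + ">".
std::string expandedName(unsigned depth) {
  if (depth == 0)
    return "T<>";
  const std::string inner = expandedName(depth - 1);
  return "T<" + inner + ", " + inner + ">";
}

// Number of distinct instantiations a structural description has to hold:
// one per nesting level, because each level just refers to the one below.
std::size_t distinctTypes(unsigned depth) { return depth + 1; }
```

At depth 2 the spelled-out name is the 27-character string quoted in the comment, while only three distinct types exist. In this model the name length is 8 * 2^d - 5 characters, whereas the number of distinct types is d + 1.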
I think coming up with a scope chain like @dwblaikie described would probably be the best outcome because it is resilient to source code drift, but another possible alternative to consider would be to use the file basename with canonicalized case for case-insensitive file systems, plus the line:col. This assumes that most of the verifier failures you are encountering are driven by header file search path mismatches and path traversal component matches ( The downside to this approach is that it is fragile in the presence of source code drift, i.e. if I add a comment block to a shared header and recompile half my application, I could get debug info verifier check failures. This is not technically an ODR violation but rather a kind of build system dependency tracking failure; it depends on where you want to make the tradeoff between debuggability, effort, and verification reliability.
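The basename-plus-line:col alternative could look roughly like the sketch below. Both helpers are hypothetical, and lower-casing is just one assumed way to canonicalize case; nothing here reflects an existing Clang function.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: drop the directory part and lower-case the rest,
// so "./Dir/Foo.H" becomes "foo.h". Accepts '/' and '\\' as separators.
std::string canonicalBasename(const std::string &path) {
  const std::size_t slash = path.find_last_of("/\\");
  std::string base =
      (slash == std::string::npos) ? path : path.substr(slash + 1);
  for (char &c : base)
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  return base;
}

// Hypothetical spelling of the location part of an anonymous tag name.
std::string anonymousLocation(const std::string &path, unsigned line,
                              unsigned col) {
  return canonicalBasename(path) + ":" + std::to_string(line) + ":" +
         std::to_string(col);
}
```

With this scheme "./foo.h" and "foo.h" at the same line and column produce the same string, which addresses search-path mismatches. A one-line shift in the header still changes the result, which is the source-drift fragility mentioned above.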
I should add, I agree with the goal of removing the full filepaths from the type names. They are usually quite long, and in my experience, they usually clutter profiling and debugger tool output. There could be significant debug string size savings from this change as well.
Agreed with Shubham that I'll take this one forward. Do I understand correctly that we somewhat settled on the way forward being to print into If we do want to stop relying on source locations, how do we deal with unnamed structures? Those don't have an equivalent to lambda numbering (that I know of). So something like (godbolt): Clang produces: GCC produces: So do we just also align with GCC here and "give up" on a unique name for unnamed structures? If we could treat anonymous lambdas the same way we do unions/structures, that'd be nice from an implementation point of view. But do we lose anything important by doing so? Also, if we don't care about uniqueness of names that have unnamed structure template parameters, does this change anything for
Something along those lines, yes - the specific spelling, etc, I don't think we settled on. It could be identical to the mangling, or could be different - so long as it is exactly as unique as the mangling: no more, no less.
I don't know for sure, but I assume they rely on the uniqueness of DW_AT_names, with some canonicalization (eg: they do seem to handle slightly different whitespace in template argument lists, maybe different spellings for integer constants (
They do, actually, though GCC and Clang don't have to agree on the mangling, since these entities have internal linkage.
Changing to use a function template, rather than a class template - to force the mangling to show up in the output, we can see here that GCC's demangling of the types is
I don't think so - I think it's still valuable to have unique names - it's not /as/ important for internal linkage entities, because they don't need to be name-associated between CUs (ThinLTO notwithstanding... ).
So long as the names are as unique as the linkage, that should be fine.
We already classify them as nonreconstitutable because the names we use aren't currently exactly as unique as they need to be. (see this example where I don't remember my whole thinking entirely - but I think my thinking was that while we could reconstruct the name... oh, that's right, it was the path thing - because we put paths in the lambdas, and paths can be named differently, there can be differences in reconstructed names (with type units - the original name used in one place might not be the name used in another place) so that didn't meet my bar for "perfect roundtripping" so I punted on these cases. So even if we get exactly-unique-enough names, if those names are based on lambda numbering, we wouldn't be able to classify the names as reconstitutable until we add that lambda numbering to the DWARF structural description of the type in such a way that we can reproduce the name again (which might mean more than just the numbering - ie, we'd need to know how to differentiate
Thanks for the additional context. The reason I was asking is that
Right, pretty much "scope + unique-ish name (as far as the mangling allows)"
Ahh right! Yea that makes sense. Thankfully
Sounds good
Ah fair point.
@Michael137 Thanks for taking over!
In debug-info we soon have the need to print names using the full scope of the entity (see discussion in llvm#159592). Particularly, when a structure is scoped inside a function, we'd like to emit the name as `func()::foo`. `CGDebugInfo` uses the `TypePrinter` to print type names into debug-info. However, `TypePrinter` stops (and ignores) `DeclContext`s that are functions. I.e., it would just print `foo`. Ideally it would behave the same way `printNestedNameSpecifier` does. The FIXME in https://github.com/llvm/llvm-project/blob/47c1aa4cef638c97b74f3afb7bed60e92bba1f90/clang/lib/AST/TypePrinter.cpp#L1520-L1521 motivated this patch. See llvm#168533 for how this will be used by `CGDebugInfo`. The plan is to introduce a new `PrintingPolicy` that prints anonymous entities using their full scope (including function/anonymous scopes) and the mangling number.
…r (#168534) In debug-info we soon have the need to print names using the full scope of the entity (see discussion in llvm/llvm-project#159592). Particularly, when a structure is scoped inside a function, we'd like to emit the name as `func()::foo`. `CGDebugInfo` uses the `TypePrinter` to print type names into debug-info. However, `TypePrinter` stops (and ignores) `DeclContext`s that are functions. I.e., it would just print `foo`. Ideally it would behave the same way `printNestedNameSpecifier` does. The FIXME in https://github.com/llvm/llvm-project/blob/cd6349c3f592dc3260f12794a0aa2d71e3758fdd/clang/lib/AST/TypePrinter.cpp#L1520-L1521 motivated this patch. See llvm/llvm-project#168533 for how this will be used by `CGDebugInfo`. The plan is to introduce a new `PrintingPolicy` that prints anonymous entities using their full scope (including function/anonymous scopes) and the mangling number. Signed-off-by: Hafidz Muzakky <ais.muzakky@gmail.com>
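The difference between printing `foo` and `func()::foo` can be illustrated with a toy scope chain. `ToyScope` and `qualifiedName` are hypothetical and far simpler than Clang's `DeclContext` and `TypePrinter`; the flag only models whether enclosing function scopes are printed or cause the walk to stop, as the commit message describes.

```cpp
#include <cassert>
#include <string>

// Hypothetical, much simplified stand-in for a declaration context chain.
struct ToyScope {
  std::string name;       // e.g. "ns", "func", "foo"
  bool isFunction;        // function scopes are printed as "name()"
  const ToyScope *parent; // nullptr for a top-level entity
};

// Builds the qualified name by walking outwards through the parents.
// When includeFunctionScopes is false, the walk stops at the first
// enclosing function and ignores it and everything outside it.
std::string qualifiedName(const ToyScope &scope, bool includeFunctionScopes) {
  std::string prefix;
  for (const ToyScope *p = scope.parent; p; p = p->parent) {
    if (p->isFunction && !includeFunctionScopes)
      break;
    prefix = p->name + (p->isFunction ? "()" : "") + "::" + prefix;
  }
  return prefix + scope.name;
}
```

For a structure `foo` declared inside a function `func`, the sketch yields `foo` when function scopes are skipped and `func()::foo` when they are included.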
In our example: with `AnonymousTagLocations` set to true, the dwarfdump output will be DW_AT_name ("Foo<(lambda at a.cpp:12:5){}>"). But with `AnonymousTagLocations` set to false, the dwarfdump output will be DW_AT_name ("Foo<(lambda){}>").
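The effect of the flag on the two spellings in this example can be mimicked with a small stand-alone model. `ToyPrintingPolicy`, `printLambda` and `printFoo` are hypothetical; only the field name `AnonymousTagLocations` is taken from the patch, and the real Clang type printer is considerably more involved.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of a printing policy. Only the field name mirrors
// the AnonymousTagLocations flag that the patch sets to false.
struct ToyPrintingPolicy {
  bool AnonymousTagLocations = true;
};

// Spells an unnamed lambda with or without its source location.
std::string printLambda(const ToyPrintingPolicy &policy,
                        const std::string &file, unsigned line, unsigned col) {
  if (!policy.AnonymousTagLocations)
    return "(lambda)";
  return "(lambda at " + file + ":" + std::to_string(line) + ":" +
         std::to_string(col) + ")";
}

// Name of Foo instantiated with that lambda as its template argument.
std::string printFoo(const ToyPrintingPolicy &policy, const std::string &file,
                     unsigned line, unsigned col) {
  return "Foo<" + printLambda(policy, file, line, col) + "{}>";
}
```

With the flag left at true and the location a.cpp:12:5, the model returns `Foo<(lambda at a.cpp:12:5){}>`; with the flag set to false it returns `Foo<(lambda){}>` regardless of the location.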