Doxygen tag file support #3

Open
xqms opened this issue Jan 28, 2025 · 6 comments

Owner

xqms commented Jan 28, 2025

Support for resolving links to doxygen documentation (tag file or XML output format)


mosra commented Jan 28, 2025

(Sorry for stalking this project since the moment you published it, heh. It's something I always wanted to do but never got the courage.)

Some tagfile parsing code to get you started: https://github.com/mosra/m.css/blob/6838b947a3cfae7bfe6040c934ad5bec62b6242d/plugins/m/dox.py#L63-L130

It doesn't support function overloads yet (due to laziness). OTOH, I'm using the same logic to resolve references to STL symbols using the tagfile exported from cppreference.
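As a rough illustration of what such tagfile parsing involves, here is a simplified sketch (not the linked m.css code). It assumes the usual tag file layout of `<compound>` elements containing `<member>` entries with `<name>`, `<anchorfile>`, `<anchor>` and `<arglist>`. The `demo` namespace, the anchors, the base URL and the `load_tagfile` helper are made up for the example. Unlike a plain name-to-URL map, it keeps every overload as a separate candidate so a later step can choose between them.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample; real tag files contain many more compounds and fields.
SAMPLE_TAGFILE = """<tagfile>
  <compound kind="namespace">
    <name>demo</name>
    <filename>namespacedemo.html</filename>
    <member kind="function">
      <type>void</type>
      <name>reserve</name>
      <anchorfile>namespacedemo.html</anchorfile>
      <anchor>a1111</anchor>
      <arglist>(Buffer &amp;buffer, std::size_t capacity)</arglist>
    </member>
    <member kind="function">
      <type>void</type>
      <name>reserve</name>
      <anchorfile>namespacedemo.html</anchorfile>
      <anchor>a2222</anchor>
      <arglist>(Buffer &amp;buffer)</arglist>
    </member>
  </compound>
</tagfile>
"""

# Only compounds that act as a C++ scope contribute a name prefix here;
# file, page and group compounds are skipped in this sketch.
SCOPED_KINDS = {"namespace", "class", "struct", "union"}


def load_tagfile(xml_text, base_url=""):
    """Map each qualified member name to a list of candidate entries."""
    index = defaultdict(list)
    root = ET.fromstring(xml_text)
    for compound in root.findall("compound"):
        if compound.get("kind") not in SCOPED_KINDS:
            continue
        scope = compound.findtext("name", default="")
        for member in compound.findall("member"):
            name = member.findtext("name")
            anchorfile = member.findtext("anchorfile")
            if not name or not anchorfile:
                continue
            anchor = member.findtext("anchor")
            url = base_url + anchorfile + (f"#{anchor}" if anchor else "")
            index[f"{scope}::{name}"].append({
                "kind": member.get("kind"),
                "arglist": member.findtext("arglist") or "",
                "url": url,
            })
    return index


index = load_tagfile(SAMPLE_TAGFILE, "https://example.invalid/")
for candidate in index["demo::reserve"]:
    print(candidate["arglist"], "->", candidate["url"])
```

Both `demo::reserve` overloads end up under the same key, which is exactly where the overload problem discussed in this thread starts.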

For m.css itself I'm also planning to add some URL mapping functionality to have nicer-looking links, which would then produce a custom tagfile instead of the one Doxygen creates. Which means the XML output wouldn't be useful with that anymore, in case you'd want to use that directly instead.

Owner Author

xqms commented Jan 29, 2025

Thanks :) I saw that you were thinking about something similar (mosra/m.css#109). I even investigated the LSP route myself, but that proved quite cumbersome. The protocol is not really designed for highlighting and even the "semantic tokens" request does not provide enough information. LSP basically assumes that you have a C++ lexer already and just need some additional info. And it doesn't do template instantiations, so linking in templated contexts would not work (which is relevant for my stuff).

So far I have resolved overloads etc a) by matching the location of the declaration (file / line number) or b) by matching the qualified name + parameter types. (a) is very robust, but only works with doxygen's XML output. The tag files don't contain locations... (b) may be a little bit more involved. Hopefully doxygen and clang's extracted parameter types will not differ too much in format and semantics 🙏

For the STL / cppreference.com I had to build something "smarter" since the supplied tag files don't contain parameter types at all (see https://github.com/xqms/clang-highlight/blob/1b8825eea38dcedb6ca805f5edb5089b5c158c92/src/clang_highlight/map_stl.py for the hacky details). But hopefully that won't be necessary for tag files generated by doxygen.

Anyway, my hope is to build something that can be easily integrated with m.css :)


mosra commented Jan 29, 2025

> The tag files don't contain locations...

Hmm, that might be worth adding, either by m.css itself in some sort of postprocessing step (along with that proposed URL mapping, for example), or from within Doxygen itself. All that's there for matching overloads right now is a hash that changes based on random spacing and doxygen version, so apart from doing a bug-compatible string compare there's no way to know which is which. And then Doxygen often thinks two methods with the same signature but different template arguments are the same... Having file/line info would at least allow tracking it back to the original file.

If clang tells you file/line of the overload, it's definitely easier if it's possible to search by that in the tag file. Everything else is messy, and there's enough messy stuff in the XML processing itself :D

> investigated the LSP route

I wanted to go this way because I wanted to have something for Python as well. And I figured if I go through the pain of the insane server/client architecture that LSP is, then I'd have a possibility to make it work with just about any language or dialect afterwards, instead of having to do it from scratch for each. Well, apparently not, so thanks for proving that it's a dead-end.

EDIT: Oh, and libstdc++ itself has Doxygen comments. But I don't think the output matches cppreference, nor is it linked/cross-referenced to it in any way, so I don't think it's useful to make a tagfile out of those. One would have to equip those headers with some custom mapping tags, and ensure they're maintained over time... and in that case I think it's just easier to not use STL at all 😅

Owner Author

xqms commented Jan 31, 2025

> If clang tells you file/line of the overload, it's definitely easier if it's possible to search by that in the tag file. Everything else is messy, and there's enough messy stuff in the XML processing itself :D

Yep, for each call I currently extract the following info from the clang side (here a call to Corrade::Containers::arrayReserve()):

{
  "file": "/home/max/projects/Corrade/Containers/GrowableArray.h",
  "line": 1568,
  "column": 48,
  "name": "arrayReserve",
  "qualified_name": "Corrade::Containers::arrayReserve",
  "parameter_types": [
    "Array<T> &",
    "const std::size_t"
  ]
}

Here's an entry from Corrade's tag file:

<member kind="function">
  <type>std::size_t</type>
  <name>arrayReserve</name>
  <anchorfile>namespaceCorrade_1_1Containers.html</anchorfile>
  <anchor>a365d83a4f4c08c44886e7429fe6afd08</anchor>
  <arglist>(Array&lt; T &gt; &amp;array, std::size_t capacity)</arglist>
</member>

It seems possible to match the arglist against what we get from clang (but note that doxygen is missing a const). So a resolver for doxygen tag files could first look for location information and use that, and then fall back to name + arglist matching, maybe with something "robust" like edit distance.
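A minimal sketch of that two-stage idea, under several assumptions: tag file candidates are plain dicts with `arglist` and `url` keys, and they may carry hypothetical `file` / `line` keys (tag files do not contain these today). `difflib.SequenceMatcher` stands in for a real edit distance; it computes a similarity ratio, not a Levenshtein distance. The helper names and the `min_score` threshold are invented for the example, and the second candidate is a made-up overload.

```python
import re
from difflib import SequenceMatcher


def tag_parameter_types(arglist):
    """Turn '(Array< T > &array, std::size_t capacity)' into
    ['Array<T>&', 'std::size_t']: whitespace removed, names dropped."""
    start, end = arglist.find("("), arglist.rfind(")")
    if start < 0 or end < start:
        return []
    params, depth, current = [], 0, ""
    for ch in arglist[start + 1:end]:
        if ch in "<([":
            depth += 1
        elif ch in ">)]":
            depth -= 1
        if ch == "," and depth == 0:  # only split at top-level commas
            params.append(current)
            current = ""
        else:
            current += ch
    if current.strip():
        params.append(current)
    types = []
    for param in params:
        param = param.split("=")[0].strip()  # drop default values
        # Heuristic: a trailing identifier after whitespace, & or * is taken
        # to be the parameter name. This misfires on unnamed multi-word
        # types such as 'unsigned int'.
        param = re.sub(r"(?<=[\s&*])[A-Za-z_]\w*$", "", param)
        types.append(re.sub(r"\s+", "", param))
    return types


def resolve(call, candidates, min_score=0.6):
    """Pick the candidate for a call: location first, then fuzzy arglist."""
    # Stage 1: exact location match, if the candidate has location data.
    # Real paths would need normalizing (absolute vs. relative) first.
    for cand in candidates:
        if cand.get("file") and (cand["file"], cand.get("line")) == (
                call["file"], call["line"]):
            return cand
    # Stage 2: compare whitespace-free parameter type strings.
    wanted = ",".join(re.sub(r"\s+", "", t) for t in call["parameter_types"])
    best, best_score = None, 0.0
    for cand in candidates:
        have = ",".join(tag_parameter_types(cand["arglist"]))
        score = SequenceMatcher(None, wanted, have).ratio()
        if score > best_score:
            best, best_score = cand, score
    return best if best_score >= min_score else None


call = {
    "file": "/home/max/projects/Corrade/Containers/GrowableArray.h",
    "line": 1568,
    "qualified_name": "Corrade::Containers::arrayReserve",
    "parameter_types": ["Array<T> &", "const std::size_t"],
}
candidates = [
    {"arglist": "(Array< T > &array, std::size_t capacity)",
     "url": "namespaceCorrade_1_1Containers.html"
            "#a365d83a4f4c08c44886e7429fe6afd08"},
    # Made-up second overload, only here to give the matcher a choice.
    {"arglist": "(Array< T > &array)",
     "url": "namespaceCorrade_1_1Containers.html#a0000"},
]
print(resolve(call, candidates)["url"])
```

With this input the extra `const` only lowers the score of the correct entry slightly, so it still wins over the shorter overload. Overloads that differ by a few characters (say `int` vs. `long`) could still be confused, so a fuzzy fallback like this is best treated as a last resort.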


mosra commented Feb 1, 2025

That const is unfortunate, heh. Is there no way to make Clang ignore those? Because it's used only in the implementation (not the declaration), with the same semantics as marking a local variable const. In other words, it has no effect on the interface, only on the implementation.

Doxygen has all the file/line info when generating the tagfile, and even has distinct info for the definition and the declaration; it just doesn't use it. I could look into making a Doxygen PR that adds it into tagfiles. That would circumvent the const issue for Magnum, as I don't think such use of const is very common elsewhere.

Owner Author

xqms commented Feb 1, 2025

I imagine I can get the locations of all declarations & definition on the clang side 🤔

I'll also look into "normalizing" the parameter types to get rid of the const. The logic has to be in the compiler already, otherwise it wouldn't match definition + declaration. Just a matter of finding it 😅
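Until the compiler-side logic is found, a string-level stand-in could look like the sketch below. It is a textual heuristic, not what clang does internally, and the function names are invented for the example. The rule it approximates: a `const` is top-level (and thus irrelevant to the function's interface) when it applies to the parameter object itself, so `const std::size_t` and `char *const` lose it, while `const char *` and `const T &` keep it because there it qualifies the pointee or referent. `volatile` is ignored here.

```python
import re


def _outside_template_args(text):
    """Return text with everything inside <...> removed."""
    out, depth = [], 0
    for ch in text:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)


def strip_top_level_const(type_str):
    """Drop a top-level const from a parameter type spelled as a string."""
    t = " ".join(type_str.split())  # collapse runs of whitespace
    # 'T *const': the const after the last * qualifies the pointer itself.
    if re.search(r"\*\s*const$", t):
        return re.sub(r"\s*const$", "", t)
    # Any other pointer or reference: a remaining const belongs to the
    # pointee/referent and is part of the interface, so keep it.
    if re.search(r"[*&]", _outside_template_args(t)):
        return t
    # Plain value type: a leading or trailing const is top-level.
    t = re.sub(r"^const\s+", "", t)
    t = re.sub(r"\s+const$", "", t)
    return t


print(strip_top_level_const("const std::size_t"))  # std::size_t
print(strip_top_level_const("const char *"))       # const char *
print(strip_top_level_const("char *const"))        # char *
```

Applied to the `parameter_types` from the clang side before comparing against the tag file's arglist, this would make the `arrayReserve` example match without the stray `const`.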
