Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

include-symbols filter shouldn't discard members #797

Closed
anarthal opened this issue Jan 15, 2025 · 3 comments · Fixed by #812
Closed

include-symbols filter shouldn't discard members #797

anarthal opened this issue Jan 15, 2025 · 3 comments · Fixed by #812

Comments

@anarthal
Copy link

When using an input filter like this:

include-symbols:
  - 'boost::mysql::*'

Where boost::mysql is a namespace, I'd expect mrdocs to render all members for any matched classes. Currently, it looks like the filter is applied to members, too, making boost::mysql::* not very practical.

@alandefreitas
Copy link
Collaborator

alandefreitas commented Jan 15, 2025

Yes. Scopes (namespaces and classes) always have to be handled differently. The filter usually checks if the scope matches the prefix of the glob expressions instead of the whole glob expression.

That makes more sense for namespaces but maybe not so much for classes. For instance,

  • boost::mysql doesn't match boost::mysql::*, but we extract boost::mysql because it matches the prefix of boost::mysql::* and then we evaluate the symbols in boost::mysql individually to check if they match boost::mysql::*. This will include everything in boost::mysql but not symbols in, say, boost::mysql::mysql_sublibrary
  • for classes, something like boost::mysql::any_address matches boost::mysql::*, but the members of boost::mysql::any_address don't.

Actually, the distinction is not only between namespaces and classes. The distinction is about what to do when a scope matches the pattern explicitly (not just the prefix), but the members don't.

Regardless of how we change this, we should also consider the distinction between * and ** remains useful. For instance, boost::mysql::*::detail::** is very different from boost::mysql::**::detail::**, especially when used as a pattern to exclude symbols.

As usual, we can just check what doxygen does in this case. Although I don't think doxygen supports the distinction between * and ** for symbols.

@alandefreitas
Copy link
Collaborator

alandefreitas commented Jan 15, 2025

Maybe we should have different rules depending on the symbol type:

Symbol Type Description Rule Matches Excludes
Namespaces Only need to match the glob prefix std::filesystem::* or std::vector std (because of the prefix) and keeps traversing std::ranges
Records and enums Need to match the Rule exactly std::vector std::vector std::list
Record and enum members Parent needs to match the rules exactly std::vector or std::* std::vector::iterator std::ranges::find
Other symbols Need to match the Rule exactly std::* std::memory_order std::ranges::find

We need to formalize that.

I still have to compare it with Doxygen. That's not so easy because the documentation doesn't say much: https://www.doxygen.nl/manual/config.html#cfg_exclude_symbols

But one thing I noticed in this comparison is the way this is applied to include-symbols when whitelisting symbols is what makes the problem complex, but it creates more surprising behavior. For instance, it's surprising when you include the class but not its members. There's always the expectation that other things will come with a symbol that's whitelisted. And Doxygen only has the exclude-symbols option. For exclude-symbols, requiring the rules to match exactly leads to less surprises. The rules can be simpler because once a symbol is excluded, you just have to ignore all of its members. So you can just extract everything and stop traversing when some symbol matches exactly.

Maybe that's a precedent to actually make the include-symbols rules more strict instead of loosening them. We would document that if something doesn't match exactly, it won't be extracted, even if it's scope. And the we would stimulate the use of ** in include-symbols. Maintaining the include-symbols options is still a good idea though, because we should always prefer whitelisting.

@alandefreitas
Copy link
Collaborator

Just thought of a simpler rule:

  • For input rules:
    • If a namespace: the namespace needs to match one of the patterns, or {name}:: must match one of the pattern prefixes
    • If a record member: the member must match one of the patterns, or a parent must match one of the patterns. (This is defined recursively for members of members).
    • For anything else, we need to match one of the patterns exactly
  • For exclude rules:
    • Only symbols that strictly match one of the exclusion patterns should be excluded.

That's simple to implement and minimizes surprises.

The distinction between * and ** suffixes is maintained as an extra feature doxygen doesn't have. This distinction is too convenient because allowing my_lib::* is much easier and cheaper than allowing my_lib::* then having to exclude my_lib::details::*, my_lib::details1::*, my_lib::details2::*, etc. While still giving the user the option to allow my_lib::**.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants