Ensure deterministic resolution of toctree #12888

Open
wants to merge 4 commits into
base: master

Conversation

khanxmetu
Contributor

Subject: Ensure deterministic toctree generation

Feature or Bugfix

  • Bugfix

Purpose

  • Makes the path from a given document to its root ancestor deterministic in all cases (the lexicographically greatest parent is chosen). This is done by sorting Builder.env.toctree_includes before the write phase begins, so that the toctree is always generated the same way.
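
To illustrate why sorting helps, here is a minimal sketch. It assumes that toctree_includes maps each parent document to the list of documents it includes, and that the parent lookup lets later entries overwrite earlier ones. The helpers build_parent_map and path_to_root are hypothetical stand-ins written for this example, not the Sphinx functions themselves.

```python
def build_parent_map(toctree_includes):
    """Map each child to one parent; later dict entries overwrite earlier ones."""
    parent = {}
    for p, children in toctree_includes.items():
        for child in children:
            parent[child] = p
    return parent


def path_to_root(toctree_includes, docname):
    """Follow parent links from docname upwards, guarding against cycles."""
    parent = build_parent_map(toctree_includes)
    path = [docname]
    while path[-1] in parent and parent[path[-1]] not in path:
        path.append(parent[path[-1]])
    return path


# The same project, read in two different orders (hypothetical data).
order_a = {"index": ["guide", "api"], "guide": ["faq"], "api": ["faq"]}
order_b = {"index": ["guide", "api"], "api": ["faq"], "guide": ["faq"]}

# Without sorting, the parent chosen for "faq" depends on insertion order.
unsorted_a = path_to_root(order_a, "faq")
unsorted_b = path_to_root(order_b, "faq")

# Sorting the mapping by parent name removes that dependency: the last
# (lexicographically greatest) parent always wins.
sorted_a = path_to_root(dict(sorted(order_a.items())), "faq")
sorted_b = path_to_root(dict(sorted(order_b.items())), "faq")
```

In this sketch both sorted lookups select "guide" as the parent of "faq", because "guide" sorts after "api".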

Detail

Relates

@khanxmetu khanxmetu changed the title Ensure deterministic toctree generation Ensure deterministic toctree resolution Sep 15, 2024
@khanxmetu khanxmetu changed the title Ensure deterministic toctree resolution Ensure deterministic resolution of toctree Sep 15, 2024
Member

@AA-Turner AA-Turner left a comment

Ideally we'd have a test for this, but I'm not sure how easy it would be, so I don't mind as much not having one.

Please could you add an entry to CHANGES, though?

A

Member

@chrisjsewell chrisjsewell left a comment

great work guys!

@khanxmetu
Contributor Author

khanxmetu commented Sep 19, 2024

This PR only resolves the non-determinism. Users may still have their own expectations about which parent should be chosen, and that is not trivial to solve. I think we should warn the user when a document is included in multiple toctrees (in different files). What do you think?

Edit: I have added the warning in the recent commit but feel free to discard.

@@ -758,6 +758,7 @@ def check_consistency(self) -> None:
                     continue
                 logger.warning(__("document isn't included in any toctree"),
                                location=docname)
+        _check_toc_parents(self.toctree_includes)
Contributor Author

I added the consistency check here rather than during resolution in _get_toctree_ancestors(), because _get_toctree_ancestors() is called multiple times for the same document while writing.

Comment on lines +803 to +804
__("document is referenced in multiple toctrees: %s, "
"selecting: %s <- %s"),
Contributor Author

We probably need to add translations for this message (I'm not sure how).
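
For illustration, a check along these lines could look like the sketch below. It uses the standard logging module instead of Sphinx's logger and translation helper, and the function name check_toc_parents and its return value are assumptions made for this example; only the message text is taken from the diff above.

```python
import logging
from collections import defaultdict

logger = logging.getLogger(__name__)


def check_toc_parents(toctree_includes):
    """Warn about documents that are listed in more than one toctree.

    Returns a dict mapping each such document to its sorted parent list
    (the return value is an addition for this sketch, to make it testable).
    """
    parents = defaultdict(list)
    for parent, children in toctree_includes.items():
        for child in children:
            parents[child].append(parent)

    ambiguous = {}
    for doc, candidates in sorted(parents.items()):
        unique = sorted(set(candidates))
        if len(unique) > 1:
            ambiguous[doc] = unique
            # The lexicographically greatest parent is the one selected.
            logger.warning(
                "document is referenced in multiple toctrees: %s, "
                "selecting: %s <- %s",
                unique, unique[-1], doc,
            )
    return ambiguous


# Hypothetical project: "faq" is included from both "api" and "guide".
includes = {"index": ["guide", "api"], "guide": ["faq"], "api": ["faq"]}
ambiguous = check_toc_parents(includes)
```

Running the check once over the whole mapping, rather than inside the ancestor lookup, keeps the warning from being repeated for every page that is written.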

Contributor

@jayaddison jayaddison left a comment

Along the lines of "correctness first, performance / etc second", I like this and would suggest that we merge it and gather feedback 👍

There are two thoughts I had while reviewing this:

  • Whether tree depth might be better than lexicographic order as a sort key (@khanxmetu has also noted this as a possibility in an updated comment)
  • A question really: it puzzles me slightly that we have nondeterminism here and not in the breadcrumb navigation trail (e.g. System Emulation > QEMU System Emulator Targets > PowerPC System emulator > pSeries family boards (pseries) > ... in the header of the referenced qemu page). Does that component gather the navtree differently? (I should, and will eventually, check)

@khanxmetu
Contributor Author

khanxmetu commented Sep 28, 2024

This seems a better approach than relying solely on lexicographic order. Note, however, that sorting by lexicographic order is still required to break ties when path depths are equal. Besides that, are you proposing that the minimum or the maximum path depth be chosen?

  • A question really: it puzzles me slightly that we have nondeterminism here and not in the breadcrumb navigation trail (e.g. System Emulation > QEMU System Emulator Targets > PowerPC System emulator > pSeries family boards (pseries) > ... in the header of the referenced qemu page). Does that component gather the navtree differently? (I should, and will eventually, check)

Thank you for bringing this up. I found two important issues:

  1. It's true that the navtree header differs from what is shown in the sidebar, but the problem in this specific qemu case is not related to determinism. I've opened a new issue for that: TOC in navigation side bar is generated incorrectly for depth > 3 #12926

  2. You're right that the navtree header is gathered differently from the global toctree, and this needs to be considered here. The parents in the navtree header come from BuildEnvironment.collect_relations(), which uses the _traverse_toctree() generator. There, an in-order tree traversal takes place and the relations (parent, prev, next) are generated in one pass. The parent obtained this way is the one discovered first during traversal; it doesn't depend on lexicographic order or path depth. We also need to make sure this is consistent with our desired behavior for toctree generation. (In the case of qemu’s specs/ppc-spapr-numa, the navtree header happens to be generated consistently with the toctree ancestors by chance.)
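
The one-pass traversal described in point 2 can be sketched roughly as follows. This is a simplified model under assumptions, not the Sphinx implementation: traverse and collect_relations here are hypothetical helpers that walk the tree depth-first from the root and record the first parent under which each document is discovered.

```python
def traverse(includes, root):
    """Yield (parent, docname) pairs depth-first; each document is visited once."""
    seen = set()

    def walk(parent, docname):
        if docname in seen:
            return
        seen.add(docname)
        yield parent, docname
        for child in includes.get(docname, []):
            yield from walk(docname, child)

    yield from walk(None, root)


def collect_relations(includes, root):
    """Build docname -> (parent, prev, next) from a single traversal."""
    order = list(traverse(includes, root))
    relations = {}
    for i, (parent, doc) in enumerate(order):
        prev_doc = order[i - 1][1] if i > 0 else None
        next_doc = order[i + 1][1] if i + 1 < len(order) else None
        relations[doc] = (parent, prev_doc, next_doc)
    return relations


# Hypothetical project: "faq" is listed under both "api" and "guide".
includes = {"index": ["api", "guide"], "api": ["faq"], "guide": ["faq"]}
relations = collect_relations(includes, "index")
```

In this model "faq" gets "api" as its parent because "api" is reached first, whereas a lookup that prefers the lexicographically greatest parent would pick "guide". That is the kind of mismatch between header and sidebar that point 2 refers to.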

I propose that the shortest-depth path be chosen. We could add a BFS traversal function, similar to _traverse_toctree, that finds for every node the parent lying on a shortest path. I would also suggest that we precompute the desired parent mapping and use the same implementation for _get_toctree_ancestors.
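
A rough sketch of such a precomputed mapping is shown below. The function shortest_path_parents is hypothetical, and the tie-breaking rule (the lexicographically greatest parent wins at equal depth) is an assumption chosen to match the sorting already used in this PR.

```python
def shortest_path_parents(toctree_includes, root):
    """Breadth-first search from root; each document keeps the first parent
    that reaches it, which is a parent on a shortest path."""
    parent = {}
    seen = {root}
    level = [root]
    while level:
        next_level = []
        # Visit the greatest name first so that it wins ties at equal depth.
        for p in sorted(level, reverse=True):
            for child in toctree_includes.get(p, []):
                if child not in seen:
                    seen.add(child)
                    parent[child] = p
                    next_level.append(child)
        level = next_level
    return parent


# Hypothetical project: "faq" is reachable at depth 1 and depth 2,
# "install" is reachable at depth 2 via both "api" and "guide".
includes = {
    "index": ["guide", "api", "faq"],
    "guide": ["faq", "install"],
    "api": ["install"],
}
parents = shortest_path_parents(includes, "index")
```

Here "faq" is attached directly to "index" (the shorter path), and "install" is attached to "guide" (the tie-breaker). The mapping is computed once and could then be reused by every ancestor lookup.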

@jayaddison
Contributor

This seems a better approach than relying solely on lexicographic order. Note, however, that sorting by lexicographic order is still required to break ties when path depths are equal. Besides that, are you proposing that the minimum or the maximum path depth be chosen?

Thanks @khanxmetu, that makes sense that a tiebreaker will still be required if we use a path-depth approach.

To answer whether I'm suggesting to include path-depth in the sorting: initially I say no, let's continue to use solely lexicographic sorting -- because whatever method we choose, I think some projects/pages will still emit sidebar navigation menus that seem unexpected to some people, because the origin of the ambiguity is in the source files.

Even so, I think additional discussion of the navtree is worthwhile:

  1. You're right that the navtree header is gathered differently from the global toctree, and this needs to be considered here. The parents in the navtree header come from BuildEnvironment.collect_relations(), which uses the _traverse_toctree() generator. There, an in-order tree traversal takes place and the relations (parent, prev, next) are generated in one pass. The parent obtained this way is the one discovered first during traversal; it doesn't depend on lexicographic order or path depth. We also need to make sure this is consistent with our desired behavior for toctree generation. (In the case of qemu’s specs/ppc-spapr-numa, the navtree header happens to be generated consistently with the toctree ancestors by chance.)

I wouldn't worry too much about ensuring that the navtree is always consistent with the sidebar toctree; as you've noticed in #12926, table-of-contents displays can be customized by themes and layout; my sense is that it could be difficult to get them to correspond precisely, and also that additional tree traversals might introduce unpredictable build performance changes (especially for large projects).

To resolve the bug we can focus on making the build output stable (ensuring that it stays the same for two or more subsequent builds) - and your changeset here already does that.

If it seems easy to re-use logic between _traverse_toctree and _get_toctree_ancestors (or, alternatively, to rename them to make them more distinct), then that could be a nice subsequent cleanup as a separate pull request, but I don't think that's required initially.
