Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser: Proposal to introduce invariant for order of active formatting elements that are also in stack of open elements #10870

Open
kspangsege opened this issue Dec 18, 2024 · 1 comment

Comments

@kspangsege
Copy link
Contributor

kspangsege commented Dec 18, 2024

What is the issue with the HTML Standard?

I propose to introduce the following non-normative invariant for the combined state of the stack of open elements and the list of active formatting elements:

Invariant: Let the common elements be those elements that occur in both the stack of open elements and the list of active formatting elements. The common elements then always occur in the stack of open elements in the same order as they occur in the list of active formatting elements.

See below for a proof of the claim that this is in fact an invariant.

I believe that this invariant will be generally useful to readers of the standard that wish to understand the parser mechanics in depth, and in particular, the possible behaviors of the "adoption agency" algorithm.

More specifically, this invariant is very useful for implementers, like myself, who seek to represent the list of active formatting elements as an array (rather than a doubly linked list), and represent the active formatting elements bookmark of the "adoption agency" algorithm as an index into this array (rather than a pointer to an entry in a doubly linked list).

If there is an interest, or at least not too much resistance, I intend to follow up with a pull request with specific text changes.

Proof of invariant claim:

First of all, removals from either place (stack or list) cannot by themselves cause a violation of the proposed invariant. Further more, since all insertions are of newly created elements (no reinsertions after removal), we only need to consider the cases where elements are inserted in both places at once (stack and list). There are only a few places where this happens:

  1. A number of places in the specified behavior for the "in body" insertion mode where the start tag of one of the formatting elements is processed.
  2. One of the steps (6) of the inner loop of the "adoption agency" algorithm where a recreated auxiliary formatting element (see definition below) replaces the original in both places (stack and list).
  3. The two last steps (18 and 19) of the outer loop of the "adoption agency" algorithm where the recreated primary formatting element (see definition below) replaces the original in both places (stack and list).

In the case of the processing of start tags of formatting elements (item 1 in the list above), the new element is pushed at the bottom of the stack and appended to the list, so this will uphold the invariant.

In the case of the recreation of auxiliary formatting elements (item 2 in the list above), the new element is inserted at the position of the original, so this will uphold the invariant.

In the case of the recreation of the primary formatting element (item 3 in the list above), the new element is inserted into the stack of open elements below the furthest block, so below all auxiliary formatting elements and above any trailing formatting elements (see definition below). From the point of view of the list of active formatting elements, the new element is inserted after any auxiliary formatting elements (or in place of itself if there are no auxiliary formatting elements), and before any trailing formatting elements. Since the change in the two places is the same from the point of the order of common elements, the invariant is upheld.

The primary formatting element is the formatting element that is passed as argument to the "adoption agency" algorithm.

The auxiliary formatting elements are the active formatting elements that occur between the common ancestor and the furthest block in the stack of open elements (there may be none).

The trailing formatting elements are the active formatting elements that occur below the furthest block in the stack of open elements prior to the invocation of the "adoption agency" algorithm (there may be none).

@annevk
Copy link
Member

annevk commented Dec 18, 2024

Thank you for the detailed feedback. I hope @hsivonen and @zcorpan can make time to evaluate this. Clarifying the HTML parser further seems like a good thing.

@kspangsege kspangsege changed the title Proposal to introduce invariant for order of active formatting elements that are also in stack of open elements Parser: Proposal to introduce invariant for order of active formatting elements that are also in stack of open elements Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants