
fix: [#1949] Implement implicit closing of <p> elements per HTML spec #2007

Merged
capricorn86 merged 1 commit into capricorn86:master from TrevorBurnham:1949-fix on Jan 20, 2026

@TrevorBurnham
Contributor

Fixes #1949

This PR fixes the HTML parser to correctly handle malformed HTML involving <p> elements. Per the HTML spec, an open <p> element is implicitly closed when the start tag of certain block-level elements is encountered.

Problem

When parsing malformed HTML like <p>testing with </div><p>new line</p>, happy-dom produced:

<p>testing with <p>new line</p></p>

But browsers produce:

<p>testing with </p><p>new line</p>
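To make the expected behavior concrete, here is a deliberately tiny tree builder. It is a hypothetical sketch, not happy-dom's parser, and the names `toyParse`, `serialize` and `toyRoundTrip` are made up for illustration. It applies two simplified tree-construction rules: a start tag from the WHATWG list quoted under Implementation Details closes a `<p>` that is the current node, and an end tag with no matching open element is ignored.

```typescript
// Toy tree builder for the input in this PR. Hypothetical sketch, not
// happy-dom's parser: it only understands start tags, end tags and text.
type ElementNode = { kind: 'element'; tag: string; children: TreeNode[] };
type TextNode = { kind: 'text'; text: string };
type TreeNode = ElementNode | TextNode;

// Start tags that close an open <p> (WHATWG list quoted in this PR).
const CLOSES_P = new Set([
  'address', 'article', 'aside', 'blockquote', 'details', 'dialog',
  'div', 'dl', 'fieldset', 'figcaption', 'figure', 'footer', 'form',
  'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'header', 'hgroup', 'hr',
  'main', 'menu', 'nav', 'ol', 'p', 'pre', 'search', 'section', 'table', 'ul',
]);

// Void elements have no end tag and are never pushed onto the stack.
const VOID = new Set(['br', 'hr', 'img', 'input']);

function toyParse(html: string): ElementNode {
  const root: ElementNode = { kind: 'element', tag: '#root', children: [] };
  const stack: ElementNode[] = [root]; // stack of open elements
  const tokenRe = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)/g;
  let m: RegExpExecArray | null;

  while ((m = tokenRe.exec(html)) !== null) {
    if (m[3] !== undefined) {
      // Text goes into the current node (top of the stack).
      stack[stack.length - 1].children.push({ kind: 'text', text: m[3] });
      continue;
    }
    const tag = m[2].toLowerCase();
    if (m[1] === '') {
      // Rule 1 (simplified): a listed start tag closes a <p> current node.
      if (CLOSES_P.has(tag) && stack[stack.length - 1].tag === 'p') {
        stack.pop();
      }
      const el: ElementNode = { kind: 'element', tag, children: [] };
      stack[stack.length - 1].children.push(el);
      if (!VOID.has(tag)) stack.push(el);
    } else {
      // Rule 2 (simplified): an end tag with no matching open element is
      // ignored; otherwise pop up to and including the matching element.
      const idx = stack.map((e) => e.tag).lastIndexOf(tag);
      if (idx > 0) stack.length = idx;
    }
  }
  return root;
}

function serialize(node: TreeNode): string {
  if (node.kind === 'text') return node.text;
  const inner = node.children.map(serialize).join('');
  if (node.tag === '#root') return inner;
  return VOID.has(node.tag) ? `<${node.tag}>` : `<${node.tag}>${inner}</${node.tag}>`;
}

const toyRoundTrip = (html: string): string => serialize(toyParse(html));

console.log(toyRoundTrip('<p>testing with </div><p>new line</p>'));
// prints "<p>testing with </p><p>new line</p>"
```

The spec's actual rules are broader: a listed start tag closes a `<p>` anywhere "in button scope", not only when it is the current node, and a stray `</p>` inserts an empty `<p>` rather than being ignored. The sketch models neither, which keeps it close to a first-level check.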

Changes

packages/happy-dom/src/config/HTMLElementConfig.ts

Changed the <p> element's content model from anyDescendants to noForbiddenFirstLevelDescendants and added a forbiddenDescendants list of the elements that implicitly close <p>:

```typescript
p: {
    className: 'HTMLParagraphElement',
    contentModel: HTMLElementConfigContentModelEnum.noForbiddenFirstLevelDescendants,
    forbiddenDescendants: [
        'address', 'article', 'aside', 'blockquote', 'details', 'dialog',
        'div', 'dl', 'fieldset', 'figcaption', 'figure', 'footer', 'form',
        'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'header', 'hgroup', 'hr',
        'main', 'menu', 'nav', 'ol', 'p', 'pre', 'section', 'table', 'ul'
    ]
}
```
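The config entry alone does not say how the parser uses it. As a minimal sketch, assuming the parser asks the current node's config whether an incoming start tag is one of its forbidden first-level descendants, the decision could look like this. `ContentModel`, `ElementConfig`, `closesParent`, `pBefore` and `pAfter` are hypothetical stand-ins, not happy-dom exports.

```typescript
// Hypothetical mirror of the config entry above. The real happy-dom enum has
// more members, and this is not how its parser actually consumes the config.
const ContentModel = {
  anyDescendants: 'anyDescendants',
  noForbiddenFirstLevelDescendants: 'noForbiddenFirstLevelDescendants',
} as const;
type ContentModel = (typeof ContentModel)[keyof typeof ContentModel];

interface ElementConfig {
  className: string;
  contentModel: ContentModel;
  forbiddenDescendants?: string[];
}

// Before this PR: every child nests inside <p>.
const pBefore: ElementConfig = {
  className: 'HTMLParagraphElement',
  contentModel: ContentModel.anyDescendants,
};

// After this PR: same list as the config entry above.
const pAfter: ElementConfig = {
  className: 'HTMLParagraphElement',
  contentModel: ContentModel.noForbiddenFirstLevelDescendants,
  forbiddenDescendants: [
    'address', 'article', 'aside', 'blockquote', 'details', 'dialog',
    'div', 'dl', 'fieldset', 'figcaption', 'figure', 'footer', 'form',
    'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'header', 'hgroup', 'hr',
    'main', 'menu', 'nav', 'ol', 'p', 'pre', 'section', 'table', 'ul',
  ],
};

// True when a start tag `childTag`, met while an element configured by
// `parent` is the current node, should close that element instead of nesting.
function closesParent(parent: ElementConfig, childTag: string): boolean {
  if (parent.contentModel !== ContentModel.noForbiddenFirstLevelDescendants) {
    return false; // anyDescendants: the child nests, as reported in #1949
  }
  return (parent.forbiddenDescendants ?? []).includes(childTag.toLowerCase());
}

console.log(closesParent(pBefore, 'div')); // false: <div> nests inside <p>
console.log(closesParent(pAfter, 'div')); // true: <p> is closed first
console.log(closesParent(pAfter, 'span')); // false: inline content stays inside
```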

Implementation Details

The fix follows the WHATWG HTML Standard, which specifies that a <p> element's end tag may be omitted (the element is implicitly closed) when it is immediately followed by one of these elements:

address, article, aside, blockquote, details, dialog, div, dl, fieldset, figcaption, figure, footer, form, h1, h2, h3, h4, h5, h6, header, hgroup, hr, main, menu, nav, ol, p, pre, search, section, table, ul

The implementation leverages the existing noForbiddenFirstLevelDescendants content model, which is already used by elements like <dd>, <dt>, and <option>.

Test Coverage

  1. Original issue: stray </div> with nested <p>
  2. Multiple <p> elements closing each other
  3. Block-level elements closing <p> (div, h1, ul, table, section, hr, blockquote)
  4. Inline elements NOT closing <p> (span, a, strong)
  5. Nested structures
  6. Stray end tags
  7. Fragment parsing

Owner

@capricorn86 left a comment

Thank you for your contribution @TrevorBurnham! ⭐

@capricorn86 merged commit 63b8f3d into capricorn86:master on Jan 20, 2026
3 checks passed
silverwind added a commit to go-gitea/gitea that referenced this pull request Jan 24, 2026
1. Upgrade to [jQuery
4.0](https://blog.jquery.com/2026/01/17/jquery-4-0-0/). Two of the
removed APIs are in use by fomantic, but there are [polyfills
present](https://github.com/go-gitea/gitea/blob/a3a3e581aa387969ce6410ab54c4775e9023ec40/web_src/fomantic/build/components/dropdown.js#L15-L17)
so it continues to work.
2. Remove manual naming of webpack chunks. I was running into the webpack
error below, and I see no reason for this manual chunk naming, which is
prone to naming collisions. Also, the webpack build now shows all output
assets. This change will result in longer asset filenames, but webpack
should now be able to guarantee that the names do not collide.
    ````
    ERROR in SplitChunksPlugin
    Cache group "defaultVendors" conflicts with existing chunk.
    Both have the same name "--------" and existing chunk is not a parent of
    the selected modules.
    Use a different name for the cache group or make sure that the existing
    chunk is a parent (e. g. via dependOn).
    HINT: You can omit "name" to automatically create a name.
    BREAKING CHANGE: webpack < 5 used to allow to use an entrypoint as
    splitChunk. This is no longer allowed when the entrypoint is not a
    parent of the selected modules.
    Remove this entrypoint and add modules to cache group's 'test' instead.
    If you need modules to be evaluated on startup, add them to the existing
    entrypoints (make them arrays). See migration guide of more info.
    ````
3. Fix test issue related to `p > div`, which is invalid per the HTML spec
because `div` is not [phrasing
content](https://html.spec.whatwg.org/multipage/dom.html#phrasing-content-2)
and therefore cannot be a descendant of `p`. This is related to
capricorn86/happy-dom#2007.
4. Add webpack globals
5. Remove obsolete docs glob
6. Fix security issue in the `seroval` package
7. Disable [vitest isolate](https://vitest.dev/config/isolate.html) for
30% faster JS tests, which are all pure.
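Item 3 rests on the content model of `p`, which is phrasing content, so no block-level element may appear anywhere inside it. As a hypothetical illustration (not Gitea's actual change), fixtures that rely on such nesting could be flagged with a small tree walk. The element list is a partial stand-in for "not phrasing content" borrowed from happy-dom#2007, and `el` and `invalidInParagraph` are made-up helpers.

```typescript
// Minimal element tree for this sketch (hypothetical, not a Gitea or happy-dom API).
interface El {
  tag: string;
  children: El[];
}
const el = (tag: string, ...children: El[]): El => ({ tag, children });

// Partial stand-in for "not phrasing content": the elements listed in
// happy-dom#2007. A complete check would use the spec's full
// phrasing-content definition instead.
const NOT_PHRASING = new Set([
  'address', 'article', 'aside', 'blockquote', 'details', 'dialog',
  'div', 'dl', 'fieldset', 'figcaption', 'figure', 'footer', 'form',
  'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'header', 'hgroup', 'hr',
  'main', 'menu', 'nav', 'ol', 'p', 'pre', 'search', 'section', 'table', 'ul',
]);

// Returns a selector-like path (e.g. "p > div") for every non-phrasing
// element found at any depth inside a <p>.
function invalidInParagraph(node: El, insideP = false, path: string[] = []): string[] {
  const here = [...path, node.tag];
  const found: string[] = [];
  if (insideP && NOT_PHRASING.has(node.tag)) found.push(here.join(' > '));
  for (const child of node.children) {
    found.push(...invalidInParagraph(child, insideP || node.tag === 'p', here));
  }
  return found;
}

console.log(invalidInParagraph(el('p', el('div')))); // ['p > div']
console.log(invalidInParagraph(el('p', el('span'), el('strong')))); // []
```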
atzzCokeK added a commit to atzzCokeK/happy-dom that referenced this pull request Feb 6, 2026
…w flow content

Fixes capricorn86#2052

## Problem

The caption element was incorrectly configured with contentModel: textOrComments,
which only allowed text nodes and comments. Per the HTML spec, caption elements
should contain flow content (inline and block elements), except table elements.

This caused elements like <b>, <em>, <span>, etc. to be incorrectly moved
outside the caption during parsing.

## Changes

### packages/happy-dom/src/config/HTMLElementConfig.ts

Updated caption element configuration to match the HTML spec and follow the
same pattern as td/th elements:

- Changed contentModel from textOrComments to noForbiddenFirstLevelDescendants
- Added forbiddenDescendants: table structure elements (table, tbody, thead,
  tfoot, tr, td, th, col, colgroup)
- Added permittedParents: ['table'] to ensure caption only appears in tables
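The three configuration changes can be illustrated with a hypothetical model of the caption rule. The field names follow this commit message, but `ElementRule`, `keepsChild`, `allowsParent`, and the assumption that a missing `permittedParents` list means "no restriction" are illustrative only, not happy-dom internals.

```typescript
// Hypothetical shape; field names follow the commit message above.
interface ElementRule {
  contentModel: 'textOrComments' | 'noForbiddenFirstLevelDescendants';
  forbiddenDescendants?: string[];
  permittedParents?: string[];
}

const captionBefore: ElementRule = { contentModel: 'textOrComments' };

const captionAfter: ElementRule = {
  contentModel: 'noForbiddenFirstLevelDescendants',
  forbiddenDescendants: ['table', 'tbody', 'thead', 'tfoot', 'tr', 'td', 'th', 'col', 'colgroup'],
  permittedParents: ['table'],
};

// May an element named `childTag` stay as a direct child of the caption?
function keepsChild(rule: ElementRule, childTag: string): boolean {
  if (rule.contentModel === 'textOrComments') return false; // only text and comments fit
  return !(rule.forbiddenDescendants ?? []).includes(childTag);
}

// May the caption itself be placed under `parentTag`?
// Assumption: no permittedParents list means no restriction.
function allowsParent(rule: ElementRule, parentTag: string): boolean {
  return rule.permittedParents === undefined || rule.permittedParents.includes(parentTag);
}

console.log(keepsChild(captionBefore, 'b')); // false: <b> was moved out
console.log(keepsChild(captionAfter, 'b')); // true
console.log(keepsChild(captionAfter, 'tr')); // false
console.log(allowsParent(captionAfter, 'div')); // false
```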

### packages/happy-dom/test/html-parser/HTMLParser.malformedHTML.test.ts

Added comprehensive test coverage for caption element content model:
- Inline elements preservation (b, strong, em, span, a)
- Nested inline elements
- Block-level elements (p, div)
- Table element prohibition
- Content serialization
- permittedParents validation (wrong parent, standalone, correct parent)

## Implementation Details

This fix follows the same pattern as PR capricorn86#2007 for paragraph elements, using
the existing noForbiddenFirstLevelDescendants content model. While this
doesn't recursively check deeply nested table elements, it matches the
existing codebase patterns and handles the vast majority of real-world cases.
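The limitation described above can be seen by comparing two hypothetical checks over a toy tree: a first-level check inspects only the caption's direct children, so a `<table>` wrapped in a `<div>` goes unnoticed, while a recursive walk reports it. Neither function is happy-dom code; the names and tree type are assumptions for illustration.

```typescript
// Minimal tree for this sketch (hypothetical, not happy-dom's node types).
interface El {
  tag: string;
  children: El[];
}
const el = (tag: string, ...children: El[]): El => ({ tag, children });

// Table-structure elements from the caption's forbiddenDescendants list.
const FORBIDDEN_IN_CAPTION = new Set([
  'table', 'tbody', 'thead', 'tfoot', 'tr', 'td', 'th', 'col', 'colgroup',
]);

// What a first-level check inspects: only the caption's direct children.
function firstLevelViolations(caption: El): string[] {
  return caption.children.filter((c) => FORBIDDEN_IN_CAPTION.has(c.tag)).map((c) => c.tag);
}

// A recursive alternative that also inspects every deeper descendant.
function deepViolations(node: El): string[] {
  const found: string[] = [];
  for (const child of node.children) {
    if (FORBIDDEN_IN_CAPTION.has(child.tag)) found.push(child.tag);
    found.push(...deepViolations(child));
  }
  return found;
}

const nested = el('caption', el('div', el('table')));
console.log(firstLevelViolations(nested)); // []
console.log(deepViolations(nested)); // ['table']
```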

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
atzzCokeK added a commit to atzzCokeK/happy-dom that referenced this pull request Feb 6, 2026
atzzCokeK added a commit to atzzCokeK/happy-dom that referenced this pull request Feb 6, 2026
capricorn86 pushed a commit that referenced this pull request Feb 9, 2026
…ent (#2058)

RAprogramm pushed a commit to RAprogramm/fork-happy-dom that referenced this pull request Feb 20, 2026
RAprogramm pushed a commit to RAprogramm/fork-happy-dom that referenced this pull request Feb 20, 2026


Development

Successfully merging this pull request may close these issues.

DOMParser does not correctly handle malformed HTML

2 participants