Skip to content

fix: [#1910] Replace custom CSS selector parser with parsel-js#2010

Closed
TrevorBurnham wants to merge 1 commit into
capricorn86:masterfrom
TrevorBurnham:fix/1910-css-has-expression-parsing
Closed

fix: [#1910] Replace custom CSS selector parser with parsel-js#2010
TrevorBurnham wants to merge 1 commit into
capricorn86:masterfrom
TrevorBurnham:fix/1910-css-has-expression-parsing

Conversation

@TrevorBurnham
Copy link
Copy Markdown
Contributor

Fixes #1910

Problem

Complex CSS selectors with deeply nested pseudo-classes were being parsed incorrectly, causing wrong query results. For example:

.emXRrt:not(:has(> tbody > tr:not([data-ignore-row])))>thead>tr>th:nth-child(2)

The issue was that the custom regex-based parser only handled one level of nested parentheses.

Solution

Replaced the custom regex-based CSS selector parser with parsel-js, a battle-tested CSS selector parser by Lea Verou.

Benefits:

  • Handles arbitrarily nested parentheses correctly
  • More spec-compliant CSS selector parsing
  • Well-maintained external library (~5KB minified)
  • Reduces custom parsing code complexity
  • Handles edge cases we may not have considered

Implementation Details

The new parser:

  1. Pre-processes selectors to handle escaped characters that parsel-js can't handle (e.g., \:, \#, \&, \[, \])
  2. Uses parsel-js to parse the selector into an AST
  3. Converts the AST to the existing SelectorItem format
  4. Restores escaped characters in the final output

Edge cases handled:

  • Escaped characters in class names, IDs, and attribute names
  • Empty attribute values ([attr=""])
  • Brackets in attribute values ([attr="bracket[]bracket"])
  • Space before combinators in :has() (e.g., :has( +h2))

Testing

Added two test cases for issue #1910:

  1. The exact selector from the issue report with a table structure
  2. A simpler test case demonstrating :not(:has(> .child:not([attr]))) parsing

Updated tests for stricter CSS compliance:

  • Invalid selectors (.before:, .before&after, button:not([type]) now throw errors instead of returning empty results
  • Fixed test for escaped colon in attribute selector to use correct attribute name

Changes

  • packages/happy-dom/package.json: Added parsel-js dependency
  • packages/happy-dom/src/query-selector/SelectorParser.ts: Rewrote to use parsel-js for parsing
  • packages/happy-dom/test/query-selector/QuerySelector.test.ts: Added tests for issue Error when parsing complex CSS has expression #1910 and updated edge case tests

@TrevorBurnham
Copy link
Copy Markdown
Contributor Author

Looks like this also fixes #1912.

…l-js

Complex CSS selectors with deeply nested pseudo-classes like
:not(:has(> .child:not([attr]))) were being parsed incorrectly.

Replaced the custom regex-based parser with parsel-js, a battle-tested
CSS selector parser by Lea Verou (~5KB minified) that handles
arbitrarily nested parentheses correctly.

Added tests for the reported issue.
@TrevorBurnham
Copy link
Copy Markdown
Contributor Author

Also fixes #2034.

@capricorn86
Copy link
Copy Markdown
Owner

It seems like this has a noticeable performance impact. Also "parcel-js" has quite few downloads and hasn't been updated for some time.

I would prefer to fix the logic in Happy DOM if possible. Each library added needs to be maintained and adds some risk.

@TrevorBurnham
Copy link
Copy Markdown
Contributor Author

It seems like this has a noticeable performance impact…

I understand your concerns, but I suspect that any regex-based solution will always have edge-case bugs. I could imagine a hybrid solution where trivial selectors are parsed with regex and those with more complexity are handled by a more robust parser.

In the meantime, I've identified some low-hanging optimization fruit in parsel-js and sent over a PR: LeaVerou/parsel#85 Let's see where that goes.

@capricorn86
Copy link
Copy Markdown
Owner

@TrevorBurnham Thank you for your contribution! As mentioned I prefer to make a performant version in Happy DOM.

I created #2068 with a fix that uses 2-3 layers of RegExp to separate the groups, but still in an efficient way, which I released now. I've added unit tests for all issues and it also covers the tests in this PR.

@capricorn86
Copy link
Copy Markdown
Owner

I will close this PR now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Error when parsing complex CSS has expression

2 participants