-
Couldn't load subscription status.
- Fork 1.1k
fix: highlighting gets stuck on heavy sites #751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds caches, dialog-aware content analysis, bounded shadow-DOM descendant traversal, batched/cached child-XPath generation, improved structural/class logic, containment/XPath utilities, cache-aware helpers, cleanup of new caches, and a new public method Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant C as ClientSelectorGenerator
participant D as DOM
participant Cache as Caches (path/desc/meaningful/selector)
Note over C: generateSelector(entry, options)
C->>Cache: check selectorCache
alt cache hit
Cache-->>C: return selectors
else
alt listMode && empty listSelector
C->>D: findAllDialogElements()
alt dialogs with content
C->>D: getElementsFromDialogs() / traverse shadow
else
C->>D: use page root
end
end
C->>D: getAllDescendantsIncludingShadow()
C->>Cache: consult meaningfulCache during traversal
D-->>C: bounded meaningful descendants
C->>C: generateOptimizedChildXPaths (batched, deduped)
C->>Cache: update selectorCache
alt paginationMode
C->>C: chain multiple candidate selectors
end
C-->>C: return selector(s)
end
sequenceDiagram
autonumber
participant C as ClientSelectorGenerator
participant Cache as path/desc/meaningful/selector
participant Util as Utilities
Note over C: Cached helper lookups
C->>Cache: getOptimizedStructuralPath(node)
alt miss
C->>Util: compute structural XPath
Util-->>C: path or null
C->>Cache: pathCache.set(node, path|null)
end
C->>Cache: descendantsCache.get(root)
alt miss
C->>Util: BFS collect descendants (depth/limit, shadow-aware)
C->>Cache: descendantsCache.set(root, results)
end
C->>Cache: meaningfulCache.get(node)
alt miss
C->>Util: isMeaningfulElement (fast-paths & cached)
C->>Cache: meaningfulCache.set(node, bool)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/helpers/clientSelectorGenerator.ts (3)
2721-2727: XPath attribute escaping is incorrect; backslash does not escape quotes in XPath
replace(/'/g, "\\'")produces invalid XPath when attribute values contain single quotes. Use proper literal escaping (single, double, or concat). Apply:- const meaningfulAttrs = ["role", "type", "name", "src", "aria-label"]; + const meaningfulAttrs = ["role", "type", "name", "src", "aria-label"]; for (const attrName of meaningfulAttrs) { if (element.hasAttribute(attrName)) { - const value = element.getAttribute(attrName)!.replace(/'/g, "\\'"); - return `${tagName}[@${attrName}='${value}']`; + const value = this.escapeXPathLiteral(element.getAttribute(attrName)!); + return `${tagName}[@${attrName}=${value}]`; } }Add this helper once to the class:
// Safe XPath string literal builder private escapeXPathLiteral(value: string): string { if (!value.includes("'")) return `'${value}'`; if (!value.includes('"')) return `"${value}"`; // contains both ' and " → concat('foo',"'",'bar',"'",'baz') return "concat('" + value.split("'").join(`',"'",'`) + "')"; }This fix is necessary to avoid broken selectors on attributes like names with apostrophes.
3827-3831: Same XPath quoting issue here in group selectorWhen pushing common attribute predicates, quote using the same
escapeXPathLiteralto avoid malformed XPath:- for (const [attr, value] of Object.entries(commonAttributes)) { - predicates.push(`@${attr}='${value}'`); - } + for (const [attr, value] of Object.entries(commonAttributes)) { + predicates.push(`@${attr}=${this.escapeXPathLiteral(value)}`); + }Without this, modal or list containers with quotes in attribute values will break matching.
2472-2476: Logical bug: overlay-controls exclusion check always evaluates true
!hoveredElement.closest("#overlay-controls") != nullfirst negates the element, producing a boolean, then compares that boolean tonull, which is always true. This causes the overlay guard to be bypassed.Fix:
- if ( - hoveredElement != null && - !hoveredElement.closest("#overlay-controls") != null - ) { + if ( + hoveredElement != null && + hoveredElement.closest("#overlay-controls") == null + ) {This prevents selecting the highlighter’s own controls, a common cause of “stuck” or incorrect highlighting.
🧹 Nitpick comments (9)
src/helpers/clientSelectorGenerator.ts (9)
112-116: Solid caching additions; consider document-scoped invalidation to avoid stale entriesThe new WeakMap/Map caches are appropriate for element- and string-keyed memoization. One caveat:
meaningfulCacheanddescendantsCachepersist across DOM mutations within the same document, which can drift when pages dynamically change (SPA updates). Consider clearing these when:
analyzeElementGroupsdetectsiframeDoc !== lastAnalyzedDocument, and- before heavy list operations (e.g., in
getChildSelectors) if the DOM likely changed.This reduces the risk of stale paths/descendants causing incorrect highlights on long-lived pages.
405-419: Dialog-aware list analysis: good direction; tighten “dialog” detection and cost boundsSwitching analysis to dialog content prevents background-page noise. Improvements to reduce false positives and costs on heavy pages:
- Filter
dialogElementsby visibility and prominence: requireposition ∈ {fixed, absolute}and a minimum area or z-index threshold before scanning children.- Prefer native affordances first:
[role="dialog"],<dialog open>,[aria-modal="true"].- Early-cap
dialogContentElementstraversal (e.g., limit to top 2-3 dialogs by area).This keeps the scan bounded on modal-heavy apps.
540-552: Meaningfulness cache is fine; add fast visibility/hidden checks to cut noise and work
isMeaningfulElementreturns true for many nodes (text/href/src/custom elements) without checking visibility/hidden state. BecausegetAllDescendantsIncludingShadowdoesn’t filter for visibility, this inflates candidate sets on heavy pages and can slow highlighting.Add inexpensive guards:
- Skip if
element.hidden,aria-hidden="true".- Skip if
display:noneorvisibility:hidden.- Skip if
getBoundingClientRect()has zero area.Apply before the current fast-paths. Example change:
private isMeaningfulElement(element: HTMLElement): boolean { const tagName = element.tagName.toLowerCase(); - + // Cheap visibility checks to avoid hidden/offscreen noise + if (element.hasAttribute('hidden') || element.getAttribute('aria-hidden') === 'true') return false; + const style = element.ownerDocument?.defaultView?.getComputedStyle(element); + if (style && (style.display === 'none' || style.visibility === 'hidden')) return false; + const rect = element.getBoundingClientRect?.(); + if (rect && (rect.width === 0 || rect.height === 0)) return false; + // Fast path for common meaningful elements if (["a", "img", "input", "button", "select"].includes(tagName)) { return true; }This reduces traversal sizes and helps prevent the highlighter from “getting stuck.”
Also applies to: 556-591
2498-2546: Selector caching per (parentSelector, URL): good; guard for SPA mutationsThe
selectorCachekeyed byparentSelector+document.location.hrefis a sensible win. On SPAs where URL doesn’t change, consider:
- Storing a content version (e.g.,
document.readyState+ a monotonic counter you bump onmutationbatches) in the key, or- Invalidating this entry on a lightweight MutationObserver tick.
Not mandatory for this PR but will prevent stale child selectors after client-side re-renders.
2553-2628: Bounded BFS is the right call; ignore non-content tags and respect ARIA hiddenTo further reduce queue churn on heavy DOMs, skip obvious non-content nodes during expansion:
- const children = element.children; + const children = element.children; const childLimit = Math.min(children.length, 30); for (let i = 0; i < childLimit; i++) { const child = children[i] as HTMLElement; + const tn = child.tagName; + if (tn === 'SCRIPT' || tn === 'STYLE' || tn === 'TEMPLATE' || child.getAttribute('aria-hidden') === 'true') { + continue; + } if (!visited.has(child)) {This small filter removes a lot of dead ends cheaply.
2930-2933: Class cache key may collide across documents
cacheKeyonly includes tag + classes + “other list count”. On multi-frame/multi-doc contexts this can cross-contaminate. Includingthis.lastAnalyzedDocument?.URL(or a short doc hash) in the key avoids edge collisions.- const cacheKey = `${targetElement.tagName}_${targetClasses.join(',')}_${otherListElements.length}`; + const docId = targetElement.ownerDocument?.URL || 'doc'; + const cacheKey = `${docId}__${targetElement.tagName}_${targetClasses.join(',')}_${otherListElements.length}`;
4048-4150: Dialog helpers: useful, but scan cost and false positives need guardrails
findAllDialogElementsqueries all elements and then matches by substrings like “overlay”, which can hit layout wrappers. Add:
- A visibility check (area > 0).
- A positioning/z-index heuristic, similar to your overlay handling elsewhere.
- Prefer
[aria-modal="true"]in the first pass.
findDialogElement/findDeepestInDialogdefined here duplicate similar logic insidegetSelectors(Lines 1956-1986). Consolidate to one implementation to avoid divergence.
4152-4224: Shadow content extraction in dialogs: nice; bound recursion
getElementsFromShadowRootrecurses without an explicit depth cap. Given you already cap elsewhere, consider a shallow cap (e.g., 4) to avoid pathological trees.- private getElementsFromShadowRoot(shadowRoot: ShadowRoot): HTMLElement[] { + private getElementsFromShadowRoot(shadowRoot: ShadowRoot, depth: number = 0): HTMLElement[] { + if (depth > 4) return []; const elements: HTMLElement[] = []; ... - shadowChildren.forEach((element) => { + shadowChildren.forEach((element) => { elements.push(element); ... - if (element.shadowRoot) { - const nestedShadowElements = this.getElementsFromShadowRoot(element.shadowRoot); + if (element.shadowRoot) { + const nestedShadowElements = this.getElementsFromShadowRoot(element.shadowRoot, depth + 1); elements.push(...nestedShadowElements); }This avoids deep recursion on component libraries with layered shadows.
3369-3386: Minor: prefer unique parent forlistSelector; otherwise bail earlyYou already return
nullwhencontainingParentis falsy. IfevaluateXPath(listSelector)yields multiple parents, consider warning once (or preferring the one that actually containstargetElement) to avoid subtle mismatches:- const containingParent = parentElements[0]; + const containingParent = parentElements.find(p => this.elementContains(p, targetElement)) ?? parentElements[0];This improves robustness when a page has repeated list containers.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/helpers/clientSelectorGenerator.ts(11 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/helpers/clientSelectorGenerator.ts (1)
maxun-core/src/browserSide/scraper.js (10)
allElements(1064-1064)tagName(450-450)parentElement(914-914)selector(28-28)MAX_DEPTH(955-955)pathParts(569-573)depth(956-956)current(1010-1010)doc(225-225)shadowRoot(992-992)
🔇 Additional comments (5)
src/helpers/clientSelectorGenerator.ts (5)
2642-2670: Batching and max-selector cutoff are sensibleProcessing descendants in batches of 25 and stopping at 250 selectors is a good defensive bound for responsiveness.
2848-2913: Path caching with null sentinel: good; add small guard for root containment failuresCaching null for non-contained targets is great to avoid rework. Consider also clearing
pathCachewhenlistSelectorchanges or whengetChildSelectorsrebuilds (you already resetpathCachethere—nice). LGTM overall.
4239-4243: Comprehensive cleanup is appreciatedClearing all newly added caches in
cleanuplooks complete and correct.
3586-3598: Shadow DOM detection heuristic is robust across realmsUsing
constructor.nameor presence ofhost/modehandles cross-realm ShadowRoots. LGTM.
3701-3755: Matching-mode flow is reasonable; pagination chaining: verify consumer supportThe deterministic chaining under pagination mode returns a comma-separated list of CSS-like selectors. Confirm downstream consumers accept comma-chained selectors (and the mixed
:>>/>>separators embedded from frame/shadow selectors) without splitting issues.If needed, I can scan usages of
generateSelector(...)to confirm the expected format and update tests accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/helpers/clientSelectorGenerator.ts (3)
2721-2726: XPath attribute quoting is incorrect (XPath doesn’t use backslash escapes)Replacing ' with ' inside an XPath string doesn’t work. Use proper quoting or concat(). This will otherwise break when attribute values contain single quotes.
Apply this diff to use a safe literal builder:
@@ - const meaningfulAttrs = ["role", "type", "name", "src", "aria-label"]; + const meaningfulAttrs = ["role", "type", "name", "src", "aria-label"]; for (const attrName of meaningfulAttrs) { if (element.hasAttribute(attrName)) { - const value = element.getAttribute(attrName)!.replace(/'/g, "\\'"); - return `${tagName}[@${attrName}='${value}']`; + const value = element.getAttribute(attrName)!; + return `${tagName}[@${attrName}=${this.toXPathLiteral(value)}]`; } } @@ - if ( + if ( attr.name.startsWith("data-") && attr.name !== "data-testid" && attr.name !== "data-mx-id" && attr.value ) { - return `${tagName}[@${attr.name}='${attr.value}']`; + return `${tagName}[@${attr.name}=${this.toXPathLiteral(attr.value)}]`; } }Add this helper (outside the selected range) to the class:
// Safe XPath string literal builder (handles both quote types) private toXPathLiteral(value: string): string { if (!value.includes("'")) return `'${value}'`; if (!value.includes('"')) return `"${value}"`; const parts = value.split("'"); const args: string[] = []; for (let i = 0; i < parts.length; i++) { if (parts[i]) args.push(`'${parts[i]}'`); if (i < parts.length - 1) args.push(`"'"`); } return `concat(${args.join(",")})`; }Also applies to: 2740-2749
2473-2476: Bug: overlay-controls guard is always true due to negation precedenceThe condition
!hoveredElement.closest("#overlay-controls") != nullalways evaluates to true when hoveredElement exists. You likely intended to skip elements within the overlay.Apply this diff:
@@ - if ( - hoveredElement != null && - !hoveredElement.closest("#overlay-controls") != null - ) { + if ( + hoveredElement != null && + hoveredElement.closest("#overlay-controls") == null + ) {
1125-1130: Operator precedence bug in INPUT date/time checkThe current condition evaluates
(type === "date")even when tagName !== "INPUT". Group both type checks under the INPUT guard.Apply this diff:
@@ - } else if ( - (targetElement?.tagName === "INPUT" && - (targetElement as HTMLInputElement).type === "time") || - (targetElement as HTMLInputElement).type === "date" - ) { + } else if ( + targetElement?.tagName === "INPUT" && + ( + (targetElement as HTMLInputElement).type === "time" || + (targetElement as HTMLInputElement).type === "date" + ) + ) {
🧹 Nitpick comments (8)
src/helpers/clientSelectorGenerator.ts (8)
540-552: Use the cached meaningfulness in child traversalgetMeaningfulChildren still calls isMeaningfulElement; switch to the cached variant to avoid repeated expensive checks.
Apply this diff:
@@ - if (this.isMeaningfulElement(htmlChild)) { + if (this.isMeaningfulElementCached(htmlChild)) { meaningfulChildren.push(htmlChild); } else { // If not meaningful itself, check its children traverse(htmlChild, depth + 1); } @@ - if (this.isMeaningfulElement(htmlShadowChild)) { + if (this.isMeaningfulElementCached(htmlShadowChild)) { meaningfulChildren.push(htmlShadowChild); } else { traverse(htmlShadowChild, depth + 1); }Also applies to: 506-538
2498-2505: Selector caching: add a mutation-aware version to avoid stale resultsKeying selectorCache by URL + parentSelector helps, but long-lived pages mutate frequently (SPA). Without invalidation, we can return stale selectors and “stuck” highlights. Consider a lightweight MutationObserver to bump a document-version counter included in the cache key.
If helpful, I can provide a minimal mutation tracker you can toggle on analyzeElementGroups() and clear in cleanup().
Also applies to: 2543-2546
2553-2628: Bounded BFS is good; minor guardrails for hot pathsThe traversal is careful with depth and size caps. Two small improvements:
- Early-return if currentDepth is small and element is not displayable (e.g., visibility: hidden) to cut noise on heavy sites.
- Prefer iterating via static arrays (Array.from) once per node to avoid live HTMLCollection surprises.
I can draft a micro-refactor if you want to test it behind a feature flag.
2919-2990: Class similarity scan can still be heavy across many listsThis function repeatedly calls getAllDescendantsIncludingShadow(listEl) in a loop (again inside every targetClass check later). It benefits from the descendants cache, but you can still avoid repeated scans by precomputing, per list element and tag, the normalized class sets once.
I can sketch a small precomputation map {listEl-> {TAG -> [normalizedClassSets]}} to de-duplicate work.
3098-3115: Nit: unused local variablegridKey is computed but not used in getNearbySelectorCandidates().
Apply this diff:
@@ - const gridKey = this.getElementGridKey(element); const rect = element.getBoundingClientRect();
3468-3474: Fallback XPath → CSS translation has semantic mismatch for positionsTranslating
//tag[n]totag:nth-child(n)isn’t equivalent;nth-childindexes all children, not only siblings with the same tag. If this path is hit, results can be wrong. Consider skipping this translation or limiting it to cases where siblings are homogeneous.I can help guard this with a quick DOM check before returning the CSS path.
3042-3081: Minor: console.time/timeEnd in productionThe precomputation timer is useful during development but can be noisy in production consoles. Consider gating with a debug flag.
I can wire a simple
this.performanceConfig.debugLoggingflag if desired.
4048-4076: Unused class-level dialog helpers – consider consolidating or removingIt looks like you’ve got two sets of dialog-related helpers in
src/helpers/clientSelectorGenerator.ts:
An inline arrow-function helper at lines 1956–1986:
const findDeepestInDialog = (elements: HTMLElement[]): HTMLElement | null => { … } // … const deepestInDialog = findDeepestInDialog(dialogElements);which is actually invoked at line 1987.
Two private class methods that are never referenced anywhere else:
private findDialogElement(elements: HTMLElement[]): HTMLElement | null { … }(lines 4051–4076)private findDeepestInDialog(dialogElements: HTMLElement[], dialogElement: HTMLElement): HTMLElement | null { … }(lines 4081–4113)This duplication risks them drifting out of sync. You have two options:
- Reuse the class methods throughout (replace the inline arrow function and its call site with calls to
this.findDialogElement/this.findDeepestInDialog).- Remove the unused class methods entirely and keep the single inline implementation.
Either approach will eliminate dead code and ensure there’s a single source of truth for dialog-element lookup.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/helpers/clientSelectorGenerator.ts(11 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/helpers/clientSelectorGenerator.ts (1)
maxun-core/src/browserSide/scraper.js (7)
tagName(450-450)parentElement(914-914)selector(28-28)MAX_DEPTH(955-955)pathParts(569-573)depth(956-956)shadowRoot(992-992)
🔇 Additional comments (5)
src/helpers/clientSelectorGenerator.ts (5)
112-116: Good addition: targeted caches to cut recomputation on heavy pagesThe new WeakMaps/Map look appropriate for DOM-bound caching, and you’ve remembered to clear them in cleanup(). Nice.
2642-2670: Batching and cap at 250 selectors look reasonableThe batched loop and early breaks prevent runaway work. Good call.
2848-2851: Path caching with negative entries is smartCaching null paths to avoid recomputation is a good optimization. You already reset pathCache on new getChildSelectors() runs—nice touch.
Also applies to: 2904-2913
3669-3680: Nice: precompute mappings when cachedChildSelectors are providedPrecomputing here lines up with your cache-aware matching path and reduces jank during hover.
4223-4227: Good housekeeping: clearing new caches in cleanup()This avoids memory leaks and stale reads across sessions.
| let allElements = this.getAllVisibleElementsWithShadow(iframeDoc); | ||
|
|
||
| if (this.getList === true && this.listSelector === "") { | ||
| const dialogElements = this.findAllDialogElements(iframeDoc); | ||
|
|
||
| if (dialogElements.length > 0) { | ||
| // Check if dialogs contain significant content worth analyzing | ||
| const dialogContentElements = this.getElementsFromDialogs(dialogElements); | ||
|
|
||
| // Only switch to dialog-focused analysis if dialogs have substantial content | ||
| if (dialogContentElements.length > 5) { | ||
| allElements = dialogContentElements; | ||
| } | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
List-mode dialog pivot is a solid idea, but current dialog discovery is too narrow
Switching analysis to dialog content when listSelector is empty will help avoid scanning the whole page. However, your finder only picks role="dialog"/ (see findAllDialogElements), missing common “modal/popup/overlay” patterns. Prefer broadening the discovery to avoid false negatives that regress into “heavy-site” scans.
Apply this diff to make dialog detection more robust:
@@
- private findAllDialogElements(doc: Document): HTMLElement[] {
- const dialogElements: HTMLElement[] = [];
- const allElements = Array.from(doc.querySelectorAll("*")) as HTMLElement[];
-
- for (const element of allElements) {
- if (element.getAttribute("role") === "dialog") {
- dialogElements.push(element);
- continue;
- }
-
- if (element.tagName.toLowerCase() === "dialog") {
- dialogElements.push(element);
- continue;
- }
- }
-
- return dialogElements;
- }
+ private findAllDialogElements(doc: Document): HTMLElement[] {
+ const dialogElements: HTMLElement[] = [];
+ const all = Array.from(doc.querySelectorAll("*")) as HTMLElement[];
+
+ for (const el of all) {
+ const tag = el.tagName.toLowerCase();
+ const role = el.getAttribute("role")?.toLowerCase();
+ const id = (el.id || "").toLowerCase();
+ const cls = el.className?.toString().toLowerCase() || "";
+
+ if (role === "dialog" || tag === "dialog") {
+ dialogElements.push(el);
+ continue;
+ }
+ // Heuristics for common modal implementations
+ if (
+ cls.includes("modal") || cls.includes("dialog") ||
+ cls.includes("popup") || cls.includes("overlay") ||
+ id.includes("modal") || id.includes("dialog") || id.includes("popup")
+ ) {
+ dialogElements.push(el);
+ }
+ }
+ return dialogElements;
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let allElements = this.getAllVisibleElementsWithShadow(iframeDoc); | |
| if (this.getList === true && this.listSelector === "") { | |
| const dialogElements = this.findAllDialogElements(iframeDoc); | |
| if (dialogElements.length > 0) { | |
| // Check if dialogs contain significant content worth analyzing | |
| const dialogContentElements = this.getElementsFromDialogs(dialogElements); | |
| // Only switch to dialog-focused analysis if dialogs have substantial content | |
| if (dialogContentElements.length > 5) { | |
| allElements = dialogContentElements; | |
| } | |
| } | |
| } | |
| private findAllDialogElements(doc: Document): HTMLElement[] { | |
| const dialogElements: HTMLElement[] = []; | |
| const all = Array.from(doc.querySelectorAll("*")) as HTMLElement[]; | |
| for (const el of all) { | |
| const tag = el.tagName.toLowerCase(); | |
| const role = el.getAttribute("role")?.toLowerCase(); | |
| const id = (el.id || "").toLowerCase(); | |
| const cls = el.className?.toString().toLowerCase() || ""; | |
| // Always include native dialogs and ARIA dialogs | |
| if (role === "dialog" || tag === "dialog") { | |
| dialogElements.push(el); | |
| continue; | |
| } | |
| // Heuristics for common modal/popup/overlay patterns | |
| if ( | |
| cls.includes("modal") || cls.includes("dialog") || | |
| cls.includes("popup") || cls.includes("overlay") || | |
| id.includes("modal") || id.includes("dialog") || | |
| id.includes("popup") | |
| ) { | |
| dialogElements.push(el); | |
| } | |
| } | |
| return dialogElements; | |
| } |
|
|
||
| // Fast path for common meaningful elements | ||
| if (["a", "img", "input", "button", "select"].includes(tagName)) { | ||
| return true; | ||
| } | ||
|
|
||
| const text = (element.textContent || "").trim(); | ||
| const hasHref = element.hasAttribute("href"); | ||
| const hasSrc = element.hasAttribute("src"); | ||
|
|
||
| // Quick checks first | ||
| if (text.length > 0 || hasHref || hasSrc) { | ||
| return true; | ||
| } | ||
|
|
||
| const isCustomElement = tagName.includes("-"); | ||
|
|
||
| // For custom elements, be more lenient about what's considered meaningful | ||
| if (isCustomElement) { | ||
| const hasChildren = element.children.length > 0; | ||
| const hasSignificantAttributes = Array.from(element.attributes).some( | ||
| (attr) => !["class", "style", "id"].includes(attr.name.toLowerCase()) | ||
| ); | ||
|
|
||
| return ( | ||
| text.length > 0 || | ||
| hasHref || | ||
| hasSrc || | ||
| hasChildren || | ||
| hasSignificantAttributes || | ||
| element.hasAttribute("role") || | ||
| element.hasAttribute("aria-label") | ||
| ); | ||
| } | ||
|
|
||
| return ( | ||
| text.length > 0 || | ||
| hasHref || | ||
| hasSrc || | ||
| ["a", "img", "input", "button", "select"].includes(tagName) | ||
| ); | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hidden inputs and aria-hidden should not be treated as meaningful
The fast-path currently returns true for all inputs. That will include hidden inputs and non-interactive controls. Also consider honoring aria-hidden on the element itself.
Apply this diff:
@@
- // Fast path for common meaningful elements
- if (["a", "img", "input", "button", "select"].includes(tagName)) {
- return true;
- }
+ // Fast path for common meaningful elements with guards
+ if (tagName === "input") {
+ const type = (element as HTMLInputElement).type?.toLowerCase();
+ if (type === "hidden") return false;
+ return true;
+ }
+ if (["a", "img", "button", "select", "textarea"].includes(tagName)) {
+ return true;
+ }
@@
- // Quick checks first
- if (text.length > 0 || hasHref || hasSrc) {
+ // Quick checks first
+ if (
+ (element.getAttribute("aria-hidden") !== "true") &&
+ (text.length > 0 || hasHref || hasSrc)
+ ) {
return true;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Fast path for common meaningful elements | |
| if (["a", "img", "input", "button", "select"].includes(tagName)) { | |
| return true; | |
| } | |
| const text = (element.textContent || "").trim(); | |
| const hasHref = element.hasAttribute("href"); | |
| const hasSrc = element.hasAttribute("src"); | |
| // Quick checks first | |
| if (text.length > 0 || hasHref || hasSrc) { | |
| return true; | |
| } | |
| const isCustomElement = tagName.includes("-"); | |
| // For custom elements, be more lenient about what's considered meaningful | |
| if (isCustomElement) { | |
| const hasChildren = element.children.length > 0; | |
| const hasSignificantAttributes = Array.from(element.attributes).some( | |
| (attr) => !["class", "style", "id"].includes(attr.name.toLowerCase()) | |
| ); | |
| return ( | |
| text.length > 0 || | |
| hasHref || | |
| hasSrc || | |
| hasChildren || | |
| hasSignificantAttributes || | |
| element.hasAttribute("role") || | |
| element.hasAttribute("aria-label") | |
| ); | |
| } | |
| return ( | |
| text.length > 0 || | |
| hasHref || | |
| hasSrc || | |
| ["a", "img", "input", "button", "select"].includes(tagName) | |
| ); | |
| return false; | |
| } | |
| // Fast path for common meaningful elements with guards | |
| if (tagName === "input") { | |
| const type = (element as HTMLInputElement).type?.toLowerCase(); | |
| if (type === "hidden") return false; | |
| return true; | |
| } | |
| if (["a", "img", "button", "select", "textarea"].includes(tagName)) { | |
| return true; | |
| } | |
| const text = (element.textContent || "").trim(); | |
| const hasHref = element.hasAttribute("href"); | |
| const hasSrc = element.hasAttribute("src"); | |
| // Quick checks first | |
| if ( | |
| (element.getAttribute("aria-hidden") !== "true") && | |
| (text.length > 0 || hasHref || hasSrc) | |
| ) { | |
| return true; | |
| } | |
| const isCustomElement = tagName.includes("-"); | |
| // For custom elements, be more lenient about what's considered meaningful | |
| if (isCustomElement) { | |
| const hasChildren = element.children.length > 0; | |
| const hasSignificantAttributes = Array.from(element.attributes).some( | |
| (attr) => !["class", "style", "id"].includes(attr.name.toLowerCase()) | |
| ); | |
| return ( | |
| hasChildren || | |
| hasSignificantAttributes || | |
| element.hasAttribute("role") || | |
| element.hasAttribute("aria-label") | |
| ); | |
| } | |
| return false; |
🤖 Prompt for AI Agents
In src/helpers/clientSelectorGenerator.ts around lines 558 to 591, the fast-path
incorrectly treats all <input> elements (and any element with aria-hidden not
considered) as meaningful; update the checks so that elements with
aria-hidden="true" are always treated as not meaningful, and only treat inputs
as meaningful when they are not type="hidden" and are interactive (e.g., not
disabled). Concretely: add an early return false if
element.hasAttribute('aria-hidden') && element.getAttribute('aria-hidden') ===
'true'; change the fast-path to special-case "input" by reading its type
attribute and only return true for input when type !== 'hidden' (and ideally
ensure it is not disabled); keep the existing fast-path behavior for a, img,
button, select but guard them with the aria-hidden check too.
What this PR does?
Summary by CodeRabbit