Skip to content

Conversation

@RohitR311
Copy link
Contributor

@RohitR311 RohitR311 commented Aug 23, 2025

What this PR does?

  1. Fixes highlighting getting stuck issue: Bug [Performance]: Slow Auto List Capture for Heavy Sites #744
  2. Fixes capture actions for modal components: Bug: Modal capture list not highlighting #738

Summary by CodeRabbit

  • New Features
    • Improved dialog-aware content extraction, including nested and shadow DOM content.
    • Enhanced pagination mode chains multiple candidate selectors for more reliable results.
  • Performance
    • Faster processing via caching, batched child-selector generation, and bounded DOM traversal.
  • Bug Fixes
    • More robust containment checks across light/shadow DOM and safer selector/XPath handling.
    • Better handling of empty lists and edge cases in class-similarity detection.
  • Stability
    • Reduced recomputation and clearer cleanup to prevent stale state.
  • API
    • Added a public precompute method to allow precomputing child-selector mappings.

@RohitR311 RohitR311 added the Type: Bug Something isn't working label Aug 23, 2025
@coderabbitai
Copy link

coderabbitai bot commented Aug 23, 2025

Walkthrough

Adds caches, dialog-aware content analysis, bounded shadow-DOM descendant traversal, batched/cached child-XPath generation, improved structural/class logic, containment/XPath utilities, cache-aware helpers, cleanup of new caches, and a new public method precomputeChildSelectorMappings.

Changes

Cohort / File(s) Summary
Client selector enhancements
src/helpers/clientSelectorGenerator.ts
Added internal caches (path/descendants/meaningful/selector); dialog detection and content extraction helpers; cached meaningful-element checks with fast-paths; bounded BFS descendant collection including shadow DOM; batched/cached child XPath generation; structural-path caching; improved common-class computation; elementContains/XPath utilities; deterministic pagination-mode selector chaining; cleanup clears new caches; added precomputeChildSelectorMappings(childSelectors: string[], document: Document) public method.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as ClientSelectorGenerator
  participant D as DOM
  participant Cache as Caches (path/desc/meaningful/selector)

  Note over C: generateSelector(entry, options)

  C->>Cache: check selectorCache
  alt cache hit
    Cache-->>C: return selectors
  else
    alt listMode && empty listSelector
      C->>D: findAllDialogElements()
      alt dialogs with content
        C->>D: getElementsFromDialogs() / traverse shadow
      else
        C->>D: use page root
      end
    end

    C->>D: getAllDescendantsIncludingShadow()
    C->>Cache: consult meaningfulCache during traversal
    D-->>C: bounded meaningful descendants

    C->>C: generateOptimizedChildXPaths (batched, deduped)
    C->>Cache: update selectorCache

    alt paginationMode
      C->>C: chain multiple candidate selectors
    end
    C-->>C: return selector(s)
  end
Loading
sequenceDiagram
  autonumber
  participant C as ClientSelectorGenerator
  participant Cache as path/desc/meaningful/selector
  participant Util as Utilities

  Note over C: Cached helper lookups

  C->>Cache: getOptimizedStructuralPath(node)
  alt miss
    C->>Util: compute structural XPath
    Util-->>C: path or null
    C->>Cache: pathCache.set(node, path|null)
  end

  C->>Cache: descendantsCache.get(root)
  alt miss
    C->>Util: BFS collect descendants (depth/limit, shadow-aware)
    C->>Cache: descendantsCache.set(root, results)
  end

  C->>Cache: meaningfulCache.get(node)
  alt miss
    C->>Util: isMeaningfulElement (fast-paths & cached)
    C->>Cache: meaningfulCache.set(node, bool)
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

Scope: Recorder

Suggested reviewers

  • amhsirak

Poem

I hop through shadowed DOMs by night,
I cache each path and batched delight.
Dialogs I probe, descendants I find,
Precomputed maps to ease your mind.
A rabbit's tweaks — selectors lined. 🐇✨

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch reclag-fix

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@RohitR311 RohitR311 requested a review from amhsirak August 23, 2025 13:19
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/helpers/clientSelectorGenerator.ts (3)

2721-2727: XPath attribute escaping is incorrect; backslash does not escape quotes in XPath

replace(/'/g, "\\'") produces invalid XPath when attribute values contain single quotes. Use proper literal escaping (single, double, or concat). Apply:

-      const meaningfulAttrs = ["role", "type", "name", "src", "aria-label"];
+      const meaningfulAttrs = ["role", "type", "name", "src", "aria-label"];
       for (const attrName of meaningfulAttrs) {
         if (element.hasAttribute(attrName)) {
-          const value = element.getAttribute(attrName)!.replace(/'/g, "\\'");
-          return `${tagName}[@${attrName}='${value}']`;
+          const value = this.escapeXPathLiteral(element.getAttribute(attrName)!);
+          return `${tagName}[@${attrName}=${value}]`;
         }
       }

Add this helper once to the class:

// Safe XPath string literal builder
private escapeXPathLiteral(value: string): string {
  if (!value.includes("'")) return `'${value}'`;
  if (!value.includes('"')) return `"${value}"`;
  // contains both ' and " → concat('foo',"'",'bar',"'",'baz')
  return "concat('" + value.split("'").join(`',"'",'`) + "')";
}

This fix is necessary to avoid broken selectors on attributes like names with apostrophes.


3827-3831: Same XPath quoting issue here in group selector

When pushing common attribute predicates, quote using the same escapeXPathLiteral to avoid malformed XPath:

-  for (const [attr, value] of Object.entries(commonAttributes)) {
-    predicates.push(`@${attr}='${value}'`);
-  }
+  for (const [attr, value] of Object.entries(commonAttributes)) {
+    predicates.push(`@${attr}=${this.escapeXPathLiteral(value)}`);
+  }

Without this, modal or list containers with quotes in attribute values will break matching.


2472-2476: Logical bug: overlay-controls exclusion check always evaluates true

!hoveredElement.closest("#overlay-controls") != null first negates the element, producing a boolean, then compares that boolean to null, which is always true. This causes the overlay guard to be bypassed.

Fix:

-      if (
-        hoveredElement != null &&
-        !hoveredElement.closest("#overlay-controls") != null
-      ) {
+      if (
+        hoveredElement != null &&
+        hoveredElement.closest("#overlay-controls") == null
+      ) {

This prevents selecting the highlighter’s own controls, a common cause of “stuck” or incorrect highlighting.

🧹 Nitpick comments (9)
src/helpers/clientSelectorGenerator.ts (9)

112-116: Solid caching additions; consider document-scoped invalidation to avoid stale entries

The new WeakMap/Map caches are appropriate for element- and string-keyed memoization. One caveat: meaningfulCache and descendantsCache persist across DOM mutations within the same document, which can drift when pages dynamically change (SPA updates). Consider clearing these when:

  • analyzeElementGroups detects iframeDoc !== lastAnalyzedDocument, and
  • before heavy list operations (e.g., in getChildSelectors) if the DOM likely changed.

This reduces the risk of stale paths/descendants causing incorrect highlights on long-lived pages.


405-419: Dialog-aware list analysis: good direction; tighten “dialog” detection and cost bounds

Switching analysis to dialog content prevents background-page noise. Improvements to reduce false positives and costs on heavy pages:

  • Filter dialogElements by visibility and prominence: require position ∈ {fixed, absolute} and a minimum area or z-index threshold before scanning children.
  • Prefer native affordances first: [role="dialog"], <dialog open>, [aria-modal="true"].
  • Early-cap dialogContentElements traversal (e.g., limit to top 2-3 dialogs by area).

This keeps the scan bounded on modal-heavy apps.


540-552: Meaningfulness cache is fine; add fast visibility/hidden checks to cut noise and work

isMeaningfulElement returns true for many nodes (text/href/src/custom elements) without checking visibility/hidden state. Because getAllDescendantsIncludingShadow doesn’t filter for visibility, this inflates candidate sets on heavy pages and can slow highlighting.

Add inexpensive guards:

  • Skip if element.hidden, aria-hidden="true".
  • Skip if display:none or visibility:hidden.
  • Skip if getBoundingClientRect() has zero area.

Apply before the current fast-paths. Example change:

   private isMeaningfulElement(element: HTMLElement): boolean {
     const tagName = element.tagName.toLowerCase();
-    
+    // Cheap visibility checks to avoid hidden/offscreen noise
+    if (element.hasAttribute('hidden') || element.getAttribute('aria-hidden') === 'true') return false;
+    const style = element.ownerDocument?.defaultView?.getComputedStyle(element);
+    if (style && (style.display === 'none' || style.visibility === 'hidden')) return false;
+    const rect = element.getBoundingClientRect?.();
+    if (rect && (rect.width === 0 || rect.height === 0)) return false;
+
     // Fast path for common meaningful elements
     if (["a", "img", "input", "button", "select"].includes(tagName)) {
       return true;
     }

This reduces traversal sizes and helps prevent the highlighter from “getting stuck.”

Also applies to: 556-591


2498-2546: Selector caching per (parentSelector, URL): good; guard for SPA mutations

The selectorCache keyed by parentSelector + document.location.href is a sensible win. On SPAs where URL doesn’t change, consider:

  • Storing a content version (e.g., document.readyState + a monotonic counter you bump on mutation batches) in the key, or
  • Invalidating this entry on a lightweight MutationObserver tick.

Not mandatory for this PR but will prevent stale child selectors after client-side re-renders.


2553-2628: Bounded BFS is the right call; ignore non-content tags and respect ARIA hidden

To further reduce queue churn on heavy DOMs, skip obvious non-content nodes during expansion:

-      const children = element.children;
+      const children = element.children;
       const childLimit = Math.min(children.length, 30);
       for (let i = 0; i < childLimit; i++) {
         const child = children[i] as HTMLElement;
+        const tn = child.tagName;
+        if (tn === 'SCRIPT' || tn === 'STYLE' || tn === 'TEMPLATE' || child.getAttribute('aria-hidden') === 'true') {
+          continue;
+        }
         if (!visited.has(child)) {

This small filter removes a lot of dead ends cheaply.


2930-2933: Class cache key may collide across documents

cacheKey only includes tag + classes + “other list count”. On multi-frame/multi-doc contexts this can cross-contaminate. Including this.lastAnalyzedDocument?.URL (or a short doc hash) in the key avoids edge collisions.

-    const cacheKey = `${targetElement.tagName}_${targetClasses.join(',')}_${otherListElements.length}`;
+    const docId = targetElement.ownerDocument?.URL || 'doc';
+    const cacheKey = `${docId}__${targetElement.tagName}_${targetClasses.join(',')}_${otherListElements.length}`;

4048-4150: Dialog helpers: useful, but scan cost and false positives need guardrails

  • findAllDialogElements queries all elements and then matches by substrings like “overlay”, which can hit layout wrappers. Add:

    • A visibility check (area > 0).
    • A positioning/z-index heuristic, similar to your overlay handling elsewhere.
    • Prefer [aria-modal="true"] in the first pass.
  • findDialogElement/findDeepestInDialog defined here duplicate similar logic inside getSelectors (Lines 1956-1986). Consolidate to one implementation to avoid divergence.


4152-4224: Shadow content extraction in dialogs: nice; bound recursion

getElementsFromShadowRoot recurses without an explicit depth cap. Given you already cap elsewhere, consider a shallow cap (e.g., 4) to avoid pathological trees.

-  private getElementsFromShadowRoot(shadowRoot: ShadowRoot): HTMLElement[] {
+  private getElementsFromShadowRoot(shadowRoot: ShadowRoot, depth: number = 0): HTMLElement[] {
+    if (depth > 4) return [];
     const elements: HTMLElement[] = [];
...
-      shadowChildren.forEach((element) => {
+      shadowChildren.forEach((element) => {
         elements.push(element);
...
-        if (element.shadowRoot) {
-          const nestedShadowElements = this.getElementsFromShadowRoot(element.shadowRoot);
+        if (element.shadowRoot) {
+          const nestedShadowElements = this.getElementsFromShadowRoot(element.shadowRoot, depth + 1);
           elements.push(...nestedShadowElements);
         }

This avoids deep recursion on component libraries with layered shadows.


3369-3386: Minor: prefer unique parent for listSelector; otherwise bail early

You already return null when containingParent is falsy. If evaluateXPath(listSelector) yields multiple parents, consider warning once (or preferring the one that actually contains targetElement) to avoid subtle mismatches:

-  const containingParent = parentElements[0];
+  const containingParent = parentElements.find(p => this.elementContains(p, targetElement)) ?? parentElements[0];

This improves robustness when a page has repeated list containers.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ca29967 and 96dea7f.

📒 Files selected for processing (1)
  • src/helpers/clientSelectorGenerator.ts (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/helpers/clientSelectorGenerator.ts (1)
maxun-core/src/browserSide/scraper.js (10)
  • allElements (1064-1064)
  • tagName (450-450)
  • parentElement (914-914)
  • selector (28-28)
  • MAX_DEPTH (955-955)
  • pathParts (569-573)
  • depth (956-956)
  • current (1010-1010)
  • doc (225-225)
  • shadowRoot (992-992)
🔇 Additional comments (5)
src/helpers/clientSelectorGenerator.ts (5)

2642-2670: Batching and max-selector cutoff are sensible

Processing descendants in batches of 25 and stopping at 250 selectors is a good defensive bound for responsiveness.


2848-2913: Path caching with null sentinel: good; add small guard for root containment failures

Caching null for non-contained targets is great to avoid rework. Consider also clearing pathCache when listSelector changes or when getChildSelectors rebuilds (you already reset pathCache there—nice). LGTM overall.


4239-4243: Comprehensive cleanup is appreciated

Clearing all newly added caches in cleanup looks complete and correct.


3586-3598: Shadow DOM detection heuristic is robust across realms

Using constructor.name or presence of host/mode handles cross-realm ShadowRoots. LGTM.


3701-3755: Matching-mode flow is reasonable; pagination chaining: verify consumer support

The deterministic chaining under pagination mode returns a comma-separated list of CSS-like selectors. Confirm downstream consumers accept comma-chained selectors (and the mixed :>>/>> separators embedded from frame/shadow selectors) without splitting issues.

If needed, I can scan usages of generateSelector(...) to confirm the expected format and update tests accordingly.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/helpers/clientSelectorGenerator.ts (3)

2721-2726: XPath attribute quoting is incorrect (XPath doesn’t use backslash escapes)

Replacing ' with ' inside an XPath string doesn’t work. Use proper quoting or concat(). This will otherwise break when attribute values contain single quotes.

Apply this diff to use a safe literal builder:

@@
-      const meaningfulAttrs = ["role", "type", "name", "src", "aria-label"];
+      const meaningfulAttrs = ["role", "type", "name", "src", "aria-label"];
       for (const attrName of meaningfulAttrs) {
         if (element.hasAttribute(attrName)) {
-          const value = element.getAttribute(attrName)!.replace(/'/g, "\\'");
-          return `${tagName}[@${attrName}='${value}']`;
+          const value = element.getAttribute(attrName)!;
+          return `${tagName}[@${attrName}=${this.toXPathLiteral(value)}]`;
         }
       }
@@
-        if (
+        if (
           attr.name.startsWith("data-") &&
           attr.name !== "data-testid" &&
           attr.name !== "data-mx-id" &&
           attr.value
         ) {
-          return `${tagName}[@${attr.name}='${attr.value}']`;
+          return `${tagName}[@${attr.name}=${this.toXPathLiteral(attr.value)}]`;
         }
       }

Add this helper (outside the selected range) to the class:

// Safe XPath string literal builder (handles both quote types)
private toXPathLiteral(value: string): string {
  if (!value.includes("'")) return `'${value}'`;
  if (!value.includes('"')) return `"${value}"`;
  const parts = value.split("'");
  const args: string[] = [];
  for (let i = 0; i < parts.length; i++) {
    if (parts[i]) args.push(`'${parts[i]}'`);
    if (i < parts.length - 1) args.push(`"'"`);
  }
  return `concat(${args.join(",")})`;
}

Also applies to: 2740-2749


2473-2476: Bug: overlay-controls guard is always true due to negation precedence

The condition !hoveredElement.closest("#overlay-controls") != null always evaluates to true when hoveredElement exists. You likely intended to skip elements within the overlay.

Apply this diff:

@@
-      if (
-        hoveredElement != null &&
-        !hoveredElement.closest("#overlay-controls") != null
-      ) {
+      if (
+        hoveredElement != null &&
+        hoveredElement.closest("#overlay-controls") == null
+      ) {

1125-1130: Operator precedence bug in INPUT date/time check

The current condition evaluates (type === "date") even when tagName !== "INPUT". Group both type checks under the INPUT guard.

Apply this diff:

@@
-            } else if (
-              (targetElement?.tagName === "INPUT" &&
-                (targetElement as HTMLInputElement).type === "time") ||
-              (targetElement as HTMLInputElement).type === "date"
-            ) {
+            } else if (
+              targetElement?.tagName === "INPUT" &&
+              (
+                (targetElement as HTMLInputElement).type === "time" ||
+                (targetElement as HTMLInputElement).type === "date"
+              )
+            ) {
🧹 Nitpick comments (8)
src/helpers/clientSelectorGenerator.ts (8)

540-552: Use the cached meaningfulness in child traversal

getMeaningfulChildren still calls isMeaningfulElement; switch to the cached variant to avoid repeated expensive checks.

Apply this diff:

@@
-        if (this.isMeaningfulElement(htmlChild)) {
+        if (this.isMeaningfulElementCached(htmlChild)) {
           meaningfulChildren.push(htmlChild);
         } else {
           // If not meaningful itself, check its children
           traverse(htmlChild, depth + 1);
         }
@@
-          if (this.isMeaningfulElement(htmlShadowChild)) {
+          if (this.isMeaningfulElementCached(htmlShadowChild)) {
             meaningfulChildren.push(htmlShadowChild);
           } else {
             traverse(htmlShadowChild, depth + 1);
           }

Also applies to: 506-538


2498-2505: Selector caching: add a mutation-aware version to avoid stale results

Keying selectorCache by URL + parentSelector helps, but long-lived pages mutate frequently (SPA). Without invalidation, we can return stale selectors and “stuck” highlights. Consider a lightweight MutationObserver to bump a document-version counter included in the cache key.

If helpful, I can provide a minimal mutation tracker you can toggle on analyzeElementGroups() and clear in cleanup().

Also applies to: 2543-2546


2553-2628: Bounded BFS is good; minor guardrails for hot paths

The traversal is careful with depth and size caps. Two small improvements:

  • Early-return if currentDepth is small and element is not displayable (e.g., visibility: hidden) to cut noise on heavy sites.
  • Prefer iterating via static arrays (Array.from) once per node to avoid live HTMLCollection surprises.

I can draft a micro-refactor if you want to test it behind a feature flag.


2919-2990: Class similarity scan can still be heavy across many lists

This function repeatedly calls getAllDescendantsIncludingShadow(listEl) in a loop (again inside every targetClass check later). It benefits from the descendants cache, but you can still avoid repeated scans by precomputing, per list element and tag, the normalized class sets once.

I can sketch a small precomputation map {listEl-> {TAG -> [normalizedClassSets]}} to de-duplicate work.


3098-3115: Nit: unused local variable

gridKey is computed but not used in getNearbySelectorCandidates().

Apply this diff:

@@
-    const gridKey = this.getElementGridKey(element);
     const rect = element.getBoundingClientRect();

3468-3474: Fallback XPath → CSS translation has semantic mismatch for positions

Translating //tag[n] to tag:nth-child(n) isn’t equivalent; nth-child indexes all children, not only siblings with the same tag. If this path is hit, results can be wrong. Consider skipping this translation or limiting it to cases where siblings are homogeneous.

I can help guard this with a quick DOM check before returning the CSS path.


3042-3081: Minor: console.time/timeEnd in production

The precomputation timer is useful during development but can be noisy in production consoles. Consider gating with a debug flag.

I can wire a simple this.performanceConfig.debugLogging flag if desired.


4048-4076: Unused class-level dialog helpers – consider consolidating or removing

It looks like you’ve got two sets of dialog-related helpers in src/helpers/clientSelectorGenerator.ts:

  • An inline arrow-function helper at lines 1956–1986:

    const findDeepestInDialog = (elements: HTMLElement[]): HTMLElement | null => {  }
    // …
    const deepestInDialog = findDeepestInDialog(dialogElements);

    which is actually invoked at line 1987.

  • Two private class methods that are never referenced anywhere else:

    • private findDialogElement(elements: HTMLElement[]): HTMLElement | null { … } (lines 4051–4076)
    • private findDeepestInDialog(dialogElements: HTMLElement[], dialogElement: HTMLElement): HTMLElement | null { … } (lines 4081–4113)

This duplication risks them drifting out of sync. You have two options:

  • Reuse the class methods throughout (replace the inline arrow function and its call site with calls to this.findDialogElement/this.findDeepestInDialog).
  • Remove the unused class methods entirely and keep the single inline implementation.

Either approach will eliminate dead code and ensure there’s a single source of truth for dialog-element lookup.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 96dea7f and 0fba0f0.

📒 Files selected for processing (1)
  • src/helpers/clientSelectorGenerator.ts (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/helpers/clientSelectorGenerator.ts (1)
maxun-core/src/browserSide/scraper.js (7)
  • tagName (450-450)
  • parentElement (914-914)
  • selector (28-28)
  • MAX_DEPTH (955-955)
  • pathParts (569-573)
  • depth (956-956)
  • shadowRoot (992-992)
🔇 Additional comments (5)
src/helpers/clientSelectorGenerator.ts (5)

112-116: Good addition: targeted caches to cut recomputation on heavy pages

The new WeakMaps/Map look appropriate for DOM-bound caching, and you’ve remembered to clear them in cleanup(). Nice.


2642-2670: Batching and cap at 250 selectors look reasonable

The batched loop and early breaks prevent runaway work. Good call.


2848-2851: Path caching with negative entries is smart

Caching null paths to avoid recomputation is a good optimization. You already reset pathCache on new getChildSelectors() runs—nice touch.

Also applies to: 2904-2913


3669-3680: Nice: precompute mappings when cachedChildSelectors are provided

Precomputing here lines up with your cache-aware matching path and reduces jank during hover.


4223-4227: Good housekeeping: clearing new caches in cleanup()

This avoids memory leaks and stale reads across sessions.

Comment on lines +405 to +420
let allElements = this.getAllVisibleElementsWithShadow(iframeDoc);

if (this.getList === true && this.listSelector === "") {
const dialogElements = this.findAllDialogElements(iframeDoc);

if (dialogElements.length > 0) {
// Check if dialogs contain significant content worth analyzing
const dialogContentElements = this.getElementsFromDialogs(dialogElements);

// Only switch to dialog-focused analysis if dialogs have substantial content
if (dialogContentElements.length > 5) {
allElements = dialogContentElements;
}
}
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

List-mode dialog pivot is a solid idea, but current dialog discovery is too narrow

Switching analysis to dialog content when listSelector is empty will help avoid scanning the whole page. However, your finder only picks role="dialog"/ (see findAllDialogElements), missing common “modal/popup/overlay” patterns. Prefer broadening the discovery to avoid false negatives that regress into “heavy-site” scans.

Apply this diff to make dialog detection more robust:

@@
-  private findAllDialogElements(doc: Document): HTMLElement[] {
-    const dialogElements: HTMLElement[] = [];
-    const allElements = Array.from(doc.querySelectorAll("*")) as HTMLElement[];
-
-    for (const element of allElements) {
-      if (element.getAttribute("role") === "dialog") {
-        dialogElements.push(element);
-        continue;
-      }
-
-      if (element.tagName.toLowerCase() === "dialog") {
-        dialogElements.push(element);
-        continue;
-      }
-    }
-
-    return dialogElements;
-  }
+  private findAllDialogElements(doc: Document): HTMLElement[] {
+    const dialogElements: HTMLElement[] = [];
+    const all = Array.from(doc.querySelectorAll("*")) as HTMLElement[];
+
+    for (const el of all) {
+      const tag = el.tagName.toLowerCase();
+      const role = el.getAttribute("role")?.toLowerCase();
+      const id = (el.id || "").toLowerCase();
+      const cls = el.className?.toString().toLowerCase() || "";
+
+      if (role === "dialog" || tag === "dialog") {
+        dialogElements.push(el);
+        continue;
+      }
+      // Heuristics for common modal implementations
+      if (
+        cls.includes("modal") || cls.includes("dialog") ||
+        cls.includes("popup") || cls.includes("overlay") ||
+        id.includes("modal") || id.includes("dialog") || id.includes("popup")
+      ) {
+        dialogElements.push(el);
+      }
+    }
+    return dialogElements;
+  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let allElements = this.getAllVisibleElementsWithShadow(iframeDoc);
if (this.getList === true && this.listSelector === "") {
const dialogElements = this.findAllDialogElements(iframeDoc);
if (dialogElements.length > 0) {
// Check if dialogs contain significant content worth analyzing
const dialogContentElements = this.getElementsFromDialogs(dialogElements);
// Only switch to dialog-focused analysis if dialogs have substantial content
if (dialogContentElements.length > 5) {
allElements = dialogContentElements;
}
}
}
private findAllDialogElements(doc: Document): HTMLElement[] {
const dialogElements: HTMLElement[] = [];
const all = Array.from(doc.querySelectorAll("*")) as HTMLElement[];
for (const el of all) {
const tag = el.tagName.toLowerCase();
const role = el.getAttribute("role")?.toLowerCase();
const id = (el.id || "").toLowerCase();
const cls = el.className?.toString().toLowerCase() || "";
// Always include native dialogs and ARIA dialogs
if (role === "dialog" || tag === "dialog") {
dialogElements.push(el);
continue;
}
// Heuristics for common modal/popup/overlay patterns
if (
cls.includes("modal") || cls.includes("dialog") ||
cls.includes("popup") || cls.includes("overlay") ||
id.includes("modal") || id.includes("dialog") ||
id.includes("popup")
) {
dialogElements.push(el);
}
}
return dialogElements;
}

Comment on lines +558 to 591

// Fast path for common meaningful elements
if (["a", "img", "input", "button", "select"].includes(tagName)) {
return true;
}

const text = (element.textContent || "").trim();
const hasHref = element.hasAttribute("href");
const hasSrc = element.hasAttribute("src");

// Quick checks first
if (text.length > 0 || hasHref || hasSrc) {
return true;
}

const isCustomElement = tagName.includes("-");

// For custom elements, be more lenient about what's considered meaningful
if (isCustomElement) {
const hasChildren = element.children.length > 0;
const hasSignificantAttributes = Array.from(element.attributes).some(
(attr) => !["class", "style", "id"].includes(attr.name.toLowerCase())
);

return (
text.length > 0 ||
hasHref ||
hasSrc ||
hasChildren ||
hasSignificantAttributes ||
element.hasAttribute("role") ||
element.hasAttribute("aria-label")
);
}

return (
text.length > 0 ||
hasHref ||
hasSrc ||
["a", "img", "input", "button", "select"].includes(tagName)
);
return false;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Hidden inputs and aria-hidden should not be treated as meaningful

The fast-path currently returns true for all inputs. That will include hidden inputs and non-interactive controls. Also consider honoring aria-hidden on the element itself.

Apply this diff:

@@
-    // Fast path for common meaningful elements
-    if (["a", "img", "input", "button", "select"].includes(tagName)) {
-      return true;
-    }
+    // Fast path for common meaningful elements with guards
+    if (tagName === "input") {
+      const type = (element as HTMLInputElement).type?.toLowerCase();
+      if (type === "hidden") return false;
+      return true;
+    }
+    if (["a", "img", "button", "select", "textarea"].includes(tagName)) {
+      return true;
+    }
@@
-    // Quick checks first
-    if (text.length > 0 || hasHref || hasSrc) {
+    // Quick checks first
+    if (
+      (element.getAttribute("aria-hidden") !== "true") &&
+      (text.length > 0 || hasHref || hasSrc)
+    ) {
       return true;
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Fast path for common meaningful elements
if (["a", "img", "input", "button", "select"].includes(tagName)) {
return true;
}
const text = (element.textContent || "").trim();
const hasHref = element.hasAttribute("href");
const hasSrc = element.hasAttribute("src");
// Quick checks first
if (text.length > 0 || hasHref || hasSrc) {
return true;
}
const isCustomElement = tagName.includes("-");
// For custom elements, be more lenient about what's considered meaningful
if (isCustomElement) {
const hasChildren = element.children.length > 0;
const hasSignificantAttributes = Array.from(element.attributes).some(
(attr) => !["class", "style", "id"].includes(attr.name.toLowerCase())
);
return (
text.length > 0 ||
hasHref ||
hasSrc ||
hasChildren ||
hasSignificantAttributes ||
element.hasAttribute("role") ||
element.hasAttribute("aria-label")
);
}
return (
text.length > 0 ||
hasHref ||
hasSrc ||
["a", "img", "input", "button", "select"].includes(tagName)
);
return false;
}
// Fast path for common meaningful elements with guards
if (tagName === "input") {
const type = (element as HTMLInputElement).type?.toLowerCase();
if (type === "hidden") return false;
return true;
}
if (["a", "img", "button", "select", "textarea"].includes(tagName)) {
return true;
}
const text = (element.textContent || "").trim();
const hasHref = element.hasAttribute("href");
const hasSrc = element.hasAttribute("src");
// Quick checks first
if (
(element.getAttribute("aria-hidden") !== "true") &&
(text.length > 0 || hasHref || hasSrc)
) {
return true;
}
const isCustomElement = tagName.includes("-");
// For custom elements, be more lenient about what's considered meaningful
if (isCustomElement) {
const hasChildren = element.children.length > 0;
const hasSignificantAttributes = Array.from(element.attributes).some(
(attr) => !["class", "style", "id"].includes(attr.name.toLowerCase())
);
return (
hasChildren ||
hasSignificantAttributes ||
element.hasAttribute("role") ||
element.hasAttribute("aria-label")
);
}
return false;
🤖 Prompt for AI Agents
In src/helpers/clientSelectorGenerator.ts around lines 558 to 591, the fast-path
incorrectly treats all <input> elements (and any element with aria-hidden not
considered) as meaningful; update the checks so that elements with
aria-hidden="true" are always treated as not meaningful, and only treat inputs
as meaningful when they are not type="hidden" and are interactive (e.g., not
disabled). Concretely: add an early return false if
element.hasAttribute('aria-hidden') && element.getAttribute('aria-hidden') ===
'true'; change the fast-path to special-case "input" by reading its type
attribute and only return true for input when type !== 'hidden' (and ideally
ensure it is not disabled); keep the existing fast-path behavior for a, img,
button, select but guard them with the aria-hidden check too.

@amhsirak amhsirak merged commit af8ddaf into develop Aug 25, 2025
1 check passed
@coderabbitai coderabbitai bot mentioned this pull request Oct 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Type: Bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants