@RohitR311 (Contributor) commented Oct 8, 2025

What does this PR do?

  1. Increases the depth limit to capture deeper elements
  2. Checks for child elements and filters out elements if children exist
  3. Fixes an issue where certain attributes (aria-label, name, etc.) prevented similar elements from being captured
  4. Fixes the rrweb rendering error

Summary by CodeRabbit

  • New Features

    • Smarter element detection and deeper traversal for more accurate clicks/selections (e.g., images require source, links require target).
    • Cross-list and path-based matching to choose more consistent selectors across similar elements.
  • Bug Fixes

    • Fewer mis-selections in dense DOMs, shadow DOMs, frames, and tables; improved fallback to actionable descendants.
    • Snapshot replay more faithful after reconstruction change.
  • Performance

    • Caching reduces repeated element checks for snappier interactions.
  • Refactor

    • More consistent attribute-based selector logic across competing element lists.

@RohitR311 added the "Type: Bug" (Something isn't working) and "Type: Enhancement" (Improvements to existing features) labels Oct 8, 2025

coderabbitai bot commented Oct 8, 2025

Walkthrough

Refactors selector generation in src/helpers/clientSelectorGenerator.ts (tightened meaningful-element rules, added cross-list attribute consistency, path utilities, caching, atomic-child fallback, raised MAX_DEPTH to 20, and stronger shadow/frame guards) and disables rrweb CSS hacking (hackCss: false) in src/components/recorder/DOMBrowserRenderer.tsx.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Selector generation core: src/helpers/clientSelectorGenerator.ts | Reworked meaningful-element detection (e.g., img meaningful only with src, a only with href, elements with children not automatically meaningful); added isMeaningfulElementCached and caches; added defensive guards and expanded shadow/frame error handling. |
| Cross-list & path utilities: src/helpers/clientSelectorGenerator.ts | Added getElementPath, findCorrespondingElement, and isAttributeCommonAcrossLists to gate emitting attributes/classes across competing lists; updated getCommonClassesAcrossLists signature and caching. |
| Depth & traversal policy: src/helpers/clientSelectorGenerator.ts | Increased MAX_DEPTH from 12 → 20; removed dynamic depth-by-density logic; updated shadow/iframe traversal checks to use fixed depth. |
| Deepest-element & atomic-child fallback: src/helpers/clientSelectorGenerator.ts | getDeepestElementFromPoint now falls back to findAtomicChildAtPoint when deepest element isn't meaningful; implemented subtree/shadow DOM atomic-child search with bounding-box checks. |
| Grouping & table handling: src/helpers/clientSelectorGenerator.ts | Adjusted row/group detection to require meaningful children and align with new depth/meaningfulness rules; table-row grouping retained but criteria tightened. |
| rrweb rebuild option change: src/components/recorder/DOMBrowserRenderer.tsx | Set hackCss: false in rrweb rebuild call inside renderRRWebSnapshot to disable CSS-hacking during snapshot rebuild. |
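
The walkthrough mentions isMeaningfulElementCached and "caches" without showing how results are stored. One plausible shape is a per-object memo keyed by the element. The sketch below is an assumption, not the PR's code: memoizeByObject and the use of a WeakMap are illustrative choices.

```typescript
// Hypothetical sketch: memoize a per-element predicate.
// `memoizeByObject` is an illustrative name and does not come from the PR.
function memoizeByObject<K extends object>(
  predicate: (key: K) => boolean
): (key: K) => boolean {
  // WeakMap entries are released once the key object is garbage collected,
  // so cached results do not keep detached nodes alive.
  const cache = new WeakMap<K, boolean>();
  return (key: K): boolean => {
    const cached = cache.get(key);
    if (cached !== undefined) return cached;
    const result = predicate(key);
    cache.set(key, result);
    return result;
  };
}

// Usage: count how often the underlying check actually runs.
let predicateCalls = 0;
const isMeaningfulCachedSketch = memoizeByObject((el: { tag: string }) => {
  predicateCalls++;
  return el.tag === "img";
});
const sampleNode = { tag: "img" };
const firstLookup = isMeaningfulCachedSketch(sampleNode);
const secondLookup = isMeaningfulCachedSketch(sampleNode); // served from the cache
```

A cache like this is only safe while the predicate's inputs stay stable. If attributes such as src or href can change between lookups, the cache would need to be cleared when that happens.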

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Page
  participant SelectorGen as clientSelectorGenerator
  participant Shadow as Shadow/Subtree

  User->>Page: Click / Point(x,y)
  Page->>SelectorGen: getDeepestElementFromPoint(x,y)
  SelectorGen->>Page: elementsFromPoint(x,y)
  alt deepest is meaningful
    SelectorGen-->>Page: return deepestElement
  else not meaningful
    SelectorGen->>Shadow: findAtomicChildAtPoint(deepest, bbox)
    Shadow-->>SelectorGen: atomicMeaningfulDescendant?
    alt found
      SelectorGen-->>Page: return atomicDescendant
    else not found
      SelectorGen-->>Page: return original deepest
    end
  end
sequenceDiagram
  autonumber
  participant Gen as Selector Builder
  participant Target as TargetElement
  participant Lists as OtherListElements
  participant Utils as Path/Attr Utils

  Gen->>Utils: getElementPath(Target)
  Gen->>Lists: findCorrespondingElement(root, path)*
  note over Gen,Lists: Align competitor elements by DOM path
  loop Candidate attributes/classes
    Gen->>Utils: isAttributeCommonAcrossLists(attrName, attrValue, Lists)
    alt common across lists
      Gen-->>Gen: emit attribute predicate
    else not common
      Gen-->>Gen: skip predicate
    end
  end
  Gen-->>Target: Final selector

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

Possibly related PRs

Suggested labels

Scope: Recorder

Suggested reviewers

  • amhsirak

Poem

Thump-thump — I hop and sift the DOM,
I check each href and src, I carry on.
I cache my carrots, follow paths with care,
Twenty steps deep, I find what's fair. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title Check | ❓ Inconclusive | The title "fix: extract clean data" is too generic and does not convey the substantial updates made to DOM selector extraction depth, meaningful element filtering, cross-list attribute consistency, or the rrweb snapshot rendering fix. It fails to highlight the main changes such as increasing traversal depth limits, refining element detection, and disabling CSS hacks. Because the phrasing is vague, reviewers cannot quickly grasp the primary objective of the pull request. | Please update the title to clearly summarize the primary changes by explicitly mentioning the increased depth limit, refined element extraction logic, or the rrweb rendering fix. For instance, “fix: increase DOM traversal depth and refine meaningful element extraction” or “fix: disable rrweb hackCss to prevent rendering errors.” A more descriptive title will help reviewers understand the scope and intent of the changes at a glance. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7645983 and ce2e28a.

📒 Files selected for processing (1)
  • src/helpers/clientSelectorGenerator.ts (9 hunks)
🔇 Additional comments (2)
src/helpers/clientSelectorGenerator.ts (2)

556-577: LGTM! Anchor check correctly positioned before children check.

The anchor with href check (lines 563-565) now executes before the children length check (lines 567-569), which correctly preserves anchors with nested content as meaningful elements. This addresses the concern raised in the previous review.
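
To make the ordering concrete, here is a minimal sketch of a meaningful-element check with the anchor test ahead of the children test. It is not the code from clientSelectorGenerator.ts. ElementLike and isMeaningfulSketch are hypothetical names, and the final "leaf with text" rule is an assumption added so the function returns something for other tags.

```typescript
// Minimal stand-in for the parts of an Element this sketch needs.
interface ElementLike {
  tagName: string;
  attrs: Record<string, string>;
  children: ElementLike[];
  text: string;
}

function isMeaningfulSketch(el: ElementLike): boolean {
  const tag = el.tagName.toLowerCase();
  // Images only count when they have a source.
  if (tag === "img") return Boolean(el.attrs["src"]);
  // The anchor check runs BEFORE the children check, so a link that wraps
  // nested markup is still meaningful. Anchors without href are rejected,
  // following the "a only with href" rule from the walkthrough.
  if (tag === "a") return Boolean(el.attrs["href"]);
  // Containers are not meaningful just because they have children.
  if (el.children.length > 0) return false;
  // Assumption: a leaf with visible text is worth capturing.
  return el.text.trim().length > 0;
}
```

If the children check ran first, a link wrapping a span or an image would return false before its href was ever inspected.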


2551-2551: LGTM! MAX_DEPTH increase aligns with PR objectives.

The depth limit has been correctly increased to 20 to capture deeper elements as stated in the PR objectives. The depth guards at lines 2566, 2575, and 2590 consistently enforce this limit across light DOM and shadow DOM traversal.

Also applies to: 2566-2566, 2575-2577, 2590-2601
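
As an illustration of a single fixed limit enforced in both trees, the sketch below walks a simplified node structure and applies the same depth guard to light and shadow children. TreeNode, shadowChildren, and collectWithinDepth are hypothetical names. Whether the real guards compare with > or >= is not visible in this thread, so the comparison here is an assumption.

```typescript
const MAX_DEPTH = 20; // value stated in the review above

interface TreeNode {
  name: string;
  children: TreeNode[];
  shadowChildren?: TreeNode[]; // stand-in for element.shadowRoot.children
}

// Collects node names, refusing to descend past maxDepth in either tree.
function collectWithinDepth(
  root: TreeNode,
  maxDepth: number = MAX_DEPTH
): string[] {
  const found: string[] = [];
  const walk = (node: TreeNode, depth: number): void => {
    if (depth > maxDepth) return; // same guard for light and shadow DOM
    found.push(node.name);
    for (const child of node.children) walk(child, depth + 1);
    for (const child of node.shadowChildren ?? []) walk(child, depth + 1);
  };
  walk(root, 0);
  return found;
}
```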



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/helpers/clientSelectorGenerator.ts (1)

976-993: Invert grouped parent/child filtering to keep leaf-most elements.

filterParentChildGroupedElements currently keeps parents that contain other grouped elements. Usually we want the deepest (leaf) grouped elements to avoid duplicate highlights.

Apply this diff:

   private filterParentChildGroupedElements(
     groupedElements: HTMLElement[]
   ): HTMLElement[] {
     const result: HTMLElement[] = [];
 
     for (const element of groupedElements) {
       const hasGroupedChild = groupedElements.some(
         (other) => other !== element && element.contains(other)
       );
 
-      if (hasGroupedChild) {
-        result.push(element);
-      }
+      if (!hasGroupedChild) {
+        result.push(element);
+      }
     }
 
     return result.length > 0 ? result : groupedElements;
   }
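
For reference, the effect of the inverted condition can be checked on a toy tree. The sketch uses plain objects instead of HTMLElement, and GroupedNode, containsNode, and keepLeafMost are hypothetical names. containsNode mirrors Node.contains, which also returns true for the node itself.

```typescript
interface GroupedNode {
  name: string;
  kids: GroupedNode[];
}

// Mirrors Node.contains: true for the node itself and any descendant.
function containsNode(outer: GroupedNode, inner: GroupedNode): boolean {
  return outer === inner || outer.kids.some((kid) => containsNode(kid, inner));
}

// Keeps only grouped nodes that do not contain another grouped node.
function keepLeafMost(grouped: GroupedNode[]): GroupedNode[] {
  const leaves = grouped.filter(
    (el) => !grouped.some((other) => other !== el && containsNode(el, other))
  );
  return leaves.length > 0 ? leaves : grouped;
}

// A card that wraps a title and a price, all three grouped together.
const titleNode: GroupedNode = { name: "title", kids: [] };
const priceNode: GroupedNode = { name: "price", kids: [] };
const cardNode: GroupedNode = { name: "card", kids: [titleNode, priceNode] };
const keptNames = keepLeafMost([cardNode, titleNode, priceNode]).map((n) => n.name);
```

With the original condition the same input would keep only the card, which is the parent-only behaviour the comment argues against.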
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95a4d3c and b645664.

📒 Files selected for processing (1)
  • src/helpers/clientSelectorGenerator.ts (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/helpers/clientSelectorGenerator.ts (1)
maxun-core/src/browserSide/scraper.js (1)
  • MAX_DEPTH (955-955)
🔇 Additional comments (1)
src/helpers/clientSelectorGenerator.ts (1)

2552-2552: Ensure MAX_DEPTH divergence is intentional
MAX_DEPTH increased from 4→20 in clientSelectorGenerator.ts, while upstream scraper and server selectors cap at 5/4. If this deeper traversal interacts with those components, confirm it won’t degrade performance or cause selector mismatches—consider aligning or documenting the rationale.

Comment on lines 2699 to 2715
   if (!addPositionToAll) {
-    const meaningfulAttrs = ["role", "type", "name", "src", "aria-label"];
+    const meaningfulAttrs = ["role", "type"];
     for (const attrName of meaningfulAttrs) {
       if (element.hasAttribute(attrName)) {
         const value = element.getAttribute(attrName)!.replace(/'/g, "\\'");
-        return `${tagName}[@${attrName}='${value}']`;
+        const isCommonAttribute = this.isAttributeCommonAcrossLists(
+          element,
+          attrName,
+          value,
+          otherListElements
+        );
+        if (isCommonAttribute) {
+          return `${tagName}[@${attrName}='${value}']`;
+        }
       }
     }
   }

⚠️ Potential issue | 🔴 Critical

Cross-list attribute check maps by absolute path; compute relative path to the list root and fail closed.

isAttributeCommonAcrossLists builds a path from the document root (getElementPath) then applies it starting from other list roots. This mismaps descendants and treats “corresponding element not found” as implicitly OK, letting non-common attributes pass. Compute a path relative to the current list root and require a match in every other list.

Apply these diffs:

@@
-    if (!addPositionToAll) {
-      const meaningfulAttrs = ["role", "type"];
+    if (!addPositionToAll) {
+      const meaningfulAttrs = ["role", "type"];
       for (const attrName of meaningfulAttrs) {
         if (element.hasAttribute(attrName)) {
           const value = element.getAttribute(attrName)!.replace(/'/g, "\\'");
-          const isCommonAttribute = this.isAttributeCommonAcrossLists(
-            element,
-            attrName,
-            value,
-            otherListElements
-          );
+          const isCommonAttribute = rootElement
+            ? this.isAttributeCommonAcrossLists(
+                element,
+                rootElement,
+                attrName,
+                value,
+                otherListElements
+              )
+            : false;
           if (isCommonAttribute) {
             return `${tagName}[@${attrName}='${value}']`;
           }
         }
       }
     }
@@
-    const testId = element.getAttribute("data-testid");
+    const testId = element.getAttribute("data-testid");
     if (testId && !addPositionToAll) {
-      const isCommon = this.isAttributeCommonAcrossLists(
-        element,
-        "data-testid",
-        testId,
-        otherListElements
-      );
+      const isCommon = rootElement
+        ? this.isAttributeCommonAcrossLists(
+            element,
+            rootElement,
+            "data-testid",
+            testId,
+            otherListElements
+          )
+        : false;
       if (isCommon) {
         return `${tagName}[@data-testid='${testId}']`;
       }
     }
@@
-    if (element.id && !element.id.match(/^\d/) && !addPositionToAll) {
-      const isCommon = this.isAttributeCommonAcrossLists(
-        element,
-        "id",
-        element.id,
-        otherListElements
-      );
+    if (element.id && !element.id.match(/^\d/) && !addPositionToAll) {
+      const isCommon = rootElement
+        ? this.isAttributeCommonAcrossLists(
+            element,
+            rootElement,
+            "id",
+            element.id,
+            otherListElements
+          )
+        : false;
       if (isCommon) {
         return `${tagName}[@id='${element.id}']`;
       }
     }
@@
-    if (!addPositionToAll) {
+    if (!addPositionToAll) {
       for (const attr of Array.from(element.attributes)) {
         if (
           attr.name.startsWith("data-") &&
           attr.name !== "data-testid" &&
           attr.name !== "data-mx-id" &&
           attr.value
         ) {
-          const isCommon = this.isAttributeCommonAcrossLists(
-            element,
-            attr.name,
-            attr.value,
-            otherListElements
-          );
+          const isCommon = rootElement
+            ? this.isAttributeCommonAcrossLists(
+                element,
+                rootElement,
+                attr.name,
+                attr.value,
+                otherListElements
+              )
+            : false;
           if (isCommon) {
             return `${tagName}[@${attr.name}='${attr.value}']`;
           }
         }
       }
     }
@@
-  private isAttributeCommonAcrossLists(
-    targetElement: HTMLElement,
-    attrName: string,
-    attrValue: string,
-    otherListElements: HTMLElement[]
-  ): boolean {
-    if (otherListElements.length === 0) {
-      return true;
-    }
-
-    const targetPath = this.getElementPath(targetElement);
-
-    for (const otherListElement of otherListElements) {
-      const correspondingElement = this.findCorrespondingElement(
-        otherListElement,
-        targetPath
-      );
-      if (correspondingElement) {
-        const otherValue = correspondingElement.getAttribute(attrName);
-        if (otherValue !== attrValue) {
-          return false;
-        }
-      }
-    }
-
-    return true;
-  }
+  private isAttributeCommonAcrossLists(
+    targetElement: HTMLElement,
+    rootElement: HTMLElement,
+    attrName: string,
+    attrValue: string,
+    otherListElements: HTMLElement[]
+  ): boolean {
+    if (otherListElements.length === 0) return true;
+    const relPath = this.getElementPath(targetElement, rootElement);
+    if (!relPath) return false; // cannot map reliably relative to root
+    for (const otherRoot of otherListElements) {
+      const corresponding = this.findCorrespondingElement(otherRoot, relPath);
+      if (!corresponding) return false;
+      if (corresponding.getAttribute(attrName) !== attrValue) return false;
+    }
+    return true;
+  }
@@
-  private getElementPath(element: HTMLElement): number[] {
-    const path: number[] = [];
-    let current: HTMLElement | null = element;
-
-    while (current && current.parentElement) {
-      const siblings = Array.from(current.parentElement.children);
-      path.unshift(siblings.indexOf(current));
-      current = current.parentElement;
-    }
-
-    return path;
-  }
+  private getElementPath(
+    element: HTMLElement,
+    rootElement: HTMLElement
+  ): number[] | null {
+    const path: number[] = [];
+    let current: HTMLElement | null = element;
+    // Walk up until rootElement (crossing shadow hosts if needed)
+    while (current && current !== rootElement) {
+      const parent =
+        current.parentElement ||
+        ((current.getRootNode() as ShadowRoot).host as HTMLElement | null);
+      if (!parent) return null;
+      const siblings = Array.from(parent.children);
+      const idx = siblings.indexOf(current);
+      if (idx < 0) return null;
+      path.unshift(idx);
+      current = parent;
+    }
+    return current === rootElement ? path : null;
+  }

Also applies to: 2717-2728, 2730-2740, 2749-2759, 2926-2953, 2954-2966, 2967-2983

🤖 Prompt for AI Agents
In src/helpers/clientSelectorGenerator.ts around lines 2699-2715 (and also apply
same change to ranges 2717-2728, 2730-2740, 2749-2759, 2926-2953, 2954-2966,
2967-2983): the current isAttributeCommonAcrossLists logic builds element paths
from the document root and then queries other list roots using those absolute
paths, which can map to wrong descendants and treats "not found" as success;
change it to compute the element path relative to the current list root (strip
or build the path starting from the list root node) and when checking
otherListElements require that the corresponding relative path resolves to an
element and that its attribute value equals the expected value in every other
list (i.e., treat missing matches as failure), returning false if any list does
not have a matching element or attribute value; update all mentioned code ranges
to use the relative-path lookup and strict matching.
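
The thread never shows findCorrespondingElement, so the sketch below pairs a root-relative path builder with a hypothetical resolver to show the fail-closed behaviour end to end. It runs on plain objects rather than DOM nodes. PathNode, makeNode, relativePath, resolvePath, and isAttrCommon are illustrative names, and shadow hosts are ignored for brevity.

```typescript
interface PathNode {
  parent: PathNode | null;
  children: PathNode[];
  attrs: Record<string, string>;
}

// Builds a node and wires up parent links (test helper for this sketch).
function makeNode(attrs: Record<string, string>, children: PathNode[] = []): PathNode {
  const node: PathNode = { parent: null, children, attrs };
  for (const child of children) child.parent = node;
  return node;
}

// Child-index path from `root` down to `el`; null when `el` is not inside `root`.
function relativePath(el: PathNode, root: PathNode): number[] | null {
  const path: number[] = [];
  let current: PathNode = el;
  while (current !== root) {
    const parent = current.parent;
    if (!parent) return null;
    const idx = parent.children.indexOf(current);
    if (idx < 0) return null;
    path.unshift(idx);
    current = parent;
  }
  return path;
}

// Replays a child-index path from another list root; null on any missing step.
function resolvePath(root: PathNode, path: number[]): PathNode | null {
  let current: PathNode = root;
  for (const idx of path) {
    const next: PathNode | undefined = current.children[idx];
    if (!next) return null;
    current = next;
  }
  return current;
}

// Fail closed: every other list must contain a matching element and value.
function isAttrCommon(
  el: PathNode,
  root: PathNode,
  name: string,
  value: string,
  otherRoots: PathNode[]
): boolean {
  if (otherRoots.length === 0) return true;
  const path = relativePath(el, root);
  if (!path) return false;
  return otherRoots.every((other) => {
    const match = resolvePath(other, path);
    return match !== null && match.attrs[name] === value;
  });
}
```

Because the path starts at the list root, the indices describe the same position inside each sibling list item rather than a position in the whole document.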

Comment on lines +3993 to +3999
    if (!this.isMeaningfulElementCached(deepestElement)) {
      const atomicChild = this.findAtomicChildAtPoint(deepestElement, x, y);
      if (atomicChild) {
        return atomicChild;
      }
    }

🛠️ Refactor suggestion | 🟠 Major

Atomic-child fallback misses shadow DOM descendants.

findAtomicChildAtPoint only walks light DOM children. If the meaningful descendant is in a shadow root, the fallback won’t find it.

Apply this diff:

-    if (!this.isMeaningfulElementCached(deepestElement)) {
-      const atomicChild = this.findAtomicChildAtPoint(deepestElement, x, y);
+    if (!this.isMeaningfulElementCached(deepestElement)) {
+      const atomicChild = this.findAtomicChildAtPoint(deepestElement, x, y);
       if (atomicChild) {
         return atomicChild;
       }
     }

And enhance traversal:

   private findAtomicChildAtPoint(
     parent: HTMLElement,
     x: number,
     y: number
   ): HTMLElement | null {
     const stack: HTMLElement[] = [parent];
     const visited = new Set<HTMLElement>();
 
     while (stack.length > 0) {
       const element = stack.pop()!;
       if (visited.has(element)) continue;
       visited.add(element);
 
       if (element !== parent && this.isMeaningfulElementCached(element)) {
         const rect = element.getBoundingClientRect();
         if (x >= rect.left && x <= rect.right && y >= rect.top && y <= rect.bottom) {
           return element;
         }
       }
 
-      for (let i = element.children.length - 1; i >= 0; i--) {
+      // Light DOM children (iterate from topmost in stacking order approximation)
+      for (let i = element.children.length - 1; i >= 0; i--) {
         const child = element.children[i] as HTMLElement;
         const rect = child.getBoundingClientRect();
         if (x >= rect.left && x <= rect.right && y >= rect.top && y <= rect.bottom) {
           stack.push(child);
         }
       }
+      // Shadow DOM children (if any)
+      if ((element as HTMLElement).shadowRoot) {
+        const shadowChildren = (element as HTMLElement).shadowRoot!.children;
+        for (let i = shadowChildren.length - 1; i >= 0; i--) {
+          const child = shadowChildren[i] as HTMLElement;
+          const rect = child.getBoundingClientRect();
+          if (x >= rect.left && x <= rect.right && y >= rect.top && y <= rect.bottom) {
+            stack.push(child);
+          }
+        }
+      }
     }
 
     return null;
   }

Also applies to: 4003-4034

🤖 Prompt for AI Agents
In src/helpers/clientSelectorGenerator.ts around lines 3993-3999 (and similarly
for 4003-4034), the atomic-child fallback uses findAtomicChildAtPoint which only
traverses light DOM children and therefore misses meaningful descendants inside
shadow roots; update the traversal to also check shadow DOM by: when examining a
candidate element at the point, if it has a shadowRoot call
shadowRoot.elementFromPoint(x,y) (or shadowRoot.deep element lookup) and
continue descending into that shadow root’s composed tree (including following
assigned nodes for slots and checking shadow-inserted children) before falling
back to light DOM children; ensure the function returns the deepest atomic
descendant found in either the shadowRoot or light children and mirror this
shadow-aware logic in the other affected block (4003-4034).
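
The elementFromPoint route suggested in the prompt can be sketched as a loop that keeps re-querying inside each open shadow root. In browsers both Document and ShadowRoot expose elementFromPoint(x, y). The structural types below (PointRoot, PointElement) and the function name deepElementFromPoint are hypothetical stand-ins so the logic can be read and exercised without a DOM. Slot handling and closed shadow roots are not covered.

```typescript
// Structural stand-ins for Document/ShadowRoot and Element.
interface PointRoot {
  elementFromPoint(x: number, y: number): PointElement | null;
}
interface PointElement {
  name: string;
  shadowRoot: PointRoot | null; // null when there is no open shadow root
}

// Descends through nested shadow roots to the innermost element at (x, y).
function deepElementFromPoint(
  root: PointRoot,
  x: number,
  y: number
): PointElement | null {
  let current = root.elementFromPoint(x, y);
  while (current && current.shadowRoot) {
    const inner = current.shadowRoot.elementFromPoint(x, y);
    // Stop when the shadow root has nothing at the point or reports the host itself.
    if (!inner || inner === current) break;
    current = inner;
  }
  return current;
}
```

A result from this loop would still need to pass the meaningful-element check before being returned as the atomic child.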

@amhsirak merged commit 67cd03c into develop Oct 21, 2025
1 check passed