Skip to content

Conversation

@RohitR311
Copy link
Collaborator

@RohitR311 RohitR311 commented Jul 15, 2025

Summary by CodeRabbit

  • New Features

    • Added robust support for shadow DOM elements in selector generation, highlighting, and data extraction.
    • Introduced similar element highlighting and improved visual cues when interacting with DOM elements.
    • Implemented caching and loading indicators for child selector computations, enhancing performance and user feedback.
  • Improvements

    • Enhanced attribute selection UI and loading overlays for a smoother user experience.
    • Optimized selector generation and matching, including advanced XPath handling and performance optimizations.
  • Bug Fixes

    • Improved handling of element selection and highlighting across shadow DOM boundaries.

@RohitR311 RohitR311 added the Type: Enhancement Improvements to existing features label Jul 15, 2025
@coderabbitai
Copy link

coderabbitai bot commented Jul 15, 2025

Walkthrough

This change introduces comprehensive support for shadow DOM in selector generation, DOM highlighting, and data extraction. It adds shadow DOM awareness to state, props, and internal logic across browser window, renderer, context, and helper modules. Selector generation, caching, and highlighter logic are refactored to robustly handle shadow roots, similar elements, and optimized XPath evaluation, with UI and control flow enhancements for improved user feedback.

Changes

File(s) Change Summary
src/components/browser/BrowserWindow.tsx Enhanced BrowserWindow with shadow DOM and similar element highlighting, child selector caching, loading overlays, and improved attribute modal UI. State and handler signatures updated for shadow DOM support.
src/components/recorder/DOMBrowserRenderer.tsx Added isCachingChildSelectors prop, extended highlight/selection callbacks with shadow DOM and similarity info, and updated rendering logic to handle caching state.
src/components/recorder/RightSidePanel.tsx Propagated isShadow property in list extraction settings and pagination objects within getListSettingsObject.
src/context/browserSteps.tsx Added isShadow to step interfaces and context, updated addListStep to support shadow DOM, and renamed shadow to isShadow in selectors.
src/helpers/clientListExtractor.ts Refactored for shadow DOM-aware XPath evaluation, renamed shadow to isShadow, and improved extraction logic to handle shadow roots stepwise.
src/helpers/clientSelectorGenerator.ts Major refactor: added shadow DOM support, caching, spatial indexing, optimized XPath generation, and new API for child selector generation and highlighter data. Deprecated and replaced legacy selector logic.
maxun-core/src/browserSide/scraper.js Replaced simple XPath evaluation with custom shadow DOM-aware XPath evaluator, added helper functions for parsing and matching XPath conditions, and updated scraping logic to support shadow DOM selectors.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant BrowserWindow
    participant DOMBrowserRenderer
    participant ClientSelectorGenerator
    participant ClientListExtractor

    User->>BrowserWindow: Interacts with DOM (hover/select)
    BrowserWindow->>DOMBrowserRenderer: Passes selection/highlight event
    DOMBrowserRenderer->>ClientSelectorGenerator: generateDataForHighlighter(coordinates, doc, ...)
    ClientSelectorGenerator->>ClientSelectorGenerator: Traverse DOM (including shadow roots)
    ClientSelectorGenerator->>BrowserWindow: Return selector, isShadow, similarElements
    BrowserWindow->>BrowserWindow: Update highlighterData state
    BrowserWindow->>DOMBrowserRenderer: Render highlights (with shadow DOM and similar elements)
    User->>BrowserWindow: Confirms selection
    BrowserWindow->>ClientListExtractor: extractListData(doc, selector, fields, isShadow)
    ClientListExtractor->>ClientListExtractor: Evaluate XPath stepwise (shadow DOM aware)
    ClientListExtractor->>BrowserWindow: Return extracted data
Loading

Possibly related PRs

  • getmaxun/maxun#275: Enhances XPath evaluation and element querying with shadow DOM and iframe traversal in scraper logic, closely related to shadow DOM support here.
  • getmaxun/maxun#675: Improves BrowserWindow DOM highlighting and child selector caching focusing on group elements, related but distinct from shadow DOM enhancements in this PR.
  • getmaxun/maxun#296: Adds shadow DOM support in scraping logic including XPath evaluation, strongly connected to the custom XPath evaluator changes here.

Suggested labels

Type: Feature, Scope: Ext

Suggested reviewers

  • amhsirak

Poem

In the warren of code where the shadow roots grow,
The bunny hops softly, selector in tow.
With caches and highlights, and elements found,
Through shadowy tunnels, it burrows around.
Now lists and their children are easy to see—
Hooray for the rabbit, and shadow DOM glee!

((\
( -.-)
o_(")(")

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🔭 Outside diff range comments (1)
src/components/browser/BrowserWindow.tsx (1)

723-735: Remove duplicate useEffect hook.

This useEffect is an exact duplicate of the one at lines 708-721. This will cause event listeners to be registered twice, potentially leading to performance issues and unexpected behavior.

Remove this entire duplicate useEffect block.

🧹 Nitpick comments (10)
src/helpers/clientListExtractor.ts (1)

439-544: Comprehensive shadow DOM XPath evaluation implementation

The new evaluateXPathSingle method provides robust shadow DOM support with proper handling of:

  • Indexed XPath expressions
  • Step-by-step traversal through shadow boundaries
  • Correct positional selector handling (1-based to 0-based conversion)

However, due to the complexity of this implementation, consider adding performance monitoring to ensure the step-by-step evaluation doesn't impact extraction performance on pages with deep shadow DOM trees.

Consider adding detailed JSDoc documentation for this method to explain the shadow DOM traversal algorithm, especially the handling of indexed paths and context switching between Document, Element, and ShadowRoot.

src/helpers/clientSelectorGenerator.ts (9)

132-132: Remove unused debounceMs property.

The debounceMs property in the performance configuration is defined but never used in the code.

Remove the unused property:

 private performanceConfig = {
   enableSpatialIndexing: true,
   maxSelectorBatchSize: 50,
   useElementCache: true,
-  debounceMs: 16, // ~60fps
 };

183-227: Refactor duplicated custom element check.

The custom element check tagName.includes("-") is duplicated and the exclusion logic can be simplified.

Extract the custom element check and simplify the logic:

 const tagName = element.tagName.toLowerCase();
-
-const isCustomElement = tagName.includes("-");
-
-const standardExcludeSelectors = [
-  "script",
-  "style",
-  "meta",
-  "link",
-  "title",
-  "head",
-];
-if (!isCustomElement && standardExcludeSelectors.includes(tagName)) {
-  return null;
-}
-
+const isCustomElement = tagName.includes("-");
+const isExcludedTag = !isCustomElement && this.groupingConfig.excludeSelectors.includes(tagName);
+
+if (isExcludedTag) return null;
-if (this.groupingConfig.excludeSelectors.includes(tagName)) return null;

 // ... rest of the method ...

 const relevantAttributes = Array.from(element.attributes)
   .filter((attr) => {
-    if (isCustomElement) {
-      return ![
-        "id",
-        "style",
-        "data-reactid",
-        "data-react-checksum",
-      ].includes(attr.name.toLowerCase());
-    } else {
-      return (
-        !["id", "style", "data-reactid", "data-react-checksum"].includes(
-          attr.name.toLowerCase()
-        ) &&
-        (!attr.name.startsWith("data-") ||
-          attr.name === "data-type" ||
-          attr.name === "data-role")
-      );
-    }
+    const excludedAttrs = ["id", "style", "data-reactid", "data-react-checksum"];
+    if (excludedAttrs.includes(attr.name.toLowerCase())) return false;
+    
+    if (!isCustomElement && attr.name.startsWith("data-") && 
+        attr.name !== "data-type" && attr.name !== "data-role") {
+      return false;
+    }
+    
+    return true;
   })

333-366: Consider performance optimizations for large DOM trees.

The getAllVisibleElementsWithShadow method performs visibility checks on every element, which could be expensive for large DOM trees.

Consider adding a depth limit and lazy evaluation:

-private getAllVisibleElementsWithShadow(doc: Document): HTMLElement[] {
+private getAllVisibleElementsWithShadow(doc: Document, maxDepth: number = 10): HTMLElement[] {
   const allElements: HTMLElement[] = [];
   const visited = new Set<HTMLElement>();
 
-  const traverseContainer = (container: Document | ShadowRoot) => {
+  const traverseContainer = (container: Document | ShadowRoot, depth: number = 0) => {
+    if (depth > maxDepth) {
+      console.warn(`Max shadow DOM depth ${maxDepth} reached`);
+      return;
+    }
+    
     try {
       const elements = Array.from(container.querySelectorAll("*")).filter(
         (el) => {
           const rect = el.getBoundingClientRect();
           return rect.width > 0 && rect.height > 0; // Only visible elements
         }
       ) as HTMLElement[];
 
       elements.forEach((element) => {
         if (!visited.has(element)) {
           visited.add(element);
           allElements.push(element);
 
           // Traverse shadow DOM if it exists
           if (element.shadowRoot) {
-            traverseContainer(element.shadowRoot);
+            traverseContainer(element.shadowRoot, depth + 1);
           }
         }
       });
     } catch (error) {
       console.warn(`⚠️ Error traversing container:`, error);
     }
   };
 
   // Start from main document
-  traverseContainer(doc);
+  traverseContainer(doc, 0);
 
   return allElements;
 }

2821-2824: Add more context to error logging.

The error logging lacks context about which selector caused the error.

 } catch (error) {
-  console.error("Error in getChildSelectors:", error);
+  console.error("Error in getChildSelectors for parent selector:", parentSelector, error);
   return [];
 }

3214-3214: Make performance timing conditional on debug mode.

The console.time/timeEnd calls should be conditional to avoid console pollution in production.

Consider adding a debug flag:

+private debugMode = false;
+
 // In precomputeSelectorMappings method:
-console.time("Precomputing selector mappings");
+if (this.debugMode) console.time("Precomputing selector mappings");
 
 // ... rest of the method ...
 
-console.timeEnd("Precomputing selector mappings");
+if (this.debugMode) console.timeEnd("Precomputing selector mappings");

Also applies to: 3252-3252


3323-3326: Use optional chaining for cleaner code.

The null check can be simplified using optional chaining.

-const cachedElements = this.selectorElementCache.get(selector);
-if (cachedElements && cachedElements.includes(targetElement)) {
+const cachedElements = this.selectorElementCache.get(selector);
+if (cachedElements?.includes(targetElement)) {
   matches.push(selector);
 }

3327-3330: Remove unnecessary continue statement.

The continue statement at the end of a loop iteration is redundant.

 } catch (error) {
-  continue;
+  // Skip invalid selectors
 }

3405-3408: Remove unnecessary continue statement.

The continue statement at the end of a loop iteration is redundant.

 } catch (error) {
-  continue;
+  // Skip invalid selectors
 }

1-4233: Consider splitting this large file into smaller, focused modules.

The ClientSelectorGenerator class has grown to over 4000 lines and handles many responsibilities including DOM traversal, XPath generation, caching, spatial indexing, and shadow DOM handling. This violates the Single Responsibility Principle.

Consider refactoring into separate modules:

  • ShadowDOMTraverser - Shadow DOM traversal logic
  • XPathGenerator - XPath generation and optimization
  • SelectorCache - Caching and spatial indexing logic
  • ElementMatcher - Element similarity and matching logic
  • GroupAnalyzer - Element grouping logic

This would improve maintainability, testability, and make the codebase easier to understand.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0a53d2a and b5531c2.

📒 Files selected for processing (6)
  • src/components/browser/BrowserWindow.tsx (18 hunks)
  • src/components/recorder/DOMBrowserRenderer.tsx (8 hunks)
  • src/components/recorder/RightSidePanel.tsx (1 hunks)
  • src/context/browserSteps.tsx (6 hunks)
  • src/helpers/clientListExtractor.ts (6 hunks)
  • src/helpers/clientSelectorGenerator.ts (25 hunks)
🧰 Additional context used
🪛 Biome (1.9.4)
src/helpers/clientSelectorGenerator.ts

[error] 3324-3325: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


[error] 3330-3332: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


[error] 3408-3410: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)

🔇 Additional comments (11)
src/components/recorder/RightSidePanel.tsx (1)

463-498: Consistent shadow DOM metadata propagation

The changes properly extend the list settings object to include the isShadow boolean property at all relevant levels (fields, pagination, and settings). This ensures shadow DOM context is preserved throughout the list extraction process.

src/context/browserSteps.tsx (1)

8-8: Well-structured shadow DOM support in browser steps

The changes consistently implement the isShadow property across all relevant interfaces and maintain proper state management. The rename from shadow to isShadow improves clarity, and the logic correctly preserves existing values when updating steps.

Also applies to: 25-30, 42-42, 74-82, 116-116, 138-138

src/components/recorder/DOMBrowserRenderer.tsx (2)

105-105: Proper shadow DOM support and caching mode integration

The implementation correctly adds isCachingChildSelectors prop to prevent rendering during child selector caching operations. Shadow DOM metadata (isShadow and similarElements) is properly propagated through highlight and selection callbacks.

Also applies to: 158-158, 249-250, 304-307, 411-411, 760-760


1087-1087: Cursor style remains enforced by global rule

The removal of !important on the inline style at line 1087 does not weaken the pointer cursor: the surrounding CSS block in the same file still applies

/* src/components/recorder/DOMBrowserRenderer.tsx:807–809 */
* {
  cursor: "pointer" !important;
}

This global rule ensures the pointer cursor is always shown, and there are no other conflicting overrides.

src/helpers/clientListExtractor.ts (2)

9-9: Consistent property naming for shadow DOM context

The rename from shadow to isShadow maintains consistency across the codebase and improves clarity.

Also applies to: 22-22, 329-329


546-672: Well-structured XPath condition matching

The helper methods queryInsideContext, parseXPathPart, and elementMatchesCondition provide comprehensive support for various XPath conditions including attributes, classes, text content, and element counts. The implementation correctly handles edge cases and provides good debugging output.

src/components/browser/BrowserWindow.tsx (5)

150-161: LGTM! Well-structured state extension for shadow DOM support.

The addition of optional properties for shadow DOM awareness and similar elements is properly typed and maintains backward compatibility.


1200-1257: Modal restructuring looks good.

The attribute selection modal has been restructured with improved layout and custom styling. The implementation maintains functionality while providing better visual organization.


1519-1547: Well-implemented loading overlay.

The loading overlay provides clear visual feedback during child selector caching and properly prevents user interaction. The styling is consistent with the application theme.


514-529: Excellent implementation of similar elements highlighting.

The coordinate mapping and rendering logic for similar elements is well-implemented. The visual differentiation with dashed borders and labels provides clear user feedback.

Also applies to: 1399-1445


175-184: Ignore unused setPendingNotification warning

The setPendingNotification setter is used later in this file (e.g. it’s called to clear the notification around line 348). The original comment’s claim that it’s never called is therefore incorrect—no further action is needed here.

Likely an incorrect or invalid review comment.

Comment on lines +1509 to +1514
<style>{`
@keyframes spin {
0% { transform: rotate(0deg); }
100% { transform: rotate(360deg); }
}
`}</style>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Remove duplicate CSS animation.

This CSS keyframe animation is already defined at lines 1292-1297. Having duplicate styles increases bundle size unnecessarily.

Remove this duplicate style block.

🤖 Prompt for AI Agents
In src/components/browser/BrowserWindow.tsx between lines 1509 and 1514, there
is a duplicate CSS keyframe animation named "spin" that is already defined at
lines 1292-1297. To reduce bundle size and avoid redundancy, remove this
duplicate style block entirely from lines 1509 to 1514.

Comment on lines 853 to 854
isShadow: highlighterData.elementInfo?.isShadowRoot,
attribute,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

Standardize shadow DOM flag usage.

The code inconsistently uses elementInfo?.isShadowRoot and highlighterData?.isShadow to determine if an element is in shadow DOM. This could lead to bugs if these values differ.

Standardize on using highlighterData?.isShadow as the source of truth:

-                isShadow: highlighterData.elementInfo?.isShadowRoot,
+                isShadow: highlighterData.isShadow,

Apply this change to all occurrences for consistency.

Also applies to: 908-908, 987-987, 1081-1081, 1135-1135, 1148-1148

🤖 Prompt for AI Agents
In src/components/browser/BrowserWindow.tsx at lines 853-854 and also at lines
908, 987, 1081, 1135, and 1148, replace all uses of elementInfo?.isShadowRoot
with highlighterData?.isShadow to standardize the shadow DOM flag usage. This
means wherever isShadowRoot is used to determine shadow DOM presence, update it
to use highlighterData?.isShadow instead, ensuring consistent and reliable
checks throughout the file.

Comment on lines +328 to +356
setTimeout(() => {
try {
const childSelectors =
clientSelectorGenerator.getChildSelectors(
iframeElement.contentDocument as Document,
listSelector
);

clientSelectorGenerator.precomputeChildSelectorMappings(
childSelectors,
iframeElement.contentDocument as Document
);

setCachedChildSelectors([]);
setCachedChildSelectors(childSelectors);
} catch (error) {
console.error("Error during child selector caching:", error);
} finally {
setIsCachingChildSelectors(false);

if (currentSnapshot) {
const iframeElement = document.querySelector(
"#dom-browser-iframe"
) as HTMLIFrameElement;
if (iframeElement?.contentDocument) {
const childSelectors = clientSelectorGenerator.getChildSelectors(
iframeElement.contentDocument,
listSelector
);
setCachedChildSelectors(childSelectors);
if (pendingNotification) {
notify(pendingNotification.type, pendingNotification.message);
setPendingNotification(null);
}
}
}
}, 100);
} else {
setIsCachingChildSelectors(false);
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Improve error handling and remove arbitrary timeout.

The child selector caching implementation has several issues:

  1. The 100ms timeout seems arbitrary and could cause race conditions
  2. Errors are only logged to console without notifying the user
  3. The pendingNotification is never set, making lines 347-350 dead code

Consider this improved implementation:

-            setTimeout(() => {
              try {
                const childSelectors =
                  clientSelectorGenerator.getChildSelectors(
                    iframeElement.contentDocument as Document,
                    listSelector
                  );

                clientSelectorGenerator.precomputeChildSelectorMappings(
                  childSelectors,
                  iframeElement.contentDocument as Document
                );

                setCachedChildSelectors(childSelectors);
              } catch (error) {
                console.error("Error during child selector caching:", error);
+               notify("error", "Failed to cache child selectors. Please try again.");
              } finally {
                setIsCachingChildSelectors(false);
-
-               if (pendingNotification) {
-                 notify(pendingNotification.type, pendingNotification.message);
-                 setPendingNotification(null);
-               }
              }
-            }, 100);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
setTimeout(() => {
try {
const childSelectors =
clientSelectorGenerator.getChildSelectors(
iframeElement.contentDocument as Document,
listSelector
);
clientSelectorGenerator.precomputeChildSelectorMappings(
childSelectors,
iframeElement.contentDocument as Document
);
setCachedChildSelectors([]);
setCachedChildSelectors(childSelectors);
} catch (error) {
console.error("Error during child selector caching:", error);
} finally {
setIsCachingChildSelectors(false);
if (currentSnapshot) {
const iframeElement = document.querySelector(
"#dom-browser-iframe"
) as HTMLIFrameElement;
if (iframeElement?.contentDocument) {
const childSelectors = clientSelectorGenerator.getChildSelectors(
iframeElement.contentDocument,
listSelector
);
setCachedChildSelectors(childSelectors);
if (pendingNotification) {
notify(pendingNotification.type, pendingNotification.message);
setPendingNotification(null);
}
}
}
}, 100);
} else {
setIsCachingChildSelectors(false);
}
}
try {
const childSelectors =
clientSelectorGenerator.getChildSelectors(
iframeElement.contentDocument as Document,
listSelector
);
clientSelectorGenerator.precomputeChildSelectorMappings(
childSelectors,
iframeElement.contentDocument as Document
);
setCachedChildSelectors(childSelectors);
} catch (error) {
console.error("Error during child selector caching:", error);
notify("error", "Failed to cache child selectors. Please try again.");
} finally {
setIsCachingChildSelectors(false);
}
🤖 Prompt for AI Agents
In src/components/browser/BrowserWindow.tsx lines 328 to 356, remove the
arbitrary 100ms setTimeout to avoid race conditions by executing the child
selector caching logic synchronously or using proper async handling. Enhance
error handling by notifying the user of errors instead of only logging to the
console. Also, remove or properly implement the pendingNotification logic since
it is currently never set and results in dead code.

Comment on lines +594 to +597
} catch (error) {
console.error("Error getting matching elements:", error);
return [];
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve error handling to distinguish between errors and empty results.

The catch block returns an empty array for all errors, making it difficult to distinguish between legitimate empty results and actual errors.

Consider logging the error type and potentially re-throwing for critical errors:

 } catch (error) {
-  console.error("Error getting matching elements:", error);
+  if (error instanceof SyntaxError || error instanceof DOMException) {
+    console.error("Invalid selector in getAllMatchingElements:", hoveredSelector, error);
+  } else {
+    console.error("Unexpected error in getAllMatchingElements:", error);
+  }
   return [];
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
} catch (error) {
console.error("Error getting matching elements:", error);
return [];
}
} catch (error) {
if (error instanceof SyntaxError || error instanceof DOMException) {
console.error("Invalid selector in getAllMatchingElements:", hoveredSelector, error);
} else {
console.error("Unexpected error in getAllMatchingElements:", error);
}
return [];
}
🤖 Prompt for AI Agents
In src/helpers/clientSelectorGenerator.ts around lines 594 to 597, the catch
block currently returns an empty array for all errors, which conflates actual
errors with legitimate empty results. Modify the error handling to log the error
type and message clearly, and re-throw critical errors instead of returning an
empty array. For non-critical errors, you can return an empty array but ensure
the distinction is clear in the logs.

@RohitR311 RohitR311 changed the title feat: child extraction + deep shadow dom feat(maxun-core): child extraction + deep shadow dom Jul 15, 2025
@RohitR311 RohitR311 added the Scope: Ext Issues/PRs related to core extraction label Jul 15, 2025
@amhsirak amhsirak merged commit e2d4073 into develop Jul 15, 2025
1 check was pending
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
maxun-core/src/browserSide/scraper.js (1)

538-637: Consider performance implications of custom XPath evaluation.

The enhanced evaluateXPath function adds valuable shadow DOM support, but the custom implementation could be significantly slower than native XPath evaluation, especially for complex queries or large DOM trees.

Consider these optimizations:

  1. Caching parsed XPath expressions to avoid repeated parsing:
+const xpathCache = new Map();

+const getCachedXPath = (xpath) => {
+  if (xpathCache.has(xpath)) {
+    return xpathCache.get(xpath);
+  }
+  const parsed = parseXPathExpression(xpath);
+  xpathCache.set(xpath, parsed);
+  return parsed;
+};
  1. Early exit conditions for common cases:
+// Quick check for simple tag-only selectors
+if (!xpath.includes('[') && !xpath.includes('(')) {
+  return document.querySelector(xpath.replace('//', '').replace('/', ' '));
+}
  1. Consider breaking this function into smaller, focused functions for better maintainability and testing.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2985b41 and bc94690.

📒 Files selected for processing (1)
  • maxun-core/src/browserSide/scraper.js (2 hunks)
🔇 Additional comments (5)
maxun-core/src/browserSide/scraper.js (5)

427-445: LGTM: Well-structured XPath query helper function.

The queryInsideContext function provides a clean interface for querying elements within a specific context using parsed XPath parts. The error handling and filtering logic are appropriate.


447-458: LGTM: Robust XPath parsing implementation.

The parseXPathPart function correctly extracts tag names and conditions from XPath expressions using appropriate regex patterns. The default handling for wildcards and condition parsing is well-implemented.


460-468: LGTM: Simple and effective condition matching.

The elementMatchesConditions function provides a clear interface for checking multiple conditions against an element. The early return pattern is efficient.


470-536: Comprehensive XPath condition handling with good coverage.

The elementMatchesCondition function handles a wide variety of XPath conditions including attributes, text content, class containment, and child counts. The regex patterns are well-crafted and the logic is sound.


1209-1209: Shadow DOM flag propagation verified

The isShadow flag is correctly defined on selector objects (e.g. in ClientListExtractor and clientSelectorGenerator), carried through the UI layers (BrowserStepsContext → RightSidePanel → BrowserWindow), and ultimately passed into the scraper’s fields config and consumed by evaluateXPath. No further changes are needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Ext Issues/PRs related to core extraction Type: Enhancement Improvements to existing features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants