feat: better, faster, smarter capture list #675

RohitR311 · 2025-07-06T11:13:15Z

What this PR does?

Improves the capture list action's accuracy in capturing data.

Summary by CodeRabbit

New Features
- Added advanced support for grouping structurally similar DOM elements, enabling group-aware highlighting and selection in the browser window.
- Introduced XPath-based selector generation and extraction, improving accuracy and flexibility for complex and nested DOM structures, including shadow DOM and iframes.
- Enhanced list extraction and highlighting to work seamlessly with grouped elements and improved attribute selection for both text and list modes.
Improvements
- Simplified network resource handling and DOM snapshot logic for better performance and reliability.
- Streamlined client-side list extraction and selector management, with improved user notifications for extraction issues.
- Optimized event handling and highlighter logic to reduce unnecessary processing and provide clearer visual feedback.
- Unified and enhanced scraping logic to support XPath selectors alongside CSS, with improved context traversal including shadow DOM and frames.
- Enhanced styling of rendered DOM snapshots with richer CSS including font embedding and scrollbar hiding.
- Extended global state management to track DOM mode and snapshot data for improved UI responsiveness.
Bug Fixes
- Improved handling of closed pages and edge cases during data extraction and DOM streaming.
Chores
- Added cleanup routines for selector grouping data to maintain optimal performance.

coderabbitai · 2025-07-06T11:13:20Z

Caution

Review failed

The pull request is closed.

Walkthrough

This update refactors backend DOM snapshotting to conditionally handle DOM streaming and screenshot modes with improved lifecycle management and cleanup. The frontend recorder and browser components gain group-aware DOM element highlighting and selection with XPath-based selectors. Extraction helpers unify and enhance XPath support and simplify list data extraction, improving robustness and flexibility in DOM mode.

Changes

File(s)	Change Summary
server/src/browser-management/classes/RemoteBrowser.ts	Added conditional handling of rerender events to emit DOM snapshots or screenshots based on rendering mode. Enhanced lifecycle methods to stop/start DOM streaming and screencast appropriately. Improved cleanup and event handling for DOM streaming and screenshot modes.
src/components/browser/BrowserWindow.tsx	Introduced group element support in DOM mode highlighting and selection, caching child selectors, handling grouped highlights with bounding rectangles and labels, and unified attribute selection flows for text and list modes.
src/components/recorder/DOMBrowserRenderer.tsx	Added support for cached child selectors in highlight generation and selection, refined highlight eligibility especially for groups, optimized event handling for capture modes, and enhanced DOM snapshot rendering with comprehensive CSS including font embedding and scrollbar hiding.
src/components/recorder/RightSidePanel.tsx	Added cleanup calls for selector generator after capture or discard actions, improved logging of selectors and extraction results, and added user notification on empty extraction outcomes.
src/helpers/clientListExtractor.ts	Refactored to unify element querying and extraction logic with robust XPath support, removed table-specific heuristics, simplified list data extraction, and improved attribute value extraction including URL resolution.
src/helpers/clientSelectorGenerator.ts	Added element grouping and fingerprinting system to identify structurally similar groups, switched selector generation to XPath with shadow DOM support, enhanced group-aware highlighter data generation, and added cleanup and utility methods for group and XPath handling.
server/src/workflow-management/classes/Generator.ts	Removed debug log for click coordinates and included `isDOMMode` flag in highlighter event data emitted to client.
maxun-core/src/browserSide/scraper.js	Overhauled scraping logic to support XPath selectors alongside CSS, unified iframe/frame/shadow DOM traversal, enhanced value extraction, added class similarity and fallback selector logic, and improved robustness and debugging output.
src/context/globalInfo.tsx	Added new state and context support for DOM mode and processed snapshots, including `isDOMMode`, `currentSnapshot`, and related setters and update functions for global DOM mode state management.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant BrowserWindow (Frontend)
    participant DOMBrowserRenderer
    participant clientSelectorGenerator
    participant RemoteBrowser (Backend)

    User->>BrowserWindow: Initiate element selection in DOM mode
    BrowserWindow->>DOMBrowserRenderer: Render highlights, pass cached selectors
    DOMBrowserRenderer->>clientSelectorGenerator: Generate highlight data (with group info)
    clientSelectorGenerator-->>DOMBrowserRenderer: Return selector, group info, rects
    DOMBrowserRenderer-->>BrowserWindow: On highlight/select, send group info
    BrowserWindow->>RemoteBrowser: Request DOM snapshot (debounced)
    RemoteBrowser-->>BrowserWindow: Return raw snapshot (no resource embedding)

Possibly related PRs

feat: faster screenshot output preview data #659: Adds direct screenshot capture and socket event handling in RemoteBrowser.ts, related to backend snapshot and streaming changes.
fix: simplify dom render + use minimal css for iframe #670: Modifies DOM snapshot rendering CSS in DOMBrowserRenderer.tsx, related to the enhanced CSS injection and styling changes here.
feat: recorder revamp #654: Introduced comprehensive DOM streaming alongside screencast in RemoteBrowser.ts, which this PR further refines and integrates with conditional snapshotting.

Suggested labels

Scope: Recorder

Suggested reviewers

amhsirak

Poem

A bunny with code on its mind,
Hopped through the DOM, new groups to find.
Gone are the caches, the CSS maze—
Now snapshots are snappy, in simpler ways.
With XPath and highlights, selectors anew,
This rabbit brings structure, for all to review!
🐇✨

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd00405 and 1517f26.

📒 Files selected for processing (2)

server/src/browser-management/classes/RemoteBrowser.ts (9 hunks)
src/components/recorder/DOMBrowserRenderer.tsx (11 hunks)

✨ Finishing Touches

📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 11

🔭 Outside diff range comments (1)

src/helpers/clientSelectorGenerator.ts (1)

532-987: Significant code duplication in getElementInformation method.

The two branches (lines 539-848 and 850-980) contain nearly identical code for extracting element information. The only difference is the initial element selection method.

Extract the common element information extraction logic into a separate method:

+private extractElementInfo(element: HTMLElement): ElementInfo {
+  const ownerDocument = element.ownerDocument;
+  const frameElement = ownerDocument?.defaultView?.frameElement;
+  const isIframeContent = Boolean(frameElement);
+  const isFrameContent = frameElement?.tagName === "FRAME";
+  
+  const containingShadowRoot = element.getRootNode() as ShadowRoot;
+  const isShadowRoot = containingShadowRoot instanceof ShadowRoot;
+  
+  let info: ElementInfo = {
+    tagName: element?.tagName ?? "",
+    isIframeContent,
+    isFrameContent,
+    isShadowRoot,
+  };
+  
+  // ... rest of the extraction logic ...
+  
+  return info;
+}

public getElementInformation = (
  iframeDoc: Document,
  coordinates: Coordinates,
  listSelector: string,
  getList: boolean
) => {
  try {
    let element: HTMLElement | null = null;
    
    if (!getList || listSelector !== "") {
      // Use existing getDeepestElementFromPoint logic
      const el = getDeepestElementFromPoint(coordinates.x, coordinates.y);
      if (el) {
        const { parentElement } = el;
        element = parentElement?.tagName === "A" ? parentElement : el;
      }
    } else {
      element = this.findGroupedContainerAtPoint(
        coordinates.x,
        coordinates.y,
        iframeDoc
      );
    }
    
    if (!element) return null;
    
    return this.extractElementInfo(element);
  } catch (error) {
    const { message, stack } = error as Error;
    console.error("Error while retrieving selector:", message);
    console.error("Stack:", stack);
  }
};

♻️ Duplicate comments (1)

src/helpers/clientSelectorGenerator.ts (1)

988-1326: Code duplication in getRect method similar to getElementInformation.

This method also contains duplicated logic between the two branches. Consider applying the same refactoring pattern to extract common rectangle calculation logic.

🧹 Nitpick comments (9)

src/components/recorder/RightSidePanel.tsx (1)
250-261: Remove or properly configure debug logging.

These console.log statements appear to be debug logging that should not be in production code. Consider using a proper logging framework with configurable log levels or removing these logs entirely.

If logging is needed for production debugging, use a logging utility:
-          Object.entries(fields).forEach(([key, field]) => {
-            if (field.selectorObj?.selector) {
-              const isFieldXPath =
-                field.selectorObj.selector.startsWith("//") ||
-                field.selectorObj.selector.startsWith("/");
-              console.log(
-                `Field "${key}" selector:`,
-                field.selectorObj.selector,
-                `(XPath: ${isFieldXPath})`
-              );
-            }
-          });
+          if (process.env.NODE_ENV === 'development') {
+            Object.entries(fields).forEach(([key, field]) => {
+              if (field.selectorObj?.selector) {
+                const isFieldXPath =
+                  field.selectorObj.selector.startsWith("//") ||
+                  field.selectorObj.selector.startsWith("/");
+                console.log(
+                  `Field "${key}" selector:`,
+                  field.selectorObj.selector,
+                  `(XPath: ${isFieldXPath})`
+                );
+              }
+            });
+          }
src/helpers/clientListExtractor.ts (2)
91-100: Consider extracting XPath detection logic to reduce duplication.

The XPath detection pattern selector.startsWith("//") || selector.startsWith("/") || selector.startsWith("./") is repeated multiple times. Consider extracting this into a private helper method.

Add a helper method:
+  private isXPathSelector(selector: string): boolean {
+    return selector.startsWith("//") || selector.startsWith("/") || selector.startsWith("./");
+  }
Then use it throughout the method:
-      if (
-        selector.startsWith("//") ||
-        selector.startsWith("/") ||
-        selector.startsWith("./")
-      ) {
+      if (this.isXPathSelector(selector)) {
Also applies to: 123-131, 148-156, 164-177, 186-197

331-355: Good enhancements to text extraction with data attribute fallbacks.

The logic for checking common data attributes is well thought out.

Apply optional chaining as suggested by static analysis:
-          if (dataValue && dataValue.trim()) {
+          if (dataValue?.trim()) {
server/src/browser-management/classes/RemoteBrowser.ts (1)
445-466: Simple and effective debounce mechanism for DOM snapshots.

The 8-second quiet period approach is a good simplification from the previous complex network monitoring.

Consider adding a size limit to prevent excessive memory usage in edge cases:
           this.pendingNetworkRequests.push(url);
+          // Limit array size to prevent memory issues
+          if (this.pendingNetworkRequests.length > 1000) {
+            this.pendingNetworkRequests = this.pendingNetworkRequests.slice(-500);
+          }
src/helpers/clientSelectorGenerator.ts (5)
203-208: Add safeguard against potential circular references in parent traversal.

While circular parent references are rare in the DOM, the depth calculation could theoretically loop infinitely. The current limit of 20 is good, but consider adding an additional check for circular references.
// Calculate element depth
let depth = 0;
let parent = element.parentElement;
+const visited = new Set<Element>();
while (parent && depth < 20) {
+  if (visited.has(parent)) break;
+  visited.add(parent);
  depth++;
  parent = parent.parentElement;
}
479-495: Document the table cell transformation behavior.

The automatic transformation of table cells (TD/TH) to rows (TR) in list mode without a selector is a significant behavior that should be documented.
// In list mode without selector, transform table cells to rows and prioritize grouped elements
if (this.getList === true && this.listSelector === "") {
+  // Transform table cells to their parent rows for better list extraction
+  // This allows selecting entire rows when clicking on individual cells
  const transformedElements: HTMLElement[] = [];

  elementsAtPoint.forEach((element) => {
    if (element.tagName === "TD" || element.tagName === "TH") {
2365-2755: Well-implemented XPath generation, but consider breaking down this large method.

The XPath selector generation is robust with good handling of classes and positions. However, at nearly 400 lines, this method would benefit from being split into smaller, focused methods.

Consider extracting:

XPath generation logic into a separate class/module

Shadow DOM path handling into its own method

The two main branches (with/without listSelector) into separate methods

2850-2887: Good fallback strategy for XPath evaluation.

The fallback XPath evaluation handles common patterns well. However, the warning message uses emoji which might not be appropriate for all environments.
-console.warn("⚠️ Could not parse XPath pattern:", xpath);
+console.warn("Could not parse XPath pattern:", xpath);
and
-console.error("❌ Fallback XPath evaluation also failed:", error);
+console.error("Fallback XPath evaluation also failed:", error);
1-3577: Consider splitting this large file into smaller, focused modules.

At 3577 lines, this file has grown quite large and handles multiple responsibilities:

Element fingerprinting and grouping

XPath selector generation

Element information extraction

Rectangle calculations

Child selector generation

Consider refactoring into separate modules:

ElementGroupAnalyzer - fingerprinting and grouping logic

XPathSelectorGenerator - XPath generation and evaluation

ElementInfoExtractor - element information extraction

DOMTraversalHelper - iframe/shadow DOM traversal utilities

This would improve maintainability, testability, and make the codebase easier to understand.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a75471c and fec6e81.

📒 Files selected for processing (6)

server/src/browser-management/classes/RemoteBrowser.ts (11 hunks)
src/components/browser/BrowserWindow.tsx (14 hunks)
src/components/recorder/DOMBrowserRenderer.tsx (11 hunks)
src/components/recorder/RightSidePanel.tsx (3 hunks)
src/helpers/clientListExtractor.ts (10 hunks)
src/helpers/clientSelectorGenerator.ts (15 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (4)

src/components/recorder/RightSidePanel.tsx (2)

src/helpers/clientListExtractor.ts (1)

clientListExtractor (559-559)

src/helpers/clientSelectorGenerator.ts (1)

clientSelectorGenerator (3576-3576)

src/components/recorder/DOMBrowserRenderer.tsx (1)

src/helpers/clientSelectorGenerator.ts (1)

clientSelectorGenerator (3576-3576)

src/helpers/clientListExtractor.ts (1)

maxun-core/src/browserSide/scraper.js (9)

selector (28-28)

nextElement (451-451)

part (219-219)

attrValue (558-558)

dataAttr (559-559)

style (563-563)

bgImage (564-564)

matches (566-566)

containers (753-753)

src/helpers/clientSelectorGenerator.ts (2)

maxun-core/src/browserSide/scraper.js (10)

depth (587-587)

allElements (692-692)

current (638-638)

parentElement (545-545)

position (734-734)

index (625-625)

selector (28-28)

sibling (626-626)

part (219-219)

lists (954-954)

src/context/browserActions.tsx (1)

ActionType (9-9)

🪛 Biome (1.9.4)

src/helpers/clientListExtractor.ts

[error] 348-348: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

🔇 Additional comments (13)

src/components/recorder/RightSidePanel.tsx (1)

272-280: Good improvements for user experience and state management.

The warning notification when no data is extracted helps users identify selector issues, and the cleanup call ensures cached group data is properly cleared after list capture completes.

Also applies to: 530-530

src/components/recorder/DOMBrowserRenderer.tsx (2)

212-223: Good performance optimization with early returns.

The early returns when not in capture mode prevent unnecessary DOM traversal and highlight calculations, improving performance during normal browsing.

Also applies to: 356-359

250-279: Clean and well-structured highlight logic.

The simplified highlighting logic with clear conditions for groups, list selectors, and pagination modes makes the code much more maintainable and easier to understand.

src/components/browser/BrowserWindow.tsx (1)

1216-1297: Excellent implementation of group element highlighting.

The visual distinction between individual elements (solid border) and group elements (dashed border with labels) provides clear feedback to users. The real-time coordinate mapping ensures accurate highlighting positions.

src/helpers/clientListExtractor.ts (3)

24-49: Well-implemented XPath evaluation method.

The error handling and owner document retrieval logic are properly implemented.

51-84: Solid implementation for evaluating multiple XPath results.

Proper use of XPath snapshot API with element type filtering.

419-500: Excellent refactoring of the extraction logic.

The simplified approach using indexed XPath selectors is much cleaner than the previous table-specific logic. Good error handling and informative logging.

server/src/browser-management/classes/RemoteBrowser.ts (5)

619-620: Good optimization moving rrweb injection to context level.

This ensures the script is available across all pages without repeated injection.

706-710: Good defensive check for closed pages.

Prevents potential errors when trying to extract data from closed pages.

1600-1604: Proper cleanup of network request timeout.

Good addition to prevent dangling timeouts when stopping DOM streaming.

1623-1652: Comprehensive error handling in shutdown sequence.

The try-catch wrapper ensures cleanup continues even if individual operations fail, which is crucial for proper resource management.

1538-1542: Simplified snapshot capture without resource embedding.

The removal of complex resource processing aligns well with the PR's goal of simplifying the capture mechanism.

src/helpers/clientSelectorGenerator.ts (1)

3492-3496: Verify cleanup usage across all consuming components

We found that clientSelectorGenerator.cleanup() is invoked in the following location, which correctly prevents stale data in that component:

src/components/recorder/RightSidePanel.tsx (called inside the useEffect cleanup)

Please review other parts of the app that import or use clientSelectorGenerator and ensure cleanup() is called appropriately (e.g., in unmount handlers or mode-change effects) to avoid potential memory leaks.

coderabbitai · 2025-07-06T11:17:27Z

src/components/recorder/RightSidePanel.tsx

+            5
          );


🛠️ Refactor suggestion

⚠️ Potential issue

Avoid hardcoding the extraction limit.

The extraction limit is hardcoded to 5, which reduces flexibility. This should be configurable based on user settings or use a named constant.

Consider using the limit from the list step configuration:

+ const listStep = browserSteps.find(step => step.id === currentListId); + const limit = listStep?.limit || 10; // Default to 10 if not set + const extractedData = clientListExtractor.extractListData( iframeDoc, listSelector, fields, - 5 + limit );

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

5

);

const listStep = browserSteps.find(step => step.id === currentListId);

const limit = listStep?.limit || 10; // Default to 10 if not set

const extractedData = clientListExtractor.extractListData(

iframeDoc,

listSelector,

fields,

limit

);

🤖 Prompt for AI Agents

In src/components/recorder/RightSidePanel.tsx around lines 267 to 268, the extraction limit is hardcoded to 5, which limits flexibility. Replace the hardcoded value with a configurable parameter, such as retrieving the limit from the list step configuration or defining a named constant that can be adjusted based on user settings. This will make the extraction limit adaptable without code changes.

coderabbitai · 2025-07-06T11:17:27Z

src/components/recorder/DOMBrowserRenderer.tsx

    selector: string;
    elementInfo: ElementInfo | null;
    childSelectors?: string[];
+    groupInfo?: any;


🛠️ Refactor suggestion

⚠️ Potential issue

Improve type safety by defining proper types for groupInfo.

The groupInfo parameter uses any type, which reduces type safety and makes the code harder to maintain.

Define a proper interface for groupInfo:

+interface GroupInfo { + isGroupElement: boolean; + groupSize: number; + groupElements: HTMLElement[]; + groupFingerprint?: ElementFingerprint; +} interface RRWebDOMBrowserRendererProps { // ... other props onHighlight?: (data: { rect: DOMRect; selector: string; elementInfo: ElementInfo | null; childSelectors?: string[]; - groupInfo?: any; + groupInfo?: GroupInfo; }) => void; onElementSelect?: (data: { rect: DOMRect; selector: string; elementInfo: ElementInfo | null; childSelectors?: string[]; - groupInfo?: any; + groupInfo?: GroupInfo; }) => void;

Also applies to: 117-117

🤖 Prompt for AI Agents

In src/components/recorder/DOMBrowserRenderer.tsx at lines 110 and 117, the groupInfo parameter is typed as any, which reduces type safety. Define a proper TypeScript interface describing the expected structure of groupInfo and replace the any type with this interface to improve maintainability and type safety.

coderabbitai · 2025-07-06T11:17:27Z

src/components/recorder/DOMBrowserRenderer.tsx

+            // Get the group info for the current highlight
+            const highlighterData =
+              clientSelectorGenerator.generateDataForHighlighter(
+                { x: iframeX, y: iframeY },
+                iframeDoc,
+                true,
+                cachedChildSelectors
+              );
+
            onElementSelect({
              rect: currentHighlight.rect,
              selector: currentHighlight.selector,
              elementInfo: currentHighlight.elementInfo,
-              childSelectors: currentHighlight.childSelectors || [],
+              childSelectors:
+                cachedChildSelectors.length > 0
+                  ? cachedChildSelectors
+                  : highlighterData?.childSelectors || [],
+              groupInfo: highlighterData?.groupInfo,
            });


🛠️ Refactor suggestion

⚠️ Potential issue

Remove redundant selector generation call.

The code retrieves highlighter data again to get groupInfo, but this information should already be available from the currentHighlight state that was set during the last mouse move.

Use the existing highlight data:

if (currentHighlight && onElementSelect) { - // Get the group info for the current highlight - const highlighterData = - clientSelectorGenerator.generateDataForHighlighter( - { x: iframeX, y: iframeY }, - iframeDoc, - true, - cachedChildSelectors - ); - onElementSelect({ rect: currentHighlight.rect, selector: currentHighlight.selector, elementInfo: currentHighlight.elementInfo, childSelectors: cachedChildSelectors.length > 0 ? cachedChildSelectors - : highlighterData?.childSelectors || [], - groupInfo: highlighterData?.groupInfo, + : currentHighlight.childSelectors || [], + groupInfo: currentHighlight.groupInfo, }); }

To support this, update the currentHighlight state interface to include groupInfo.

🤖 Prompt for AI Agents

In src/components/recorder/DOMBrowserRenderer.tsx around lines 398 to 416, remove the redundant call to clientSelectorGenerator.generateDataForHighlighter since groupInfo is already present in currentHighlight. Update the currentHighlight state interface to include groupInfo if it does not already, then use currentHighlight.groupInfo directly in the onElementSelect call instead of calling generateDataForHighlighter again.

coderabbitai · 2025-07-06T11:17:27Z

src/components/browser/BrowserWindow.tsx

+          let cleanedSelector = highlighterData.selector;
+
+          setListSelector(cleanedSelector);
+          notify(
+            `info`,
+            t(
+              "browser_window.attribute_modal.notifications.group_select_success",
+              {
+                count: highlighterData.groupInfo.groupSize,
+              }
+            ) ||
+              `Selected group with ${highlighterData.groupInfo.groupSize} similar elements`
+          );
+          setCurrentListId(Date.now());
+          setFields({});
+
+          socket?.emit("setGetList", { getList: true });
+          socket?.emit("listSelector", { selector: cleanedSelector });
+
+          return;
+        }


🛠️ Refactor suggestion

Extract XPath cleaning logic to a utility function.

The selector cleaning logic for removing index patterns is duplicated in multiple places.

Create a utility function:

const cleanSelector = (selector: string): string => { // Remove index patterns for grouping if (selector.includes("[") && selector.match(/\[\d+\]/)) { return selector.replace(/\[\d+\]/g, ""); } // Clean XPath selectors if (selector.includes("/")) { const xpathParts = selector.split("/").filter(part => part); const cleanedParts = xpathParts.map(part => part.replace(/\[\d+\]/g, "")); if (cleanedParts.length > 0) { return "//" + cleanedParts.join("/"); } } return selector; };

Then use it in both locations:

- let cleanedSelector = highlighterData.selector; + const cleanedSelector = cleanSelector(highlighterData.selector);

Also applies to: 988-999

🤖 Prompt for AI Agents

In src/components/browser/BrowserWindow.tsx around lines 738 to 758 and also lines 988 to 999, the logic for cleaning selectors by removing index patterns is duplicated. Extract this logic into a reusable utility function named cleanSelector that removes numeric index patterns from selectors and XPath strings as described. Replace the existing inline cleaning code in both locations with calls to this new cleanSelector function to avoid duplication and improve maintainability.

coderabbitai · 2025-07-06T11:17:27Z

src/components/browser/BrowserWindow.tsx

+    useEffect(() => {
+        document.addEventListener("mousemove", onMouseMove, false);
+        if (socket) {
+            socket.off("highlighter", highlighterHandler);
+            socket.on("highlighter", highlighterHandler);
+        }
+        return () => {
+            document.removeEventListener("mousemove", onMouseMove);
+            if (socket) {
+                socket.off("highlighter", highlighterHandler);
+            }
+        };
+    }, [socket, highlighterHandler, getList, listSelector]);
+


⚠️ Potential issue

Remove duplicate useEffect hook.

This useEffect hook is an exact duplicate of the one at lines 647-660. This will cause event listeners to be registered twice, leading to potential performance issues and unexpected behavior.

Remove this duplicate block entirely.

🤖 Prompt for AI Agents

In src/components/browser/BrowserWindow.tsx around lines 662 to 675, there is a duplicate useEffect hook that repeats the same event listener and socket setup logic as the one at lines 647 to 660. This causes duplicate event listeners and socket handlers. Remove the entire useEffect block from lines 662 to 675 to eliminate the duplication and prevent multiple registrations.

coderabbitai · 2025-07-06T11:17:27Z

src/components/browser/BrowserWindow.tsx

+    const handleClick = (e: React.MouseEvent<HTMLDivElement>) => {
+      if (highlighterData) {
+        let shouldProcessClick = false;
+
+        if (!isDOMMode && canvasRef?.current) {
+          const canvasRect = canvasRef.current.getBoundingClientRect();
+          const clickX = e.clientX - canvasRect.left;
+          const clickY = e.clientY - canvasRect.top;
+          const highlightRect = highlighterData.rect;
+          const mappedRect =
+            coordinateMapper.mapBrowserRectToCanvas(highlightRect);
+
+          shouldProcessClick =
+            clickX >= mappedRect.left &&
+            clickX <= mappedRect.right &&
+            clickY >= mappedRect.top &&
+            clickY <= mappedRect.bottom;
+        } else {
+          shouldProcessClick = true;
+        }
+
+        if (shouldProcessClick) {
+          const options = getAttributeOptions(
+            highlighterData.elementInfo?.tagName || "",
+            highlighterData.elementInfo
+          );
+
+          if (getText === true) {
            if (options.length === 1) {
-                const attribute = options[0].value;
-                const data =
-                    attribute === "href"
-                    ? highlighterData.elementInfo?.url || ""
-                    : attribute === "src"
-                    ? highlighterData.elementInfo?.imageUrl || ""
-                    : highlighterData.elementInfo?.innerText || "";
-
-                addTextStep(
-                    "",
-                    data,
-                    {
-                        selector: highlighterData.selector,
-                        tag: highlighterData.elementInfo?.tagName,
-                        shadow: highlighterData.elementInfo?.isShadowRoot,
-                        attribute,
-                    },
-                    currentTextActionId || `text-${crypto.randomUUID()}`
-                );
+              const attribute = options[0].value;
+              const data =
+                attribute === "href"
+                  ? highlighterData.elementInfo?.url || ""
+                  : attribute === "src"
+                  ? highlighterData.elementInfo?.imageUrl || ""
+                  : highlighterData.elementInfo?.innerText || "";
+
+              addTextStep(
+                "",
+                data,
+                {
+                  selector: highlighterData.selector,
+                  tag: highlighterData.elementInfo?.tagName,
+                  shadow: highlighterData.elementInfo?.isShadowRoot,
+                  attribute,
+                },
+                currentTextActionId || `text-${crypto.randomUUID()}`
+              );
            } else {
-                setAttributeOptions(options);
-                setSelectedElement({
-                    selector: highlighterData.selector,
-                    info: highlighterData.elementInfo,
-                });
-                setShowAttributeModal(true);
+              setAttributeOptions(options);
+              setSelectedElement({
+                selector: highlighterData.selector,
+                info: highlighterData.elementInfo,
+              });
+              setShowAttributeModal(true);
            }
-        }
+          }

-        if (paginationMode && getList) {
+          if (paginationMode && getList) {
            if (
-                paginationType !== "" &&
-                paginationType !== "scrollDown" &&
-                paginationType !== "scrollUp" &&
-                paginationType !== "none"
+              paginationType !== "" &&
+              paginationType !== "scrollDown" &&
+              paginationType !== "scrollUp" &&
+              paginationType !== "none"
            ) {
-                setPaginationSelector(highlighterData.selector);
-                notify(
-                    `info`,
-                    t(
-                    "browser_window.attribute_modal.notifications.pagination_select_success"
-                    )
-                );
-                addListStep(
-                    listSelector!,
-                    fields,
-                    currentListId || 0,
-                    currentListActionId || `list-${crypto.randomUUID()}`,
-                    { type: paginationType, selector: highlighterData.selector }
-                );
-                socket?.emit("setPaginationMode", { pagination: false });
+              setPaginationSelector(highlighterData.selector);
+              notify(
+                `info`,
+                t(
+                  "browser_window.attribute_modal.notifications.pagination_select_success"
+                )
+              );
+              addListStep(
+                listSelector!,
+                fields,
+                currentListId || 0,
+                currentListActionId || `list-${crypto.randomUUID()}`,
+                { type: paginationType, selector: highlighterData.selector }
+              );
+              socket?.emit("setPaginationMode", { pagination: false });
            }
            return;
-        }
+          }

-        if (getList === true && !listSelector) {
+          if (getList === true && !listSelector) {
            let cleanedSelector = highlighterData.selector;
-            if (cleanedSelector.includes("nth-child")) {
-                cleanedSelector = cleanedSelector.replace(/:nth-child\(\d+\)/g, "");
+            if (
+              cleanedSelector.includes("[") &&
+              cleanedSelector.match(/\[\d+\]/)
+            ) {
+              cleanedSelector = cleanedSelector.replace(/\[\d+\]/g, "");
            }

            setListSelector(cleanedSelector);
            notify(
-                `info`,
-                t("browser_window.attribute_modal.notifications.list_select_success")
+              `info`,
+              t(
+                "browser_window.attribute_modal.notifications.list_select_success"
+              )
            );
            setCurrentListId(Date.now());
            setFields({});
+          } else if (getList === true && listSelector && currentListId) {
+            const attribute = options[0].value;
+            const data =
+              attribute === "href"
+                ? highlighterData.elementInfo?.url || ""
+                : attribute === "src"
+                ? highlighterData.elementInfo?.imageUrl || ""
+                : highlighterData.elementInfo?.innerText || "";

-            socket?.emit("setGetList", { getList: true });
-            socket?.emit("listSelector", { selector: cleanedSelector });
-        } else if (getList === true && listSelector && currentListId) {
            if (options.length === 1) {
-                const attribute = options[0].value;
-                let currentSelector = highlighterData.selector;
-
-                if (currentSelector.includes(">")) {
-                    const [firstPart, ...restParts] = currentSelector
-                        .split(">")
-                        .map((p) => p.trim());
-                    const listSelectorRightPart = listSelector
-                        .split(">")
-                        .pop()
-                        ?.trim()
-                        .replace(/:nth-child\(\d+\)/g, "");
-
-                    if (
-                    firstPart.includes("nth-child") &&
-                    firstPart.replace(/:nth-child\(\d+\)/g, "") ===
-                        listSelectorRightPart
-                    ) {
-                    currentSelector = `${firstPart.replace(
-                        /:nth-child\(\d+\)/g,
-                        ""
-                    )} > ${restParts.join(" > ")}`;
-                    }
-                }
+              let currentSelector = highlighterData.selector;
+
+              if (currentSelector.includes("/")) {
+                const xpathParts = currentSelector
+                  .split("/")
+                  .filter((part) => part);
+                const cleanedParts = xpathParts.map((part) => {
+                  return part.replace(/\[\d+\]/g, "");
+                });

-                const data =
-                    attribute === "href"
-                    ? highlighterData.elementInfo?.url || ""
-                    : attribute === "src"
-                    ? highlighterData.elementInfo?.imageUrl || ""
-                    : highlighterData.elementInfo?.innerText || "";
-
-                const newField: TextStep = {
-                    id: Date.now(),
-                    type: "text",
-                    label: `Label ${Object.keys(fields).length + 1}`,
-                    data: data,
-                    selectorObj: {
-                        selector: currentSelector,
-                        tag: highlighterData.elementInfo?.tagName,
-                        shadow: highlighterData.elementInfo?.isShadowRoot,
-                        attribute,
-                    },
-                };
-
-                const updatedFields = {
-                    ...fields,
-                    [newField.id]: newField,
-                };
-
-                setFields(updatedFields);
-
-                if (listSelector) {
-                    addListStep(
-                        listSelector,
-                        updatedFields,
-                        currentListId,
-                        currentListActionId || `list-${crypto.randomUUID()}`,
-                        { type: "", selector: paginationSelector }
-                    );
+                if (cleanedParts.length > 0) {
+                  currentSelector = "//" + cleanedParts.join("/");
                }
+              }
+
+              const newField: TextStep = {
+                id: Date.now(),
+                type: "text",
+                label: `Label ${Object.keys(fields).length + 1}`,
+                data: data,
+                selectorObj: {
+                  selector: currentSelector,
+                  tag: highlighterData.elementInfo?.tagName,
+                  shadow: highlighterData.elementInfo?.isShadowRoot,
+                  attribute,
+                },
+              };
+
+              const updatedFields = {
+                ...fields,
+                [newField.id]: newField,
+              };
+
+              setFields(updatedFields);
+
+              if (listSelector) {
+                addListStep(
+                  listSelector,
+                  updatedFields,
+                  currentListId,
+                  currentListActionId || `list-${crypto.randomUUID()}`,
+                  { type: "", selector: paginationSelector }
+                );
+              }
            } else {
-            setAttributeOptions(options);
-            setSelectedElement({
+              setAttributeOptions(options);
+              setSelectedElement({
                selector: highlighterData.selector,
                info: highlighterData.elementInfo,
-            });
-            setShowAttributeModal(true);
-            }
-        }
-        },
-        [
-            getText,
-            getList,
-            listSelector,
-            paginationMode,
-            paginationType,
-            fields,
-            currentListId,
-            currentTextActionId,
-            currentListActionId,
-            addTextStep,
-            addListStep,
-            notify,
-            socket,
-            t,
-            paginationSelector,
-        ]
-    );
-
-
-    const handleClick = (e: React.MouseEvent<HTMLDivElement>) => {
-        if (highlighterData && canvasRef?.current) {
-            const canvasRect = canvasRef.current.getBoundingClientRect();
-            const clickX = e.clientX - canvasRect.left;
-            const clickY = e.clientY - canvasRect.top;
-
-            const highlightRect = highlighterData.rect;
-
-            const mappedRect = coordinateMapper.mapBrowserRectToCanvas(highlightRect);
-            if (
-                clickX >= mappedRect.left &&
-                clickX <= mappedRect.right &&
-                clickY >= mappedRect.top &&
-                clickY <= mappedRect.bottom
-            ) {
-
-                const options = getAttributeOptions(highlighterData.elementInfo?.tagName || '', highlighterData.elementInfo);
-
-                if (getText === true) {
-                    if (options.length === 1) {
-                        // Directly use the available attribute if only one option is present
-                        const attribute = options[0].value;
-                        const data = attribute === 'href' ? highlighterData.elementInfo?.url || '' :
-                            attribute === 'src' ? highlighterData.elementInfo?.imageUrl || '' :
-                                highlighterData.elementInfo?.innerText || '';
-
-                        addTextStep('', data, {
-                            selector: highlighterData.selector,
-                            tag: highlighterData.elementInfo?.tagName,
-                            shadow: highlighterData.elementInfo?.isShadowRoot,
-                            attribute,
-                        }, currentTextActionId || `text-${crypto.randomUUID()}`);
-                    } else {
-                        // Show the modal if there are multiple options
-                        setAttributeOptions(options);
-                        setSelectedElement({
-                            selector: highlighterData.selector,
-                            info: highlighterData.elementInfo,
-                        });
-                        setShowAttributeModal(true);
-                    }
-                }
-
-                if (paginationMode && getList) {
-                    // Only allow selection in pagination mode if type is not empty, 'scrollDown', or 'scrollUp'
-                    if (paginationType !== '' && paginationType !== 'scrollDown' && paginationType !== 'scrollUp' && paginationType !== 'none') {
-                        setPaginationSelector(highlighterData.selector);
-                        notify(`info`, t('browser_window.attribute_modal.notifications.pagination_select_success'));
-                        addListStep(listSelector!, fields, currentListId || 0, currentListActionId || `list-${crypto.randomUUID()}`, { type: paginationType, selector: highlighterData.selector });
-                        socket?.emit('setPaginationMode', { pagination: false });
-                    }
-                    return;
-                }
-
-                if (getList === true && !listSelector) {
-                    let cleanedSelector = highlighterData.selector;
-                    if (cleanedSelector.includes('nth-child')) {
-                        cleanedSelector = cleanedSelector.replace(/:nth-child\(\d+\)/g, '');
-                    }
-
-                    setListSelector(cleanedSelector);
-                    notify(`info`, t('browser_window.attribute_modal.notifications.list_select_success'));
-                    setCurrentListId(Date.now());
-                    setFields({});
-                } else if (getList === true && listSelector && currentListId) {
-                    const attribute = options[0].value;
-                    const data = attribute === 'href' ? highlighterData.elementInfo?.url || '' :
-                        attribute === 'src' ? highlighterData.elementInfo?.imageUrl || '' :
-                            highlighterData.elementInfo?.innerText || '';
-                    // Add fields to the list
-                    if (options.length === 1) {
-                        const attribute = options[0].value;
-                        let currentSelector = highlighterData.selector;
-
-                        if (currentSelector.includes('>')) {
-                            const [firstPart, ...restParts] = currentSelector.split('>').map(p => p.trim());
-                            const listSelectorRightPart = listSelector.split('>').pop()?.trim().replace(/:nth-child\(\d+\)/g, '');
-
-                            if (firstPart.includes('nth-child') && 
-                                firstPart.replace(/:nth-child\(\d+\)/g, '') === listSelectorRightPart) {
-                                currentSelector = `${firstPart.replace(/:nth-child\(\d+\)/g, '')} > ${restParts.join(' > ')}`;
-                            }
-                        }
-
-                        const newField: TextStep = {
-                            id: Date.now(),
-                            type: 'text',
-                            label: `Label ${Object.keys(fields).length + 1}`,
-                            data: data,
-                            selectorObj: {
-                                selector: currentSelector,
-                                tag: highlighterData.elementInfo?.tagName,
-                                shadow: highlighterData.elementInfo?.isShadowRoot,
-                                attribute
-                            }
-                        };
-
-                        const updatedFields = {
-                            ...fields,
-                            [newField.id]: newField
-                          };
-
-                        setFields(updatedFields);
-
-                        if (listSelector) {
-                            addListStep(
-                                listSelector, 
-                                updatedFields, 
-                                currentListId, 
-                                currentListActionId || `list-${crypto.randomUUID()}`,
-                                { type: '', selector: paginationSelector }
-                            );
-                        }
-
-                    } else {
-                        setAttributeOptions(options);
-                        setSelectedElement({
-                            selector: highlighterData.selector,
-                            info: highlighterData.elementInfo
-                        });
-                        setShowAttributeModal(true);
-                    }
-                }
+              });
+              setShowAttributeModal(true);
            }
+          }
        }
+      }
    };


🛠️ Refactor suggestion

⚠️ Potential issue

Significant code duplication with handleDOMElementSelection.

The handleClick function duplicates most of the logic from handleDOMElementSelection, including text/list selection, attribute handling, and pagination logic. This violates the DRY principle and makes maintenance difficult.

Since handleDOMElementSelection is already designed to handle element selection in DOM mode, consider:

Remove handleClick entirely and use handleDOMElementSelection for both modes

Or extract the common logic into reusable functions

For option 1, update the onClick handler:

- const handleClick = (e: React.MouseEvent<HTMLDivElement>) => { - // ... entire function - }; return ( - <div onClick={handleClick} style={{ width: browserWidth }} id="browser-window"> + <div onClick={(e) => { + if (highlighterData && isDOMMode) { + handleDOMElementSelection(highlighterData); + } + }} style={{ width: browserWidth }} id="browser-window">

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In src/components/browser/BrowserWindow.tsx between lines 874 and 1041, the handleClick function duplicates much of the logic found in handleDOMElementSelection, causing maintenance issues. To fix this, remove the handleClick function entirely and update the onClick handler to use handleDOMElementSelection for both DOM and non-DOM modes. Alternatively, extract the shared logic into reusable helper functions and call those from both handlers to avoid duplication.

coderabbitai · 2025-07-06T11:17:27Z

src/helpers/clientListExtractor.ts

+      if (selector.startsWith("//") || selector.startsWith("/")) {
+        return this.evaluateXPathAll(rootElement, selector);


⚠️ Potential issue

Inconsistent XPath detection pattern compared to queryElement.

The XPath detection in queryElementAll only checks for // and / prefixes but misses ./ (relative XPath). This is inconsistent with queryElement which checks all three patterns.

Update all XPath detection patterns to include ./:

- if (selector.startsWith("//") || selector.startsWith("/")) { + if (selector.startsWith("//") || selector.startsWith("/") || selector.startsWith("./")) {

Also applies to: 242-242, 262-262, 270-270, 290-290

🤖 Prompt for AI Agents

In src/helpers/clientListExtractor.ts at lines 215-216, 242, 262, 270, and 290, the XPath detection logic in queryElementAll only checks for selectors starting with '//' or '/', missing the './' prefix for relative XPath expressions. To fix this inconsistency with queryElement, update the conditions to also check if the selector starts with './' in all these locations, ensuring all three XPath patterns ('//', '/', './') are detected properly.

coderabbitai · 2025-07-06T11:17:28Z

src/helpers/clientListExtractor.ts

+  // Helper method for single XPath evaluation
+  private evaluateXPathSingle = (
+    document: Document,
+    xpath: string
+  ): Element | null => {
+    try {
+      const result = document.evaluate(
+        xpath,
+        document,
+        null,
+        XPathResult.FIRST_ORDERED_NODE_TYPE,
+        null
+      );

-            if (element) {
-              const value = this.extractValue(element, attribute);
-              if (value !== null && value !== "") {
-                record[label] = value;
-              } else {
-                console.warn(
-                  `❌ No value for ${label} in container ${containerIndex + 1}`
-                );
-                record[label] = "";
-              }
-            } else {
-              console.warn(
-                `❌ Element not found for ${label} with selector:`,
-                selector
-              );
-              record[label] = "";
-            }
-          }
+      const element = result.singleNodeValue as Element | null;

-          if (Object.values(record).some((value) => value !== "")) {
-            nonTableData.push(record);
-          }
-        }
+      if (!element) {
+        console.warn(`❌ XPath found no element for: ${xpath}`);
      }

-      // Combine and limit results
-      const extractedData = [...tableData, ...nonTableData].slice(0, limit);
-
-      return extractedData;
+      return element;
    } catch (error) {
-      console.error("Error in client-side extractListData:", error);
-      return [];
+      console.error("❌ XPath evaluation failed:", xpath, error);
+      return null;
    }
  };


🛠️ Refactor suggestion

Remove redundant evaluateXPathSingle method.

This method is nearly identical to the existing evaluateXPath method. You can reuse evaluateXPath by passing document as the root element.

Replace the usage on line 464:

- element = this.evaluateXPathSingle(iframeDocument, indexedSelector); + element = this.evaluateXPath(iframeDocument, indexedSelector);

Then remove the evaluateXPathSingle method entirely (lines 531-556).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// Helper method for single XPath evaluation

private evaluateXPathSingle = (

document: Document,

xpath: string

): Element | null => {

try {

const result = document.evaluate(

xpath,

document,

null,

XPathResult.FIRST_ORDERED_NODE_TYPE,

null

);

if (element) {

const value = this.extractValue(element, attribute);

if (value !== null && value !== "") {

record[label] = value;

} else {

console.warn(

`❌ No value for ${label} in container ${containerIndex + 1}`

);

record[label] = "";

}

} else {

console.warn(

`❌ Element not found for ${label} with selector:`,

selector

);

record[label] = "";

}

}

const element = result.singleNodeValue as Element | null;

if (Object.values(record).some((value) => value !== "")) {

nonTableData.push(record);

}

}

if (!element) {

console.warn(`❌ XPath found no element for: ${xpath}`);

}

// Combine and limit results

const extractedData = [...tableData, ...nonTableData].slice(0, limit);

return extractedData;

return element;

} catch (error) {

console.error("Error in client-side extractListData:", error);

return [];

console.error("❌ XPath evaluation failed:", xpath, error);

return null;

}

};

// (The entire evaluateXPathSingle method has been removed)

🤖 Prompt for AI Agents

In src/helpers/clientListExtractor.ts from lines 531 to 556, remove the redundant evaluateXPathSingle method as it duplicates the existing evaluateXPath method. Instead, update the code at line 464 to call evaluateXPath with the document as the root element parameter. After replacing all usages, delete the entire evaluateXPathSingle method to avoid duplication and simplify the code.

coderabbitai · 2025-07-06T11:17:28Z

src/helpers/clientSelectorGenerator.ts

+    const matched = document.evaluate(
+      xpath,
+      document,
+      null,
+      XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
+      null
+    );
+    const matchedSet = new Set<HTMLElement>();
+    for (let i = 0; i < matched.snapshotLength; i++) {
+      matchedSet.add(matched.snapshotItem(i) as HTMLElement);
+    }
+
+    return xpath;
+  }


⚠️ Potential issue

Incomplete XPath validation logic.

The post-validation creates a Set of matched elements but doesn't verify that it matches exactly the group elements.

// 6. Post-validate that XPath matches all elements const matched = document.evaluate( xpath, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null ); const matchedSet = new Set<HTMLElement>(); for (let i = 0; i < matched.snapshotLength; i++) { matchedSet.add(matched.snapshotItem(i) as HTMLElement); } + +// Verify the XPath matches all group elements and only group elements +const groupSet = new Set(elements); +if (matchedSet.size !== groupSet.size) { + console.warn(`XPath matched ${matchedSet.size} elements but group has ${groupSet.size}`); + return xpath; // Still return but with warning +} + +for (const element of elements) { + if (!matchedSet.has(element)) { + console.warn("XPath does not match all group elements"); + break; + } +} return xpath;

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const matched = document.evaluate(

xpath,

document,

null,

XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,

null

);

const matchedSet = new Set<HTMLElement>();

for (let i = 0; i < matched.snapshotLength; i++) {

matchedSet.add(matched.snapshotItem(i) as HTMLElement);

}

return xpath;

}

const matched = document.evaluate(

xpath,

document,

null,

XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,

null

);

const matchedSet = new Set<HTMLElement>();

for (let i = 0; i < matched.snapshotLength; i++) {

matchedSet.add(matched.snapshotItem(i) as HTMLElement);

}

// Verify the XPath matches all group elements and only group elements

const groupSet = new Set(elements);

if (matchedSet.size !== groupSet.size) {

console.warn(

`XPath matched ${matchedSet.size} elements but group has ${groupSet.size}`

);

return xpath; // Still return but with warning

}

for (const element of elements) {

if (!matchedSet.has(element)) {

console.warn("XPath does not match all group elements");

break;

}

}

return xpath;

}

🤖 Prompt for AI Agents

In src/helpers/clientSelectorGenerator.ts around lines 3417 to 3430, the code collects matched elements into a Set but does not verify if these matched elements exactly correspond to the intended group elements. To fix this, add logic to compare the matchedSet with the expected group elements to ensure they are identical before returning the xpath. If they do not match exactly, handle the discrepancy appropriately, such as returning null or throwing an error.

coderabbitai · 2025-07-06T11:17:28Z

src/helpers/clientSelectorGenerator.ts

+    elementFingerprints.forEach((fingerprint, element) => {
+      if (processedElements.has(element)) return;
+
+      const currentGroup = [element];
+      processedElements.add(element);
+
+      // Find similar elements
+      elementFingerprints.forEach((otherFingerprint, otherElement) => {
+        if (processedElements.has(otherElement)) return;
+
+        const similarity = this.calculateSimilarity(
+          fingerprint,
+          otherFingerprint
+        );
+        if (similarity >= this.groupingConfig.similarityThreshold) {
+          currentGroup.push(otherElement);
+          processedElements.add(otherElement);
+        }
+      });


🛠️ Refactor suggestion

Performance concern: O(n²) complexity in element grouping.

The nested iteration over all elements to find similar ones could be slow on pages with many elements. For 1000 elements, this results in ~500,000 comparisons.

Consider implementing a more efficient grouping algorithm:

Hash-based grouping: First group elements by tag name and normalized classes to reduce comparison space

Locality-sensitive hashing: Use fingerprint signatures to create hash buckets

Early termination: Stop searching once enough group members are found

Example optimization:

-elementFingerprints.forEach((fingerprint, element) => { - if (processedElements.has(element)) return; - - const currentGroup = [element]; - processedElements.add(element); - - // Find similar elements - elementFingerprints.forEach((otherFingerprint, otherElement) => { - if (processedElements.has(otherElement)) return; - - const similarity = this.calculateSimilarity( - fingerprint, - otherFingerprint - ); - if (similarity >= this.groupingConfig.similarityThreshold) { - currentGroup.push(otherElement); - processedElements.add(otherElement); - } - }); +// Group by tag name first to reduce comparison space +const elementsByTag = new Map<string, Array<[HTMLElement, ElementFingerprint]>>(); +elementFingerprints.forEach((fingerprint, element) => { + const tag = fingerprint.tagName; + if (!elementsByTag.has(tag)) { + elementsByTag.set(tag, []); + } + elementsByTag.get(tag)!.push([element, fingerprint]); +}); + +// Process each tag group separately +elementsByTag.forEach((tagElements) => { + tagElements.forEach(([element, fingerprint]) => { + if (processedElements.has(element)) return; + + const currentGroup = [element]; + processedElements.add(element); + + // Only compare within same tag group + for (const [otherElement, otherFingerprint] of tagElements) { + if (processedElements.has(otherElement)) continue; + + const similarity = this.calculateSimilarity(fingerprint, otherFingerprint); + if (similarity >= this.groupingConfig.similarityThreshold) { + currentGroup.push(otherElement); + processedElements.add(otherElement); + } + } + // ... rest of group processing + }); +});

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In src/helpers/clientSelectorGenerator.ts around lines 340 to 358, the current nested loops cause O(n²) complexity by comparing every element fingerprint with all others, leading to performance issues on large pages. To fix this, first group elements by tag name and normalized class names to limit comparisons within smaller groups, then apply similarity checks only inside these groups. Optionally, implement locality-sensitive hashing on fingerprints to bucket similar elements and add early termination in the inner loop once a sufficient group size is reached, reducing unnecessary comparisons.

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (4)

src/components/browser/BrowserWindow.tsx (4)
150-150: Still using inline type definitions despite previous feedback.

The past review comments recommended defining proper interfaces for complex types like highlighterData and groupInfo, but this change continues to use inline type definitions. This makes the code harder to maintain and reuse.

Also applies to: 172-176

662-675: Remove duplicate useEffect hook.

This useEffect hook is an exact duplicate of the one at lines 647-660. This will cause event listeners to be registered twice, leading to potential performance issues and unexpected behavior.

Remove this duplicate block entirely.

874-1041: Significant code duplication with handleDOMElementSelection.

The handleClick function duplicates most of the logic from handleDOMElementSelection, including text/list selection, attribute handling, and pagination logic. This violates the DRY principle and makes maintenance difficult.

Since handleDOMElementSelection is already designed to handle element selection in DOM mode, consider:

Remove handleClick entirely and use handleDOMElementSelection for both modes

Or extract the common logic into reusable functions

988-999: Extract XPath cleaning logic to a utility function.

The selector cleaning logic for removing index patterns is duplicated in multiple places.

Create a utility function:
const cleanSelector = (selector: string): string => {
  // Remove index patterns for grouping
  if (selector.includes("[") && selector.match(/\[\d+\]/)) {
    return selector.replace(/\[\d+\]/g, "");
  }
  
  // Clean XPath selectors
  if (selector.includes("/")) {
    const xpathParts = selector.split("/").filter(part => part);
    const cleanedParts = xpathParts.map(part => part.replace(/\[\d+\]/g, ""));
    if (cleanedParts.length > 0) {
      return "//" + cleanedParts.join("/");
    }
  }
  
  return selector;
};

🧹 Nitpick comments (2)

src/components/browser/BrowserWindow.tsx (2)
488-513: Complex but correct group element handling.

The logic for mapping group element coordinates is comprehensive and appears correct. However, this coordinate mapping logic is quite complex and could benefit from extraction to a helper function for better maintainability.

Consider extracting the coordinate mapping logic:
const mapGroupElementCoordinates = (groupElements: HTMLElement[], iframeRect: DOMRect) => {
  return groupElements.map(element => {
    const elementRect = element.getBoundingClientRect();
    return {
      element,
      rect: new DOMRect(
        elementRect.x + iframeRect.left - IFRAME_BODY_PADDING,
        elementRect.y + iframeRect.top - IFRAME_BODY_PADDING,
        elementRect.width,
        elementRect.height
      ),
    };
  });
};
691-871: Consider breaking down this large function.

The handleDOMElementSelection function is quite comprehensive but at ~180 lines, it's quite large and handles multiple distinct concerns (pagination, group selection, list selection, text selection). Consider breaking it down into smaller, more focused functions for better maintainability and testability.

Consider extracting separate handlers:
const handlePaginationSelection = (highlighterData, paginationType) => { ... };
const handleGroupListSelection = (highlighterData) => { ... };
const handleListFieldSelection = (highlighterData) => { ... };
const handleTextSelection = (highlighterData) => { ... };

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fec6e81 and afe52a1.

📒 Files selected for processing (1)

src/components/browser/BrowserWindow.tsx (14 hunks)

🔇 Additional comments (4)

src/components/browser/BrowserWindow.tsx (4)

315-329: Good implementation of child selector caching.

The logic properly clears cached selectors and regenerates them when the list selector changes. The conditional check for currentSnapshot and iframe document ensures the operation only runs when the DOM is available.

442-445: Good performance optimization.

Adding an early return when neither getText nor getList is active prevents unnecessary processing of highlighter data when no capture modes are enabled.

1216-1296: Excellent group highlighting implementation.

The enhanced DOM mode highlighting provides clear visual feedback for both individual elements and group selections. The use of dashed borders for group elements and numbered labels helps users understand they're selecting a list. The conditional rendering logic properly handles different selection states.

1310-1310: Good prop addition for cached selector optimization.

Passing cachedChildSelectors to the DOMBrowserRenderer allows the renderer to leverage the cached selector data, which should improve performance by avoiding repeated selector calculations.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (4)

src/components/browser/BrowserWindow.tsx (4)

150-150: Define proper interfaces for complex types.

The inline type definition for highlighterData makes the code harder to maintain. Consider extracting it to a proper interface as suggested in the previous review.

656-669: Remove duplicate useEffect hook.

This useEffect hook duplicates the one at lines 641-654, causing event listeners to be registered twice.

868-1035: Significant code duplication with handleDOMElementSelection.

The handleClick function duplicates most of the logic from handleDOMElementSelection. Consider removing it and using handleDOMElementSelection for both modes as suggested in the previous review.

982-993: Extract XPath cleaning logic to a utility function.

The selector cleaning logic is duplicated here and in handleDOMElementSelection. Extract it to a reusable utility function as suggested in the previous review.

🧹 Nitpick comments (4)

server/src/browser-management/classes/RemoteBrowser.ts (1)
445-465: Solid debounced network monitoring implementation.

The network request tracking and timeout-based snapshot triggering is well implemented. Consider adding a maximum limit to pendingNetworkRequests array to prevent potential memory issues if many requests accumulate.
  this.pendingNetworkRequests.push(url);
+ 
+ // Prevent memory leaks by limiting tracked requests
+ if (this.pendingNetworkRequests.length > 100) {
+   this.pendingNetworkRequests = this.pendingNetworkRequests.slice(-50);
+ }
maxun-core/src/browserSide/scraper.js (3)
697-703: Use optional chaining for cleaner code.

Consider using optional chaining for better readability.

Apply this diff to use optional chaining:
 for (const attr of dataAttributes) {
   const dataValue = element.getAttribute(attr);
-  if (dataValue && dataValue.trim()) {
+  if (dataValue?.trim()) {
     textContent = dataValue.trim();
     break;
   }
 }
936-960: Consider escaping regex special characters in pattern matching.

The pattern matching logic might fail if listSelector contains regex special characters. Consider escaping the pattern before using it in replace operations.
 const createIndexedXPath = (
   childSelector,
   listSelector,
   containerIndex
 ) => {
+  // Escape regex special characters
+  const escapeRegex = (str) => str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+  
   // Check if the child selector contains the list selector pattern
   if (childSelector.includes(listSelector.replace("//", ""))) {
     // Replace the list selector part with indexed version
     const listPattern = listSelector.replace("//", "");
+    const escapedPattern = escapeRegex(listPattern);
     const indexedListSelector = `(${listSelector})[${containerIndex}]`;

     const indexedSelector = childSelector.replace(
-      `//${listPattern}`,
+      `//${escapedPattern}`,
       indexedListSelector
     );
425-1263: Consider modularizing the scrapeList function for better maintainability.

The scrapeList function is quite large (300+ lines). Consider extracting the XPath-based extraction logic (lines 1000-1047) and CSS-based table extraction logic (lines 1076-1224) into separate helper functions to improve readability and testability.

Example structure:
const extractXPathData = (containers, fields, limit, document) => {
  // XPath extraction logic
};

const extractCSSTableData = (containers, containerFields, limit) => {
  // CSS table extraction logic
};

window.scrapeList = async function ({ listSelector, fields, limit = 10 }) {
  // ... setup code ...
  
  if (isXPathSelector(listSelector)) {
    return extractXPathData(containers, fields, limit, document);
  }
  
  // ... CSS-based extraction ...
};

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 82152c5 and 9a065a3.

📒 Files selected for processing (3)

maxun-core/src/browserSide/scraper.js (2 hunks)
server/src/browser-management/classes/RemoteBrowser.ts (13 hunks)
src/components/browser/BrowserWindow.tsx (17 hunks)

🧰 Additional context used

🪛 Biome (1.9.4)

maxun-core/src/browserSide/scraper.js

[error] 616-616: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)

[error] 699-699: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

🔇 Additional comments (13)

src/components/browser/BrowserWindow.tsx (4)

308-324: LGTM! Well-implemented child selector caching.

The logic correctly populates cached child selectors when a list selector is set, with proper null checks and state clearing.

428-547: Well-structured group element handling in domHighlighterHandler.

The implementation correctly:

Guards against unnecessary processing when no actions are active

Maps group element rectangles to absolute coordinates

Maintains backward compatibility for non-group elements

Simplifies the highlighter data setting logic

Good refactoring!

1210-1291: Excellent implementation of group element highlighting!

The visual distinction between individual elements (solid border) and group elements (dashed border with labels) provides clear user feedback. The bounds checking ensures highlights stay within viewport boundaries.

1304-1304: Good optimization passing cached child selectors.

Passing cachedChildSelectors to the child component avoids redundant computation and improves performance.

server/src/browser-management/classes/RemoteBrowser.ts (8)

201-203: LGTM! Simplified network monitoring approach.

The replacement of complex network resource caching with a simple debounced mechanism using timeout and request tracking is a good simplification that should improve maintainability.

253-253: Good simplification of snapshot processing.

Returning raw snapshots without complex resource embedding aligns with the simplified approach and should improve performance while maintaining API compatibility.

590-594: Excellent use of rrweb's built-in capabilities.

Leveraging rrweb's inlineImages and collectFonts options instead of custom resource processing is a much cleaner approach that delegates complexity to the well-tested library.

610-618: Robust error handling for navigation scenarios.

The enhanced error handling properly distinguishes between expected navigation/closure events and actual errors, preventing unnecessary error emissions during normal operation.

652-655: Good cleanup practices.

Proper cleanup of the network request timeout prevents memory leaks and ensures no dangling timers remain active after stopping DOM streaming.

706-709: Good defensive programming.

Adding the page closure check prevents runtime errors and provides graceful degradation when attempting to extract data from a closed page.

1536-1536: Excellent robustness improvements.

The additional page closure checks, enhanced error handling in switchOff, and proper DOM/screenshot mode handling in changeTab significantly improve the reliability of the browser management system.

Also applies to: 1675-1703, 1825-1825, 1837-1843

619-619: Bundle File Path Verified – No Action Needed

The rrweb bundle is present at server/src/browser-management/classes/rrweb-bundle.js, and injecting it at the context level is both correct and efficient. Approving these changes.

maxun-core/src/browserSide/scraper.js (1)

963-1047: Well-architected unified scraping implementation!

The separation of XPath and CSS selector handling is clean and maintains backward compatibility while adding powerful new capabilities. The indexed XPath approach for container-based extraction is particularly elegant.

coderabbitai · 2025-07-06T16:21:11Z

server/src/browser-management/classes/RemoteBrowser.ts

+      // For each event, include userId to make sure events are handled for the correct browser
+      logger.log("debug", `Registering editor events for user: ${this.userId}`);

-        this.socket.on(`captureDirectScreenshot:${this.userId}`, async (settings) => {
-          logger.debug(`Direct screenshot capture requested for user ${this.userId}`);
+      this.socket.on(
+        `captureDirectScreenshot:${this.userId}`,
+        async (settings) => {
+          logger.debug(
+            `Direct screenshot capture requested for user ${this.userId}`
+          );
          await this.captureDirectScreenshot(settings);
-        });
+        }
+      );

-        // For backward compatibility
-        this.socket.on('captureDirectScreenshot', async (settings) => {
-          await this.captureDirectScreenshot(settings);
-        });
-
-        // Listen for specific events for this user
-        this.socket.on(`rerender:${this.userId}`, async () => {
-            logger.debug(`Rerender event received for user ${this.userId}`);
-            await this.makeAndEmitScreenshot();
-        });
-
-        // For backward compatibility, also listen to the general event
-        this.socket.on('rerender', async () => {
-            logger.debug(`General rerender event received, checking if for user ${this.userId}`);
-            await this.makeAndEmitScreenshot();
-        });
-
-        this.socket.on(`settings:${this.userId}`, (settings) => {
-            this.interpreterSettings = settings;
-            logger.debug(`Settings updated for user ${this.userId}`);
-        });
-
-        this.socket.on(`changeTab:${this.userId}`, async (tabIndex) => {
-            logger.debug(`Tab change to ${tabIndex} requested for user ${this.userId}`);
-            await this.changeTab(tabIndex);
-        });
-
-        this.socket.on(`addTab:${this.userId}`, async () => {
-            logger.debug(`New tab requested for user ${this.userId}`);
-            await this.currentPage?.context().newPage();
-            const lastTabIndex = this.currentPage ? this.currentPage.context().pages().length - 1 : 0;
-            await this.changeTab(lastTabIndex);
-        });
-
-        this.socket.on(`closeTab:${this.userId}`, async (tabInfo) => {
-            logger.debug(`Close tab ${tabInfo.index} requested for user ${this.userId}`);
-            const page = this.currentPage?.context().pages()[tabInfo.index];
-            if (page) {
-                if (tabInfo.isCurrent) {
-                    if (this.currentPage?.context().pages()[tabInfo.index + 1]) {
-                        // next tab
-                        await this.changeTab(tabInfo.index + 1);
-                    } else {
-                        //previous tab
-                        await this.changeTab(tabInfo.index - 1);
-                    }
-                }
-                await page.close();
-                logger.log(
-                    'debug',
-                    `Tab ${tabInfo.index} was closed for user ${this.userId}, new tab count: ${this.currentPage?.context().pages().length}`
-                );
+      // For backward compatibility
+      this.socket.on("captureDirectScreenshot", async (settings) => {
+        await this.captureDirectScreenshot(settings);
+      });
+
+      // Listen for specific events for this user
+      this.socket.on(`rerender:${this.userId}`, async () => {
+        logger.debug(`Rerender event received for user ${this.userId}`);
+        if (this.renderingMode === "dom") {
+          await this.makeAndEmitDOMSnapshot();
+        } else {
+          await this.makeAndEmitScreenshot();
+        }
+      });
+
+      this.socket.on("rerender", async () => {
+        logger.debug(
+          `General rerender event received, checking if for user ${this.userId}`
+        );
+        if (this.renderingMode === "dom") {
+          await this.makeAndEmitDOMSnapshot();
+        } else {
+          await this.makeAndEmitScreenshot();
+        }
+      });
+
+      this.socket.on(`settings:${this.userId}`, (settings) => {
+        this.interpreterSettings = settings;
+        logger.debug(`Settings updated for user ${this.userId}`);
+      });
+
+      this.socket.on(`changeTab:${this.userId}`, async (tabIndex) => {
+        logger.debug(
+          `Tab change to ${tabIndex} requested for user ${this.userId}`
+        );
+        await this.changeTab(tabIndex);
+      });
+
+      this.socket.on(`addTab:${this.userId}`, async () => {
+        logger.debug(`New tab requested for user ${this.userId}`);
+        await this.currentPage?.context().newPage();
+        const lastTabIndex = this.currentPage
+          ? this.currentPage.context().pages().length - 1
+          : 0;
+        await this.changeTab(lastTabIndex);
+      });
+
+      this.socket.on(`closeTab:${this.userId}`, async (tabInfo) => {
+        logger.debug(
+          `Close tab ${tabInfo.index} requested for user ${this.userId}`
+        );
+        const page = this.currentPage?.context().pages()[tabInfo.index];
+        if (page) {
+          if (tabInfo.isCurrent) {
+            if (this.currentPage?.context().pages()[tabInfo.index + 1]) {
+              // next tab
+              await this.changeTab(tabInfo.index + 1);
            } else {
-                logger.log('error', `Tab index ${tabInfo.index} out of range for user ${this.userId}`);
-            }
-        });
-
-        this.socket.on(`setViewportSize:${this.userId}`, async (data: { width: number, height: number }) => {
-            const { width, height } = data;
-            logger.log('debug', `Viewport size change to width=${width}, height=${height} requested for user ${this.userId}`);
-
-            // Update the browser context's viewport dynamically
-            if (this.context && this.browser) {
-                this.context = await this.browser.newContext({ viewport: { width, height } });
-                logger.log('debug', `Viewport size updated to width=${width}, height=${height} for user ${this.userId}`);
-            }
-        });
-
-        // For backward compatibility, also register the standard events
-        this.socket.on('settings', (settings) => this.interpreterSettings = settings);
-        this.socket.on('changeTab', async (tabIndex) => await this.changeTab(tabIndex));
-        this.socket.on('addTab', async () => {
-            await this.currentPage?.context().newPage();
-            const lastTabIndex = this.currentPage ? this.currentPage.context().pages().length - 1 : 0;
-            await this.changeTab(lastTabIndex);
-        });
-        this.socket.on('closeTab', async (tabInfo) => {
-            const page = this.currentPage?.context().pages()[tabInfo.index];
-            if (page) {
-                if (tabInfo.isCurrent) {
-                    if (this.currentPage?.context().pages()[tabInfo.index + 1]) {
-                        await this.changeTab(tabInfo.index + 1);
-                    } else {
-                        await this.changeTab(tabInfo.index - 1);
-                    }
-                }
-                await page.close();
+              //previous tab
+              await this.changeTab(tabInfo.index - 1);
            }
-        });
-        this.socket.on('setViewportSize', async (data: { width: number, height: number }) => {
-            const { width, height } = data;
-            if (this.context && this.browser) {
-                this.context = await this.browser.newContext({ viewport: { width, height } });
+          }
+          await page.close();
+          logger.log(
+            "debug",
+            `Tab ${tabInfo.index} was closed for user ${
+              this.userId
+            }, new tab count: ${this.currentPage?.context().pages().length}`
+          );
+        } else {
+          logger.log(
+            "error",
+            `Tab index ${tabInfo.index} out of range for user ${this.userId}`
+          );
+        }
+      });
+
+      this.socket.on(
+        `setViewportSize:${this.userId}`,
+        async (data: { width: number; height: number }) => {
+          const { width, height } = data;
+          logger.log(
+            "debug",
+            `Viewport size change to width=${width}, height=${height} requested for user ${this.userId}`
+          );
+
+          // Update the browser context's viewport dynamically
+          if (this.context && this.browser) {
+            this.context = await this.browser.newContext({
+              viewport: { width, height },
+            });
+            logger.log(
+              "debug",
+              `Viewport size updated to width=${width}, height=${height} for user ${this.userId}`
+            );
+          }
+        }
+      );
+
+      // For backward compatibility, also register the standard events
+      this.socket.on(
+        "settings",
+        (settings) => (this.interpreterSettings = settings)
+      );
+      this.socket.on(
+        "changeTab",
+        async (tabIndex) => await this.changeTab(tabIndex)
+      );
+      this.socket.on("addTab", async () => {
+        await this.currentPage?.context().newPage();
+        const lastTabIndex = this.currentPage
+          ? this.currentPage.context().pages().length - 1
+          : 0;
+        await this.changeTab(lastTabIndex);
+      });
+      this.socket.on("closeTab", async (tabInfo) => {
+        const page = this.currentPage?.context().pages()[tabInfo.index];
+        if (page) {
+          if (tabInfo.isCurrent) {
+            if (this.currentPage?.context().pages()[tabInfo.index + 1]) {
+              await this.changeTab(tabInfo.index + 1);
+            } else {
+              await this.changeTab(tabInfo.index - 1);
            }
-        });
+          }
+          await page.close();
+        }
+      });
+      this.socket.on(
+        "setViewportSize",
+        async (data: { width: number; height: number }) => {
+          const { width, height } = data;
+          if (this.context && this.browser) {
+            this.context = await this.browser.newContext({
+              viewport: { width, height },
+            });
+          }
+        }
+      );

-        this.socket.on('extractListData', async (data: { 
-            listSelector: string, 
-            fields: Record<string, any>,
-            currentListId: number, 
-            pagination: any 
+      this.socket.on(
+        "extractListData",
+        async (data: {
+          listSelector: string;
+          fields: Record<string, any>;
+          currentListId: number;
+          pagination: any;
        }) => {
-            if (this.currentPage) {
-                const extractedData = await this.extractListData(
-                    this.currentPage, 
-                    data.listSelector, 
-                    data.fields
-                );
-
-                this.socket.emit('listDataExtracted', {
-                    currentListId: data.currentListId,
-                    data: extractedData
-                });
-            }
-        });
+          if (this.currentPage) {
+            const extractedData = await this.extractListData(
+              this.currentPage,
+              data.listSelector,
+              data.fields
+            );
+
+            this.socket.emit("listDataExtracted", {
+              currentListId: data.currentListId,
+              data: extractedData,
+            });
+          }
+        }
+      );


🛠️ Refactor suggestion

Significant architectural improvement with room for optimization.

The user-specific event handling is a major improvement for multi-user support. However, the extensive code duplication between user-specific and general event handlers could be refactored for better maintainability.

Consider creating a helper function to reduce duplication:

+ private registerEventHandler(eventName: string, handler: (...args: any[]) => void) { + // Register user-specific event + this.socket.on(`${eventName}:${this.userId}`, handler); + // Register general event for backward compatibility + this.socket.on(eventName, handler); + }

Then use it to replace the duplicated event registrations. Also consider reducing the verbose logging for production environments.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In server/src/browser-management/classes/RemoteBrowser.ts between lines 1289 and 1463, there is significant duplication in socket event handlers for user-specific and general events. To fix this, create a helper function that registers an event handler with optional userId scoping, then replace the duplicated event registrations by calling this helper with appropriate parameters. Additionally, refactor the logging to be less verbose or conditional based on environment to improve maintainability and performance.

coderabbitai · 2025-07-06T16:21:11Z

maxun-core/src/browserSide/scraper.js

+            } catch (e) {
+              console.warn(
+                `Cannot access ${element.tagName.toLowerCase()} content:`,
+                e
+              );
+              continue;
+            }
+          } else {


⚠️ Potential issue

Remove unnecessary continue statement.

The continue statement at line 616 is unnecessary as it's the last statement in the loop iteration.

Apply this diff to remove the unnecessary continue:

} catch (e) { console.warn( `Cannot access ${element.tagName.toLowerCase()} content:`, e ); - continue; }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

} catch (e) {

console.warn(

`Cannot access ${element.tagName.toLowerCase()} content:`,

e

);

continue;

}

} else {

} catch (e) {

console.warn(

`Cannot access ${element.tagName.toLowerCase()} content:`,

e

);

}

} else {

🧰 Tools

🪛 Biome (1.9.4)

[error] 616-616: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)

🤖 Prompt for AI Agents

In maxun-core/src/browserSide/scraper.js around lines 611 to 618, the continue statement inside the catch block is redundant because it is the last statement in the loop iteration. Remove the continue statement to clean up the code without changing its behavior.

coderabbitai

Actionable comments posted: 0

🔭 Outside diff range comments (1)

src/components/recorder/DOMBrowserRenderer.tsx (1)

880-890: Fix undefined variable reference.

The code references minimalCSS which is not defined in the current scope. It should use the cssInjection variable that was constructed earlier.

Apply this fix:

         if (headTagRegex.test(rebuiltHTML)) {
           rebuiltHTML = rebuiltHTML.replace(
             "<head>",
-            `<head><base href="${snapshotData.baseUrl}">${minimalCSS}`
+            `<head>${cssInjection}`
           );
         } else if (rebuiltHTML.includes("<html>")) {
           rebuiltHTML = rebuiltHTML.replace(
             "<html>",
-            `<html><head><base href="${snapshotData.baseUrl}">${minimalCSS}</head>`
+            `<html><head>${cssInjection}</head>`
           );
         }

♻️ Duplicate comments (3)

src/components/recorder/RightSidePanel.tsx (1)

263-263: Avoid hardcoding the extraction limit.

The extraction limit is still hardcoded to 5, which reduces flexibility. This should be configurable based on user settings or use a named constant.

Consider using the limit from the list step configuration:
+          const listStep = browserSteps.find(step => step.id === currentListId);
+          const limit = listStep?.limit || 10; // Default to 10 if not set
+
           const extractedData = clientListExtractor.extractListData(
             iframeDoc,
             listSelector,
             fields,
-            5
+            limit
           );

src/components/recorder/DOMBrowserRenderer.tsx (2)

110-110: Improve type safety by defining proper types for groupInfo.

The groupInfo parameter uses any type, which reduces type safety and makes the code harder to maintain.

Define a proper interface for groupInfo:

+interface GroupInfo {
+  isGroupElement: boolean;
+  groupSize: number;
+  groupElements: HTMLElement[];
+  groupFingerprint?: ElementFingerprint;
+}

 interface RRWebDOMBrowserRendererProps {
   // ... other props
   onHighlight?: (data: {
     rect: DOMRect;
     selector: string;
     elementInfo: ElementInfo | null;
     childSelectors?: string[];
-    groupInfo?: any;
+    groupInfo?: GroupInfo;
   }) => void;
   onElementSelect?: (data: {
     rect: DOMRect;
     selector: string;
     elementInfo: ElementInfo | null;
     childSelectors?: string[];
-    groupInfo?: any;
+    groupInfo?: GroupInfo;
   }) => void;

Also applies to: 117-117

398-416: Remove redundant selector generation call.

The code retrieves highlighter data again to get groupInfo, but this information should already be available from the currentHighlight state that was set during the last mouse move.

Use the existing highlight data:

 if (currentHighlight && onElementSelect) {
-  // Get the group info for the current highlight
-  const highlighterData =
-    clientSelectorGenerator.generateDataForHighlighter(
-      { x: iframeX, y: iframeY },
-      iframeDoc,
-      true,
-      cachedChildSelectors
-    );
-
   onElementSelect({
     rect: currentHighlight.rect,
     selector: currentHighlight.selector,
     elementInfo: currentHighlight.elementInfo,
     childSelectors:
       cachedChildSelectors.length > 0
         ? cachedChildSelectors
-        : highlighterData?.childSelectors || [],
-    groupInfo: highlighterData?.groupInfo,
+        : currentHighlight.childSelectors || [],
+    groupInfo: currentHighlight.groupInfo,
   });
 }

To support this, update the currentHighlight state interface to include groupInfo.

🧹 Nitpick comments (1)

src/components/recorder/DOMBrowserRenderer.tsx (1)
800-870: Consider simplifying the comprehensive CSS injection.

While the enhanced CSS provides good styling coverage, the complexity might introduce unexpected side effects. Consider breaking this into smaller, more focused CSS blocks for easier maintenance.

The current implementation handles fonts, external stylesheets, scrollbar hiding, and general styling all in one large block. Consider modularizing:
+const createFontCSS = (fonts) => fonts.map(font => `...`).join('\n');
+const createScrollbarCSS = () => `/* Hide scrollbars */ ...`;
+const createBaseCSS = () => `/* Base styles */ ...`;
+
+const enhancedCSS = [
+  createBaseCSS(),
+  createScrollbarCSS(),
+  ...(snapshotData.resources.fonts?.length > 0 ? [createFontCSS(snapshotData.resources.fonts)] : []),
+  ...(snapshotData.resources.stylesheets?.length > 0 ? [externalCSS] : [])
+].join('\n\n');

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a065a3 and bd00405.

📒 Files selected for processing (3)

src/components/recorder/DOMBrowserRenderer.tsx (11 hunks)
src/components/recorder/RightSidePanel.tsx (11 hunks)
src/context/globalInfo.tsx (6 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (1)

src/components/recorder/DOMBrowserRenderer.tsx (1)

src/helpers/clientSelectorGenerator.ts (1)

clientSelectorGenerator (3576-3576)

🪛 Biome (1.9.4)

src/components/recorder/RightSidePanel.tsx

[error] 100-100: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

🔇 Additional comments (15)

src/components/recorder/RightSidePanel.tsx (6)

25-25: LGTM: Good addition of selector generator import.

The import is properly added and will be used for cleanup operations throughout the file.

57-57: Excellent: Improved state management with global store.

The integration with updateDOMMode from the global store provides better centralized state management for DOM mode and snapshot data.

88-88: LGTM: Consistent global DOM mode usage.

All DOM mode state updates now properly use the global updateDOMMode function, ensuring consistent state management across the application.

Also applies to: 94-94, 101-101, 116-116

246-257: Helpful debugging enhancement for XPath selectors.

The logging provides valuable insights into field selector types and XPath detection, which will aid in troubleshooting extraction issues.

268-276: Excellent user feedback for extraction issues.

The warning notification when no data is extracted provides valuable feedback to users about potential selector problems, improving the user experience.

460-460: Proper cleanup implementation.

The clientSelectorGenerator.cleanup() calls are strategically placed after capture operations complete or are discarded, ensuring proper state reset and preventing memory leaks.

Also applies to: 529-529, 659-659, 690-690, 709-709

src/context/globalInfo.tsx (4)

30-63: Well-structured ProcessedSnapshot interface.

The interface comprehensively covers all aspects of snapshot data including resources, metadata, and processing statistics. The typing is thorough and will help prevent runtime errors.

124-128: Excellent extension of GlobalInfo interface.

The addition of DOM mode state and snapshot management properties follows good TypeScript practices with proper nullable types and clear method signatures.

158-159: Proper state initialization.

The default values and state hooks are correctly implemented with appropriate initial values (false for isDOMMode and null for currentSnapshot).

Also applies to: 186-187

212-222: Smart DOM mode update logic.

The updateDOMMode function handles the relationship between DOM mode and snapshot data correctly - clearing the snapshot when DOM mode is disabled and allowing optional snapshot updates.

src/components/recorder/DOMBrowserRenderer.tsx (5)

101-101: Good addition of cachedChildSelectors support.

The new prop enables cached child selectors to influence highlight generation, which aligns with the PR's goal of improving capture accuracy.

Also applies to: 150-150

212-222: Efficient early exit optimization.

The early return when not in capture mode prevents unnecessary processing and improves performance, which is a good optimization.

228-229: Proper integration of cached child selectors.

The integration of cachedChildSelectors into the highlight data generation ensures consistent selector behavior across the highlighting system.

250-275: Improved highlighting logic for group elements.

The refactored logic properly handles group element highlighting in the first phase and child selector validation in the second phase, with helpful debug logging.

356-358: Good performance optimization.

Skipping mouse move processing when not in capture mode reduces unnecessary computation and improves overall performance.

RohitR311 added 6 commits July 6, 2025 16:14

feat: revamp selector generator with xpath logic

8bf929f

feat: add xpath highlighting logic

315cf94

feat: add child selector caching, revamp dom highlighting

33d9522

feat: extract data via xpath logic

369b56f

feat: revamp client list extractor with xpath logic

5f1084c

feat: rm network monitoring and caching

fec6e81

RohitR311 added Type: Enhancement Improvements to existing features Status: Work In Progess This issue/PR is actively being worked on labels Jul 6, 2025

coderabbitai bot reviewed Jul 6, 2025

View reviewed changes

fix: list selection translation

afe52a1

coderabbitai bot reviewed Jul 6, 2025

View reviewed changes

RohitR311 added 6 commits July 6, 2025 17:00

feat: add dom mode check

82152c5

feat: add update dom mode

a677cb9

feat: update register editor events

be42c1d

feat: xpath support for core extraction

9a065a3

feat: cleanup on discard and confirm

330997f

feat: add dom update methods

f21ad1e

RohitR311 requested a review from amhsirak July 6, 2025 16:18

RohitR311 removed the Status: Work In Progess This issue/PR is actively being worked on label Jul 6, 2025

coderabbitai bot reviewed Jul 6, 2025

View reviewed changes

amhsirak approved these changes Jul 6, 2025

View reviewed changes

Merge branch 'develop' into smart-list

bd00405

coderabbitai bot reviewed Jul 6, 2025

View reviewed changes

Merge branch 'develop' into smart-list

1517f26

RohitR311 merged commit 9681f93 into develop Jul 6, 2025
1 check was pending

coderabbitai bot mentioned this pull request Jul 15, 2025

feat(maxun-core): child extraction + deep shadow dom #694

Merged

-          );
+          const listStep = browserSteps.find(step => step.id === currentListId);
+          const limit = listStep?.limit || 10; // Default to 10 if not set
+          const extractedData = clientListExtractor.extractListData(
+            iframeDoc,
+            listSelector,
+            fields,
+            limit
+          );

		if (selector.startsWith("//") \|\| selector.startsWith("/")) {
		return this.evaluateXPathAll(rootElement, selector);

feat: better, faster, smarter capture list #675

feat: better, faster, smarter capture list #675

Uh oh!

Conversation

RohitR311 commented Jul 6, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jul 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jul 6, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jul 6, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jul 6, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jul 6, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jul 6, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jul 6, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jul 6, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jul 6, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jul 6, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jul 6, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jul 6, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jul 6, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

RohitR311 commented Jul 6, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jul 6, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)