Skip to content

Conversation

@amhsirak
Copy link
Member

@amhsirak amhsirak commented Dec 20, 2024

Summary by CodeRabbit

  • New Features

    • Enhanced iframe scraping capabilities across multiple components.
    • Improved selector handling for complex web page structures.
    • Added support for traversing and extracting data from nested iframes.
    • Introduced new filtering for selectors to refine workflow execution.
  • Bug Fixes

    • Resolved issues with URL resolution and element retrieval in iframe contexts.
    • Improved cross-origin access error handling during web scraping.
  • Refactor

    • Significant updates to selector and scraping logic.
    • Enhanced type definitions for more robust iframe selector management.

@amhsirak amhsirak marked this pull request as draft December 20, 2024 13:02
@amhsirak amhsirak added the Type: Feature New features label Dec 20, 2024
@coderabbitai
Copy link

coderabbitai bot commented Dec 20, 2024

Walkthrough

This pull request introduces comprehensive enhancements for handling iframes across multiple components of the application. The changes span browser-side scraping, server-side selector management, and type definitions, focusing on improving the ability to traverse and extract data from complex DOM structures containing iframes and shadow DOMs. The modifications introduce new methods for iframe selector handling, element retrieval, and cross-origin access management, creating a more robust mechanism for interacting with nested iframe content.

Changes

File Change Summary
maxun-core/src/browserSide/scraper.js Enhanced findAllElements for iframe and shadow DOM support, modified getElementValue for relative URL resolution, refactored scrapeSchema and scrapeList for better data gathering.
maxun-core/src/interpret.ts Added removeSpecialSelectors method to filter out selectors with :>> or >>, updated runLoop to utilize this new method.
server/src/types/index.ts Introduced IframeSelector interface, updated Selectors to include iframeSelector and allow null for shadowSelector.
server/src/workflow-management/selector.ts Added getDeepestElementFromPoint for improved element retrieval in nested iframes, expanded element information with iframe context details.
server/src/workflow-management/utils.ts Updated getBestSelectorForAction to prioritize iframeSelector.
src/components/organisms/BrowserWindow.tsx Added isIframeContent property to ElementInfo, modified highlighterHandler to manage iframe content.

Sequence Diagram

sequenceDiagram
    participant Browser
    participant Scraper
    participant Interpreter
    participant Selector

    Browser->>Scraper: Initiate scraping
    Scraper->>Scraper: Find elements with iframe selectors
    Scraper->>Interpreter: Process workflow
    Interpreter->>Interpreter: Remove special selectors
    Interpreter->>Selector: Retrieve element information
    Selector->>Selector: Traverse nested iframes
    Selector-->>Interpreter: Return element details
    Interpreter-->>Scraper: Provide processed data
    Scraper-->>Browser: Return scraped content
Loading

Possibly related PRs

  • feat: shadow dom selection #296: The changes in this PR enhance the scraper.js file to improve scraping capabilities for elements within shadow DOMs, which directly relates to the modifications made in the main PR that also focus on handling shadow DOM contexts.

Suggested Labels

Status: In Review

Poem

🐰 Hopping through iframes with glee,
Selectors dancing, wild and free!
Nested content, no longer a maze,
Our scraper leaps through browsing ways.
Code rabbit's magic makes data sing! 🌐


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@amhsirak amhsirak added the Status: Work In Progess This issue/PR is actively being worked on label Dec 20, 2024
@amhsirak amhsirak marked this pull request as ready for review January 4, 2025 16:38
Copy link
Member Author

@amhsirak amhsirak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RohitR311 resolve merge conflicts

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (17)
src/components/organisms/BrowserWindow.tsx (4)

16-16: Add JSDoc comment for clarity.
Consider adding a short JSDoc comment explaining the purpose of the isIframeContent property. This helps maintainers and IDEs quickly understand why an element might be flagged as iframe content.


121-123: Replace console.log with a proper logger or remove debug statements.
Leaving plain console logs in production code can clutter the console and potentially leak internal details.


132-132: Clarify the code comment.
Ensure that the comment precisely reflects the condition. It might be helpful to mention the effect or purpose of skipping the types 'none', 'scrollDown', and 'scrollUp'.


141-161: Validate iframe logic for correctness.
This block introduces sophisticated checks for iframe selectors and nested DOM structures. It appears logically sound for capturing iframe child elements. However, consider extracting repeated checks (e.g., data.selector.includes(':>>')) into a dedicated utility function for readability and maintainability.

maxun-core/src/browserSide/scraper.js (6)

191-196: Refactor suggestion for early returns.
In window.scrapeSchema, the code seems readable, but consider placing utility logic in separate helper functions to keep the main function concise.

Also applies to: 203-203


248-248: Unnecessary continue statement.
As flagged by static analysis, the continue; statement might not be needed. Removing it will not change the logic because there's no subsequent code in this loop iteration.

-            continue;
🧰 Tools
🪛 Biome (1.9.4)

[error] 248-248: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


284-355: Efficient fallback approach.
This fallback logic for scrapeSchema is a useful safeguard when MBE-based scraping misses elements. The code is large; a future refactoring might improve readability.


367-427: Good approach for iframe querying in scrapeList.
The new queryIframe and queryIframeAll functions are well-structured. This enables robust element retrieval within nested iframes. Consider adding more unit tests to ensure edge cases (deeply nested iframes, cross-origin frames) are handled.

🧰 Tools
🪛 Biome (1.9.4)

[error] 416-416: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


Line range hint 482-515: Check for performance overhead in deeper iframes.
Looping through each child and verifying table or TH elements might become costly if the list is large. If performance is a concern, caching or short-circuiting might help.


416-416: Unnecessary continue statement.
As per static analysis, removing this statement can simplify the loop.

-        continue;
🧰 Tools
🪛 Biome (1.9.4)

[error] 416-416: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)

server/src/workflow-management/selector.ts (5)

484-528: Similar duplication.
The repeated getDeepestElementFromPoint logic might be consolidated. Repeated code can drift out of sync.


1796-1797: Streamline code.
Static analysis flagged this as an unnecessary continue;. Removing it cleans up the loop.

-            continue;
🧰 Tools
🪛 Biome (1.9.4)

[error] 1796-1797: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


1828-1828: Remove superfluous continue.
Similar to the previous case, removing this continues to yield the same flow.

-        continue;
🧰 Tools
🪛 Biome (1.9.4)

[error] 1828-1828: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


1845-1846: Another unnecessary continue.
Following the same pattern, consider removing for clarity.

-            continue;
🧰 Tools
🪛 Biome (1.9.4)

[error] 1845-1846: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


1874-1874: Unneeded continue statement.
Same reasoning; removing it doesn’t affect the loop logic.

-                continue;
🧰 Tools
🪛 Biome (1.9.4)

[error] 1874-1874: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)

maxun-core/src/interpret.ts (2)

669-681: Ensure robust filtering and coverage for iframe-like notations.
Currently, the removeIframeSelectors method filters out selectors containing ":>>". This achieves the immediate goal of excluding iframe selectors, but consider other potential iframe-like notations or future expansions (e.g., stricter substring checks, case-sensitivity, or partial usage like :>) if the codebase or requirements might evolve. Adding unit tests to verify all expected and edge-case notations can provide additional confidence.


684-686: Consider using a more modern or type-safe cloning approach.
Using JSON.parse(JSON.stringify(workflow)) may lead to the loss of any non-serializable data (e.g., dates, custom classes). If your workflow includes such fields in the future, you could consider using structuredClone (in environments that support it) or another library-based deep-cloning method to preserve object fidelity.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 170e4f1 and 5e775fa.

📒 Files selected for processing (6)
  • maxun-core/src/browserSide/scraper.js (7 hunks)
  • maxun-core/src/interpret.ts (1 hunks)
  • server/src/types/index.ts (3 hunks)
  • server/src/workflow-management/selector.ts (15 hunks)
  • server/src/workflow-management/utils.ts (1 hunks)
  • src/components/organisms/BrowserWindow.tsx (2 hunks)
🧰 Additional context used
🪛 Biome (1.9.4)
src/components/organisms/BrowserWindow.tsx

[error] 138-138: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

maxun-core/src/browserSide/scraper.js

[error] 248-248: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


[error] 416-416: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)

server/src/workflow-management/selector.ts

[error] 1796-1797: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


[error] 1828-1828: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


[error] 1845-1846: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


[error] 1874-1874: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)

🔇 Additional comments (19)
src/components/organisms/BrowserWindow.tsx (5)

127-128: Approve additional child selectors validation.
This check for valid child selectors looks appropriate and robust.


139-139: Comment clarity approved.
This line helps future readers understand the highlight logic.


163-163: Clear, concise comment.
This comment clarifies that we reset the highlighter if no valid child is found. Looks good.


167-167: Essential comment for list selector initialization.
Retaining this comment is helpful for maintainers who need to revisit how the listSelector mechanism is initialized.


171-171: Confirm non-list fallback comment.
This line clarifies how the fallback is handled outside the list-based logic.

maxun-core/src/browserSide/scraper.js (4)

207-257: Check for cross-origin access and potential fallback handling.
The findAllElements function robustly accounts for iframes, which is great. Ensure that any cross-origin frames produce graceful fallbacks, so partial results are still valid.

🧰 Tools
🪛 Biome (1.9.4)

[error] 248-248: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


258-283: Robust URL resolution logic.
The getElementValue function does a good job of handling relative URLs in iframes. Verify that it covers all relevant attributes beyond href and src—particularly for custom data attributes if needed.


Line range hint 444-466: Fine-grained logic for table imports.
This table-ancestor logic is very detailed. Consider simpler approaches if possible, but overall it's correct for specialized table structures.


557-572: Similar container detection.
Avoid unintentionally scraping unrelated containers if the layout changes. Possibly add heuristics or checks to confirm container similarity beyond class-based heuristics.

server/src/workflow-management/selector.ts (5)

Line range hint 80-101: Accurate property definitions for element info.
Storing iframeIndex and frameHierarchy is a helpful approach for debugging and re-targeting.


172-220: Check for nested table or body logic.
The large block checks if an element is a table or crosses the body. Confirm that the final break condition is always correct for complex iframes or edge-case elements.


374-420: Coordinates adjustment for multi-frame elements.
This code is well-structured to re-map element positions from an iframe to the top window. Test thoroughly if iFrames are offset or scrolled.


588-599: Neat rectangular toJSON logic.
Exposing bounding rects with a toJSON() method is a thoughtful approach for debugging or server calls.


Line range hint 1371-1473: Non-unique selectors for iframes.
The approach of generating partial selectors (like :nth-child()) can be fragile in dynamic DOMs. Verify that ephemeral changes (e.g., ads injecting extra elements) won’t break these partial selectors.

server/src/workflow-management/utils.ts (1)

15-19: Good fallback for iframe selectors.
Prioritizing iframeSelector.full is consistent with the new nested DOM handling. Make sure the rest of the system respects this if the user’s environment restricts iframes.

server/src/types/index.ts (3)

132-135: Well-defined iframe selector interface.
Defining IframeSelector clarifies how iframe-based selectors differ from standard selectors, improving maintainability and type safety.


151-151: Great extension for iframes.
Adding the iframeSelector property to Selectors ensures thorough coverage for nested elements. This is logically consistent with the rest of the PR.


165-165: Strengthen type usage.
Using selectors: Selectors instead of a generic object is a best practice, ensuring consistency and discoverability of properties like iframeSelector.

maxun-core/src/interpret.ts (1)

669-681: Validate that removing iframe selectors meets workflow expectations.
Stripping selectors mid-workflow can sometimes create unexpected behavior if subsequent steps rely on them. It might be beneficial to log or otherwise track which selectors were removed before returning the modified workflow. This would aid debugging in scenarios where the removal of selectors inadvertently breaks a downstream action.

Comment on lines 26 to 74
// Helper function to find elements within iframes, handling nested cases
const getDeepestElementFromPoint = (x: number, y: number): HTMLElement | null => {
// First, get the element at the clicked coordinates in the main document
let element = document.elementFromPoint(x, y) as HTMLElement;
if (!element) return null;

// If it's not an iframe, return the element as is
if (element.tagName !== 'IFRAME') return element;

// Initialize tracking variables for iframe traversal
let currentIframe = element as HTMLIFrameElement;
let deepestElement = element;

// Continue traversing while we have a valid iframe
while (currentIframe) {
try {
// Convert the coordinates from main document space to iframe's local space
const iframeRect = currentIframe.getBoundingClientRect();
const iframeX = x - iframeRect.left;
const iframeY = y - iframeRect.top;

// Get the iframe's document object - this gives us access to the iframe's content
const iframeDocument = currentIframe.contentDocument || currentIframe.contentWindow?.document;
if (!iframeDocument) break;

// Find the element at the transformed coordinates within the iframe
const iframeElement = iframeDocument.elementFromPoint(iframeX, iframeY) as HTMLElement;

// If no element found or it's the same as current, stop traversing
if (!iframeElement) break;

// Update our tracking of the deepest element
deepestElement = iframeElement;

// If we found another iframe, continue traversing through it
if (iframeElement.tagName === 'IFRAME') {
currentIframe = iframeElement as HTMLIFrameElement;
} else {
// If it's not an iframe, we've reached the deepest level
break;
}
} catch (error) {
// Handle potential cross-origin security restrictions
console.warn('Cannot access iframe content:', error);
break;
}
}
return deepestElement;
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Thorough iframe element-from-point logic.
This multi-level iframe traversal for getDeepestElementFromPoint is quite detailed. Ensure that repeated code remains in sync if similar functions exist. Consider extracting into a shared helper.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
server/src/workflow-management/selector.ts (1)

27-89: 🛠️ Refactor suggestion

Consolidate duplicate getDeepestElementFromPoint implementations.

The function is implemented multiple times with similar logic. As noted in a previous review, this should be consolidated into a shared helper to ensure the implementations remain in sync.

Consider moving this to a shared utility file that can be imported where needed.

Also applies to: 1193-1261

🧹 Nitpick comments (3)
src/components/organisms/BrowserWindow.tsx (2)

121-123: Remove debug logging statements.

These console.log statements should be removed before merging.

-  console.log("LIST SELECTOR", listSelector);
-  console.log("DATA SELECTOR", data.selector);
-  console.log("CHILD SELECTORS", data.childSelectors);

141-161: Consider refactoring the selector validation logic.

The nested conditions for handling different selector types (iframe, shadow DOM, mixed) make the code hard to follow. Consider extracting these checks into separate helper functions.

+ const isValidIframeChild = (selector: string, childSelectors: string[]) => {
+   return childSelectors.some(childSelector => 
+     selector.includes(':>>') && 
+     childSelector.split(':>>').some(part => 
+       selector.includes(part.trim())
+     )
+   );
+ };

+ const isValidMixedSelector = (selector: string, childSelectors: string[]) => {
+   const selectorParts = selector.split(':>>').map(part => part.trim());
+   return selectorParts.some(part => 
+     childSelectors.some(childSelector => 
+       childSelector.includes(part)
+     )
+   );
+ };

  if (data.elementInfo?.isIframeContent && data.childSelectors) {
-   const isIframeChild = data.childSelectors.some(childSelector => 
-     data.selector.includes(':>>') &&
-     childSelector.split(':>>').some(part => 
-       data.selector.includes(part.trim())
-     )
-   );
+   const isIframeChild = isValidIframeChild(data.selector, data.childSelectors);
    setHighlighterData(isIframeChild ? data : null);
  } else if (data.selector.includes(':>>') && hasValidChildSelectors) {
-   const selectorParts = data.selector.split(':>>').map(part => part.trim());
-   const isValidMixedSelector = selectorParts.some(part => 
-     data.childSelectors!.some(childSelector => 
-       childSelector.includes(part)
-     )
-   );
+   const isValid = isValidMixedSelector(data.selector, data.childSelectors!);
-   setHighlighterData(isValidMixedSelector ? data : null);
+   setHighlighterData(isValid ? data : null);
  }
maxun-core/src/browserSide/scraper.js (1)

208-273: Consider caching DOM queries and adding early exits.

The function performs well but could be optimized:

  1. Cache querySelector results when reused
  2. Add early exits for empty results
  3. Consider using WeakMap for caching across multiple calls
+ const elementCache = new WeakMap();
+ const getCachedElements = (root, selector) => {
+   if (!elementCache.has(root)) {
+     elementCache.set(root, new Map());
+   }
+   const cache = elementCache.get(root);
+   if (!cache.has(selector)) {
+     cache.set(selector, Array.from(root.querySelectorAll(selector)));
+   }
+   return cache.get(selector);
+ };

  if (!config.selector.includes('>>') && !config.selector.includes(':>>')) {
-   return Array.from(document.querySelectorAll(config.selector));
+   return getCachedElements(document, config.selector);
  }
🧰 Tools
🪛 Biome (1.9.4)

[error] 264-264: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e775fa and d1e8239.

📒 Files selected for processing (6)
  • maxun-core/src/browserSide/scraper.js (4 hunks)
  • maxun-core/src/interpret.ts (1 hunks)
  • server/src/types/index.ts (2 hunks)
  • server/src/workflow-management/selector.ts (21 hunks)
  • server/src/workflow-management/utils.ts (1 hunks)
  • src/components/organisms/BrowserWindow.tsx (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • server/src/workflow-management/utils.ts
  • maxun-core/src/interpret.ts
  • server/src/types/index.ts
🧰 Additional context used
🪛 Biome (1.9.4)
src/components/organisms/BrowserWindow.tsx

[error] 138-138: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

maxun-core/src/browserSide/scraper.js

[error] 264-264: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


[error] 453-453: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


[error] 552-552: The assignment should not be in an expression.

The use of assignments in expressions is confusing.
Expressions are often considered as side-effect free.

(lint/suspicious/noAssignInExpressions)

server/src/workflow-management/selector.ts

[error] 2071-2072: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


[error] 2154-2154: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)

🔇 Additional comments (3)
src/components/organisms/BrowserWindow.tsx (1)

15-15: LGTM! Well-typed interface extension.

The addition of isIframeContent property to the ElementInfo interface properly supports iframe content detection.

maxun-core/src/browserSide/scraper.js (1)

275-297: LGTM! Robust URL resolution implementation.

The function properly handles URL resolution in iframe contexts and includes safe property access. The implementation is thorough and handles edge cases well.

server/src/workflow-management/selector.ts (1)

1267-1331: LGTM! Well-structured selector generation logic.

The implementation provides comprehensive handling of both iframe and shadow DOM contexts with:

  • Clear separation of concerns in helper functions
  • Proper error handling and logging
  • Good depth limiting to prevent infinite recursion

Comment on lines +384 to +479
// Enhanced query function to handle both iframe and shadow DOM
const queryElement = (rootElement, selector) => {
if (!selector.includes('>>') && !selector.includes(':>>')) {
return rootElement.querySelector(selector);
}

const parts = selector.split('>>').map(part => part.trim());
let currentElement = rootElement;

for (let i = 0; i < parts.length; i++) {
if (!currentElement) return null;
}

if (!currentElement.querySelector && !currentElement.shadowRoot) {
currentElement = document.querySelector(parts[i]);
continue;
}
const parts = selector.split(/(?:>>|:>>)/).map(part => part.trim());
let currentElement = rootElement;

for (let i = 0; i < parts.length; i++) {
if (!currentElement) return null;

// Handle iframe traversal
if (currentElement.tagName === 'IFRAME') {
try {
const iframeDoc = currentElement.contentDocument || currentElement.contentWindow.document;
currentElement = iframeDoc.querySelector(parts[i]);
continue;
} catch (e) {
console.warn('Cannot access iframe content:', e);
return null;
}
}

let nextElement = currentElement.querySelector(parts[i]);
// Try regular DOM first
let nextElement = currentElement.querySelector(parts[i]);

if (!nextElement && currentElement.shadowRoot) {
nextElement = currentElement.shadowRoot.querySelector(parts[i]);
}
// Try shadow DOM if not found
if (!nextElement && currentElement.shadowRoot) {
nextElement = currentElement.shadowRoot.querySelector(parts[i]);
}

if (!nextElement) {
const allChildren = Array.from(currentElement.children || []);
for (const child of allChildren) {
if (child.shadowRoot) {
nextElement = child.shadowRoot.querySelector(parts[i]);
if (nextElement) break;
}
}
}
// Check children's shadow roots if still not found
if (!nextElement) {
const children = Array.from(currentElement.children || []);
for (const child of children) {
if (child.shadowRoot) {
nextElement = child.shadowRoot.querySelector(parts[i]);
if (nextElement) break;
}
}
}

currentElement = nextElement;
}
currentElement = nextElement;
}

return currentElement;
return currentElement;
};

const queryShadowDOMAll = (rootElement, selector) => {
if (!selector.includes('>>')) {
// Enhanced query all function for both contexts
const queryElementAll = (rootElement, selector) => {
if (!selector.includes('>>') && !selector.includes(':>>')) {
return rootElement.querySelectorAll(selector);
}
}

const parts = selector.split('>>').map(part => part.trim());
let currentElements = [rootElement];

for (const part of parts) {
const nextElements = [];

for (const element of currentElements) {
if (element.querySelectorAll) {
nextElements.push(...element.querySelectorAll(part));
}

if (element.shadowRoot) {
nextElements.push(...element.shadowRoot.querySelectorAll(part));
}

const children = Array.from(element.children || []);
for (const child of children) {
if (child.shadowRoot) {
nextElements.push(...child.shadowRoot.querySelectorAll(part));
}
}
}

currentElements = nextElements;
}

return currentElements;
const parts = selector.split(/(?:>>|:>>)/).map(part => part.trim());
let currentElements = [rootElement];

for (const part of parts) {
const nextElements = [];

for (const element of currentElements) {
// Handle iframe traversal
if (element.tagName === 'IFRAME') {
try {
const iframeDoc = element.contentDocument || element.contentWindow.document;
nextElements.push(...iframeDoc.querySelectorAll(part));
} catch (e) {
console.warn('Cannot access iframe content:', e);
continue;
}
} else {
// Regular DOM elements
if (element.querySelectorAll) {
nextElements.push(...element.querySelectorAll(part));
}

// Shadow DOM elements
if (element.shadowRoot) {
nextElements.push(...element.shadowRoot.querySelectorAll(part));
}

// Check children's shadow roots
const children = Array.from(element.children || []);
for (const child of children) {
if (child.shadowRoot) {
nextElements.push(...child.shadowRoot.querySelectorAll(part));
}
}
}
}

currentElements = nextElements;
}

return currentElements;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Reduce code duplication in iframe traversal logic.

The iframe traversal logic is duplicated across multiple implementations of getDeepestElementFromPoint. Consider extracting this into a shared utility function.

+ const traverseIframe = (iframe, x, y, depth = 0) => {
+   const MAX_DEPTH = 4;
+   if (depth >= MAX_DEPTH) return null;
+   
+   try {
+     const rect = iframe.getBoundingClientRect();
+     const localX = x - rect.left;
+     const localY = y - rect.top;
+     
+     const doc = iframe.contentDocument || iframe.contentWindow?.document;
+     if (!doc) return null;
+     
+     return doc.elementFromPoint(localX, localY);
+   } catch (error) {
+     console.warn('Cannot access iframe content:', error);
+     return null;
+   }
+ };

  const queryElement = (rootElement, selector) => {
    if (!selector.includes('>>') && !selector.includes(':>>')) {
      return rootElement.querySelector(selector);
    }
    
    // ... existing code ...
    
    if (currentElement.tagName === 'IFRAME') {
-     try {
-       const iframeDoc = currentElement.contentDocument || currentElement.contentWindow.document;
-       currentElement = iframeDoc.querySelector(parts[i]);
-       continue;
-     } catch (e) {
-       console.warn('Cannot access iframe content:', e);
-       return null;
-     }
+     const iframeElement = traverseIframe(currentElement, x, y);
+     if (!iframeElement) return null;
+     currentElement = iframeElement;
+     continue;
    }
  };

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Biome (1.9.4)

[error] 453-453: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)

@RohitR311 RohitR311 merged commit c132f27 into develop Jan 4, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status: Work In Progess This issue/PR is actively being worked on Type: Feature New features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants