-
Notifications
You must be signed in to change notification settings - Fork 1.1k
fix: extract clean data #822
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughRefactors selector generation in Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant Page
participant SelectorGen as clientSelectorGenerator
participant Shadow as Shadow/Subtree
User->>Page: Click / Point(x,y)
Page->>SelectorGen: getDeepestElementFromPoint(x,y)
SelectorGen->>Page: elementsFromPoint(x,y)
alt deepest is meaningful
SelectorGen-->>Page: return deepestElement
else not meaningful
SelectorGen->>Shadow: findAtomicChildAtPoint(deepest, bbox)
Shadow-->>SelectorGen: atomicMeaningfulDescendant?
alt found
SelectorGen-->>Page: return atomicDescendant
else not found
SelectorGen-->>Page: return original deepest
end
end
sequenceDiagram
autonumber
participant Gen as Selector Builder
participant Target as TargetElement
participant Lists as OtherListElements
participant Utils as Path/Attr Utils
Gen->>Utils: getElementPath(Target)
Gen->>Lists: findCorrespondingElement(root, path)*
note over Gen,Lists: Align competitor elements by DOM path
loop Candidate attributes/classes
Gen->>Utils: isAttributeCommonAcrossLists(attrName, attrValue, Lists)
alt common across lists
Gen-->>Gen: emit attribute predicate
else not common
Gen-->>Gen: skip predicate
end
end
Gen-->>Target: Final selector
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60–90 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🔇 Additional comments (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/helpers/clientSelectorGenerator.ts (1)
976-993
: Invert grouped parent/child filtering to keep leaf-most elements.filterParentChildGroupedElements currently keeps parents that contain other grouped elements. Usually we want the deepest (leaf) grouped elements to avoid duplicate highlights.
Apply this diff:
private filterParentChildGroupedElements( groupedElements: HTMLElement[] ): HTMLElement[] { const result: HTMLElement[] = []; for (const element of groupedElements) { const hasGroupedChild = groupedElements.some( (other) => other !== element && element.contains(other) ); - if (hasGroupedChild) { - result.push(element); - } + if (!hasGroupedChild) { + result.push(element); + } } return result.length > 0 ? result : groupedElements; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/helpers/clientSelectorGenerator.ts
(9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/helpers/clientSelectorGenerator.ts (1)
maxun-core/src/browserSide/scraper.js (1)
MAX_DEPTH
(955-955)
🔇 Additional comments (1)
src/helpers/clientSelectorGenerator.ts (1)
2552-2552
: Ensure MAX_DEPTH divergence is intentional
MAX_DEPTH increased from 4→20 in clientSelectorGenerator.ts, while upstream scraper and server selectors cap at 5/4. If this deeper traversal interacts with those components, confirm it won’t degrade performance or cause selector mismatches—consider aligning or documenting the rationale.
if (!addPositionToAll) { | ||
const meaningfulAttrs = ["role", "type", "name", "src", "aria-label"]; | ||
const meaningfulAttrs = ["role", "type"]; | ||
for (const attrName of meaningfulAttrs) { | ||
if (element.hasAttribute(attrName)) { | ||
const value = element.getAttribute(attrName)!.replace(/'/g, "\\'"); | ||
return `${tagName}[@${attrName}='${value}']`; | ||
const isCommonAttribute = this.isAttributeCommonAcrossLists( | ||
element, | ||
attrName, | ||
value, | ||
otherListElements | ||
); | ||
if (isCommonAttribute) { | ||
return `${tagName}[@${attrName}='${value}']`; | ||
} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cross-list attribute check maps by absolute path; compute relative path to the list root and fail closed.
isAttributeCommonAcrossLists builds a path from the document root (getElementPath) then applies it starting from other list roots. This mismaps descendants and treats “corresponding element not found” as implicitly OK, letting non-common attributes pass. Compute a path relative to the current list root and require a match in every other list.
Apply these diffs:
@@
- if (!addPositionToAll) {
- const meaningfulAttrs = ["role", "type"];
+ if (!addPositionToAll) {
+ const meaningfulAttrs = ["role", "type"];
for (const attrName of meaningfulAttrs) {
if (element.hasAttribute(attrName)) {
const value = element.getAttribute(attrName)!.replace(/'/g, "\\'");
- const isCommonAttribute = this.isAttributeCommonAcrossLists(
- element,
- attrName,
- value,
- otherListElements
- );
+ const isCommonAttribute = rootElement
+ ? this.isAttributeCommonAcrossLists(
+ element,
+ rootElement,
+ attrName,
+ value,
+ otherListElements
+ )
+ : false;
if (isCommonAttribute) {
return `${tagName}[@${attrName}='${value}']`;
}
}
}
}
@@
- const testId = element.getAttribute("data-testid");
+ const testId = element.getAttribute("data-testid");
if (testId && !addPositionToAll) {
- const isCommon = this.isAttributeCommonAcrossLists(
- element,
- "data-testid",
- testId,
- otherListElements
- );
+ const isCommon = rootElement
+ ? this.isAttributeCommonAcrossLists(
+ element,
+ rootElement,
+ "data-testid",
+ testId,
+ otherListElements
+ )
+ : false;
if (isCommon) {
return `${tagName}[@data-testid='${testId}']`;
}
}
@@
- if (element.id && !element.id.match(/^\d/) && !addPositionToAll) {
- const isCommon = this.isAttributeCommonAcrossLists(
- element,
- "id",
- element.id,
- otherListElements
- );
+ if (element.id && !element.id.match(/^\d/) && !addPositionToAll) {
+ const isCommon = rootElement
+ ? this.isAttributeCommonAcrossLists(
+ element,
+ rootElement,
+ "id",
+ element.id,
+ otherListElements
+ )
+ : false;
if (isCommon) {
return `${tagName}[@id='${element.id}']`;
}
}
@@
- if (!addPositionToAll) {
+ if (!addPositionToAll) {
for (const attr of Array.from(element.attributes)) {
if (
attr.name.startsWith("data-") &&
attr.name !== "data-testid" &&
attr.name !== "data-mx-id" &&
attr.value
) {
- const isCommon = this.isAttributeCommonAcrossLists(
- element,
- attr.name,
- attr.value,
- otherListElements
- );
+ const isCommon = rootElement
+ ? this.isAttributeCommonAcrossLists(
+ element,
+ rootElement,
+ attr.name,
+ attr.value,
+ otherListElements
+ )
+ : false;
if (isCommon) {
return `${tagName}[@${attr.name}='${attr.value}']`;
}
}
}
}
@@
- private isAttributeCommonAcrossLists(
- targetElement: HTMLElement,
- attrName: string,
- attrValue: string,
- otherListElements: HTMLElement[]
- ): boolean {
- if (otherListElements.length === 0) {
- return true;
- }
-
- const targetPath = this.getElementPath(targetElement);
-
- for (const otherListElement of otherListElements) {
- const correspondingElement = this.findCorrespondingElement(
- otherListElement,
- targetPath
- );
- if (correspondingElement) {
- const otherValue = correspondingElement.getAttribute(attrName);
- if (otherValue !== attrValue) {
- return false;
- }
- }
- }
-
- return true;
- }
+ private isAttributeCommonAcrossLists(
+ targetElement: HTMLElement,
+ rootElement: HTMLElement,
+ attrName: string,
+ attrValue: string,
+ otherListElements: HTMLElement[]
+ ): boolean {
+ if (otherListElements.length === 0) return true;
+ const relPath = this.getElementPath(targetElement, rootElement);
+ if (!relPath) return false; // cannot map reliably relative to root
+ for (const otherRoot of otherListElements) {
+ const corresponding = this.findCorrespondingElement(otherRoot, relPath);
+ if (!corresponding) return false;
+ if (corresponding.getAttribute(attrName) !== attrValue) return false;
+ }
+ return true;
+ }
@@
- private getElementPath(element: HTMLElement): number[] {
- const path: number[] = [];
- let current: HTMLElement | null = element;
-
- while (current && current.parentElement) {
- const siblings = Array.from(current.parentElement.children);
- path.unshift(siblings.indexOf(current));
- current = current.parentElement;
- }
-
- return path;
- }
+ private getElementPath(
+ element: HTMLElement,
+ rootElement: HTMLElement
+ ): number[] | null {
+ const path: number[] = [];
+ let current: HTMLElement | null = element;
+ // Walk up until rootElement (crossing shadow hosts if needed)
+ while (current && current !== rootElement) {
+ const parent =
+ current.parentElement ||
+ ((current.getRootNode() as ShadowRoot).host as HTMLElement | null);
+ if (!parent) return null;
+ const siblings = Array.from(parent.children);
+ const idx = siblings.indexOf(current);
+ if (idx < 0) return null;
+ path.unshift(idx);
+ current = parent;
+ }
+ return current === rootElement ? path : null;
+ }
Also applies to: 2717-2728, 2730-2740, 2749-2759, 2926-2953, 2954-2966, 2967-2983
🤖 Prompt for AI Agents
In src/helpers/clientSelectorGenerator.ts around lines 2699-2715 (and also apply
same change to ranges 2717-2728, 2730-2740, 2749-2759, 2926-2953, 2954-2966,
2967-2983): the current isAttributeCommonAcrossLists logic builds element paths
from the document root and then queries other list roots using those absolute
paths, which can map to wrong descendants and treats "not found" as success;
change it to compute the element path relative to the current list root (strip
or build the path starting from the list root node) and when checking
otherListElements require that the corresponding relative path resolves to an
element and that its attribute value equals the expected value in every other
list (i.e., treat missing matches as failure), returning false if any list does
not have a matching element or attribute value; update all mentioned code ranges
to use the relative-path lookup and strict matching.
if (!this.isMeaningfulElementCached(deepestElement)) { | ||
const atomicChild = this.findAtomicChildAtPoint(deepestElement, x, y); | ||
if (atomicChild) { | ||
return atomicChild; | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Atomic-child fallback misses shadow DOM descendants.
findAtomicChildAtPoint only walks light DOM children. If the meaningful descendant is in a shadow root, the fallback won’t find it.
Apply this diff:
- if (!this.isMeaningfulElementCached(deepestElement)) {
- const atomicChild = this.findAtomicChildAtPoint(deepestElement, x, y);
+ if (!this.isMeaningfulElementCached(deepestElement)) {
+ const atomicChild = this.findAtomicChildAtPoint(deepestElement, x, y);
if (atomicChild) {
return atomicChild;
}
}
And enhance traversal:
private findAtomicChildAtPoint(
parent: HTMLElement,
x: number,
y: number
): HTMLElement | null {
const stack: HTMLElement[] = [parent];
const visited = new Set<HTMLElement>();
while (stack.length > 0) {
const element = stack.pop()!;
if (visited.has(element)) continue;
visited.add(element);
if (element !== parent && this.isMeaningfulElementCached(element)) {
const rect = element.getBoundingClientRect();
if (x >= rect.left && x <= rect.right && y >= rect.top && y <= rect.bottom) {
return element;
}
}
- for (let i = element.children.length - 1; i >= 0; i--) {
+ // Light DOM children (iterate from topmost in stacking order approximation)
+ for (let i = element.children.length - 1; i >= 0; i--) {
const child = element.children[i] as HTMLElement;
const rect = child.getBoundingClientRect();
if (x >= rect.left && x <= rect.right && y >= rect.top && y <= rect.bottom) {
stack.push(child);
}
}
+ // Shadow DOM children (if any)
+ if ((element as HTMLElement).shadowRoot) {
+ const shadowChildren = (element as HTMLElement).shadowRoot!.children;
+ for (let i = shadowChildren.length - 1; i >= 0; i--) {
+ const child = shadowChildren[i] as HTMLElement;
+ const rect = child.getBoundingClientRect();
+ if (x >= rect.left && x <= rect.right && y >= rect.top && y <= rect.bottom) {
+ stack.push(child);
+ }
+ }
+ }
}
return null;
}
Also applies to: 4003-4034
🤖 Prompt for AI Agents
In src/helpers/clientSelectorGenerator.ts around lines 3993-3999 (and similarly
for 4003-4034), the atomic-child fallback uses findAtomicChildAtPoint which only
traverses light DOM children and therefore misses meaningful descendants inside
shadow roots; update the traversal to also check shadow DOM by: when examining a
candidate element at the point, if it has a shadowRoot call
shadowRoot.elementFromPoint(x,y) (or shadowRoot.deep element lookup) and
continue descending into that shadow root’s composed tree (including following
assigned nodes for slots and checking shadow-inserted children) before falling
back to light DOM children; ensure the function returns the deepest atomic
descendant found in either the shadowRoot or light children and mirror this
shadow-aware logic in the other affected block (4003-4034).
What this PR does?
Summary by CodeRabbit
New Features
Bug Fixes
Performance
Refactor