-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat: faster output preview #583
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
""" WalkthroughThe changes export the Changes
Sequence Diagram(s)sequenceDiagram
participant RightPanel as RightSidePanel
participant Provider as BrowserStepsProvider
participant State as browserSteps State
RightPanel->>Provider: Call updateListStepLimit(listId, limit)
Provider->>State: Map steps, update ListStep with matching listId
State-->>Provider: Return updated steps array
Provider-->>RightPanel: Context updated, UI reflects new limit
sequenceDiagram
participant Client as Editor Client
participant Server as RemoteBrowser
participant Page as Playwright Page
Client->>Server: emit 'extractListData' with listSelector, fields, currentListId
Server->>Page: extractListData(listSelector, fields, limit)
Page-->>Server: return extracted list data
Server->>Client: emit 'listDataExtracted' with data and currentListId
Assessment against linked issues
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🔇 Additional comments (3)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/context/browserSteps.tsx (1)
146-158: Fix indentation inconsistency and consider edge cases.The implementation is functional but has inconsistent indentation (2 spaces vs 4 spaces used elsewhere in the file). Also, consider adding validation for the limit parameter.
const updateListStepLimit = (listId: number, limit: number) => { setBrowserSteps(prevSteps => - prevSteps.map(step => { - if (step.type === 'list' && step.id === listId) { - return { - ...step, - limit: limit - }; - } - return step; - }) + prevSteps.map(step => { + if (step.type === 'list' && step.id === listId) { + return { + ...step, + limit // object property shorthand + }; + } + return step; + }) ); };
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/context/browserSteps.tsx(4 hunks)
🔇 Additional comments (3)
src/context/browserSteps.tsx (3)
29-29: Appropriate export of BrowserStep type.Exporting this type is a good idea since it will make it accessible to other components that need to work with browser steps.
47-47: Function signature looks good.The signature is consistent with other context methods and follows the naming convention of the codebase.
184-184: Good addition to context provider value.The method is correctly added to the context provider value, making it accessible to consumers of the context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
src/components/recorder/RightSidePanel.tsx (1)
412-412: 🛠️ Refactor suggestionUpdate dependency array in useCallback.
The handleConfirmListCapture function now uses the getLatestListStep function and updateListStepLimit, but they aren't included in the dependency array of the useCallback. This could lead to stale closures if these dependencies change.
}, [captureStage, paginationType, limitType, customLimit, startPaginationMode, setShowPaginationOptions, setCaptureStage, getListSettingsObject, notify, stopPaginationMode, startLimitMode, setShowLimitOptions, stopLimitMode, setIsCaptureListConfirmed, stopCaptureAndEmitGetListSettings, t]); + }, [captureStage, paginationType, limitType, customLimit, startPaginationMode, setShowPaginationOptions, setCaptureStage, getListSettingsObject, notify, stopPaginationMode, startLimitMode, setShowLimitOptions, stopLimitMode, setIsCaptureListConfirmed, stopCaptureAndEmitGetListSettings, browserSteps, updateListStepLimit, t]);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/components/recorder/RightSidePanel.tsx(4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/components/recorder/RightSidePanel.tsx (1)
src/context/browserSteps.tsx (2)
useBrowserSteps(192-198)BrowserStep(29-29)
🪛 Biome (1.9.4)
src/components/recorder/RightSidePanel.tsx
[error] 396-396: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.
The declaration is defined in this switch clause:
Unsafe fix: Wrap the declaration in a block.
(lint/correctness/noSwitchDeclarations)
🔇 Additional comments (4)
src/components/recorder/RightSidePanel.tsx (4)
10-10: Good addition of the BrowserStep type import.The BrowserStep type import is necessary for typing the new getLatestListStep function parameter.
72-72: Good inclusion of updateListStepLimit in the hook destructuring.Properly destructured the updateListStepLimit function from the useBrowserSteps hook to enable updating of list step limits.
352-357: Well-implemented helper function.The getLatestListStep function is well-designed with:
- Proper filtering for list-type steps
- Null-check safety for empty results
- Clear sorting by descending ID to get the most recent step
This helper keeps the code DRY by extracting the list step finding logic.
396-399: Good implementation of limit updating for the latest list step.This code efficiently:
- Finds the most recent list step using the new helper function
- Updates its limit property with the parsed limit value
- Does this before completing the capture process
This enhancement improves the responsiveness of the output preview by immediately updating the list step limit.
🧰 Tools
🪛 Biome (1.9.4)
[error] 396-396: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.The declaration is defined in this switch clause:
Unsafe fix: Wrap the declaration in a block.
(lint/correctness/noSwitchDeclarations)
|
|
||
| const latestListStep = getLatestListStep(browserSteps); | ||
| if (latestListStep) { | ||
| updateListStepLimit(latestListStep.id, limit); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the switch case declaration scope issue.
The static analysis tool correctly identifies a potential issue with the declaration scope in the switch statement. Variables declared in a switch case without a block are accessible in other cases, which could lead to unintended behavior.
Apply this fix:
case 'limit':
if (!limitType || (limitType === 'custom' && !customLimit)) {
notify('error', t('right_panel.errors.select_limit'));
return;
}
const limit = limitType === 'custom' ? parseInt(customLimit) : parseInt(limitType);
if (isNaN(limit) || limit <= 0) {
notify('error', t('right_panel.errors.invalid_limit'));
return;
}
+ {
const latestListStep = getLatestListStep(browserSteps);
if (latestListStep) {
updateListStepLimit(latestListStep.id, limit);
}
+ }
stopLimitMode();
setShowLimitOptions(false);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const latestListStep = getLatestListStep(browserSteps); | |
| if (latestListStep) { | |
| updateListStepLimit(latestListStep.id, limit); | |
| } | |
| { | |
| const latestListStep = getLatestListStep(browserSteps); | |
| if (latestListStep) { | |
| updateListStepLimit(latestListStep.id, limit); | |
| } | |
| } |
🧰 Tools
🪛 Biome (1.9.4)
[error] 396-396: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.
The declaration is defined in this switch clause:
Unsafe fix: Wrap the declaration in a block.
(lint/correctness/noSwitchDeclarations)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
server/src/browser-management/classes/RemoteBrowser.ts (3)
727-765:findSimilarElementsperforms a full-document scan – risk of O(N²) on large pagesThe helper queries all elements of the same tag across:
• main document
• every shadow DOM
• every iframe / frameand then runs a class-set similarity comparison on each one.
On long feeds (think social networks) this can easily touch tens of thousands of nodes and freeze the browser.Possible optimisations:
- Bail out early once
containers.length ≥ limit.- Restrict the search scope (e.g. sibling or parent section).
- Cache
classList→ element arrays per tag to avoid repeated DOM walks.Worth profiling with DevTools’ Performance tab before shipping.
543-547: Nit: remove Biome-flagged unnecessarycontinueStatic analysis (
noUnnecessaryContinue) spotted acontinuethat immediately jumps to the next loop iteration although no code follows it.
Dropping it makes the intent clearer and removes dead control flow.🧰 Tools
🪛 Biome (1.9.4)
[error] 545-545: Unnecessary continue statement
Unsafe fix: Delete the unnecessary continue statement
(lint/correctness/noUnnecessaryContinue)
448-449: Magic number5for default limitThe hard-coded default limit (
limit: number = 5) is sprinkled in multiple places.
Consider elevating it to a named constant (DEFAULT_LIST_LIMIT) to convey semantics and keep eventual tweaks centralised.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
server/src/browser-management/classes/RemoteBrowser.ts(2 hunks)
🧰 Additional context used
🪛 Biome (1.9.4)
server/src/browser-management/classes/RemoteBrowser.ts
[error] 545-545: Unnecessary continue statement
Unsafe fix: Delete the unnecessary continue statement
(lint/correctness/noUnnecessaryContinue)
| /** | ||
| * Extract data from a list of elements on a page | ||
| * @param page - Playwright Page object | ||
| * @param listSelector - CSS selector for the list container | ||
| * @param fields - Record of field configurations | ||
| * @param limit - Maximum number of items to extract (default: 5) | ||
| * @returns Promise<Array<Record<string, string>>> - Array of extracted data objects | ||
| */ | ||
| private async extractListData( | ||
| page: Page, | ||
| listSelector: string, | ||
| fields: Record<string, { | ||
| label: string; | ||
| selectorObj: { | ||
| selector: string; | ||
| attribute: string; | ||
| }; | ||
| }>, | ||
| limit: number = 5 | ||
| ): Promise<Array<Record<string, string>>> { | ||
| return await page.evaluate( | ||
| async ({ listSelector, fields, limit }: { | ||
| listSelector: string; | ||
| fields: Record<string, { | ||
| label: string; | ||
| selectorObj: { | ||
| selector: string; | ||
| attribute: string; | ||
| }; | ||
| }>; | ||
| limit: number; | ||
| }) => { | ||
| const convertedFields: Record<string, { | ||
| selector: string; | ||
| attribute: string; | ||
| }> = {}; | ||
|
|
||
| for (const [key, field] of Object.entries(fields)) { | ||
| convertedFields[field.label] = { | ||
| selector: field.selectorObj.selector, | ||
| attribute: field.selectorObj.attribute | ||
| }; | ||
| } | ||
|
|
||
| const queryElement = (rootElement: Element | Document, selector: string): Element | null => { | ||
| if (!selector.includes('>>') && !selector.includes(':>>')) { | ||
| return rootElement.querySelector(selector); | ||
| } | ||
|
|
||
| const parts = selector.split(/(?:>>|:>>)/).map(part => part.trim()); | ||
| let currentElement: Element | Document | null = rootElement; | ||
|
|
||
| for (let i = 0; i < parts.length; i++) { | ||
| if (!currentElement) return null; | ||
|
|
||
| if ((currentElement as Element).tagName === 'IFRAME' || (currentElement as Element).tagName === 'FRAME') { | ||
| try { | ||
| const frameElement = currentElement as HTMLIFrameElement | HTMLFrameElement; | ||
| const frameDoc = frameElement.contentDocument || frameElement.contentWindow?.document; | ||
| if (!frameDoc) return null; | ||
| currentElement = frameDoc.querySelector(parts[i]); | ||
| continue; | ||
| } catch (e) { | ||
| console.warn(`Cannot access ${(currentElement as Element).tagName.toLowerCase()} content:`, e); | ||
| return null; | ||
| } | ||
| } | ||
|
|
||
| let nextElement: Element | null = null; | ||
|
|
||
| if ('querySelector' in currentElement) { | ||
| nextElement = currentElement.querySelector(parts[i]); | ||
| } | ||
|
|
||
| if (!nextElement && 'shadowRoot' in currentElement && (currentElement as Element).shadowRoot) { | ||
| nextElement = (currentElement as Element).shadowRoot!.querySelector(parts[i]); | ||
| } | ||
|
|
||
| if (!nextElement && 'children' in currentElement) { | ||
| const children: any = Array.from((currentElement as Element).children || []); | ||
| for (const child of children) { | ||
| if (child.shadowRoot) { | ||
| nextElement = child.shadowRoot.querySelector(parts[i]); | ||
| if (nextElement) break; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| currentElement = nextElement; | ||
| } | ||
|
|
||
| return currentElement as Element | null; | ||
| }; | ||
|
|
||
| const queryElementAll = (rootElement: Element | Document, selector: string): Element[] => { | ||
| if (!selector.includes('>>') && !selector.includes(':>>')) { | ||
| return Array.from(rootElement.querySelectorAll(selector)); | ||
| } | ||
|
|
||
| const parts = selector.split(/(?:>>|:>>)/).map(part => part.trim()); | ||
| let currentElements: (Element | Document)[] = [rootElement]; | ||
|
|
||
| for (const part of parts) { | ||
| const nextElements: Element[] = []; | ||
|
|
||
| for (const element of currentElements) { | ||
| if ((element as Element).tagName === 'IFRAME' || (element as Element).tagName === 'FRAME') { | ||
| try { | ||
| const frameElement = element as HTMLIFrameElement | HTMLFrameElement; | ||
| const frameDoc = frameElement.contentDocument || frameElement.contentWindow?.document; | ||
| if (frameDoc) { | ||
| nextElements.push(...Array.from(frameDoc.querySelectorAll(part))); | ||
| } | ||
| } catch (e) { | ||
| console.warn(`Cannot access ${(element as Element).tagName.toLowerCase()} content:`, e); | ||
| continue; | ||
| } | ||
| } else { | ||
| if ('querySelectorAll' in element) { | ||
| nextElements.push(...Array.from(element.querySelectorAll(part))); | ||
| } | ||
|
|
||
| if ('shadowRoot' in element && (element as Element).shadowRoot) { | ||
| nextElements.push(...Array.from((element as Element).shadowRoot!.querySelectorAll(part))); | ||
| } | ||
|
|
||
| if ('children' in element) { | ||
| const children = Array.from((element as Element).children || []); | ||
| for (const child of children) { | ||
| if (child.shadowRoot) { | ||
| nextElements.push(...Array.from(child.shadowRoot.querySelectorAll(part))); | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| currentElements = nextElements; | ||
| } | ||
|
|
||
| return currentElements as Element[]; | ||
| }; | ||
|
|
||
| function extractValue(element: Element, attribute: string): string | null { | ||
| if (!element) return null; | ||
|
|
||
| const baseURL = element.ownerDocument?.location?.href || window.location.origin; | ||
|
|
||
| if (element.shadowRoot) { | ||
| const shadowContent = element.shadowRoot.textContent; | ||
| if (shadowContent?.trim()) { | ||
| return shadowContent.trim(); | ||
| } | ||
| } | ||
|
|
||
| if (attribute === 'innerText') { | ||
| return (element as HTMLElement).innerText.trim(); | ||
| } else if (attribute === 'innerHTML') { | ||
| return element.innerHTML.trim(); | ||
| } else if (attribute === 'src' || attribute === 'href') { | ||
| if (attribute === 'href' && element.tagName !== 'A') { | ||
| const parentElement = element.parentElement; | ||
| if (parentElement && parentElement.tagName === 'A') { | ||
| const parentHref = parentElement.getAttribute('href'); | ||
| if (parentHref) { | ||
| try { | ||
| return new URL(parentHref, baseURL).href; | ||
| } catch (e) { | ||
| return parentHref; | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| const attrValue = element.getAttribute(attribute); | ||
| const dataAttr = attrValue || element.getAttribute('data-' + attribute); | ||
|
|
||
| if (!dataAttr || dataAttr.trim() === '') { | ||
| if (attribute === 'src') { | ||
| const style = window.getComputedStyle(element); | ||
| const bgImage = style.backgroundImage; | ||
| if (bgImage && bgImage !== 'none') { | ||
| const matches = bgImage.match(/url\(['"]?([^'")]+)['"]?\)/); | ||
| return matches ? new URL(matches[1], baseURL).href : null; | ||
| } | ||
| } | ||
| return null; | ||
| } | ||
|
|
||
| try { | ||
| return new URL(dataAttr, baseURL).href; | ||
| } catch (e) { | ||
| console.warn('Error creating URL from', dataAttr, e); | ||
| return dataAttr; // Return the original value if URL construction fails | ||
| } | ||
| } | ||
| return element.getAttribute(attribute); | ||
| } | ||
|
|
||
| function findTableAncestor(element: Element): { type: string; element: Element } | null { | ||
| let currentElement: Element | null = element; | ||
| const MAX_DEPTH = 5; | ||
| let depth = 0; | ||
|
|
||
| while (currentElement && depth < MAX_DEPTH) { | ||
| if (currentElement.getRootNode() instanceof ShadowRoot) { | ||
| currentElement = (currentElement.getRootNode() as ShadowRoot).host; | ||
| continue; | ||
| } | ||
|
|
||
| if (currentElement.tagName === 'TD') { | ||
| return { type: 'TD', element: currentElement }; | ||
| } else if (currentElement.tagName === 'TR') { | ||
| return { type: 'TR', element: currentElement }; | ||
| } | ||
|
|
||
| if (currentElement.tagName === 'IFRAME' || currentElement.tagName === 'FRAME') { | ||
| try { | ||
| const frameElement = currentElement as HTMLIFrameElement | HTMLFrameElement; | ||
| currentElement = frameElement.contentDocument?.body || null; | ||
| } catch (e) { | ||
| return null; | ||
| } | ||
| } else { | ||
| currentElement = currentElement.parentElement; | ||
| } | ||
| depth++; | ||
| } | ||
| return null; | ||
| } | ||
|
|
||
| function getCellIndex(td: Element): number { | ||
| if (td.getRootNode() instanceof ShadowRoot) { | ||
| const shadowRoot = td.getRootNode() as ShadowRoot; | ||
| const allCells = Array.from(shadowRoot.querySelectorAll('td')); | ||
| return allCells.indexOf(td as HTMLTableCellElement); | ||
| } | ||
|
|
||
| let index = 0; | ||
| let sibling = td; | ||
| while (sibling = sibling.previousElementSibling as Element) { | ||
| index++; | ||
| } | ||
| return index; | ||
| } | ||
|
|
||
| function hasThElement(row: Element, tableFields: Record<string, { selector: string; attribute: string }>): boolean { | ||
| for (const [_, { selector }] of Object.entries(tableFields)) { | ||
| const element = queryElement(row, selector); | ||
| if (element) { | ||
| let current: Element | ShadowRoot | Document | null = element; | ||
| while (current && current !== row) { | ||
| if (current.getRootNode() instanceof ShadowRoot) { | ||
| current = (current.getRootNode() as ShadowRoot).host; | ||
| continue; | ||
| } | ||
|
|
||
| if ((current as Element).tagName === 'TH') return true; | ||
|
|
||
| if ((current as Element).tagName === 'IFRAME' || (current as Element).tagName === 'FRAME') { | ||
| try { | ||
| const frameElement = current as HTMLIFrameElement | HTMLFrameElement; | ||
| current = frameElement.contentDocument?.body || null; | ||
| } catch (e) { | ||
| break; | ||
| } | ||
| } else { | ||
| current = (current as Element).parentElement; | ||
| } | ||
| } | ||
| } | ||
| } | ||
| return false; | ||
| } | ||
|
|
||
| function filterRowsBasedOnTag(rows: Element[], tableFields: Record<string, { selector: string; attribute: string }>): Element[] { | ||
| for (const row of rows) { | ||
| if (hasThElement(row, tableFields)) { | ||
| return rows; | ||
| } | ||
| } | ||
| return rows.filter(row => { | ||
| const directTH = row.getElementsByTagName('TH').length === 0; | ||
| const shadowTH = row.shadowRoot ? | ||
| row.shadowRoot.querySelector('th') === null : true; | ||
| return directTH && shadowTH; | ||
| }); | ||
| } | ||
|
|
||
| function calculateClassSimilarity(classList1: string[], classList2: string[]): number { | ||
| const set1 = new Set(classList1); | ||
| const set2 = new Set(classList2); | ||
| const intersection = new Set([...set1].filter(x => set2.has(x))); | ||
| const union = new Set([...set1, ...set2]); | ||
| return intersection.size / union.size; | ||
| } | ||
|
|
||
| function findSimilarElements(baseElement: Element, similarityThreshold: number = 0.7): Element[] { | ||
| const baseClasses = Array.from(baseElement.classList); | ||
| if (baseClasses.length === 0) return []; | ||
|
|
||
| const allElements: Element[] = []; | ||
|
|
||
| allElements.push(...Array.from(document.getElementsByTagName(baseElement.tagName))); | ||
|
|
||
| if (baseElement.getRootNode() instanceof ShadowRoot) { | ||
| const shadowHost = (baseElement.getRootNode() as ShadowRoot).host; | ||
| allElements.push(...Array.from(shadowHost.getElementsByTagName(baseElement.tagName))); | ||
| } | ||
|
|
||
| const frames = [ | ||
| ...Array.from(document.getElementsByTagName('iframe')), | ||
| ...Array.from(document.getElementsByTagName('frame')) | ||
| ]; | ||
|
|
||
| for (const frame of frames) { | ||
| try { | ||
| const frameElement = frame as HTMLIFrameElement | HTMLFrameElement; | ||
| const frameDoc = frameElement.contentDocument || frameElement.contentWindow?.document; | ||
| if (frameDoc) { | ||
| allElements.push(...Array.from(frameDoc.getElementsByTagName(baseElement.tagName))); | ||
| } | ||
| } catch (e) { | ||
| console.warn(`Cannot access ${frame.tagName.toLowerCase()} content:`, e); | ||
| } | ||
| } | ||
|
|
||
| return allElements.filter(element => { | ||
| if (element === baseElement) return false; | ||
| const similarity = calculateClassSimilarity( | ||
| baseClasses, | ||
| Array.from(element.classList) | ||
| ); | ||
| return similarity >= similarityThreshold; | ||
| }); | ||
| } | ||
|
|
||
| let containers = queryElementAll(document, listSelector); | ||
|
|
||
| if (containers.length === 0) return []; | ||
|
|
||
| if (limit > 1 && containers.length === 1) { | ||
| const baseContainer = containers[0]; | ||
| const similarContainers = findSimilarElements(baseContainer); | ||
|
|
||
| if (similarContainers.length > 0) { | ||
| const newContainers = similarContainers.filter(container => | ||
| !container.matches(listSelector) | ||
| ); | ||
| containers = [...containers, ...newContainers]; | ||
| } | ||
| } | ||
|
|
||
| const containerFields = containers.map(() => ({ | ||
| tableFields: {} as Record<string, { | ||
| selector: string; | ||
| attribute: string; | ||
| tableContext?: string; | ||
| cellIndex?: number; | ||
| }>, | ||
| nonTableFields: {} as Record<string, { | ||
| selector: string; | ||
| attribute: string; | ||
| }> | ||
| })); | ||
|
|
||
| containers.forEach((container, containerIndex) => { | ||
| for (const [label, field] of Object.entries(convertedFields)) { | ||
| const sampleElement = queryElement(container, field.selector); | ||
|
|
||
| if (sampleElement) { | ||
| const ancestor = findTableAncestor(sampleElement); | ||
| if (ancestor) { | ||
| containerFields[containerIndex].tableFields[label] = { | ||
| ...field, | ||
| tableContext: ancestor.type, | ||
| cellIndex: ancestor.type === 'TD' ? getCellIndex(ancestor.element) : -1 | ||
| }; | ||
| } else { | ||
| containerFields[containerIndex].nonTableFields[label] = field; | ||
| } | ||
| } else { | ||
| containerFields[containerIndex].nonTableFields[label] = field; | ||
| } | ||
| } | ||
| }); | ||
|
|
||
| const tableData: Array<Record<string, string>> = []; | ||
| const nonTableData: Array<Record<string, string>> = []; | ||
|
|
||
| for (let containerIndex = 0; containerIndex < containers.length; containerIndex++) { | ||
| const container = containers[containerIndex]; | ||
| const { tableFields } = containerFields[containerIndex]; | ||
|
|
||
| if (Object.keys(tableFields).length > 0) { | ||
| const firstField = Object.values(tableFields)[0]; | ||
| const firstElement = queryElement(container, firstField.selector); | ||
| let tableContext: Element | null = firstElement; | ||
|
|
||
| while (tableContext && tableContext.tagName !== 'TABLE' && tableContext !== container) { | ||
| if (tableContext.getRootNode() instanceof ShadowRoot) { | ||
| tableContext = (tableContext.getRootNode() as ShadowRoot).host; | ||
| continue; | ||
| } | ||
|
|
||
| if (tableContext.tagName === 'IFRAME' || tableContext.tagName === 'FRAME') { | ||
| try { | ||
| const frameElement = tableContext as HTMLIFrameElement | HTMLFrameElement; | ||
| tableContext = frameElement.contentDocument?.body || null; | ||
| } catch (e) { | ||
| break; | ||
| } | ||
| } else { | ||
| tableContext = tableContext.parentElement; | ||
| } | ||
| } | ||
|
|
||
| if (tableContext) { | ||
| const rows: Element[] = []; | ||
|
|
||
| rows.push(...Array.from(tableContext.getElementsByTagName('TR'))); | ||
|
|
||
| if (tableContext.tagName === 'IFRAME' || tableContext.tagName === 'FRAME') { | ||
| try { | ||
| const frameElement = tableContext as HTMLIFrameElement | HTMLFrameElement; | ||
| const frameDoc = frameElement.contentDocument || frameElement.contentWindow?.document; | ||
| if (frameDoc) { | ||
| rows.push(...Array.from(frameDoc.getElementsByTagName('TR'))); | ||
| } | ||
| } catch (e) { | ||
| console.warn(`Cannot access ${tableContext.tagName.toLowerCase()} rows:`, e); | ||
| } | ||
| } | ||
|
|
||
| const processedRows = filterRowsBasedOnTag(rows, tableFields); | ||
|
|
||
| for (let rowIndex = 0; rowIndex < Math.min(processedRows.length, limit); rowIndex++) { | ||
| const record: Record<string, string> = {}; | ||
| const currentRow = processedRows[rowIndex]; | ||
|
|
||
| for (const [label, { selector, attribute, cellIndex }] of Object.entries(tableFields)) { | ||
| let element: Element | null = null; | ||
|
|
||
| if (cellIndex !== undefined && cellIndex >= 0) { | ||
| let td: Element | null = currentRow.children[cellIndex] || null; | ||
|
|
||
| if (!td && currentRow.shadowRoot) { | ||
| const shadowCells = currentRow.shadowRoot.children; | ||
| if (shadowCells && shadowCells.length > cellIndex) { | ||
| td = shadowCells[cellIndex]; | ||
| } | ||
| } | ||
|
|
||
| if (td) { | ||
| element = queryElement(td, selector); | ||
|
|
||
| if (!element && selector.split(/(?:>>|:>>)/).pop()?.includes('td:nth-child')) { | ||
| element = td; | ||
| } | ||
|
|
||
| if (!element) { | ||
| const tagOnlySelector = selector.split('.')[0]; | ||
| element = queryElement(td, tagOnlySelector); | ||
| } | ||
|
|
||
| if (!element) { | ||
| let currentElement: Element | null = td; | ||
| while (currentElement && currentElement.children.length > 0) { | ||
| let foundContentChild = false; | ||
| for (const child of Array.from(currentElement.children)) { | ||
| if (extractValue(child, attribute)) { | ||
| currentElement = child; | ||
| foundContentChild = true; | ||
| break; | ||
| } | ||
| } | ||
| if (!foundContentChild) break; | ||
| } | ||
| element = currentElement; | ||
| } | ||
| } | ||
| } else { | ||
| element = queryElement(currentRow, selector); | ||
| } | ||
|
|
||
| if (element) { | ||
| const value = extractValue(element, attribute); | ||
| if (value !== null) { | ||
| record[label] = value; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| if (Object.keys(record).length > 0) { | ||
| tableData.push(record); | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| for (let containerIndex = 0; containerIndex < containers.length; containerIndex++) { | ||
| if (nonTableData.length >= limit) break; | ||
|
|
||
| const container = containers[containerIndex]; | ||
| const { nonTableFields } = containerFields[containerIndex]; | ||
|
|
||
| if (Object.keys(nonTableFields).length > 0) { | ||
| const record: Record<string, string> = {}; | ||
|
|
||
| for (const [label, { selector, attribute }] of Object.entries(nonTableFields)) { | ||
| const relativeSelector = selector.split(/(?:>>|:>>)/).slice(-1)[0]; | ||
| const element = queryElement(container, relativeSelector); | ||
|
|
||
| if (element) { | ||
| const value = extractValue(element, attribute); | ||
| if (value !== null) { | ||
| record[label] = value; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| if (Object.keys(record).length > 0) { | ||
| nonTableData.push(record); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| const scrapedData = [...tableData, ...nonTableData].slice(0, limit); | ||
| return scrapedData; | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
extractListData is a 500-line page.evaluate — unmaintainable & hard to test
Embedding such a large, multi-purpose scraping engine inside page.evaluate has several downsides:
- Readability/maintainability: one method now accounts for ~45 % of the file.
- No type safety or linting inside the browser context; runtime errors surface only in production.
- Hot-reloading & coverage tools can’t exercise the in-page code.
- Bundle size & perf: the whole string is sent over CDP on every call.
Recommended refactor (can be incremental):
• Move the browser-side code to a dedicated script (e.g. extractListData.browser.ts) and load it once with page.addInitScript / evaluateOnNewDocument.
• Keep a thin wrapper here that forwards parameters and returns the result.
• Write unit / integration tests for the extraction logic in isolation.
This will dramatically improve readability, enable proper TS checks, and reduce wire traffic.
🧰 Tools
🪛 Biome (1.9.4)
[error] 545-545: Unnecessary continue statement
Unsafe fix: Delete the unnecessary continue statement
(lint/correctness/noUnnecessaryContinue)
| this.socket.on('extractListData', async (data: { | ||
| listSelector: string, | ||
| fields: Record<string, any>, | ||
| currentListId: number, | ||
| pagination: any | ||
| }) => { | ||
| if (this.currentPage) { | ||
| const extractedData = await this.extractListData( | ||
| this.currentPage, | ||
| data.listSelector, | ||
| data.fields | ||
| ); | ||
|
|
||
| this.socket.emit('listDataExtracted', { | ||
| currentListId: data.currentListId, | ||
| data: extractedData | ||
| }); | ||
| } | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing limit propagation breaks the UI-requested page size
The client is already sending a pagination object, yet the listener discards it and always calls extractListData with the default limit of 5.
Users who change the limit in the UI will therefore see silently truncated results, defeating the very “faster output preview” goal of this PR.
Forward the requested limit (and consider making pagination optional in the method’s signature).
- const extractedData = await this.extractListData(
- this.currentPage,
- data.listSelector,
- data.fields
- );
+ const extractedData = await this.extractListData(
+ this.currentPage,
+ data.listSelector,
+ data.fields,
+ // Default to 5 only when the caller provides nothing
+ data.pagination?.limit ?? 5
+ );Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/components/browser/BrowserWindow.tsx (1)
255-267: Consider adding error handling for socket operations.The implementation of socket event listeners and emitters lacks error handling. Consider adding error handling to gracefully manage connection issues or failed operations.
// For the socket event listener socket.on('listDataExtracted', (response) => { + try { const { currentListId, data } = response; updateListStepData(currentListId, data); + } catch (error) { + console.error('Error processing list data:', error); + // Optionally notify the user + notify('error', 'Failed to process list data'); + } }); // For socket emissions +try { socket?.emit('extractListData', { listSelector, fields: updatedFields, currentListId, pagination: { type: '', selector: paginationSelector } }); +} catch (error) { + console.error('Error sending list data for extraction:', error); + // Optionally notify the user + notify('error', 'Failed to send data for extraction'); +}Also applies to: 404-410, 476-482
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/components/browser/BrowserWindow.tsx(4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/components/browser/BrowserWindow.tsx (1)
src/context/browserSteps.tsx (1)
useBrowserSteps(192-198)
🔇 Additional comments (4)
src/components/browser/BrowserWindow.tsx (4)
90-90: LGTM: Hook updated to include the newupdateListStepDatamethod.The addition of
updateListStepDatato the destructured methods fromuseBrowserStepsis appropriate for the new functionality being implemented.
255-267: Socket event listener for real-time list data updates looks good.This new
useEffectproperly handles the socket event lifecycle by adding a listener for 'listDataExtracted' and cleaning it up on unmount. This implementation enables real-time updates of list data from the server.
404-410: New socket emission for list data extraction looks good.This addition enables the server to extract list data based on the current state, which is part of the "faster output preview" feature. The data is sent before adding the list step, which ensures synchronization between client and server.
476-482: New socket emission for list data extraction looks good.This addition for the attribute selection case correctly emits the 'extractListData' event with all necessary data to update the server-side state.
| const updatedFields = { | ||
| ...fields, | ||
| [newField.id]: newField | ||
| }; | ||
|
|
||
| setFields(updatedFields); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider using functional state updates to prevent race conditions.
Similar to the earlier instance, the state update has been changed from a functional update pattern to directly spreading the current state, which could lead to race conditions.
-const updatedFields = {
- ...fields,
- [newField.id]: newField
-};
-
-setFields(updatedFields);
+setFields(prevFields => ({
+ ...prevFields,
+ [newField.id]: newField
+}));
// Then use the callback pattern when passing to socket.emit, similar to the earlier suggestion| const updatedFields = { | ||
| ...fields, | ||
| [newField.id]: newField | ||
| }; | ||
|
|
||
| setFields(updatedFields); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider using functional state updates to prevent race conditions.
The state update has been changed from a functional update pattern to directly spreading the current state. This approach could lead to race conditions if multiple state updates occur concurrently.
-const updatedFields = {
- ...fields,
- [newField.id]: newField
-};
-
-setFields(updatedFields);
+setFields(prevFields => ({
+ ...prevFields,
+ [newField.id]: newField
+}));
// Then use the callback pattern when passing to socket.emit
+const getUpdatedFields = () => ({
+ ...fields,
+ [newField.id]: newField
+});
// Later in the code:
+socket?.emit('extractListData', {
+ listSelector,
+ fields: getUpdatedFields(),
+ currentListId,
+ pagination: { type: '', selector: paginationSelector }
+});Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/components/run/InterpretationLog.tsx (1)
341-382: Potential mismatch between header labels and row keys
row[field.label]assumes that every data row uses the human-readable label as its object key.
If the backend sends canonical keys (e.g.,field.name/field.id) this will render empty cells.Consider using the field’s canonical identifier when indexing:
- {row[field.label]} + {row[field.key ?? field.name ?? field.label]}…and ensure
fieldsincludes that identifier.Also,
Object.values(captureListData[captureListPage]?.fields || {})does not guarantee column order.
If the backend provides an ordered array, prefer that overObject.values.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/components/run/InterpretationLog.tsx(7 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/components/run/InterpretationLog.tsx (1)
src/context/browserSteps.tsx (1)
useBrowserSteps(208-214)
| const updateActiveTab = useCallback(() => { | ||
| const availableTabs = getAvailableTabs(); | ||
|
|
||
| if (captureListData.length > 0 && availableTabs.findIndex(tab => tab.id === 'captureList') !== -1) { | ||
| setActiveTab(availableTabs.findIndex(tab => tab.id === 'captureList')); | ||
| } else if (captureTextData.length > 0 && availableTabs.findIndex(tab => tab.id === 'captureText') !== -1) { | ||
| setActiveTab(availableTabs.findIndex(tab => tab.id === 'captureText')); | ||
| } else if (screenshotData.length > 0 && availableTabs.findIndex(tab => tab.id === 'captureScreenshot') !== -1) { | ||
| setActiveTab(availableTabs.findIndex(tab => tab.id === 'captureScreenshot')); | ||
| } | ||
| }, [captureListData.length, captureTextData.length, screenshotData.length]); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
updateActiveTab never re-evaluates when its inputs change
Because updateActiveTab is memoised with useCallback, it will not run automatically once captureListData / captureTextData / screenshotData are updated; it is only executed manually from the earlier effect (which itself suffers from timing issues).
A safer pattern is:
// Remove the useCallback wrapper altogether
const updateActiveTab = () => { ... };
// Recalculate whenever any of the data arrays change.
useEffect(updateActiveTab, [
captureListData.length,
captureTextData.length,
screenshotData.length,
]);This guarantees the visible tab always reflects the latest available datasets.
| useEffect(() => { | ||
| if (activeActionId !== null) { | ||
| const textSteps = browserSteps.filter(step => step.type === 'text'); | ||
| if (textSteps.length > 0) { | ||
| const textDataRow: Record<string, string> = {}; | ||
| textSteps.forEach(step => { | ||
| textDataRow[step.label] = step.data; | ||
| }); | ||
|
|
||
| setCaptureTextData([textDataRow]); | ||
| } | ||
| } else if (type === 'captureText') { | ||
| if (Array.isArray(data)) { | ||
| setCaptureTextData(data); | ||
| } else { | ||
| setCaptureTextData([data]); | ||
| } | ||
| if (captureTextData.length === 0) { | ||
| const availableTabs = getAvailableTabs(); | ||
| const tabIndex = availableTabs.findIndex(tab => tab.id === 'captureText'); | ||
| if (tabIndex !== -1) setActiveTab(tabIndex); | ||
|
|
||
| const listSteps = browserSteps.filter(step => step.type === 'list'); | ||
| if (listSteps.length > 0) { | ||
| setCaptureListData(listSteps); | ||
| } | ||
|
|
||
| updateActiveTab(); | ||
| } | ||
|
|
||
| scrollLogToBottom(); | ||
| }, [captureListData.length, captureTextData.length, t]); | ||
| }, [activeActionId, browserSteps, t]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Filter browserSteps by the active action ID and defer tab–selection until after state updates
activeActionId is only used to trigger this useEffect, yet it’s never employed in the filtering logic.
Consequently, every list/text step across all actions is shown whenever any action becomes active.
In addition, updateActiveTab() is invoked before setCaptureListData / setCaptureTextData complete, so the tab can be computed from stale (still-empty) arrays.
-const textSteps = browserSteps.filter(step => step.type === 'text');
+const textSteps = browserSteps.filter(
+ step => step.type === 'text' && step.actionId === activeActionId
+);
...
-const listSteps = browserSteps.filter(step => step.type === 'list');
+const listSteps = browserSteps.filter(
+ step => step.type === 'list' && step.actionId === activeActionId
+);
- updateActiveTab();
+ // Let the state setters finish first; a separate effect will switch tabs.Follow-up: add a second useEffect that watches [captureListData, captureTextData, screenshotData] and calls updateActiveTab() there.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| useEffect(() => { | |
| if (activeActionId !== null) { | |
| const textSteps = browserSteps.filter(step => step.type === 'text'); | |
| if (textSteps.length > 0) { | |
| const textDataRow: Record<string, string> = {}; | |
| textSteps.forEach(step => { | |
| textDataRow[step.label] = step.data; | |
| }); | |
| setCaptureTextData([textDataRow]); | |
| } | |
| } else if (type === 'captureText') { | |
| if (Array.isArray(data)) { | |
| setCaptureTextData(data); | |
| } else { | |
| setCaptureTextData([data]); | |
| } | |
| if (captureTextData.length === 0) { | |
| const availableTabs = getAvailableTabs(); | |
| const tabIndex = availableTabs.findIndex(tab => tab.id === 'captureText'); | |
| if (tabIndex !== -1) setActiveTab(tabIndex); | |
| const listSteps = browserSteps.filter(step => step.type === 'list'); | |
| if (listSteps.length > 0) { | |
| setCaptureListData(listSteps); | |
| } | |
| updateActiveTab(); | |
| } | |
| scrollLogToBottom(); | |
| }, [captureListData.length, captureTextData.length, t]); | |
| }, [activeActionId, browserSteps, t]); | |
| useEffect(() => { | |
| if (activeActionId !== null) { | |
| const textSteps = browserSteps.filter( | |
| step => step.type === 'text' && step.actionId === activeActionId | |
| ); | |
| if (textSteps.length > 0) { | |
| const textDataRow: Record<string, string> = {}; | |
| textSteps.forEach(step => { | |
| textDataRow[step.label] = step.data; | |
| }); | |
| setCaptureTextData([textDataRow]); | |
| } | |
| const listSteps = browserSteps.filter( | |
| step => step.type === 'list' && step.actionId === activeActionId | |
| ); | |
| if (listSteps.length > 0) { | |
| setCaptureListData(listSteps); | |
| } | |
| // Let the state setters finish first; a separate effect will switch tabs. | |
| } | |
| }, [activeActionId, browserSteps, t]); |
What this PR does?
Improves performance of the existing output preview functionality to get faster data previews.
Closes: #567
Summary by CodeRabbit