Feat: Add image extraction for webpages (media parsing) #818
base: develop
Conversation
Walkthrough
Adds a media extraction pipeline: a new server-side mediaParser.extractImages(url), plus an in-page scraper that extracts image/PDF text (alt/title, OCR, pdf.js) and posts the result to the parent window, which relays it to the server and appends it to the workflow recording (see the sequence diagram below).
Sequence Diagram(s)

sequenceDiagram
autonumber
participant U as User (click)
participant B as In-page Scraper
participant P as Parent Window (iframe)
participant R as DOMBrowserRenderer
participant S as Socket / Server
participant H as Input Handler
participant G as WorkflowGenerator
U->>B: click element (img / iframe / object)
alt image
B->>B: resolve url, read alt/title or OCR (tesseract)
else pdf/frame
B->>B: fetch/read PDF text (pdfjs)
end
B->>P: postMessage {type: "maxun:media-extracted", url, tag, selector, extractedText}
P->>R: message received (origin checked)
R->>S: socket.emit "dom:media-extracted" {url, tag, selector, extractedText}
S->>H: receive dom:media-extracted
H->>G: call handleMediaExtracted(data, page)
G->>G: append media event to workflowRecord.events (timestamp)
G-->>S: emit "workflow:media-added" (best-effort)
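Read alongside the diagram, the browser-side handoff can be sketched roughly as follows. Only the maxun:media-extracted message type, the dom:media-extracted socket event, and the payload fields come from the diagram; expectedIframeOrigin, socket, and everything else here are illustrative assumptions, not the PR's exact code.

// In-page scraper (runs inside the recorded page): after extracting text
// from a clicked image/PDF, hand the result to the hosting iframe's parent.
function reportMediaExtracted({ url, tag, selector, extractedText }) {
  window.parent.postMessage(
    { type: 'maxun:media-extracted', url, tag, selector, extractedText },
    '*' // the real renderer checks event.origin on the receiving side
  );
}

// Parent window (DOMBrowserRenderer side): verify origin, then relay to the server.
window.addEventListener('message', (event) => {
  if (event.origin !== expectedIframeOrigin) return; // origin check per diagram
  const msg = event.data;
  if (msg && msg.type === 'maxun:media-extracted') {
    socket.emit('dom:media-extracted', {
      url: msg.url,
      tag: msg.tag,
      selector: msg.selector,
      extractedText: msg.extractedText,
    });
  }
});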
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Worked on Issue #164 to add image extraction for webpages. Let me know if any improvements are needed!
Actionable comments posted: 2
🧹 Nitpick comments (6)
mediaParser.js (6)
1-3: Consider module system consistency. The code uses CommonJS (require) syntax. If the rest of the codebase uses ES6 modules (import/export), consider updating for consistency.

-const axios = require('axios');
-const cheerio = require('cheerio');
-const { URL } = require('url');
+import axios from 'axios';
+import * as cheerio from 'cheerio';
+import { URL } from 'url';
5-6: Add input validation for the URL parameter. The function does not validate the url parameter before using it. Invalid or malicious input could cause unexpected behavior.

 async function extractImages(url) {
+  if (!url || typeof url !== 'string') {
+    throw new TypeError('URL must be a non-empty string');
+  }
   try {
40-43: Improve error handling and logging. Current issues:
- console.log is not appropriate for production error logging. Use a proper logging framework (e.g., winston, which is already a dependency).
- Returning an empty array silently swallows errors, making debugging difficult for callers. Consider re-throwing the error or returning a Result/Either type.

If you want to maintain the current behavior of returning an empty array, at least use proper logging:

 } catch (error) {
-  console.log('Oops! Something went wrong while fetching images:', error.message);
+  // Assuming winston logger is available
+  logger.error('Failed to extract images from URL', { url, error: error.message });
   return [];
 }

Alternatively, consider letting the error propagate to the caller:

 } catch (error) {
-  console.log('Oops! Something went wrong while fetching images:', error.message);
-  return [];
+  throw new Error(`Failed to extract images from ${url}: ${error.message}`);
 }
7-8: Configure axios with timeout and headers for reliability. The current implementation lacks:
- Timeout: Requests could hang indefinitely
- User-Agent: Some websites block requests without a proper User-Agent header
- Response size limits: Large responses could cause memory issues
- Redirect handling: Axios follows redirects by default, but this should be explicit
-  // 1. Fetch HTML
-  const { data: html } = await axios.get(url);
+  // 1. Fetch HTML with proper configuration
+  const { data: html } = await axios.get(url, {
+    timeout: 10000, // 10 second timeout
+    maxContentLength: 10 * 1024 * 1024, // 10MB limit
+    maxBodyLength: 10 * 1024 * 1024,
+    headers: {
+      'User-Agent': 'Mozilla/5.0 (compatible; MaxunBot/1.0; +https://maxun.dev)'
+    },
+    maxRedirects: 5
+  });
10-11: LGTM with optional enhancement. The cheerio loading is correct. Optionally, you could add parsing options for better HTML handling:

-  // 2. Load HTML into cheerio
-  const $ = cheerio.load(html);
+  // 2. Load HTML into cheerio with lenient parsing
+  const $ = cheerio.load(html, {
+    decodeEntities: true,
+    normalizeWhitespace: false
+  });
21-28: Good URL resolution logic. The URL resolution correctly handles relative paths and silently skips invalid URLs.

Consider also extracting URLs from the srcset attribute for responsive images:

 $('img').each((index, element) => {
   let src = $(element).attr('src');
+  const srcset = $(element).attr('srcset');
   const alt = $(element).attr('alt') || '';
   if (src) {
     // ... existing src handling ...
   }
+
+  // Handle srcset for responsive images
+  if (srcset) {
+    const srcsetUrls = srcset.split(',').map(s => s.trim().split(/\s+/)[0]);
+    for (const srcsetUrl of srcsetUrls) {
+      try {
+        const absoluteUrl = new URL(srcsetUrl, url).href;
+        if (!seen.has(absoluteUrl)) {
+          seen.add(absoluteUrl);
+          images.push({ url: absoluteUrl, altText: alt });
+        }
+      } catch {
+        // skip invalid URLs
+      }
+    }
+  }
 });
📒 Files selected for processing (2)
mediaParser.js (1 hunks)
package.json (2 hunks)
🔇 Additional comments (5)
mediaParser.js (3)
13-14: LGTM! Appropriate data structures: array for results and Set for efficient deduplication.
16-20: LGTM! The loop structure and attribute extraction are correct. Good handling of missing alt text with a default empty string.
30-35: LGTM! Deduplication logic is correct and efficient. The result object structure { url, altText } matches the PR requirements.
package.json (2)
31-31: cheerio version verified. Cheerio ^1.1.2 is available on npm.
27-27: Axios ^1.12.2 exists and addresses all known vulnerabilities.
"styled-components": "^5.3.3", | ||
"swagger-jsdoc": "^6.2.8", | ||
"swagger-ui-express": "^5.0.1", | ||
"tesseract.js": "^6.0.1", |
Unused dependency: tesseract.js is not referenced in the current implementation.
The tesseract.js library was added but is not imported or used in mediaParser.js. Consider removing it unless it's planned for future use (e.g., OCR on images).
If tesseract.js is intended for future functionality, consider adding a comment in the code or creating a follow-up issue to track its implementation.
🤖 Prompt for AI Agents
In package.json around line 84, the dependency "tesseract.js": "^6.0.1" is unused in the codebase (not imported in mediaParser.js); remove the tesseract.js entry from package.json and run npm/yarn install to update the lockfile, or if it is intended for future OCR work, leave the dependency but add a short TODO comment in the relevant module or create a follow-up issue referencing this dependency and its planned usage so it is tracked.
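If the dependency is kept for future OCR work, usage might look roughly like this. A sketch assuming tesseract.js's top-level recognize API; ocrImage is a hypothetical helper, not code from this PR.

import Tesseract from 'tesseract.js';

// Hypothetical helper showing the intended OCR use of the dependency:
// fetch an image by URL and return the recognized English text.
async function ocrImage(imageUrl) {
  const { data } = await Tesseract.recognize(imageUrl, 'eng');
  return data.text.trim();
}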
…better error handling
Actionable comments posted: 0
🧹 Nitpick comments (2)
mediaParser.js (2)
12-22: Consider validating Content-Type before parsing. The fetch configuration is solid, but the function doesn't verify that the response is HTML. Parsing non-HTML content (JSON, XML, binary data) could lead to unexpected results or waste processing.

Add a Content-Type check after the axios call:

 const { data: html } = await axios.get(url, {
   timeout: 10000,
   maxContentLength: 10 * 1024 * 1024,
   maxBodyLength: 10 * 1024 * 1024,
   headers: {
     'User-Agent': 'Mozilla/5.0 (compatible; MaxunBot/1.0; +https://maxun.dev)'
   },
   maxRedirects: 5
 });
+
+// Validate Content-Type
+const contentType = response.headers['content-type'] || '';
+if (!contentType.includes('text/html')) {
+  throw new Error(`Expected HTML but got ${contentType}`);
+}

Note: You'll need to capture the full response object instead of destructuring just data:

-const { data: html } = await axios.get(url, {
+const response = await axios.get(url, {
   timeout: 10000,
   maxContentLength: 10 * 1024 * 1024,
   maxBodyLength: 10 * 1024 * 1024,
   headers: {
     'User-Agent': 'Mozilla/5.0 (compatible; MaxunBot/1.0; +https://maxun.dev)'
   },
   maxRedirects: 5
 });
+
+const contentType = response.headers['content-type'] || '';
+if (!contentType.includes('text/html')) {
+  throw new Error(`Expected HTML but got ${contentType}`);
+}
+
+const html = response.data;
50-66: Consider filtering empty srcset URLs. The srcset parsing logic is mostly correct, but malformed srcset attributes (e.g., trailing commas) could result in empty strings that get passed to new URL(), which would resolve to the base URL.

Filter out empty URLs before processing:

 const srcset = $(element).attr('srcset');
 if (srcset) {
-  const srcsetUrls = srcset.split(',').map(s => s.trim().split(/\s+/)[0]);
+  const srcsetUrls = srcset.split(',')
+    .map(s => s.trim().split(/\s+/)[0])
+    .filter(Boolean); // Remove empty strings
   for (const srcsetUrl of srcsetUrls) {
     try {
       const absoluteUrl = new URL(srcsetUrl, url).href;
       if (!seen.has(absoluteUrl)) {
         seen.add(absoluteUrl);
         images.push({ url: absoluteUrl, altText: alt });
       }
     } catch {
       logger.warn(`Invalid srcset URL: ${srcsetUrl}`);
     }
   }
 }
📒 Files selected for processing (1)
mediaParser.js (1 hunks)
🔇 Additional comments (4)
mediaParser.js (4)
6-10: LGTM: Input validation is appropriate. The function correctly validates that the URL is a non-empty string and throws a TypeError for invalid input.
24-28: LGTM: Cheerio configuration is appropriate. The cheerio load options correctly decode HTML entities (useful for alt text) while preserving original whitespace.
30-48: LGTM: src handling is correct. The function properly:
- Extracts the src attribute and alt text
- Resolves relative URLs to absolute URLs
- Deduplicates images using a Set
- Handles invalid URLs gracefully with logging

A reconstruction of that loop is sketched below.
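Pieced together from the review's description rather than the PR diff; seen, images, url, and logger are the surrounding function's variables, and this is not the PR's exact code.

// Reconstruction of the src-handling loop described above: resolve each
// <img> src against the page URL, keep alt text, dedupe via the shared Set.
$('img').each((i, element) => {
  const src = $(element).attr('src');
  const alt = $(element).attr('alt') || '';
  if (!src) return;
  try {
    const absoluteUrl = new URL(src, url).href; // relative -> absolute
    if (!seen.has(absoluteUrl)) {
      seen.add(absoluteUrl);
      images.push({ url: absoluteUrl, altText: alt });
    }
  } catch {
    logger.warn(`Invalid image URL: ${src}`); // log and continue
  }
});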
68-73: LGTM: Error handling is appropriate. The function correctly logs errors with context and throws a descriptive error to propagate failures upstream, which is better than silently returning an empty array.
- Switch to ES6 module syntax (import/export)
- Add URL input validation
- Configure axios for timeout, headers, content limits
- Validate Content-Type before parsing HTML
- Handle srcset with empty URL filtering
- Remove duplicate images
- Use Winston logger for warnings and errors
- Add human-friendly JSDoc comments
Actionable comments posted: 0
♻️ Duplicate comments (1)
mediaParser.js (1)
101-101: Critical: Missing module export. The extractImages function is still not exported, making it unusable by other modules. This issue was flagged in a previous review and remains unresolved. Add an export statement at the end of the file:

   }
 }
+
+export { extractImages };

Or for CommonJS:

   }
 }
+
+module.exports = { extractImages };
🧹 Nitpick comments (2)
mediaParser.js (2)
52-92: Consider handling edge cases. The image extraction logic is solid with proper deduplication and URL resolution. However, consider these optional refinements:

- Data URLs: The current implementation includes data: URLs, which may bloat the results. Consider filtering them if not needed:

  if (absoluteUrl.startsWith('data:')) continue;

- <picture> elements: Modern responsive images often use <picture> with <source> tags. Consider extracting from <source srcset="..."> within <picture> elements for completeness.
94-100: Preserve error stack trace. The error handling logs and rethrows correctly, but the original error's stack trace is lost when creating a new Error. Consider preserving it for better debugging:

-throw new Error(`Failed to extract images from ${url}: ${error.message}`);
+error.message = `Failed to extract images from ${url}: ${error.message}`;
+throw error;

Or use the newer cause option (Node 16.9+):

-throw new Error(`Failed to extract images from ${url}: ${error.message}`);
+throw new Error(`Failed to extract images from ${url}`, { cause: error });
📒 Files selected for processing (1)
mediaParser.js (1 hunks)
🔇 Additional comments (5)
mediaParser.js (5)
1-4: LGTM! Imports are correct. Cheerio is properly imported as a namespace (required for calling cheerio.load()), and axios is at the latest stable version. Based on learnings.
6-19: LGTM! JSDoc is comprehensive and accurately describes the function's behavior, parameters, return type, and error conditions.
20-24: LGTM! Input validation correctly checks for non-empty string and throws the appropriate TypeError.
26-42: LGTM with a consideration. The axios configuration is appropriate for web scraping with reasonable limits and a custom User-Agent. The Content-Type validation ensures HTML is received.
One consideration: The 10-second timeout may be short for slow-loading pages. If this becomes an issue in production, consider making it configurable or increasing it, as sketched below.
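A minimal way to make it configurable (a sketch; the options parameter is an assumption, not part of the PR):

import axios from 'axios';

// Let callers override the timeout instead of hard-coding 10 seconds.
export async function extractImages(url, { timeout = 10_000 } = {}) {
  const response = await axios.get(url, {
    timeout,                             // configurable per call
    maxContentLength: 10 * 1024 * 1024,  // keep the existing 10MB cap
    headers: { 'User-Agent': 'Mozilla/5.0 (compatible; MaxunBot/1.0)' },
    maxRedirects: 5,
  });
  return response.data; // parsing continues as in the PR
}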
44-50: LGTM! Cheerio is loaded correctly with appropriate options. The decodeEntities: true option ensures HTML entities are decoded properly, which is important for alt text extraction. Based on learnings.
- Export extractImages function for module usage
- Preserve original error stack trace using 'cause'
- Validate Content-Type before parsing HTML
- Filter out data: URLs to avoid bloating results
- Handle <picture> <source> tags for responsive images
- Deduplicate images
- Maintain Winston logging for errors and warnings
- Human-friendly JSDoc comments and ES6 syntax
Actionable comments posted: 0
🧹 Nitpick comments (2)
mediaParser.js (2)
27-37: Consider adding explicit status code validation. The axios configuration is solid with appropriate timeout, size limits, and User-Agent. However, axios will throw for non-2xx status codes by default, which may not provide the clearest error messages.

Consider adding explicit status validation for better error handling:

 const response = await axios.get(url, {
   timeout: 10000,
   maxContentLength: 10 * 1024 * 1024,
   maxBodyLength: 10 * 1024 * 1024,
   headers: {
     'User-Agent': 'Mozilla/5.0 (compatible; MaxunBot/1.0; +https://maxun.dev)'
   },
-  maxRedirects: 5
+  maxRedirects: 5,
+  validateStatus: (status) => status >= 200 && status < 300
 });

Alternatively, you could catch and re-throw with a clearer message for HTTP errors:

 try {
   const response = await axios.get(url, { /* config */ });
 } catch (axiosError) {
   if (axios.isAxiosError(axiosError) && axiosError.response) {
     throw new Error(`HTTP ${axiosError.response.status}: Failed to fetch ${url}`);
   }
   throw axiosError;
 }
92-112: Consider linking source URLs to their picture's img alt text. The extraction logic for <picture> elements works correctly, but source URLs are assigned an empty altText. In HTML, the <img> inside <picture> contains the alt text that applies to all sources.

This is acceptable given the complexity of associating sources with their corresponding img, but if you want more complete alt text coverage, consider this enhancement:

 // Extract <picture> elements with their sources and img
 $('picture').each((i, pictureElement) => {
   const $picture = $(pictureElement);
   const alt = $picture.find('img').attr('alt') || '';
   $picture.find('source').each((j, sourceElement) => {
     const srcset = $(sourceElement).attr('srcset');
     if (srcset) {
       const srcsetUrls = srcset.split(',')
         .map(s => s.trim().split(/\s+/)[0])
         .filter(Boolean);
       for (const srcsetUrl of srcsetUrls) {
         try {
           const absoluteUrl = new URL(srcsetUrl, url).href;
           if (!seen.has(absoluteUrl) && !absoluteUrl.startsWith('data:')) {
             seen.add(absoluteUrl);
             images.push({ url: absoluteUrl, altText: alt });
           }
         } catch {
           logger.warn(`Invalid srcset URL in <source>: ${srcsetUrl}`);
         }
       }
     }
   });
 });

This would apply the picture's img alt text to all source URLs.
📒 Files selected for processing (1)
mediaParser.js (1 hunks)
🔇 Additional comments (8)
mediaParser.js (8)
3-6: LGTM with minor note on URL import. The imports are correctly structured. Note that in Node.js v10+, URL is available as a global, so the import from 'url' is valid but technically optional. Keeping it explicit is fine for clarity and compatibility.
8-21: Excellent documentation. The JSDoc comment clearly describes the function's behavior, parameters, return type, and error conditions. This will help maintainers and users of this module.
22-25: LGTM! Input validation correctly checks for non-empty string and throws the appropriate TypeError for invalid input.
39-43: LGTM! Content-type validation correctly ensures HTML content before parsing. The default to empty string handles missing headers gracefully, and .includes() properly matches content-type with charset parameters.
45-49: LGTM! Cheerio is loaded with appropriate options. Decoding entities is necessary for proper HTML parsing, and preserving whitespace ensures alt text accuracy.
54-90: LGTM! Robust image extraction logic. The extraction logic correctly:
- Handles both src and srcset attributes
- Converts relative URLs to absolute using the URL constructor
- Deduplicates images using a Set
- Filters out data URLs
- Preserves alt text across srcset variants
- Logs invalid URLs without crashing

The srcset parsing correctly splits by comma and extracts the URL portion (before width/pixel descriptors); a small demo of that parsing follows.
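As a standalone illustration of that parsing (the sample srcset value here is made up):

// Demo of the srcset parsing described above: split on commas,
// take the URL before each width/pixel descriptor, drop empties.
const srcset = 'hero-480.jpg 480w, hero-800.jpg 800w, hero-2x.jpg 2x,';

const urls = srcset.split(',')
  .map(s => s.trim().split(/\s+/)[0])
  .filter(Boolean); // the trailing comma yields an empty entry

console.log(urls); // ['hero-480.jpg', 'hero-800.jpg', 'hero-2x.jpg']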
114-119: LGTM! Excellent error handling. The error handling correctly wraps the original error with context (the URL) while preserving the stack trace using the cause property. This is a modern best practice for error chaining.
122-123: LGTM! Export issue resolved. The function is now properly exported using ES6 named export syntax, addressing the critical issue from the previous review. The module is now usable by other modules.
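For reference, consuming the exported function might look like this. A sketch; the { url, altText } result shape comes from earlier comments in this review.

// In an ES module: fetch all image URLs and alt text from a page.
import { extractImages } from './mediaParser.js';

const images = await extractImages('https://example.com');
for (const { url, altText } of images) {
  console.log(url, '-', altText || '(no alt text)');
}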
Thank you for the contribution. However, this PR addresses a different problem than what was requested in issue #164.
- The feature needs to enable extraction of data from web-hosted media files like images, PDFs, or other document types.
- The feature must be integrated with the actual robot recording flow (while training the robot) and should not be a standalone script. It should be inside the recorder.
For example, if there is an image on a webpage with any written content, the user must be able to extract that content from the image.
Thanks for the feedback! I understand that this PR doesn’t address the requested feature in issue #164. I’ll work on updating it to extract data from web-hosted media files and integrate it directly into the robot recording flow. I’ll submit an updated PR soon.
Actionable comments posted: 1
♻️ Duplicate comments (1)
package.json (1)
31-85: Remove unused tesseract.js dependency. tesseract.js is still not imported anywhere—the browser scraper loads it from a CDN instead—so keeping it in dependencies only bloats install surface and supply-chain risk. Drop it until we actually bundle it.
📒 Files selected for processing (5)
maxun-core/src/browserSide/scraper.js (1 hunks)
package.json (3 hunks)
server/src/browser-management/inputHandlers.ts (2 hunks)
server/src/workflow-management/classes/Generator.ts (1 hunks)
src/components/recorder/DOMBrowserRenderer.tsx (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/browser-management/inputHandlers.ts (2)
server/src/workflow-management/classes/Generator.ts (1): handleMediaExtracted (156-180)
server/src/browser-management/classes/RemoteBrowser.ts (1): RemoteBrowser (128-2029)
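Based on the code graph and the sequence diagram above, the server-side wiring plausibly looks like the following sketch. The event names and payload fields come from the diagram; the class shape, registerMediaHandlers, and all other details are illustrative assumptions, not the PR's exact code.

// Sketch of the server-side flow: append the media event to the recording,
// then notify the client best-effort.
class WorkflowGenerator {
  constructor(socket) {
    this.socket = socket;
    this.workflowRecord = { events: [] };
  }

  handleMediaExtracted(data /* { url, tag, selector, extractedText } */) {
    // Append the media event to the recorded workflow with a timestamp.
    this.workflowRecord.events.push({ ...data, timestamp: Date.now() });
    try {
      this.socket.emit('workflow:media-added', data); // best-effort notify
    } catch { /* recording continues even if the emit fails */ }
  }
}

// Wiring in the input handler: relay the socket event into the generator.
function registerMediaHandlers(socket, generator) {
  socket.on('dom:media-extracted', (data) => generator.handleMediaExtracted(data));
}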
Media Parsing: Image Extraction
Closes Issue #164