Refactor image saving with forked process to reduce memory usage, improvements to xpath scraper handling#3099

Merged
dgtlmoon merged 15 commits into master from playwright-fixes
Apr 11, 2025

@dgtlmoon dgtlmoon commented Apr 10, 2025

It looks like the only guaranteed way to escape PIL's memory issues is to run the image work in a subprocess. Results were pretty solid after that.
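
For readers unfamiliar with the pattern, here is a minimal sketch of pushing memory-heavy work into a forked child process so that whatever it allocates is returned to the OS when the child exits. It is not the code from this PR: `_build_chunk` and `process_in_subprocess` are hypothetical names, the byte reversal is only a stand-in for the PIL work, and the `fork` start method assumes a Unix-like host.

```python
import multiprocessing as mp


def _build_chunk(raw: bytes, conn) -> None:
    """Runs in the child process. Stand-in for the PIL-heavy work."""
    try:
        # Hypothetical placeholder: real code would decode, crop and
        # re-encode the screenshot with PIL here.
        result = raw[::-1]
        conn.send_bytes(result)
    finally:
        conn.close()


def process_in_subprocess(raw: bytes, timeout: float = 30.0) -> bytes:
    """Run _build_chunk in a forked child and return its output bytes."""
    ctx = mp.get_context("fork")  # assumption: Unix-like platform
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_build_chunk, args=(raw, child_conn))
    proc.start()
    # Close the parent's copy of the write end so EOF is seen if the child dies.
    child_conn.close()
    try:
        if not parent_conn.poll(timeout):
            raise TimeoutError("image subprocess did not respond in time")
        return parent_conn.recv_bytes()
    finally:
        parent_conn.close()
        proc.join(timeout=5)
        if proc.is_alive():
            # Child is stuck: kill it so its memory is released anyway.
            proc.terminate()
            proc.join()
```

The design choice is that only the final bytes cross the pipe, so any buffers the imaging library holds on to live and die with the child rather than accumulating in the long-running parent.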

Final: after 7bb7299, subprocessing the image snapshot chunk builder

Max 218 MB after 300 seconds



At/after 89e4759, after refactoring the xpath_data save from Playwright

Max 1 GB after 300 seconds



Current master @ cdfb3f2

Max 870 MB after 300 seconds


@dgtlmoon dgtlmoon changed the title Use a single chunk of return for the xpath data to save playwright objects returning a huge amount Use a single chunk of return for the xpath data to save playwright objects returning a huge amount, refactor image saving with forked process Apr 10, 2025
@dgtlmoon dgtlmoon changed the title Use a single chunk of return for the xpath data to save playwright objects returning a huge amount, refactor image saving with forked process Refactor image saving with forked process to reduce memory, improvements to xpath scraper handling Apr 10, 2025
@dgtlmoon dgtlmoon changed the title Refactor image saving with forked process to reduce memory, improvements to xpath scraper handling Refactor image saving with forked process to reduce memory usage, improvements to xpath scraper handling Apr 10, 2025