Skip to content

[Draft] Memory fixes for large playwright screenshots with the new full page screenshot mechanism#3089

Closed
xconverge wants to merge 6 commits into
dgtlmoon:masterfrom
xconverge:memoryFixes
Closed

[Draft] Memory fixes for large playwright screenshots with the new full page screenshot mechanism#3089
xconverge wants to merge 6 commits into
dgtlmoon:masterfrom
xconverge:memoryFixes

Conversation

@xconverge
Copy link
Copy Markdown
Contributor

@xconverge xconverge commented Apr 5, 2025

More investigation from #3035

I have bisected a memory issue that started in this changeset 0.49.3...0.49.4

I have no issues with 0.49.3, but with 0.49.4 my RAM usage spikes and keeps growing to >1GB in a few hours

I am able to reproduce this with a single watch (screenshot warning at the top says it is 77000px tall) that is using playwright and uses the new screenshot mechanism introduced in #2999

If I adjust playwright.py to always do

self.screenshot = self.page.screenshot(type='jpeg', full_page=True, quality=int(os.getenv("SCREENSHOT_QUALITY", 30)))

and never do the new

self.screenshot = capture_stitched_together_full_page(self.page)

the memory seems to stay stable and not exhibit any issues

I haven't been able to fully get this under control yet, but is it an option to potentially revert/disable capture_stitched_together_full_page for now and go back to the old way?

@dgtlmoon
Copy link
Copy Markdown
Owner

dgtlmoon commented Apr 5, 2025

now and go back to the old way?

i dont know, there was many pages that werent loading at all because of the long screenshots causing playwright to crash :/

@xconverge
Copy link
Copy Markdown
Contributor Author

xconverge commented Apr 5, 2025

Yea I figured you did it for a reason, just wanted to check! Is there an environment var or way for me to just not use screenshots if I don't need it?

@dgtlmoon
Copy link
Copy Markdown
Owner

dgtlmoon commented Apr 5, 2025

I think the answer here is

  • Certainly continue with finding out why PIL is not releasing that memory, it definitely SHOULD
  • set a HARD limit in pixels which is on the docker-compose.yml default env vars of say 30,000pixels
  • also make sure the xpath scraper JS code stops at 30000 pixels or whatever it is
  • add some red warning text "IMAGE WAS CROPPED TO 30,000 PIXELS!"

@xconverge
Copy link
Copy Markdown
Contributor Author

xconverge commented Apr 5, 2025

I think the answer here is

* Certainly continue with finding out why PIL is not releasing that memory, it definitely SHOULD

* set a HARD limit in pixels which is on the docker-compose.yml default env vars of say 30,000pixels

* also make sure the xpath scraper JS code stops at 30000 pixels or whatever it is

* add some red warning text "IMAGE WAS CROPPED TO 30,000 PIXELS!"

I think this PR addresses almost all of the PIL memory stuff and it looks good to me, I have been running it for some time and it is better I believe with the same functionality.

The pixel limit is already 32000, and you already added the warning text that it was cropped to 32000

I think this PR can be merged as-is, but it's up to you

I don't know how the xpath scraper stuff works so that should probably be a separate PR

Moving forward, I will probably personally be finding a different source for the content of my problematic watch, now that I understand the problem, that is not long and doesn't require playwright to avoid this altogether, but I wanted to at least open something you could look at.

Also, not FULLY related but have you considered .webp instead of (or in addition to) .jpeg support? I don't think it is necessary for this (or switching to something like pyvips), but both could be helpful tools in the toolbox moving forward, just a thought

@xconverge xconverge changed the title [Draft] Memory fixes for large playwright screenshots with the new full page screenshot mechanism Memory fixes for large playwright screenshots with the new full page screenshot mechanism Apr 5, 2025
@xconverge xconverge changed the title Memory fixes for large playwright screenshots with the new full page screenshot mechanism [Draft] Memory fixes for large playwright screenshots with the new full page screenshot mechanism Apr 6, 2025
@xconverge xconverge closed this Apr 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants