[Draft] Memory fixes for large playwright screenshots with the new full page screenshot mechanism#3089
[Draft] Memory fixes for large playwright screenshots with the new full page screenshot mechanism#3089xconverge wants to merge 6 commits into
Conversation
i dont know, there was many pages that werent loading at all because of the long screenshots causing playwright to crash :/ |
|
Yea I figured you did it for a reason, just wanted to check! Is there an environment var or way for me to just not use screenshots if I don't need it? |
|
I think the answer here is
|
I think this PR addresses almost all of the PIL memory stuff and it looks good to me, I have been running it for some time and it is better I believe with the same functionality. The pixel limit is already 32000, and you already added the warning text that it was cropped to 32000 I think this PR can be merged as-is, but it's up to you I don't know how the xpath scraper stuff works so that should probably be a separate PR Moving forward, I will probably personally be finding a different source for the content of my problematic watch, now that I understand the problem, that is not long and doesn't require playwright to avoid this altogether, but I wanted to at least open something you could look at. Also, not FULLY related but have you considered .webp instead of (or in addition to) .jpeg support? I don't think it is necessary for this (or switching to something like pyvips), but both could be helpful tools in the toolbox moving forward, just a thought |
More investigation from #3035
I have bisected a memory issue that started in this changeset 0.49.3...0.49.4
I have no issues with 0.49.3, but with 0.49.4 my RAM usage spikes and keeps growing to >1GB in a few hours
I am able to reproduce this with a single watch (screenshot warning at the top says it is 77000px tall) that is using playwright and uses the new screenshot mechanism introduced in #2999
If I adjust playwright.py to always do
self.screenshot = self.page.screenshot(type='jpeg', full_page=True, quality=int(os.getenv("SCREENSHOT_QUALITY", 30)))and never do the new
self.screenshot = capture_stitched_together_full_page(self.page)the memory seems to stay stable and not exhibit any issues
I haven't been able to fully get this under control yet, but is it an option to potentially revert/disable capture_stitched_together_full_page for now and go back to the old way?