Memory fixes for large playwright screenshots #3092
| MAX_TOTAL_HEIGHT = SCREENSHOT_SIZE_STITCH_THRESHOLD*4 # Maximum total height for the final image (When in stitch mode) | ||
| MAX_CHUNK_HEIGHT = 4000 # Height per screenshot chunk | ||
| MAX_TOTAL_HEIGHT = 16000 # Maximum total height for the final image (When in stitch mode), not worth going over this for now |
There was a problem hiding this comment.
I'm sorry, but these kinds of comments never get re-visited and just confuse or worry the next person who works on this.
| MAX_TOTAL_HEIGHT = 16000 # Maximum total height for the final image (When in stitch mode), not worth going over this for now | |
| # Maximum total height for the final image (When in stitch mode). | |
| # We limit this to 16000px due to the huge amount of RAM that was being used | |
| # Example: 16000 × 1400 × 3 = 67,200,000 bytes ≈ 64.1 MB (not including buffers in PIL etc) | |
| MAX_TOTAL_HEIGHT = int(os.getenv("SCREENSHOT_MAX_HEIGHT", 16000)) |
There was a problem hiding this comment.
Please add SCREENSHOT_MAX_HEIGHT=16000 (commented out with a note about memory) in docker-compose.yml
There was a problem hiding this comment.
I think that should be a separate PR because it is a larger change than you are suggesting
If I set SCREENSHOT_MAX_HEIGHT in my env to 4000, I would expect every screenshot (not just the ones from this function) to be capped at 4000, not only stitched images, since stitching is an implementation detail.
Naming the env var SCREENSHOT_STITCHED_MAX_HEIGHT would be OK, but that is unnecessarily specific.
If you are ok with me just doing your suggested change in this one file of
MAX_TOTAL_HEIGHT = int(os.getenv("SCREENSHOT_MAX_HEIGHT", 16000))
I can also just do it
There was a problem hiding this comment.
I think that should be a separate PR because it is a larger change than you are suggesting
yeah but it just creates work for someone else, please merge in that suggestion
I can't merge this with mysterious open comments, it has to be something understandable to someone else
There was a problem hiding this comment.
There are no open comments I don't think. This is a bug fix for a regression.
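The arithmetic behind the 16000px cap in this thread can be checked with a short sketch. It only reproduces the uncompressed RGB estimate from the suggested comment (width × height × 3 bytes); the function name `estimated_rgb_bytes` is hypothetical and not part of the PR, and real usage will be higher once PIL buffers are counted.

```python
def estimated_rgb_bytes(width_px: int, height_px: int, channels: int = 3) -> int:
    """Uncompressed size of one 8-bit-per-channel image, ignoring library overhead."""
    return width_px * height_px * channels


def to_mib(num_bytes: int) -> float:
    """Convert bytes to mebibytes (1 MiB = 2**20 bytes), rounded to one decimal."""
    return round(num_bytes / 2**20, 1)


# The example from the review comment: a 1400px wide, 16000px tall screenshot.
size = estimated_rgb_bytes(1400, 16000)
print(size, to_mib(size))
```

Doubling the height cap doubles this figure, which is why the limit is worth making configurable rather than raising it silently.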
| # Capture only the visible area using clip | ||
| with io.BytesIO( | ||
| page.screenshot( |
There was a problem hiding this comment.
And you are 100% sure that this solved the memory problem?
There was a problem hiding this comment.
This PR in its entirety has solved the issue on my system/installation, yes.
|
Roughly it looks OK to me, so using the context stuff solved the memory problems? |
Yes and not storing a list of all chunks. All 4 bullet points in the PR description help. |
ok so that was the main problem? |
But if this was the main issue, surely the memory would be cleaned up when the function returns, because it is not in scope anymore? That's the part I don't get. Or maybe Python PIL kept the reference, or there is a cache in there?
|
I don't know, I believe it is just how it is with Pillow: https://pillow.readthedocs.io/en/stable/reference/open_files.html
"Users of the library should use a context manager or call Image.Image.close() on any image opened with a filename or Path object to ensure that the underlying file is closed."
Again, the solution is all the changes in the PR, not just one line or concept. Keeping a list of images instead of processing as we go would certainly work, as long as each image gets closed in the end (it wasn't). The max memory used for a screenshot would be significantly higher in that scenario though (and it would be freed, as long as we iterated over the images and closed them).
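A minimal sketch of the "process as we go and close each image" idea from this reply, assuming Pillow is installed. This is not the PR's code: `stitch_chunks` and its parameters are hypothetical names, and the chunks are taken as raw image bytes (for example the return value of a screenshot call).

```python
import io

from PIL import Image


def stitch_chunks(chunk_bytes_iter, width: int, total_height: int) -> Image.Image:
    """Paste each chunk onto one canvas, closing the chunk before reading the next.

    Only the canvas and a single decoded chunk are alive at any time, instead of
    a list holding every decoded chunk until the end.
    """
    canvas = Image.new("RGB", (width, total_height))
    y_offset = 0
    for raw in chunk_bytes_iter:
        # Both context managers release their buffers when the block exits.
        with io.BytesIO(raw) as buf, Image.open(buf) as chunk:
            canvas.paste(chunk, (0, y_offset))
            y_offset += chunk.height
    return canvas
```

Passing a generator rather than a list keeps the raw bytes of earlier chunks from lingering too. The caller is still responsible for closing the returned canvas once it has been saved.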
|
OK amazing, thanks for your work! |
Are you ok with this bullet point?
"Use stitching screenshot method at all times now (instead of just when > 8000px) for consistency, repeatability, and to easily use the above environment var for max height. This is less efficient than self.page.screenshot(type='jpeg', full_page=True, quality=40) which can be done natively in playwright, but ends up being similar if we have to do the full_page screenshot and THEN crop it using SCREENSHOT_MAX_HEIGHT after anyways"
I think it is the right call but don't feel strongly and could easily be convinced to revert that change. Either way, I think having a single function "get screenshot" is going to help, if you wanted to do an optimization within it where we check to see if we can just do a
Edit: I went ahead and added the optimization. It is free, closer to what was there before, and still supports us allowing a user to limit the max height of a screenshot without needing to spread logic around in different places.
…od when we don't need to clip the image and it is below 8000px
Fine by me, I don't think it's worth discussing more, let's make the change and move on :)
So what now, is it ready to merge?
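The decision discussed above (native full-page capture when no clipping is needed and the page is short, otherwise stitched chunks capped at a max height) could be sketched roughly as follows. This is an illustration, not the PR's implementation: the function names are hypothetical, the 8000 threshold is assumed from the "below 8000px" wording, and 4000 and 16000 are the values shown in the diff. The clip dictionaries use the `x`, `y`, `width`, `height` keys that Playwright's `page.screenshot(clip=...)` accepts.

```python
from typing import Dict, List

STITCH_THRESHOLD = 8000   # assumed value of SCREENSHOT_SIZE_STITCH_THRESHOLD
MAX_CHUNK_HEIGHT = 4000   # height per screenshot chunk, from the diff
MAX_TOTAL_HEIGHT = 16000  # cap on the final image height, from the diff


def can_use_native_full_page(page_height: int,
                             max_height: int = MAX_TOTAL_HEIGHT,
                             threshold: int = STITCH_THRESHOLD) -> bool:
    """True when the page needs no clipping and is below the stitch threshold."""
    return page_height <= max_height and page_height < threshold


def plan_chunks(page_height: int, width: int,
                max_height: int = MAX_TOTAL_HEIGHT,
                chunk_height: int = MAX_CHUNK_HEIGHT) -> List[Dict[str, int]]:
    """Return clip rectangles covering the page from the top down to the cap."""
    capture_height = min(page_height, max_height)
    clips = []
    y = 0
    while y < capture_height:
        height = min(chunk_height, capture_height - y)
        clips.append({"x": 0, "y": y, "width": width, "height": height})
        y += height
    return clips
```

Keeping both checks in one place is what lets a user-configured max height apply without spreading the logic across callers.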
|
here is a really long page that could be a good test https://adguard.com/en/versions/windows/nightly.html |
|
It's still not clear which type of memory you are reporting yet: heap size vs resident size in memray.
Heap Size
Covers: Memory requested from the OS by the program.
Includes: Active allocations (in use); inactive/freed memory that hasn’t yet been returned to the OS.
✅ Useful for understanding how much memory your Python objects are using overall.
🟨 Resident Size
OS-Level View: This is what the operating system sees as "used" memory.
Includes: All loaded code, stack, and heap memory actually in RAM. May include memory from other sources like native libraries, threads, etc.
✅ Useful when you're debugging actual system memory pressure or trying to understand real-world memory impact.
ANALOGY - Heap Size | Size of your bookshelf
If playwright "expanded its heap size" then of course it will stay at that size
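To make the two kinds of number concrete, here is a rough standard-library sketch. It swaps in `tracemalloc` and `resource` for memray, so it is only a stand-in: `tracemalloc` sees Python-level allocations only (not native libraries or the browser process), and `ru_maxrss` is the peak resident size of the current process, reported in kilobytes on Linux and bytes on macOS. The `resource` module is Unix-only. The function names are hypothetical.

```python
import resource  # Unix only
import sys
import tracemalloc


def python_heap_bytes() -> int:
    """Bytes currently allocated by Python objects, as traced by tracemalloc."""
    if not tracemalloc.is_tracing():
        tracemalloc.start()
    current, _peak = tracemalloc.get_traced_memory()
    return current


def peak_resident_bytes() -> int:
    """Peak resident set size of this process, as reported by the OS."""
    max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports kilobytes, macOS reports bytes.
    return max_rss if sys.platform == "darwin" else max_rss * 1024
```

The traced figure drops as soon as objects are freed, while the peak resident figure never goes down, so the two can legitimately disagree after a large screenshot.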
|
So memory usage still grows even when the screenshot stuff is commented out |
|
OK, commenting out these lines gives a HUGE improvement in memory handling: changedetection.io/changedetectionio/content_fetchers/playwright.py, lines 191 to 193 in 456c6e3
|
I can 100% prove this is something to do with playwright, add the
|
If you want to run your own tests, compare 0.49.4 and 0.49.3. This is a good change and not worse. I have been running it for days. I never claimed to fix every memory problem the application has for big pages, just the bullet points above. Close the PR if you want, but I am done working on it |
|
@xconverge Claude also recommends this... I'll try it
|
I think those could help and not hurt. I also think pyvips would be better than PIL, but I didn't want to complicate your project and add a new dependency.



This is a cleaner implementation than #3089 for #3035 I think
Key changes: