Improved performance and security for ContentStream_readInlineImage #331
This can be optimized further by searching directly for
@MartinThoma Since this pull request fixes issue #329, a possible denial-of-service security issue, it might be worth looking at sooner rather than later.
Thank you for pointing that out 👍 The issue has existed for several years now. For the moment, I prefer preventing regressions over fixing existing issues. To ensure that, I'm increasing the test coverage. I will check whether the code you've introduced is covered and, if not, how to cover it.
I think you can create a test file with any image of your choice using ReportLab:

```python
from reportlab.pdfgen import canvas

c = canvas.Canvas("test.pdf")
c.drawInlineImage("test.png", 100, 100, 100, 100)
c.drawString(200, 100, "Test")
c.showPage()
c.save()
```

I think that's what I did to create the intentionally broken PDF in issue #329: create one with ReportLab and then edit it manually so that it triggers the bug. But since it's been 5 years since I analyzed the problem, I've forgotten all the details (and had stopped using PyPDF2 because it was unmaintained).
@MartinThoma Was this pull request closed automatically because the target branch was deleted?
Huh. Weird. I can only say that I didn't close it on purpose. Also, according to GitHub, the renaming should automatically change the target of all PRs. And I still see many open PRs 🤔
I also cannot click on re-open
There were 72 PRs before, now there are only 67 PRs. Seems like GitHub accidentally closed 5. Is it possible for you to execute this locally:
and re-create the PR? I'm very sorry about the inconvenience :-/
I think the problem is that I deleted my fork of this repository months ago because I lost hope that the pull request would ever be applied. GitHub probably closed all pull requests for the
At least the actual changes didn't get lost, so I could reapply the patch to a new fork and re-create it as PR #740.
Thank you so much for doing it 🙏 I would have done it myself once I found the time. Your PR will for sure get merged this year; I just cannot commit to a specific time at the moment. Too many open topics (both in PyPDF2 and in my private life / work)
Credits to Sebastian Krause for creating the PDF: #331 (comment) Co-authored-by: Sebastian Krause <[email protected]>
This change has been tested with Python 2.6, 2.7 and 3.5.
It fixes #329 by raising an exception when the stream ends and we haven't found the end token for the inline image.
It also fixes #330 by using a more efficient parsing algorithm. For large inline images this change speeds up this method by many orders of magnitude:
- It uses the `find()` method to check for the `E` of the end token. Only when the token is found does it fall back to the normal algorithm that detects the end of the inline image.
- For `data`, it uses `BytesIO` to collect the output, which supports much faster appends.
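
The two points above, together with the fix for #329, can be illustrated with a minimal sketch. This is not the code from this pull request: the function name `read_inline_image_data`, the exception `UnexpectedEndOfStream`, the 8192-byte chunk size and the simplified end-token check (`EI` followed by whitespace or the end of the stream) are all assumptions made for illustration.

```python
from io import BytesIO

CHUNK_SIZE = 8192                  # assumed block size, chosen for illustration
WHITESPACE = b" \t\r\n\x00\x0c"    # PDF whitespace characters


class UnexpectedEndOfStream(Exception):
    """Hypothetical error type for a stream that ends before the end token."""


def read_inline_image_data(stream):
    """Collect inline image bytes up to (not including) the EI end token.

    Simplified sketch: EI counts as the end token when it is followed by
    whitespace or by the end of the stream.
    """
    data = BytesIO()  # cheap appends instead of repeated bytes concatenation
    while True:
        buf = stream.read(CHUNK_SIZE)
        if not buf:
            # Stream ended without an end token: fail instead of looping forever.
            raise UnexpectedEndOfStream("inline image has no EI end token")
        loc = buf.find(b"E")  # fast scan of the whole chunk
        if loc == -1:
            data.write(buf)
            continue
        data.write(buf[:loc])
        # Rewind so the stream is positioned just after the E we found.
        stream.seek(loc + 1 - len(buf), 1)
        nxt = stream.read(1)
        if nxt != b"I":
            # A lone E is ordinary image data; re-examine the following byte.
            data.write(b"E")
            if nxt:
                stream.seek(-1, 1)
            continue
        after = stream.read(1)
        if after == b"" or after in WHITESPACE:
            if after:
                stream.seek(-1, 1)  # leave the trailing whitespace unread
            return data.getvalue()
        # "EI" inside the image data: keep it and carry on scanning.
        data.write(b"EI")
        stream.seek(-1, 1)
```

With this sketch, `read_inline_image_data(BytesIO(b"abc EI Q"))` returns `b"abc "`, while a stream that never contains the end token raises the exception rather than reading forever.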