-
Notifications
You must be signed in to change notification settings - Fork 291
Fix regression introduced in 3592fc8 where slicing the stream to the length breaks decoding #1192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…length breaks decoding
|
Thanks, I don't have time to review much anymore but I think it makes sense not to trust the declared |
|
Thanks! I will look into this |
|
Do you need to drop the length check on both levels? |
|
@rhuijben that makes sense yes, ill push a pr for tgat |
I see you pushed a fix for the flate filter. What I asked was if you see the problems at both levels? I'm trying to find where it is safe to do the trimming. Just assuming that whitespace is part of the stream is not always the right thing to do... but just trimming isn't either, or the tests wouldn't have failed. |
|
@rhuijben my bad - I misunderstood your earlier comment.
I only saw issue at top level (in
Yes, I believe this is safer for now, in line with the above.
Sadly, I don't think it's going to be easy. Going back to the Pdf 2.0 specifications:
From this, my understanding is that the check should be be done in the One possible approach would be to try decoding the stream first without any trimming. If this decoding fails, we trim the stream and try decoding again... Another approach could be to check for whitespaces. More than open to discuss, and my understanding of the specifications might be wrong. As a side note, this is a prime example that pdf readers do not fully enforce the specs and are more lenient. This is also why PdfPig has a lenient mode. Below are the documents that were failing before this PR: |
|
I asked Perplexity AI about our discussion: in the context of pdf specification, and using feedback you can find on the web, should the /length parameter of a stream be relied on?The PDF specification requires that every stream dictionary include the /Length entry, which specifies the number of bytes for the stream's data (typically the encoded data if filters are applied). However, in practice and based on community and expert feedback, you should be cautious about relying unconditionally on /Length.12 Specification Requirements
Practical Reliability Issues
Best Practices
In summary: /Length should be used according to spec, but practically, do not blindly trust it without verifying the correct stream boundaries, especially when working with PDFs from varied or untrusted origins.125 ⁂
Footnotes
|

Unfortunately the regression was not caught with the tests in the project.
I caught that when running tests in the https://github.com/BobLd/PdfPig.Rendering.Skia project.
3 tests are now failing over there:
I'm lacking time to add the tests to this projects, but wanted to at least fix that for now.
cc @rhuijben