revert flate decode handling to more lenient processing #1254
The change to use the zlib/Adler checksum verification flow meant that invalid flate streams would not be decoded correctly. This caused issues for files that included invalid or missing checksums. This PR reverts the processing to the old, more lenient approach for files like #1235.
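To illustrate the difference between the two behaviours, here is a minimal Python sketch. It is not PdfPig's implementation (which is C#); the function name `lenient_flate_decode` and the sample payload are made up for illustration, and it assumes the stream has a plain 2-byte zlib header with no preset dictionary. Strict zlib decoding rejects a stream whose Adler-32 trailer is wrong or missing, while inflating the raw deflate data never reads the checksum at all.

```python
import zlib


def lenient_flate_decode(data: bytes) -> bytes:
    """Inflate a zlib-wrapped stream without verifying the Adler-32 trailer.

    Hypothetical helper: assumes a 2-byte zlib header (CMF, FLG) and no
    preset dictionary.
    """
    # wbits=-15 selects raw deflate, so no header or checksum is expected.
    decompressor = zlib.decompressobj(wbits=-15)
    # Skip the 2-byte zlib header; any trailing checksum bytes end up in
    # decompressor.unused_data and are simply ignored.
    return decompressor.decompress(data[2:])


payload = b"BT /F1 12 Tf (Hello) Tj ET\n" * 4
good = zlib.compress(payload)

# Same stream with a corrupted Adler-32 trailer (last 4 bytes flipped).
bad_checksum = good[:-4] + bytes(b ^ 0xFF for b in good[-4:])
# Same stream with the trailer missing entirely.
missing_checksum = good[:-4]


def strict_decode_succeeds(data: bytes) -> bool:
    """Return True if the strict zlib path (checksum verified) accepts data."""
    try:
        zlib.decompress(data)
        return True
    except zlib.error:
        return False


print(strict_decode_succeeds(bad_checksum))
print(lenient_flate_decode(bad_checksum) == payload)
print(lenient_flate_decode(missing_checksum) == payload)
```

The trade-off is the one raised later in this thread: the lenient path recovers data from files with broken checksums, at the cost of no longer detecting that kind of corruption.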
@EliotJones FYI, tests are failing.
So I don't see a way to pass I ended up having a larger refactor of some of the stuff we added for the file fuzzing checks, since some of it seemed a bit too much like a workaround. In summary, I removed the check for object number 30 with offset -30 by using the new XrefLocation type. I also changed how cycle detection happens a bit and made it much stricter (max 7) while also using a
@BobLd this should be good to go now. I've added a test for the true cause of the int overflow issue but I haven't fixed it yet. We would need to rework the PDF page content stream parser to recover from corruption better, which is a bigger undertaking, so I'm fine to let the current exception be thrown for now.
@EliotJones thanks a lot for that! Let's merge as it is and for performance impact, |
Thanks for the heads-up. For my use cases I would prefer to just see that the PDF is corrupt, instead of trying to expand untrusted, most likely invalid data... But I can see cases where others would like to recover every last bit of data.
Updated [PdfPig](https://github.com/UglyToad/PdfPig) from 0.1.13 to 0.1.14.

<details>
<summary>Release notes</summary>

_Sourced from [PdfPig's releases](https://github.com/UglyToad/PdfPig/releases)._

## 0.1.14

## Auto generated release notes

* Increment version to 0.1.14 by @BobLd in UglyToad/PdfPig#1231
* Introduce StackDepthGuard class to check for stack depth in CoreTokenScanner and fix #1217 by @BobLd in UglyToad/PdfPig#1220
* Add Links to Pdf Generation by @ochsnerd in UglyToad/PdfPig#1232
* Make LinkAnnotation internal to fix unit tests by @BobLd in UglyToad/PdfPig#1239
* Only throw if ArrayToken length is lesss than 4 in ToRectangle() and fix #1238 by @BobLd in UglyToad/PdfPig#1240
* Handle empty encoding in Type1FontSimple and fix #1248 by @BobLd in UglyToad/PdfPig#1249
* Make extended graphics states stacked too by @PsykerUdot in UglyToad/PdfPig#1246
* both Tj and TJ operators should increment text sequence #1241 by @EliotJones in UglyToad/PdfPig#1251
* Improve HasFormXObjectCircularReference and fix #1250 by @BobLd in UglyToad/PdfPig#1252
* replace release flow single job with pr process by @EliotJones in UglyToad/PdfPig#1253
* Add UglyToad.PdfPig.Benchmarks and misc performance improvements by @BobLd in UglyToad/PdfPig#1255
* Make LinkAnnotation public by @BobLd in UglyToad/PdfPig#1256
* revert flate decode handling to more lenient processing by @EliotJones in UglyToad/PdfPig#1254
* Fix Benchmarks solution and add BruteForceBenchmarks by @BobLd in UglyToad/PdfPig#1260
* Additional digital corpora testing by @EliotJones in UglyToad/PdfPig#1261
* Introduce IBlock and ILettersBlock interfaces (Round 2) by @davebrokit in UglyToad/PdfPig#1263
* Improve SystemFontFinder performance and add benchmarks by @BobLd in UglyToad/PdfPig#1264
* For shading types 4 to 7, add Data property containing descriptive data characterizing the shading's gradient fill by @BobLd in UglyToad/PdfPig#1267
* creating branch in the previous step conflicts by @EliotJones in UglyToad/PdfPig#1269
* change the release flow to work on tags by @EliotJones in UglyToad/PdfPig#1271

## New Contributors

* @ochsnerd made their first contribution in UglyToad/PdfPig#1232

**Full Changelog**: UglyToad/PdfPig@0.1.13...v0.1.14

Commits viewable in [compare view](UglyToad/PdfPig@0.1.13...v0.1.14).

</details>
As discussed here #1186 (comment)
I'd like to check with @rhuijben before merging, in case I'm missing the main reasons for the initial change. In the meantime we'll run the tests on this PR to see what it would break.