revert flate decode handling to more lenient processing by EliotJones · Pull Request #1254 · UglyToad/PdfPig

EliotJones · 2026-02-15T19:00:47Z

The change to use a zlib/Adler-32 checksum verification flow meant that invalid flate streams would not be decoded correctly. This caused issues for files with invalid or missing checksums. This PR reverts the processing to the old, more lenient approach for files like #1235.
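As an illustration of the difference between strict and lenient decoding, here is a minimal Python sketch using the standard `zlib` module. It is not PdfPig's C# implementation, and the helper name `lenient_flate_decode` is hypothetical. The strict path verifies the trailing Adler-32 checksum; the lenient path skips the 2-byte zlib header and inflates the raw deflate data, ignoring whatever follows it.

```python
import zlib


def lenient_flate_decode(data: bytes) -> bytes:
    """Inflate a zlib-wrapped stream without verifying its Adler-32 trailer.

    Hypothetical helper for illustration; PdfPig's actual code may differ.
    Assumes a plain 2-byte zlib header with no preset dictionary.
    """
    # Negative wbits selects raw deflate: no zlib header or checksum expected.
    inflater = zlib.decompressobj(wbits=-zlib.MAX_WBITS)
    # Skip the 2-byte zlib header (CMF and FLG) and feed the rest.
    # Bytes after the end of the deflate data, such as a wrong checksum,
    # land in inflater.unused_data and are simply ignored.
    return inflater.decompress(data[2:])


original = b"BT /F1 12 Tf (Hello) Tj ET" * 4
good = zlib.compress(original)

# One stream with a corrupted Adler-32 trailer, one with the trailer missing.
bad_checksum = good[:-4] + b"\x00\x00\x00\x00"
missing_checksum = good[:-4]

try:
    zlib.decompress(bad_checksum)
    strict_ok = True
except zlib.error:
    strict_ok = False  # strict decoding rejects the stream

recovered_bad = lenient_flate_decode(bad_checksum)
recovered_missing = lenient_flate_decode(missing_checksum)
```

The trade-off is that the lenient path will also return data from streams that are genuinely corrupt, since nothing validates the output.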

I'd like to check with @rhuijben before merging though to see if I'm missing the main reasons for the initial change, but in the meantime we'll run the tests on this PR to check what it would break.

BobLd · 2026-02-15T23:42:41Z

@EliotJones fyi tests are failing

EliotJones · 2026-02-21T20:11:18Z

So I don't see a way to pass Issue953_IntOverflow currently, and I have somehow broken the build on Mac.

I ended up doing a larger refactor of some of the code we added for the file fuzzing checks, since some of it seemed a bit too much like a workaround.

In summary, `long` is changed to `XrefLocation` for all object offsets read from the Xref table/stream or found by brute force. This lets object stream values be distinguished properly from file offsets without using negative sentinel values, which would be vulnerable to other malformed files. I'm interested to see how badly this blows up performance.
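To make the idea concrete, here is a minimal Python sketch of a tagged location type. Only the name `XrefLocation` comes from the comment above; the fields, the `XrefEntryKind` enum, and the helper functions are assumptions for illustration and do not reflect the actual C# type.

```python
from dataclasses import dataclass
from enum import Enum


class XrefEntryKind(Enum):
    FILE = "file"                    # value is a byte offset in the file
    OBJECT_STREAM = "object_stream"  # value is the containing stream's object number


@dataclass(frozen=True)
class XrefLocation:
    """Hypothetical tagged location: the kind is explicit, not encoded in the sign."""
    kind: XrefEntryKind
    value: int

    @staticmethod
    def in_file(offset: int) -> "XrefLocation":
        return XrefLocation(XrefEntryKind.FILE, offset)

    @staticmethod
    def in_stream(stream_number: int) -> "XrefLocation":
        return XrefLocation(XrefEntryKind.OBJECT_STREAM, stream_number)


def describe(location: XrefLocation) -> str:
    if location.kind is XrefEntryKind.OBJECT_STREAM:
        return f"inside object stream {location.value}"
    return f"at byte offset {location.value}"


# If a negative number doubles as a sentinel, a malformed file offset of -30
# cannot be told apart from "object stream 30". With a tag they stay distinct.
malformed_offset = XrefLocation.in_file(-30)
stream_entry = XrefLocation.in_stream(30)
```

Because the kind travels with the value, a negative offset from a malformed file can be rejected as an invalid file offset rather than being misread as an object stream reference.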

I removed the check for object number 30 with offset -30 by using the new `XrefLocation` type. I also changed how cycle detection works: it is now much stricter (max 7) and uses a `Span<int>` to detect cycles even earlier.
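A rough Python sketch of depth-limited cycle detection along these lines follows. It assumes that "max 7" is a cap on the length of a reference chain and uses a small tuple of visited object numbers in place of the `Span<int>` buffer; the function, exception, and data layout are all hypothetical.

```python
MAX_DEPTH = 7  # the limit mentioned above; treating it as chain length is an assumption


class ReferenceCycleError(Exception):
    """Raised when a reference chain loops or grows too deep."""


def resolve(object_number, references, visited=()):
    """Follow indirect references until a direct value is reached.

    `references` maps an object number either to another object number
    (an indirect reference) or to a non-int direct value.
    """
    # Checking the visited list first catches a loop as soon as it closes,
    # before the depth limit is ever reached.
    if object_number in visited:
        raise ReferenceCycleError(f"cycle at object {object_number}")
    if len(visited) >= MAX_DEPTH:
        raise ReferenceCycleError("reference chain too deep")
    target = references[object_number]
    if isinstance(target, int):
        return resolve(target, references, visited + (object_number,))
    return target


chain = {1: 2, 2: 3, 3: "value"}
cyclic = {1: 2, 2: 1}
deep = {i: i + 1 for i in range(1, 10)}
deep[10] = "too far"
```

Here `resolve(1, chain)` returns the direct value, while `resolve(1, cyclic)` and `resolve(1, deep)` both raise `ReferenceCycleError`.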

EliotJones · 2026-02-22T14:31:36Z

@BobLd this should be good to go now. I've added a test for the true cause of the int overflow issue but I haven't fixed it yet. We would need to rework the PDF page content stream parser to recover from corruption better which is a bigger undertaking so I'm fine to let the current exception be thrown for now.

BobLd · 2026-02-22T15:49:11Z

@EliotJones thanks a lot for that! Let's merge as it is. To check the performance impact, [UglyToad.PdfPig.Benchmarks](https://github.com/UglyToad/PdfPig/tree/master/tools/UglyToad.PdfPig.Benchmarks) could be used.

rhuijben · 2026-02-22T18:41:59Z

Thanks for the heads-up. For my use cases I would prefer to just see that the PDF is corrupt, instead of trying to expand untrusted, most likely invalid data... But I can see cases where others would like to recover every last bit of data.

Updated [PdfPig](https://github.com/UglyToad/PdfPig) from 0.1.13 to 0.1.14.

<details>
<summary>Release notes</summary>

_Sourced from [PdfPig's releases](https://github.com/UglyToad/PdfPig/releases)._

## 0.1.14

## Auto generated release notes

* Increment version to 0.1.14 by @BobLd in UglyToad/PdfPig#1231
* Introduce StackDepthGuard class to check for stack depth in CoreTokenScanner and fix #1217 by @BobLd in UglyToad/PdfPig#1220
* Add Links to Pdf Generation by @ochsnerd in UglyToad/PdfPig#1232
* Make LinkAnnotation internal to fix unit tests by @BobLd in UglyToad/PdfPig#1239
* Only throw if ArrayToken length is lesss than 4 in ToRectangle() and fix #1238 by @BobLd in UglyToad/PdfPig#1240
* Handle empty encoding in Type1FontSimple and fix #1248 by @BobLd in UglyToad/PdfPig#1249
* Make extended graphics states stacked too by @PsykerUdot in UglyToad/PdfPig#1246
* both Tj and TJ operators should increment text sequence #1241 by @EliotJones in UglyToad/PdfPig#1251
* Improve HasFormXObjectCircularReference and fix #1250 by @BobLd in UglyToad/PdfPig#1252
* replace release flow single job with pr process by @EliotJones in UglyToad/PdfPig#1253
* Add UglyToad.PdfPig.Benchmarks and misc performance improvements by @BobLd in UglyToad/PdfPig#1255
* Make LinkAnnotation public by @BobLd in UglyToad/PdfPig#1256
* revert flate decode handling to more lenient processing by @EliotJones in UglyToad/PdfPig#1254
* Fix Benchmarks solution and add BruteForceBenchmarks by @BobLd in UglyToad/PdfPig#1260
* Additional digital corpora testing by @EliotJones in UglyToad/PdfPig#1261
* Introduce IBlock and ILettersBlock interfaces (Round 2) by @davebrokit in UglyToad/PdfPig#1263
* Improve SystemFontFinder performance and add benchmarks by @BobLd in UglyToad/PdfPig#1264
* For shading types 4 to 7, add Data property containing descriptive data characterizing the shading's gradient fill by @BobLd in UglyToad/PdfPig#1267
* creating branch in the previous step conflicts by @EliotJones in UglyToad/PdfPig#1269
* change the release flow to work on tags by @EliotJones in UglyToad/PdfPig#1271

## New Contributors

* @ochsnerd made their first contribution in UglyToad/PdfPig#1232

**Full Changelog**: UglyToad/PdfPig@0.1.13...v0.1.14

Commits viewable in [compare view](UglyToad/PdfPig@0.1.13...v0.1.14).

</details>

EliotJones added 2 commits February 15, 2026 14:59

Merge remote-tracking branch 'origin/master' into flate-filter-tolerance

73d590d

BobLd mentioned this pull request Feb 19, 2026

Cannot decompress zip stream with missing checksum #1257

Closed

EliotJones added 3 commits February 21, 2026 15:47

fix object stream offset handling and track circular refs

542dd96

update tests

f515558

Merge remote-tracking branch 'origin/master' into flate-filter-tolerance

b05ec5c

EliotJones added 4 commits February 21, 2026 16:22

normalize line endings for mac runner

c9b83df

fixes for mac clownery

0a69dc6

add next pair to common crawl action

c82e06c

add a test case for the root cause of the int overflow

76cdae4

BobLd approved these changes Feb 22, 2026

BobLd merged commit 9c0d689 into master Feb 22, 2026
7 checks passed

BobLd deleted the flate-filter-tolerance branch February 22, 2026 15:49

This was referenced Mar 23, 2026

Bump PdfPig from 0.1.13 to 0.1.14 GuilhermeStracini/POC-dotnet-ExtractPdfContent#252

Merged

Bump PdfPig from 0.1.13 to 0.1.14 endjin/RLM#17

Open

This was referenced Mar 23, 2026

Bump the all group with 2 updates tryAGI/LangChain#601

Merged

Bump PdfPig from 0.1.13 to 0.1.14 guibranco/BancosBrasileiros-MergeTool#324

Merged

BobLd merged 9 commits into master from flate-filter-tolerance
