revert flate decode handling to more lenient processing #1254

Merged
BobLd merged 9 commits into master from flate-filter-tolerance
Feb 22, 2026

@EliotJones
Member

As discussed here #1186 (comment)

The change to use the zlib/Adler-32 checksum verification flow meant that invalid Flate streams would not be decoded correctly. This caused issues for files that included invalid or missing checksums. This PR reverts the processing to the old approach for files like #1235.
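Some context on the two behaviours. A PDF `/FlateDecode` stream is normally zlib-wrapped: a 2-byte header, the deflate data, then a 4-byte Adler-32 checksum. PdfPig is written in C#, so what follows is only a Python sketch using the standard `zlib` module and is not the code in this PR. `decode_flate` and its `strict` flag are hypothetical names, and the sketch assumes the header declares no preset dictionary. The strict path lets zlib verify the checksum. The lenient path skips the header and inflates the raw deflate data, so a wrong or missing trailer is never examined.

```python
import zlib

def decode_flate(data: bytes, strict: bool = False) -> bytes:
    """Decode a zlib-wrapped Flate stream (hypothetical helper, not PdfPig's API)."""
    if strict:
        # zlib.decompress checks the header and the Adler-32 trailer and
        # raises zlib.error if the checksum is wrong or missing.
        return zlib.decompress(data)
    # Lenient: skip the 2-byte zlib header (assumes no preset dictionary)
    # and inflate raw deflate data. wbits=-15 means no header or trailer
    # is expected, so any trailing checksum bytes are left unread.
    inflater = zlib.decompressobj(wbits=-15)
    return inflater.decompress(data[2:]) + inflater.flush()
```

The lenient path trades safety for recovery. Data with a bad checksum comes back as though it were valid, which is the trade-off discussed later in this thread.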

I'd like to check with @rhuijben before merging, though, to see whether I'm missing the main reasons for the initial change. In the meantime, we'll run the tests on this PR to check what it would break.

@BobLd
Collaborator

BobLd commented Feb 15, 2026

@EliotJones FYI, tests are failing.

@EliotJones
Member Author

EliotJones commented Feb 21, 2026

So, I don't currently see a way to make `Issue953_IntOverflow` pass, and I've somehow broken the build on Mac.

I ended up doing a larger refactor of some of the code we added for the file-fuzzing checks, since some of it seemed too much like a workaround.

In summary, `long` is changed to `XrefLocation` for all object offsets that are read from the xref table/stream or found by brute force. This lets locations inside object streams be distinguished properly from byte offsets in the file without negative sentinel values, which other malformed files could exploit. I'm interested to see how badly this blows up performance.
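To illustrate why a dedicated location type helps, here is a Python sketch of the idea. The real `XrefLocation` is C# and its shape isn't shown here. `FileOffset`, `InObjectStream` and `from_xref_stream_entry` are hypothetical names. The sketch relies on the PDF xref stream entry types. A type 1 entry gives a byte offset in the file. A type 2 entry gives the object number of the containing object stream plus the object's index within it. With two distinct variants nothing is encoded as a negative number, so a negative offset from a corrupt file can be rejected instead of being misread as an object-stream reference.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class FileOffset:
    """Object stored uncompressed at a byte offset in the file."""
    offset: int

@dataclass(frozen=True)
class InObjectStream:
    """Object stored inside an object stream."""
    stream_number: int  # object number of the containing object stream
    index: int          # position of the object within that stream

XrefLocation = Union[FileOffset, InObjectStream]

def from_xref_stream_entry(entry_type: int, field2: int, field3: int) -> Optional[XrefLocation]:
    """Map an xref stream entry to a location; free (type 0) or unknown entries give None."""
    if entry_type == 1:
        # Type 1: field2 is a byte offset. A negative value can only come from
        # corruption, so reject it rather than reinterpreting it.
        return FileOffset(field2) if field2 >= 0 else None
    if entry_type == 2:
        # Type 2: field2 is the object stream's number, field3 the index in it.
        return InObjectStream(field2, field3)
    return None
```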

Using the new `XrefLocation` type, I removed the check for object number 30 with offset -30. I also changed how cycle detection works, making it much stricter (max 7) and using a `Span<int>` to detect cycles even earlier.
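Here is a rough Python sketch of bounded cycle detection, and it rests on an assumption. The PR doesn't spell out what the limit of 7 counts, so this sketch assumes it caps the chain "object, then the object stream containing it, then that stream's container, and so on" that must be followed to reach an object stored directly in the file. `containment_chain`, `XrefCycleError` and the `container_of` mapping are hypothetical. The small bounded list stands in for the C# `Span<int>` buffer. In this sketch, an entry claiming that object 30 lives inside object stream 30 is caught as an ordinary cycle, so no special case is needed.

```python
from typing import Dict, List, Optional

MAX_DEPTH = 7  # the PR's limit; what it counts is an assumption here

class XrefCycleError(Exception):
    """Raised when object-stream containment loops or nests too deeply."""

def containment_chain(obj: int, container_of: Dict[int, Optional[int]]) -> List[int]:
    """Return [obj, its object stream, that stream's container, ...].

    container_of maps an object number to the number of the object stream
    that holds it, or to None when the object is stored directly in the file.
    """
    seen: List[int] = []  # bounded buffer, standing in for a small Span<int>
    current: Optional[int] = obj
    while current is not None:
        if current in seen:
            raise XrefCycleError(f"object {current} repeats in chain {seen}")
        if len(seen) == MAX_DEPTH:
            raise XrefCycleError(f"chain longer than {MAX_DEPTH}: {seen}")
        seen.append(current)
        current = container_of[current]  # KeyError if the xref has no entry
    return seen
```

A small fixed bound stops a hostile file from forcing deep or unbounded resolution. The cost is that an unusually deep but legitimate chain would be rejected.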

@EliotJones
Member Author

@BobLd, this should be good to go now. I've added a test for the true cause of the int overflow issue, but I haven't fixed it yet. Fixing it would require reworking the PDF page content stream parser to recover better from corruption. That is a bigger undertaking, so I'm fine with letting the current exception be thrown for now.

@BobLd
Collaborator

BobLd commented Feb 22, 2026

@EliotJones, thanks a lot for that! Let's merge it as is. To measure the performance impact, we could use [UglyToad.PdfPig.Benchmarks](https://github.com/UglyToad/PdfPig/tree/master/tools/UglyToad.PdfPig.Benchmarks).

@BobLd BobLd merged commit 9c0d689 into master Feb 22, 2026
7 checks passed
@BobLd BobLd deleted the flate-filter-tolerance branch February 22, 2026 15:49
@rhuijben
Contributor

Thanks for the heads-up. For my use cases, I would prefer to simply be told that the PDF is corrupt, rather than have the library try to expand untrusted, most likely invalid, data... But I can see cases where others would like to recover every last bit of data.

TheCodeTraveler pushed a commit to TheCodeTraveler/MAUIChatGPTClone that referenced this pull request Mar 23, 2026
Updated [PdfPig](https://github.com/UglyToad/PdfPig) from 0.1.13 to
0.1.14.

<details>
<summary>Release notes</summary>

_Sourced from [PdfPig's
releases](https://github.com/UglyToad/PdfPig/releases)._

## 0.1.14

## Auto generated release notes
* Increment version to 0.1.14 by @​BobLd in
UglyToad/PdfPig#1231
* Introduce StackDepthGuard class to check for stack depth in
CoreTokenScanner and fix #​1217 by @​BobLd in
UglyToad/PdfPig#1220
* Add Links to Pdf Generation by @​ochsnerd in
UglyToad/PdfPig#1232
* Make LinkAnnotation internal to fix unit tests by @​BobLd in
UglyToad/PdfPig#1239
* Only throw if ArrayToken length is less than 4 in ToRectangle() and
fix #​1238 by @​BobLd in UglyToad/PdfPig#1240
* Handle empty encoding in Type1FontSimple and fix #​1248 by @​BobLd in
UglyToad/PdfPig#1249
* Make extended graphics states stacked too by @​PsykerUdot in
UglyToad/PdfPig#1246
* both Tj and TJ operators should increment text sequence #​1241 by
@​EliotJones in UglyToad/PdfPig#1251
* Improve HasFormXObjectCircularReference and fix #​1250 by @​BobLd in
UglyToad/PdfPig#1252
* replace release flow single job with pr process by @​EliotJones in
UglyToad/PdfPig#1253
* Add UglyToad.PdfPig.Benchmarks and misc performance improvements by
@​BobLd in UglyToad/PdfPig#1255
* Make LinkAnnotation public by @​BobLd in
UglyToad/PdfPig#1256
* revert flate decode handling to more lenient processing by
@​EliotJones in UglyToad/PdfPig#1254
* Fix Benchmarks solution and add BruteForceBenchmarks by @​BobLd in
UglyToad/PdfPig#1260
* Additional digital corpora testing by @​EliotJones in
UglyToad/PdfPig#1261
* Introduce IBlock and ILettersBlock interfaces (Round 2) by
@​davebrokit in UglyToad/PdfPig#1263
* Improve SystemFontFinder performance and add benchmarks by @​BobLd in
UglyToad/PdfPig#1264
* For shading types 4 to 7, add Data property containing descriptive
data characterizing the shading's gradient fill by @​BobLd in
UglyToad/PdfPig#1267
* creating branch in the previous step conflicts by @​EliotJones in
UglyToad/PdfPig#1269
* change the release flow to work on tags by @​EliotJones in
UglyToad/PdfPig#1271

## New Contributors
* @​ochsnerd made their first contribution in
UglyToad/PdfPig#1232

**Full Changelog**:
UglyToad/PdfPig@0.1.13...v0.1.14

Commits viewable in [compare
view](UglyToad/PdfPig@0.1.13...v0.1.14).
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

3 participants