Fix regression introduced in 3592fc8 where slicing the stream to the length breaks decoding #1192

BobLd · 2025-10-23T17:57:30Z

Unfortunately the regression was not caught by the tests in this project.

I caught it when running the tests in the https://github.com/BobLd/PdfPig.Rendering.Skia project.

3 tests are now failing over there:

UglyToad.PdfPig.Rendering.Skia.Tests.TestRendering.PdfPigSkiaTest(expectedImage: "AcroFormsBasicFields_1.png", pdfFile: "AcroFormsBasicFields.pdf", pageNumber: 1, scale: 2) Failed
UglyToad.PdfPig.Rendering.Skia.Tests.TestRendering.PdfPigSkiaTest(expectedImage: "FontMatrix-concat_1.png", pdfFile: "FontMatrix-concat.pdf", pageNumber: 1, scale: 2) Failed
UglyToad.PdfPig.Rendering.Skia.Tests.TestRendering.PdfPigSkiaTest(expectedImage: "caly-issues-58-2_1.png", pdfFile: "caly-issues-58-2.pdf", pageNumber: 1, scale: 2) Failed

I don't have time to add the tests to this project, but wanted to at least fix the regression for now.

cc @rhuijben

EliotJones · 2025-10-23T18:53:00Z

Thanks. I don't have time to review much anymore, but I think it makes sense not to trust the declared /Length in most situations: people put things like the number of filters, the length of the uncompressed data, or some random number in there. It can improve the initial stream parse if it happens to be the right length, but unfortunately you usually need to read the stream manually in every case.

rhuijben · 2025-10-29T10:14:54Z

Thanks! I will look into this

rhuijben · 2025-10-29T10:16:27Z

Do you need to drop the length check on both levels?

BobLd · 2025-10-29T18:54:18Z

@rhuijben that makes sense, yes. I'll push a PR for that.

rhuijben · 2025-10-30T10:04:42Z

> @rhuijben that makes sense, yes. I'll push a PR for that.

I see you pushed a fix for the flate filter.

What I asked was if you see the problems at both levels?
You removed the length trimming in two places. And with that last commit in three.

I'm trying to find where it is safe to do the trimming. Just assuming that whitespace is part of the stream is not always the right thing to do... but just trimming isn't either, or the tests wouldn't have failed.

BobLd · 2025-10-30T19:33:44Z

@rhuijben my bad - I misunderstood your earlier comment.

> What I asked was if you see the problems at both levels?

I only saw the issue at the top level (in PdfExtensions.Decode(...)), not at the FlateFilter level. That being said, and in line with Eliot's message, I think we should not trust /Length, mainly for the reasons laid out by Eliot. Another reason I removed the trimming in the FlateFilter is that it seems not to be the correct place to trim (see below).

> You removed the length trimming in two places. And with that last commit in three.

Yes, I believe this is safer for now, in line with the above.

> I'm trying to find where it is safe to do the trimming. Just assuming that whitespace is part of the stream is not always the right thing to do... but just trimming isn't either, or the tests wouldn't have failed.

Sadly, I don't think it's going to be easy. Going back to the PDF 2.0 specification:

> 7.3.8.2 Stream extent
> Every stream dictionary shall have a Length entry that indicates how many bytes of the PDF file are used for the stream's data. (If the stream has a filter, Length shall be the number of bytes of encoded data.)

From this, my understanding is that the check should be done in the PdfExtensions.Decode(...) methods, before any decoding. A check in the FlateFilter might run after another filter has already decoded the data, at which point /Length no longer describes the bytes being handled. I removed this trimming in this PR, as it was creating an issue.
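
To illustrate why the trimming belongs before the first filter, here is a minimal Python sketch (not PdfPig code). `decode_stream` and its parameters are hypothetical names, and the two-filter chain is just an example built on the standard `zlib` and `binascii` modules. /Length is applied once to the encoded bytes; after the first filter has run, the intermediate data can be longer than /Length, so trimming inside a later filter would cut valid data.

```python
import binascii
import zlib
from typing import Callable, Sequence


def decode_stream(
    raw: bytes,
    declared_length: int,
    filters: Sequence[Callable[[bytes], bytes]],
) -> bytes:
    """Trim once, on the encoded bytes, then run the filter chain untouched."""
    data = raw[:declared_length]  # /Length counts encoded bytes only
    for decode in filters:
        data = decode(data)  # no further trimming between filters
    return data


payload = b"PdfPig " * 200

# The data was hex-encoded first and deflated second,
# so decoding runs inflate first and unhexlify second.
encoded = zlib.compress(binascii.hexlify(payload))
raw = encoded + b"\r\n"  # end-of-line marker that precedes `endstream`
declared_length = len(encoded)

decoded = decode_stream(raw, declared_length, [zlib.decompress, binascii.unhexlify])

# The output of the first filter is longer than /Length, so a second
# trim at this point would truncate the hex text.
intermediate = zlib.decompress(encoded)
```

In this example `len(intermediate)` exceeds `declared_length`, which is the situation where a per-filter trim would corrupt the data.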

One possible approach would be to try decoding the stream first without any trimming. If this decoding fails, we trim the stream and try decoding again... Another approach could be to check for whitespace. I'm more than open to discussing this, and my understanding of the specification might be wrong.
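
The first approach could look roughly like the Python sketch below. It is only an illustration of the idea, not a proposal for the actual PdfPig code: `decode_with_fallback` and `strict_inflate` are hypothetical helpers, and the decoder is passed in as a plain function so the same logic applies to any filter.

```python
import binascii
import zlib
from typing import Callable


def strict_inflate(data: bytes) -> bytes:
    """Inflate a zlib stream, raising if it stops before its end marker."""
    inflater = zlib.decompressobj()
    out = inflater.decompress(data)
    if not inflater.eof:
        raise ValueError("truncated zlib stream")
    return out  # trailing bytes, if any, are left in inflater.unused_data


def decode_with_fallback(
    raw: bytes,
    declared_length: int,
    decode: Callable[[bytes], bytes],
) -> bytes:
    """Decode the untrimmed bytes first; retry trimmed to /Length on failure."""
    try:
        return decode(raw)
    except (ValueError, zlib.error):
        # Only retry when trimming would actually change the input.
        if not 0 <= declared_length < len(raw):
            raise
        return decode(raw[:declared_length])


# A bogus /Length (here 1) is never consulted because the first attempt works.
flate_raw = zlib.compress(b"hello world") + b"\r\n"
from_flate = decode_with_fallback(flate_raw, 1, strict_inflate)

# A decoder that rejects the trailing end-of-line succeeds on the retry.
from_hex = decode_with_fallback(b"48656c6c6f\r\n", 10, binascii.unhexlify)
```

The trade-off is that a failing first attempt costs a second decode, and a decoder that silently accepts garbage never triggers the retry.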

As a side note, this is a prime example of PDF readers not fully enforcing the specification and being more lenient. This is also why PdfPig has a lenient mode.

Below are the documents that were failing before this PR:
AcroFormsBasicFields.pdf
FontMatrix-concat.pdf
caly-issues-58-2.pdf

BobLd · 2025-10-30T19:50:36Z

I asked Perplexity AI about our discussion:

in the context of pdf specification, and using feedback you can find on the web, should the /length parameter of a stream be relied on?

The PDF specification requires that every stream dictionary include the /Length entry, which specifies the number of bytes for the stream's data (typically the encoded data if filters are applied). However, in practice and based on community and expert feedback, you should be cautious about relying unconditionally on /Length.¹²

Specification Requirements

The /Length parameter is mandatory in the stream dictionary according to the PDF 1.7 specification.³¹
This /Length can be a direct or indirect object. If it is indirect, parsers must resolve the reference to find the correct length value.⁴
The canonical way to read a stream is to use the /Length field to read that many bytes, then ensure that what follows is the literal keyword endstream.²

Practical Reliability Issues

In the real world, malformed or hand-crafted PDFs can have incorrect /Length entries. This is especially true in documents created by buggy generators or those trying to evade detection or analysis.⁵
If a parser relies solely on /Length, but /Length is wrong, the stream read will result in garbage, truncated data, or unexpected errors.
Community discussions and StackOverflow guidance recommend double-checking that the /Length value does not overrun the next endstream and endobj keywords, and to fail safely or recover gracefully if mismatches are found.⁵
Some PDF libraries perform sanity checks: if the /Length leads past endstream or causes parser errors, they will fall back to searching for endstream as a secondary safeguard.⁵

Best Practices

Always read via /Length first, as that's canonical per spec.
Implement secondary safety: if /Length is out-of-range or mismatches the endstream delimiter, treat as an error or attempt stream recovery to maintain robustness.
Never 100% trust /Length from untrusted or unpredictable sources without basic sanity checking.⁵

In summary: /Length should be used according to spec, but practically, do not blindly trust it without verifying the correct stream boundaries, especially when working with PDFs from varied or untrusted origins.¹²⁵
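
The read-then-verify strategy described in the answer above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not how PdfPig's tokenizer works: `read_stream_data`, `start`, and `declared_length` are hypothetical names, and the fallback naively searches for the next `endstream` keyword, which can misfire when binary stream data happens to contain those bytes.

```python
import re

# After the stream data we expect an optional end-of-line, then `endstream`.
_ENDSTREAM = re.compile(rb"\r?\n?endstream")


def read_stream_data(buf: bytes, start: int, declared_length: int) -> bytes:
    """Return the stream bytes that begin at `start`, checking /Length first."""
    end = start + declared_length
    if declared_length >= 0 and end <= len(buf) and _ENDSTREAM.match(buf, end):
        return buf[start:end]  # /Length agrees with the stream boundary

    # /Length is wrong or out of range: fall back to scanning for `endstream`.
    idx = buf.find(b"endstream", start)
    if idx == -1:
        raise ValueError("no endstream keyword found")
    data = buf[start:idx]
    # Drop the single end-of-line marker that precedes the keyword.
    for eol in (b"\r\n", b"\n", b"\r"):
        if data.endswith(eol):
            return data[: -len(eol)]
    return data


buf = b"<< /Length 5 >>\nstream\nHello\nendstream\nendobj\n"
start = buf.index(b"stream\n") + len(b"stream\n")

correct = read_stream_data(buf, start, 5)      # /Length is right
too_short = read_stream_data(buf, start, 1)    # /Length is too small
too_long = read_stream_data(buf, start, 9999)  # /Length runs past the file
```

All three calls return the same five bytes, because the two wrong lengths are detected and replaced by the keyword scan.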

Commit 42f1a8c: Fix regression introduced in 3592fc8 where slicing the stream to the length breaks decoding

BobLd merged commit 3555521 into UglyToad:master Oct 23, 2025
2 checks passed

BobLd deleted the fix-regression-stream-length branch October 23, 2025 18:08

This was referenced Nov 23, 2025

Bump PdfPig from 0.1.11 to 0.1.12 yildirim-mehmet/onlineOfiice#7

Open

Bump PdfPig from 0.1.11 to 0.1.12 dotnet-presentations/ai-workshop#283

Open

This was referenced Dec 1, 2025

Bump PdfPig from 0.1.11 to 0.1.12 EvotecIT/OfficeIMO#1385

Open

Bump PdfPig from 0.1.11 to 0.1.12 MjrTom/PDF2MD#49

Open

Bump the nuget-all group with 9 updates magico13/MagiCloud#9

Open
