Public `PdfScanner` Accessor on `Page` by jeske · Pull Request #1305 · UglyToad/PdfPig

jeske · 2026-05-24T17:45:39Z

TL;DR: I am using PdfPig in a PDF renderer, and in order to render embedded Form XObjects, I had to expose PdfScanner.

PdfPig Fork: Public `PdfScanner` Accessor on `Page`

Summary

We added a public IPdfTokenScanner PdfScanner property to UglyToad.PdfPig.Content.Page (commit b1a0abd9). This exposes the page's internal token scanner — the component responsible for resolving indirect object references within a PDF — to consumers of the Page object.

Why This Was Needed

The Problem: Rendering Embedded Form XObjects

PDF pages can contain Form XObjects — reusable content streams referenced via the Do operator (e.g., repeated logos, vector artwork, or template overlays). When our renderer (PdfPageView) encounters an InvokeNamedXObject operation, it needs to:

1. Navigate the page's resource dictionary: Page Dictionary → /Resources → /XObject → /<name>
2. Resolve the XObject stream (which may be stored as an indirect reference)
3. Read the Form XObject's /Matrix entry (also potentially an indirect reference)
4. Decode the content stream and recursively render the operations

Each of these steps requires resolving PDF indirect references (e.g., 12 0 R) back to their actual token values. That's what IPdfTokenScanner does — it's the lookup table from object numbers to their resolved content.
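To make the idea of resolving an indirect reference concrete, here is a minimal sketch of the lookup a token scanner performs conceptually. `ToyScanner`, `ToyToken`, `ToyNumber`, and `ToyReference` are hypothetical names invented for this illustration. They are not PdfPig types, and the real `IPdfTokenScanner` presumably works against the underlying PDF file rather than a prebuilt in-memory dictionary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy types for illustration only; these are not PdfPig classes.
public abstract record ToyToken;
public sealed record ToyNumber(double Value) : ToyToken;
public sealed record ToyReference(int ObjectNumber, int Generation) : ToyToken;

public sealed class ToyScanner
{
    // Maps (object number, generation) to the token stored under that key.
    private readonly Dictionary<(int ObjectNumber, int Generation), ToyToken> objects = new();

    public void Add(int objectNumber, int generation, ToyToken value)
        => objects[(objectNumber, generation)] = value;

    // Follows references such as "12 0 R" until a direct token is reached.
    public ToyToken Resolve(ToyToken token)
    {
        var visited = new HashSet<(int, int)>();
        while (token is ToyReference reference)
        {
            var key = (reference.ObjectNumber, reference.Generation);

            // Guard against malformed files where references form a loop.
            if (!visited.Add(key))
                throw new InvalidOperationException($"Circular reference at {key}.");

            // Simplification: a missing object throws in this toy model.
            if (!objects.TryGetValue(key, out var next))
                throw new KeyNotFoundException($"No object stored for {key}.");

            token = next;
        }
        return token;
    }
}
```

For example, if object 12 0 holds a reference to 13 0, and 13 0 holds the number 42, then `Resolve(new ToyReference(12, 0))` returns `ToyNumber(42)`. A token that is already direct is returned unchanged.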

The Upstream Gap

PdfPig's Page class held the scanner as a private field (pdfScanner) and used it internally for its own operations (annotations, experimental access, etc.), but never exposed it to consumers. There was no public API to resolve arbitrary indirect references from a page's dictionary tree.

Without scanner access, our renderer had no way to:

- Walk the /Resources → /XObject dictionary chain (entries are often indirect references)
- Resolve the StreamToken for a Form XObject
- Read the /Matrix array from a Form XObject's stream dictionary

Where We Use It

1. `SafePdfDocumentModel.ResolveFormXObject()` (SafePdfDocumentModel.cs:94)

var scanner = page.PdfScanner;

// Navigate: Page Dictionary → /Resources → /XObject → /<name>
PdfExtensions.TryGet<DictionaryToken>(page.Dictionary, NameToken.Resources, scanner, out var resources);
PdfExtensions.TryGet<DictionaryToken>(resources, NameToken.Xobject, scanner, out var xobjectDict);
PdfExtensions.TryGet<StreamToken>(xobjectDict, xobjectName, scanner, out var xobjectStream);

This resolves the full chain of indirect references from the page dictionary down to the actual Form XObject stream, then decodes and parses it into renderable operations. Results are cached per (page, xobjectName) for reuse across frames.
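The per-(page, xobjectName) caching mentioned above could look roughly like the sketch below. `FormXObjectCache`, `GetOrResolve`, and `ResolveCount` are hypothetical names chosen for illustration, and the page is identified by an integer page number as an assumption. The renderer's actual cache is not shown in this PR and may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; TOperations stands in for whatever
// parsed form the renderer stores (for example, a list of operations).
public sealed class FormXObjectCache<TOperations>
{
    private readonly Dictionary<(int PageNumber, string XObjectName), TOperations> entries = new();

    // Counts how many times the expensive resolve delegate actually ran.
    public int ResolveCount { get; private set; }

    public TOperations GetOrResolve(int pageNumber, string xobjectName, Func<TOperations> resolve)
    {
        var key = (pageNumber, xobjectName);

        // Fast path: this (page, name) pair was resolved on an earlier frame.
        if (entries.TryGetValue(key, out var cached))
            return cached;

        // Slow path: resolve once, then remember the result.
        var resolved = resolve();
        ResolveCount++;
        entries[key] = resolved;
        return resolved;
    }
}
```

This sketch is not thread-safe; a renderer that resolves XObjects from several threads would need a lock or a concurrent dictionary instead.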

2. `PdfPageView.ResolveFormXObject()` (PdfPageView.cs:936)

Same pattern as above — a fallback path used when no document model is available.

3. Form XObject `/Matrix` Resolution (PdfPageView.cs:683)

PdfExtensions.TryGet<ArrayToken>(formStream.StreamDictionary, NameToken.Matrix, _page!.PdfScanner, out var matrixToken)

After resolving the Form XObject stream, we need to check if it has a /Matrix entry (a 6-element affine transform that positions the XObject content). This entry could itself be an indirect reference, so the scanner is needed here too.
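As a sketch of what the six /Matrix numbers mean once resolved: the PDF specification writes the array as [a b c d e f] and maps a point (x, y) to (a·x + c·y + e, b·x + d·y + f), with the identity matrix as the default when the entry is absent. `FormMatrix` below is a hypothetical type for illustration, not a PdfPig class, and it takes plain doubles rather than PdfPig tokens.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value type for illustration; not a PdfPig class.
public readonly struct FormMatrix
{
    public readonly double A, B, C, D, E, F;

    public FormMatrix(double a, double b, double c, double d, double e, double f)
    {
        A = a; B = b; C = c; D = d; E = e; F = f;
    }

    // Default used when a Form XObject has no /Matrix entry.
    public static FormMatrix Identity => new FormMatrix(1, 0, 0, 1, 0, 0);

    // Builds a matrix from the six numbers of a /Matrix array,
    // falling back to the identity when the array is absent (null).
    public static FormMatrix FromArray(IReadOnlyList<double> values)
    {
        if (values == null)
            return Identity;
        if (values.Count != 6)
            throw new ArgumentException("A /Matrix array must have exactly 6 elements.", nameof(values));
        return new FormMatrix(values[0], values[1], values[2], values[3], values[4], values[5]);
    }

    // Applies the affine transform: x' = a*x + c*y + e, y' = b*x + d*y + f.
    public (double X, double Y) Apply(double x, double y)
        => (A * x + C * y + E, B * x + D * y + F);
}
```

For instance, the array `[2 0 0 2 10 20]` scales by 2 and then translates by (10, 20), so the point (1, 1) maps to (12, 22).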

The Change (in the fork)

File: src/UglyToad.PdfPig/Content/Page.cs

/// <summary>
/// The PDF token scanner for resolving indirect references in this page's dictionary.
/// </summary>
public IPdfTokenScanner PdfScanner => pdfScanner;

This is a minimal, read-only property exposing the existing private field. No behavioral changes, no new allocations, no breaking changes to existing consumers.

Alternatives Considered

- Reflection: Could access the private field via reflection, but fragile and slow in a per-frame render loop.
- Re-opening the document with a custom parser: Would duplicate state and lose page-level caching.
- Using PdfPig's built-in Page.GetImages() / content stream API: PdfPig's internal rendering pipeline processes Form XObjects, but doesn't expose the parsed operations in a way our custom NanoVG-based renderer can consume. We need the raw IGraphicsStateOperation list to drive our own graphics state machine.

Related Fork Changes

7e0ac6da — Made ProcessOperations virtual in BaseStreamProcessor (upstream PR by BobLd), enabling custom stream processors to override Form XObject handling.
217b776b — Fixed decode values in images (related to correct XObject rendering).

BobLd · 2026-05-25T10:27:50Z

@@ -1,4 +1,4 @@
-<Project Sdk="Microsoft.NET.Sdk">
+<Project Sdk="Microsoft.NET.Sdk">


can you undo changes in this document?

BobLd · 2026-05-25T10:33:16Z

@jeske I have added a comment. Side question, what pdf renderer are you using?

BobLd · 2026-05-26T14:44:40Z

@jeske given the token scanner is shared across pages, making it available at page level might be a bit misleading. What about at PdfDocument level? Possibly just making it public in the Structure object

PdfPig/src/UglyToad.PdfPig/Structure.cs

Line 14 in 450b855

public class Structure

jeske · 2026-05-27T06:12:01Z

> @jeske given the token scanner is shared across pages, making it available at page level might be a bit misleading. What about at PdfDocument level? Possibly just making it public in the Structure object
>
> PdfPig/src/UglyToad.PdfPig/Structure.cs
>
> Line 14 in 450b855
>
> public class Structure

That's sensible. I'll submit another patch. Thanks!

The PDF renderer is my own. The intent is a 99% managed PDF viewer, to (mostly) eliminate common buffer-overrun security vulnerabilities. It's also obscenely fast.

It uses PdfPig for parsing, wrapped with my own 2D toolkit called Fluid, and a forked SilkyNvg that I added a Veldrid backend to (and soon CFF).

jeske · 2026-05-29T19:31:20Z

Closing, as this is superseded by #1308.

add access to the PdfScanner

b1a0abd

BobLd requested changes May 25, 2026

jeske mentioned this pull request May 29, 2026

make TokenScanner public through Structure to iterate indirect objects #1308

Merged

jeske closed this May 29, 2026

jeske wants to merge 1 commit into UglyToad:master from ArtificialNecessity:master
