make TokenScanner public through Structure to iterate indirect objects #1308

Merged
BobLd merged 1 commit into UglyToad:master from ArtificialNecessity:us_master
May 29, 2026
@jeske jeske commented May 29, 2026

Contributor

TL;DR: I am using PdfPig in a custom PDF renderer, and in order to render embedded Form XObjects I need public access to TokenScanner/PdfScanner.

This supersedes #1305.

This is a screenshot of an early alpha build of SafePDF, which is intended to give the masses a free, 99% memory-safe PDF reader that is not vulnerable to the kind of buffer-overrun CVEs that affect Acrobat. It's built with my custom UI toolkit, Fluid, with vector rendering via my SilkyNvg fork, to which I added a Veldrid GPU backend (it's wicked fast).

image

Summary

It's a one-line visibility change. No new allocations, no behavioral changes, no breaking changes.

File: src/UglyToad.PdfPig/Structure.cs

// Was: internal IPdfTokenScanner TokenScanner { get; }
public IPdfTokenScanner TokenScanner { get; }

Why

When rendering Form XObjects, the renderer must:

  1. Walk Page Dictionary → /Resources → /XObject → /<name> (entries are often indirect references)
  2. Resolve the StreamToken for the Form XObject
  3. Read the /Matrix array from the stream dictionary (also potentially indirect)
  4. Decode and parse the content stream into operations

Each step requires IPdfTokenScanner to dereference IndirectReferenceToken values. PdfPig uses this scanner internally everywhere, but has never exposed it to consumers who need to traverse dictionary trees themselves.

Usage (consumer side)

// Access scanner at document level (correct — it's shared across all pages)
var scanner = pdfDocument.Structure.TokenScanner;

// Resolve indirect references in page resource dictionaries
PdfExtensions.TryGet<DictionaryToken>(page.Dictionary, NameToken.Resources, scanner, out var resources);
PdfExtensions.TryGet<DictionaryToken>(resources, NameToken.Xobject, scanner, out var xobjectDict);
PdfExtensions.TryGet<StreamToken>(xobjectDict, xobjectName, scanner, out var xobjectStream);
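
The snippet above ignores the `bool` returned by each `TryGet` call, so a missing entry would surface later as a null reference. A more defensive sketch of the same walk, extended to step 3 (reading /Matrix), might look like the following. It assumes the `PdfExtensions.TryGet` signature used above and that `PdfExtensions` lives in the `UglyToad.PdfPig` namespace. The file name `sample.pdf`, the page number, and the XObject name `Fm0` are placeholders, not values from this PR.

```csharp
using System;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Tokens;

// Placeholders: "sample.pdf", page 1, and the XObject name "Fm0" are hypothetical.
using var pdfDocument = PdfDocument.Open("sample.pdf");
var page = pdfDocument.GetPage(1);
var xobjectName = NameToken.Create("Fm0");

// Publicly accessible only with the visibility change in this PR.
var scanner = pdfDocument.Structure.TokenScanner;

// /Resources can be inherited from a parent /Pages node, so it may be absent here.
if (!PdfExtensions.TryGet<DictionaryToken>(page.Dictionary, NameToken.Resources, scanner, out var resources))
{
    Console.WriteLine("Page dictionary has no direct /Resources entry.");
    return;
}

if (!PdfExtensions.TryGet<DictionaryToken>(resources, NameToken.Xobject, scanner, out var xobjectDict))
{
    Console.WriteLine("Resources contain no /XObject dictionary.");
    return;
}

if (!PdfExtensions.TryGet<StreamToken>(xobjectDict, xobjectName, scanner, out var xobjectStream))
{
    Console.WriteLine($"No stream found for XObject {xobjectName}.");
    return;
}

// Step 3: /Matrix is optional; the PDF specification defaults it to the identity matrix.
if (PdfExtensions.TryGet<ArrayToken>(xobjectStream.StreamDictionary, NameToken.Matrix, scanner, out var matrix))
{
    Console.WriteLine($"/Matrix has {matrix.Data.Count} entries.");
}
else
{
    Console.WriteLine("No /Matrix entry; treat it as the identity matrix.");
}
```

Returning early at each failed lookup keeps the happy path flat. A real renderer would probably fall back to inherited resources when the page has none of its own, instead of giving up.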

Alternatives Considered

  • PdfPig's built-in content stream processing: Processes Form XObjects internally, but doesn't expose parsed IGraphicsStateOperation lists in a way an external renderer can consume.

@jeske jeske changed the title from "make TokenScanner public through Structure to access indirect object …" to "make TokenScanner public through Structure to iterate indirect objects" May 29, 2026