Fix font width array parsing when entries are indirect object references #1307
Merged
Refactor error handling for non-numeric tokens in width array.
BobLd approved these changes on May 25, 2026
This was referenced Jun 25, 2026
Problem
When parsing font dictionaries, `FontDictionaryAccessHelper.GetWidths()` iterates over the `/Widths` array and expects every element to be a direct `NumericToken`. However, the PDF specification allows array elements to be indirect object references that resolve to numeric values.

When a width entry is an `IndirectReferenceToken` (e.g., `212 0 R` resolving to the value `763`), the method throws `InvalidFontFormatException: Token which was not a number found in the widths array`. This is particularly common in PDFs that use object streams (`/Type /ObjStm`) to store font dictionaries, where individual width values may be stored as separate indirect objects.

Since `PdfDocument.GetPage()` wraps exceptions in `PdfDocumentEncryptedException` when the document is encrypted, the user sees the misleading message "Document was encrypted which may have caused error when retrieving page", even though the actual issue is unrelated to encryption.

Fix
In `GetWidths()`, when an array element is not a `NumericToken`, attempt to resolve it via `DirectObjectFinder.TryGet<NumericToken>()` before throwing. The `pdfScanner` parameter is already available in the method signature.

Impact
Fixes page access failures on PDFs whose font width arrays contain indirect references. Tested on a 400-page encrypted PDF that previously failed on 228 pages; all 400 pages now succeed.
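The fallback described under Fix can be illustrated outside PdfPig with a small sketch. This is written in Python rather than the PR's C#, and it is not the actual change. `NumericToken`, `IndirectReference`, `try_get_numeric`, `get_widths` and the plain dictionary standing in for `pdfScanner` are all hypothetical stand-ins. The sketch follows chains of references with a cycle guard. This is an assumption made for illustration; whether `DirectObjectFinder` behaves identically is not stated in this PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class NumericToken:
    """Hypothetical stand-in for a direct number in the token stream."""
    value: float


@dataclass(frozen=True)
class IndirectReference:
    """Hypothetical stand-in for an 'N G R' reference such as '212 0 R'."""
    object_number: int
    generation: int = 0


# A token is a number, a reference, or anything else (e.g. a name like "/Foo").
Token = Union[NumericToken, IndirectReference, str]


class InvalidFontFormatError(Exception):
    """Raised when a widths entry cannot be turned into a number."""


def try_get_numeric(token: Token,
                    objects: Dict[IndirectReference, Token]) -> Optional[NumericToken]:
    """Follow indirect references until a NumericToken is reached.

    Returns None for a missing object, a non-numeric target, or a reference
    cycle (a hypothetical analogue of DirectObjectFinder.TryGet<NumericToken>).
    """
    seen = set()
    while isinstance(token, IndirectReference):
        if token in seen or token not in objects:
            return None
        seen.add(token)
        token = objects[token]
    return token if isinstance(token, NumericToken) else None


def get_widths(widths_array: List[Token],
               objects: Dict[IndirectReference, Token]) -> List[float]:
    widths = []
    for token in widths_array:
        if isinstance(token, NumericToken):
            # Fast path: the entry is already a direct number.
            widths.append(token.value)
            continue
        # Fallback: try to resolve the entry before giving up.
        resolved = try_get_numeric(token, objects)
        if resolved is None:
            raise InvalidFontFormatError(
                f"Token which was not a number found in the widths array: {token!r}")
        widths.append(resolved.value)
    return widths


# Usage: object 212 holds the width 763, as in the example above.
objects = {IndirectReference(212): NumericToken(763)}
print(get_widths([NumericToken(500), IndirectReference(212), NumericToken(600)], objects))
# prints [500, 763, 600]
```

Keeping the direct `NumericToken` check as a fast path means well-formed arrays never pay for a lookup. Only entries that fail it are resolved, and anything that still does not resolve to a number raises the same kind of error as before.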