Add CoreTokenScanner.ClearPreReadByte() and further fix #1332 by BobLd · Pull Request #1334 · UglyToad/PdfPig

BobLd · 2026-06-19T21:11:53Z

Commit 9c9cb41 added % to the PlainTokenizer break set so a comment immediately following a keyword/operator terminates the token, which matches ISO 32000-2 §7.2.3–7.2.4 and pdfbox's BaseParser.isEndOfName (which likewise treats % as a delimiter).
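To illustrate the effect of the break set, here is a minimal Python sketch. It is not PdfPig's PlainTokenizer; `read_plain_token`, `break_on_percent`, and the two byte sets are hypothetical names, and the sets simply follow the whitespace and delimiter characters listed in ISO 32000-2 §7.2.3.

```python
# Hypothetical sketch, not PdfPig's PlainTokenizer.
# Whitespace and delimiter bytes as listed in ISO 32000-2 section 7.2.3.
WHITESPACE = b"\x00\t\n\x0c\r "
DELIMITERS = b"()<>[]{}/%"


def read_plain_token(data: bytes, start: int, break_on_percent: bool = True) -> bytes:
    """Read a keyword/operator token at `start`, stopping at the first break byte."""
    breaks = set(WHITESPACE + DELIMITERS)
    if not break_on_percent:
        # Simulates a break set that does not contain '%'.
        breaks.discard(ord("%"))

    end = start
    while end < len(data) and data[end] not in breaks:
        end += 1
    return data[start:end]


sample = b"endobj%trailing comment\n"
with_percent = read_plain_token(sample, 0)
without_percent = read_plain_token(sample, 0, break_on_percent=False)
```

With `%` in the break set the token ends at `endobj`; without it, the comment text is absorbed into the token until the next whitespace byte.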

That change exposed a latent bug. PdfPig's CoreTokenScanner keeps a persistent look-ahead byte (tracked by hasBytePreRead) that was not discarded on Seek. After jumping to an object's xref offset, the scanner therefore began reading one byte late and failed. That failure triggered a spurious brute-force scan while encryption was still in its NoOp phase, so the objects found by the scan were cached undecrypted.

The fix adds CoreTokenScanner.ClearPreReadByte() and calls it after the seeks in PdfTokenScanner.Get and TryBruteForceFileToFindReference, so the next read starts exactly at the sought byte. This mirrors pdfbox, whose parseFileObject does source.seek(objOffset) and immediately reads from that offset using read-then-rewind(1) look-ahead, meaning no byte is ever carried across a seek.
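The general class of bug can be sketched in a few lines of Python. This is a simplified illustration, not PdfPig's CoreTokenScanner: `LookAheadScanner`, its methods, and the `clear_pre_read` flag are hypothetical, and the sketch only shows that a look-ahead byte carried across a seek misaligns the next read, not the exact offset error seen in PdfPig.

```python
import io


class LookAheadScanner:
    """Hypothetical scanner that keeps one pre-read byte between reads."""

    def __init__(self, data: bytes):
        self._stream = io.BytesIO(data)
        self._pre_read = None  # byte already pulled from the stream, or None

    def peek(self):
        # Pull one byte ahead and remember it for the next read().
        if self._pre_read is None:
            b = self._stream.read(1)
            self._pre_read = b if b else None
        return self._pre_read

    def read(self):
        # Return the remembered byte if present, else read a fresh one.
        b = self.peek()
        self._pre_read = None
        return b

    def seek(self, offset: int, clear_pre_read: bool = True):
        self._stream.seek(offset)
        if clear_pre_read:
            # Analogous in spirit to calling ClearPreReadByte() after a seek.
            self._pre_read = None


data = b"abcdef"

buggy = LookAheadScanner(data)
buggy.peek()                           # look-ahead now holds b"a"
buggy.seek(3, clear_pre_read=False)    # stale byte survives the seek
stale_first = buggy.read()             # the stale byte, not the byte at offset 3

fixed = LookAheadScanner(data)
fixed.peek()
fixed.seek(3)                          # look-ahead discarded
fresh_first = fixed.read()             # the byte at offset 3
```

In the buggy path the first read after the seek returns the byte remembered from before the seek; in the fixed path it returns the byte at the sought offset.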

Add CoreTokenScanner.ClearPreReadByte() and further fix UglyToad#1332

7b11675

BobLd merged commit 5d9dd37 into UglyToad:master Jun 19, 2026
2 checks passed

BobLd deleted the issues/1332-2 branch June 19, 2026 21:44
