Skip to content

Conversation

@EliotJones
Copy link
Member

as long as there is a pages entry we accept this in lenient parsing mode. this is to fix document 006705.pdf in the corpus that had '/calalog' as the dictionary entry.

also adds a test for some weird content stream content in 0006324.pdf where numbers seem to get split in the content stream on a decimal place. this is just to check that our parser doesn't hard crash

as long as there is a pages entry we accept this in lenient parsing mode. this
is to fix document 006705.pdf in the corpus that had '/calalog' as the dictionary
entry.

also adds a test for some weird content stream content in 0006324.pdf where
numbers seem to get split in the content stream on a decimal place. this is
just to check that our parser doesn't hard crash
@EliotJones EliotJones requested a review from BobLd July 26, 2025 21:55
@BobLd BobLd merged commit 83d6fc6 into master Jul 27, 2025
2 checks passed
@BobLd BobLd deleted the fix-document-with-calalog-typo branch July 27, 2025 01:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants