FEAT: Text stream validation for literals #988

carlwilson · 2025-01-26T09:26:05Z

- added `PdfTextStream` class to handle:
  - detection of a text stream between `BT` and `ET` operators;
  - distinguishing between hex streams (not validated yet) and string literals;
  - checking that parentheses balance in string literals while accounting for escaped characters;
- added two new PDF error messages:
  - `PDF-HUL-163` IO Exception reading text stream;
  - `PDF-HUL-164` Unbalanced parentheses in text stream;
- added a first-cut function for walking pages for validation: `PdfModule:checkPageTextStreams`;
- added method to check text streams: `PageObject:checkTextStreams`;
- tidied up page content stream handling: empty lists are safer than nulls;
- page text streams are now checked after font finding;
- removed unnecessary param from filter extraction;
- fixed minor issue in header handling that terminated processing early for invalid files; and
- added test files for the above.
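
For reviewers who want a concrete picture of the literal check described above, here is a minimal sketch of parenthesis balancing with escape handling. It is not the `PdfTextStream` code from this PR: the class name `LiteralBalanceSketch` and the method `hasBalancedLiterals` are hypothetical, and the sketch assumes it is handed the bytes between `BT` and `ET`, skips hex-encoded `<...>` content without validating it, and ignores `%` comments.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; names here are not taken from the PR. */
final class LiteralBalanceSketch {

    private LiteralBalanceSketch() {
    }

    /** Convenience overload: treats each char as one byte (ISO-8859-1). */
    static boolean hasBalancedLiterals(String content) {
        return hasBalancedLiterals(content.getBytes(StandardCharsets.ISO_8859_1));
    }

    /**
     * Returns true if every '(' that opens or nests inside a string literal
     * has a matching unescaped ')', and no stray ')' appears outside one.
     */
    static boolean hasBalancedLiterals(byte[] content) {
        int depth = 0;          // current nesting depth inside a literal
        boolean inHex = false;  // true while inside <...> hex content

        for (int i = 0; i < content.length; i++) {
            char c = (char) (content[i] & 0xFF);

            if (inHex) {
                // Hex content is skipped, not validated.
                if (c == '>') {
                    inHex = false;
                }
                continue;
            }
            if (depth > 0 && c == '\\') {
                // Backslash escapes the next byte, e.g. \( \) or \\.
                i++;
                continue;
            }
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                if (depth == 0) {
                    return false; // closing parenthesis with nothing open
                }
                depth--;
            } else if (depth == 0 && c == '<') {
                if (i + 1 < content.length && content[i + 1] == '<') {
                    i++; // "<<" opens a dictionary, not hex content
                } else {
                    inHex = true;
                }
            }
        }
        return depth == 0;
    }
}
```

Under these assumptions, `hasBalancedLiterals("BT (a (nested) b) Tj ET")` returns `true` and `hasBalancedLiterals("BT (unclosed Tj ET")` returns `false`; a caller could map the `false` case to a message such as `PDF-HUL-164`.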


carlwilson added 2 commits January 24, 2025 11:23

Merge branch 'integration' into feat/textstreamvalidation

c1a7bde
