FEAT: Text stream validation for literals #988

Status: Open. Wants to merge 2 commits into base: integration.

carlwilson (Member)

  • added PdfTextStream class to handle:
    • detecting a text stream between the BT and ET operators;
    • distinguishing between hex streams (not validated yet) and string literals;
    • balancing parentheses in string literals while accounting for escaped characters;
  • added two new PDF error messages:
    • PDF-HUL-163 IO Exception reading text stream;
    • PDF-HUL-164 Unbalanced parentheses in text stream;
  • added a first-cut function for walking pages for validation: PdfModule:checkPageTextStreams;
  • added method to check text streams: PageObject:checkTextStreams;
  • tidied up page content stream handling, since empty lists are safer than nulls;
  • check page text streams after font finding;
  • removed unnecessary param from filter extraction;
  • fixed minor issue in header handling that terminated processing early for invalid files; and
  • added test files for the above.
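
For reviewers who want a concrete picture of the parenthesis check described above, here is a minimal sketch in Java. It is not the `PdfTextStream` code from this pull request: the class name `TextStreamSketch` and the method `literalsBalanced` are hypothetical, and the input is assumed to be the bytes already extracted from between `BT` and `ET`. It skips hex strings without validating them, treats a backslash inside a literal as escaping the next byte, and ignores `%` comments for simplicity.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; not the PdfTextStream class added by this pull request. */
final class TextStreamSketch {

    private TextStreamSketch() { }

    /** Convenience overload: treats each char of the string as one byte (Latin-1). */
    static boolean literalsBalanced(String body) {
        return literalsBalanced(body.getBytes(StandardCharsets.ISO_8859_1));
    }

    /**
     * Returns true if the parentheses of every literal string "(...)" in the
     * given text object body (assumed to be the bytes between BT and ET) balance.
     */
    static boolean literalsBalanced(byte[] body) {
        int depth = 0;         // nesting depth of literal-string parentheses
        boolean inHex = false; // inside a <...> hex string, which is skipped
        for (int i = 0; i < body.length; i++) {
            char c = (char) (body[i] & 0xFF);
            if (inHex) {
                // Hex strings are not validated; just look for the closing '>'.
                if (c == '>') {
                    inHex = false;
                }
                continue;
            }
            if (depth > 0 && c == '\\') {
                i++; // inside a literal, a backslash escapes the next byte
                continue;
            }
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                if (depth == 0) {
                    return false; // closing parenthesis with nothing open
                }
                depth--;
            } else if (c == '<' && depth == 0) {
                inHex = true; // '<' inside a literal is ordinary text
            }
        }
        return depth == 0; // anything still open is unbalanced
    }
}
```

In this sketch a `false` result corresponds to the condition that `PDF-HUL-164` reports. Calling `TextStreamSketch.literalsBalanced("(Hello) Tj")` returns `true`, while `TextStreamSketch.literalsBalanced("(Hello Tj")` returns `false`.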
