Close parenthesis included in Literal _rawBytes, causes incorrect NameTreeNode.compareKey() #696

Closed
karenhanson opened this issue Nov 25, 2021 · 0 comments · Fixed by #734

karenhanson commented Nov 25, 2021

I think this is a bug, though it may instead be a case of badly labeled annotation links in the PDF. I'm logging it in case others hit the same issue or the underlying cause has wider implications. In a particular set of PDFs it is causing numerous errors like this: edu.harvard.hul.ois.jhove.module.pdf.PdfInvalidException: Invalid indirect destination - referenced object 'bm_b11' cannot be found. In this instance the problem seems to be that the PDF contains a 'bm_b1' reference as well as 'bm_b11'.

The problem stems from the position of this line in the Literal class:


A character is appended to _rawBytes regardless of whether it is a close parenthesis that should end the Literal. That means the _rawBytes value ends with ",41" (41 being the character code of the close parenthesis).

Where this causes problems is when performing NameTreeNode.compareKey() here:


The comparison uses _rawBytes and effectively truncates the longer key to the length of the shorter one. If the two keys differ in length, the comparison never reaches the ",41" (close paren) at the end of the longer _rawBytes, but the close paren is the last value compared at the end of the shorter one. So if the value compared against the close paren is below 41, compareKey() returns -1 and exits the matching loop. This would be fine if that character were alphanumeric, but in this case it is a null, so annotation links that work when rendered cause validation errors in JHOVE.
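To make the comparison concrete, here is a minimal sketch of the behaviour as I've described it. It is not the JHOVE source: CompareKeySketch, its compareKey() and the byte values are made up for illustration, and I'm assuming the comparison returns 0 when the keys agree over their shared length.

```java
import java.util.Arrays;
import java.util.List;

public class CompareKeySketch {
    // Hypothetical stand-in for the comparison described above (not the JHOVE code):
    // only the positions that both keys share are compared.
    static int compareKey(List<Integer> a, List<Integer> b) {
        int len = Math.min(a.size(), b.size());
        for (int i = 0; i < len; i++) {
            int x = a.get(i);
            int y = b.get(i);
            if (x < y) return -1;
            if (x > y) return 1;
        }
        return 0; // assumption: keys that agree over the shared length compare as 0
    }

    public static void main(String[] args) {
        // Made-up byte values: the longer key has a null (0) right after the shared prefix.
        List<Integer> shortWithParen = Arrays.asList(98, 49, 41);       // "b1" plus the stray ')'
        List<Integer> longWithParen  = Arrays.asList(98, 49, 0, 49, 41);
        List<Integer> shortClean     = Arrays.asList(98, 49);           // "b1" without the ')'
        List<Integer> longClean      = Arrays.asList(98, 49, 0, 49);

        // The null (0) is compared against the stray 41, so the result is -1.
        System.out.println(compareKey(longWithParen, shortWithParen)); // prints -1
        // Without the trailing 41, the keys agree over the shared length.
        System.out.println(compareKey(longClean, shortClean));         // prints 0
    }
}
```

The only difference between the two calls is the trailing 41, which is enough to flip the result when the byte lined up against it is lower than 41.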

Example of the problem reference:
[image]
I think this should be valid?

To confirm this was causing the problem, I did a quick hack to move this line:


... to the last line of the for-loop: a close paren triggers return offset; before that line is reached, so the character is not appended to _rawBytes. Moving that line made the error messages stop.
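Roughly, the effect of the reordering looks like the sketch below. This is a simplified, hypothetical reader (LiteralLoopSketch and readRawBytes() are names I made up, and it ignores escapes and nested parentheses), not the actual Literal class. The input is assumed to be the text following the opening parenthesis.

```java
import java.util.ArrayList;
import java.util.List;

public class LiteralLoopSketch {
    static final int CLOSE_PAREN = 41; // character code of ')'

    // Hypothetical, simplified literal reader.
    // appendFirst = true  mimics the ordering reported in this issue;
    // appendFirst = false mimics the append moved to the end of the loop.
    static List<Integer> readRawBytes(String input, boolean appendFirst) {
        List<Integer> rawBytes = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            int ch = input.charAt(i);
            if (appendFirst) {
                rawBytes.add(ch); // runs even when ch is the terminating ')'
            }
            if (ch == CLOSE_PAREN) {
                return rawBytes;  // the literal ends here
            }
            if (!appendFirst) {
                rawBytes.add(ch); // only reached for characters inside the literal
            }
        }
        return rawBytes;
    }

    public static void main(String[] args) {
        System.out.println(readRawBytes("bm_b1)", true));  // prints [98, 109, 95, 98, 49, 41]
        System.out.println(readRawBytes("bm_b1)", false)); // prints [98, 109, 95, 98, 49]
    }
}
```

With the append at the end of the loop, the early return on the close paren means the 41 never reaches the raw bytes.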

karenhanson added a commit to karenhanson/jhove that referenced this issue Apr 14, 2022
Proposed fix for openpreserve#696 - Unless there is a reason to include a close paren in the rawbytes output, I think the rawbytes.add should be at the end of the loop.