Metadata and AF entries permitted in any PDF object #65

Closed
bdoubrov opened this issue Apr 20, 2023 · 3 comments

Comments

@bdoubrov
Collaborator

PDF 2.0 permits Metadata and AF entries in any PDF object. Currently the model specifies these two keys only for the dictionaries where they are explicitly listed. But ISO 32000-2 (14.3 - Metadata, 14.13 - Associated files) allows the use of these entries even where they are not explicitly mentioned.

Is this something that has to be addressed somehow in the Arlington model as well? Either as a special clause saying that these two keys are permitted everywhere, or even explicitly adding them to all objects?

bdoubrov changed the title from "Metadata and AF entries permitted in any PDF dictionary" to "Metadata and AF entries permitted in any PDF object" on Apr 20, 2023
@petervwyatt
Member

You are correct with your comment: these keys are in the data model only where they are explicitly mentioned in ISO 32K.

In the TestGrammar (C++) PoC, I explicitly coded to check all dictionaries and to report as INFO messages when they are encountered but are not in the Arlington model (i.e. are not explicitly stated in 32K). They are not errors, but in my mind they have a different "level" of official-ness than private / undocumented entries.

A similar discussion could be had about dictionaries that 32K explicitly defines as allowing arbitrary key names (which I encoded as * keys in such dicts), since any dictionary can have any key simply because PDF is extensible by design.

Do you have a preference?
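
For illustration only, the reporting behaviour described in this comment could be sketched roughly as follows. This is not the TestGrammar (C++) code: the names Level, UNIVERSAL_KEYS and classify_key are hypothetical, and the order in which the wildcard and the Metadata/AF checks are applied is an assumption.

```python
from enum import Enum


class Level(Enum):
    """Hypothetical report levels; the real PoC may name or split these differently."""
    OK = "ok"                      # key is declared by the model
    INFO = "info"                  # Metadata/AF present but not declared for this dict
    UNDOCUMENTED = "undocumented"  # private or otherwise undocumented key


# Keys that ISO 32000-2 permits on any object
# (14.3 Metadata, 14.13 Associated files).
UNIVERSAL_KEYS = {"Metadata", "AF"}


def classify_key(key, model_keys):
    """Classify one dictionary key against the keys the model declares.

    model_keys is the set of key names declared for the dictionary type;
    a "*" entry stands for "arbitrary key names allowed".
    Assumption: declared keys and the wildcard win over the INFO case.
    """
    if key in model_keys or "*" in model_keys:
        return Level.OK
    if key in UNIVERSAL_KEYS:
        return Level.INFO
    return Level.UNDOCUMENTED


if __name__ == "__main__":
    declared = {"Type", "Pages"}
    for k in ("Type", "AF", "XX_Private"):
        print(k, classify_key(k, declared).value)
```

The point of the three-way split is that Metadata and AF sit between "declared in the model" and "private / undocumented": they are sanctioned by the standard but not listed per dictionary.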

@bdoubrov
Collaborator Author

bdoubrov commented Apr 20, 2023

I find it really important that all requirements of the Arlington model are transparent, either as part of the TSV grammar or as extra documentation clauses, for example in INTERNAL_GRAMMAR.md.

For example, it would be great if there were a list of all cases that are treated with different levels of severity. Ideally it would be defined in a machine-readable syntax similar to the TSV files, but at the very least it should be unambiguously documented. This would leave no room for different interpretations of the Arlington model by different people / implementations.

Then having AF and Metadata entries reported as INFO messages, where they are not explicitly declared, would certainly make sense.
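
As a rough sketch of what such a machine-readable list might look like, the snippet below loads a small tab-separated table of severity cases. Everything in it is hypothetical: the column names (Case, Keys, Severity), the rows, and the WARNING level for undocumented keys are invented for illustration and are not part of the Arlington model.

```python
import csv
import io

# Hypothetical rule table in a TSV-like syntax. Column names and rows are
# assumptions made for this example, not an existing Arlington file.
RULES_TSV = (
    "Case\tKeys\tSeverity\n"
    "universal_key_not_declared\tMetadata,AF\tINFO\n"
    "undocumented_key\t*\tWARNING\n"
)


def load_rules(tsv_text):
    """Parse the rule table into a list of dicts, preserving row order."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        {
            "case": row["Case"],
            "keys": set(row["Keys"].split(",")),
            "severity": row["Severity"],
        }
        for row in reader
    ]


def severity_for(key, rules):
    """Return the severity of the first rule matching a key that the model
    does not declare, or None if no rule applies. "*" matches any key."""
    for rule in rules:
        if key in rule["keys"] or "*" in rule["keys"]:
            return rule["severity"]
    return None


if __name__ == "__main__":
    rules = load_rules(RULES_TSV)
    print(severity_for("AF", rules))
    print(severity_for("XX_Private", rules))
```

Keeping the cases in a table rather than in code is what would let different implementations read the same rules instead of re-deriving them.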

@petervwyatt
Member

petervwyatt commented Apr 21, 2023

I'll make a new MD file called MODEL_NOTES (it's not really the internal grammar per se).
It can document such things, as well as known model limitations, etc.
