obj followed by endobj? #2
Comments
@faceless2 Thanks Mike! I was focused on the statement under 7.3.9 Null object "An indirect object reference (see 7.3.10, "Indirect objects") to a nonexistent object shall be treated the same as a null object" so this was an attempt to make a form of "nonexistent object" while also having a fully valid xref, which I believe it is.
Aha, that makes more sense. That's a different test to the one I thought you were aiming at. The non-existent object case we usually see is simply a reference to an object that doesn't exist, eg
I'm with @faceless2 here: The object in question exists because it has an entry in the cross-reference table. However, it is invalid since its value is missing. If you want an indirect object reference to a nonexistent object, there are two options:
At least two tools, HexaPDF and qpdf, mark the file as invalid due to this.
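To make the 7.3.9 rule discussed above concrete, here is a minimal Ruby sketch of reference resolution. It assumes a hypothetical in-memory table keyed by object and generation number, uses the symbol `:null` as a stand-in for the PDF null object, and is not taken from HexaPDF, qpdf, or any other tool. `Ref` and `resolve` are names invented for this illustration.

```ruby
# Hypothetical model: [object number, generation] => parsed value.
# :null stands in for the PDF null object.
Ref = Struct.new(:num, :gen)

def resolve(objects, ref)
  # A reference to an object that is not defined anywhere resolves
  # to null, the same result as an object whose value is null.
  objects.fetch([ref.num, ref.gen], :null)
end

objects = { [1, 0] => { Type: :Catalog }, [2, 0] => :null }

p resolve(objects, Ref.new(2, 0))  # object defined with a null value
p resolve(objects, Ref.new(99, 0)) # object that does not exist
```

Both lookups give the same result, which is why a reader cannot tell the two cases apart from the resolved value alone. An object with an xref entry but no value, as in this issue, fits neither case and is not modelled here.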
I'm not so interested in tools, as I can drive trucks through those tools and almost any other tool... :-) What would be far better is to get to the point of a fully unambiguous definition of everything, one that can be reasoned about mathematically. So I'm re-opening because this thread is going in interesting directions... and I may want to copy this across to the pdf-issues GitHub repo eventually, since I'm fairly sure we could improve the spec wording! So if we conclude that a syntactically valid PDF object requires
Thoughts?
I have some thoughts about that. Just to give things names: let's say that On those grounds, I'd consider your second example a syntactically valid PDF object definition. On the other hand, I don't think the first and third examples make sense in that context: in my opinion, even assigning the Just my 2¢.
A comment is not a valid PDF object. PDF objects are listed in section 7.3 of the spec, with 7.3.1 listing all eight basic object types. Comments are defined in section 7.2.3. So your example would be equivalent to
The dictionary case is how such things are normally handled in programming languages. For example, in Ruby you have hashes, and if there is no value for a key, the lookup returns nil:

    h = {}
    h[:key] = :value
    p h[:key] # => :value
    p h[:unknown] # => nil

So having a key with nil/null as value gets the same result as not having that key at all. There is a semantic difference that is sometimes used, e.g. when one wants to know whether a key was explicitly set to the nil/null value. But that is, I think, of no interest in the PDF case.

And yes, in the case of an array the null object would need to be kept in order to keep the indices correct. Again, this is just how such things work and this feature is also used in the PDF spec, see section "12.3.2.2 Explicit Destinations".
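The dictionary and array cases described above can be sketched side by side. This is only an illustration: Ruby's `nil` stands in for the PDF null object, `normalize_dict` is a hypothetical helper, and the array is loosely shaped like an explicit destination of the form `[page /XYZ left top zoom]` with placeholder values.

```ruby
# Hypothetical helper: drop dictionary entries whose value is null,
# since such an entry reads the same as an absent one.
def normalize_dict(dict)
  dict.reject { |_key, value| value.nil? }
end

with_null    = { Type: :Page, Rotate: nil }
without_null = { Type: :Page }

p normalize_dict(with_null) == normalize_dict(without_null) # => true

# In an array the null has to stay, otherwise later elements shift.
dest = [:page_ref, :XYZ, nil, 792, nil]
p dest[3]         # => 792
p dest.compact[3] # => nil (792 has moved to index 2)
```

Dropping nulls is therefore safe for dictionaries but changes the meaning of arrays, which is the asymmetry the comment points out.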
Again, we would need to have one of the PDF objects listed in section 7.3 for this to be a valid indirect object. The garbage you are referring to would be treated as a token, just like 'obj' is a token or
The difference between
Fair call, and I agree. But more interesting than determining whether a sequence is valid is determining how to parse it when it's not. What do we do if the length doesn't match the parsed length? At what point do we terminate parsing if there's trailing noise? Sure, it's invalid, but standardising (or at least recommending) an approach for recovery is the big prize in my opinion. Although perhaps that's because it feels like I've spent much of the last 20 years in this gray area...
It's a funny coincidence that I found out about the SafeDocs project, the Arlington PDF Model and so on a few days after I started a project to document the ways PDF documents can be invalid and how to handle them - see https://github.com/gettalong/annotated-pdf-spec/
@gettalong I would also strongly encourage you to join ISO TC 171 SC 2 if you can, either via your national body (if it is a participating member country) or via the PDF Association (as it has an official liaison with ISO). There are many other discussions that occur within ISO that are restricted by ISO in what can be said publicly - you may infer some activities from https://www.pdfa.org/iso-status/ but there are others. The PDF Association also has many Technical Working Groups working across various topics, and their output is often input into ISO. If you want to discuss more, feel free to DM me.
Peter - this is really great, by the way. The only assertion I have an issue with in this PDF is this token sequence:
2 0 obj endobj
I don't think this meets the requirements of the spec: here's the text (identical in both PDF 1.4 and ISO 32000-2:2020)
The "value of the object" is missing, so I'm not sure that's a valid sequence of tokens.
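The check being described can be sketched in a few lines of Ruby. This is a deliberately naive illustration: it splits on whitespace only, ignores real PDF lexing rules (strings, comments, delimiters), and the helper names `value_tokens` and `value_missing?` are invented here rather than taken from any parser.

```ruby
# Return the tokens between "obj" and "endobj", or nil if the
# keywords are absent or out of order. Whitespace splitting only.
def value_tokens(definition)
  tokens = definition.split
  start  = tokens.index("obj")
  finish = tokens.index("endobj")
  return nil if start.nil? || finish.nil? || finish < start
  tokens[(start + 1)...finish]
end

# True when the definition has both keywords but nothing between them.
def value_missing?(definition)
  tokens = value_tokens(definition)
  !tokens.nil? && tokens.empty?
end

p value_missing?("2 0 obj endobj")      # => true
p value_missing?("2 0 obj null endobj") # => false
```

Under this reading, `2 0 obj endobj` has no value at all, whereas `2 0 obj null endobj` has an explicit null value.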