INTERNAL_GRAMMAR.md

Arlington PDF Model Grammar Validation Rules

This document describes some strict rules for the Arlington PDF model, for both the data and the predicates (custom declarative predicates that start fn:). Only some of these rules are currently implemented by various PoCs, but everything is precisely documented here.

Note that the Arlington PDF Model accurately reflects the latest agreed ISO 32000-2:2020 PDF 2.0 specification (available at no cost), as amended by industry-agreed errata from https://pdf-issues.pdfa.org. If this state of affairs is unsuitable for adopters of the Arlington PDF Model (e.g. unresolved errata are causing issues for implementations) then the recommended practice is for those specific implementations to create private diff patches against the model, as it is entirely text-based.

TSV file rules

  • They are TSV, not CSV. Use tabs (\t).
  • No double quotes are used.
  • Every TSV file needs to have the identical header row as the first line in the file.
  • EOL rules for TSV are now set by .gitattributes to be LF -
    • standard Linux CLI works including under Windows WSL2: cut, grep, sed, etc.
    • this means you can also use all the eBay TSV utilities even under Windows
    • GNU datamash can also be used.
  • Every TSV file needs to have the full set of TABS (for all columns).
  • Last row in TSV needs EOL after last TAB.
  • TSV file names are case-sensitive.
  • TSV file extensions are always .tsv (lowercase) but are not present in the TSV data itself.
  • all TSV files will have matching numbers of [, ] and (, )
  • for a single row in any TSV, splitting each field on ';' will result in either 1 part or N parts, where N is the number of types in that row's "Type" field.
  • files that represent PDF arrays match either ArrayOf*.tsv, *Array.tsv or *ColorSpace.tsv
    • many are also identifiable by having a Key name of 0 (or 0* or *)
    grep "^0" *
  • files that represent PDF 'map' objects (meaning that the dictionary key name can be anything) match *Map.tsv
    • note that CMaps are in CMapStream.tsv
  • NOT all files that are PDF stream objects match *Stream.tsv
    • since each Arlington object is fully self-contained, many objects can be streams. The best method is to search for DecodeParms key instead:
    grep "^DecodeParms" * | tsv-pretty
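
A few of the structural rules above can be checked mechanically. The following is a minimal sketch, not part of any Arlington PoC: the function name `check_tsv_text` is hypothetical, and the column count of 12 is an assumption taken from the twelve columns described later in this document.

```python
# Hypothetical helper (not from the Arlington code base): checks some of the
# structural TSV rules listed above against the text of a single TSV file.
def check_tsv_text(text: str, expected_columns: int = 12) -> list[str]:
    """Return a list of human-readable problems found in one TSV file's text."""
    problems = []
    if '"' in text:
        problems.append("double quotes are not allowed")
    if "\r" in text:
        problems.append("EOL must be LF only")
    if not text.endswith("\n"):
        problems.append("last row must end with EOL")
    # All TSV files have matching numbers of [ ] and ( )
    for opener, closer in (("[", "]"), ("(", ")")):
        if text.count(opener) != text.count(closer):
            problems.append(f"unbalanced {opener}{closer}")
    # Every row needs the full set of TABs (one fewer than the column count)
    for number, row in enumerate(text.splitlines(), start=1):
        if row.count("\t") != expected_columns - 1:
            problems.append(f"line {number}: expected {expected_columns - 1} TABs")
    return problems
```

To use it, read a TSV file as text (with newline translation disabled so that any CR characters survive) and pass the text in; an empty list means none of these particular checks failed.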

PDF Object conventions

  • There are NO leading SLASHES for PDF names (ever!)
  • PDF names don't use #-escaping (currently unsupported)
  • PDF strings use single quotes ' and ' (since ( and ) are ambiguous with expressions and single quotes are supported natively by Python csv module)
  • Expressions with integers need to use integers. Integers can be used in place of numbers.
  • * represents a wildcard (i.e. anything). Other regex are not supported. Wildcards can be used in the Key field and in the PossibleValues field for names (when the PDF standard specifically states that other arbitrary names can be used)
  • Leading @ indicates "value of" a key or array element
  • PDF Booleans are true and false lowercase.
    • Uppercase TRUE/FALSE are reserved for logical Boolean TSV data fields such as the "Required" field.
  • expressions using && or || logical operators need to be either fully bracketed or be just a predicate and have a single SPACE either side of the logical operator. precedence rules are NOT implemented.
  • the predefined Arlington paths parent:: and trailer:: represent the parent of the current object and file trailer (either traditional or a cross-reference stream) respectively. All other paths are relative from the containing PDF object
  • PDF arrays always use [ and ] (which may require some additional processing so as not to be confused with our [];[];[] syntax for complex fields)
    • elements in a PDF array do not use COMMA-separators and are specified just like in PDF e.g. [0 1 0]
    • if a PDF array needs to be specified as part of a complex typed key ([];[];[]) then 2 sets of [ and ] need to be used for the array values
      • e.g. [[0 1]];[123];[SomeThing] might be a Default Value for a PDF key that can be an array, an integer or a name (alphabetically sorted in the "Type" field!) each with a default value.
      • this extra pair of [ and ] is only needed for complex types.

TSV Data Fields

  • A key or array element is so-called "complex" if it can be more than one type. This is represented by [];[];[]-type expressions.
  • An item is a so-called "wildcard" if the "Key" field contains an ASTERISK.
  • An array is a so-called "repeating array" if it requires N x a set of elements. This is represented by DIGIT+ASTERISK in the "Key" field.
    • Repeating array elements with DIGIT+ASTERISK must be the last rows in a TSV
    • e.g. 0* 1* 2* would be an array of 3 * N elements (i.e. N triplets of elements)
    • e.g. 0 1* 2* would be an array of 2 * N + 1 elements, where the first element has a fixed definition, followed by repeating pairs of elements
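
The length arithmetic for repeating arrays can be sketched as follows. This is an illustration only: the function names are hypothetical, the "Required" column is ignored, a lone `*` wildcard row is not handled, and N = 0 repeats is assumed to be acceptable (this document does not say whether it is).

```python
def repeat_shape(keys):
    """Return (fixed_count, group_size) for the "Key" column of an array TSV."""
    if "*" in keys:
        raise ValueError("lone wildcard rows are not handled in this sketch")
    fixed = [k for k in keys if not k.endswith("*")]
    group = [k for k in keys if k.endswith("*")]
    # Repeating rows must be the last rows, so all fixed rows must come first.
    if keys != fixed + group:
        raise ValueError("digit+ASTERISK rows must be the last rows")
    return len(fixed), len(group)


def length_fits(keys, length):
    """True if a PDF array of this length matches fixed + N * group elements."""
    fixed, group = repeat_shape(keys)
    if group == 0:
        return length == fixed
    extra = length - fixed
    return extra >= 0 and extra % group == 0
```

For example, `length_fits(["0", "1*", "2*"], 5)` is true because 5 = 2 * 2 + 1, whereas a length of 4 does not fit.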

Column 1 - "Key"

  • Must not be blank
  • Case-sensitive (as per PDF spec)
  • No duplicate keys in any single TSV file
  • Only alphanumeric, ., -, _ or ASTERISK characters (no whitespace or other special characters)
    • The proprietary Apple AAPL extensions also use : (COLON) as in AAPL:ST
  • If a dictionary, then "Key" may also be an ASTERISK * meaning wildcard, so anything is allowed
  • If ASTERISK * by itself then must be last row in TSV file
  • If ASTERISK * by itself then "Required" column must be FALSE
  • If representing a PDF array, then "Key" name is really an integer array index.
    • Zero-based increasing (always by 1) integers always starting at ZERO (0), with an optional ASTERISK appended after the digit (indicating repeat)
    • Or just an ASTERISK * meaning that any number of array elements may exist
  • If representing a PDF array with a repeating set of array elements (such as alternating pairs of elements) then use digit+ASTERISK, where the last set of rows must all be digit+ASTERISK. This indicates a repeating group of N elements starting at array element M: the array starts with a fixed (non-repeating) set of array elements 0 to M-1, followed by the repeating set of elements M to M+N-1.
  • If representing a PDF array with digit+ASTERISK then the "Required" column should be TRUE if all N entries must always be repeated as a full set (e.g. in pairs or quads).
  • Python pretty-print/JSON
    • String (as JSON dictionary key)
  • Linux CLI tests:
    # List of all key names and array indices
    cut -f 1 * | sort -u
  • files that define objects with an arbitrary number of keys or array elements use the wildcard *. If the line number of the wildcard is line 2 then it is a map-like object. If the line number is after 2, then there are additional fixed keys/elements.
    grep --line-number "^\*" * | sed -e 's/\:/\t/g' | tsv-pretty
  • files that define arrays with repeating sequences of N elements use the digit+ASTERISK syntax. Digit is currently restricted to a SINGLE digit 0-9.
    grep "^[0-9]\*" * | tsv-pretty
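
Some of the "Key" column rules above lend themselves to a simple check. The sketch below is an assumption-laden illustration (the names `KEY_PATTERN` and `check_keys` are hypothetical); it covers only the character set, duplicate and lone-wildcard-last rules, and takes the "Key" values of one TSV file in row order.

```python
import re

# Allowed characters: alphanumerics, '.', '-', '_', ASTERISK, plus COLON
# for the AAPL extensions. An empty key never matches.
KEY_PATTERN = re.compile(r"^[A-Za-z0-9._\-*:]+$")


def check_keys(keys):
    """Return a list of problems for the "Key" values of one TSV file."""
    problems = []
    seen = set()
    for row, key in enumerate(keys, start=2):  # row 1 is the header row
        if not KEY_PATTERN.match(key):
            problems.append(f"row {row}: invalid key {key!r}")
        if key in seen:
            problems.append(f"row {row}: duplicate key {key!r}")
        seen.add(key)
    if "*" in keys and keys[-1] != "*":
        problems.append("lone wildcard must be the last row")
    return problems
```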

Column 2 - "Type"

  • Must not be blank
  • Alphabetically sorted, SEMI-COLON separated list from the following predefined set of Arlington types (always lowercase):
    • array
    • bitmask
    • boolean
    • date
    • dictionary
    • integer
    • matrix
    • name
    • name-tree
    • null
    • number
    • number-tree
    • rectangle
    • stream
    • string
    • string-ascii
    • string-byte
    • string-text
  • Each type may also be wrapped in a version-based predicate (e.g. fn:SinceVersion(version,type), fn:Deprecated(version,type), fn:Extension(name,type), etc.).
  • When a predicate is used, the internal simple type is still kept in its alphabetic sort order
  • The following predefined Arlington types ALWAYS REQUIRE a link:
    • array, dictionary, stream
  • The following predefined Arlington types MAY have a link (this is because name and number trees can have nodes which are the primitive Arlington types below or a complex type above):
    • name-tree, number-tree
    • e.g. Navigator\Strings is a name-tree of string objects
  • The following predefined Arlington types NEVER have a link (they are the primitive Arlington types):
    • bitmask, boolean, date, integer, matrix, name, null, number, rectangle, string, string-ascii, string-byte, string-text
  • Note that null is only an explicit type when mentioned in ISO 32000-2:2020.
    • Dictionary handling is covered by subclause 7.3.7 "A dictionary entry whose value is null (see 7.3.9, "Null object") shall be treated the same as if the entry does not exist." so dictionaries will never have a null type unless ISO 32000-2 explicitly mentions it or there is a glitch in the matrix (e.g. Table 207 for Mac and Unix entries).
    • Array objects and name-tree and number-trees are more complex as ISO 32000-2:2020 makes no statements about null. See also Arlington Issue #90 and PDF 2.0 Errata #157.
  • Python pretty-print/JSON:
    • Always a list
    • List elements are either:
      • Strings for the basic types listed above
      • Python lists for predicates - a simple search through the list for a match to the types above is sufficient (if understanding the predicate is not required)
    • Not to be confused with "/Type" keys which is why the [ is included in this grep!
    • grep "'Type': \[" dom.json | sed -e 's/^ *//' | sort -u
  • Linux CLI tests:
    cut -f 2 * | sort -u
    cut -f 2 * | sed -e "s/;/\n/g" | sort -u
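
The sorting and membership rules for the "Type" field can be sketched in a few lines. This is a simplified illustration with hypothetical function names: it assumes that any wrapping predicate carries the type as its last argument (as in fn:SinceVersion(version,type)) and does not otherwise parse predicates.

```python
ARLINGTON_TYPES = {
    "array", "bitmask", "boolean", "date", "dictionary", "integer", "matrix",
    "name", "name-tree", "null", "number", "number-tree", "rectangle",
    "stream", "string", "string-ascii", "string-byte", "string-text",
}


def base_type(item):
    """Strip a wrapping predicate, e.g. fn:SinceVersion(1.5,stream) -> stream."""
    if item.startswith("fn:"):
        # Assumption: the type is the text after the last COMMA
        return item.rsplit(",", 1)[-1].rstrip(")")
    return item


def check_type_field(field):
    """Return a list of problems for one "Type" field."""
    types = [base_type(item) for item in field.split(";")]
    problems = [f"unknown type {t!r}" for t in types if t not in ARLINGTON_TYPES]
    # The inner simple type keeps its alphabetic position even when wrapped
    if types != sorted(types):
        problems.append("types are not alphabetically sorted")
    return problems
```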

Column 3 - "SinceVersion"

  • Must not be blank
  • Must resolve to one of 1.0, 1.1, ... 1.7 or 2.0
  • Can be a predicate such as fn:Extension(...) or fn:Eval(...)
    • e.g. fn:Extension(XYZ,2.0) or fn:Eval(fn:Extension(XYZ,1.3) || 1.6)
  • In the future the set of versions may be increased - e.g. 2.1
  • Version-based predicates in other fields should all be based on versions explicitly AFTER the version in this column
  • Python pretty-print/JSON
    • Always a string (never blank!)
    • Value is one of the values listed above
    • grep "'SinceVersion'" dom.json | sed -e 's/^ *//' | sort -u
  • Linux CLI tests:
    cut -f 3 * | sort -u

Column 4 - "DeprecatedIn"

  • Can be blank
  • If not blank, must be one of 1.0, 1.1, ... 1.7 or 2.0
  • Version-based predicates in other fields should all be based on versions explicitly BEFORE the version in this column
  • In the future:
    • Set of versions may be increased - e.g. 2.1
  • Python pretty-print/JSON
    • A string or None
    • Value is one of the values listed above
    • grep "'Deprecated': " dom.json | sed -e 's/^ *//' | sort -u
  • Linux CLI tests:
    cut -f 4 * | sort -u

Column 5 - "Required"

  • Must not be blank
  • Either:
    • Single word: FALSE or TRUE (uppercase only)
    • The predicate fn:IsRequired(...) - no SQUARE BRACKETS!
      • This may then have further nested predicates (e.g. fn:SinceVersion, fn:IsPresent, fn:Not)
  • If "Key" column contains ASTERISK (as a wildcard), then "Required" field must be FALSE
    • Cannot require an infinite number of keys! If at least one element is needed, then have explicit first rows with "Required"==TRUE followed by ASTERISK with "Required"==FALSE.
  • Python pretty-print/JSON:
    • Always a list
    • List length is always 1
    • List element is either:
      • Boolean
      • Python list for predicates which must be fn:IsRequired(
    • grep "'Required': " dom.json | sed -e 's/^ *//' | sort -u
  • Linux CLI tests:
    cut -f 5 * | sort -u

Column 6 - "IndirectReference"

  • Must not be blank
  • Streams must always have "IndirectReference" as TRUE
  • For name- and number-trees, the value represents the direct/indirect requirements of the values in the tree (e.g. if a value is a stream, it would be TRUE)
  • Either:
    • Single word: FALSE or TRUE (uppercase only, as it is not a PDF keyword!); or
    • Single predicate fn:MustBeDirect() or fn:MustBeIndirect() indicating that the corresponding key/array element must be a direct object or an indirect object respectively; or
    • [];[];[] style expression - SEMI-COLON separated, SQUARE-BRACKETS expressions that exactly match the number of items in the "Type" column. Only the values TRUE or FALSE can be used inside each [...].
    • A more complex set of requirements using the predicate fn:MustBeDirect(optional-key-path) or fn:MustBeIndirect(...) NOT enclosed in SQUARE-BRACKETS
  • Python pretty-print/JSON:
    • Always a list
    • List length always matches length of "Type" column
    • List elements are either:
      • Python Boolean (True/False)
      • Python list for predicates where the outer-most predicate must be fn:MustBeDirect( or fn:MustBeIndirect(, with an optional argument for a condition
    • grep "'IndirectReference':" dom.json | sed -e 's/^ *//' | sort -u
  • Linux CLI tests:
    cut -f 6 * | sort -u

Column 7 - "Inheritable"

  • Must not be blank
  • Single word: FALSE or TRUE (uppercase only, as it is not a PDF keyword!)
  • Python pretty-print/JSON:
    • Always a boolean
    • grep "'Inheritable'" dom.json | sed -e 's/^ *//' | sort -u
  • Linux CLI tests:
    cut -f 7 * | sort -u

Column 8 - "DefaultValue"

  • Represents a default value for the PDF key/array element. As such it is always a single value for each Type.
    • see "PossibleValues" field below for when multiple values need to be specified.
  • Can be blank
  • SQUARE-BRACKETS are also used for PDF arrays, in which case they must use double SQUARE-BRACKETS if part of a complex type (note that lowercase true/false are the PDF keywords). If the array is the only valid type, then single SQUARE-BRACKETS are used. PDF array elements are NOT separated with COMMAs.
    • e.g. [[false false]];[123] vs [false false]
    • thus a complex expression can first be split by SEMI-COLON, then each portion has the SQUARE-BRACKETS stripped off - any remaining SQUARE-BRACKETS indicate an array.
  • If there is a "DefaultValue" AND there are multiple types, then require a complex [];[];[] expression
    • If the "DefaultValue" is a PDF array as part of a complex type, then this will result in nested SQUARE-BRACKETS as in [];[[0 0 1]];[]
  • The only valid predicates are:
    • fn:ImplementationDependent(), or
    • fn:DefaultValue(condition, value) where value must match the appropriate type (e.g. an integer for an integer key, a string for a string-* key, etc), or
    • fn:Eval(expression)
    • Predicates only need [];[];[] expression if a multi-typed key
  • Python pretty-print/JSON:
    • A list or None
    • If list, then length always matches length of "Type"
    • If list element is also a list then it is either:
      • Predicate with 1st element being a FUNC_NAME token
      • "Key" value (@key) with 1st element being a KEY_VALUE token
      • A PDF array (1st token is anything else) - including an empty PDF array
    • grep -o "'DefaultValue': .*" dom.json | sed -e 's/^ *//' | sort -u
  • Linux CLI tests:
    cut -f 8 * | sort -u
    cut -f 2,8 * | sort -u | grep -P "\t[[:graph:]]+.*" | tsv-pretty
    cut -f 1,2,8 * | sort -u | grep -P "\t[[:graph:]]*\t[[:graph:]]+.*$" | tsv-pretty
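
The split-then-strip procedure described above for complex "DefaultValue" expressions can be sketched as below. The function names are hypothetical, and the sketch assumes that no SEMI-COLON ever appears inside a value (for example inside a PDF string).

```python
def split_complex(field):
    """Split a [];[];[] expression into one string per type, outer brackets removed."""
    parts = field.split(";")
    if len(parts) == 1:
        return parts  # single type: there are no outer brackets to strip
    stripped = []
    for part in parts:
        if not (part.startswith("[") and part.endswith("]")):
            raise ValueError(f"expected [...] but got {part!r}")
        stripped.append(part[1:-1])
    return stripped


def is_pdf_array(value):
    """After stripping, any remaining SQUARE-BRACKETS indicate a PDF array."""
    return value.startswith("[") and value.endswith("]")
```

For the example above, `split_complex("[[false false]];[123]")` yields the array `[false false]` and the integer `123`, whereas the single-typed `[false false]` is returned unchanged.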

Column 9 - "PossibleValues"

  • Can be blank
  • SQUARE-BRACKETS are only required for complex types. A single type does not use them.
    • e.g. 12.34 is a valid value for a key which can only be a number
  • SEMI-COLON separated, SQUARE-BRACKETS expressions that exactly match the number of items in "Type" column
  • SQUARE-BRACKETS are also used for PDF arrays, in which case they must use double SQUARE-BRACKETS if part of a complex type. If the array is the only valid type, then single SQUARE-BRACKETS are used. PDF array elements are NOT separated with COMMAs - they are only used between arrays.
    • e.g. [[0 1],[1 0]];[Value1,Value2,Value3] is a choice of 2 arrays [0 1] and [1 0] if the type is an array or a choice of Value1 or Value2 or Value3 if the type was something else (e.g. name)
    • thus a complex expression can first be split by SEMI-COLON, then each portion has the SQUARE-BRACKETS stripped off, then multiple options can be split by COMMA as any remaining SQUARE-BRACKETS indicate an array.
  • If there is a "PossibleValues" AND there are multiple types, then require a complex [];[];[] expression
    • If the "PossibleValues" is a PDF array as part of a complex type, then this will result in nested SQUARE-BRACKETS as in [];[[0 0 1]];[]
  • For keys or arrays that are PDF names, a wildcard * indicates that any arbitrary name is explicitly permitted according to the PDF specification along with formally defined values (e.g. OptContentCreatorInfo, Subtype key: [Artwork,Technical,*]).
    • Do not use * as the only value, since an empty cell has the same meaning as "anything is OK", although there are some subtle nuances regarding whether custom keys have to be 2nd class names or can be really anything. See Errata #229
    • The wildcard must be the LAST entry in the list of names and, because it cannot be alone, it will always be preceded by a COMMA. This may occur in complex forms too such as [...];[...,*];[...].
    • The TestGrammar PoC will not report an error about unexpected values in this case unless the --explicit-values-only CLI option is specified.
    • To locate all such uses in the Arlington model, search for ,*]: grep ",\*]" *.tsv
  • fn:Eval predicate wrapper is only needed for predicates which need to perform calculations. fn:Eval is not required around the version-based predicates (which includes fn:Extension) or expressions using fn:RequiredValue
  • Python pretty-print/JSON:
    • A list or None
    • If list, then length always matches length of "Type"
      • Elements can be anything, including None
    grep -o "'PossibleValues': .*" dom.json | sed -e 's/^ *//' | sort -u
  • Linux CLI tests:
    cut -f 9 * | sort -u
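
For the complex [];[];[] form of "PossibleValues", the split by SEMI-COLON, strip, then split by COMMA procedure could look like the sketch below. The function names are hypothetical. COMMAs inside ( ) or [ ] are not treated as separators, so that predicate arguments and the COMMAs between arrays are handled; PDF strings containing these characters are not handled.

```python
def split_options(portion):
    """Split one bracket-stripped portion on COMMAs that are outside ( ) and [ ]."""
    if portion == "":
        return []
    options, depth, current = [], 0, ""
    for ch in portion:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == "," and depth == 0:
            options.append(current)
            current = ""
        else:
            current += ch
    options.append(current)
    return options


def possible_values(field):
    """Return one list of options per type for a complex [];[];[] expression."""
    result = []
    for part in field.split(";"):
        if not (part.startswith("[") and part.endswith("]")):
            raise ValueError(f"expected [...] but got {part!r}")
        result.append(split_options(part[1:-1]))
    return result
```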

Column 10 - "SpecialCase"

  • Can be blank
  • SEMI-COLON separated, SQUARE-BRACKETED complex expressions that exactly match the number of items in "Type" column
  • Each expression inside a SQUARE-BRACKET is a predicate that reduces to TRUE/FALSE or is indeterminable.
    • TRUE means that it is valid, FALSE means it would be invalid.
  • A SpecialCase predicate is not meant to reflect all rules from the PDF specification (things are declarative, not programmatic!)
    • It should not test for required/optional-ness, whether an object is indirect or not, etc. as those rules should live in the other fields
  • Python pretty-print/JSON:
    • A list or None
    • If list, then length always matches length of "Type"
      • Elements can be anything, including None
    grep -o "'SpecialCase': .*" dom.json | sed -e 's/^ *//' | sort -u

Column 11 - "Link"

  • Can be blank (but only when "Type" is a single basic type)
  • If non-blank, always uses SQUARE-BRACKETS
  • SEMI-COLON separated, SQUARE-BRACKETED complex expressions that exactly match the number of items in "Type" column
  • Valid "Links" must exist for these selected object types only:
    • array
    • dictionary
    • stream
    • name-tree - the value represents the node in the tree, not how trees are specified
    • number-tree - the value represents the node in the tree, not how trees are specified
  • "Links" must NOT exist for selected fundamental "Types" (i.e. must be empty [] in the SEMI-COLON separated list):
    • bitmask
    • boolean
    • date
    • integer
    • matrix
    • name
    • null
    • number
    • rectangle
    • string
    • string-ascii
    • string-byte
    • string-text
  • Each sub-expression inside a SQUARE-BRACKET is a COMMA-separated list of case-sensitive filenames of other TSV files (without .tsv extension)
  • Where a sub-expression uses a predicate, it MUST BE one of these version-based predicates:
    • fn:SinceVersion(pdf-version,link)
    • fn:SinceVersion(pdf-version,fn:Extension(name,link))
    • fn:IsPDFVersion(pdf-version,fn:Extension(name,link))
    • fn:Deprecated(pdf-version,link)
    • fn:BeforeVersion(pdf-version,link)
    • fn:IsPDFVersion(version,link)
  • Python pretty-print/JSON:
    • A list or None
    • If list, then length always matches length of "Type"
      • List elements can be None
      • Validity of list elements aligns with indexed "Type" data
  • Linux CLI test:
    # A list of all predicates used in the Link field (column 11)
    cut -f 11 * | sort -u | grep -o "fn:[a-zA-Z0-9]*" | sort -u
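
The relationship between the "Type" and "Link" fields described above can be sketched as follows. The names are hypothetical and the TSV names used in the usage note are purely illustrative. The sketch does not validate the link contents themselves, and assumes that a wrapped type is the last argument of its predicate.

```python
NEEDS_LINK = {"array", "dictionary", "stream"}
MAY_LINK = {"name-tree", "number-tree"}


def check_links(type_field, link_field):
    """Return a list of problems for one row's "Type" and "Link" fields."""
    types = type_field.split(";")
    if link_field == "":
        links = [""]  # a blank Link field is only valid for a single basic type
    else:
        links = [part[1:-1] for part in link_field.split(";")]
    if len(links) != len(types):
        return ["Link and Type have different numbers of items"]
    problems = []
    for a_type, link in zip(types, links):
        # Unwrap e.g. fn:SinceVersion(1.5,stream) to stream (assumption)
        a_type = a_type.rsplit(",", 1)[-1].rstrip(")")
        if a_type in NEEDS_LINK and not link:
            problems.append(f"{a_type} requires a link")
        elif a_type not in NEEDS_LINK | MAY_LINK and link:
            problems.append(f"{a_type} must not have a link")
    return problems
```

For instance, `check_links("array;name", "[SomeArray];[]")` reports no problems, while `check_links("dictionary", "")` reports that a dictionary requires a link.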

Column 12 - "Notes"

  • Can be blank
  • Free text - no validation possible
  • Often contains a reference to Table(s) (search for case-sensitive "Table ") or clause number(s) (search for case-sensitive "Clause ") from ISO 32000-2:2020 (PDF 2.0) or a PDF Association Errata issue link (as GitHub URL) where the Arlington machine-readable definition is defined.
    • For dictionaries, this is normally on the first key on the Type or Subtype row depending on what is the primary differentiating definition
    • Note that this is not where an object is referenced from, but where its key and values are defined. Sometimes this is within body text prose of ISO 32000-2:2020 (so outside a Table and a Clause reference is used) or as prose within the "Description" cell of some other key in another Table. Where an object is referenced is encoded by the Arlington PDF Model "Link" field - just grep for the case-sensitive TSV file (no extension)!
  • The spreadsheet Arlington-vs-ISO32K-Tables.xlsx provides a cross reference from all mentions of "Table" within the Arlington PDF Model against an index of every Table in ISO 32000-2:2020 as published by ISO. Tables that are not mentioned anywhere in Arlington TSV files may indicate poor coverage in the Arlington PDF Model - or that the table is inappropriate for incorporating into the Arlington PDF Model.
    • Current known limitations include no support for FDF; less-than-perfect definition for Linearization objects; and no definition of content streams.
    • Note also that Arlington does additionally reference other ISO and Adobe publications, sometimes also with specific clause and Table references (such as for Adobe Extension Level 3).
  • Python pretty-print/JSON:
    • A string or None
  • Linux CLI voodoo:
    # Find all TSV files in a data set that do not have either a Table number or Clause reference
    grep -PL "(Table )|(Clause )" *
    # A list of most (but not all!) Table numbers referenced in an Arlington TSV file set. Does not capture Annex tables.
    grep --color=none -Pho "(?<=Table) [0-9]+" * | sort -un
    # Some PDF objects are defined by prose in clauses, rather than Tables
    grep -Pho "Clause [0-9A-H\.]*" * | sort -u
    # Find all ISO publications that are explicitly referenced
    grep -Pho "ISO[^_]*$" * | sort -u

Validation of predicates (declarative functions)

First and foremost, the predicate system is not based on functional programming!

The best way to understand an expression with a predicate is to read it out aloud, from left to right. Its verbalization should relatively closely match wording found in the PDF specification. Predicate simplification is avoided so that wording (when read aloud) is kept as close as possible to wording in the PDF specification.

  • the internal Arlington grammar is loosely typed (so things need to match or be interpreted as matching the "Type" field (column 2)).
    • integers may be used in place of numbers (but not vice-versa!)
  • parent:: (all lowercase) is a special Arlington grammar keyword that forms the basis of a conceptual "relative" path in the PDF DOM. There can be multiple parent::s.
  • trailer:: (all lowercase) is a special Arlington grammar keyword that forms the basis of a conceptual "absolute" path in the PDF DOM. Arlington always starts with the trailer, so that trailer keys and values can also be used in predicates.
    • trailer::Catalog is a special Arlington alias for trailer::Root, because the Root key in the trailer is the reference to the Document Catalog. Normal PDF terminology refers to the "Document Catalog", so that commonly understood term is preferred over the word "root", which could mean either the trailer or the Document Catalog. Reading "Catalog" aloud also sounds more natural.
      • Either trailer::Catalog or trailer::Root can be used, but the preference is trailer::Catalog because it verbalises better
  • null (all lowercase) is the PDF null object (Note: it is also a valid predefined Arlington type).
    • null gets used in "DefaultValue" or "PossibleValues" fields only when it is explicitly mentioned in the PDF specification.
  • Key means key is present (Key is case-sensitive match and may include an Arlington path)
  • @Key means the value of key (Key is case-sensitive match and may include an Arlington path).
    • this also applies after a path - e.g. keyA::keyB::@keyC is valid and is the value of keyC when the path keyA::keyB is traversed
  • Arlington paths are separated by :: (double COLONs)
    • e.g. parent::@Key, KeyA::KeyB, trailer::Catalog::Size, Object::<0-based integer>
    • the @ operator only applies to the right-most portion
    • The @ sign is always required for math and comparison operations, since those operate on values.
      • if an array or stream length is needed then use the specific predicate
    • The predefined Arlington types used with @ are the primitive types such as boolean, integer, number, string-*, name, etc.
    • It is also possible to use the key name of an array for certain predicates such as fn:Contains(...)
    • For complex types, if the "DefaultValue" for KeyA is @KeyB then it means that the "default value for Key A is the value of Key B" and so long as Keys A and B both have the same type then this is logical.
  • true and false (all lowercase) are the PDF keywords (required for explicit comparison with @key) - uppercase TRUE and FALSE never get used in predicates as they represent Arlington model values such as for "Required", "IndirectReference" or "Inheritable" fields.
  • All predicates start with fn: (case-sensitive, single COLON) followed by an uppercase character (A-Z)
  • All predicate names are CamelCase case-sensitive with BRACKETS ( and ) and do NOT use DASH or UNDERSCOREs (i.e. must match a simple alphanumeric regex)
  • Predicates can have 0, 1 or 2 arguments that are always COMMA separated
    • Predicates need to end with () for zero arguments
    • Arguments always within (...)
    • Predicates can nest (as arguments of other Predicates)
  • Support two C/C++ style boolean operators: && (logical and), || (logical or). There is also a special fn:Not(...) predicate.
  • Support six C/C++ style comparison operators: <, <=, >, >=, ==, !=
  • NO bit-wise operators - use predicates instead
  • NO unary NOT (!) operator (use predicate fn:Not(...))
  • All expressions MUST be fully bracketed between Boolean operators (to avoid defining precedence rules)
  • NO conditional if/then, switch or loop style statements - it's purely declarative!
  • NO local variables - it's purely declarative!
  • Using comparison operators requires that the full expression is wrapped in fn:Eval(...)
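
As an illustration of the path rules above, an Arlington path can be broken into its segments together with a flag saying whether the value (@) of the right-most portion is wanted. This is only a sketch with a hypothetical function name; it does not resolve the path against a PDF and does not treat parent:: or trailer:: specially.

```python
def parse_path(expr):
    """Split an Arlington path into (segments, wants_value)."""
    segments = expr.split("::")
    # The @ operator only applies to the right-most portion
    if any(seg.startswith("@") for seg in segments[:-1]):
        raise ValueError("@ may only prefix the right-most portion")
    wants_value = segments[-1].startswith("@")
    if wants_value:
        segments[-1] = segments[-1][1:]
    return segments, wants_value
```

For example, `parse_path("parent::@Key")` gives the segments `parent` and `Key` with the value flag set, whereas `parse_path("KeyA::KeyB")` only tests for presence.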

Linux CLI voodoo

# List all predicates by names:
grep --color=always -ho "fn:[[:alnum:]]*" * | sort -u

# List all predicates and their Arguments
grep -Pho "fn:[a-zA-Z0-9]+\((?:[^)(]+|(?R))*+\)" * | sort -u

# List all predicates that take no parameters:
grep --color=always -Pho "fn:[a-zA-Z0-9]+\(\)" * | sort -u

# List all parameter lists (but not predicate names) (and a few PDF strings too!):
grep --color=always -Pho "\((?>[^()]|(?R))*\)" * | sort -u

# List all predicates with their arguments:
grep --color=always -Pho "fn:[a-zA-Z0-9]+\([^\t\]\;]*\)" * | sort -u

eBay TSV Utilities

Any Linux command that outputs a row from an Arlington TSV data file can be piped through tsv-pretty to improve readability.

# Pretty columnized output:
tsv-pretty Catalog.tsv

# Find all keys that are of "Type" 'string-byte':
tsv-filter -H --str-eq Type:string-byte *.tsv

# Only precisely 'string-byte', and only for "SinceVersion" 1.5 or later:
tsv-filter -H --str-eq Type:string-byte --ge SinceVersion:1.5 *.tsv

# Any string type (using string-based regex):
tsv-filter -H --regex Type:string\* --ge SinceVersion:1.5 *.tsv

# "Type" includes 'string-byte':
tsv-filter -H --regex Type:.\*string-byte\* --ge SinceVersion:1.5 *.tsv

# Find all annotations which have the ExData key
grep ^ExData Annot* | tsv-pretty

Parameters to predicates

The term "reduction" is used to describe how predicates and their parameters get recursively processed from left-to-right. At any point a predicate or argument can be indeterminable, such as when a PDF does not have a key, or if the key is the wrong type, etc.

When thinking about predicates, it is important to remember that not all the parameters (arguments) to predicates will necessarily exist - thus only a portion of a predicate statement may be determinable when checking a PDF file. For example, a predicate of the form fn:Eval(fn:SomeThing(@A,fn:Not(@B==b))) is expecting that both the /A and /B keys will exist in the current object so that their values can be obtained, but this may not be required (and PDFs don't always follow requirements anyway!).

Note also that if both /A and /B are optional and both had a "DefaultValue" in TSV column 8 then this predicate would always be determinable. Further note that if a key is present but null then it is the same as not present!
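
One way to picture an indeterminable reduction is with three-valued logic, where Python's None stands for "indeterminable". The sketch below is an assumption, not how any PoC is known to work: the function names are hypothetical, a PDF object is modelled as a plain dict, and "DefaultValue" handling is left out.

```python
from typing import Optional


def value_of(pdf_object: dict, key: str):
    """Value of key (@key), or None when the key is absent or is PDF null."""
    return pdf_object.get(key)  # None models both "missing" and "null"


def tri_equals(left, right) -> Optional[bool]:
    """== comparison that stays indeterminable if either side is unknown."""
    return None if left is None or right is None else left == right


def tri_not(operand: Optional[bool]) -> Optional[bool]:
    """fn:Not(...) that propagates an indeterminable operand."""
    return None if operand is None else not operand
```

With these helpers, the fn:Not(@B==b) portion of the example above would be `tri_not(tri_equals(value_of(obj, "B"), "b"))`, which reduces to None when /B is missing or null.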

bit-posn
  • bits are numbered 1-32 inclusive.
  • bit 1 is the low-order bit.
version
  • One of "1.0", "1.1", ..., "1.7", and "2.0" currently.
  • Same set as used in "SinceVersion" (column 3) and "DeprecatedIn" (column 4).
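
Because bit positions are 1-based with bit 1 as the low-order bit, converting a bit-posn to a mask needs a shift of one less than the position. A minimal sketch, with hypothetical function names:

```python
def bit_mask(bit_posn: int) -> int:
    """Mask for a 1-based bit position, where bit 1 is the low-order bit."""
    if not 1 <= bit_posn <= 32:
        raise ValueError("bit-posn must be 1-32 inclusive")
    return 1 << (bit_posn - 1)


def is_bit_set(value: int, bit_posn: int) -> bool:
    """True if the given 1-based bit position is one (set) in value."""
    return (value & bit_mask(bit_posn)) != 0
```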

Predicates (declarative functions)

Do not use additional whitespace!

Single SPACE characters are only required around logical operators (" && " and " || "), MINUS (" - ", to disambiguate from a negative number) and the " mod " mathematical operator.

fn:AlwaysUnencrypted()
  • Asserts that the current key or array element is a PDF string object and is always unencrypted when the PDF file itself is encrypted.
  • There are no parameters.
  • Read aloud as: "<current key> shall always be unencrypted."
fn:ArrayLength(key)
  • Asserts that key exists and is an array, and returns the array length as an integer value >= 0.
  • There is only one parameter and it must be an array.
  • Read aloud as: "... the length of array <key> shall ..."
fn:ArraySortAscending(key,integer)
  • Asserts key references something of type array, and
  • Asserts that the integer-th array elements are sorted in ascending order.
  • Requires that all integer-th array elements are numeric. Other array elements however can be anything.
  • An empty array will be considered sorted.
  • integer is 1 or greater. 1 means all array elements, 2 means every second element, 3 means every 3rd element, etc. It does not imply anything about the array length, as an empty array is considered logically sorted.
  • e.g. fn:ArraySortAscending(Index,2) tests that the array elements at indices 0, 2, 4, ... are all sorted.
  • Read aloud as (approx.): "... the <integer>-th elements in array <key> shall all be sorted in ascending order ..."
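
A sketch of the fn:ArraySortAscending check on a Python list is shown below. The function name is hypothetical, and two points are assumptions because this document does not state them: equal neighbouring values are treated as sorted, and a non-numeric integer-th element gives False rather than an indeterminable result.

```python
def array_sort_ascending(array, step):
    """Check that elements at indices 0, step, 2*step, ... are numeric and ascending."""
    if step < 1:
        raise ValueError("integer must be 1 or greater")
    picked = array[::step]
    # Python bool is a subclass of int, so exclude it explicitly
    if not all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in picked):
        return False
    return all(a <= b for a, b in zip(picked, picked[1:]))
```

For example, `array_sort_ascending([0, "x", 5, "y", 9], 2)` is true because only the elements at indices 0, 2 and 4 are examined.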
fn:BeforeVersion(version)
fn:BeforeVersion(version,statement)
  • version must be 1.1, ..., 2.0 (1.0 makes no sense!).
  • Asserts that the optional statement only applies before (i.e. strictly less than) PDF version.
  • version must also make sense in light of the "SinceVersion" and "DeprecatedIn" fields for the current row (i.e. is between them).
  • Read aloud as: "... prior to PDF version <version>, <statement> ..."
fn:BitClear(bit-posn)
  • bit-posn is 1-32 inclusive.
  • Asserts that bit-posn is zero (clear).
  • Asserts key is something of type bitmask and the value fits in 32-bits.
  • Note that there is NO reference to a key or key-value. It is always assumed to apply to the current key.
  • Read aloud as: "... bit position <bit-posn> of current key shall be clear (zero) ..."
fn:BitSet(bit-posn)
  • bit-posn is 1-32 inclusive.
  • Asserts that bit-posn is one (set).
  • Asserts key is something of type bitmask and the value fits in 32-bits.
  • Note that there is NO reference to a key or key-value. It is always assumed to apply to the current key.
  • Read aloud as: "... bit position <bit-posn> of current key shall be set (one) ..."
fn:BitsClear(low-bit,high-bit)
  • low-bit and high-bit must be 1-32 inclusive.
  • Asserts that all bits between low-bit and high-bit inclusive are all zero (clear) and the value fits in 32-bits.
  • low-bit and high-bit must be different. Use fn:BitClear() for single bit assertions.
  • Note that there is NO reference to a key or key-value. It always applies to the current key which must be of type bitmask. This keeps all Arlington predicates to at most 2 parameters.
  • Read aloud as: "... bit positions from <low-bit> to <high-bit> inclusive of current key shall be clear (zero) ..."
fn:BitsSet(low-bit,high-bit)
  • low-bit and high-bit must be 1-32 inclusive.
  • Asserts that all bits between low-bit and high-bit inclusive are all one (set) and the value fits in 32-bits.
  • low-bit and high-bit must be different. Use fn:BitSet() for single bit assertions.
  • Note that there is NO reference to a key or key-value. It is always assumed to apply to the current key which must be of type bitmask. This keeps all Arlington predicates to at most 2 parameters.
  • Read aloud as: "... bit positions from <low-bit> to <high-bit> inclusive of current key shall be set (one) ..."
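The four bit predicates share the same 1-based bit numbering, where bit position 1 is the least significant bit. The following Python sketch shows one possible reading. All function names are hypothetical, and accepting negative values as signed 32-bit two's complement is an assumption made here, not something stated above.

```python
def _as_unsigned32(value):
    """Return value as an unsigned 32-bit pattern, or None if it does not fit."""
    if isinstance(value, bool) or not isinstance(value, int):
        return None
    # Assumption: negative values are signed 32-bit two's complement.
    if not -(1 << 31) <= value <= 0xFFFFFFFF:
        return None
    return value & 0xFFFFFFFF


def _range_mask(low_bit, high_bit):
    """Mask covering 1-based bit positions low_bit..high_bit inclusive."""
    if not 1 <= low_bit < high_bit <= 32:
        raise ValueError("need 1 <= low-bit < high-bit <= 32")
    return ((1 << (high_bit - low_bit + 1)) - 1) << (low_bit - 1)


def _single_bit(value, bit_posn):
    """Return the bit (0 or 1) at a 1-based position, or None if value does not fit."""
    if not 1 <= bit_posn <= 32:
        raise ValueError("bit-posn must be 1-32 inclusive")
    bits = _as_unsigned32(value)
    return None if bits is None else (bits >> (bit_posn - 1)) & 1


def bit_set(value, bit_posn):
    return _single_bit(value, bit_posn) == 1


def bit_clear(value, bit_posn):
    return _single_bit(value, bit_posn) == 0


def bits_set(value, low_bit, high_bit):
    mask = _range_mask(low_bit, high_bit)
    bits = _as_unsigned32(value)
    return bits is not None and (bits & mask) == mask


def bits_clear(value, low_bit, high_bit):
    mask = _range_mask(low_bit, high_bit)
    bits = _as_unsigned32(value)
    return bits is not None and (bits & mask) == 0
```

In this sketch a value that does not fit in 32 bits fails every assertion, including the "clear" ones, because the fits-in-32-bits condition is part of each predicate.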
fn:Contains(@key,value)
  • Asserts that key is either an array object (which can hold multiple values) containing value, or a PDF basic object (such as a name) equal to value.
  • A specific use-case is the stream Filter key, which can be an array or a name. Testing this cannot just use @Filter==XXX, as that only works if Filter is a name: for an array the @ logic returns true to indicate existence.
  • Always use @array-key for array-key.
  • Read aloud as: "... the value of <key> shall be <value>, or if <key> is an array it shall contain an array element equal to <value>, ..."
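A minimal Python sketch of this array-or-scalar test is shown below, assuming PDF arrays are modelled as Python lists and PDF names as plain strings. The function name `contains` is hypothetical.

```python
def contains(key_value, value):
    """Hypothetical helper: True if key_value equals value, or is an array holding it."""
    if isinstance(key_value, list):
        # Array case: any element equal to value satisfies the assertion.
        return value in key_value
    # Basic object case (e.g. a name): direct comparison.
    return key_value == value
```

For example, both `contains("FlateDecode", "FlateDecode")` and `contains(["ASCII85Decode", "FlateDecode"], "FlateDecode")` are true, which mirrors the two forms that a stream Filter key can take.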
fn:DefaultValue(condition,value)
  • States a conditionally-based default value.
  • When condition is true, then the Default Value is specified by value.
  • Only used in "DefaultValue" field (column 8).
  • Read aloud as: "The default value of current-key shall be <value> when <condition>."
fn:Deprecated(version,statement)
  • Indicates that statement was deprecated, such as a type (in "Type" field), a value (e.g. in "PossibleValues" or "SpecialCase" field) or a link in the "Links" field.
  • The version is inclusive of the deprecation (i.e. it is the PDF version in which the feature was first stated to be deprecated).
  • Obsolescence is different to deprecation in ISO 32000: a deprecated feature is still permitted but is strongly recommended against ("should not"), whereas an obsoleted feature "shall not" appear in a PDF and its documentation has been removed.
  • version must also make logical sense in light of the "SinceVersion" and "DeprecatedIn" fields for the current row (i.e. is between them).
  • Read aloud as: "<statement> was deprecated in PDF version <version>."
fn:Eval(expr)
  • In the "SpecialCase" field, always the outer-most predicate
  • For other fields such as "Required", "IndirectRef", can be the 2nd most outer predicate (for example, directly inside fn:IsRequired() or fn:MustBeDirect())
  • Evaluates the expression expr that may involve multiple terms with logical operators && or || .
  • The result of expr can be anything: a numeric value, true/false, a type, a statement but must be appropriate to its usage.
fn:Extension(name)
fn:Extension(name,value)
  • Used in the "SinceVersion", "Required", "PossibleValues", "SpecialCase" or "Link" fields.
  • name is an arbitrary identifier for the extension or subset and uses the same lexical conventions as for the "Key" field (e.g. no SPACEs).
  • In the "SinceVersion" field, the expression must reduce down to a valid PDF version in which the key or array element was introduced, or to the extension name that introduced it. This may be combined with value to express a version-based introduction such as ISO subsets:
    • fn:Extension(XYZ) - under extension XYZ for any PDF version
    • fn:Extension(XYZ,1.5) - under extension XYZ but only since PDF 1.5 (inclusive)
    • fn:Eval(fn:Extension(XYZ,1.6) || 2.0) - under extension XYZ since PDF 1.6, but then became a standardized feature since PDF 2.0
  • In other fields such as "PossibleValues" or "SpecialCase" identifies that a specific value for the key or array element is only valid for the specified extension name. This may be combined with fn:SinceVersion to express a more nuanced introduction:
    • e.g. fn:SinceVersion(2.0,fn:Extension(ISO_TS_12345,AESV99))
fn:FileSize()
  • Represents the length of the "PDF file" in bytes (from %PDF-x.y to last %%EOF but in reality depends on PDF SDK).
  • Will always be an integer > 0.
  • There are no parameters.
  • This may not be the same as the physical file size!
  • This is mostly used as an upper integer bound for values that represent byte offsets.
  • Read aloud as: "... length of the PDF file in bytes."
fn:FontHasLatinChars()
  • Asserts that the current font descriptor object (the PDF object that contains the row with this predicate) has Latin characters.
  • Checks that the PDF object has the entry /Type /FontDescriptor.
  • There are no parameters.
  • Read aloud as: "... the font shall contain Latin characters."
fn:HasProcessColorants(array)
  • Asserts that the given array object of PDF names contains at least one process colorant name (Cyan, Magenta, Yellow or Black).
  • Read aloud as: "... array <array> of names shall contain at least one process colorant name."
fn:HasSpotColorants(array)
  • Asserts that the given array object of PDF names contains at least one spot colorant name. A spot colorant is any name besides the CMYK colorant names.
  • Read aloud as: "... array <array> of names shall contain at least one spot colorant name."
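Following the definitions above literally, the two colorant predicates can be sketched in Python as below. The function names are hypothetical, and treating every non-CMYK name as a spot colorant is taken directly from the wording above rather than from any further rule.

```python
# Process colorant names as listed above.
PROCESS_COLORANTS = frozenset({"Cyan", "Magenta", "Yellow", "Black"})


def has_process_colorants(names):
    """True if at least one name in the array is a process colorant."""
    return any(name in PROCESS_COLORANTS for name in names)


def has_spot_colorants(names):
    """True if at least one name in the array is not a process colorant."""
    return any(name not in PROCESS_COLORANTS for name in names)
```

An array can satisfy both predicates at once, and an empty array satisfies neither.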
fn:Ignore()
fn:Ignore(condition)
  • Zero or one parameters.
  • Asserts that the current row (key or array element) is to be ignored when condition evaluates to true, or ignored all the time (no parameter).
  • Only used in "SpecialCase" field (column 10).
  • Read aloud as: "current-key shall be ignored when <condition>."
fn:ImageIsStructContentItem()
  • Asserts that a PDF image object is a Tagged PDF structure content item.
  • Asserts that the PDF object has the entry /Subtype /Image.
  • There are no parameters.
  • Read aloud as: "current-key shall be in the structural parent tree." (exact quote from ISO 32000-2:2020 Table 359)
fn:ImplementationDependent()
  • Asserts that the current row (key or array element) is formally defined to be implementation dependent in the PDF specifications.
  • There are no parameters.
  • Read aloud as: "current-key is implementation dependent."
fn:InKeyMap(key)
  • key is a reference to a PDF dictionary which can have arbitrary key names.
  • Asserts that the current row (key or array element), which must be a PDF name, exists as a key in the specified map dictionary.
  • Note that this predicate is not for use with name-trees or number-trees!
  • Read aloud as: "The name current-key shall be a key in <key>."
fn:InNameTree(key)
  • key is a reference to a PDF name-tree. Name-trees are complex PDF data structures that use PDF strings as indices.
  • Asserts that the current row (key or array element), which must be a PDF string, exists in the specified name-tree.
  • Note that this predicate is not for use with dictionaries that support arbitrary key names or number-trees!
  • TO BE REPLACED - SEE BELOW!
fn:IsAssociatedFile()
  • Asserts that the containing object of the current row (key or array element) needs to be a PDF 2.0 Associated File object, meaning that it is referenced from an AF key or marked content sequence with tag AF.
  • There are no parameters.
  • Note that this cannot be codified via the EmbeddedFiles name-tree or it will cause false-negatives, since a PDF is in error if the Associated File is not in the EmbeddedFiles name-tree which means an outer nested predicate such as fn:IsRequired(fn:IsAssociatedFile()) will return FALSE when it should return TRUE! Note also that AFRelationship is optional.
fn:IsEncryptedWrapper()
  • Asserts that the current PDF file needs to be a PDF 2.0 Encrypted Wrapper.
  • There are no parameters.
fn:IsFieldName(value)
  • Asserts that value is a PDF string object that is a valid partial Field Name according to clause 12.7.4.2 of ISO 32000-2:2020.
  • There is one key-value (@key) parameter.
fn:IsHexString()
  • Asserts that the current object is a PDF string object that was written in the PDF as a hex string (`<...>`).
  • There are no parameters.
  • Read aloud as: "current-key shall be a hexadecimal string."
fn:IsLastInNumberFormatArray(key)
  • Asserts that the current row is the last array element in a number format array (normally the containing object).
  • Read aloud as: "current-key shall be the last array element in the number format array defined by <key>."
fn:IsMeaningful(condition)
  • Asserts that the current row is only "meaningful" (precise quote from ISO 32000-2:2020!) when condition is true.
  • Possibly the inverse of fn:Ignore(...)!
  • See also Errata #6 (pdf-association/pdf-issues#6).
  • Read aloud as: "current-key shall only be meaningful when <condition>."
fn:IsPDFTagged()
  • Asserts that the PDF file is a Tagged PDF.
  • This means trailer::Root::MarkInfo::Marked exists and is true.
  • There are no parameters.
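The path trailer::Root::MarkInfo::Marked can be walked as in the following Python sketch. It assumes the PDF has already been parsed into nested dictionaries with all indirect references resolved. The function name `is_pdf_tagged` is hypothetical.

```python
def is_pdf_tagged(trailer):
    """Hypothetical helper: True only if trailer::Root::MarkInfo::Marked exists and is true."""
    node = trailer
    for key in ("Root", "MarkInfo", "Marked"):
        # Any missing step on the path means the PDF is not Tagged.
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    # Marked must be the boolean true, not merely a truthy value.
    return node is True
```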
fn:IsPDFVersion(version)
fn:IsPDFVersion(version,statement)
  • Can have one or two parameters.
  • If no optional statement, then always TRUE for the stated PDF version version.
  • Otherwise asserts that the optional statement only applies to the stated PDF version version. This might be a type, a possible value, a new kind of linked object, etc.
  • version must also make sense in light of the "SinceVersion" and "DeprecatedIn" fields for the current row (i.e. is between them).
fn:IsPresent(key or expr)
fn:IsPresent(key,condition)
  • Can have one or two parameters.
  • For a single parameter: asserts that the current row (key or array element) must be present in a PDF if key is present, or when the expression expr is true.
  • e.g. fn:IsPresent(StructParent) or fn:IsPresent(@SMaskInData>0)
  • For two parameters: asserts that when key is present in a PDF, condition should also be true.
  • e.g. fn:Eval(fn:IsPresent(Matte,(@Width==parent::@Width)))
fn:IsRequired(condition)
  • condition is a conditional expression that resolves to a Boolean (true/false).
  • If condition evaluates to true, then asserts that the current key is required.
  • If condition evaluates to false, then asserts that the current key is optional.
fn:KeyNameIsColorant()
  • Asserts that the current (arbitrary) key or array element is also a colorant name.
  • There are no parameters.
fn:MustBeDirect()
fn:MustBeDirect(condition)
  • Commonly used in the "IndirectRef" field.
  • If condition is true or is not specified, then asserts that the current key value must be a direct object.
  • If condition is false, then asserts that the current key value can be either direct or indirect.
fn:MustBeIndirect()
fn:MustBeIndirect(condition)
  • Only ever used in the "IndirectRef" field.
  • If condition is true or not specified, the current key value must be an indirect object.
  • If condition is false, the current key value can be direct or indirect.
fn:NoCycle()
  • Asserts that the PDF file shall not contain any cycles (loops) when using the current key or array index to key into the linked list of objects.
  • There are no parameters.
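One way to picture this check is as a walk along the chain of objects reached through the current key, stopping as soon as an object is seen twice. The Python sketch below assumes objects are modelled as dictionaries and that each step follows the same key. The function name `no_cycle` and the example key names are hypothetical.

```python
def no_cycle(start, key):
    """Hypothetical helper: True if following `key` from `start` never revisits an object."""
    seen = set()
    node = start
    while isinstance(node, dict):
        # Object identity stands in for the PDF object number here.
        if id(node) in seen:
            return False
        seen.add(id(node))
        node = node.get(key)  # None (key absent) ends the chain
    return True
```

The walk ends cleanly when the key is absent or its value is not a dictionary, so a finite chain always passes.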
fn:Not(expr)
  • Logical inverse of the Boolean expression argument.
fn:NotStandard14Font()
  • Asserts that the current font object is not one of the Standard 14 Type 1 fonts.
  • Requires /Type /Font /Subtype /Type1 and that /BaseFont is not a Standard 14 Type 1 font name.
  • e.g. fn:IsRequired(fn:SinceVersion(2.0) || fn:NotStandard14Font()).
fn:NumberOfPages()
  • Number of pages in the PDF document (integer value).
  • For valid PDFs will always be 1 or greater.
  • There are no parameters.
  • Mostly used as an upper-bound for key values that represent a PDF page index
fn:PageContainsStructContentItems()
  • Asserts that the current PDF page contains the structure content item represented by the integer value of the current row.
  • Only used in the "Required" field, as in fn:IsRequired(fn:PageContainsStructContentItems()).
  • There are no parameters.
fn:PageProperty(page-ref,key)
  • References a property (i.e. dictionary entry) of a page, that cannot be accessed via a fixed or relative Arlington path (using ::).
  • The page is defined by the first parameter page-ref which is a value-reference (@Key) to a Page Object.
  • The page property can be a key or key-value and is defined by the second parameter.
  • e.g. fn:ArrayLength(fn:PageProperty(@P,Annots))
  • e.g. fn:Eval(@A==fn:PageProperty(@P,Annots::@NM))
fn:RectHeight(key)
  • key needs to be rectangle in Arlington predefined types.
  • Returns a number >= 0.0, representing the height of the rectangle.
  • Needs to be wrapped inside the fn:Eval(...) predicate.
fn:RectWidth(key)
  • key needs to be rectangle in Arlington predefined types.
  • Returns a number >= 0.0, representing the width of the rectangle.
  • Needs to be wrapped inside the fn:Eval(...) predicate.
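A PDF rectangle is an array of four numbers giving two diagonally opposite corners, so the width and height are the absolute differences of the x and y coordinates. The sketch below is illustrative only. The function names are hypothetical and a rectangle is modelled as a Python list.

```python
def _corners(rect):
    """Validate a rectangle modelled as a list of four numbers and return it."""
    if not isinstance(rect, list) or len(rect) != 4:
        raise ValueError("a rectangle is an array of exactly four numbers")
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in rect):
        raise ValueError("rectangle elements must be numeric")
    return rect


def rect_width(rect):
    x1, _, x2, _ = _corners(rect)
    # abs() keeps the result >= 0 regardless of which corners were given.
    return abs(x2 - x1)


def rect_height(rect):
    _, y1, _, y2 = _corners(rect)
    return abs(y2 - y1)
```

Using `abs()` is what guarantees the ">= 0.0" result described above, even when the corners are not given in lower-left, upper-right order.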
fn:RequiredValue(condition,value)
  • Only used in the "PossibleValues" field to indicate if a specific value is required under a specific condition.
  • Asserts that the current row must be value when condition evaluates to true.
fn:SinceVersion(version)
fn:SinceVersion(version,statement)
  • If no optional statement, then always true for the stated PDF version version and later.
  • Otherwise asserts that the optional statement applies from the stated PDF version version (inclusive). This might be a type, a possible value, a new kind of linked object, etc.
  • version must also make sense in light of the "SinceVersion" and "DeprecatedIn" fields for the current row (i.e. is between them).
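The single-parameter forms of the version predicates (fn:SinceVersion, fn:BeforeVersion and fn:IsPDFVersion) reduce to comparisons on an ordered list of PDF versions. The Python sketch below is one possible reading. The function names and the `pdf_version` argument (the version being validated against) are assumptions for illustration.

```python
# Ordered list of the PDF versions recognised by the model.
PDF_VERSIONS = ("1.0", "1.1", "1.2", "1.3", "1.4", "1.5", "1.6", "1.7", "2.0")


def _rank(version):
    # tuple.index raises ValueError for an unknown version string.
    return PDF_VERSIONS.index(version)


def since_version(pdf_version, version):
    """True for the stated version and later (inclusive)."""
    return _rank(pdf_version) >= _rank(version)


def before_version(pdf_version, version):
    """True only for versions strictly less than the stated version."""
    return _rank(pdf_version) < _rank(version)


def is_pdf_version(pdf_version, version):
    """True only for exactly the stated version."""
    return _rank(pdf_version) == _rank(version)
```

Looking versions up in a fixed tuple, rather than comparing floats, also rejects invalid version strings.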
fn:StreamLength(key)
  • key needs to be a stream. Returns an integer >= 0 representing the compressed stream length.
  • Uses the value of the stream's /Length key, rather than reading and decoding the actual stream.
  • Needs to be wrapped inside fn:Eval(...).
  • e.g. fn:Eval(fn:StreamLength(DL)==(@Width * @Height))
fn:StringLength(key)
  • key needs to be a string object. Returns an integer >= 0 representing the string length. Empty strings (length 0) are valid in PDF.
  • Needs to be used inside fn:Eval(...).
  • e.g. fn:Eval(fn:StringLength(Panose)==12)

Proposals for future predicates

Please review and add any feedback or comments to the appropriate issue!

fn:ValueOnlyWhen(value,condition)
  • See Issue #74
  • Only used in "PossibleValues" field
  • Asserts that a specific possible value (single value) of the current key or array element is conditionally valid (condition evaluates to true).
  • Read aloud as "value can be X but only when Y"
  • this is a logically weaker predicate than fn:RequiredValue(...) as there may be other allowable PossibleValues (either conditionally or not)
  • Example for Di key in Transition dictionary:
    [0,fn:ValueOnlyWhen(90,(@S==Wipe)),fn:ValueOnlyWhen(180,(@S==Wipe)),270,fn:ValueOnlyWhen(315,(@S==Glitter))];[fn:RequiredValue(((@S==Fly) && (@SS!=1.0)),None)]
fn:IsArray(key)
fn:IsDictionary(key)
fn:IsStream(key)
  • See Issue #61
  • Asserts that the specified key or array element is the specified kind of PDF object (dictionary, array, stream).
  • key may be just a key name, array element (integer), or a longer relative or absolute Arlington path (using ::).
  • Read aloud as "when X is a Y then ..."
  • Example is for AS in annotations:
    fn:IsRequired(fn:IsDictionary(AP::N) || fn:IsDictionary(AP::R) || fn:IsDictionary(AP::D))
fn:IsNameTreeValue(tree-reference,key)
fn:IsNameTreeIndex(tree-reference,@key)
fn:IsNumberTreeValue(tree-reference,key)
fn:IsNumberTreeIndex(tree-reference,@key)
  • See Issue #49. This proposal will replace fn:InNameTree(...) with these predicates.
  • PDF name-/number-trees have complex internal data structures that vary per PDF file (see subclauses 7.9.6 and 7.9.7 in ISO 32000-2:2020). Arlington hides this complexity by defining both these as Arlington data types.
  • The indexing of nodes in a name-tree is always by a string. Thus the type of key in fn:IsNameTreeIndex(...) must be a string (any type).
  • The value of leaf nodes in a name-tree can be any type of PDF object, including strings. Thus the type of key in fn:IsNameTreeValue(...) can be anything.
  • The indexing of nodes in a number-tree is always by an integer. Thus the type of key in fn:IsNumberTreeIndex(...) must be an integer.
  • The value of leaf nodes in a number-tree can be any type of PDF object, including integers. Thus the type of key in fn:IsNumberTreeValue(...) can be anything.
  • tree-reference is a name- or number-tree as appropriate for the predicate. It will commonly be a reference to trailer::Catalog::Names::key.
fn:IsInArray(@key,array)
  • See PDF Errata #396.
  • Only used in "SpecialCase" field
  • Need to assert that NS objects in structure elements are always in trailer::Catalog::StructTreeRoot::Namespaces array: so add to StructElem.tsv NS row, "SpecialCase" field: [fn:Eval(fn:IsInArray(@NS,trailer::Catalog::StructTreeRoot::Namespaces))]
fn:AllowNull(key)
  • See Issue #90 and Issue #118.
  • Only used in "SpecialCase" field
  • Within Arlington, name-tree and number-tree are treated as pre-defined types where the "Link" field is the list of permitted type(s) of objects that are to be expected as the allowable node values in the tree. However, current internal grammar rules do NOT permit null as a node value in a tree, so in order to codify whether null is also a permitted node value we need a new predicate that might occur in the "SpecialCase" field.
  • Validator implementations can then process name-tree and number-tree while also accounting for specific rules related to null. Normally a null in a name- or number-tree would likely trigger a warning, but this can be overridden with this new predicate.
  • Argument key must be either a name-tree or number-tree
  • e.g. Add to StructTreeRoot.tsv ParentTree row, "SpecialCase" field: [fn:Eval(fn:AllowNull(ParentTree))]

Negative predicate grammar validation checks

The following are predicate grammar validation checks that should FAIL(!!) for each set of Arlington TSV files that represent a PDF version. Some of these negative test cases can be done on a row-by-row basis, while others refer to a single TSV file, and a few may require checking other TSV files (such as when using path::key). Implicit (semantic) knowledge of valid predicates and their arguments is also required:

  • TSV with fewer than 2 rows (i.e. minimum TSV is header row + at least one key)
  • missing or incorrect header row
  • duplicate key names (or array indices) in same TSV
  • wrong number of fields in TSV (it's fixed!)
  • incorrect field ordering (it's fixed!)
  • if TSV filename contains "Array" or "ColorSpace" then all rows except header must be integers 0-9 or integer 0-9 + ASTERISK
  • excess use of SPACES in predicate expressions (e.g. around != or ==)
  • invalid PDF version (only 1.0, 1.1, ..., 1.7 and 2.0)
  • unknown predicates
  • number of ; in Type field does not match number of ; in non-blank DefaultValue, PossibleValues, SpecialCase or Link fields
  • unmatched/unbalanced ( / ) or [ / ] and '
  • key = ASTERISK is not last row in TSV
  • if 2nd row in a TSV is an integer or integer + ASTERISK and key integer of 2nd row is not 0 (i.e. all array indices start at 0)
  • reference to a key, @key, path::key or path::@key that is not valid in a PDF version
  • mixture of keys that are alphanumeric with integers (0-9) or integers (0-9) followed by ASTERISK
  • the list of types in a complex Type field are not alphabetically sorted or separated by SEMI-COLON
  • Link field has entries for simple types (incl. for complex types)
  • Type field has linked types (incl. within complex types) but Link field is empty or just []
  • Link field not enclosed in [ / ]
  • an unscoped key reference (key or @key) does not precisely match any key in current TSV
  • a scoped key reference to trailer:: does not match any key in FileTrailer.tsv (all PDF versions) or XRefStream.tsv (for appropriate PDF versions)
  • a scoped key reference to trailer::Catalog does not match any key in Catalog.tsv
  • type of data in DefaultValue and PossibleValues fields does not match appropriate Type field
  • incorrect number of arguments for predicate
  • wrong kind of argument for predicate
  • for predicates that only work with specific types of PDF objects, the use of key or self-reference that cannot be that type (e.g. fn:ArrayLength is not referencing something that can be an array; fn:StringLength is not referencing something that can be a string; fn:BitSet, fn:BitClear, etc. only work with bitmask)
  • mathematical operation on non-numeric data or predicate
  • logical operation on non-boolean data or predicate
  • reference to a key or key value that has a SinceVersion field that is later than the current key SinceVersion and is not protected with a version-based predicate
  • a predicate with a condition argument that is always constant

Checks still needing to be completed in ISO 32000-2:2020

  • Check array length requirements of all array search hits - done up to Table 95

  • Check ranges of all integer search hits - done up to Table 170. See also: Errata #15

  • Check explicit ranges of all number search hits - not started yet

  • Check all arrays for handling of null elements - not started yet. See PDF 2.0 Errata #157