Commit 8362112

text validation: recommend either checksum or text based on text length
1 parent 97adc27 commit 8362112

1 file changed, +13 -7 lines changed

extensions/stam-textvalidation/README.md

+13-7
@@ -17,19 +17,25 @@ RFC 2119.
 ## Vocabulary
 
 This extension defines an annotation dataset with ID `https://w3id.org/stam/extensions/stam-textvalidation/`.
-In this set we define the following keys, the use of `checksum` over `text` is *RECOMMENDED* by this extension:
+In this set we define the following keys:
 
-* ``checksum``: The SHA-1 checksum of the text of the annotation. We use SHA-1 because it is *fast* and *small enough* (40 bytes). It does not offer strong cryptographic security though.
-* ``text``: The exact text of the current annotation
+* ``checksum``: The SHA-1 checksum of the text of the annotation, in hexadecimal representation (lower case). We use SHA-1 because it is *fast* and *small enough* (40 bytes). It does not offer strong cryptographic security though. Use of this field is *RECOMMENDED* for texts longer than 40 bytes.
+* ``text``: The exact text of the current annotation. Use of this field is *RECOMMENDED* for texts shorter than 40 bytes.
 * ``delimiter``: The delimiter to use to concatenate text selections in case the current annotation has a complex selector. If this key is not supplied, concatenation *MUST* proceed without delimiter.
 
 The advantage of `text` over `checksum` is that it is directly interpretable
-and facilitates readability of a serialisation. For any other purposes,
-the overhead quickly becomes a nuisance and a `checksum` is appropriate, the latter is therefore *RECOMMENDED*.
+and facilitates readability of a serialisation. For large texts,
+the overhead quickly becomes a nuisance and a `checksum` is appropriate.
+
+Annotation data using any of the above keys *MUST* be directly associated with the annotation they are validating, i.e.
+there *MUST NOT* be an extra Annotation and AnnotationSelector involved.
 
 ## Functionality
 
-Parser implementations, whenever encountering a `text` or `checksum` key in an annotation's data,
+Parser implementations, whenever encountering a `text` and/or `checksum` key in an annotation's data,
 *MUST* verify if the text of the annotation matches the `text`
-property or the SHA-1 checksum in the `checksum` property. If not,
+property and if the SHA-1 checksum in the `checksum` property matches the checksum of the text of the annotation. If not,
 implementations *SHOULD* raise a hard validation failure.
+
+Implementations *MAY* dynamically choose use of either `text` or `checksum` based on text length, resulting in
+an optimal.
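
The rules this commit adds to the README can be illustrated with a small Python sketch. It is not taken from any STAM implementation: the function names (`sha1_hex`, `make_validation_data`, `validate`), the use of plain dictionaries for annotation data, UTF-8 as the text encoding, and the choice to treat a text of exactly 40 bytes as "short" are all assumptions made for illustration. The README only states recommendations for texts longer or shorter than 40 bytes.

```python
import hashlib

# Length of a SHA-1 digest in hexadecimal form, used as the threshold
# named in the README ("40 bytes").
THRESHOLD = 40


def sha1_hex(text: str) -> str:
    """Return the lower-case hexadecimal SHA-1 digest of the text.

    Encoding the text as UTF-8 before hashing is an assumption.
    """
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def make_validation_data(text: str) -> dict:
    """Pick `checksum` for long texts and `text` for short ones.

    Texts of exactly 40 bytes fall back to `text` here (assumption).
    """
    if len(text.encode("utf-8")) > THRESHOLD:
        return {"checksum": sha1_hex(text)}
    return {"text": text}


def validate(selections: list, data: dict) -> None:
    """Check annotation text against `text` and/or `checksum` keys.

    `selections` holds the text of each selected span; for a simple
    selector it has one element. Without a `delimiter` key the parts
    are concatenated with no delimiter, as the README requires.
    """
    annotation_text = data.get("delimiter", "").join(selections)
    if "text" in data and data["text"] != annotation_text:
        raise ValueError("validation failed: `text` does not match")
    if "checksum" in data and data["checksum"] != sha1_hex(annotation_text):
        raise ValueError("validation failed: `checksum` does not match")
```

A call such as `validate(["Hello", "world"], {"text": "Hello world", "delimiter": " "})` passes silently, while the same call without the `delimiter` key raises, because the parts are then joined as `Helloworld`. Raising an exception is one way to model the "hard validation failure" the README asks for.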
