diff --git a/descriptor.md b/descriptor.md index 2538bb514..e735d172c 100644 --- a/descriptor.md +++ b/descriptor.md @@ -59,10 +59,10 @@ Extended _Descriptor_ field additions proposed in other OCI specifications SHOUL The _digest_ property of a Descriptor acts as a content identifier, enabling [content addressability](http://en.wikipedia.org/wiki/Content-addressable_storage). It uniquely identifies content by taking a [collision-resistant hash](https://en.wikipedia.org/wiki/Cryptographic_hash_function) of the bytes. -If the identifier can be communicated in a secure manner, one can retrieve the content from an insecure source, calculate the digest independently, and be certain that the correct content was obtained. +If the digest can be communicated in a secure manner, one can retrieve the content from an insecure source, recalculate the digest independently, and be certain that the correct content was obtained. -The value of the digest property, the _digest string_, is a serialized hash result, consisting of an _algorithm_ portion and a _hex_ portion. -The algorithm identifies the methodology used to calculate the digest; the hex portion is the lowercase hex-encoded result of the hash. +The value of the digest property is a string consisting of an _algorithm_ portion (the "algorithm identifier") and a _hex_ portion. +The algorithm identifier specifies the cryptographic hash function used to calculate the digest; the hex portion is the lowercase hex-encoded result of the hash. 
The digest string MUST match the following grammar: @@ -74,20 +74,17 @@ hex := /[a-f0-9]+/ Some example digest strings include the following: -digest | algorithm | +digest string | algorithm | ------------------------------------------------------------------------|---------------------| sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b | [SHA-256](#sha-256) | -* Before consuming content targeted by a descriptor from untrusted sources, the byte content SHOULD be verified against the digest. +* Before consuming content targeted by a descriptor from untrusted sources, the byte content SHOULD be verified against the digest string. * Before calculating the digest, the size of the content SHOULD be verified to reduce hash collision space. * Heavy processing before calculating a hash SHOULD be avoided. -* Implementations MAY employ some canonicalization of the underlying content to ensure stable content identifiers. +* Implementations MAY employ [canonicalization](canonicalization.md) of the underlying content to ensure stable content identifiers. -### Algorithms +### Digest calculations -While the _algorithm_ component of the digest does allow one to utilize a wide variety of algorithms, compliant implementations SHOULD use [SHA-256](#sha-256). - -Let's use a simple example in pseudo-code to demonstrate a digest calculation: A _digest_ is calculated by the following pseudo-code, where `H` is the selected hash algorithm, identified by string ``: ``` let ID(C) = Descriptor.digest @@ -97,7 +94,7 @@ let verified = ID(C) == D ``` Above, we define the content identifier as `ID(C)`, extracted from the `Descriptor.digest` field. Content `C` is a string of bytes. -Function `H` returns the hash of `C` in bytes and is passed to function `EncodeHex` to obtain the _digest_. +Function `H` returns the hash of `C` in bytes and is passed to function `EncodeHex` and prefixed with the algorithm to obtain the digest. 
The result `verified` is true if `ID(C)` is equal to `D`, confirming that `C` is the content identified by `D`. After verification, the following is true: @@ -107,21 +104,29 @@ D == ID(C) == ':' + EncodeHex(H(C)) The _digest_ is confirmed as the content identifier by independently calculating the _digest_. -#### Registered identifiers +### Registered algorithms + +While the _algorithm_ portion (the "algorithm identifier") of the digest string allows the use of a variety of cryptographic algorithms, compliant implementations SHOULD use [SHA-256](#sha-256). -The following algorithm identifiers are defined by this specification: +The following algorithm identifiers are currently defined by this specification: -| identifier | algorithm | -|------------|---------------------| -| `sha256` | [SHA-256](#sha-256) | +| algorithm identifier | algorithm | +|----------------------|---------------------| +| `sha256` | [SHA-256](#sha-256) | +| `sha512` | [SHA-512](#sha-512) | -If a useful algorithm is not included in the above table, it SHOULD be submitted to this specification for standardization. +If a useful algorithm is not included in the above table, it SHOULD be submitted to this specification for registration. #### SHA-256 -[SHA-256](https://tools.ietf.org/html/rfc4634#page-7) is a collision-resistant hash function, chosen for ubiquity, reasonable size and secure characteristics. +[SHA-256][rfc4634-s4.1] is a collision-resistant hash function, chosen for ubiquity, reasonable size and secure characteristics. Implementations MUST implement SHA-256 digest verification for use in descriptors. +#### SHA-512 + +[SHA-512][rfc4634-s4.2] is a collision-resistant hash function which [may be more performant][sha256-vs-sha512] than [SHA-256](#sha-256) on some CPUs. +Implementations MAY implement SHA-512 digest verification for use in descriptors. 
+ ## Examples The following example describes a [_Manifest_](manifest.md#image-manifest) with a content identifier of "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270" and a size of 7682 bytes: @@ -148,6 +153,9 @@ In the following example, the descriptor indicates that the referenced manifest ``` [rfc3986]: https://tools.ietf.org/html/rfc3986 +[rfc4634-s4.1]: https://tools.ietf.org/html/rfc4634#section-4.1 +[rfc4634-s4.2]: https://tools.ietf.org/html/rfc4634#section-4.2 [rfc6838]: https://tools.ietf.org/html/rfc6838 [rfc6838-s4.2]: https://tools.ietf.org/html/rfc6838#section-4.2 [rfc7230-s2.7]: https://tools.ietf.org/html/rfc7230#section-2.7 +[sha256-vs-sha512]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/hsMw7cAwrZE
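
The digest calculation and verification described in this patch can be sketched in Python as follows. This is a minimal illustration, not part of the change itself. The names `REGISTERED`, `compute_digest`, `verify`, and `expected_size` are hypothetical. Only the `hex` rule of the grammar is visible in this diff, so the sketch splits the digest string at the first `:` and checks only the hex portion and the two registered algorithm identifiers.

```python
import hashlib
import re
from typing import Optional

# Algorithm identifiers from the "Registered algorithms" table,
# mapped to the matching hashlib constructors (hypothetical helper table).
REGISTERED = {"sha256": hashlib.sha256, "sha512": hashlib.sha512}

# The hex rule shown in the diff: hex := /[a-f0-9]+/
HEX_RE = re.compile(r"[a-f0-9]+")


def compute_digest(content: bytes, algorithm: str = "sha256") -> str:
    """Return the algorithm identifier, ':', and the lowercase hex hash of content."""
    if algorithm not in REGISTERED:
        raise ValueError(f"unregistered algorithm identifier: {algorithm!r}")
    # hexdigest() already yields lowercase hex, as the digest string requires.
    return f"{algorithm}:{REGISTERED[algorithm](content).hexdigest()}"


def verify(content: bytes, digest_string: str,
           expected_size: Optional[int] = None) -> bool:
    """Recalculate the digest of content and compare it with digest_string."""
    # Check the size first, before doing any hashing.
    if expected_size is not None and len(content) != expected_size:
        return False
    algorithm, sep, hex_part = digest_string.partition(":")
    if not sep or HEX_RE.fullmatch(hex_part) is None:
        return False
    if algorithm not in REGISTERED:
        return False
    # verified = ID(C) == D, in the terms of the pseudo-code.
    return compute_digest(content, algorithm) == digest_string
```

Checking the size before hashing follows the bullet that recommends verifying content size before calculating the digest. Returning `False` for unregistered algorithms is a design choice of this sketch, not something the patch mandates.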