Merged
44 changes: 26 additions & 18 deletions descriptor.md
@@ -59,10 +59,10 @@ Extended _Descriptor_ field additions proposed in other OCI specifications SHOUL

The _digest_ property of a Descriptor acts as a content identifier, enabling [content addressability](http://en.wikipedia.org/wiki/Content-addressable_storage).
It uniquely identifies content by taking a [collision-resistant hash](https://en.wikipedia.org/wiki/Cryptographic_hash_function) of the bytes.
If the identifier can be communicated in a secure manner, one can retrieve the content from an insecure source, calculate the digest independently, and be certain that the correct content was obtained.
If the digest can be communicated in a secure manner, one can retrieve the content from an insecure source, recalculate the digest independently, and be certain that the correct content was obtained.
Contributor

Changing to digest here is incorrect. We are trying to communicate the concept of calculating a common identifier.


The value of the digest property, the _digest string_, is a serialized hash result, consisting of an _algorithm_ portion and a _hex_ portion.
The algorithm identifies the methodology used to calculate the digest; the hex portion is the lowercase hex-encoded result of the hash.
The value of the digest property is a string consisting of an _algorithm_ portion (the "algorithm identifier") and a _hex_ portion.
The algorithm identifier specifies the cryptographic hash function used to calculate the digest; the hex portion is the lowercase hex-encoded result of the hash.

The digest string MUST match the following grammar:

@@ -74,20 +74,17 @@ hex := /[a-f0-9]+/

Some example digest strings include the following:

digest | algorithm |
digest string | algorithm |
Contributor

@wking, Mar 13, 2017

With 23af834 moving us towards consistently using digest/algorithm (identifier)/hex matching our ABNF, I think we may want to stick to “digest” instead of using “digest string”. I don't mind if existing instances of “digest string” are changed to “digest” in this PR or not, but I think this line (and the “Before consuming…” line below) should be left alone in this PR.

Contributor Author

It's fine like this in context. If you want, submit a follow-up that changes those and the preceding two references.

Contributor

This is just referred to as a "digest". String here is redundant.

------------------------------------------------------------------------|---------------------|
sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b | [SHA-256](#sha-256) |
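
To illustrate the algorithm/hex split discussed in this hunk, here is a minimal Python sketch; it is not part of the proposed spec text. The helper name `split_digest` is hypothetical. Because the full grammar is collapsed in this diff view, only the visible `hex` rule (`/[a-f0-9]+/`) is enforced, and the algorithm portion is merely required to be non-empty, which is an assumption.

```python
import re

# Only the hex rule is visible in this diff: hex := /[a-f0-9]+/
HEX_RE = re.compile(r"[a-f0-9]+")


def split_digest(digest):
    """Split '<algorithm>:<hex>' into its two portions (hypothetical helper)."""
    algorithm, sep, hex_part = digest.partition(":")
    # Assumption: any non-empty algorithm portion is accepted here.
    if not sep or not algorithm:
        raise ValueError("digest must look like '<algorithm>:<hex>'")
    if HEX_RE.fullmatch(hex_part) is None:
        raise ValueError("hex portion must be non-empty lowercase hex")
    return algorithm, hex_part


# The example digest from the table above.
example = "sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b"
algorithm, hex_part = split_digest(example)
print(algorithm, len(hex_part))
```

Splitting on the first `:` keeps the sketch independent of whichever term ("digest" or "digest string") the thread settles on.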

* Before consuming content targeted by a descriptor from untrusted sources, the byte content SHOULD be verified against the digest.
* Before consuming content targeted by a descriptor from untrusted sources, the byte content SHOULD be verified against the digest string.
* Before calculating the digest, the size of the content SHOULD be verified to reduce hash collision space.
* Heavy processing before calculating a hash SHOULD be avoided.
* Implementations MAY employ some canonicalization of the underlying content to ensure stable content identifiers.
* Implementations MAY employ [canonicalization](canonicalization.md) of the underlying content to ensure stable content identifiers.
Contributor

This implies that the canonicalization is limited to those described in the document. Implementations may employ any kind of canonicalization they want in the generation of content.


### Algorithms
### Digest calculations

While the _algorithm_ component of the digest does allow one to utilize a wide variety of algorithms, compliant implementations SHOULD use [SHA-256](#sha-256).
Contributor

While the algorithm component of the digest does allow one to utilize a wide variety of algorithms, compliant implementations SHOULD use those specified here.

Contributor

Never mind, this moved down.


Let's use a simple example in pseudo-code to demonstrate a digest calculation:
A _digest_ is calculated by the following pseudo-code, where `H` is the selected hash algorithm, identified by string `<alg>`:
```
let ID(C) = Descriptor.digest
@@ -97,7 +94,7 @@ let verified = ID(C) == D
```
Above, we define the content identifier as `ID(C)`, extracted from the `Descriptor.digest` field.
Content `C` is a string of bytes.
Function `H` returns the hash of `C` in bytes and is passed to function `EncodeHex` to obtain the _digest_.
Function `H` returns the hash of `C` in bytes and is passed to function `EncodeHex` and prefixed with the algorithm to obtain the digest.
The result `verified` is true if `ID(C)` is equal to `D`, confirming that `C` is the content identified by `D`.
After verification, the following is true:

@@ -107,21 +104,29 @@ D == ID(C) == '<alg>:' + EncodeHex(H(C))

The _digest_ is confirmed as the content identifier by independently calculating the _digest_.
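
The relation `D == ID(C) == '<alg>:' + EncodeHex(H(C))` can be sketched in runnable form. This is an illustrative Python version under stated assumptions, not the spec's reference implementation: `compute_digest` and `verify` are hypothetical names, `hashlib` stands in for `H` and `EncodeHex`, and the optional size check mirrors the bullet recommending that size be verified before the digest is calculated.

```python
import hashlib


def compute_digest(content, alg="sha256"):
    """Return '<alg>:' + EncodeHex(H(C)) for the bytes in content."""
    # Assumption: the algorithm identifier is also a valid hashlib name,
    # which holds for 'sha256' and 'sha512'.
    return alg + ":" + hashlib.new(alg, content).hexdigest()


def verify(content, expected_digest, expected_size=None):
    """Recompute the digest independently and compare it with expected_digest."""
    # Cheap size check first, before any hashing is done.
    if expected_size is not None and len(content) != expected_size:
        return False
    alg = expected_digest.partition(":")[0]
    return compute_digest(content, alg) == expected_digest


content = b"hello world"
digest = compute_digest(content)
print(digest)
print(verify(content, digest, expected_size=len(content)))
print(verify(b"tampered", digest))
```

`hexdigest()` already returns lowercase hex, so no extra normalization is needed before the comparison.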

#### Registered identifiers
### Registered algorithms

While the _algorithm_ portion (the "algorithm identifier") of the digest string allows the use of a variety of cryptographic algorithms, compliant implementations SHOULD use [SHA-256](#sha-256).

The following algorithm identifiers are defined by this specification:
The following algorithm identifiers are currently defined by this specification:

| identifier | algorithm |
|------------|---------------------|
| `sha256` | [SHA-256](#sha-256) |
| algorithm identifier | algorithm |
|----------------------|---------------------|
| `sha256` | [SHA-256](#sha-256) |
| `sha512` | [SHA-512](#sha-512) |

If a useful algorithm is not included in the above table, it SHOULD be submitted to this specification for standardization.
If a useful algorithm is not included in the above table, it SHOULD be submitted to this specification for registration.
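
As a rough illustration of how an implementation might consult the table above, the following Python sketch maps each algorithm identifier to a hash constructor. The names `REGISTERED` and `hasher_for` are hypothetical, the mapping to `hashlib` is an assumption, and the `sha512` entry reflects the row proposed in this diff rather than settled spec text.

```python
import hashlib

# Identifiers taken from the table in this diff; 'sha512' is the proposed row.
REGISTERED = {
    "sha256": hashlib.sha256,
    "sha512": hashlib.sha512,
}


def hasher_for(identifier):
    """Return a fresh hash object for a registered algorithm identifier."""
    try:
        return REGISTERED[identifier]()
    except KeyError:
        raise ValueError("unregistered algorithm identifier: " + identifier) from None


for name in REGISTERED:
    # digest_size is in bytes; the hex portion is twice as long.
    print(name, hasher_for(name).digest_size * 2)
```

Rejecting unknown identifiers outright is one possible policy; an implementation could instead treat them as unverifiable and decide separately how to handle such content.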

#### SHA-256

[SHA-256](https://tools.ietf.org/html/rfc4634#page-7) is a collision-resistant hash function, chosen for ubiquity, reasonable size and secure characteristics.
[SHA-256][rfc4634-s4.1] is a collision-resistant hash function, chosen for ubiquity, reasonable size and secure characteristics.
Implementations MUST implement SHA-256 digest verification for use in descriptors.

#### SHA-512

[SHA-512][rfc4634-s4.2] is a collision-resistant hash function which [may be more performant][sha256-vs-sha512] than [SHA-256](#sha-256) on some CPUs.
Contributor

This implies that performance is a good reason to select one digest algorithm over another, when that is not the case at all. The most important factor in the selection of a digest algorithm is that it is common to all implementations.

In practice, the use of sha512 will likely cause the introduction of incompatible images.

Implementations MAY implement SHA-512 digest verification for use in descriptors.

## Examples

The following example describes a [_Manifest_](manifest.md#image-manifest) with a content identifier of "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270" and a size of 7682 bytes:
@@ -148,6 +153,9 @@ In the following example, the descriptor indicates that the referenced manifest
```

[rfc3986]: https://tools.ietf.org/html/rfc3986
[rfc4634-s4.1]: https://tools.ietf.org/html/rfc4634#section-4.1
[rfc4634-s4.2]: https://tools.ietf.org/html/rfc4634#section-4.2
[rfc6838]: https://tools.ietf.org/html/rfc6838
[rfc6838-s4.2]: https://tools.ietf.org/html/rfc6838#section-4.2
[rfc7230-s2.7]: https://tools.ietf.org/html/rfc7230#section-2.7
[sha256-vs-sha512]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/hsMw7cAwrZE