-
Notifications
You must be signed in to change notification settings - Fork 206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow for automatic content discovery for cross-mounting blobs #275
Conversation
This follows up on the discussion in #236. |
The registry MAY treat the `from` parameter as optional. | ||
|
||
It MAY cross-mount the blob referred to by the `mount` parameter even if the `from` parameter is omitted or the `from` repository does not contain the blob. | ||
|
||
Alternatively, if a registry does not support cross-repository mounting or is unable to mount the requested blob, | ||
it SHOULD return a `202`. This indicates that the upload session has begun and that the client MAY proceed with the upload. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to specify the 202
response will also include a Location
header for uploading the blob?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's unrelated to this change, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit of the "else" path we hit, when the blob isn't found. But yes, to your point, that path isn't being changed with this PR, so we could clarify that in a separate PR.
Okay, this has like 80% of the work required: https://github.com/sargun/distribution/tree/cross-repo-mount The configurations aren't user exposed, but the inner guts are wired up. |
I changed the proposal / spec ever so slightly, and removed the sentence that said even if the blob wasn't found in the |
How is this semantically different? |
@jonjohnsonjr If there are many repos with the same content, and the registry has the concept of "linking" inside of it, and it exposes this information in any way (which, as it turns out, the "reference" docker distribution does), it's confusing. |
I'd be surprised that mounting a blob would interact with references, but also I don't think that I would predicate this PR on its interactions with a fork of distribution. |
er, I don't mean "Reference" as in manifest references, I mean, the Docker Distribution implementation being the "reference implementation". |
As described in opencontainers/distribution-spec#275, this adds, and allows for "automatic content discovery" on mount if the from field is ommitted. This implementation does not provide and authz functionality, so it should be only be used / implemented, if the registry has no cross-repo authz. Signed-off-by: Sargun Dhillon <[email protected]>
@@ -372,6 +372,8 @@ The Location header will contain the registry URL to access the accepted layer f | |||
header returns the canonical digest of the uploaded blob which MAY differ from the provided digest. Most clients MAY | |||
ignore the value but if it is used, the client SHOULD verify the value against the uploaded blob data. | |||
|
|||
The registry MAY treat the `from` parameter as optional, and it MAY cross-mount the blob if it can be found. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we change if it can be found
to if it can be found and can be accessed
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to me, that ACL implication is included in can be found
, because ideally if it can not be accessed then it should be found. Or is there another use case you're thinking about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although it is outside of the specification, I would suggest that we indicate that it is not a good idea to implement this feature on any registry with cross-repo ACLs due to the possibility of information disclosure. I do not think found
implies any notion of "access" in the sense of "Security"
So, I would rather say something like:
"If the registry does not implement a security model which allows for attenuation of access to reading blobs, it MAY treat the from
parameter as optional, and it MAY cross-mount the blob if it can be found."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed on adding more to wording here. While we don't define an ACL model, it is a potential pitfall with a very small change that is worth calling out. The updated language is reasonable, but I think it can be simplified some to just mentioned the read access. Like @jonjohnsonjr brought it, it could be that the registry only searches in a list of a blob that are known public and therefore able to be read by anyone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could leverage this for MCR/ACR as well and for avoiding pushing these public well known layers multiple times.
I added two FAQ entries explaining the situation slightly. Once this gets merged, I'll add a conformance test to make sure that old registries properly handle the behaviour that when |
d81969d
to
18d9675
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize that you're not the original author of the things you're modifying, but it would be nice to follow one sentence per line, as per https://github.com/opencontainers/distribution-spec#markdown-style
FAQ.md
Outdated
|
||
**Q: How are clients expected to adopt (and probe for) automatic content discovery support?** | ||
|
||
The process of mounting a blob is supposed to fail in such a way that if a blob cannot be cross-mounted, the registry is supposed to initiate an upload. Clients should try to use the automatic content discovery mechanism. Non-conformant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The process of mounting a blob is supposed to fail in such a way that if a blob cannot be cross-mounted, the registry is supposed to initiate an upload. Clients should try to use the automatic content discovery mechanism. Non-conformant | |
The process of mounting a blob is supposed to fail in such a way that if a blob cannot be cross-mounted, the registry initiates an upload. Clients should try to use the automatic content discovery mechanism. Nonconformant |
FAQ.md
Outdated
**Q: How are clients expected to adopt (and probe for) automatic content discovery support?** | ||
|
||
The process of mounting a blob is supposed to fail in such a way that if a blob cannot be cross-mounted, the registry is supposed to initiate an upload. Clients should try to use the automatic content discovery mechanism. Non-conformant | ||
registries may return a non-201 or non-202 error code. If the client is trying to be defensive to non-complaint registries, and receives a non-201 or non-202 error code, it should fall back to other methods. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
registries may return a non-201 or non-202 error code. If the client is trying to be defensive to non-complaint registries, and receives a non-201 or non-202 error code, it should fall back to other methods. | |
registries may return a non-201 or non-202 error code. If the client is trying to be defensive to nonconformant registries, and receives a non-201 or non-202 error code, it should fall back to other methods. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of "fall back to other methods", maybe link to https://github.com/opencontainers/distribution-spec/blob/main/spec.md#pushing-blobs
FAQ.md
Outdated
|
||
**Q: How come `from` is required on cross-repo mount for some registries?** | ||
|
||
Mounting without having to specify `from`, also known as automatic content discovery, requires the registry to determine whether or not a blob exists in a repository. If the existence check for the blob is done first, an immediate failure will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also known as automatic content discovery
This is kind of confusing because we've already called tag listing "Content Discovery" here: https://github.com/opencontainers/distribution-spec/blob/main/spec.md#content-discovery
And "automatic content discovery" doesn't really fit into that theme. We're also less interested in the discovery part than the "mounting" part.
I don't have a great suggestion as an alternative, but we often refer to this as "short-circuiting" an upload internally. Maybe "automatic blob mounting" or something could work, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about
"Automatic mount origin discovery"
When uploading to multiple registries, the user may or may not what other repositories exist in these registries. Therefore, a client may perform an unnecessary upload when the registry already has a given blob. This an optimization that allows the registry to perform the authz check and check if it can find the blob with a given the passed digest in its blobstore. If that blob is accessible (from an authz perspective) to the user, it can then perform the mount automatically on its behalf. Because there is a potential a timing attack that could be used to disclose knowledge of whether or not the registry has a given blob (for example, a vulnerable version of a Linux image), this an optional feature for registries to implement. Signed-off-by: Sargun Dhillon <[email protected]>
Although the specification states that registries should start an upload on a failed mount, certain registry implementations may not properly implement this behaviour. This explicitly states that clients MAY adopt the automatic content discovery behaviour immediately, but MAY want to be defensive to non-compliant registries. Signed-off-by: Sargun Dhillon <[email protected]>
Automatic content discovery creates an information disclosure risk. There are a variety of mitigations to this risk, for example, storing knowledge of public, cached layers elsewhere, or performing the authz check to determine all of the blobs the user has access to prior to performing the mount check. Signed-off-by: Sargun Dhillon <[email protected]>
@jonjohnsonjr Updated based on your feedback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Before merging, I would like to milestone issues into 1.1 for adding related conformance tests and client/registry verification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🦑
In opencontainers#281, we stated that cross-repo mounts without from should start an upload. This is a requirement to make it so that clients can adopt the changes in opencontainers#275 smoothly. Signed-off-by: Sargun Dhillon <[email protected]>
In opencontainers#281, we stated that cross-repo mounts without from should start an upload. This is a requirement to make it so that clients can adopt the changes in opencontainers#275 smoothly. Signed-off-by: Sargun Dhillon <[email protected]>
As described in opencontainers/distribution-spec#275, this adds, and allows for "automatic content discovery" on mount if the from field is ommitted. This implementation does not provide and authz functionality, so it should be only be used / implemented, if the registry has no cross-repo authz.
As described in opencontainers/distribution-spec#275, this adds, and allows for "automatic content discovery" on mount if the from field is ommitted. This implementation does not provide and authz functionality, so it should be only be used / implemented, if the registry has no cross-repo authz. Signed-off-by: Sargun Dhillon <[email protected]>
As described in opencontainers/distribution-spec#275, this adds, and allows for "automatic content discovery" on mount if the from field is ommitted. This implementation does not provide and authz functionality, so it should be only be used / implemented, if the registry has no cross-repo authz. Signed-off-by: Sargun Dhillon <[email protected]>
As described in opencontainers/distribution-spec#275, this adds, and allows for "automatic content discovery" on mount if the from field is ommitted. This implementation does not provide and authz functionality, so it should be only be used / implemented, if the registry has no cross-repo authz. Signed-off-by: Sargun Dhillon <[email protected]>
As described in opencontainers/distribution-spec#275, this adds, and allows for "automatic content discovery" on mount if the from field is ommitted. This implementation does not provide and authz functionality, so it should be only be used / implemented, if the registry has no cross-repo authz. Signed-off-by: Sargun Dhillon <[email protected]>
When uploading to multiple registries, the user may or may not what
other repositories exist in these registries. Therefore, a client may
perform an unnecessary upload when the registry already has a given
blob. This an optimization that allows the registry to perform the
authz check and check if it can find the blob with a given the passed
digest in its blobstore. If that blob is accessible (from an authz
perspective) to the user, it can then perform the mount automatically
on its behalf.
Because there is a potential a timing attack that could be used to
disclose knowledge of whether or not the registry has a given blob
(for example, a vulnerable version of a Linux image), this an
optional feature for registries to implement.
Signed-off-by: Sargun Dhillon [email protected]