Proposal: Add References #827
Comments
Why include the size of the reference? |
Looking at this example:
{
"mediaType": "application/vnd.example.signature+json",
"size": 3514,
"digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17",
"reference": {
"mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
"size": 1201,
"digest": "sha256:b4f9e18267eb98998f6130342baacaeb9553f136142d40959a1b46d6401f0f2b"
}
}
The outer descriptor is pointing to a signature of size (For posterity), the thing being referenced happens to be
|
Some things I'm personally unsure about for this proposal:
plurality
The current proposal adds a singular With a singular
naming
I like Certainly open to ideas here. |
I agree - but resisted making it plural because I couldn't think of a single concrete use-case for it. If anyone has any we can definitely make this plural. |
In addition to @jonjohnsonjr 's answer, including the size can avoid DoS attacks on services that will chase these references. A service can look at the descriptor upfront and say "that size is too big, I'm going to stop here." Or if it decides to read the referenced blob and it turns out to be bigger than it said it would be, the service can say "this blob is bigger than it said it would be, I'm going to stop here" and avoid large responses exhausting resources. |
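The two checks described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: `read_bounded`, `SizeMismatch`, and the 10 MiB `max_size` default are hypothetical names and values, not part of the proposal or of any registry client, and an in-memory stream stands in for a registry response.

```python
import hashlib
import io


class SizeMismatch(Exception):
    """Content did not match the size its descriptor declared."""


def read_bounded(stream, descriptor, max_size=10 * 1024 * 1024, chunk_size=64 * 1024):
    """Read the blob a descriptor points at, giving up as soon as it misbehaves."""
    declared = descriptor["size"]
    # Check 1: refuse up front if the declared size is already too large.
    if declared > max_size:
        raise SizeMismatch(f"declared size {declared} exceeds limit {max_size}")

    buf = bytearray()
    while True:
        # Check 2: never request more than one byte past the declared size.
        chunk = stream.read(min(chunk_size, declared + 1 - len(buf)))
        if not chunk:
            break
        buf.extend(chunk)
        if len(buf) > declared:
            raise SizeMismatch("content is larger than the descriptor declared")
    if len(buf) != declared:
        raise SizeMismatch("content is smaller than the descriptor declared")

    # Only sha256 digests are handled in this sketch.
    algorithm, _, expected = descriptor["digest"].partition(":")
    if algorithm != "sha256" or hashlib.sha256(buf).hexdigest() != expected:
        raise ValueError("digest mismatch")
    return bytes(buf)


# Usage with an in-memory stream standing in for a registry response.
payload = b"example signature bytes"
descriptor = {
    "mediaType": "application/vnd.example.signature+json",
    "size": len(payload),
    "digest": "sha256:" + hashlib.sha256(payload).hexdigest(),
}
blob = read_bounded(io.BytesIO(payload), descriptor)
```

The point of the sketch is that the reader never buffers more than one byte beyond what the descriptor promised, so an oversized response is rejected without exhausting memory or disk.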
Exactly. Here's a scenario I would imagine is prevented by this: You've been compromised such that there is a MITM between your client and the registry. Fortunately for you, you are smart and deploy everything by digest, so the attacker can't serve you arbitrary images (say, a bitcoin miner). One obvious thing the attacker can still do is just not let you pull the images, but this would likely be easily detected because your deployments would fail. Similarly, they could act as a proxy registry and just serve you images really slowly. This would be annoying, but eventually pulls would succeed or timeout. If they wanted to be even more disruptive, they could try to take down not just one service but everything on the node by flooding your disk with garbage data. If the |
A few that I can think of based around the image signing work.
Note with a plurality of references, GC should only prune the signature artifact when all of those references no longer exist, and not if just one of them is deleted, since the signing data for other images is still valid and useful. There have also been discussions on Helm artifacts, and one Helm chart may include references to multiple images. I'm still not sold on this particular example because you're including a fixed reference to a potentially mutable template, but the logical grouping of one artifact pointing to multiple manifests applies to a larger context of use cases. |
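The pruning rule for plural references could look something like the sketch below. It assumes a hypothetical plural `references` field (the proposal under discussion only defines a singular `reference`), and `can_prune` and `live_digests` are illustrative names rather than any registry's API.

```python
def can_prune(artifact, live_digests):
    """Return True only when none of the artifact's referenced manifests still exist.

    `artifact` is a manifest dict with a hypothetical plural "references" list of
    descriptors; `live_digests` is the set of manifest digests still in the registry.
    An artifact with no references is not kept alive by this rule.
    """
    referenced = [ref["digest"] for ref in artifact.get("references", [])]
    return all(digest not in live_digests for digest in referenced)


signature = {
    "references": [
        {"digest": "sha256:aaa"},
        {"digest": "sha256:bbb"},
    ]
}
# One of the two signed images was deleted: the signature must be kept.
keep = not can_prune(signature, {"sha256:bbb"})
```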
It feels a little forced to add this to the Descriptor object, requiring an extra blob to be uploaded to create a reference. One of the advantages of the Artifact proposals is the ability to upload an artifact that is just references and perhaps some annotations, allowing the entire artifact to be pulled with a single request, rather than moving the annotation data into a separate config blob that gets pushed separately. Logically, it feels like the wrong level. You want the reference on the artifact, not on a blob or config object shipped with the artifact.
One of the controversial changes I had pushed Steve to add to the Artifact spec was the ability to inline the artifacts as part of the query response. The reason for inlining results, rather than just a list of manifests, is that the number of signatures can grow on any one image. It may be periodically re-signed, and it could be signed by many entities. Those entities may be different organizations (ACME Rockets re-signs the image they pulled from Wabbit Networks, to use the Notary example). However, they could also be done as a proof that an approval was received in a larger organization, showing that the image has passed one of several security checks from things like image scanning tools. Most clients won't care about all of these signatures; they are looking for a single one from a known entity, which can hopefully be identified by the annotations on the signing artifact. And by inlining the results, we can turn a worst case of n x 2 pulls for signatures (where n is the number of signatures) down to a fixed 2 pulls (one for the inlined list of artifacts, one for the signature blob). If we decide not to allow inline results, then allowing a query that filters on media type and annotations could allow this to be done on the server rather than the client sorting through the results for the desired annotations, giving us a fixed 3 pulls (one for the query, one for the signature artifact, and a final one for the signature blob). |
So this would sort of turn the conceptual model from "here is a list of signatures all tied to one object" over to "here is a list of signatures tied to many objects". That could make sense if you had a set of signatures that referenced a large set of images, but the signatures all had a similar lifecycle tied to each other or the key, rather than to the images they reference, like you suggested. I'm still not convinced though - allowing this option forces people to decide up front how their signatures will be managed later. A list gives flexibility, but flexibility could create confusion. Some can be updated by themselves, some are updated by doing a "read modify update" loop on a larger "multi-signature-set" object.
I think that would be handled by downloading the Index, signing and uploading a signature for each Manifest, then signing and uploading one for the Index itself (so the overall index is signed, and each platform's manifest is signed). I'm not sure I follow the issue. Signed tags are a separate problem.
I think this is solved by the #826 proposal - the signatures can be inlined directly in the Data field of the descriptor.
I'd love both! Inlining signatures on the data field and better querying. We still don't have basic list support cross-registry, so I'd rather get something we can use first then push for better querying. Performance optimizations and the fields we'd like to query on feel premature to me still. We don't know how people will use these new fields in the real world, we can only guess.
A final note: I do like the idea of the Artifact spec. It addresses a lot of issues in the registry today. The downside is that it addresses a lot of issues in the registry spec. It's going to take a LONG time to get accepted and supported by registries. I don't think we should have to wait for that to add improvements to the existing types. |
An important part of both this and the artifact spec is the ability to query for artifacts linked to another artifact or manifest. That's a new API that needs to be added to make this work. As soon as we take that leap, this is no longer something we can shoehorn into existing registries, so we may as well come up with the solution that makes the most logical sense. |
This proposal also includes an API for linking that works for all types. I don't think all new features will be equivalent in terms of cost to roll out. This proposal was specifically designed to minimize changes required to registries. Nothing precludes more support for filtering/querying later on. |
Wait - I think I misunderstood what you're saying about the "list attached objects" API. To make sure I understand: My proposal currently returns a list of "pointers" to attached objects, where the pointers are descriptors. You're asking for an option to instead "dereference" those pointers, where the returned list would be the actual full objects themselves. Is that roughly correct? |
I'm looking to make the queries faster, so inlining the content a descriptor points to with a dereference. If #826 allows the server to build a descriptor with the content dereferenced into the data field, then that solves my issue. But it's not clear whether this is a server (on a query) or client (on a push) generated field. |
Right - I think the two are actually complementary. The only challenge I can think of here with inlining data in the list response will be clients not knowing what the data types are ahead of time. It works in the artifact manifests API because referencing objects can only be one type: the artifact type. |
If you have the descriptor, you'll have the media type and can parse the base64 encoded bytes based on that media type. Whether or not the content/data is inlined, I always want the descriptor with those details to both verify the decoded bytes, and to allow individual blob pulls later (e.g. head request to verify blob still exists for mirror updates). |
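As a rough illustration of verifying inlined content against its own descriptor: the sketch below assumes the inlined bytes live in a base64-encoded `data` key on the descriptor (the thread refers to it only as the "Data field" from #826, so the exact key name is an assumption), and `decode_inline` is a hypothetical helper.

```python
import base64
import hashlib


def decode_inline(descriptor):
    """Decode inlined content and check it against the descriptor's size and digest."""
    raw = base64.b64decode(descriptor["data"], validate=True)
    if len(raw) != descriptor["size"]:
        raise ValueError("inlined content does not match the declared size")
    # Only sha256 digests are handled in this sketch.
    algorithm, _, expected = descriptor["digest"].partition(":")
    if algorithm != "sha256" or hashlib.sha256(raw).hexdigest() != expected:
        raise ValueError("inlined content does not match the declared digest")
    # The media type tells the caller how to parse the verified bytes.
    return descriptor["mediaType"], raw


signature_bytes = b'{"signed": "example"}'
descriptor = {
    "mediaType": "application/vnd.example.signature+json",
    "size": len(signature_bytes),
    "digest": "sha256:" + hashlib.sha256(signature_bytes).hexdigest(),
    "data": base64.b64encode(signature_bytes).decode("ascii"),
}
media_type, content = decode_inline(descriptor)
```

Because the digest and size stay on the descriptor, the same descriptor can still be used later to fetch or HEAD the blob individually.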
One may need to reference multiple descriptors from one descriptor. Two use cases I can think of are multiple SBoMs and multiple signatures establishing a chain of trust. |
I think that's fine - this linkage/field is actually the reverse direction. So each SBOM would reference one artifact via this link. Then the artifact has multiple SBOMs pointing back at it. Same for signatures. The question would be if you need one signature to reference multiple artifacts. |
I'm not following the logic to artificially constrain future expansion of this based on the use cases we see today, particularly since some of those use cases would have a value to having a list of external references rather than a single one (Helm charts, TUF snapshots, and merging the signature for multiple objects into a single signature blob). However we could likely hack this solution even more to allow multiple references, even if it's not a list, by putting a different reference on each blob in a manifest. It could even be the same blob digest repeated with different references. (Yes, that's horrible. No, I don't want to do that. But people will do this without better options.) That discussion aside, this still appears to be at the wrong level. We're making a reference from a blob within manifest A to manifest B, rather than manifest A to manifest B. And since a manifest has multiple blobs, including multiple child manifests in an index, or a config object and layers in an image manifest, there are lots of potential ways to link blobs within manifest A to manifest B. That link needs to be conveyed to manifest A itself, or are we suggesting that another index is uploaded that includes a descriptor to A, and we include the reference on that index descriptor, rather than on one of the blobs within manifest A itself? |
I'd flip this around. Concrete use-cases don't add artificial constraints, they add concrete ones. I'm happy to expand the constraints if we can come up with real future use-cases. |
Are you expecting the proposed artifact manifest to solve the problem of top-down linking? I wonder then why one would need the "bottom-up" linking in this proposal. |
I'm not sure I understand the question. This proposal is largely independent of the artifacts proposal. What's the "top-down" linking problem? The concrete use-case I need this field for is to be able to upload an image, then to later upload a signature for that image. The signature will use this field to reference the image. That's not possible today without linking in this direction AFAIK. |
This is exactly the intention of that proposal. It should complement this proposal very well.
Aren't we also doing that? See the diffs for manifest and image index.
I'm not sure I understand what you're saying here, but I'll try to take a stab at answering it, because I have my own issues with the proposal that I think are related. If a blob within manifest A has a reference to manifest B, I think the correct thing for a registry to do is include a descriptor pointing to manifest A when you ask it "what references manifest B?". I can see this being potentially confusing, though, because only part of manifest A references manifest B, and not the whole thing. While I like the ability to associate part of a manifest with something, perhaps this is enough of a footgun to just remove it from descriptor?
I don't see where that's happening -- can you quote what you're replying to? (Not because I disagree, but because the thread is long and my attention span is short.) By allowing |
You're right, I missed that part thinking it was just updating the existing digests, rather than adding a new field. Will a reference descriptor in these objects have no descriptor level media type, digest, or size, only the fields within the reference? e.g.
|
Per this proposal you want the ability to link
Or bigger circular dependencies. Rather than do that, it would be easier to follow the regular merkle DAG format that the image manifest uses. You could start from the image index:
This would mean restricting descriptors to the Image Index and Content Descriptors only. Then again, I am not sure if this is how Content Descriptors are supposed to be used. Let me know if I misunderstood. |
The example given isn't really a valid image manifest as they're used today -- are you just using the top-level
We can't have circular dependencies, which is a nice property of the DAG, but this helps me understand your objection, I think. This isn't what's being proposed. As an example, imagine we have two artifacts. First, (linux/amd64) debian:
{
"schemaVersion": 2,
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"config": {
"mediaType": "application/vnd.docker.container.image.v1+json",
"size": 1463,
"digest": "sha256:dc2eddc158255ea75b9774d29924a700e95d988bcb7612abbda29baddb291670"
},
"layers": [
{
"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
"size": 50400353,
"digest": "sha256:e22122b926a1a853d61887fa35c3fe53e05ee7dc0f2f488936dc9838bd0e230d"
}
]
}
Second, something that references debian, let's use an image manifest that contains a signature:
{
"schemaVersion": 2,
"config": {},
"layers": [
{
"mediaType": "application/vnd.example.signature+json",
"size": 7143,
"digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17",
"reference": {
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"size": 529,
"digest": "sha256:a4e852392000434b7c50b26dcf6a659a037521b17df69dd2ace5c2368efba38b"
}
}
]
}
That layers[0].reference descriptor points to debian:
So we're saying that
Because there's only one signature here and one thing being referenced, this would be more or less equivalent to:
{
"schemaVersion": 2,
"config": {},
"layers": [
{
"mediaType": "application/vnd.example.signature+json",
"size": 7143,
"digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17"
}
],
"reference": {
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"size": 529,
"digest": "sha256:a4e852392000434b7c50b26dcf6a659a037521b17df69dd2ace5c2368efba38b"
}
}
Either way you represent this "signature image", the idea is that you can ask the registry what references debian and get back a useful answer:
That Things can get more interesting with more complex examples, but this example is the core of the idea. I can associate one artifact with another using a novel kind of relationship (the "reference"), then ask the registry about relationships between artifacts. It's very possible that someone has attached a signature to the manifest list rather than the specific linux/amd64 image. How can we know about that? Clients can walk the graph backwards by asking about references to the manifest list...
And discover that there is in fact no signature referencing the manifest list. This is really getting into the distribution side of things, which is not as relevant to this proposal; however, I think it's useful to see how these things would tie together. |
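One way a registry might answer "what references this digest?" is with a reverse index built from the manifests it stores. The sketch below is only an assumption about how that could work: `iter_references`, `build_referrer_index`, and the `sha256:signature-manifest` key are hypothetical, and it accepts the reference both at the top level of a manifest and on individual descriptors, matching the two equivalent forms shown above.

```python
from collections import defaultdict

DEBIAN = "sha256:a4e852392000434b7c50b26dcf6a659a037521b17df69dd2ace5c2368efba38b"
MANIFEST_LIST = "sha256:b4f9e18267eb98998f6130342baacaeb9553f136142d40959a1b46d6401f0f2b"


def iter_references(manifest):
    """Yield every reference descriptor carried by a manifest or its descriptors."""
    if "reference" in manifest:
        yield manifest["reference"]
    descriptors = list(manifest.get("layers", [])) + list(manifest.get("manifests", []))
    if manifest.get("config"):
        descriptors.append(manifest["config"])
    for descriptor in descriptors:
        if "reference" in descriptor:
            yield descriptor["reference"]


def build_referrer_index(manifests):
    """Map each referenced digest to the digests of the manifests that reference it."""
    index = defaultdict(set)
    for digest, manifest in manifests.items():
        for reference in iter_references(manifest):
            index[reference["digest"]].add(digest)
    return index


# "sha256:signature-manifest" is a placeholder key, not a real digest.
stored = {
    DEBIAN: {"schemaVersion": 2, "layers": []},
    "sha256:signature-manifest": {
        "schemaVersion": 2,
        "config": {},
        "layers": [
            {
                "mediaType": "application/vnd.example.signature+json",
                "size": 7143,
                "digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17",
                "reference": {"digest": DEBIAN},
            }
        ],
    },
}
index = build_referrer_index(stored)
referrers_of_debian = index.get(DEBIAN, set())
referrers_of_list = index.get(MANIFEST_LIST, set())
```

Here the debian manifest has one referrer (the signature manifest) and the manifest list has none, which mirrors the walk-backwards scenario in the comment.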
This part is where I am confused. AIUI the image-spec says it's reserved for only image manifest compatible mediaTypes.
{
"schemaVersion": 2,
"mediaType": (?)
"config": {},
"layers": [
{
"mediaType": "application/vnd.example.signature+json",
"size": 7143,
"digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17",
"reference": {
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"size": 529,
"digest": "sha256:a4e852392000434b7c50b26dcf6a659a037521b17df69dd2ace5c2368efba38b"
}
}
]
}
I had asked about this a while ago when I was first learning about the spec and my understanding was that we couldn't use image manifests for other things because client tools may not handle this well (backwards compatibility thing). Perhaps this should be addressed first? I got the gist of what references can do. However, without some constraints I think you can end up with something like:
{
"schemaVersion": 2,
"mediaType": "application/vnd.docker.distribution.manifest.v2+json", <-- this is an image manifest
"config": {
"mediaType": "application/vnd.docker.container.image.v1+json",
"size": 1463,
"digest": "sha256:dc2eddc158255ea75b9774d29924a700e95d988bcb7612abbda29baddb291670"
},
"layers": [
{
"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", <-- this is a layer for the image
"size": 50400353,
"digest": "sha256:e22122b926a1a853d61887fa35c3fe53e05ee7dc0f2f488936dc9838bd0e230d"
},
{
"mediaType": "application/vnd.example.signature+json", <-- this is a layer for the signature of the image manifest
"size": 7143,
"digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17",
"reference": {
"mediaType": "application/vnd.docker.distribution.manifest.v2+json", <-- this is a reference back to the top level image manifest
"size": 529,
"digest": "sha256:a4e852392000434b7c50b26dcf6a659a037521b17df69dd2ace5c2368efba38b"
}
}
]
} |
I remember this concern being brought up, but I don't really share it, because things generally just work if you use manifests in this way. If I remember correctly, @SteveLasker doesn't want someone to be able to use docker to run something that isn't a container image. Registries don't really care about the content of the blobs (and shouldn't). This is only a concern if clients are just pulling and running arbitrary things instead of being instructed to run specific things. Software that scans images could get confused as well, but that's a bug in their implementation (IMO) because clients should consider the mediaType of the content before assuming it's a changeset. If you look at the spec for an image manifest, the language appears intentionally non-limiting to allow for future extension (emphasis mine):
It doesn't forbid other mediaTypes, but certainly my example is a departure from most implementation expectations.
You cannot do this without cracking sha256, so I don't think it's a problem. If you tried to do this, you would create a new image that references the original version of itself, but you wouldn't create any kind of cycle, you'd just have two images with the second containing a lot of redundant information. The second image would contain everything from the first image, but it would also contain a signature pointing at the first image. This is fine, and in line with the proposal, but it's not necessary. |
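A small sketch of why a manifest cannot reference itself under content addressing. It assumes a canonical JSON serialization purely for illustration (a real registry hashes the exact bytes that were uploaded), and `digest_of` is a hypothetical helper.

```python
import hashlib
import json


def digest_of(manifest):
    """Digest of a manifest under an assumed canonical JSON serialization."""
    blob = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(blob).hexdigest()


original = {"schemaVersion": 2, "layers": []}
original_digest = digest_of(original)

# Try to make the manifest "point at itself" by embedding its own digest.
second = dict(original, reference={"digest": original_digest})
second_digest = digest_of(second)

# Adding the reference changed the bytes, so the result is a new manifest with
# a different digest that points at the original: two objects, not a cycle.
points_at_itself = second["reference"]["digest"] == second_digest
```

To close the loop, the embedded digest would have to equal the digest of the bytes that contain it, which is the kind of fixed point that would require breaking sha256.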
+1, I think everybody agrees the naming is unfortunate and it would be great to change, but that's going to be a massive, multi-year effort from all the registries :( |
@dlorenc I would certainly appreciate including @jonjohnsonjr's example of how this would work in the spec, with the clarification on "Image Manifest" applications. |
This PR doesn't mention gc, as indeed the image spec doesn't, but as discussed yesterday, if you push a signature (say) that uses a Reference and you don't tag it, and the registry doesn't do anything special, then the signature would be deleted by a registry that uses gc. Hence the suggestion that Reference arrows are treated as reversed arrows for gc. If you can add a Reference to an image Index (which is allowed by this PR), then you can create both a forward and a backward link to the same manifest. It's easy to construct a simple case: have the image index and the Reference both point to the same manifest, which creates arrows to the same object in both directions, hence a circular gc reference. (The "manifest should not be a blob" rule does rule this out from images; this applies in e.g. Distribution, but it is not clear if it always applies, i.e. whether it is implicitly required or not.) Basically, allowing generic forward and backward references without restrictions will allow you to create cycles; that's why I said I had withdrawn my proposal. There are restrictions that make it OK, e.g. all references from any doc are in the same direction, but these fit quite weirdly with generic object descriptions, hence revisiting the best way to express these; I'm working on some ideas. The references proposal for signatures I have been working on with Steve Lasker has restricted documents without this issue. In terms of handling gc in the presence of cycles, unfortunately it seems to mean switching from ref counting to mark and sweep, which becomes O(size of registry), which is not workable for, say, Docker Hub. |
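The cycle described here can be made concrete with a toy model. Everything in this sketch is an assumption made for illustration: objects are plain dicts with a `children` list (forward links) and an optional `reference` (treated as a reversed "keeps alive" arrow), tags are the only roots, and none of the function names come from any registry implementation.

```python
from collections import defaultdict


def gc_edges(objects):
    """'Keeps alive' edges: parent -> child, and referenced object -> referrer."""
    edges = defaultdict(set)
    for digest, obj in objects.items():
        for child in obj.get("children", []):
            edges[digest].add(child)
        reference = obj.get("reference")
        if reference is not None:
            edges[reference].add(digest)  # the reversed Reference arrow
    return edges


def refcounts(objects):
    """Incoming 'keeps alive' edges per object, as a ref-counting collector sees them."""
    counts = {digest: 0 for digest in objects}
    for targets in gc_edges(objects).values():
        for target in targets:
            if target in counts:
                counts[target] += 1
    return counts


def mark_and_sweep(objects, roots):
    """Return the set of objects unreachable from the roots (for example, tags)."""
    edges = gc_edges(objects)
    marked = set()
    stack = [root for root in roots if root in objects]
    while stack:
        digest = stack.pop()
        if digest in marked:
            continue
        marked.add(digest)
        stack.extend(t for t in edges.get(digest, ()) if t in objects)
    return set(objects) - marked


objects = {
    # An index that both contains the manifest and references it: a cycle.
    "index": {"children": ["manifest"], "reference": "manifest"},
    "manifest": {},
    # An untagged signature kept alive by the tagged image it references.
    "signature": {"reference": "image"},
    "image": {},
}
counts = refcounts(objects)
garbage = mark_and_sweep(objects, roots={"image"})
```

In this model the untagged signature survives because the tagged image keeps it alive, while the index and manifest hold each other at a refcount of 1 even though no tag reaches them; only the full traversal identifies them as garbage, which is the O(size of registry) cost the comment is worried about.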
Could you link to the proposal? I'm not sure which one this refers to. |
@dlorenc that is opencontainers/artifacts#29 |
I might be dense, but I don't see it. How does this create a cycle? The graph is still acyclic, unless you assume a specific garbage collection model. You can abstractly think about the reference relationship backwards (because you want there to be an index to make this cheap), but the data model is still acyclic AFAICT. This proposal does not prescribe any kind of garbage collection model. Personally, I don't really see a problem with either:
I think if we're going to start talking about GC behaviors, we need to define some things. This is completely unspecified anywhere, and registries have a ton of different behaviors. For example, GCR doesn't delete things just because they aren't tagged (I think doing so is an anti-pattern given how the protocol is designed). I am not aware of how every behavior functions, but it would help me understand the problems we're dealing with if folks could define their GC models and limitations.
Given that Docker Hub doesn't allow deletions, this seems like a moot point. |
I specifically mentioned that gc is not defined and some models that had been discussed. Docker Hub absolutely allows deletions, I don't know why you think it doesn't. Our gc model only anchors via tags, plus an "interesting" model of historic tags, I can write it up in more detail. I think most of the registries that have a lot of public content use gc heavily, while the registries that were built around private content tend not to. |
@justincormack would it make sense to try to separate out the references part of that larger proposal? Or do you think the GC stuff presents fundamental issues that can't be addressed in the existing image-spec? |
Yes, I am agreeing with you! I mean "we" as in "the community of people working on this problem". It would be nice to produce some documents or graphs explaining how registries currently behave so that we can understand the limitations under which we are working. Just referring to this problem as "GC" really undersells the complexity involved, and I think we might be talking past each other.
Perhaps something has changed in the meantime, but last I checked this was true: docker/hub-feedback#1759 (comment)
I would love this, that sounds really interesting. I've thought about trying to do something similar in the data model by having tagged images reference their predecessors via some kind of pointer (perhaps a reference descriptor). This would make tags more similar to git branches (which is how they are used in practice) and pin anything that's ever been pullable by that tag. Right now, we're operating as if git's |
Registry compatibility is certainly relevant from the perspective of "is this a major version change". For existing registries today, the registry will not be able to recognize and reject these manifests. This leads to most registries seeing these "reference" manifests as eligible to cleanup from most existing GC logic (see below) and provides no way to discover the content. This ends up breaking client expectations for how the feature would be used and in my opinion that is enough to consider the change not compatible with the existing schema version. Related to GC... Related to the feature... |
I also have a hard time understanding the implications of changes to the image spec that would affect the distribution spec. So if any of y'all are OK with a drawing session, I'm open to setting that up.
It seems to me that the expectation from the community is that the registry should delete all referenced artifacts if the image is deleted. Maybe clients can take care of keeping a list of references on disk and managing deletion rather than the registry for the time being? Or is this too much of a lift? |
I don't think they'd need to do this if we end up implementing an API that returns a list of things that reference a given artifact. Another alternative would be returning an error code that says something like "this is referenced by these other artifacts, you have to delete them first".
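The "delete the referrers first" alternative might behave like the sketch below. It is a toy in-memory model under stated assumptions: `StillReferenced`, `find_referrers`, and `delete_manifest` are hypothetical names, only a top-level `reference` field is considered, and no real registry error code is implied.

```python
class StillReferenced(Exception):
    """Raised when a manifest cannot be deleted because other artifacts reference it."""

    def __init__(self, digest, referrers):
        self.digest = digest
        self.referrers = sorted(referrers)
        super().__init__(f"{digest} is referenced by: {', '.join(self.referrers)}")


def find_referrers(manifests, digest):
    """Digests of stored manifests whose top-level reference points at `digest`."""
    return {
        candidate
        for candidate, manifest in manifests.items()
        if manifest.get("reference", {}).get("digest") == digest
    }


def delete_manifest(manifests, digest):
    """Delete a manifest, refusing while anything still references it."""
    blocking = find_referrers(manifests, digest)
    if blocking:
        raise StillReferenced(digest, blocking)
    del manifests[digest]


manifests = {
    "sha256:image": {},
    "sha256:signature": {"reference": {"digest": "sha256:image"}},
}
try:
    delete_manifest(manifests, "sha256:image")
    blocked_by = []
except StillReferenced as err:
    blocked_by = err.referrers  # the client learns what to delete first
```

With this behavior the client stays in control of ordering: it deletes the signature, then retries the image, and nothing is removed automatically.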
It seems like everyone is assuming some behavior that isn't being proposed:
Does that seem accurate? I would challenge these assumptions, as I don't think they are necessary. They introduce a lot of complexity, and references could easily fit into existing models today without having to change the registry behavior. In particular, the automatic deletion part doesn't feel like a good thing to me. I'd like to see each of these aspects discussed individually. This proposal doesn't prescribe any of those assumptions, but they could be interesting features to add to the data model or registry API separately from this references change. |
I'm curious if we really need to solve the GC problem in the spec. I think that I understand that it's theoretically possible for clients to create a cycle if you consider the "reference" relationship as a backwards reference. It's also definitely true that deleting a signature just because it isn't tagged or referenced before the manifest it references would be bad for users... However, if registries with GC operations treat the referenced object as a backwards pointer and increment the refcount for the manifest making the reference, this should enable registries to keep "artifacts related to important manifests" around while still being able to "clean up" unreferenced, dangling objects. For the GC concerns, I'd love to hear:
|
This is the model being proposed in the reference type working group (the link is the visual version of PR#29, but is easier to read).
With more details here:
We discussed weak references in PR#27, and it got pretty thrashed that weak references weren't something we should pursue, at least not yet. I actually like the weak reference model, in addition to the hard references model, as it enables Helm Chart references to images in other repos, but I figured I'd reduce the scope and focus on the immediate needs for hard references in PR#29.
Is this suggesting that adding a signature or sbom to the
The terminology does get confusing. Using names of things helps. The links above tried to address the confusion by using example names with imagery to visualize.
The key thing is thinking and asking questions, so we all get smarter about the issues. Whether this is the right place for this discussion is a larger issue, but the questions and thoughts are great.
I've been trying to treat GC similar to auth. The spec doesn't call out authentication, but it's designed with the understanding that all registries implement some level of auth, for at least push, if not pull for private registries. The thought process is we design these apis with the understanding of how they'd be used, and make sure they can be used for the identified, most common scenarios. Leaving room for registries to do added value, but above a common expectation. |
What you are proposing is, in fact, a weak reference from a signature to an image. Can you explain what you mean by hard and weak references? It seems to diverge from how these terms are used in the industry.
No, the opposite. That the existence of an image prevents the deletion of a signature. |
Again, naming is hard, and the terminology can be confusing. Early on, we tried storing a collection of references with a type (hard/weak, ..) and it was just too confusing for now. The current use cases outline this behavior:
For the client vs registry discussions for deleting, many of us implement some level of auto-purging of content. |
Precision in terminology is important, especially when producing a specification document. I don't think it's an unreasonable expectation for us to agree on a common vocabulary. That seems like a prerequisite to having a conversation about what's being proposed.
Sorry, this wasn't clear. My second point is regarding deletion by a garbage collector. I've updated my comment.
No. A signature is valuable unto itself even if what it's referencing has been deleted. I can imagine that sometimes you do want to delete them, but the entire concept of transparency logs is built on top of the idea that having a record of what someone signed is a useful security property. As an example: In the event of a compromise, you might want to delete an artifact because it contains a vulnerability and you don't want anyone to pull it again. However, those signatures might be useful evidence in an investigation of the breach. |
After reviewing this and opencontainers/artifacts#29 (the primary two "references" proposals), the underlying goal of both appears the same:
Functionally, the main differences appear to be:
Given the similarity, can we form consensus around one of them? (The first half feels like it might belong in the image-spec repository, but the second half is pretty clearly distribution-spec.) |
I think we're all still waiting for the WG proposal, which was created back in April but still hasn't been submitted for a formal vote yet: opencontainers/tob#96 |
I agree that they're pretty much the same at this point. I think there are a few more small differences though. I wrote this up a few months ago with some summaries: https://link.medium.com/RwrXtvqgAjb Note that since then the artifacts proposal moved over to ORAS, where it underwent more changes. I haven't carefully read it all yet so that post might be out of date. |
While at first glance they may appear similar, they are quite different when you dig into the details of the implementation, support, and expectations users may have. As noted in opencontainers/artifacts#29, the Artifacts Spec has been working under https://github.com/oras-project/artifacts-spec/ with a draft release today. |
After another look, it actually looks like they're somewhat closer. The ORAS proposal has seemingly dropped the "reference type" concept that was used in the prior proposal to work around the garbage collection issue. Now that ORAS Artifacts can reference any type, I'm not actually sure how the ORAS proposal will work around the large concerns with garbage collection previously identified with the OCI image-spec one. |
@dlorenc, you might want to take the time to understand the proposal, the iterations it's taken, and the current state before making inaccurate statements. Since it makes cosign work cleanly, without the "hack" of putting digests in the tags, I'm not sure why you're continuing the FUD campaign. There are no workarounds, there's implicit support for the defined scenarios with clear expectations set for how lifecycle management would be implemented. |
I've read the proposal and much of the history and I think I understand it, but I welcome any corrections!
This isn't a productive statement. |
Thanks @tianon for driving for consensus. |
Thanks @tianon and @vbatts. For more details: Comparing the ORAS Artifact Manifest and OCI Image Manifest. While there have been various opinions written, we have not had a collaborative discussion on the proposals. We have a session at the OCI Summit planned for this discussion. |
Closing this as resolved by #934. Hopefully we covered all the use cases there. |
OCI References
New March 27th 2021: The Testing section below now shows some validation to attempt to prove that this does not break existing clients.
New March 28th 2021: The Backwards Compatibility section contains how I'm defining backwards-compatibility for the purposes of this proposal.
Background
This document contains two high-level proposals that allow artifacts in a registry to be linked together.
They are meant to be mostly equivalent to the linking portion of the proposed OCI artifact manifest changes.
This document takes a different approach in a few areas:
Additionally, this document contains further design and discussion for the "query" portion of the API.
This should be compatible with the OCI artifact manifest changes, and can serve as inspiration or as an addition to that proposal if we decide to move forward with that change (either instead of or in addition to this one).
We start out with the proposed API and format changes, then discuss the requirements they were designed against and the critical user journeys (CUJs) they address.
Proposed Design
This proposal accompanies a pull request which contains the actual proposed changes to the specifications and types in this repository.
We propose adding a new field (in the image-spec) and a new GET API (in the distribution-spec).
Each change is described below.
Image Spec
Refer to the linked PR for the full changes.
We propose adding a new field (reference) to the Image Manifest, Index and Descriptor types.
This field will contain a Descriptor that points to the linked object.
Here is an example:
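As a minimal sketch of what a descriptor carrying the proposed field could look like, the snippet below rebuilds the signature descriptor quoted earlier in this thread. The helper name make_descriptor is hypothetical and exists only for illustration; the field names and values are the ones from that quoted example.

```python
import json


def make_descriptor(media_type, size, digest, reference=None):
    """Build a descriptor dict; `reference` is the proposed optional field."""
    descriptor = {"mediaType": media_type, "size": size, "digest": digest}
    if reference is not None:
        # The proposed field holds another descriptor pointing at the linked object.
        descriptor["reference"] = reference
    return descriptor


# The object being referred to (values taken from the example in the discussion).
subject = make_descriptor(
    "application/vnd.docker.distribution.manifest.list.v2+json",
    1201,
    "sha256:b4f9e18267eb98998f6130342baacaeb9553f136142d40959a1b46d6401f0f2b",
)

# A signature descriptor that links back to the subject.
signature = make_descriptor(
    "application/vnd.example.signature+json",
    3514,
    "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17",
    reference=subject,
)

print(json.dumps(signature, indent=2))
```

Because the nested descriptor carries its own size and digest, a client can decide whether to follow the link before fetching anything.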
Distribution Spec
We propose adding a new, read-only API to the Distribution Spec to query the registry for references to a given object.
The API will look like:
The response from this API will be a list of Descriptors that matches the existing Manifest Index specification:
There will be an accompanying PR to the distribution-spec with more details on this API if we decide to move forward.
It is currently located here to keep the entire proposal in one place.
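To make the response shape concrete, here is a rough sketch that wraps a list of descriptors in an index-shaped document. It assumes the response reuses the top-level fields of the existing OCI image index (schemaVersion, mediaType, manifests); the function name references_response is hypothetical, and the exact fields the new API would return are not settled by this proposal.

```python
import json

# Media type of the existing OCI image index; reusing it here is an assumption.
OCI_INDEX_MEDIA_TYPE = "application/vnd.oci.image.index.v1+json"


def references_response(referrers):
    """Wrap descriptors that refer to an object in an index-shaped document."""
    return {
        "schemaVersion": 2,
        "mediaType": OCI_INDEX_MEDIA_TYPE,
        "manifests": list(referrers),
    }


example_referrers = [
    {
        "mediaType": "application/vnd.example.signature+json",
        "size": 3514,
        "digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17",
    }
]

response = references_response(example_referrers)
print(json.dumps(response, indent=2))
```

Reusing the index shape means clients that already parse an index would need no new response type for this query.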
Requirements
The proposal above was designed with the following requirements in mind.
There was also an overarching constraint to make these API changes as small as possible, but no smaller:
Backwards Compatibility
Updated March 28th
There is no formal definition for backwards-compatible changes in this repo. Here's how I'm thinking about it:
The final bullet here is the most important one. If clients break because of a new field being present, this is not backwards-compatible.
I've verified the following clients so far:
Please suggest others!
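The compatibility concern can be illustrated with two toy parsers: one that ignores fields it does not know about, and one that rejects them. This is only a sketch; the KNOWN_FIELDS set and both function names are hypothetical and do not describe how any particular client behaves.

```python
import json

# Fields a pre-existing client might know about (illustrative, not exhaustive).
KNOWN_FIELDS = {"schemaVersion", "mediaType", "config", "layers", "annotations"}


def parse_tolerant(raw):
    """Keep the fields the client understands and silently drop the rest."""
    document = json.loads(raw)
    return {key: value for key, value in document.items() if key in KNOWN_FIELDS}


def parse_strict(raw):
    """Reject any document that contains a field the client does not know."""
    document = json.loads(raw)
    unknown = sorted(set(document) - KNOWN_FIELDS)
    if unknown:
        raise ValueError("unknown fields: " + ", ".join(unknown))
    return document


manifest_with_reference = json.dumps(
    {
        "schemaVersion": 2,
        "layers": [],
        "reference": {
            "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
            "size": 1201,
            "digest": "sha256:b4f9e18267eb98998f6130342baacaeb9553f136142d40959a1b46d6401f0f2b",
        },
    }
)

tolerant_result = parse_tolerant(manifest_with_reference)

try:
    parse_strict(manifest_with_reference)
    strict_failed = False
except ValueError:
    strict_failed = True
```

A client that behaves like parse_strict is the kind that would break when the new field appears, which is what the client verification is meant to catch.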
Testing
New March 27th 2021
I've implemented this in a patch to distribution as a testbed to check whether existing clients break when the new field is present. My patch is here: https://github.com/dlorenc/distribution/tree/references, and a registry image is available here: gcr.io/dlorenc-vmtest2/registry:references
You can pull and run that with:
I made a client to test this with. It's available in a patch to ggcr here: https://github.com/dlorenc/go-containerregistry/tree/references
There's a simple command line tool to create an object that refers to another object. With that code checked out, and the registry running, you can do:
Example Use Cases
Signing and verifying an image
Signing:
mediaType for this signature. This signature must contain a protected reference to the subject.
Verification:
GET API for references
mediaType
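On the verification side, once a client has the list of descriptors returned by the references query, picking out candidate signatures is a filter on mediaType. The sketch below assumes the list has already been fetched; the function name select_by_media_type and the SBOM media type are hypothetical. Checking the signature payload itself, including the protected reference to the subject, is out of scope here.

```python
def select_by_media_type(references, wanted_media_type):
    """Return the descriptors whose mediaType matches the one we expect."""
    return [ref for ref in references if ref.get("mediaType") == wanted_media_type]


# Descriptors as they might come back from the references query (illustrative).
returned_references = [
    {
        "mediaType": "application/vnd.example.signature+json",
        "size": 3514,
        "digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17",
    },
    {
        # Hypothetical media type for some other kind of attached object.
        "mediaType": "application/vnd.example.sbom+json",
        "size": 2048,
        "digest": "sha256:" + "0" * 64,
    },
]

signatures = select_by_media_type(
    returned_references, "application/vnd.example.signature+json"
)
print(len(signatures))
```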
Attaching SBOMs to an artifact
to be filled in
More
Add more here!
Registry Implications
Registries may need to maintain a reverse index to efficiently satisfy queries for references to a given object.
Registries will need to parse and understand reference fields in order to support this.
Registries are free to implement garbage collection of referenced objects as they see fit.
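One way a registry could satisfy such queries is an in-memory reverse index keyed by the digest of the referenced object. This is a sketch under simplifying assumptions: the class ReverseIndex and its methods are hypothetical, persistence and deletion are ignored, and only a top-level reference field on the uploaded manifest is inspected.

```python
from collections import defaultdict


class ReverseIndex:
    """Maps a subject digest to the descriptors of objects that refer to it."""

    def __init__(self):
        self._referrers = defaultdict(list)

    def record(self, descriptor, manifest):
        """Index an uploaded manifest if it carries a `reference` field."""
        reference = manifest.get("reference")
        if reference is None:
            return  # nothing to index for manifests without the field
        self._referrers[reference["digest"]].append(descriptor)

    def referrers(self, subject_digest):
        """Return descriptors of everything that refers to `subject_digest`."""
        return list(self._referrers.get(subject_digest, []))


subject_digest = "sha256:b4f9e18267eb98998f6130342baacaeb9553f136142d40959a1b46d6401f0f2b"
signature_descriptor = {
    "mediaType": "application/vnd.example.signature+json",
    "size": 3514,
    "digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17",
}
signature_manifest = {
    "schemaVersion": 2,
    "reference": {
        "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
        "size": 1201,
        "digest": subject_digest,
    },
}

index = ReverseIndex()
index.record(signature_descriptor, signature_manifest)
index.record({"mediaType": "application/vnd.example.plain+json"}, {"schemaVersion": 2})
print(index.referrers(subject_digest))
```

The index is updated at upload time, so a query for references becomes a single lookup instead of a scan over every stored manifest.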
Alternatives Considered
Registry only changes
We also considered proposing a design where references are NOT included on the existing types.
It would have looked like:
Pros
This would not require changes to the image spec
Cons