OCI Artifact Manifest - with weak reference support #27
Signed-off-by: Steve Lasker <[email protected]>
This first draft is looking good. I have two pieces of high level feedback advocating to keep as much simplicity in the manifest as possible.
Thanks, @dmcgowan,
We've played with the collections a few ways, including a single collection that contained a direction and strictness (weak/strong, hard/soft, loose/tight?). Again, please don't read too much into the names, as we wanted to figure out whether the structure worked, and we'd figure out the proper names later. Here was an example of the single collection:

```json
{
  "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
  "artifactType": "application/vnd.cncf.notary.v2",
  "config": {
    "mediaType": "application/vnd.cncf.notary.config.v2",
    "digest": "sha256:b5b2b2c507a0944348e0303114d8d93aaaa081732b86451d9bce1f432a537bc7",
    "size": 102
  },
  "references": [
    {
      "mediaType": "application/vnd.cncf.notary.v2.json",
      "digest": "sha256:9834876dcfb05cb167a5c24953eba58c4ac89b1adf57f28f2f9d09af107ee8f0",
      "size": 32654,
      "direction": "child",
      "strictness": "weak|strong"
    }
  ]
}
```

The problem with a single collection:
By using different collections to represent:
There are lots of interesting metadata scenarios. Some are known at the time the artifact is submitted; others come later. The problem is how we address metadata added afterwards, as we need a way to add it without changing the digest. I'm punting this to the OCI Metadata Service round of discussions.
Not all artifacts will have config, yet different artifacts may share the same config schema. In ORAS, we had to explicitly enable the scenario of defining a Since we're defining a new manifest, it seemed time to lift the
You're correct, these are both "hard" references from the artifact manifest perspective. The two main differences noted above are directionality for ref-counting, and whether the registry looks in the blob store or the manifest store for content.
Will references in an artifact always be local to the current repository? I think Helm breaks that logic with image references, but Helm also breaks a lot of the artifact logic since the image names themselves could be templated to point to a different location by a values.yaml after pulling the chart, so it may be worth excluding Helm image references from any of the chained reference logic. I suspect having everything in the same repo is important for GC, but also useful for portability of manifests and their artifacts.
What are the methods to look up an artifact? Is it only with a query using a manifest sha, or can tags point to artifacts? If so, do we need to namespace a tag lookup to the type of artifact we are looking for, so that artifacts with tags don't accidentally collide with images or other artifacts in the same repo? That question comes from looking at how TUF could possibly be implemented with artifacts, and they may want e.g. a "snapshot" TUF artifact in a repo that applies to multiple manifests, and that they can look up and update at any time.
This is pretty comprehensive from the CNAB point of view. I think more information is needed at the container level. Can I make use of your examples to play around with scenarios? I've put some of the scenarios in comments on `artifact-manifest.md`.
@@ -0,0 +1,462 @@
# OCI Artifact Manifest

The OCI artifact manifest provides a means to define a wide range of artifacts, including a chain of dependencies of related artifacts. It provides a means to define multiple collections of types, including blobs, dependent artifacts and referenced artifacts.
I think what we want is a list of related artifacts. The SBoM of the artifact would typically have the name of the artifact and its relation to other artifacts defined in the manifest. The SPDX spec has a comprehensive list of possible relationships to pick from https://spdx.github.io/spdx-spec/7-relationships-between-SPDX-elements/. Defining "dependencies" and "provides" relationships in the SBoM takes the burden away from defining it here.
My working assumption for how SBoMs would be represented in a registry is a new artifact, with a dependency on the artifact it's declaring it's an SBoM of. Essentially, it would be the same as a Notary v2 signature.
The SBoM artifact would have its own `blobs`, optionally a config, and a dependency (manifest entry in the latest example). It would not have a tag, as the tag of an SBoM isn't really interesting: the SBoM is an "enhancement" to an existing tagged artifact in a repo.
To support artifact movement to various registry and namespace structures, the registry and path must not be embedded within the artifact definition. Client CLIs and configurations will provide default locations and mappings for where to find the referenced content.

Artifacts that reference other artifacts must include an OCI Artifact Descriptor which includes the `manifest type`, `digest`, `size` and `repo:tag` of the artifact, however it will defer resolution of the reference to client tools that MAY reconstitute the references from multiple repositories and/or registries.
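As a rough illustration of the descriptor fields named in this paragraph, the sketch below computes `digest` and `size` from raw manifest bytes. The helper name `make_descriptor` and the sample manifest body are hypothetical; only the `mediaType`/`digest`/`size` shape comes from the examples in this PR. Note that nothing in the result embeds a registry or path.

```python
import hashlib
import json


def make_descriptor(manifest_bytes: bytes, media_type: str) -> dict:
    """Build a descriptor (mediaType, digest, size) for raw manifest bytes.

    Hypothetical helper for illustration; not part of any OCI library.
    """
    return {
        "mediaType": media_type,
        # The digest is computed over the exact bytes as stored.
        "digest": "sha256:" + hashlib.sha256(manifest_bytes).hexdigest(),
        "size": len(manifest_bytes),
    }


# A made-up, minimal manifest body used only to have bytes to hash.
sample = json.dumps({"schemaVersion": 1}, sort_keys=True).encode("utf-8")
descriptor = make_descriptor(sample, "application/vnd.oci.artifact.manifest.v1+json")
print(descriptor)
```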
It sounds like we expect artifacts to be identified by `repo:tag`. So if we have a container image `wordpress:v5`, do we expect a related SBoM to be named `wordpress-sbom:v5`? Can we upload singleton blobs and then reference that in `wordpress-bundle:v5`?
Artifacts that are things customers want to directly reference would have a `:tag`. However, I'm trying to identify enhancements (SBoM, Signatures, GPL source, ...) as things that are dependent upon another artifact, but don't really get shown on their own.
Notice the difference with this image: https://github.com/opencontainers/artifacts/raw/8760ac5e42bc7a36802559481b34e2f4e8584492/artifact-manifest/media/repo-listing-flat.svg
and this image:
The first shows the artifacts as an equal collection. The second shows a set of artifacts as attributes to the thing they're enhancing.
If I think about this scheme from the perspective of a user who does not want to see the enhancements until they have to, the query would be something like "do you have the signatures/SBoM/sources for mysql@sha256:digest?". By what mechanism would a client find out when "pull mysql:8" downloads an image manifest with no reference to the other enhancements? Would we need another endpoint for this?
I'll add some examples for the list APIs to get the content. But, yes, there will be a new listing API as we must support push, discover, pull to complete the experiences.
The premise is a client can ask for an `artifactType` of `application/vnd.openssf.sbom.v1+json` to get just the SBoM documents for the `mysql:8` image. You could also query by the digest of the MySQL image.
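The listing API had not been defined at this point in the thread, so the following is only an in-memory sketch of the filtering idea: given a target digest, return the artifacts of a requested `artifactType` that point at it. The `REPO` dictionary, the shortened digests, and the function name `list_referrers` are all made up for illustration.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for a repository: digest -> manifest.
# Digests are shortened, made-up sample data.
REPO: Dict[str, dict] = {
    "sha256:aaa": {"artifactType": "application/vnd.oci.image.v1", "manifests": []},
    "sha256:bbb": {
        "artifactType": "application/vnd.openssf.sbom.v1+json",
        "manifests": [{"digest": "sha256:aaa"}],
    },
    "sha256:ccc": {
        "artifactType": "application/vnd.cncf.notary.v2",
        "manifests": [{"digest": "sha256:aaa"}],
    },
}


def list_referrers(repo: Dict[str, dict], target_digest: str, artifact_type: str) -> List[str]:
    """Return digests of artifacts of `artifact_type` that point at `target_digest`."""
    return sorted(
        digest
        for digest, manifest in repo.items()
        if manifest["artifactType"] == artifact_type
        and any(ref["digest"] == target_digest for ref in manifest["manifests"])
    )


sboms = list_referrers(REPO, "sha256:aaa", "application/vnd.openssf.sbom.v1+json")
print(sboms)
```

A client that starts from a tag such as `mysql:8` would first resolve the tag to a digest and then run the same kind of query.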
![notary v2 signature](media/notaryv2-signature.svg)

The Notary v2 signature would reference an artifact, such as the `wordpress:v5` image above. Notice the directionality of the references. One or more signatures may be added to a registry after the image was persisted. While an image knows of its layers, and a Notary v2 signature knows of its config and blob, the Notary v2 signature declares a dependency to the artifact it's signing. The visualization indicates the references through solid lines as these reference types are said to be hard references. Just as the layers of an OCI Image are deleted (*ref-counted -1*), the blobs of a signature are deleted (*ref-counted -1*) when the signature is deleted. Likewise, when an artifact is deleted, the signature would be deleted (*ref-counted -1*) as the signatures have no value without the artifact they are signing.
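The deletion behavior described here can be sketched as a small cascade: deleting an artifact also deletes the manifests that depend on it, and frees any blobs no remaining manifest uses. This is a minimal sketch under assumptions, not how any registry implements garbage collection; the `STORE` layout, the `depends_on` field name, and `delete_artifact` are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical store: manifest name -> {"blobs": [...], "depends_on": [...]}.
STORE: Dict[str, dict] = {
    "wordpress:v5": {"blobs": ["layer-1", "layer-2"], "depends_on": []},
    "signature-1": {"blobs": ["sig-blob-1"], "depends_on": ["wordpress:v5"]},
    "signature-2": {"blobs": ["sig-blob-2"], "depends_on": ["wordpress:v5"]},
}


def delete_artifact(store: Dict[str, dict], name: str) -> Set[str]:
    """Delete `name`, then every manifest that depends on it, and return
    the blobs that no remaining manifest references."""
    removed_blobs: List[str] = []
    pending = [name]
    while pending:
        current = pending.pop()
        manifest = store.pop(current, None)
        if manifest is None:
            continue  # already deleted
        removed_blobs.extend(manifest["blobs"])
        # Dependents have no value without the artifact they enhance.
        pending.extend(n for n, m in store.items() if current in m["depends_on"])
    still_used = {b for m in store.values() for b in m["blobs"]}
    return set(removed_blobs) - still_used


freed = delete_artifact(STORE, "wordpress:v5")
print(sorted(freed))
```

Deleting only a signature in this model frees the signature's blobs and leaves the image untouched, which matches the directionality of the reference.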
In a situation where someone uses an image, say `debian:bullseye`, which comes with a list of signatures, and adds dependencies for their middleware, say Go, and then pushes the image as `golang-debian:1.16`, would the debian signatures still be linked? How can one verify where the non-golang dependencies came from? Do those signatures get added as well? My concern is that each layer in the container image is its own build and release pipeline, and they can get pretty complicated (the wordpress Dockerfile is actually a good example of how complicated).
I know some early versions of the tern SBoM focused on adding layers to an image. That has some challenges as you're changing the image-spec, and breaking the digest/tag of the thing you're adding.
In this design, you add the SBoM as a new artifact, and add a reference to the other manifest. See the notary.v2 signature of `mysql:8` example.
This keeps the artifacts separate, but referenced, enabling the current image runtimes to continue, while adding SBoMs that are also signed.
    "digest": "sha256:b5b2b2c507a0944348e0303114d8d93aaaa081732b86451d9bce1f432a537bc7",
    "size": 102
  },
  "blobs": [
Suppose Notary signatures were used to sign the base OS of `mysql:8`, I would imagine the blob and dependency list would be appended to in building the final `mysql:8` image. Do you think so? In which case, how would a client know which signature is for which image?
We haven't targeted layer signing yet.
      "size": 32654
    }
  ],
  "dependencies": [
It seems to me the relationship between the artifacts needs more description at this level. I was hoping the SBoM would provide that but perhaps that's something that can be helpful to clients.
I need to add more descriptive text to the examples we've been iterating upon.

```json
{
  "schemaVersion": 2,
Everything here and below will need to be `"schemaVersion": 3`, I think, as these are breaking changes?
This is actually a new schema as we're introducing a new `oci.artifact.manifest` that would rev independently from `oci.image` and `oci.index`.
Since the `oci.artifact.manifest` is versioned in the `mediaType=application/oci.artifact.manifest.v1`, I've removed the schemaVersion element.
No, that's not how `schemaVersion` works.
I've added `"schemaVersion": 1` back in as the first schemaVersion for `oci.artifact.manifest`.
  ],
  "references": [
    {
      "artifact": "wordpress:5.7",
I don't like tags in here. Not sure about repositories...
Does the repo refer to a sibling repository? Child repository? Top-level?
Can this just be under an `org.opencontainers.image.ref.name` annotation?
I'll have an update later this week that will use a mix of annotations and descriptors to clean up the references.
The more I think about the repository thing, the more I dislike it. Repositories are a natural security boundary. You're introducing a multi-repository artifact, which presents a lot of problems in terms of authentication and management.
We removed catalog because it violated these boundaries.
Cross-repository mounting is the only thing similar to this that remains, and the authentication for that is really hard to get right. This worries me.
So, a helm chart, a CNAB and other artifacts are already cross repository artifact types. But, we are describing these as loose references that the client can validate independently. The `blobs` and manifests that have a `depends-on` annotation are hard dependencies and must be in the same repo, and must exist to complete the manifest put.
So, I think we're straddling this problem, supporting the artifact types, while not putting an undue burden on auth boundaries that don't already exist today.
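One way to picture the "must exist to complete the manifest put" rule is a pre-flight check that reports which hard dependencies are absent from the repository, while ignoring loose references. This is a sketch under assumptions: the function name and the set-based repository view are hypothetical, and the annotation key and values are the ones used in the examples in this PR.

```python
from typing import List, Set


def missing_hard_dependencies(manifest: dict, repo_blobs: Set[str], repo_manifests: Set[str]) -> List[str]:
    """Return digests that would need to exist in the same repo before a PUT.

    Hypothetical check: blobs and `depends-on` manifests are hard dependencies;
    manifests with any other relationship are loose and are not checked.
    """
    missing = [b["digest"] for b in manifest.get("blobs", []) if b["digest"] not in repo_blobs]
    for entry in manifest.get("manifests", []):
        relationship = entry.get("annotations", {}).get("oci.distribution.relationship")
        if relationship == "depends-on" and entry["digest"] not in repo_manifests:
            missing.append(entry["digest"])
    return missing


# Shortened, made-up digests for illustration.
manifest = {
    "blobs": [{"digest": "sha256:blob1"}],
    "manifests": [
        {"digest": "sha256:img1", "annotations": {"oci.distribution.relationship": "depends-on"}},
        {"digest": "sha256:img2", "annotations": {"oci.distribution.relationship": "references"}},
    ],
}
print(missing_hard_dependencies(manifest, {"sha256:blob1"}, set()))
```

The loosely referenced `sha256:img2` never appears in the result, so a missing or cross-repository image would not block the push in this model.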
What would the client do with a helm cross repository artifact reference?
I ask since the values for the helm template could override the default tag or even use a different registry server, and we can't (or at least shouldn't) template the loose reference in the registry, so I feel like any helm client should ignore this field. If we use the field for signing, does a different helm values invalidate the signed chart? If we use the field for mirroring, would mirroring tools potentially copy images that go unused?
I'm having a hard time seeing the value add for including a potentially templated field into a non-templated registry object that doesn't introduce the risk of breakage.
The Artifact Manifest idea looks great! It helps to attach artifacts to images without affecting existing container images and thus keeps backward compatibility. What's more, it looks super easy to add new artifact types based on the artifact manifest. For example, a nydus artifact manifest would look like:
For those who are not familiar with The nydus artifact manifest follows the same schema for other artifact types, while a new artifact type and two media types are added:
And It has a Currently we have an nydus image @SteveLasker would it make sense to list |
Yes, it would be great to add the example, as it's a new example I hadn't yet thought of. But, if it fits, even better. Let me digest a bit more and align with the new-new
      "digest": "sha256:3c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c6b",
      "size": 16724,
      "annotations": {
        "oci.distribution.relationship": "depends-on"
Hard dependencies affecting functions like garbage collection should not be in `annotations`, which is an optional section.
This is a good point. Having annotations for client or even filtering information is super interesting. Having a registry implement garbage collection and ref-counting seems less structured and should be surfaced as a first class property.
I'll make another PR with some comparison examples.
See latest PR commit for A/B options:
@SteveLasker Thanks a lot! I tried to amend the above nydus artifact manifest example to fit the new-new

```json
{
  "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
  "artifactType": "application/vnd.cncf.nydus.v1",
  "config": {
    "mediaType": "application/vnd.oci.image.manifest.v1.config.json",
    "digest": "sha256:9e988712154fcc2ceda5602eb1d98c1f28299ba6fbf0be49d3717c35a2d76674",
    "size": 1102
  },
  "blobs": [
    {
      "mediaType": "application/vnd.cncf.nydus.bootstrap.v1.tar+gzip",
      "digest": "sha256:f6bb0822fe567c98959bb87aa316a565eb1ae059c46fa8bba65b573b4489b44d",
      "size": 32654
    },
    {
      "mediaType": "application/vnd.cncf.nydus.blob.v1",
      "digest": "sha256:f6bb0822fe567c98959bb87aa316a565eb1ae059c46fa8bba65b573b4489b44d",
      "size": 72832
    },
    {
      "mediaType": "application/vnd.cncf.nydus.blob.v1",
      "digest": "sha256:f6bb0822fe567c98959bb87aa316a565eb1ae059c46fa8bba65b573b4489b44d",
      "size": 928324
    }
  ],
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:8c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c31",
      "size": 1578,
      "annotations": {
        "oci.distribution.relationship": "references",
        "oci.distribution.artifact": "mysql:8",
        "oci.distribution.artifactType": "application/vnd.oci.image.v1"
      }
    }
  ]
}
```

To explain it in more detail:
With such an artifact manifest,

### Helm Charts & CNAB

A Helm chart can represent the images it references within the chart. These references are loose references as they may be persisted in different registries, or may change as a values file is updated. However, the chart may also be persisted together as a collection of artifacts in a registry. The lines are dotted to represent the loose reference. Deleting the `wordpress-chart:v5` may, or may not delete the images as the images have value unto themselves.
Does it mean the manifest of the chart in this PR only reflects the case when the images and chart are persisted as a collection and the images are NOT loosely referenced?
This is an area of churn, and I'm going to create two options for how we can support the two distinct scenarios.
A Notary v2 signature and an SBoM are enhancements to an existing artifact. They `depend-upon` the thing they're enhancing to have meaning. When the artifact they enhance is deleted, these artifacts would also be deleted.
However, in the Helm case, a helm chart can `reference` other images, which may or may not be stored in the same registry. The combination of the digest and `artifact name` are ways to identify it as a unique entity (digest) or stable/updateable tag. The registry will store these values in the manifest, but won't do anything unless a client tells it to do something.
For instance, compared to the Notary & SBoM example above, a helm chart can be copied to another registry or deleted without impacting the images it references.
However, the `oci-reg` CLI (imagined) would be able to read the manifest and do a pull and/or copy of the helm chart and its referenced images. If the client says `use-digest`, a registry can identify the image by its digest as it's unique, regardless of where in the registry it lives. If it says `use-tag`, it would need a registry.config to assist where to find that tag.
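The `use-digest` versus `use-tag` choice can be sketched as a small client-side resolver. Everything here is an assumption layered on the imagined `oci-reg` CLI: the function name, the shape of the configuration mapping (repository name to registry host), and the host `registry.example.com` are invented for illustration. The `oci.distribution.artifact` annotation and the digest are taken from the nydus example earlier in the thread.

```python
from typing import Dict, Optional


def resolve_reference(descriptor: dict, mode: str, registry_config: Optional[Dict[str, str]] = None) -> str:
    """Turn a loose reference into something a client could pull.

    `mode` mirrors the imagined `use-digest` / `use-tag` choice; `registry_config`
    is a hypothetical mapping from repository name to a registry host.
    """
    annotations = descriptor.get("annotations", {})
    if mode == "use-digest":
        # A digest is unique, so no location mapping is required.
        return descriptor["digest"]
    if mode == "use-tag":
        repo, _, tag = annotations["oci.distribution.artifact"].partition(":")
        if not registry_config or repo not in registry_config:
            raise LookupError(f"no registry configured for {repo!r}")
        return f"{registry_config[repo]}/{repo}:{tag}"
    raise ValueError(f"unknown mode: {mode}")


ref = {
    "digest": "sha256:8c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c31",
    "annotations": {"oci.distribution.artifact": "mysql:8"},
}
print(resolve_reference(ref, "use-digest"))
print(resolve_reference(ref, "use-tag", {"mysql": "registry.example.com"}))
```

Keeping the location mapping outside the manifest is what lets the same manifest move between registries unchanged.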
@SteveLasker For some of the artifacts such as CNAB or Helm charts (v3), there are tools to package them as OCI artifacts and store them in OCI registries, such as helm v3, so does this PR mean to introduce a breaking change to the existing artifacts?
      "size": 16724
    }
  ],
  "manifests": [
Does it assume the referenced manifest is persisted in the same registry?
This breaks a very common usage pattern where the referenced images are set in the `values.yaml` of a chart, and the deployer of a chart will set the images when he issues `helm install`.
The idea is they can be stored in the same registry and the registry can understand the references, generically. This enables `oci-reg copy` scenarios and optional `oci-reg delete --with-references` scenarios. But, you're correct that this information is duplicated in the helm chart, as this `artifact.manifest` didn't (still doesn't) exist, yet. If/when this `artifact.manifest` gets adopted, would a combination of this manifest reference, along with oci-reg.config, enable a helm chart to evolve, where the chart references the thing in the manifest? In theory, this would make it far easier to move charts across registries without having to change the chart.
Based on feedback around the types of collections, I've incorporated the following into the latest:
- Changed `dependencies` to `manifests`. This represents an `oci.artifact.manifest` to have "dependencies" on a collection of `blobs`, a collection of `manifests`, and an optional `config` blob.
- Added A/B options for splitting out the references.
  - OPTION A uses the `manifests` collection with an annotation: `"oci.distribution.relationship": "depends-on"`. The pros are this is a single manifests collection. The cons are it's not a great design for a registry to read an annotation for managing garbage collection of core components.
  - OPTION B creates a second `references` collection of manifests. All descriptors in this collection are loose/weak references that may not be resolved in the registry, and are not subject to garbage collection. A client MAY delete-references, as noted in the `oci-reg` cli example.
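To compare the two options from a registry's point of view, the sketch below extracts hard and weak manifest references from either layout. It assumes that, under Option B, everything in `manifests` is a hard dependency and that the presence of a `references` key signals Option B; both are assumptions made for this illustration, as are the function name and the shortened digests.

```python
from typing import List, Tuple


def split_references(manifest: dict) -> Tuple[List[str], List[str]]:
    """Return (hard, weak) manifest digests under either proposed layout.

    Option A: one `manifests` collection; an annotation marks `depends-on`.
    Option B: `manifests` holds hard dependencies, `references` holds weak ones.
    """
    hard: List[str] = []
    weak: List[str] = [d["digest"] for d in manifest.get("references", [])]
    uses_option_b = "references" in manifest  # assumption: key presence selects the layout
    for entry in manifest.get("manifests", []):
        relationship = entry.get("annotations", {}).get("oci.distribution.relationship")
        if uses_option_b or relationship == "depends-on":
            hard.append(entry["digest"])
        else:
            weak.append(entry["digest"])
    return hard, weak


option_a = {
    "manifests": [
        {"digest": "sha256:dep", "annotations": {"oci.distribution.relationship": "depends-on"}},
        {"digest": "sha256:loose", "annotations": {"oci.distribution.relationship": "references"}},
    ]
}
option_b = {
    "manifests": [{"digest": "sha256:dep"}],
    "references": [{"digest": "sha256:loose"}],
}
print(split_references(option_a))
print(split_references(option_b))
```

Under Option A the garbage collector has to read an annotation for each entry, while under Option B the collection an entry lives in decides its treatment, which is the trade-off called out in the list above.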
You could think of it as a new version. Let's say a new version of CNAB and/or Helm could use this new manifest, but that's a choice for these communities to make. As with anything that's already shipped, with limitations, it's always a choice for whether a change provides enough value. It's my hope the references solve many of the limitations of Artifacts v1, based on the image-manifest, in a way that makes it worth the change, and tooled in such a way that it adds enough value with minimal breaking change implications.
@SteveLasker Overall looks good. My one question/slight concern is around references: With above (unless I'm mistaken) it would be possible to form a chain of dependent manifests as large or as long as clients specify. Do we have any concerns about this proving hard to mirror? The one major benefit of the current manifest list design is that it is (relatively) flat: a tag points to a single list and the list points to a set of manifests, but it cannot go beyond that. What is the expected direction for tooling that will need to mirror the current manifests? Walk the manifests to determine the full set to replicate?
This is not the case:
And the diagram from https://github.com/opencontainers/image-spec/blob/master/media-types.md#relations
My own intention is to support recursive copies with a user configurable max depth on the recursion. And if the max depth is hit, send back a warning or error. I'd also want to handle the types of artifacts differently (allowing users to filter in/out). For example, they may want to mirror images and notary signatures that point to those images, but may not be interested in mirroring helm charts that point to the image (along with everything else that helm chart points to). And I'd want some directionality on the references for mirroring, e.g. helm charts may mirror child images, but child images don't mirror parent helm charts. That assumes it makes sense to link helm charts and images, which I'm still uncertain of (charts can template an image name, and artifacts don't template a reference).
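A recursive copy with a configurable depth limit and a type filter could be planned roughly as follows. This is a minimal sketch, not the commenter's tooling: the `GRAPH` structure, the short artifact type names, and `plan_copy` are hypothetical. Only child references are followed, which reflects the directionality described above.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical graph: manifest name -> {"artifactType", "children"}.
GRAPH: Dict[str, dict] = {
    "chart": {"artifactType": "helm", "children": ["image"]},
    "image": {"artifactType": "image", "children": ["base"]},
    "base": {"artifactType": "image", "children": []},
}


def plan_copy(
    graph: Dict[str, dict],
    root: str,
    max_depth: int,
    include_types: Optional[Set[str]] = None,
) -> Tuple[List[str], List[str]]:
    """Walk child references breadth-first and return (to_copy, warnings)."""
    to_copy: List[str] = []
    warnings: List[str] = []
    seen: Set[str] = set()
    frontier = [root]
    depth = 0
    while frontier:
        next_frontier: List[str] = []
        for name in frontier:
            if name in seen:
                continue  # guards against circular references
            seen.add(name)
            node = graph[name]
            if include_types is not None and node["artifactType"] not in include_types:
                continue  # filtered out, so its children are not followed either
            to_copy.append(name)
            if depth < max_depth:
                next_frontier.extend(node["children"])
            elif node["children"]:
                warnings.append(f"max depth {max_depth} reached at {name}")
        frontier = next_frontier
        depth += 1
    return to_copy, warnings


print(plan_copy(GRAPH, "chart", max_depth=1))
print(plan_copy(GRAPH, "image", max_depth=5, include_types={"image"}))
```

Whether hitting the limit should be a warning or a hard error is left to the caller here, since the thread leaves that choice open.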
Right... I forgot that it was intended that manifest lists could reference other lists. We just never did so because they were (practically speaking) never used.
Perhaps we should formalize this a bit then? I could see a scenario where someone pushes a large chain of manifests to a repository and then, when the repository is mirrored, it fails at that point. Unsure if it would be better to fail at push time, though.
The tooling I'm working with is strictly client side, so a server decision to fail would be separate. I could see this being enforced by the registry, similar to how many registries enforce user namespaces as the first part of the repository path, which is separate from the spec. I'd be interested in the HTTP response codes when the server refuses to accept an artifact for reasons like this.
I've taken a ton of great feedback that we need more bake-time on the I'll be closing this PR, once I revert a few things. I ask folks to please focus on #29 for feedback, as it has the core link-list of |
I think we have to first define "mirror". Is a mirror at a registry/repo level? Meaning, whatever is in a given repo is mirrored?
I worry about the formalization of a dependency count. In some cases, it makes sense, like the 256 registry/namespace character limit. But npm and other package managers have kinda dealt with this. I see this as a client configuration scenario as it just seems hard to know upfront how the dependencies are either circular and closed or endless. I suspect a registry throttling scenario would solve this, but I'd have to think more. With
This ^ sounds like something to consider as we revisit this scenario. How do package managers like PyPI, npm, ... manage these types of scenarios?
I'm closing this one as it's no longer the active conversation. See #29 for the more focused, iterative approach. Happy to continue the background conversation here if folks want to keep thinking about it.
The OCI artifact manifest provides a means to define a wide range of artifacts, including a chain of dependencies of related artifacts. It provides a means to define multiple collections of types, including blobs, dependent artifacts and referenced artifacts, expanding on the work done around OCI Artifacts based on oci.image.manifest, addressing the challenges attempted with image index
This is an initial PR for discussion.