This repository has been archived by the owner on Jul 18, 2023. It is now read-only.

OCI Artifact Manifest - with weak reference support #27

Closed


SteveLasker
Contributor

The OCI artifact manifest provides a means to define a wide range of artifacts, including a chain of dependencies of related artifacts. It provides a means to define multiple collections of types, including blobs, dependent artifacts and referenced artifacts. It expands on the OCI Artifacts work based on oci.image.manifest and addresses the challenges previously attempted with the image index.

This is an initial PR for discussion.

Signed-off-by: Steve Lasker <[email protected]>
@dmcgowan
Member

dmcgowan commented Jan 28, 2021

This first draft is looking good. I have two pieces of high level feedback advocating to keep as much simplicity in the manifest as possible.

  1. I see there are 3 different keys to represent lists of referenced objects. These could be categorized into two types of references: weak references and strong references. Strong references represent objects which must be kept around as long as the manifest exists. Weak references represent objects the manifest associates with, and the manifest may not need to be kept around if none of its weak references still exist. I think adding the weak reference type (called "dependencies" now, though the name seems to be up for debate) is a good addition to the data model and solves the mentioned use cases well. I don't see a need for multiple different keys to represent the strong references, since dividing objects into one or the other could be considered metadata. Which brings me to my second point...
  2. Keep the manifest definition as simple as possible and use annotations for metadata. Each type may use metadata differently, and the metadata may not be relevant for the distribution of content. The artifact type and artifact name could be defined in an annotation if necessary, but normally the config media type will need to be checked by clients for artifact compatibility. Having two fields which define the same property and must agree just opens up more cases to check and error out. For images, since this is a new manifest type, there is no need to treat layers (or "blobs") as special-case references.
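
The weak/strong distinction in point 1 can be sketched as a pair of garbage-collection helpers. This is a minimal illustration, not part of the proposal: the `Manifest` type and its `strong` and `weak` fields are hypothetical names chosen only to show the two categories.

```python
from dataclasses import dataclass, field


@dataclass
class Manifest:
    digest: str
    # Hypothetical field: objects that must be retained while this manifest exists.
    strong: list = field(default_factory=list)
    # Hypothetical field: objects this manifest merely associates with.
    weak: list = field(default_factory=list)


def retained_objects(manifests):
    """Digests that must be kept because a live manifest strongly references them."""
    keep = set()
    for m in manifests:
        keep.add(m.digest)
        keep.update(m.strong)
    return keep


def may_collect(manifest, existing):
    """A manifest becomes collectable once it has weak references and none still exist."""
    return bool(manifest.weak) and not any(d in existing for d in manifest.weak)


image = Manifest("sha256:image", strong=["sha256:layer1"])
signature = Manifest("sha256:sig", strong=["sha256:sigblob"], weak=["sha256:image"])

store = {"sha256:image", "sha256:layer1", "sha256:sig", "sha256:sigblob"}
before = may_collect(signature, store)                    # image still present
after = may_collect(signature, store - {"sha256:image"})  # image deleted
```

Under these assumptions the signature stays alive while the image exists and becomes eligible for collection once the image is gone; an ordinary image with no weak references is never collected by this rule.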

@SteveLasker
Contributor Author

Thanks, @dmcgowan,
KISS (Keep It Silly Simple) is definitely a key goal. The manifest should provide just enough information to do what needs to be done, enabling registries to work generically over different artifacts, while providing client tools the info they need to work with their specific artifacts.

> These could be categorized into two types of references, weak references and strong references.

We've played with the collections a few ways, including a single collection that contained a direction and strictness (weak/strong, hard/soft, loose/tight?).

Again, please don't read too much into the names: we wanted to figure out whether the structure worked first and settle on proper names later. Here is an example of the single collection:

{
  "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
  "artifactType": "application/vnd.cncf.notary.v2",
  "config": {
    "mediaType": "application/vnd.cncf.notary.config.v2",
    "digest": "sha256:b5b2b2c507a0944348e0303114d8d93aaaa081732b86451d9bce1f432a537bc7",
    "size": 102
  },
  "references": [
    {
      "mediaType": "application/vnd.cncf.notary.v2.json",
      "digest": "sha256:9834876dcfb05cb167a5c24953eba58c4ac89b1adf57f28f2f9d09af107ee8f0",
      "size": 32654,
      "direction": "child",
      "strictness": "weak|strong"
    }
  ]
}

The problem with a single collection:

  • Mixes content stored as blobs (layers/blobs/config) with content stored as a manifest (links to an image)
  • MediaType could be parsed to understand which mediaTypes would be stored as manifests vs. blobs, but that requires the registry to know about all the mediaTypes. Again, something we're trying to avoid as file systems can store any file. The file.extension is optional with an optional way to register how extensions are visualized.

By using different collections to represent:

  • blobs of content: to make up the thing, stored as blobs (downward references)
  • dependencies, dependent-upon, required/s: which are reverse pointers. This avoids a property required for tracking when/if it should be deleted
  • references, weak-references, loose-references: which are the things that are good to know, enabling validation, copying and visualization, but don't get tracked for deletion. Although, a client CLI could parse these and ask the registry to delete these if that's the experience wanted.
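
As a rough sketch of why separate collections help, the function below decides which store a registry would look in purely from the collection a descriptor appears in, never from its mediaType. The collection names `blobs`, `dependencies` and `references` are the placeholder names from this comment, and `storage_plan` and the short digests are hypothetical.

```python
def storage_plan(manifest):
    """Map each descriptor digest to the store a registry would consult.

    The decision uses only the collection name, so the registry does not
    need to recognize any artifact-specific mediaType.
    """
    plan = {}
    config = manifest.get("config")
    if config:
        plan[config["digest"]] = "blob-store"
    for desc in manifest.get("blobs", []):
        plan[desc["digest"]] = "blob-store"
    # Both of these collections point at manifests; only "dependencies"
    # would take part in deletion tracking.
    for key in ("dependencies", "references"):
        for desc in manifest.get(key, []):
            plan[desc["digest"]] = "manifest-store"
    return plan


signature = {
    "config": {"mediaType": "application/vnd.cncf.notary.config.v2", "digest": "sha256:cfg"},
    "blobs": [{"mediaType": "application/vnd.cncf.notary.v2.json", "digest": "sha256:sig"}],
    "dependencies": [
        {"mediaType": "application/vnd.oci.image.manifest.v1+json", "digest": "sha256:img"}
    ],
}
plan = storage_plan(signature)
```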

metadata

There are lots of interesting metadata scenarios. Some metadata is known at the time the artifact is submitted; other metadata arrives later. The problem is how to address metadata added afterwards, as we need a way to add it without changing the digest. I'm punting this to the OCI Metadata Service round of discussions.

artifactType/mediaType

Not all artifacts will have config, yet different artifacts may share the same config schema. In ORAS, we had to explicitly enable the scenario of defining a manifest.config.mediaType, without having a config blob. But, this was mostly a concession to avoid having to rev the image-index.manifest to identify what type of artifact the schema represented.

Since we're defining a new manifest, it seemed time to lift the artifactType property to the root. This enables the manifest.config.mediaType to be decoupled from the manifest.artifactType, allowing them to rev, or even be defined independently. I could see a few different artifact types sharing the same config schema, such as different ways to represent images with different compression formats, or even the new IBM z/OS types.
If/when we get a clean new artifact.manifest, I could see all non-container image artifacts moving from the current oci-image manifest to this artifact.manifest, as they'd have more freedom to define references and clean up other aspects. Who knows, maybe OCI Image manifest v2 might switch as well...

> there is no need to treat layers (or "blobs") as special case references. [from dependencies]

You're correct, these are both "hard" references from the artifact manifest perspective. The two main differences noted above are directionality for ref-counting, and whether the registry looks in the blob store or the manifest store for content.

@sudo-bmitch

Will references in an artifact always be local to the current repository? I think Helm breaks that logic with image references, but Helm also breaks a lot of the artifact logic since the image names themselves could be templated to point to a different location by a values.yaml after pulling the chart, so it may be worth excluding Helm image references from any of the chained reference logic. I suspect having everything in the same repo is important for GC, but also useful for portability of manifests and their artifacts.

@sudo-bmitch

What are the methods to look up an artifact? Is it only with a query using a manifest sha, or can tags point to artifacts? If so, do we need to namespace a tag lookup to the type of artifact we are looking for, so that artifacts with tags don't accidentally collide with images or other artifacts in the same repo? That question comes from looking at how TUF could possibly be implemented with artifacts: they may want, for example, a "snapshot" TUF artifact in a repo that applies to multiple manifests and that they can look up and update at any time.

@nishakm left a comment

This is pretty comprehensive from the CNAB point of view. I think more information is needed at the container level. Can I make use of your examples to play around with scenarios? I've put some of the scenarios in comments on artifact-manifest.md.

@@ -0,0 +1,462 @@
# OCI Artifact Manifest

The OCI artifact manifest provides a means to define a wide range of artifacts, including a chain of dependencies of related artifacts. It provides a means to define multiple collections of types, including blobs, dependent artifacts and referenced artifacts.

I think what we want is a list of related artifacts. The SBoM of the artifact would typically have the name of the artifact and its relation to other artifacts defined in the manifest. The SPDX spec has a comprehensive list of possible relationships to pick from https://spdx.github.io/spdx-spec/7-relationships-between-SPDX-elements/. Defining "dependencies" and "provides" relationships in the SBoM takes the burden away from defining it here.

Contributor Author

My working assumption is that an SBoM would be represented in a registry as a new artifact, with a dependency on the artifact it is an SBoM of. Essentially, it would be the same as a Notary v2 signature.
The SBoM artifact would have its own blobs, optionally a config, and a dependency (manifest entry in the latest example). It would not have a tag, as the tag of an SBoM isn't really interesting: the SBoM is an "enhancement" to an existing tagged artifact in a repo.


To support artifact movement to various registry and namespace structures, the registry and path must not be embedded within the artifact definition. Client CLIs and configurations will provide default locations and mappings for where to find the referenced content.

Artifacts that reference other artifacts must include an OCI Artifact Descriptor which includes the `manifest type`, `digest`, `size` and `repo:tag` of the artifact, however it will defer resolution of the reference to client tools that MAY reconstitute the references from multiple repositories and/or registries.

It sounds like we expect artifacts to be identified by repo:tag. So if we have a container image wordpress:v5 do we expect a related SBoM to be named wordpress-sbom:v5? Can we upload singleton blobs and then reference that in wordpress-bundle:v5?

Contributor Author

Artifacts that are things customers want to directly reference would have a :tag. However, I'm trying to identify enhancements (SBoM, Signatures, GPL source, ...) as things that are dependent upon another artifact, but don't really get shown on their own.
Notice the difference with this image: https://github.com/opencontainers/artifacts/raw/8760ac5e42bc7a36802559481b34e2f4e8584492/artifact-manifest/media/repo-listing-flat.svg

and this image:

https://github.com/opencontainers/artifacts/raw/8760ac5e42bc7a36802559481b34e2f4e8584492/artifact-manifest/media/repo-listing-attributed.svg

The first shows the artifacts as an equal collection. The second shows a set of artifacts as attributes to the thing they're enhancing.


If I think about this scheme from the perspective of a user who does not want to see the enhancements until they have to, the query would be something like "do you have the signatures/SBoM/sources for mysql@sha256:digest?". By what mechanism would a client find out when "pull mysql:8" downloads an image manifest with no reference to the other enhancements? Would we need another endpoint for this?

Contributor Author

I'll add some examples for the list APIs to get the content. But, yes, there will be a new listing API as we must support push, discover, pull to complete the experiences.
The premise is a client can ask for artifactType of application/vnd.openssf.sbom.v1+json to get just the SBoM documents for the mysql:8 image. You could also query by the digest of the MySQL image.
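
The listing API was not yet defined at this point in the thread, so the following is only an in-memory sketch of the filter such an API might apply. The `find_enhancements` helper, the repository layout and the placeholder digests are hypothetical; the `dependencies` key follows the draft under review.

```python
def find_enhancements(manifests, target_digest, artifact_type=None):
    """Return manifests that declare a dependency on target_digest,
    optionally narrowed to a single artifactType."""
    hits = []
    for m in manifests:
        depends = any(d["digest"] == target_digest for d in m.get("dependencies", []))
        if depends and (artifact_type is None or m.get("artifactType") == artifact_type):
            hits.append(m)
    return hits


MYSQL = "sha256:mysql8"  # placeholder digest standing in for the mysql:8 image
repo = [
    {"artifactType": "application/vnd.openssf.sbom.v1+json", "dependencies": [{"digest": MYSQL}]},
    {"artifactType": "application/vnd.cncf.notary.v2", "dependencies": [{"digest": MYSQL}]},
    {"artifactType": "application/vnd.cncf.notary.v2", "dependencies": [{"digest": "sha256:other"}]},
]

sboms = find_enhancements(repo, MYSQL, "application/vnd.openssf.sbom.v1+json")
everything = find_enhancements(repo, MYSQL)
```

Querying by digest alone returns every enhancement of the image; adding the artifactType narrows the result to just the SBoM documents.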


![notary v2 signature](media/notaryv2-signature.svg)

The Notary v2 signature would reference an artifact, such as the `wordpress:v5` image above. Notice the directionality of the references. One or more signatures may be added to a registry after the image was persisted. While an image knows of its layers, and a Notary v2 signature knows of its config and blob, the Notary v2 signature declares a dependency on the artifact it's signing. The visualization indicates the references through solid lines as these reference types are said to be hard references. Just as the layers of an OCI Image are deleted (*ref-counted -1*), the blobs of a signature are deleted (*ref-counted -1*) when the signature is deleted. Likewise, when an artifact is deleted, the signature would be deleted (*ref-counted -1*) as the signatures have no value without the artifact they are signing.
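
A toy model of the deletion behavior described above, offered as a sketch under simplifying assumptions: the `Repo` class is hypothetical, blobs are ref-counted, and a dependent artifact is removed as soon as the artifact it depends on is deleted.

```python
class Repo:
    """Toy repository tracking manifests, blob ref-counts and reverse dependencies."""

    def __init__(self):
        self.manifests = {}  # digest -> {"blobs": [...], "dependencies": [...]}
        self.blob_refs = {}  # blob digest -> reference count

    def put(self, digest, blobs=(), dependencies=()):
        self.manifests[digest] = {"blobs": list(blobs), "dependencies": list(dependencies)}
        for b in blobs:
            self.blob_refs[b] = self.blob_refs.get(b, 0) + 1

    def delete(self, digest):
        m = self.manifests.pop(digest, None)
        if m is None:
            return
        # Ref-count -1 on every blob; drop blobs nothing references any more.
        for b in m["blobs"]:
            self.blob_refs[b] -= 1
            if self.blob_refs[b] == 0:
                del self.blob_refs[b]
        # Artifacts that depend on the deleted one have no value without it.
        dependents = [d for d, other in self.manifests.items() if digest in other["dependencies"]]
        for d in dependents:
            self.delete(d)


repo = Repo()
repo.put("image-a", blobs=["layer1", "shared"])
repo.put("image-b", blobs=["shared"])
repo.put("sig-a", blobs=["sigblob"], dependencies=["image-a"])
repo.delete("image-a")
```

After the delete, `image-b` and the `shared` layer survive, while `sig-a` and its blob are gone along with `image-a`.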

In a situation where someone uses an image, say debian:bullseye, which comes with a list of signatures, adds dependencies for their middleware, say Go, and then pushes the image as golang-debian:1.16, would the debian signatures still be linked? How can one verify where the non-golang dependencies came from? Do those signatures get added as well? My concern is that each layer in the container image has its own build and release pipeline, and these can get pretty complicated (the wordpress Dockerfile is actually a good example of how complicated).

Contributor Author

I know some early versions of the tern SBoM focused on adding layers to an image. That has some challenges as you're changing the image-spec, and breaking the digest/tag of the thing you're adding.

In this design, you add the SBoM as a new artifact, and add a reference to the other manifest. See the notary.v2 signature of mysql:8 example.
This keeps the artifacts separate, but referenced, enabling the current image runtimes to continue working, while adding SBoMs that are also signed.

"digest": "sha256:b5b2b2c507a0944348e0303114d8d93aaaa081732b86451d9bce1f432a537bc7",
"size": 102
},
"blobs": [

Suppose Notary signatures were used to sign the base OS of mysql:8, I would imagine the blob and dependency list would be appended to in building the final mysql:8 image. Do you think so? In which case, how would a client know which signature is for which image?

Contributor Author

We haven't targeted layer signing yet.

"size": 32654
}
],
"dependencies": [

It seems to me the relationship between the artifacts needs more description at this level. I was hoping the SBoM would provide that but perhaps that's something that can be helpful to clients.

Contributor Author

I need to add more descriptive text to the examples we've been iterating upon.


```json
{
"schemaVersion": 2,


Everything here and below will need to be "schemaVersion": 3, I think, as these are breaking changes?

Contributor Author

This is actually a new schema as we're introducing a new oci.artifact.manifest that would rev independently from oci.image and oci.index.

Since the oci.artifact.manifest is versioned in the mediaType=application/oci.artifact.manifest.v1, I've removed the schemaVersion element.


No, that's not how schemaVersion works.

Contributor Author

I've added "schemaVersion": 1 back in as the first schemaVersion for oci.artifact.manifest.

],
"references": [
{
"artifact": "wordpress:5.7",


I don't like tags in here. Not sure about repositories...

Does the repo refer to a sibling repository? Child repository? Top-level?

Can this just be under a org.opencontainers.image.ref.name annotation?

Contributor Author

I'll have an update later this week that will use a mix of annotations and descriptors to clean up the references.


The more I think about the repository thing, the more I dislike it. Repositories are a natural security boundary. You're introducing a multi-repository artifact, which presents a lot of problems in terms of authentication and management.

We removed catalog because it violated these boundaries.

Cross-repository mounting is the only thing similar to this that remains, and the authentication for that is really hard to get right. This worries me.

Contributor Author

So, a helm chart, a CNAB and other artifacts are already cross repository artifact types. But, we are describing these as loose references that the client can validate independently. The blobs and manifests that have a depends-on annotation are hard dependencies and must be in the same repo, and must exist to complete the manifest put.
So, I think we're straddling this problem, supporting the artifact types, while not putting an undue burden on auth boundaries that don't already exist today.
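
The rule that hard dependencies must exist in the same repo before a manifest put completes could be checked roughly as follows. This is a sketch under assumptions: `validate_put` is a hypothetical helper, the repo is modeled as a plain set of digests, and the `manifests` collection with the `oci.distribution.relationship` annotation follows the draft discussed in this PR.

```python
def validate_put(manifest, repo_contents):
    """Return digests of missing hard dependencies.

    An empty list means the put may proceed. Loose references are ignored
    here because clients, not the registry, validate them.
    """
    required = []
    if "config" in manifest:
        required.append(manifest["config"]["digest"])
    required.extend(d["digest"] for d in manifest.get("blobs", []))
    for desc in manifest.get("manifests", []):
        relationship = desc.get("annotations", {}).get("oci.distribution.relationship")
        if relationship == "depends-on":
            required.append(desc["digest"])
    return [d for d in required if d not in repo_contents]


manifest = {
    "config": {"digest": "sha256:cfg"},
    "blobs": [{"digest": "sha256:b1"}],
    "manifests": [
        {"digest": "sha256:img",
         "annotations": {"oci.distribution.relationship": "depends-on"}},
        {"digest": "sha256:elsewhere",
         "annotations": {"oci.distribution.relationship": "references"}},
    ],
}
repo = {"sha256:cfg", "sha256:b1"}
missing = validate_put(manifest, repo)
```

The loosely referenced `sha256:elsewhere` never blocks the put, which is what keeps cross-repository references outside the registry's auth boundary in this model.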


What would the client do with a helm cross repository artifact reference?

I ask since the values for the helm template could override the default tag or even use a different registry server, and we can't (or at least shouldn't) template the loose reference in the registry, so I feel like any helm client should ignore this field. If we use the field for signing, does a different helm values invalidate the signed chart? If we use the field for mirroring, would mirroring tools potentially copy images that go unused?

I'm having a hard time seeing the value add for including a potentially templated field into a non-templated registry object that doesn't introduce the risk of breakage.

@bergwolf

bergwolf commented Feb 2, 2021

The Artifact Manifest idea looks great! It helps to attach artifacts to images without affecting existing container images and thus keeps backward compatibility.

What's more, it looks super easy to add new artifact types based on the artifact manifest. For example, a nydus artifact manifest would look like:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
  "artifactType": "application/vnd.cncf.nydus.v1",
  "config": {
    "mediaType": "application/vnd.oci.image.manifest.v1.config.json",
    "digest": "sha256:9e988712154fcc2ceda5602eb1d98c1f28299ba6fbf0be49d3717c35a2d76674",
    "size": 1102
  },
  "blobs": [
    {
      "mediaType": "application/vnd.cncf.nydus.bootstrap.v1.tar+gzip",
      "digest": "sha256:f6bb0822fe567c98959bb87aa316a565eb1ae059c46fa8bba65b573b4489b44d",
      "size": 32654
    },
    {
      "mediaType": "application/vnd.cncf.nydus.blob.v1",
      "digest": "sha256:f6bb0822fe567c98959bb87aa316a565eb1ae059c46fa8bba65b573b4489b44d",
      "size": 72832
    },
    {
      "mediaType": "application/vnd.cncf.nydus.blob.v1",
      "digest": "sha256:f6bb0822fe567c98959bb87aa316a565eb1ae059c46fa8bba65b573b4489b44d",
      "size": 928324
    }
  ],
  "references": [
    {
      "artifact": "mysql:8",
      "artifactType": "application/vnd.oci.image.manifest.v1.config.json",
      "mediaType": "application/vnd.oci.image.manifest.v1.config.json",
      "digest": "sha256:3c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c6b",
      "size": 16724
    }
  ]
}

For those who are not familiar with nydus, it is an image acceleration service that greatly reduces container image pull time by reading image contents on demand when a container starts. It is currently widely used in both Alibaba and Ant Group. nydus is open source and maintained as part of the CNCF incubator project Dragonfly. That's why I'm suggesting the same application/vnd.cncf prefix as the other artifact types.

The nydus artifact manifest follows the same schema for other artifact types, while a new artifact type and two media types are added:

  • "artifactType": "application/vnd.cncf.nydus.v1"
  • "mediaType": "application/vnd.cncf.nydus.bootstrap.v1.tar+gzip"
  • "mediaType": "application/vnd.cncf.nydus.blob.v1"

It also has a references relationship with the original mysql:8 container image. This information would help a registry index and show the relationships between different images, and help a container runtime choose whether to launch containers with image acceleration.

Currently we have a nydus image annotation/os.feature hack to hide nydus image details from the registry. However, with the OCI artifact manifest, we can abandon that hack and have the registry support nydus natively and smoothly.

@SteveLasker would it make sense to list nydus as one of the supported artifact types to show how the artifact manifest spec can help other artifact types?

@SteveLasker
Contributor Author

> would it make sense to list nydus as one of the supported artifact types to show how the artifact manifest spec can help other artifact types?

Yes, it would be great to add the example, as it's a new example I hadn't yet thought of. But, if it fits, even better. Let me digest a bit more and align with the new-new manifests collection with an annotation for references.

"digest": "sha256:3c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c6b",
"size": 16724,
"annotations": {
"oci.distribution.relationship": "depends-on"


Hard dependencies affecting functions like garbage collection should not be in annotations, which is an optional section.

Contributor Author

This is a good point. Having annotations for client or even filtering information is super interesting. Having a registry implement garbage collection and ref-counting based on annotations seems less structured; that information should be surfaced as a first-class property.
I'll make another PR with some comparison examples.

Contributor Author

See latest PR commit for A/B options:

@bergwolf

bergwolf commented Feb 3, 2021

@SteveLasker Thanks a lot! I tried to amend the above nydus artifact manifest example to fit the new-new manifests collection with an annotation for references. Please see if I understand the new schema correctly:

{
  "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
  "artifactType": "application/vnd.cncf.nydus.v1",
  "config": {
    "mediaType": "application/vnd.oci.image.manifest.v1.config.json",
    "digest": "sha256:9e988712154fcc2ceda5602eb1d98c1f28299ba6fbf0be49d3717c35a2d76674",
    "size": 1102
  },
  "blobs": [
    {
      "mediaType": "application/vnd.cncf.nydus.bootstrap.v1.tar+gzip",
      "digest": "sha256:f6bb0822fe567c98959bb87aa316a565eb1ae059c46fa8bba65b573b4489b44d",
      "size": 32654
    },
    {
      "mediaType": "application/vnd.cncf.nydus.blob.v1",
      "digest": "sha256:f6bb0822fe567c98959bb87aa316a565eb1ae059c46fa8bba65b573b4489b44d",
      "size": 72832
    },
    {
      "mediaType": "application/vnd.cncf.nydus.blob.v1",
      "digest": "sha256:f6bb0822fe567c98959bb87aa316a565eb1ae059c46fa8bba65b573b4489b44d",
      "size": 928324
    }
  ],
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:8c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c31",
      "size": 1578,
      "annotations": {
        "oci.distribution.relationship": "references",
        "oci.distribution.artifact": "mysql:8",
        "oci.distribution.artifactType": "application/vnd.oci.image.v1"
      }
    }
  ]
}

To explain it in more detail:

  • the nydus artifact manifest is identified by the application/vnd.cncf.nydus.v1 artifact type;
  • each nydus image has a bootstrap layer (a metadata pack for the container rootfs file system) and one or more blob layers (compressed/chunked container rootfs data). We use two new media types (application/vnd.cncf.nydus.bootstrap.v1.tar+gzip and application/vnd.cncf.nydus.blob.v1) to describe them;
  • blobs of both nydus media types can be ref-counted just like blobs of other media types;
  • the difference between the two new nydus media types is that a container runtime only needs to pull the small nydus bootstrap before starting a container, and the nydus blobs can be fetched from the registry in a deferred, on-demand manner;
  • it has a reference relationship with the original mysql:8 image, which can be persisted either within the same registry or in a different registry.

With such an artifact manifest,

  • At the registry side, it can list/show a container image together with its nydus accelerated image.
  • At container runtime side, when given a nydus artifact manifest as a container image, it can choose to either pull the nydus image to start container quickly, or just pull the original container image if nydus components are not available.
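
The runtime-side choice can be sketched like this. It is an illustration only: `choose_image` is a hypothetical helper, not nydus or containerd code, and it assumes the manifest layout from the example above, with short placeholder digests.

```python
NYDUS_TYPE = "application/vnd.cncf.nydus.v1"
BOOTSTRAP_TYPE = "application/vnd.cncf.nydus.bootstrap.v1.tar+gzip"


def choose_image(manifest, nydus_available):
    """Pick what to pull before starting a container.

    With nydus available, only the bootstrap is pulled up front and the data
    blobs are left for on-demand fetching. Otherwise fall back to the
    original image referenced by the manifest.
    """
    if nydus_available and manifest.get("artifactType") == NYDUS_TYPE:
        bootstrap = [b["digest"] for b in manifest.get("blobs", [])
                     if b["mediaType"] == BOOTSTRAP_TYPE]
        return {"mode": "nydus", "pull_now": bootstrap}
    for desc in manifest.get("manifests", []):
        relationship = desc.get("annotations", {}).get("oci.distribution.relationship")
        if relationship == "references":
            return {"mode": "original", "pull_now": [desc["digest"]]}
    raise ValueError("manifest offers neither a nydus image nor an original image")


manifest = {
    "artifactType": NYDUS_TYPE,
    "blobs": [
        {"mediaType": BOOTSTRAP_TYPE, "digest": "sha256:bootstrap"},
        {"mediaType": "application/vnd.cncf.nydus.blob.v1", "digest": "sha256:data"},
    ],
    "manifests": [
        {"mediaType": "application/vnd.oci.image.manifest.v1+json",
         "digest": "sha256:mysql8",
         "annotations": {"oci.distribution.relationship": "references"}},
    ],
}
fast = choose_image(manifest, nydus_available=True)
fallback = choose_image(manifest, nydus_available=False)
```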


### Helm Charts & CNAB

A Helm chart can represent the images it references within the chart. These references are loose references as they may be persisted in different registries, or may change as a values file is updated. However, the chart may also be persisted together as a collection of artifacts in a registry. The lines are dotted to represent the loose reference. Deleting the `wordpress-chart:v5` may, or may not delete the images as the images have value unto themselves.


Does it mean the manifest of the chart in this PR only reflects the case when the images and chart are persisted as a collection and the images are NOT loosely referenced?

Contributor Author

This is an area of churn, and I'm going to create two options for how we can support the two distinct scenarios.
A Notary v2 signature and an SBoM are enhancements to an existing artifact. They depend upon the thing they're enhancing to have meaning. When the artifact they enhance is deleted, these artifacts would also be deleted.

However, in the Helm case, a helm chart can reference other images, which may, or may not be stored in the same registry. The combination of the digest and artifact name are ways to identify it as a unique entity (digest) or stable/updateable tag. The registry will store these values in the manifest, but won't do anything, unless a client tells it to do something.

For instance, compared to the Notary & SBoM example above, a helm chart can be copied to another registry or deleted without impacting the images they reference.
However, the oci-reg CLI (imagined) would be able to read the manifest and do a pull and/or copy of the helm chart and its referenced images. If the client says use-digest, a registry can identify the image by its digest as it's unique, regardless of where in the registry it lives. If it says use-tag, it would need a registry.config to help find that tag.
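
Since oci-reg is an imagined CLI, the following only sketches the two resolution modes. `resolve_reference`, the shape of the `locations` mapping (standing in for registry.config) and the example registry host are all hypothetical; the `artifact` and `digest` fields follow the reference descriptors shown earlier in this PR.

```python
def resolve_reference(desc, mode, locations=None):
    """Resolve a loose reference either by digest or by tag.

    use-digest: the digest is unique, so no location mapping is needed.
    use-tag: a client-side mapping must say where the named artifact lives.
    """
    if mode == "use-digest":
        return desc["digest"]
    if mode == "use-tag":
        name, _, tag = desc["artifact"].rpartition(":")
        if not locations or name not in locations:
            raise LookupError(f"no configured location for {name!r}")
        return f"{locations[name]}:{tag}"
    raise ValueError(f"unknown mode {mode!r}")


reference = {"artifact": "wordpress:5.7", "digest": "sha256:abc"}
by_digest = resolve_reference(reference, "use-digest")
by_tag = resolve_reference(
    reference, "use-tag", {"wordpress": "registry.example.com/apps/wordpress"}
)
```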

@reasonerjt

@SteveLasker For some of the artifacts such as CNAB or Helm charts (v3), there are tools to package them as OCI artifacts and store them in OCI registries such as helm v3, so does this PR mean to introduce a breaking change to the existing artifacts?

"size": 16724
}
],
"manifests": [

@reasonerjt reasonerjt Feb 3, 2021


Does it assume the referenced manifest is persisted in the same registry?
This breaks a very common usage pattern in which the referenced images are set in the values.yaml of a chart, and the deployer of the chart sets the images when running helm install.

Contributor Author

The idea is that they can be stored in the same registry and the registry can understand the references generically. This enables oci-reg copy scenarios and optional oci-reg delete --with-references scenarios. But you're correct that this information is duplicated in the helm chart, as this artifact.manifest didn't (and still doesn't) exist yet. If/when this artifact.manifest gets adopted, would a combination of this manifest reference and oci-reg.config enable helm charts to evolve so that the chart references the thing in the manifest? In theory, this would make it far easier to move charts across registries without having to change the chart.

Contributor Author

@SteveLasker left a comment

Based on feedback around the types of collections, I've incorporated the following into the latest:

  • Changed dependencies to manifests. This represents an oci.artifact.manifest to have "dependencies" on a collection of blobs, a collection of manifests, and an optional config blob.
  • Added A/B options for splitting out the references.
    • OPTION A uses the manifests collection with an annotation: "oci.distribution.relationship": "depends-on". The pros are this is a single manifests collection. The cons are it's not a great design for a registry to read an annotation for managing garbage collection of core components.
    • OPTION B creates a second references collection of manifests. All descriptors in this collection are loose/weak references that may not be resolved in the registry, and are not subject to garbage collection. A client MAY delete-references, as noted in the oci-reg cli example.
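
To make the A/B trade-off concrete, here is a small sketch of how a registry might extract the hard dependencies under each option. The two helper functions and the placeholder digests are hypothetical; the collection and annotation names come from the options listed above.

```python
def hard_dependencies_option_a(manifest):
    """Option A: one manifests collection; the registry must read an annotation."""
    return [
        d["digest"]
        for d in manifest.get("manifests", [])
        if d.get("annotations", {}).get("oci.distribution.relationship") == "depends-on"
    ]


def hard_dependencies_option_b(manifest):
    """Option B: everything in manifests is hard; loose entries live in references."""
    return [d["digest"] for d in manifest.get("manifests", [])]


option_a = {
    "manifests": [
        {"digest": "sha256:img",
         "annotations": {"oci.distribution.relationship": "depends-on"}},
        {"digest": "sha256:chart-img",
         "annotations": {"oci.distribution.relationship": "references"}},
    ]
}
option_b = {
    "manifests": [{"digest": "sha256:img"}],
    "references": [{"digest": "sha256:chart-img"}],
}
deps_a = hard_dependencies_option_a(option_a)
deps_b = hard_dependencies_option_b(option_b)
```

Both shapes yield the same hard dependency, but only Option A makes garbage collection depend on the optional annotations section.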

@SteveLasker
Contributor Author

> @SteveLasker For some of the artifacts such as CNAB or Helm charts (v3), there are tools to package them as OCI artifacts and store them in OCI registries such as helm v3, so does this PR mean to introduce a breaking change to the existing artifacts?

You could think of it as a new version. Let's say a new version of CNAB and/or Helm could use this new manifest, but that's a choice for these communities to make. As with anything that's already shipped, with limitations, it's always a choice whether a change provides enough value. It's my hope that references solve many of the limitations of Artifacts v1, which is based on the image-manifest, enough to make the change worthwhile, and that it can be tooled in such a way that it adds enough value with minimal breaking-change implications.

@josephschorr
Contributor

@SteveLasker Overall looks good. My one question/slight concern is around references: with the above (unless I'm mistaken) it would be possible to form a chain of dependent manifests as large or as long as clients specify. Do we have any concerns about this proving hard to mirror? The one major benefit of the current manifest list design is that it is (relatively) flat: a tag points to a single list and the list points to a set of manifests, but it cannot go beyond that.

What is the expected direction for tooling that will need to mirror the current manifests? Walk the manifests to determine the full set to replicate?

@jonjohnsonjr

The one major benefit of the current manifest list design is that it is (relatively) flat: a tag points to a single list and the list points to a set of manifests, but it cannot go beyond that.

This is not the case:

From https://github.com/opencontainers/image-spec/blob/master/image-index.md#image-index-property-descriptions

This descriptor property has additional restrictions for manifests. Implementations MUST support at least the following media types:

application/vnd.oci.image.manifest.v1+json

Also, implementations SHOULD support the following media types:

application/vnd.oci.image.index.v1+json (nested index)

Image indexes concerned with portability SHOULD use one of the above media types. Future versions of the spec MAY use a different mediatype (i.e. a new versioned format). An encountered mediaType that is unknown to the implementation MUST be ignored.

And the diagram from https://github.com/opencontainers/image-spec/blob/master/media-types.md#relations

[image: media type relations diagram]

@sudo-bmitch

sudo-bmitch commented Feb 7, 2021

@josephschorr

What is the expected direction for tooling that will need to mirror the current manifests? Walk the manifests to determine the full set to replicate?

My own intention is to support recursive copies with a user configurable max depth on the recursion. And if the max depth is hit, send back a warning or error.

I'd also want to handle the types of artifacts differently (allowing users to filter in/out). For example, they may want to mirror images and notary signatures that point to those images, but may not be interested in mirroring helm charts that point to the image (along with everything else that helm chart points to).

And I'd want some directionality on the references for mirroring, e.g. helm charts may mirror child images, but child images don't mirror parent helm charts.

That assumes it makes sense to link helm charts and images, which I'm still uncertain of (charts can template an image name, and artifacts don't template a reference).
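
The recursive copy described in this comment (a configurable max depth, a filter on artifact types, and parent-to-child directionality) could be sketched roughly as follows. This is not the author's tooling: the in-memory `store`, the `artifactType` field, `collect_for_copy`, and `MaxDepthExceeded` are all hypothetical names used for illustration, and a real client would fetch manifests from a registry instead of a dict.

```python
from collections import deque


class MaxDepthExceeded(Exception):
    """Raised when the reference chain is deeper than the user allows."""


def collect_for_copy(store, root, max_depth, allowed_types=None):
    """Return the digests to copy, walking only parent-to-child references.

    store maps a digest to a dict with "artifactType" and a "manifests"
    list of child digests (a stand-in for registry lookups).
    """
    order = []
    visited = {root}  # guards against circular references
    queue = deque([(root, 0)])
    while queue:
        digest, depth = queue.popleft()
        manifest = store[digest]
        # Skip artifact types the user filtered out, along with their children.
        if allowed_types is not None and manifest["artifactType"] not in allowed_types:
            continue
        if depth > max_depth:
            raise MaxDepthExceeded(f"{digest} is at depth {depth}, limit is {max_depth}")
        order.append(digest)
        for child in manifest.get("manifests", []):
            if child not in visited:
                visited.add(child)
                queue.append((child, depth + 1))
    return order


# Hypothetical repository content: a chart pointing at an image, plus a cycle.
store = {
    "chart": {"artifactType": "helm", "manifests": ["image"]},
    "image": {"artifactType": "image", "manifests": ["base"]},
    "base": {"artifactType": "image", "manifests": ["image"]},
}
```

Because the walk only follows each manifest's own `manifests` list, copying `image` never pulls in `chart`, which matches the directionality described above. Raising on the depth limit is one choice; a client could equally log a warning and return the partial list.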

@josephschorr
Contributor

This is not the case:

Right... I forgot that it was intended that manifest lists could reference other lists. We just never did so because they were (practically speaking) never used.

My own intention is to support recursive copies with a user configurable max depth on the recursion. And if the max depth is hit, send back a warning or error.

Perhaps we should formalize this a bit then? I could see a scenario where someone pushes a large-chain of manifests to a repository and then, when the repository is mirrored, it fails at that point. Unsure if it would be better to fail at push time, though.

@sudo-bmitch

My own intention is to support recursive copies with a user configurable max depth on the recursion. And if the max depth is hit, send back a warning or error.

Perhaps we should formalize this a bit then? I could see a scenario where someone pushes a large-chain of manifests to a repository and then, when the repository is mirrored, it fails at that point. Unsure if it would be better to fail at push time, though.

The tooling I'm working with is strictly client side, so a server decision to fail would be separate. I could see this being enforced by the registry similar to how user namespaces as the first part of the repository path is enforced by many registries, which is separate from the spec. I'd be interested in the http response codes when the server refuses to accept an artifact for reasons like this.

@SteveLasker SteveLasker changed the title from "OCI Artifact Manifest" to "OCI Artifact Manifest - with weak reference support" Feb 10, 2021
@SteveLasker
Contributor Author

I've taken a ton of great feedback that we need more bake-time on the references collection and scenarios, including the registry/repo mapping conversation.

I'll be closing this PR, once I revert a few things. I ask folks to please focus on #29 for feedback, as it has the core link-list of manifests required for Notary v2, SBoM and other linked artifact scenarios like IBM and Google signing solutions.

@SteveLasker
Contributor Author

What is the expected direction for tooling that will need to mirror the current manifests? Walk the manifests to determine the full set to replicate?

I think we have to first define "mirror". Is a mirror at a registry/repo level? Meaning, whatever is in a given repo is mirrored?
Or are we talking about gated mirrors, where the user opts into specific content, likely at the :tag level?
If at the repo, I suspect the client would pull all content in that repo and keep it current. New events or polling a list API, hopefully with a changedSince type parameter would work.
If at the tag, then it could walk the references and the artifacts that have the target tag referenced in the manifests collection.
Since manifests references must be in the same repo, it's less of a concern, as the repo, or the dependencies can be walked.
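
As a rough illustration of the tag-level case, the sketch below resolves a tag and then collects the artifacts in the same repo whose `manifests` collection lists the tagged digest. The `repo` and `tags` dicts and the `referrers` and `tag_mirror_set` helpers are hypothetical stand-ins, not a registry API; a real client would need some way to list or query a repo's manifests.

```python
def referrers(repo, target_digest):
    """Digests of artifacts whose manifests collection points at target_digest."""
    return sorted(
        digest
        for digest, manifest in repo.items()
        if target_digest in manifest.get("manifests", [])
    )


def tag_mirror_set(repo, tags, tag):
    """The tagged manifest plus every artifact in the repo that references it."""
    target = tags[tag]
    return [target] + referrers(repo, target)


# Hypothetical repo: an image with a signature and an SBoM attached to it.
repo = {
    "sha256:img": {"manifests": []},
    "sha256:sig": {"manifests": ["sha256:img"]},
    "sha256:sbom": {"manifests": ["sha256:img"]},
    "sha256:other": {"manifests": []},
}
tags = {"v1": "sha256:img"}
```

Since the referencing artifacts live in the same repo as their target, a single pass over the repo is enough here; no cross-repo lookup is assumed.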

Perhaps we should formalize this a bit then?

I worry about the formalization of a dependency count. In some cases, it makes sense, like the 256 registry/namespace character limit. But npm and other package managers have kinda dealt with this. I see this as a client configuration scenario as it just seems hard to know upfront how the dependencies are either circular and closed or endless. I suspect a registry throttling scenario would solve this, but I'd have to think more. With references on-hold, I'm saving my brain cycles for how to best process this till later.

The tooling I'm working with is strictly client side, so a server decision to fail would be separate. I could see this being enforced by the registry similar to how user namespaces as the first part of the repository path is enforced by many registries, which is separate from the spec. I'd be interested in the http response codes when the server refuses to accept an artifact for reasons like this.

This ^ sounds like something to consider as we revisit this scenario. How do package managers like PyPI, npm, ... manage these types of scenarios?

@SteveLasker
Contributor Author

SteveLasker commented Feb 10, 2021

I'm closing this one as it's no longer the active conversation. See #29 for the more focused, iterative approach. Happy to continue the background conversation here if folks want to keep thinking about it.
I've reverted the changes to the latest thinking on a references collection for weak references, supporting a dependency graph.

9 participants