
Formalize support for zstd compression: v1.1.0 ? #803

Open
thaJeztah opened this issue Apr 20, 2020 · 81 comments

Comments

@thaJeztah
Member

While reviewing moby/moby#40820, I noticed that support for zstd was merged in master (proposal: #787, implementation in #788 and #790), and some runtimes have started implementing it.

However, the current (v1.0.1) image-spec does not yet list zstd as a supported compression format, which means that not all runtimes may support these images, and the ones that do are relying on a non-finalized specification. That limits interoperability (which I think is what this specification was created for in the first place).

I think the current status is not desirable; not only does it limit interoperability (as mentioned), it will also cause complications for Golang projects that use this specification as a dependency: go modules default to the latest tagged release, and some distributions (thinking of Debian) are quite strict about the use of unreleased versions. Golang projects that want to support zstd would either have to "force" go mod to use an unreleased version of the specification, or work around the issue with a custom implementation (similar to the approach that containerd took: containerd/containerd#3649).

In addition to the above, concerns were raised about the growing list of media-types (#791), and suggestions were made to make this list more flexible.

The Image Manifest Property Descriptions section currently states:

Implementations MUST support at least the following media types:

  • application/vnd.oci.image.layer.v1.tar
  • application/vnd.oci.image.layer.v1.tar+gzip
  • application/vnd.oci.image.layer.nondistributable.v1.tar
  • application/vnd.oci.image.layer.nondistributable.v1.tar+gzip

Followed by:

...An encountered mediaType that is unknown to the implementation MUST be ignored.
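To make the ambiguity concrete, here is a minimal sketch of a *literal* reading of these two sentences, assuming an implementation that only knows the four required media types. The names `REQUIRED_LAYER_MEDIA_TYPES` and `partition_layers` are hypothetical, not from any real library.

```python
# The four layer media types the v1.0.1 image-spec says implementations
# MUST support (copied from the list above).
REQUIRED_LAYER_MEDIA_TYPES = frozenset({
    "application/vnd.oci.image.layer.v1.tar",
    "application/vnd.oci.image.layer.v1.tar+gzip",
    "application/vnd.oci.image.layer.nondistributable.v1.tar",
    "application/vnd.oci.image.layer.nondistributable.v1.tar+gzip",
})


def partition_layers(layers, supported=REQUIRED_LAYER_MEDIA_TYPES):
    """Split manifest layer descriptors into (known, ignored) lists.

    Hypothetical helper: under a literal reading of "MUST be ignored",
    descriptors whose mediaType is not in `supported` are set aside
    rather than rejected.
    """
    known, ignored = [], []
    for desc in layers:
        (known if desc["mediaType"] in supported else ignored).append(desc)
    return known, ignored
```

Fed a manifest with one `+gzip` and one `+zstd` layer, this silently keeps only the gzip layer, which is exactly the behaviour the questions below are about.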

This part is a bit ambiguous (perhaps that's just my interpretation of it, though):

  • Should an implementation pull a manifest, and skip (ignore) layers with unknown compression, or should it produce an error?
  • If the +zstd layer mediatype is not in the MUST list, is there any reason for including it in the list of OCI Media Types? After all, any media types not included in the list "could" be supported by an implementation, and must otherwise be ignored.

What's the way forward with this?

  1. Tag current master as v1.1.0, only defining +zstd as a possible compression format for layers, but no requirement for implementations of the v1.1.0 specification to support them
  2. Add the +zstd compression format to the list of required media types, and tag v1.1.0; projects implementing v1.1.0 of the specification MUST support zstd layers, or otherwise implement v1.0.x
  3. Wait for the discussion about "generic" layer types (Taming media types #791, Support is needed for media type extensions #799) to be completed before tagging v1.1.0
  4. Do a v1.1.0 release (1. or 2.), and leave 3. for a future (v1.2.0) release of the specification.

On a side-note, I noticed that the vnd.oci.image.manifest.v1+json was registered, but other mediatypes, including media-types for image layers are not; should they be?

@thaJeztah
Member Author

@jonjohnsonjr @vbatts @mikebrow @dmcgowan @SteveLasker ptal

(not sure if this is the right location for this discussion, or if it should be discussed in the OCI call; I just noticed this, so thought I'd write it down 😬 😅)

@vrothberg
Contributor

Should an implementation pull a manifest, and skip (ignore) layers with unknown compression, or should it produce an error?

I had similar issues interpreting "ignore". The containers/image library errored out for a couple of weeks last year, which blew up for @tych0. Now, it allows for pulling and storing the images.

In case of a call, I will do my best to join.

@thaJeztah
Member Author

I must admit I'm not the most proficient reader of specifications, but good to hear I'm not the only person that was a bit confused by it 😅 (which may warrant expanding that passage a bit to clarify the intent).

I guess "ignoring" will lead to an "error" in any case, because skipping "unknown media types" would likely lead to a failure to calculate the digest 🤔. Still, some more words to explain it would be useful.

@vrothberg
Contributor

Thanks, @thaJeztah! I also felt some relief 😄

@tych0, could you elaborate a bit on your use case? I don't want to break you a second time 👼

@dmcgowan
Member

I'm not sure (3) solves the underlying problem here. That defines a way of understanding the media type, but it doesn't necessarily mean that clients can handle all possible permutations of a media type. The main issue is that if clients start pushing images with zstd compression, older clients (most clients in existence today) will not be able to use them. With that in mind, making it a requirement and releasing 1.1 with this change at least makes that problem more explicit and the solution clearer: any client that supports OCI image 1.1 can work with zstd, while older clients might not. I am not sure the generic layer types are really a specification change as much as a tooling change; they may allow the image spec to support more options at that point. The media types supported here should always be explicit, though, imo.

@tych0
Member

tych0 commented Apr 20, 2020

@tych0, could you elaborate a bit on your use case?

Sure, I'm putting squashfs files in OCI images instead of gzipped tarballs, so I can direct mount them instead of having to extract them first. The "MUST ignore" part of the standard lets me do this, because tools like skopeo happily copy around OCI images with underlying blob types they can't decode.

If we suddenly change the standard to not allow unknown blob types in images and allow tools to reject them, use cases like this will no longer be possible.

Indeed, the standard does not need to change for docker to generate valid OCI images with zstd compression. The hard work goes into the tooling on the other end, but presumably docker has already done that.

It might be worth adding a few additional known blob types to the spec here: https://github.com/opencontainers/image-spec/blob/master/media-types.md#oci-image-media-types but otherwise I don't generally understand the goals of this thread.

@thaJeztah
Member Author

If we suddenly change the standard to not allow unknown blob types in images and allow tools to reject them, use cases like this will no longer be possible.

I think in case of Skopeo, Skopeo itself is not consuming the image, and is used as a tool to pull those images; I think that's more the "distribution spec" than the "image spec"?

I think a runtime that does not support a specific type of layer should be able to reject that layer, and not accept "any" media-type. What use would there be for a runtime to pull an image with (say) image/jpeg as layer-type; should it pull that image and try to run it?

For such cases, I think it'd make more sense to reject the image (/layer).

@tych0
Member

tych0 commented Apr 20, 2020

I think in case of Skopeo, Skopeo itself is not consuming the image, and is used as a tool to pull those images; I think that's more the "distribution spec" than the "image spec"?

No; the distribution spec is for repos serving content over http. skopeo translates to/from OCI images according to the OCI images spec.

I think a runtime that does not support a specific type of layer should be able to reject that layer, and not accept "any" media-type. What use would there be for a runtime to pull an image with (say) image/jpeg as layer-type; should it pull that image and try to run it?

If someone asks you to run something you can't run, I agree an error is warranted. But in the case of skopeo, it is a tool that is perfectly capable of handling layers with mime types it doesn't understand, and I think similar tools should not error out either.

@thaJeztah
Member Author

No; the distribution spec is for repos serving content over http. skopeo translates to/from OCI images according to the OCI images spec.

Yeah, poor choice of words; I was trying to say that Skopeo itself is not the end-consumer of the image (hope I'm making sense).

But in the case of skopeo, it is a tool that is perfectly capable of handling layers with mime types it doesn't understand, and I think similar tools should not error out either.

The confusion in the words picked in the specs is about "mime types it doesn't understand". What makes a tool compliant with the image-spec? Should it be able to parse the manifest, or also be able to process the layers? Is curl | jq compliant?

While I understand the advantage of having some flexibility, if the spec does not dictate anything there, how can I know if an image would work with some tool implementing image-spec "X"?

Currently it MUST ignore things it doesn't understand, which (my interpretation) says that (e.g.) any project implementing the spec MUST allow said image with an image/jpeg layer. On the other hand, it also should be able to extract an OCI Image into an OCI Runtime bundle. In your use-case, the combination of Skopeo and other tools facilitate this (Skopeo being the intermediary).

For Skopeo's case, even though the mediaType is "unknown to the implementation", Skopeo is able to "handle" / "process" the layer (within the scope it's designed for), so perhaps "unknown" should be changed to something else; e.g. implementations should / must produce an error if they're not able to "handle" / "process" a layer-type.
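A minimal sketch of what that clarification could mean for a runtime that only knows gzip: every layer must map to a handler, and an unhandled layer is an error rather than something silently skipped. The handler table, `UnsupportedLayerError`, and both function names are hypothetical.

```python
class UnsupportedLayerError(Exception):
    """Raised when a layer's mediaType cannot be processed (hypothetical)."""


# Hypothetical handler table for an implementation that only knows tar/gzip.
DECOMPRESSORS = {
    "application/vnd.oci.image.layer.v1.tar": "none",
    "application/vnd.oci.image.layer.v1.tar+gzip": "gzip",
}


def unprocessable_layers(layers, decompressors=DECOMPRESSORS):
    """Indices of layers whose mediaType this implementation cannot handle."""
    return [i for i, desc in enumerate(layers)
            if desc["mediaType"] not in decompressors]


def plan_extraction(layers, decompressors=DECOMPRESSORS):
    """Return one decompressor per layer; fail instead of skipping layers."""
    bad = unprocessable_layers(layers, decompressors)
    if bad:
        raise UnsupportedLayerError(
            f"cannot process layer(s) {bad}: "
            + ", ".join(layers[i]["mediaType"] for i in bad)
        )
    return [decompressors[desc["mediaType"]] for desc in layers]
```

A tool like skopeo would simply have a broader notion of "can process" (copying bytes), so the same rule would not make it error out.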

@tych0
Member

tych0 commented Apr 21, 2020

e.g. implementations should / must produce an error if they're not able to "handle" / "process" a layer-type.

That seems like a reasonable clarification to me!

@cyphar
Member

cyphar commented Apr 21, 2020

@thaJeztah

Regarding the ambiguity of the MUST clause. The intention of that sentence is to say that implementations should act as though the layer (or manifest) doesn't exist if it doesn't know how to do whatever the user has requested, and should use an alternative layer (or manifest) if possible. This is meant to avoid implementations just breaking and always giving you an error if some extension was added to an image which doesn't concern that implementation -- it must use an alternative if possible rather than giving a hard error. Otherwise any new media-types will cause endless problems.

In the example of pulling image data, arguably the tool supports pulling image data regardless of the media-type so there isn't any issue of it being "unknown [what to do with the blob] to the implementation" -- but if the image pulling is being done in order for an eventual unpacking step then you could argue that it should try to pull an alternative if it doesn't support the image type.

I agree this wording could be a bit clearer though, this change was done during the period of some of the more contentious changes to the image-spec in 2016. Given that the above was the original intention of the language, I don't think it would be a breaking change to better clarify its meaning.
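A minimal sketch of the intent described above: treat unprocessable candidates as if they did not exist, and only error out when no alternative remains. It assumes the candidates are already-resolved manifests from an image index; `select_manifest` and the candidate names are hypothetical.

```python
def select_manifest(candidates, supported):
    """Pick the first candidate manifest whose layers are all processable.

    `candidates` is a list of (name, manifest) pairs, e.g. the entries of an
    image index after resolving each descriptor. Candidates with any
    unsupported layer are skipped; only when none remain is it an error.
    """
    for name, manifest in candidates:
        if all(layer["mediaType"] in supported for layer in manifest["layers"]):
            return name
    raise LookupError("no alternative manifest with processable layers")
```

Note this only helps when there is something to fall back to (an index with alternatives); for a single manifest the choice reduces to "process it" or "fail".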

On a side-note, I noticed that the vnd.oci.image.manifest.v1+json was registered, but other mediatypes, including media-types for image layers are not; should they be?

This is being worked on by @SteveLasker. The idea was to first register just one media-type so we get an idea of how the process works, and then to effectively go and register the rest.

@cyphar
Member

cyphar commented Apr 21, 2020

Another issue with the current way of representing compression is that the ordering of multiple media type modifiers (such as compression or encryption) isn't really well-specified since MIME technically doesn't support such things. There was some discussion last year about writing a library for dealing with MIME types so that programs can easily handle such types, but I haven't seen much since then.
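To illustrate the ordering problem, a naive parser can split a media type on `+` into a base and a list of modifiers, but the spec does not say whether `+gzip+encrypted` means "gzip, then encrypt" or the reverse. The sketch below *assumes* suffixes are listed innermost-first; that convention, the example media type, and both function names are illustrative assumptions only.

```python
def split_suffixes(media_type):
    """Split a media type into its base and its '+'-separated modifiers."""
    base, *suffixes = media_type.split("+")
    return base, suffixes


def decode_order(media_type):
    """Modifiers in the order a reader would undo them, ASSUMING suffixes are
    written innermost-first (the image-spec does not define this)."""
    _, suffixes = split_suffixes(media_type)
    return list(reversed(suffixes))
```

Under the opposite assumption, the same string would decode in the opposite order, which is why a shared library (or explicit spec text) would help.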

@SteveLasker
Contributor

On a side-note, I noticed that the vnd.oci.image.manifest.v1+json was registered, but other mediatypes, including media-types for image layers are not; should they be?

This is being worked on by @SteveLasker. The idea was to first register just one media-type so we get an idea of how the process works, and then to effectively go and register the rest.

Ack: please assume the other mediaTypes will be registered. I'm providing clarity in the Artifacts Spec to help with both these issues. Once the Artifacts spec is merged, with clarity on the registration process, I'll register the other types.

For the compression, what I think we're saying is this:
Tools that work specifically on a type, for instance runnable images like application/vnd.oci.image.config.v1+json, should know about all layer types for a specific version; in this case, v1 vs. v1.1. The spec for each artifact provides that detail so clients know what they must expect. The artifact-specific spec might say compression is optional and a fallback must be provided. But I don't know if it's realistic to say a tool could push a new layer type without it being in the spec and still be considered valid.

There are other tools, like skopeo (I think) or ORAS, which work on any artifact type pushed to a registry. In these cases, they need to know some conventions to be generic. In the case of ORAS, it intentionally doesn't know about any specific artifact type and simply provides auth, push, and pull of the layers associated with a manifest. It's the outer wrapper, like Helm or Singularity, that provides the specific details on layer processing.

We have an open agenda for the 4/22 call to discuss.

@thaJeztah
Member Author

thaJeztah commented May 26, 2020

I see I forgot to reply to some of the comments.

Regarding the ambiguity of the MUST clause. The intention of that sentence is to say that implementations should act as though the layer (or manifest) doesn't exist if it doesn't know how to do whatever the user has requested, and should use an alternative layer (or manifest) if possible. This is meant to avoid implementations just breaking and always giving you an error if some extension was added to an image which doesn't concern that implementation -- it must use an alternative if possible rather than giving a hard error. Otherwise any new media-types will cause endless problems.

So, I was wondering about that: I can see this "work" for a multi-manifest(ish) image, in which case there could be multiple variations of an image (currently used for multi-arch), and I can use "one" of those, but I'm having trouble understanding how this works for a single image.

What if an image has layers with mixed compression?

  • extract only those that I "understand" and try to construct a rootfs?
  • what if I understand all of those compressions? (say, the image has both zstd and gzip compressed layers);
    • should I "pick one", and "cherry-pick" all layers with the same compression?
    • should I "pick all" layers, extract them, and construct the rootfs?

I think it's technically possible to have mixed compressions; for example, when an existing image (using, e.g., gzip-compressed layers) is pulled, extended with a new zstd-compressed layer, and then pushed.

However, the "reverse" could also be a valid use case: creating a "fat/hybrid" image that offers alternative compressions for systems that support them ("gzip" layers for older clients, "zstd" for newer clients that support it).

Looks like this needs further refinement to describe how this should be handled.

Ack: please assume the other mediaTypes will be registered. I'm providing clarity in the Artifacts Spec to help with both these issues. Once the Artifacts spec is merged, with clarity on the registration process, I'll register the other types.

Thanks! I recall seeing a discussion (on the mailing list?) about registering, but noticed "some" were registered, but others were not, so thought I'd check 👍

@justincormack

Yes, absolutely agree with Sebastiaan, picking some layers you understand and rejecting the rest is meaningless, and the semantics are not defined. There is no way to construct an image with zstd compression that is compatible with both older and newer clients. This only works for very limited workflows where you synchronously update all your clients and then update the images you generate, it does not work at all for people wanting to distribute public images, for example, where basically you cannot use zstd because there is no way to make an image anyone can use. A manifest list mechanism would be workable, but the current design just doesn't seem fit for purpose, and I think we should revert it.

@giuseppe
Member

giuseppe commented Dec 9, 2020

I think the way to move forward is to add support for zstd to the different clients but still keep the gzip compression as the default.

Generating these images should not be the default yet, but the longer we postpone zstd support in clients, the longer it will take to switch to it.

I don't see anything wrong with older clients failing to pull newer images in 1-2 years.

@thaJeztah
Member Author

thaJeztah commented Dec 9, 2020 via email

@tych0
Member

tych0 commented Dec 9, 2020

What about just adding the clarification you already proposed above, i.e.

e.g. implementations should / must produce an error if they're not able to "handle" / "process" a layer-type.

Doesn't that define it well enough?

@thaJeztah
Member Author

Unfortunately, it doesn't, because for runtimes that support both zstd and gzip, selection is now ambiguous.

Take the following example:

{
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "size": 12345,
      "digest": "sha256:deadbeef"
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+zstd",
      "size": 34567,
      "digest": "sha256:badcafe"
    }
  ]
}

The above would be ambiguous, as it could mean either:

  1. A "fat" single layer image, providing alternative layers in zstd and gzip format (for older clients)
  2. A two-layer image, with the first layer in gzip and the second layer in zstd compression

In the above, 1. is a likely scenario for registries that want to provide "modern" compression but also backward compatibility, and 2. is a likely scenario where a "modern" runtime built an image using a parent image that is not available with zstd compression.

While it's possible to define something along the lines of "MUST pick one compression, and only use layers with the same compression", this would paint us into a corner, and disallow use-case 2. (and future developments along the same line).

All of this would've been easier if digests were calculated over the non-compressed artifacts (and compression being part of the transport), but that ship has somewhat sailed. Perhaps it would be possible with a new media-type (application/vnd.oci.image.config.v1+json+raw), indicating that layers/blobs in the manifest are to be considered "raw" data (non-compressed, or if compressed, hash was calculated over the data as-is). In that case, clients and registries could negotiate a compression during transport (and for storage in the registry, compression/storage optimisation would be an implementation detail).
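A small sketch of why compression and content addressing are entangled today: two gzip encodings of identical content (here differing only in the gzip header's mtime field) get different digests, while a digest over the uncompressed bytes is stable. The variable names are illustrative; only Python's standard `gzip` and `hashlib` modules are used.

```python
import gzip
import hashlib


def sha256_digest(data: bytes) -> str:
    """OCI-style digest string for a byte string."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


layer_tar = b"pretend this is a layer tarball\n" * 100

# Two gzip encodings of the same content; only the header mtime differs.
blob_a = gzip.compress(layer_tar, mtime=0)
blob_b = gzip.compress(layer_tar, mtime=1)

# Descriptors today address the compressed bytes, so these two differ...
compressed_digests = {sha256_digest(blob_a), sha256_digest(blob_b)}
# ...while a digest over the uncompressed content is identical for both.
raw_digests = {sha256_digest(gzip.decompress(b)) for b in (blob_a, blob_b)}
```

The same applies to any re-compression (gzip to zstd, different levels, different encoders): the compressed digest changes even though the content does not.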

@tych0
Member

tych0 commented Dec 9, 2020

I don't think case 1 you've provided is legal. Per https://github.com/opencontainers/image-spec/blob/master/manifest.md#image-manifest-property-descriptions we have,

"The final filesystem layout MUST match the result of applying the layers to an empty directory."

So I think the specification already states that it must be case 2.
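A toy model of "applying the layers to an empty directory", to show why every entry in `layers` is applied in order rather than treated as an alternative. Filesystems are modelled as dicts of path to bytes; a basename starting with `.wh.` is a whiteout deleting the named sibling file (directories and opaque whiteouts are left out for brevity). `apply_layers` is a hypothetical name.

```python
WHITEOUT_PREFIX = ".wh."


def apply_layers(layers):
    """Apply layers, in order, to an empty 'directory' (dict path -> bytes).

    Each layer is a dict of path -> content. A whiteout entry such as
    'tmp/.wh.junk' removes 'tmp/junk' from the result built so far.
    Only files are modelled; directory and opaque whiteouts are omitted.
    """
    rootfs = {}
    for layer in layers:
        for path, content in layer.items():
            parent, _, name = path.rpartition("/")
            if name.startswith(WHITEOUT_PREFIX):
                target = name[len(WHITEOUT_PREFIX):]
                rootfs.pop(f"{parent}/{target}" if parent else target, None)
            else:
                rootfs[path] = content
    return rootfs
```

Swapping the order of the layers changes the result, so a gzip entry followed by a zstd entry can only mean two layers stacked on each other (case 2).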

@giuseppe
Member

giuseppe commented Dec 9, 2020

yes, I think it should be case 2, an image made of two different layers. It would be very confusing to support case 1 this way.

@thaJeztah
Member Author

The final filesystem layout MUST match the result of applying the layers to an empty directory

"Applying the layers" is very ambiguous combined with the other requirements (more below):

yes, I think it should be case 2, an image made of two different layers. It would be very confusing to support case 1 this way

Which means that there's no way to have images that are compatible with both any of the existing runtimes and runtimes that support zstd.

As

Implementations MUST support at least the following media types:
...An encountered mediaType that is unknown to the implementation MUST be ignored.

Which means that any of the current runtimes MUST ignore the zstd layer, and then apply the remaining layers.

@tych0
Member

tych0 commented Dec 9, 2020

Which means that there's no way to have images that are compatible with both any of the existing runtimes and runtimes that support zstd.

I don't think that's what it means at all. It means it won't work this specific way, but I can imagine other ways in which it would.

Which means that any of the current runtimes MUST ignore the zstd layer, and then apply the remaining layers.

That's why I think your proposed clarification is useful: runtimes who can't "process" the layer should error out when asked to. In particular, that's exactly what will happen in current implementations: they will try to gunzip the zstd blob, realize they can't, and fail.
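A minimal sketch of how a gzip-only runtime could fail clearly instead of mid-stream: sniff the blob's leading magic bytes (gzip starts with `1f 8b`; a zstd frame starts with the magic number `0xFD2FB528`, stored little-endian as `28 b5 2f fd`). `sniff_compression` and `gunzip_layer` are hypothetical helper names.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"
# zstd frame magic number 0xFD2FB528, stored little-endian in the stream.
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def sniff_compression(blob: bytes) -> str:
    """Guess a blob's compression from its leading magic bytes."""
    if blob.startswith(GZIP_MAGIC):
        return "gzip"
    if blob.startswith(ZSTD_MAGIC):
        return "zstd"
    return "unknown"


def gunzip_layer(blob: bytes) -> bytes:
    """Decompress a layer the way a gzip-only runtime would, raising a
    clear error for anything that is not gzip."""
    kind = sniff_compression(blob)
    if kind != "gzip":
        raise ValueError(f"layer is {kind}-compressed, expected gzip")
    return gzip.decompress(blob)
```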

@thaJeztah
Member Author

but I can imagine other ways in which it would.

Can you elaborate on what other ways?

@tych0
Member

tych0 commented Dec 9, 2020

Can you elaborate on what other ways?

Sure, but I don't think it's relevant for whether or not zstd support should be in the spec. With your proposed clarification, I think the spec would be very clear about the expected behavior when runtimes encounter blobs they don't understand (and for tools like e.g. skopeo, who can shuttle these blobs around without understanding them, which is my main concern).

We are already using non-standard mime types in layers at my organization, and because the tooling support for this is not very good, right now we just disambiguate by using a "foo-squashfs" tag for images that are squashfs-based, and a "foo" tag for the tar-based ones.

However, since tag names are really just annotations, you could imagine having an additional annotation, maybe "org.opencontainers.ref.layer_type", to go along with the "org.opencontainers.image.ref.name" annotation that people use as tags, whose value would just be the layer type. Then, in a tool like skopeo, you would do something like skopeo copy oci:/source:foo oci:/dest:foo --additional-filter=org.opencontainers.ref.layer_type=zstd (or maybe skopeo would introduce a shorthand for this). Tools could then ignore layer types their users aren't interested in or that they don't know how to support. If there's no manifest with the tag matching the filters that a client knows how to consume, it would fail.

To make this backwards compatible, I suspect always listing the tar-based manifest as the first one in the image would mostly work, assuming tools don't check for multiple images with the same tag and fail. But maybe it wouldn't, I haven't really played around with it. In any case, just using tags to disambiguate works totally fine, even though it's ugly and better tooling support would be appreciated.
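A minimal sketch of the filtering idea over the `manifests` array of an index. `org.opencontainers.image.ref.name` is the existing annotation; `org.opencontainers.ref.layer_type` is the hypothetical one proposed above, and treating entries without it as tar-based is an assumption made here for backwards compatibility.

```python
REF_NAME = "org.opencontainers.image.ref.name"
# Hypothetical annotation proposed in this thread; not part of the spec.
LAYER_TYPE = "org.opencontainers.ref.layer_type"


def filter_index(manifests, ref_name, layer_types):
    """Index entries tagged `ref_name` whose layer type the client accepts.

    ASSUMPTION: entries without the layer-type annotation are tar-based.
    Index order is preserved, so the first match is the preferred one.
    """
    matches = []
    for desc in manifests:
        annotations = desc.get("annotations", {})
        if annotations.get(REF_NAME) != ref_name:
            continue
        if annotations.get(LAYER_TYPE, "tar") in layer_types:
            matches.append(desc)
    return matches
```

An empty result is the "no manifest matching the filters" failure case.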

@SteveLasker
Contributor

Adding new compression formats to a specific artifact type is a good way to bring that artifact forward with new capabilities. The problem seems to be providing consistent behavior across an ecosystem where multiple versions are deployed.
Isn't this effectively what versioning provides?
While a newer client might know how to process old and new compression formats, how do we get to the point where we have some stability?
This seems like a pivot for getting a different result, based on what capabilities the client supports.
If the client supports version 1 and 2, it should default to version 2.
If the client only supports version 1, it knows to pull version 1.
If the registry only has version 2, there's a failure state.
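Those three rules amount to "pick the newest version both sides have, otherwise fail". A minimal sketch, assuming versions are plain integers; `negotiate` is a hypothetical name and `None` stands for the failure state.

```python
def negotiate(client_versions, registry_versions):
    """Newest version supported by both client and registry, else None."""
    common = set(client_versions) & set(registry_versions)
    return max(common) if common else None
```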

This is very akin to the multi-arch approach.
The client asks the registry for hello-world:latest and also states it's ARM.
The registry says, umm, I don't have an arm version of hello-world:latest, so it fails.

I'm not saying we should actually use multi-arch manifests, but the concept is what we seem to need here.

For reference, we debated this with Teleport. We didn't want to change the user model, or require image owners to publish a new format. When someone pushes content to a teleport enabled registry, we automatically convert it. When the client makes a request, it sends header information that says it supports teleport. The registry can then hand back teleport references to blobs.

So, there are two models to consider here:

  1. The end to end container format has a new compression format, and it appears to be a version change.
  2. The compression format can be handled on the server.

This is also similar to what registries do with docker and OCI manifests. They get converted on the fly. I recognize converting a small json file is far quicker than multi-gb blobs.

Ultimately, it seems like we need to incorporate the full end to end experience and be careful to not destabilize the e2e container ecosystem while we provide new enhancements and optimizations.

@thaJeztah
Member Author

and for tools like e.g. skopeo, who can shuttle these blobs around without understanding them, which is my main concern

(IIUC) tools like skopeo should not be really affected for your specific use-case as they for that use-case are not handling the actual image, and are mainly used as a tool to do a full download of whatever artifacts/blobs are referenced (also see my earlier comments #803 (comment) and #803 (comment))

However, since tag names are really just annotations, you could imagine having an additional annotation, maybe "org.opencontainers.ref.layer_type", to go along with the "org.opencontainers.image.ref.name" annotation that people use as tags, whose value would just be the layer type. Then, in a tool like skopeo, you would do something like skopeo copy oci:/source:foo oci:/dest:foo --additional-filter=org.opencontainers.ref.layer_type=zstd

I feel like this is now replicating what manifest-lists were for (a list of alternatives to pick from); manifest lists currently allow differentiating on architecture, and don't have a dimension for "compression type". Adding that would be an option, but (for distribution/registry) may mean an extra roundtrip (image/tag -> os/architecture variant -> layer-compression variant), or add a new dimension besides "platform".

Which looks to be what @SteveLasker is describing as well;

I'm not saying we should actually use multi-arch manifests, but the concept is what we seem to need here.

Regarding;

This is also similar to what registries do with docker and OCI manifests. They get converted on the fly. I recognize converting a small json file is far quicker than multi-gb blobs.

Docker manifests are OCI manifests; I think the only conversion still present today is for old (Image Manifest V2, Schema 1) manifests (related discussion in opencontainers/distribution-spec#212), and deprecating / disabling those is being discussed (docker/roadmap#173).

I'd be hesitant to start extracting and re-compressing artifacts. This would break the contract of content addressability, or more specifically: what guarantee do I have that the re-compressed artifact has the same content as the artifact that was pushed? If we want to separate compression from artifacts, then #803 (comment) is probably a better alternative:

All of this would've been easier if digests were calculated over the non-compressed artifacts (and compression being part of the transport)

@jonjohnsonjr
Contributor

FYI #880 may be interesting to folks.

@almson

almson commented May 31, 2022

Please add zstd. Pulling images is so slow because of decompression.

@zvonkok

zvonkok commented Jul 3, 2024

/cc @zvonkok

@sudo-bmitch
Contributor

For those that aren't supporting zstd today, what is preventing it? For the tools that don't support it yet, what is blocking the PR to add support?

For those that require this feature, do we have metrics showing that decompression is a slower step than the network speed to download the blob? Without that, then decompressing during the download (instead of waiting for the download to complete) would provide a performance improvement without zstd.
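A minimal sketch of "decompressing during the download": hash and inflate each chunk as it arrives instead of waiting for the whole blob. The chunk iterable stands in for an HTTP response body, and `stream_layer` is a hypothetical name; only Python's standard `zlib` and `hashlib` are used.

```python
import gzip
import hashlib
import zlib


def stream_layer(chunks, expected_digest):
    """Verify and decompress a gzip layer chunk by chunk while it downloads.

    `chunks` is any iterable of bytes, standing in for an HTTP response body
    read in pieces. A real client would feed the output to a tar extractor
    instead of buffering it.
    """
    hasher = hashlib.sha256()
    # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    inflater = zlib.decompressobj(zlib.MAX_WBITS | 16)
    out = bytearray()
    for chunk in chunks:
        hasher.update(chunk)               # the digest covers compressed bytes
        out += inflater.decompress(chunk)  # no need to wait for the full blob
    out += inflater.flush()
    if "sha256:" + hasher.hexdigest() != expected_digest:
        raise ValueError("digest mismatch")
    return bytes(out)


if __name__ == "__main__":
    payload = b"layer data " * 1000
    blob = gzip.compress(payload)
    digest = "sha256:" + hashlib.sha256(blob).hexdigest()
    pieces = [blob[i:i + 100] for i in range(0, len(blob), 100)]
    print(stream_layer(pieces, digest) == payload)
```

With this pipelining, total time approaches max(download, decompress) rather than their sum, which is why the network-vs-decompression numbers matter.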

@giuseppe
Member

giuseppe commented Jul 3, 2024

For those that require this feature, do we have metrics showing that decompression is a slower step than the network speed to download the blob? Without that, then decompressing during the download (instead of waiting for the download to complete) would provide a performance improvement without zstd.

For Podman/CRI-O, one zstd feature we are interested in using is "skippable frames" so we can embed some metadata in the compressed stream.
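For reference, the zstd format (RFC 8878) defines a skippable frame as a 4-byte little-endian magic number in the range `0x184D2A50` to `0x184D2A5F`, a 4-byte little-endian size, then that many bytes of user data, which standard decoders skip. A minimal sketch of building one with Python's `struct`; `skippable_frame` is a hypothetical helper, and this does not claim to reproduce the exact zstd:chunked layout.

```python
import struct

SKIPPABLE_MAGIC_BASE = 0x184D2A50  # RFC 8878: 0x184D2A50..0x184D2A5F


def skippable_frame(user_data: bytes, variant: int = 0) -> bytes:
    """Wrap `user_data` in a zstd skippable frame (magic + size + data)."""
    if not 0 <= variant <= 0xF:
        raise ValueError("variant must be in 0..15")
    header = struct.pack("<II", SKIPPABLE_MAGIC_BASE + variant, len(user_data))
    return header + user_data
```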

In addition to the faster decompression, zstd also requires less CPU time.

On my machine, using pigz for comparison as it is much faster than GNU gzip, I get:

$ dd if=/dev/urandom bs=1G count=1 | pigz - > /tmp/blob.gz
$ dd if=/dev/urandom bs=1G count=1 | zstd - > /tmp/blob.zstd

$ \time -v pigz -d < /tmp/blob.gz > /dev/null
	Command being timed: "pigz -d"
	User time (seconds): 0.27
	System time (seconds): 0.44
	Percent of CPU this job got: 200%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.35

$ \time -v zstd -d < /tmp/blob.zstd > /dev/null
	Command being timed: "zstd -d"
	User time (seconds): 0.14
	System time (seconds): 0.15
	Percent of CPU this job got: 206%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.14
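Working the reported numbers through (CPU time taken as user plus system time):

```latex
\begin{aligned}
t_{\mathrm{cpu}}^{\mathrm{pigz}} &= 0.27 + 0.44 = 0.71\ \mathrm{s}\\
t_{\mathrm{cpu}}^{\mathrm{zstd}} &= 0.14 + 0.15 = 0.29\ \mathrm{s}\\
t_{\mathrm{cpu}}^{\mathrm{pigz}} / t_{\mathrm{cpu}}^{\mathrm{zstd}} &= 0.71 / 0.29 \approx 2.4\\
t_{\mathrm{wall}}^{\mathrm{pigz}} / t_{\mathrm{wall}}^{\mathrm{zstd}} &= 0.35 / 0.14 = 2.5
\end{aligned}
```

So on this 1 GiB sample zstd used roughly 2.4 times less CPU and finished about 2.5 times faster in wall-clock time.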

@sudo-bmitch
Contributor

For Podman/CRI-O, one zstd feature we are interested in using is "skippable frames" so we can embed some metadata in the compressed stream.

What is preventing Podman/CRI-O from supporting these features today? Is there a PR blocked because of OCI?

In addition to the faster decompression, zstd also requires less CPU time.

Is this from content that was pulled from a registry, or content stored locally in a compressed state?

@giuseppe
Member

giuseppe commented Jul 3, 2024

For Podman/CRI-O, one zstd feature we are interested in using is "skippable frames" so we can embed some metadata in the compressed stream.

What is preventing Podman/CRI-O from supporting these features today? Is there a PR blocked because of OCI?

nothing really :-) We are planning to use zstd by default on Fedora 41: https://fedoraproject.org/wiki/Changes/zstd:chunked

In addition to the faster decompression, zstd also requires less CPU time.

Is this from content that was pulled from a registry, or content stored locally in a compressed state?

stored locally in a compressed state

@sudo-bmitch
Contributor

We are planning to use zstd by default on Fedora 41: https://fedoraproject.org/wiki/Changes/zstd:chunked

Do we need to differentiate between zstd and zstd+chunked?

@giuseppe
Member

giuseppe commented Jul 3, 2024

We are planning to use zstd by default on Fedora 41: https://fedoraproject.org/wiki/Changes/zstd:chunked

Do we need to differentiate between zstd and zstd+chunked?

No, it is still a valid zstd file. Clients that do not use the additional metadata will simply ignore it.
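A minimal sketch of why extra metadata stays invisible to clients that don't use it: a reader that meets a skippable frame just jumps over it using the frame's size field. For simplicity this only handles skippable frames placed *before* the first data frame, which is an assumption for illustration, not a description of where zstd:chunked actually puts its metadata; `first_data_frame_offset` is a hypothetical name.

```python
import struct

ZSTD_FRAME_MAGIC = 0xFD2FB528


def is_skippable(magic: int) -> bool:
    """True for the 16 skippable-frame magic numbers 0x184D2A50..5F."""
    return (magic & 0xFFFFFFF0) == 0x184D2A50


def first_data_frame_offset(stream: bytes) -> int:
    """Offset of the first regular zstd frame, stepping over any leading
    skippable frames the way a metadata-unaware decoder would."""
    offset = 0
    while offset + 4 <= len(stream):
        (magic,) = struct.unpack_from("<I", stream, offset)
        if magic == ZSTD_FRAME_MAGIC:
            return offset
        if not is_skippable(magic):
            raise ValueError(f"not a zstd frame at offset {offset}")
        (size,) = struct.unpack_from("<I", stream, offset + 4)
        offset += 8 + size  # 4-byte magic + 4-byte size + user data
    raise ValueError("no zstd data frame found")
```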

@septatrix

We are planning to use zstd by default on Fedora 41: https://fedoraproject.org/wiki/Changes/zstd:chunked

Do we need to differentiate between zstd and zstd+chunked?

No, it is still a valid zstd file. Clients that do not use the additional metadata will simply ignore it.

What is the media type of zstd:chunked? If it is application/vnd.oci.image.layer.v1.tar+zstd:chunked it should still become part of the spec such that tools will know that they can use it (even if they just treat it as a normal zstd file). Otherwise tools will always need to upload images with both formats even though they are identical.

@giuseppe
Copy link
Member

application/vnd.oci.image.layer.v1.tar+zstd:chunked

it is application/vnd.oci.image.layer.v1.tar+zstd

@septatrix

application/vnd.oci.image.layer.v1.tar+zstd:chunked

it is application/vnd.oci.image.layer.v1.tar+zstd

How does a client then know if the layer is chunked? Does it always need to fetch the header and somehow determine if it is chunked using magic bytes?

@giuseppe
Copy link
Member

How does a client then know if the layer is chunked? Does it always need to fetch the header and somehow determine if it is chunked using magic bytes?

it could do that, or use the annotations we added for that layer, e.g.:

        {
            "MIMEType": "application/vnd.oci.image.layer.v1.tar+zstd",
            "Digest": "sha256:9efd019b05bc504fcc4d0e244f3c996eb2c739f3274ab5cc746e0f421044c041",
            "Size": 113639090,
            "Annotations": {
                "io.github.containers.zstd-chunked.manifest-checksum": "sha256:f67017010afe34d9a5df4c1f65c6ff7ac7a452b57e7febea91d80ed9da51841e",
                "io.github.containers.zstd-chunked.manifest-position": "111910713:1066869:6231165:1"
            }
        }

so it can immediately fetch the TOC and validate it against the checksum that is recorded in the manifest as well
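To make the annotation-driven flow concrete, here is a hedged Go sketch that parses the manifest-position value from the example above, turns it into an HTTP Range header so only the TOC bytes are fetched from the blob, and checks the fetched bytes against manifest-checksum. The field order (offset:length:lengthUncompressed:manifestType) is an assumption based on how containers/storage appears to read this annotation, and it is also assumed that the checksum covers the bytes as fetched, before decompression; the type and function names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// tocPosition holds the four colon-separated numbers of the
// io.github.containers.zstd-chunked.manifest-position annotation.
// ASSUMPTION: order is offset:length:lengthUncompressed:manifestType.
type tocPosition struct {
	Offset, Length, LengthUncompressed, ManifestType uint64
}

// parseManifestPosition parses a value such as "111910713:1066869:6231165:1".
func parseManifestPosition(v string) (tocPosition, error) {
	var p tocPosition
	if _, err := fmt.Sscanf(v, "%d:%d:%d:%d", &p.Offset, &p.Length, &p.LengthUncompressed, &p.ManifestType); err != nil {
		return tocPosition{}, fmt.Errorf("parse manifest-position %q: %w", v, err)
	}
	if p.Length == 0 {
		return tocPosition{}, errors.New("manifest-position has zero length")
	}
	return p, nil
}

// rangeHeader builds an HTTP Range header value (the end offset is inclusive)
// that requests only the TOC bytes of the layer blob.
func (p tocPosition) rangeHeader() string {
	return fmt.Sprintf("bytes=%d-%d", p.Offset, p.Offset+p.Length-1)
}

// verifyTOC compares the sha256 of the fetched bytes with the
// manifest-checksum annotation value.
// ASSUMPTION: the checksum is computed over the bytes exactly as fetched.
func verifyTOC(fetched []byte, checksum string) bool {
	sum := sha256.Sum256(fetched)
	return checksum == "sha256:"+hex.EncodeToString(sum[:])
}

func main() {
	pos, err := parseManifestPosition("111910713:1066869:6231165:1")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(pos.rangeHeader()) // bytes=111910713-112977581
}
```

For the layer above this asks for roughly 1 MB out of a 113 MB blob, which is what lets a client decide which files it actually needs before pulling anything else.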

@sudo-bmitch
Contributor

I just spent a bit of time rereading this thread (it's a long one) and a lot has happened since the original issue.

  1. This didn't get resolved before the v1.1.0 release, but that doesn't mean we can't sort things out for a future release.
  2. The "MUST ignore" text confused (gestures wildly) everyone, so it was replaced with "Implementations storing or copying image manifests MUST NOT error on encountering a mediaType that is unknown to the implementation."
  3. We currently say "Implementations MUST support at least the following media types..." with only a list of uncompressed and gzip media types. I don't know if we are ready to make zstd a MUST, but adding a SHOULD list would make sense to me.
  4. In the image-index.md, we now have the text "If multiple manifests match a client or runtime's requirements, the first matching entry SHOULD be used" which perhaps should be adjusted slightly to clarify the runtime cannot distinguish between the entries. This was designed to give a bit of forward compatibility where issues like this could be resolved with an added entry in the index and an annotation in the descriptor in the index as a flag to the runtime. Some of this might be resolved with the Image Compatibility Working Group which is taking a break right now while folks attempt possible implementations in their free time.
  5. A lot of discussion on supporting multiple compression types has moved towards solving this with transport level compression, which the registry could pre-cache. But everyone is hesitant to go that direction knowing that older clients and registries would fall back to uncompressed layers, a very bad intermediate state.
  6. I'd expect a manifest with mixed compression layers to be possible, if not common, since there's value to maintaining the base image layers without recompressing them. That allows the layers to be mounted across repositories, reduces storage costs, and enables clients to dedup layers more efficiently.
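A rough Go sketch of the index idea in point 4: several manifests for the same platform, each index descriptor carrying an annotation that names the layer compression, and a client that picks the first matching entry it can actually consume. The annotation key example.com/layer-compression is hypothetical and not defined by the spec, the digests are placeholders, and treating unannotated entries as universally usable is an assumption of this sketch.

```go
package main

import "fmt"

// Descriptor is a trimmed-down image index entry with only the fields this sketch uses.
type Descriptor struct {
	Digest      string
	OS          string
	Arch        string
	Annotations map[string]string
}

// compressionKey is a HYPOTHETICAL annotation key used only for illustration.
const compressionKey = "example.com/layer-compression"

// selectManifest walks the index in order and returns the first entry that
// matches the platform and whose annotated compression the client supports.
// Entries without the annotation are treated as usable by every client (assumption).
func selectManifest(index []Descriptor, wantOS, wantArch string, supported map[string]bool) (Descriptor, bool) {
	for _, d := range index {
		if d.OS != wantOS || d.Arch != wantArch {
			continue
		}
		if c, ok := d.Annotations[compressionKey]; ok && !supported[c] {
			continue // cannot decompress this variant; try the next entry
		}
		return d, true
	}
	return Descriptor{}, false
}

func main() {
	index := []Descriptor{
		{Digest: "sha256:aaa", OS: "linux", Arch: "amd64", Annotations: map[string]string{compressionKey: "zstd"}},
		{Digest: "sha256:bbb", OS: "linux", Arch: "amd64", Annotations: map[string]string{compressionKey: "gzip"}},
	}
	d, ok := selectManifest(index, "linux", "amd64", map[string]bool{"gzip": true})
	fmt.Println(d.Digest, ok) // sha256:bbb true
}
```

The weak spot is the one point 4 describes: a client that has never heard of the annotation sees two identical platform matches and takes the first (here the zstd entry), so entry ordering or spec wording would still have to handle older runtimes.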

Given everything that has happened, what is left to complete before resolving this issue?

  • The one item I'm seeing is adding a "SHOULD" list of layer media types to manifest.md.
  • Transport level compression may need a working group to sort out the issues of the intermediate state.
  • If there's interest in supporting runtimes selecting from multiple index entries for multiple compression options and graceful fallbacks, that might be good to work on the Image Compatibility WG.
  • Adding some detail around zstd chunked to layer.md might be useful for compatibility between the various tools. Perhaps @giuseppe would be up for that. That could also be tracked as a separate issue.

Anything that I'm missing? If not, I can work on a PR for the manifest.md layers text so we can close this out.
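One way to read points 2 and 3 together, sketched in Go: copying or storing a manifest never fails on an unknown layer media type, while unpacking requires the MUST types and only those SHOULD types the tool has actually implemented. The SHOULD entry for zstd reflects the proposal in this thread, not current spec text; the support levels and the checkLayer function are hypothetical.

```go
package main

import "fmt"

type support int

const (
	unknown support = iota // not listed; copying must still not error
	must                   // "MUST support" list in manifest.md today
	should                 // proposed "SHOULD support" list (not yet spec text)
)

var layerMediaTypes = map[string]support{
	"application/vnd.oci.image.layer.v1.tar":      must,
	"application/vnd.oci.image.layer.v1.tar+gzip": must,
	"application/vnd.oci.image.layer.v1.tar+zstd": should,
}

// checkLayer sketches how a tool might treat a layer media type.
// Copying (unpacking == false) never fails; unpacking fails unless the type is
// a MUST type or a SHOULD type listed in implemented.
func checkLayer(mediaType string, unpacking bool, implemented map[string]bool) error {
	if !unpacking {
		return nil
	}
	switch layerMediaTypes[mediaType] {
	case must:
		return nil
	case should:
		if implemented[mediaType] {
			return nil
		}
	}
	return fmt.Errorf("cannot unpack layer with media type %q", mediaType)
}

func main() {
	zstdType := "application/vnd.oci.image.layer.v1.tar+zstd"
	fmt.Println(checkLayer(zstdType, false, nil))                              // <nil>
	fmt.Println(checkLayer(zstdType, true, map[string]bool{zstdType: true})) // <nil>
}
```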

@tianon
Member

tianon commented Sep 5, 2024

(adding the meaningful bit of my thoughts from today's call)

A lot of discussion on supporting multiple compression types has moved towards solving this with transport level compression, which the registry could pre-cache. But everyone is hesitant to go that direction knowing that older clients and registries would fall back to uncompressed layers, a very bad intermediate state.

IMO, the intermediate state of "many older tools don't support the layers at all" (zstd) is worse than "many older tools will have increased storage, but everything will be functional" (uncompressed / transport compression) 👀

One of these is a really viable intermediate state to allow actual transition (that happens to also solve other interesting problems like layer digest vs DiffID) while the other is a complete non-starter (at least, from my own perspective as a large publisher of a number of popular images).

@tianon
Member

tianon commented Sep 5, 2024

(oh, and I'm +1 on adding zstd to the "SHOULD support" list)

@slonopotamus

2 cents: the real world (containerd, buildkit, docker, buildah, etc.) already supports zstd. The fact that the image spec lags behind just introduces a discrepancy. From a technical POV, zstd is ultimately superior to gzip in all aspects. In order to continue to make sense, the image spec should declare zstd as a supported format.

@tianon
Copy link
Member

tianon commented Sep 5, 2024

The image spec was updated to support it, back in #788, which was part of every RC and the GA of v1.1.0.

This issue is more about saying that implementations "SHOULD" support it, which is mostly semantics because, as you've said, all the major tools already do.
