Taming media types #791

jonjohnsonjr · 2019-09-12T17:57:20Z

Posted this in #787 originally but noticed it was closed, so I doubt anyone would ever see it.

~~Do we care if OCI media types are valid RFC6838 media types? E.g. "zstd" isn't defined in the Structured Syntax Suffix Registry.~~

Media types that make use of a named structured syntax SHOULD use the appropriate registered "+suffix" for that structured syntax when they are registered. By the same token, media types MUST NOT be given names incorporating suffixes for structured syntaxes they do not actually employ. "+suffix" constructs for as-yet unregistered structured syntaxes SHOULD NOT be used, given the possibility of conflicts with future suffix definitions.

This ties into #747 as well. I'm not sure if the +enc double-suffix is valid (though I mostly skimmed that RFC and might misunderstand the grammar described).

I was thinking about how I'd go about adding support for zstd in various places, and I'm a bit worried that, as we add more dimensions to the media types, we're going to end up with a combinatoric explosion of valid media type strings. As a registry/client author, it's currently easy enough to hard-code these strings and special-case e.g. nondistributable layers or the compression algorithm; however, I can imagine that comparing hard-coded strings will become sub-optimal at some point in the future, and we'd be better off parsing these strings into a structured form.

Any thoughts on whether we should try to follow that RFC? FWIW, there's a procedure for registering a suffix, if anyone wants to take a stab at adding +zstd.
Is there blessed code anywhere for parsing OCI media types? Or is everyone just directly comparing these strings?

cc @vbatts @lumjjb @stevvooe

The text was updated successfully, but these errors were encountered:

vbatts · 2019-09-12T18:34:05Z

aww dang. I your comment in an open tab for too long and lost track of follow-up. Thanks for opening a new thread. Funny, I just chatted with @lumjjb and posed the question about the multiple '+' suffix. As for tackling the `+zstd` that I recently helped merge, I am not sure that waiting on IANA is _entirely_ required? surely there is a way to declare locally defined addendum to those suffixes?

jonjohnsonjr · 2019-09-12T19:00:00Z

I am not sure that waiting on IANA is entirely required?

Right, I'm not proposing we revert that or anything, but it would be nice to follow the existing standard so that we can just adopt their semantics, then upstream extensions to the suffix registry if they seem reasonable (so clients that aren't OCI-aware can possibly interoperate with OCI registries).

I'm imagining a better way to consume these strings, here's a strawman:

type OCIMediaType struct {
	// Maybe "vnd.oci" is parsed out?
	// * vnd.oci.descriptor
	// * vnd.oci.layout.header
	// * vnd.oci.image.manifest
	// * vnd.oci.image.index
	// * vnd.oci.image.config
	Type string

	// Usually "v1"
	Version string

	// * json
	// * tar
	Encoding string

	// * gzip
	// * zstd
	Compression string

	// See https://github.com/opencontainers/image-spec/issues/747
	Encryption string

	// Self explanatory
	Distributable bool
}

func ParseMediaType(s string) OCIMediaType {
	// an implementation that takes advantage of the rfc6838 grammar
}

This way, if I want to add zstd support, I can just check:

if mt.Compression == "zstd" {}

rather than:

if mt == v1.MediaTypeImageLayerZstd || mt == v1.MediaTypeImageLayerNonDistributableZstd {}

Similarly, the most obvious way to test if something is a layer is some form of:

if mt == v1.MediaTypeImageLayer || mt == v1.MediaTypeImageLayerGzip || mt == v1.MediaTypeImageLayerZstd || mt == v1.MediaTypeImageLayerNonDistributable || mt == v1.MediaTypeImageLayerNonDistributableGzip || mt == v1.MediaTypeImageLayerNonDistributableZstd {
	// yep it's a layer
}

which feels silly.

I think doing this is only really possible if we have a well defined grammar in the first place. Since once already exists, we should probably use it 😄

vbatts · 2019-09-12T19:20:17Z

OHMAN

On 12/09/19 12:00 -0700, jonjohnsonjr wrote: This way, if I want to add zstd support, I can just check: ```go if mt.Compression == "zstd" {} ```

I agree that is nice and cleaner

I think doing this is only really possible if we have a well defined grammar in the first place. Since once already exists, we should probably use it 😄

:-D

lumjjb · 2019-09-12T19:47:44Z

This ties into #747 as well. I'm not sure if the +enc double-suffix is valid (though I mostly skimmed that RFC and might misunderstand the grammar described).

Agreed. It also looks like the intention is not to have multiple suffixes since it represents an encoding... I will change away from the +enc suffix.

This way, if I want to add zstd support, I can just check:
if mt.Compression == "zstd" {}

And if mt.Encrypted :). I'm a fan of this struct representation!

dmcgowan · 2019-09-12T21:28:26Z

@jonjohnsonjr is ordering important when applying them though, for example from an unwrapping perspective. Not sure if that is defined as part of that specification.

For example
+gzip+enc usually means the type was compressed, then encrypted
+enc+gzip could mean encrypted then compressed (I know this wouldn't make sense, but the order is important for processing)

How I am considering parsing is just to split it into a base and array of extensions. Those extensions could be typed but I don't think they should be flattened into a struct.

jonjohnsonjr · 2019-09-12T21:58:18Z

@dmcgowan definitely, which is why I'm nervous about having multiple suffixes. I'm consdering something like this:

func main() {
	mt := "application/vnd.oci.image.layer.nondistributable.v1.tar+gzip+enc"

	// OCIMediaType {
	//   Type: "application/vnd.oci.image.layer.nondistributable.v1.tar+gzip",
	//   Encoding: "enc",
	// }
	parsed := ParseMediaType(mt)

	// OCIMediaType {
	//   Type: "application/vnd.oci.image.layer.nondistributable.v1.tar",
	//   Encoding: "gzip",
	// }
	parsed = ParseMediaType(parsed.Type)

	// OCIMediaType {
	//   Type: "application/vnd.oci.image.layer.nondistributable.v1",
	//   Encoding: "tar",
	// }
	parsed = ParseMediaType(parsed.Type)
}

Clients could progressively unwrap these types until they hit the encoding they care about or one of the concrete mediatypes:

application/vnd.oci.descriptor.v1
application/vnd.oci.layout.header.v1
application/vnd.oci.image.manifest.v1
application/vnd.oci.image.index.v1
application/vnd.oci.image.layer.v1
application/vnd.oci.image.config.v1
application/vnd.oci.image.layer.nondistributable.v1

I'm also kind of frustrated with how nondistributable layers are expressed... it doesn't really fit into this model. You can't pull it off the end like compression or encryption :/

Encryption/versions complicate things as well. Right now it's not too bad to treat "application/vnd.oci.image.layer.nondistributable.v1" as an opaque thing, but if we ever rev v1 -> v2, we'll double the number of mediaTypes. Adding +enc would double them again, and we'd have 12 different flavors of nondistributable layers.

How I am considering parsing is just to split it into a base and array of extensions. Those extensions could be typed but I don't think they should be flattened into a struct.

Makes sense, so something like this?

type OCIMediaType struct {
	// * application/vnd.oci.descriptor.v1.json
	// * application/vnd.oci.layout.header.v1.json
	// * application/vnd.oci.image.manifest.v1.json
	// * application/vnd.oci.image.index.v1.json
	// * application/vnd.oci.image.config.v1.json
        // * application/vnd.oci.image.layer.nondistributable.v1.tar
        // * application/vnd.oci.image.layer.v1.tar
	Type string

	// * ["gzip"]
	// * ["enc", "gzip"]
	Extensions []string
}

I guess it doesn't really matter what the implementation is. As long as we have a well defined ordering of extensions, clients can do whatever they want.

jonjohnsonjr · 2019-09-12T22:41:40Z

I'm also a bit worried that things like opencontainers/artifacts#11 gaining traction could cement us in a bad spot if we aren't careful.

Do these all fit in our world?

application/vnd.cncf.helm.config.v1+json
application/vnd.cncf.helm.chart.content.layer.v1+tar
application/vnd.sylabs.sif.config.v1+json
appliciation/vnd.sylabs.sif.layer.tar

For example, application/vnd.cncf.helm.chart.content.layer.v1+tar is inconsistent with everything else (+tar instead of .tar). I suspect that's unintentional, but we have no grammar or parser or validation for media types, so who knows? How are clients supposed to handle that when they see it? cc @SteveLasker @jdolitsky

Also, the singularity layer spelled application wrong and is unversioned, whereas the config has v1. Is that intentional? Does it matter or is the version number opaque to clients? cc @ikaneshiro @vsoch

lumjjb · 2019-09-12T22:51:51Z

Base on the regexp from https://tools.ietf.org/html/rfc6838#section-4.2, it seems like having multiple "+"s is not illegal but only the last one is considered the structured syntax suffix based on the spec.

vsoch · 2019-09-12T22:57:21Z

Why does sif end in layer.tar? A sif image could conceptually be one layer, but it's definitely not a tar. Is the idea that you have to tar the image up and extract the SIF on some filesystem?

The spec does have a version and if you look at the build it also can be grabbed from the git commit - but I'm not sure that's exposed beyond using the sif tool to inspect the header, and afaik it hasn't been changed. I'm not sure who wrote that (or why it has spelling mistakes!) but @ikaneshiro would be best to ask.

jonjohnsonjr · 2019-09-12T23:02:22Z

only the last one is considered the structured syntax

This makes sense, the last one should be the format of the payload and trimming the suffix should yield the unwrapped payload's media type.

dmcgowan · 2019-09-12T23:21:30Z

is inconsistent with everything

@jonjohnsonjr Do you think the artifacts repo should provide guidance on the media types? The ship has kind of sailed on the v1 OCI types (I agree nondistributable says nothing about the content, it should have been an annotation). Also everyone doesn't need a custom media type for everything, the reason layers aren't just "application/tar" is because the tar must be processed in a specific way to handle whiteouts. As for "+tar", I'm not sure that is completely wrong if the type is just defining how to process a set of files, but not sure why they did something inconsistent.

jonjohnsonjr · 2019-09-12T23:50:39Z

Do you think the artifacts repo should provide guidance on the media types?

Definitely, but they're going to look to image-spec for inspiration, and we should have a good answer. We could document the reasoning behind the current media types so that they don't just cargo cult what's here without thinking about why. I think a small grammar on top of rfc6838 would go a long way towards keeping things consistent and usable.

The ship has kind of sailed on the v1 OCI types

I kind of agree, but I suspect there's not that much relevant adoption, especially since the distribution spec isn't really finalized. Regardless, we should fix this if we ever bump to v2.

As for "+tar", I'm not sure that is completely wrong if the type is just defining how to process a set of files, but not sure why they did something inconsistent.

Right, I don't think it's "wrong", but it would be much nicer if it were consistent 😄

SteveLasker · 2019-09-13T01:06:23Z

If you're asking for a standard for how different compression formats should be applied to layers, regardless of type. That sounds like greatness. And, yes, I would really like to better understand the reasoning for vnd/, + and all the other formats in the mediaTypes. In the artifacts repo, we describe the type formatting:

For the computers, artifacts are defined by setting the manifest.config.mediaType to a globally unique value. Defining unique values enables registries and tooling to differentiate types.

Note: The config.mediaType of application/vnd.oci.image.config.v1+json is reserved for artifacts intended to be instanced by [docker][docker] and [containerD][containerd].
Each artifact shall have its own unique type.

The following format is used to differentiate the type of artifact:

application/vnd.[org|company].[objectType].[optionalSubType].config.[version]+json

I think we're talking about how tools for a specific artifact types handle their layers vs. how a registry handles multiple artifact types.

For each artifact type, including an OCI image, there are a set of valid layer types. If someone introduces new layer types, how would the various tools know how to parse those layers?

But, I also think we should recognize that there are different "types" being pushed to registries. "docker" images, which should equate to OCI Images have a set of known types. The image-spec speaks to the specific layers the docker/oci image format support.

Singularity images are unique to the tooling around singularity, and shouldn't have to follow the conventions for an OCI Image. As @vsoch suggests, they shouldn't have to convert their format just to match the docker/oci image format. They can use their native .sif format. That was the big value for them leveraging OCI Artifacts. They no longer have to impersonate a docker image. Same with Helm, OPA, Marky Mark and who knows what else gets added. Each have their own sandbox to play in. While registries can focus on their value.

Take the security scanning scenario. I have a registry that I want to secure. A security scanning solution will scan all the artifacts. If it knows how to scan a docker/oci image, it can report on its results as a known state. When it encounters a Helm chart, or Singularity image, it shouldn't report an error, rather it either knows how to scan it or not. It knows whether it can scan it because of its manifest.config.mediaType. For the types it knows about, it needs to know how to parse the layers.

The purpose of OCI Artifacts is to formalize what's already happening. This allows registries to do what they do best, authenticate, accept, store, serve stuff. While, artifact authors manage their types, and tools around their types know how to work with them.

This is why we're proposing well-known-types where artifact authors submit their types. In this case the misspelling could get caught before becoming "well known"

For the docker/oci image, if we think all tools should support a new type, they would submit a change to the vnd.oci.image.1 definition. Once approved, tools that support docker/oci images know what to expect and can process them, including the security scanning products.

ikaneshiro · 2019-09-13T16:21:13Z

@jonjohnsonjr @vsoch
The spelling mistake, tar extension and lack of version number on the layer mediaType are oversights on my part. I plan on making the mediaType name in line with what @SteveLasker has referenced as it falls under the artifact umbrella.

I'll move the discussion around that to https://github.com/sylabs/singularity/issues/4437 though. Don't want to detract from this more general discussion.

vsoch · 2019-09-13T16:21:50Z

Thank you @ikaneshiro !

dmcgowan · 2019-09-17T18:07:09Z

I put this on the agenda for tomorrow's weekly dev meeting. If we can't get enough people there tomorrow to discuss we can push it out a week too.

vbatts · 2019-09-18T10:53:50Z

i will not be able to make the call, but I feel the discussion is on the right track. I do like the way Jon laid out an interface for it, but since order matters, this feels like it'll just be a huge switch statement that is comparing whole strings.

…

On Tue, Sep 17, 2019 at 8:07 PM Derek McGowan ***@***.***> wrote: I put this on the agenda for tomorrow's weekly dev meeting. If we can't get enough people there tomorrow to discuss we can push it out a week too. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#791?email_source=notifications&email_token=AAAQL2LGLVMRDKOWYNUXEJDQKEME5A5CNFSM4IWHWHX2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD65M3DI#issuecomment-532336013>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAAQL2NHO7MK6F4YQY55IT3QKEME5ANCNFSM4IWHWHXQ> .

jdolitsky · 2019-09-18T20:58:42Z

I'd like more clarification somewhere on valid mediatypes (maybe that is the artifacts repo?) .. the Helm mediatypes are using +tar/ +json as that is the recommendation we had at the time, but now sounding like an oversight https://v3.helm.sh/docs/topics/registries/#where-are-my-charts

mikebrow · 2019-09-18T22:42:02Z

I'd like to see scenarios drawn out for how the +encrypted/encoded layer type suffix would work from build to snapshot for the community at large, before we say yeah let's support a generic media type suffix. Has that been done yet?

It would make more sense to me if it was say +multipart/encrypted with a first portion of the binary providing the necessary detail for decryption and/or +type.encrypted where type indicates which of a plurality of generally supported encryption algorithms for existing and new encryption types possibly specified over here iana but where we could also add more as needed.

lumjjb · 2019-09-18T23:02:24Z

I'd like to see scenarios drawn out for how the +encrypted/encoded layer type suffix would work from build to snapshot for the community at large, before we say yeah let's support a generic media type suffix. Has that been done yet?

Yup. @stefanberger, @dmcgowan worked on this together with the new stream plugin from @crosbymichael. There is also an implementation in containers/image library as an open PR - with consumption by skopeo and cri-o.

mikebrow · 2019-09-20T03:55:08Z

I'd like to see scenarios drawn out for how the +encrypted/encoded layer type suffix would work from build to snapshot for the community at large, before we say yeah let's support a generic media type suffix. Has that been done yet?

Yup. @stefanberger, @dmcgowan worked on this together with the new stream plugin from @crosbymichael. There is also an implementation in containers/image library as an open PR - with consumption by skopeo and cri-o.

I see code, but I don't see soup to nuts scenario descriptions. I think I remember seeing some decks where scenarios were talked about in general but don't see it in the links. Thinking about this from a spec perspective.

I don't see in the code, yet, how you handle all the various encryption types, is there a plain text header in the layer, or some magic number?

jonjohnsonjr · 2019-09-20T18:37:56Z

Posting back here, @estesp kindly took some notes in the OCI call: https://hackmd.io/El8Dd2xrTlCaCG59ns5cwg#Notes1

Some action items from @dmcgowan

Add a description to image-spec of the media type format. We should describe existing media types and probably reference the RFC. Perhaps some guidance for clients of how to interpret the media types, but that is covered by the RFC.
Update the artifacts repo with explicit guidance for creating new media types that are easily consumable by clients.
Come up with a plan for zstd compatibility in 1.1 release

jdolitsky · 2019-09-22T20:19:52Z

@jonjohnsonjr it looks like action item 2 is already in WIP: opencontainers/artifacts#6

There seems to be alot of examples of using + there in mediatypes.. we can probably use some comments there on what's appropriate.

jzelinskie · 2019-09-23T20:50:00Z

I think we don't need to require that a compression format be a part of RFC6838, but rather all maintainers should agreed that the format should be added and under what name. This will avoid registry authors from supporting every type of compression under the sun and if there is a format that would be valuable to add, we don't have to wait for anyone else to be able to add it to the standard.

We should keep artifacts discussion on that GitHub repository's issue tracker and scope this discussion purely to application of our spec to containers.

jonjohnsonjr · 2019-12-12T00:33:21Z

Something I just ran into re: nondistributable layers... this mostly involves distribution, but most of the relevant conversation has been happening in this repo.

This PR says that unrecognized media types must be ignored: #759

Which was the result of this issue to ensure flexibility in media types: opencontainers/distribution-spec#55

So I'm wondering, from a distribution perspective, what it means to ignore something? My first impression was to just completely ignore it by not even validating that the descriptor references a blob that has actually been uploaded, but I think that's probably unhelpful behavior. Nondistributable media types seem like an exception that proves the rule here; i.e. that any non-manifest should be expected to reference an uploaded blob, unless it's a nondistributable media type.

That thought reminded me of this thread, where we're throwing in the towel on nondistributable media types, but I think that's a bad idea. It would be great if we could define an annotation for nondistributability such that non-layer media types can also be outside of the blobstore, e.g. for directly referencing blobs on GitHub instead of within GCR (inspired by discussion with @vbatts).

WDYT @dmcgowan

jonjohnsonjr · 2020-06-08T21:31:08Z

Do we care if OCI media types are valid RFC6838 media types? E.g. "zstd" isn't defined in the Structured Syntax Suffix Registry.

FWIW this is no longer true 🎉

jonjohnsonjr · 2020-12-22T22:21:33Z

This is a real problem, not just for me: moby/buildkit#1879

cc @tonistiigi come share in my pain

AkihiroSuda mentioned this issue Sep 13, 2019

Generic layer support containerd/containerd#3649

Merged

vsoch mentioned this issue Sep 13, 2019

Clarify and correct OCI Artifact Type for SIF apptainer/singularity#4437

Closed

lumjjb mentioned this issue Oct 9, 2019

Proposal to add Encrypted Layer Mediatype #775

Open

jonjohnsonjr mentioned this issue Oct 16, 2019

Layer digests don't match pushed digests using docker google/go-containerregistry#572

Closed

thaJeztah mentioned this issue Apr 20, 2020

Formalize support for zstd compression: v1.1.0 ? #803

Open

jonjohnsonjr mentioned this issue Feb 18, 2021

Proposal: Add content-encoding support to spec for posting blobs opencontainers/distribution-spec#235

Open

jonjohnsonjr mentioned this issue Mar 17, 2021

Define the data field #826

Merged

jonjohnsonjr mentioned this issue Apr 21, 2021

Added encryption mediatype doc opencontainers/artifacts#15

Closed

jonjohnsonjr mentioned this issue Aug 26, 2021

Question: Cross Repo References oras-project/artifacts-spec#26

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Taming media types #791

Taming media types #791

jonjohnsonjr commented Sep 12, 2019 •

edited

Loading

vbatts commented Sep 12, 2019 via email

jonjohnsonjr commented Sep 12, 2019 •

edited

Loading

vbatts commented Sep 12, 2019 via email

lumjjb commented Sep 12, 2019

dmcgowan commented Sep 12, 2019 •

edited

Loading

jonjohnsonjr commented Sep 12, 2019

jonjohnsonjr commented Sep 12, 2019 •

edited

Loading

lumjjb commented Sep 12, 2019 via email

vsoch commented Sep 12, 2019

jonjohnsonjr commented Sep 12, 2019

dmcgowan commented Sep 12, 2019

jonjohnsonjr commented Sep 12, 2019

SteveLasker commented Sep 13, 2019

ikaneshiro commented Sep 13, 2019

vsoch commented Sep 13, 2019

dmcgowan commented Sep 17, 2019

vbatts commented Sep 18, 2019 via email

jdolitsky commented Sep 18, 2019

mikebrow commented Sep 18, 2019

lumjjb commented Sep 18, 2019

mikebrow commented Sep 20, 2019 •

edited

Loading

jonjohnsonjr commented Sep 20, 2019

jdolitsky commented Sep 22, 2019

jzelinskie commented Sep 23, 2019

jonjohnsonjr commented Dec 12, 2019

jonjohnsonjr commented Jun 8, 2020

jonjohnsonjr commented Dec 22, 2020

Taming media types #791

Taming media types #791

Comments

jonjohnsonjr commented Sep 12, 2019 • edited Loading

vbatts commented Sep 12, 2019 via email

jonjohnsonjr commented Sep 12, 2019 • edited Loading

vbatts commented Sep 12, 2019 via email

lumjjb commented Sep 12, 2019

dmcgowan commented Sep 12, 2019 • edited Loading

jonjohnsonjr commented Sep 12, 2019

jonjohnsonjr commented Sep 12, 2019 • edited Loading

lumjjb commented Sep 12, 2019 via email

vsoch commented Sep 12, 2019

jonjohnsonjr commented Sep 12, 2019

dmcgowan commented Sep 12, 2019

jonjohnsonjr commented Sep 12, 2019

SteveLasker commented Sep 13, 2019

ikaneshiro commented Sep 13, 2019

vsoch commented Sep 13, 2019

dmcgowan commented Sep 17, 2019

vbatts commented Sep 18, 2019 via email

jdolitsky commented Sep 18, 2019

mikebrow commented Sep 18, 2019

lumjjb commented Sep 18, 2019

mikebrow commented Sep 20, 2019 • edited Loading

jonjohnsonjr commented Sep 20, 2019

jdolitsky commented Sep 22, 2019

jzelinskie commented Sep 23, 2019

jonjohnsonjr commented Dec 12, 2019

jonjohnsonjr commented Jun 8, 2020

jonjohnsonjr commented Dec 22, 2020

jonjohnsonjr commented Sep 12, 2019 •

edited

Loading

jonjohnsonjr commented Sep 12, 2019 •

edited

Loading

dmcgowan commented Sep 12, 2019 •

edited

Loading

jonjohnsonjr commented Sep 12, 2019 •

edited

Loading

mikebrow commented Sep 20, 2019 •

edited

Loading