Support the media type application/vnd.oci.image.layer.v1.raw
#1197
base: main
Conversation
Signed-off-by: Shiming Zhang <[email protected]>
Force-pushed from 95c112e to 9964a8f
This gets a soft NACK from me. The path inside the image is something the artifact should be unaware of: different containers may want to mount the content of the same artifact in different locations. If there is a tight connection between the artifact and the image, then the artifact should likely be shipped as a new layer in the image, and image-spec already defines how layers are added. Also, a media type that indicates the content is an unknown byte stream is already handled by application/octet-stream.
An "unknown" media type breaks artifacts outside of the Kubernetes context, or creates artifacts that can only be used when structured in the Kubernetes mandated format. E.g. WASM would need to redefine their artifact specification, or ship two separate artifacts, allow their content to be mounted as a volume.
I think it would be better for Kubernetes to support arbitrary artifacts with an enhanced volume syntax that specifies which blob in the artifact to mount with a specific filename. I mentioned that in kubernetes/enhancements#4642 (review) but that was pushed off to a future enhancement so that the simplified proposal could be implemented faster.
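A rough sketch of what that enhanced volume syntax could look like on the Kubernetes side is below, written as a pod manifest in JSON. The `image.reference` and `image.pullPolicy` fields come from the KEP-4639 image volume source; the `blobSelector` list and its `digest`/`path` fields are purely hypothetical names invented here to illustrate the idea, and the image references and digest are placeholders.

```json
{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": { "name": "artifact-consumer" },
  "spec": {
    "containers": [
      {
        "name": "app",
        "image": "registry.example/app:latest",
        "volumeMounts": [
          { "name": "model", "mountPath": "/models" }
        ]
      }
    ],
    "volumes": [
      {
        "name": "model",
        "image": {
          "reference": "registry.example/artifacts/model:v1",
          "pullPolicy": "IfNotPresent",
          "blobSelector": [
            {
              "digest": "sha256:<blob-digest>",
              "path": "model.bin"
            }
          ]
        }
      }
    ]
  }
}
```

The point of the sketch is that the mount location and filename live in the pod spec, not in the artifact itself.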
Agreed with Brandon; if compression is the bottleneck, I'd suggest using uncompressed layers. |
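For reference, the spec already defines media types that skip gzip; below is a minimal sketch of just the `layers` portion of an image manifest using the existing uncompressed and zstd-compressed tar media types (digests and sizes are placeholders).

```json
{
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:<uncompressed-layer-digest>",
      "size": 107374182400
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+zstd",
      "digest": "sha256:<zstd-layer-digest>",
      "size": 64424509440
    }
  ]
}
```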
Thanks for your reply. I've updated my content accordingly and commented on the previous discussion so anyone interested can move over here. I think this is a great optimization for large files in images. I look forward to a discussion that leads to a more acceptable solution. 🙏🙏🙏 |
How will we persuade all container runtimes to adopt this special layer format; how will we provide compatibility for runtimes that don't yet include that support? |
Alternative: define a new type of OCI artefact (not a layer) that represents a chunk of application data. This can then be mapped to something that looks like a file and mounted where a container can access it, for example using a CSI driver and Kubernetes. |
You have a point; there seems to be a precedent for this approach, such as application/vnd.in-toto+json (https://oci.dag.dev/?image=docker/dockerfile:1.5.1):

crane manifest docker/dockerfile:1.5.1 | jq .
{
"schemaVersion": 2,
"mediaType": "application/vnd.oci.image.index.v1+json",
"manifests": [
...
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"digest": "sha256:cd6383b1260aee71593cb70bdd44d50daf1ba142ed54191972fce08694ddfe35",
"size": 839,
"platform": {
"architecture": "unknown",
"os": "unknown"
},
"annotations": {
"vnd.docker.reference.digest": "sha256:e7748818724fa5f622da18698f9f5b16e0f32e5a6b9af888fd84053eb48e9cfd",
"vnd.docker.reference.type": "attestation-manifest"
}
},
...
]
}

crane manifest docker/dockerfile@sha256:cd6383b1260aee71593cb70bdd44d50daf1ba142ed54191972fce08694ddfe35 | jq .
{
"schemaVersion": 2,
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"config": {
"mediaType": "application/vnd.oci.image.config.v1+json",
"digest": "sha256:083187ca1ac1b4eebff9e2c982bb462d947a34e6d93be8a7d33491cf933881c9",
"size": 241
},
"layers": [
{
"mediaType": "application/vnd.in-toto+json",
"digest": "sha256:4297014349dde3f52863838747b5a4bacf2058d0951dae5b48025741611496a7",
"size": 39882,
"annotations": {
"in-toto.io/predicate-type": "https://spdx.dev/Document"
}
},
{
"mediaType": "application/vnd.in-toto+json",
"digest": "sha256:ff81f2987309a567da898c358437b2943cc297007b99491a4ea13f014b90a449",
"size": 14542,
"annotations": {
"in-toto.io/predicate-type": "https://slsa.dev/provenance/v0.2"
}
}
]
} |
The OCI guidance for packaging artifacts can be found at: https://github.com/opencontainers/image-spec/blob/main/artifacts-guidance.md

Specifically, the image manifest guidelines already cover how to set the media types on the layers: https://github.com/opencontainers/image-spec/blob/main/manifest.md#guidelines-for-artifact-usage

To take the common made-up artifact, let's assume I've shipped a cat picture as an artifact, packaged today according to that guidance.

With that example, how does the cat picture artifact know the path that it should be mounted at in every container? Some of those containers could be web servers, each serving content from a different path. Other containers could be a wasm tool converting the format. And other containers could be ML models validating or training on pictures. With all of these containers, I don't see how the artifact knows a path that applies equally to all of them.

If an artifact is so specific that it can only be used with one image, then an image should be created with that artifact as a new layer. The result would be much more portable. If gzip compression is the problem in that scenario, the layer can be zstd compressed, or uncompressed, using the current spec already. |
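To make that guidance concrete, a cat-picture artifact packaged per the linked guidelines might look roughly like the manifest below. The empty config descriptor is the one defined by image-spec 1.1; the artifactType, layer digest, and size are made-up placeholders.

```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.example.cat-picture",
  "config": {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2
  },
  "layers": [
    {
      "mediaType": "image/png",
      "digest": "sha256:<digest-of-cat-png>",
      "size": 123456,
      "annotations": {
        "org.opencontainers.image.title": "cat.png"
      }
    }
  ]
}
```

Note that the `org.opencontainers.image.title` annotation suggests a filename for the blob, but nothing in the artifact dictates where a consumer mounts it; that decision stays with whatever consumes it.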
I generally agree; however, a downside of this is that the thin tar wrapper makes the blob different from the original input: it has a different checksum, size, etc., which obscures its linkage and provenance. But I also think the AI model use case is one where OCI artifacts are simply more appropriate; the fact that it's architecture-independent data, among other things, argues for that. |
Are there other approaches that address the core requirement and are better aligned with existing workflows? |
After thinking about this more, I'm switching to a hard NACK. Support for artifacts in the KEP should be taken up on the K8s side. I don't think we should contort the definition of an OCI image to attempt to force them to accept what should be distributed as an OCI artifact.
On the Kubernetes side, this wouldn't go into core Kubernetes either - it's most likely to be an extension you could add to a cluster if you want to. |
This defines a new MediaType that is the original file without any processing, and marks its path in an annotation.
This will be very helpful for large language models, because the models are just too big (e.g. 100 GB), and skipping the packaging step would be a huge optimization.
This MediaType is not just for Kubernetes; it provides a standardized way to handle raw data efficiently across different container environments.

Faster bootstrapping:
On HDD devices, unpacking layers can also slow things down. Removing this step speeds up bootstrapping after the first image pull, which is good for environments where workloads need to be up and running fast.

Storage space savings:
This MediaType can save almost half of the storage space, which is crucial for devices with limited storage.
I expected something like this:
COPY --type=raw ./file /file
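For illustration, an image manifest using the proposed media type might carry the file as a single unprocessed blob alongside regular layers, with the target path recorded in an annotation. The annotation key below is only a placeholder (this description does not name the exact key), and the digests and sizes are placeholders as well.

```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:<config-digest>",
    "size": 7023
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:<base-layer-digest>",
      "size": 32654
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.raw",
      "digest": "sha256:<digest-of-original-file>",
      "size": 107374182400,
      "annotations": {
        "<path-annotation-key>": "/file"
      }
    }
  ]
}
```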
User Story: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4639-oci-volume-source