how to share blobs across multiple images in image-layout spec #811
Comments
The content store directory under /var/lib/containerd has completely nothing to do with OCI. It just happens to have similar directory structure. OCI image archives with index.json are fully supported with |
Isn’t "org.opencontainers.image.ref.name" annotation already supporting multi-image archives? |
Dang, it does, @AkihiroSuda ? I missed that. So I could put multiple images, as long as each root is in index.json and has the image name? And |
@AkihiroSuda I just read through the spec again (prompted by your explanation), both for image-layout and annotations. I also read through the various extended discussions on this in the issues. Here is what I concluded, based on the above and what you said:
So if I pulled down of
And the index.json would look something like:
{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.index.v1+json",
      "size": 1638,
      "digest": "sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54",
      "annotations": {
        "org.opencontainers.image.ref.name": "docker.io/library/alpine:3.11"
      }
    }
  ]
}
And if I had both
Is that correct? |
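As an illustration of the layout being asked about, here is a minimal Python sketch of an index.json that lists two images side by side, each identified by its "org.opencontainers.image.ref.name" annotation. The manifest bodies are placeholders, the second image name (`example.com/demo/app:1.0`) is hypothetical, and the full-name annotation values follow the convention proposed in this comment rather than the Implementor's Note in the spec.

```python
import hashlib
import json

REF_NAME = "org.opencontainers.image.ref.name"


def descriptor(manifest_bytes, media_type, ref_name):
    """Build a descriptor for a manifest blob, named via the ref.name annotation."""
    digest = "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()
    return {
        "mediaType": media_type,
        "size": len(manifest_bytes),
        "digest": digest,
        "annotations": {REF_NAME: ref_name},
    }


def build_index(descriptors):
    """Wrap the descriptors in a top-level index.json structure."""
    return {"schemaVersion": 2, "manifests": list(descriptors)}


# Placeholder manifest bodies, for illustration only (not real image manifests).
alpine_manifest = b'{"schemaVersion": 2, "note": "placeholder for alpine"}'
other_manifest = b'{"schemaVersion": 2, "note": "placeholder for a second image"}'

MANIFEST_TYPE = "application/vnd.oci.image.manifest.v1+json"
index = build_index([
    descriptor(alpine_manifest, MANIFEST_TYPE, "docker.io/library/alpine:3.11"),
    # Hypothetical second image name, assumed for this sketch.
    descriptor(other_manifest, MANIFEST_TYPE, "example.com/demo/app:1.0"),
])

print(json.dumps(index, indent=2))
```

Both manifest blobs would live in the same blobs/sha256/ directory, so any layers they share would be stored once.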
This part is expected to be "3.11" according to the Implementor's Note
|
I saw that note @AkihiroSuda , which had me wondering most about that part. This brings me back to the question, which I highlighted above. How would the image-layout format then handle having the blobs for both Granted, the example is a bit contrived, since those two share no blobs as far as I can tell, but it matters for the use cases of:
Also, I notice that the spec for the annotation for that one says:
That sounds like it is intended to support a full image name, not just tag. |
Also @AkihiroSuda did I get the rest of it right, in terms of what |
Also @AkihiroSuda I renamed the issue, once you explained that it does support them, but it is a question of how. If I need to rename it again after your answers to the above, happy to do so. |
@deitch did you ever get anywhere with this? I have a scenario where I need to distribute a number of images as files/tarballs before they get loaded into a registry (the remote location has no internet access to copy between registries directly). The images share several layers between them, as they are built on the same base image, but those layers end up duplicated in the exported tarballs. I was hoping that maybe converting to a single OCI image layout with multiple index.json entries could be my saving grace. |
I've been treating the Layout directory as a repository, where multiple images are referenced by different tag names in the index.json, and the blobs directory is deduplicated since the names would collide. It does require that tooling perform its own GC on orphaned blobs if you change a tag and delete the reference to an old manifest. My own implementation is in regclient, where you can run:
Which would create a directory called |
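The GC step mentioned above can be sketched roughly as follows. This is not regclient's implementation, only a minimal illustration in Python: it walks index.json, follows manifest and index blobs to collect every reachable digest, and reports blobs that nothing references. The function names are hypothetical, and the sketch assumes only the `manifests`, `config`, and `layers` fields need following (it ignores things like `subject` references).

```python
import json
from pathlib import Path

# Media types whose blobs are JSON documents that reference further blobs.
WALKABLE = {
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.docker.distribution.manifest.v2+json",
}


def blob_path(layout, digest):
    """Map 'sha256:abc...' to <layout>/blobs/sha256/abc..."""
    alg, _, hexval = digest.partition(":")
    return Path(layout) / "blobs" / alg / hexval


def reachable_digests(layout):
    """Collect every digest reachable from the descriptors in index.json."""
    index = json.loads((Path(layout) / "index.json").read_text())
    seen = set()
    stack = list(index.get("manifests", []))
    while stack:
        desc = stack.pop()
        digest = desc["digest"]
        if digest in seen:
            continue
        seen.add(digest)
        if desc.get("mediaType") not in WALKABLE:
            continue  # layers and configs reference nothing further
        path = blob_path(layout, digest)
        if not path.is_file():
            continue  # referenced but not present locally
        doc = json.loads(path.read_bytes())
        stack.extend(doc.get("manifests", []))
        stack.extend(doc.get("layers", []))
        if "config" in doc:
            stack.append(doc["config"])
    return seen


def find_orphans(layout):
    """Return paths of blobs that no entry in index.json reaches."""
    keep = reachable_digests(layout)
    orphans = []
    for blob in sorted((Path(layout) / "blobs").glob("*/*")):
        if blob.is_file() and f"{blob.parent.name}:{blob.name}" not in keep:
            orphans.append(blob)
    return orphans
```

A caller would review the list from `find_orphans` and unlink those files. Returning the paths rather than deleting them keeps the sketch safe to run against a real layout directory.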
I did not @sudo-bmitch . In cases where I have total control over how the image-layout is produced and consumed, I do what was listed above:
"org.opencontainers.image.ref.name": "docker.io/library/alpine:3.11"
That is not what the spec says, which should be:
"org.opencontainers.image.ref.name": "3.11"
For example, we do this in linuxkit and some other software. I really would like to have a standard where a single
Maybe @AkihiroSuda has some more input? He knows this better than I do (by a long shot). |
Circling back to this and rereading the thread, I've been treating the OCI Layout directory as a distinct "repository" from an upstream registry. So the "3.11" would be the tag inside the Layout directory, and it just happens to be a copy of the image from "docker.io/library/alpine" but could just as easily be a copy of the "registry.example.com/private-mirror/alpine" repo. I do see the value for container engines in having a "here's the original repository name" annotation, since they treat all content as a local copy of an upstream repository, and they treat an OCI Layout as an alternate transport, rather than a repository itself. This would be breaking for tools that treat the Layout directory as a repository, since it allows multiple manifests with the same tag and different origin repositories, making it impossible to list the tags in the repository and select a manifest by tag. Given the two possible treatments of this directory, I think we can see interoperability issues between tools working with the Layout that would be good for OCI to resolve. |
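The interoperability concern can be made concrete with a small Python sketch of tag lookup in a tool that treats the Layout as a repository. The `resolve` function and the sample digests are hypothetical. The sketch assumes a tool that selects a manifest purely by the "org.opencontainers.image.ref.name" value, and shows how two entries with the same tag but different content make that selection ambiguous.

```python
REF_NAME = "org.opencontainers.image.ref.name"


def resolve(index, ref):
    """Return the single descriptor whose ref.name equals `ref`.

    Raises KeyError if nothing matches, and ValueError if the same ref
    points at more than one distinct digest.
    """
    matches = [
        d for d in index.get("manifests", [])
        if d.get("annotations", {}).get(REF_NAME) == ref
    ]
    if not matches:
        raise KeyError(f"no manifest with ref name {ref!r}")
    if len({d["digest"] for d in matches}) > 1:
        raise ValueError(f"ref name {ref!r} is ambiguous in this layout")
    return matches[0]


# Made-up digests; two different images both tagged "3.11".
index = {
    "schemaVersion": 2,
    "manifests": [
        {"digest": "sha256:aaa", "annotations": {REF_NAME: "3.11"}},
        {"digest": "sha256:bbb", "annotations": {REF_NAME: "3.11"}},
        {"digest": "sha256:ccc", "annotations": {REF_NAME: "latest"}},
    ],
}

print(resolve(index, "latest")["digest"])
try:
    resolve(index, "3.11")
except ValueError as err:
    print("lookup failed:", err)
```

A tool that instead treats the Layout as a transport would need some additional piece of information, such as an origin repository name, to tell the two "3.11" entries apart.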
That is what it comes down to @sudo-bmitch . The current OCI spec supports a single image and not multiple, although it is very close. So close, and so useful, that others (containerd, linuxkit, etc.) adopt it almost entirely and then do something slightly different for the index. My position is not that the current system does support it (it does not), but that it is so close, and so useful, that it should, so let's do it. |
That's not actually true -- the spec explicitly says both are valid (and always has): 😅 🙈 Lines 33 to 43 in 8797c3f
|
The way that image-layout is written, it cannot (or is very hard to; perhaps I should be less definite in the statement) support multiple images sharing blobs.
The "entrypoint" of the directory is index.json, which, essentially, is the root image index of the single image (whatever name and tag that image is, which is not visible from inside the directory).
If, however, I have two images, I need two distinct root directories, each with its own blobs/sha256/ and index.json. This despite the fact that these two images may actually share 9 out of 10 layers. The content of blobs/sha256/ is, by definition, content-addressable, so there is zero conflict in having a shared directory for two or more images; only the index.json would be different.
containerd itself handles this by having a shared blobs/sha256/ dir and apparently ignoring index.json in favour of its boltdb for pointing to the root of the image. I suspect that is partially so that blobs can be shared, and partially because of the need for multiple concurrent reads and writes of the metadata, requiring a real database.
It would be good if the spec supported multiple images with some, all or no shared content, and a single shared content directory. Nothing would force anyone to use it - you still could have multiple directories - but it would create a lot of efficiency options.
I sort of can get around this now on disk by making blobs/ a symlink, but it is messy.
I see a few possible options:
index.json, each in its own directory, so you might have an images/ dir in the spec, in which each file (or subdir) represents a different image, pointing to the shared blobs.
index.json but instead of the current structure, it really is an index of image:tag -> root (manifest or index), which already is in blobs/sha256/ anyways under the current spec. This has an issue with multiple potential writes at once, but anyone looking to have real write concurrency should be using a database (like containerd does).
Coming back around: is there a current "correct" way to have blobs shared across images in the current image-layout spec, and if not, what can we do to get there?
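To make the shared-directory idea concrete, here is a minimal Python sketch that merges several single-image layout directories into one layout with a single blobs/ directory and a combined index.json. The `merge_layouts` name is hypothetical, and the sketch assumes each source index.json entry already carries whatever "org.opencontainers.image.ref.name" annotation the consumer needs. Because blobs are content-addressable, a blob that already exists in the destination is simply skipped.

```python
import json
import shutil
from pathlib import Path

REF_NAME = "org.opencontainers.image.ref.name"


def merge_layouts(sources, dest):
    """Merge layout directories into `dest`, storing shared blobs once."""
    dest = Path(dest)
    (dest / "blobs").mkdir(parents=True, exist_ok=True)
    merged, seen = [], set()
    for src in map(Path, sources):
        # Copy blobs; identical names mean identical content, so skip duplicates.
        for blob in (src / "blobs").glob("*/*"):
            if not blob.is_file():
                continue
            target = dest / "blobs" / blob.parent.name / blob.name
            if not target.exists():
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copyfile(blob, target)
        # Append this layout's root descriptors, dropping exact repeats.
        index = json.loads((src / "index.json").read_text())
        for desc in index.get("manifests", []):
            key = (desc["digest"], desc.get("annotations", {}).get(REF_NAME))
            if key not in seen:
                seen.add(key)
                merged.append(desc)
    (dest / "oci-layout").write_text(json.dumps({"imageLayoutVersion": "1.0.0"}))
    (dest / "index.json").write_text(
        json.dumps({"schemaVersion": 2, "manifests": merged}, indent=2)
    )
    return merged
```

This sketch rewrites index.json in one step and does nothing about concurrent writers, which is the limitation noted in the second option above.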