-
Notifications
You must be signed in to change notification settings - Fork 208
Add gRPC API to push metadata of container image #1358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pipecd-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Code coverage for golang is
|
|
/lgtm |
|
Code coverage for golang is
|
pkg/model/imagemetadata.proto
Outdated
|
|
||
| import "validate/validate.proto"; | ||
|
|
||
| message ImageMetadata { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking about the model's name. I wonder ImageMetadata is a little bit versatile name. This is for only the latest image. It's a bit weird to fetch from here when we want to have the latest image.
And I guess instead of Metadata, Manifest is more readable according to here (maybe).
Therefore, how about LatestImage or LatestImageManifest?
Any objection is welcome as I'm still debating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest I was thinking about those 2 things too.
About the naming
MetadataorManifest
When I was thinking which is better, I checked the definition of docker image manifest at here
https://docs.docker.com/registry/spec/manifest-v2-2/
and by running the following command I got a schema that does not fit our purpose.
docker manifest inspect docker.io/library/alpine:latest
{
"schemaVersion": 2,
"mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
"manifests": [
{
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"size": 528,
"digest": "sha256:074d3636ebda6dd446d0d00304c4454f468237fdacf08fb0eeac90bdbfa1bac7",
"platform": {
"architecture": "amd64",
"os": "linux"
}
},
{
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"size": 528,
"digest": "sha256:096ebf69d65b5dcb3756fcfb053e6031a3935542f20cd7a8b7c59e1b3cb71558",
"platform": {
"architecture": "arm",
"os": "linux",
"variant": "v6"
}
},
{
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"size": 528,
"digest": "sha256:299294be8699c1b323c137f972fd0aa5eaa4b95489c213091dcf46ef39b6c810",
"platform": {
"architecture": "arm",
"os": "linux",
"variant": "v7"
}
},
{
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"size": 528,
"digest": "sha256:549694ea68340c26d1d85c00039aa11ad835be279bfd475ff4284b705f92c24e",
"platform": {
"architecture": "arm64",
"os": "linux",
"variant": "v8"
}
},
{
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"size": 528,
"digest": "sha256:a2c49065a4a5161095f5d60c527f8b8ab3d4a4e0af79333eeccf4dd260aae64d",
"platform": {
"architecture": "386",
"os": "linux"
}
},
{
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"size": 528,
"digest": "sha256:1f2e2eda6c64b2d73549fa66378f5c3474ba1fad4ef7ed94fd7298ca9e8862fd",
"platform": {
"architecture": "ppc64le",
"os": "linux"
}
},
{
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"size": 528,
"digest": "sha256:4217ef60b4b11942f76d992798c5904782ebb86f70742800644da32ba3a6f25b",
"platform": {
"architecture": "s390x",
"os": "linux"
}
}
]
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About prefixing
Lastestor not
I think the model name is fine because it represents the image entity (a specific entity may be the latest one or not).
But I was thinking about the name of DB collection. Firstly, I named the collection as LatestImageMetadata and the model is ImageMetadata but after all, I unified them.
Happy to hear more of your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But after rethinking about the prefix, I realized that the struct is containing the ID field that was concatenated just by project id and repo name but not the tag. So prefixing the name with Latest looks better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and by running the following command I got a schema that does not fit our purpose.
Ok, I totally agree with not using Manifest for the name. Metadata also looks good, but now I remember that I've thought ImageReference looks better. It identifies the specific image.
See: https://docs.docker.com/registry/spec/api/#pulling-an-image
How about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the struct is containing the ID field that was concatenated just by project id and repo name but not the tag. So prefixing the name with Latest looks better.
I feel the same way 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Btw, there are ImageRef and ImageName in the model package as mentioned before.)
https://github.com/pipe-cd/pipe/blob/ace3150fd61adddeaa8c1057c575e5b13b428404/pkg/model/image_name.go#L44-L54
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agreed with your suggestion mentioned at #1358 (comment), hence either ImageMetadata or ImageReference is our answer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm starting to feel "reference" is appropriate again. The "reference" clearly represents a specific image that has a unique digest. Wheres "metadata" seems to indicate immutable information about the repository.
Hmm... happy to hear your thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "ImageMetadata" is still referring to the image, not to the repository.
But I'm also starting to get the feeling that "ImageReference" is better. (Also because of already existing ImageRef)
Let me update the name.
Nice catch!
|
Btw, I am also thinking about changing the design of the model and the way of storing it inside DB. Currently, we only save the latest one, but we can change to save all of the entities as normal and use the pros
cons
So could you @nakabonne @khanhtc1202 vote which way do you want to apply? 👎 for current way |
We're on the same page. The truth is I felt that may be going to be needed after supporting the Docker Registry HTTP API V2 (#1350) As you may know the polling interval can be quite short as that's configurable. But it's nothing, all cons look trivial to me. |
|
@khanhtc1202 Actually not sure we're really able to implement that (a couple of considerations exist including authentication). But worth looking into! |
|
Thanks for the thoughts from both of you. Let me apply it. |
|
Code coverage for golang is
|
|
Updated! |
|
Code coverage for golang is
|
pkg/datastore/imagereference_test.go
Outdated
| wantErr bool | ||
| }{ | ||
| { | ||
| name: "Invalid image metadata", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
| name: "Invalid image metadata", | |
| name: "Invalid image reference", |
pkg/model/imagereference.proto
Outdated
| // The image digest. | ||
| string digest = 3; | ||
| // The image tag. | ||
| string tag = 4 [(validate.rules).string.min_len = 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For our initial use case, a string is enough, but technically, multiple tags can be associated with a single image reference. For future use, how about making it a comma-separeted list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, we currently allow the digest to be not unique. So we can keep it string as-is, and deal with it as other references. Performing docker images can represent my mind well.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
nakabonne-public-test 0.1.0 0a0da5a852e7 2 weeks ago 194MB
nakabonne-public-test latest 0a0da5a852e7 2 weeks ago 194MB
(Those IMAGE ID is the same but represents as another image reference)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future, if we want references with the same digest, we just need to perform List(digest=0a0da5a852e7). Then we will have:
- nakabonne-public-test:0.1.0
- nakabonne-public-test:latest
- gcr.io/nakabonne/nakabonne-test:0.1.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using array or comma-separeted string can lead to difficulty for our query at API.
Especially in our case, we support multiple DB backends.
So I want it to be as simple as possible.
The user can store multiple tags on separate entities if they want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, let's do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After rethinking about this, I realized that ImageWatcher only needs to use "repo_name", not "tag" in the query.
And one more reason, by storing them in the same entity we can avoid using transaction/batched-write while adding multiple entities at the same time.
So let me change this too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Yeah, it definitely the best design I think.
| // The repository name. | ||
| string repo_name = 2 [(validate.rules).string.min_len = 1]; | ||
| // The image digest. | ||
| string digest = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a comment: This must be unique, so I initially thought it's better to use this as an id, while it's enough to be an optional field. Never mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that the digest must not be unique. Unlike other general container registries, we have to handle multiple repositories (including gcr.io, ecr.amazonaws.com and docker hub). So it often happens that multiple references with the same digest to be registered.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
nakabonne-test foo 87c41dfd667d 13 days ago 194MB
000000000000.dkr.ecr.ap-northeast-1.amazonaws.com/nakabonne-test foo 87c41dfd667d 13 days ago 194MB
Really never mind!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as far as I know, the combination of repository and tag is used to identify container images, the digest just used to compare 2 images, which have the same digest means they are the same. Image digest is hard to read/use so they do use repository and tag combination instead on refer to an image. In our case, digest should be just saved as external metadata to be used for future purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the digest is optional?
Because we don't actually need this value, and if it's required then the user will have to get that value to set.
Our user does not only use the docker they may use bazel and other tools to build their container images.
Why not use the digest as the ID?
An image digest is a hash of an image. So It depends on the image's manifest & layer.
docker images --digests
REPOSITORY TAG DIGEST IMAGE ID CREATED SIZE
alpine latest sha256:3c7497bf0c7af93428242d6176e8f7905f2201d8fc5861f45be7a346b5f23436 389fef711851 3 weeks ago 5.58MB
The same image (same digest) can be stored in multiple registries.
|
Except for a nit comment I left, LGTM |
|
🚀 |
|
|
||
| message PushImageReferenceRequest { | ||
| string repo_name = 1 [(validate.rules).string.min_len = 1]; | ||
| string tag = 2 [(validate.rules).string.min_len = 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although we should keep the tag in the model as a single string, at least this should be an array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right
|
/hold |
|
/approve cancel |
|
/hold cancel |
|
/approve |
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #1357
Does this PR introduce a user-facing change?: