Skip to content

Conversation

@nghialv
Copy link
Member

@nghialv nghialv commented Jan 6, 2021

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #1357

Does this PR introduce a user-facing change?:

NONE

Copy link
Collaborator

@pipecd-bot pipecd-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GO_LINTER

Some issues were detected while linting go source files in your changes.

@pipecd-bot
Copy link
Collaborator

COVERAGE

Code coverage for golang is 34.15%. This pull request increases coverage by 0.14%.

File Function Base Head Diff
pkg/app/api/grpcapi/api.go API.PushImageMetadata -- 0.00% +0.00%
pkg/datastore/imagemetadata.go NewImageMetadataStore -- 100.00% +100.00%
pkg/datastore/imagemetadata.go imageMetadataStore.PutImageMetadata -- 83.33% +83.33%
pkg/datastore/imagemetadata.go imageMetadataStore.GetImageMetadata -- 100.00% +100.00%
pkg/model/imagemetadata.go GenerateImageMetadataID -- 0.00% +0.00%
pkg/datastore/mock.go MockDataStore.Put 0.00% 100.00% +100.00%
pkg/datastore/mock.go MockDataStoreMockRecorder.Put 0.00% 100.00% +100.00%

@khanhtc1202
Copy link
Member

/lgtm

@pipecd-bot pipecd-bot added the lgtm label Jan 6, 2021
@pipecd-bot pipecd-bot removed the lgtm label Jan 6, 2021
@pipecd-bot
Copy link
Collaborator

COVERAGE

Code coverage for golang is 34.15%. This pull request increases coverage by 0.14%.

File Function Base Head Diff
pkg/app/api/grpcapi/api.go API.PushImageMetadata -- 0.00% +0.00%
pkg/datastore/imagemetadata.go NewImageMetadataStore -- 100.00% +100.00%
pkg/datastore/imagemetadata.go imageMetadataStore.PutImageMetadata -- 83.33% +83.33%
pkg/datastore/imagemetadata.go imageMetadataStore.GetImageMetadata -- 100.00% +100.00%
pkg/model/imagemetadata.go GenerateImageMetadataID -- 0.00% +0.00%
pkg/datastore/mock.go MockDataStoreMockRecorder.Put 0.00% 100.00% +100.00%
pkg/datastore/mock.go MockDataStore.Put 0.00% 100.00% +100.00%


import "validate/validate.proto";

message ImageMetadata {
Copy link
Member

@nakabonne nakabonne Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about the model's name. I wonder ImageMetadata is a little bit versatile name. This is for only the latest image. It's a bit weird to fetch from here when we want to have the latest image.
And I guess instead of Metadata, Manifest is more readable according to here (maybe).

Therefore, how about LatestImage or LatestImageManifest?

Any objection is welcome as I'm still debating.

Copy link
Member Author

@nghialv nghialv Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest I was thinking about those 2 things too.

About the naming Metadata or Manifest

When I was thinking which is better, I checked the definition of docker image manifest at here
https://docs.docker.com/registry/spec/manifest-v2-2/

and by running the following command I got a schema that does not fit our purpose.

docker manifest inspect docker.io/library/alpine:latest
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 528,
         "digest": "sha256:074d3636ebda6dd446d0d00304c4454f468237fdacf08fb0eeac90bdbfa1bac7",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 528,
         "digest": "sha256:096ebf69d65b5dcb3756fcfb053e6031a3935542f20cd7a8b7c59e1b3cb71558",
         "platform": {
            "architecture": "arm",
            "os": "linux",
            "variant": "v6"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 528,
         "digest": "sha256:299294be8699c1b323c137f972fd0aa5eaa4b95489c213091dcf46ef39b6c810",
         "platform": {
            "architecture": "arm",
            "os": "linux",
            "variant": "v7"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 528,
         "digest": "sha256:549694ea68340c26d1d85c00039aa11ad835be279bfd475ff4284b705f92c24e",
         "platform": {
            "architecture": "arm64",
            "os": "linux",
            "variant": "v8"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 528,
         "digest": "sha256:a2c49065a4a5161095f5d60c527f8b8ab3d4a4e0af79333eeccf4dd260aae64d",
         "platform": {
            "architecture": "386",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 528,
         "digest": "sha256:1f2e2eda6c64b2d73549fa66378f5c3474ba1fad4ef7ed94fd7298ca9e8862fd",
         "platform": {
            "architecture": "ppc64le",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 528,
         "digest": "sha256:4217ef60b4b11942f76d992798c5904782ebb86f70742800644da32ba3a6f25b",
         "platform": {
            "architecture": "s390x",
            "os": "linux"
         }
      }
   ]
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About prefixing Lastest or not

I think the model name is fine because it represents the image entity (a specific entity may be the latest one or not).

But I was thinking about the name of DB collection. Firstly, I named the collection as LatestImageMetadata and the model is ImageMetadata but after all, I unified them.

Happy to hear more of your thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But after rethinking about the prefix, I realized that the struct is containing the ID field that was concatenated just by project id and repo name but not the tag. So prefixing the name with Latest looks better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and by running the following command I got a schema that does not fit our purpose.

Ok, I totally agree with not using Manifest for the name. Metadata also looks good, but now I remember that I've thought ImageReference looks better. It identifies the specific image.
See: https://docs.docker.com/registry/spec/api/#pulling-an-image
How about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the struct is containing the ID field that was concatenated just by project id and repo name but not the tag. So prefixing the name with Latest looks better.

I feel the same way 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agreed with your suggestion mentioned at #1358 (comment), hence either ImageMetadata or ImageReference is our answer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to feel "reference" is appropriate again. The "reference" clearly represents a specific image that has a unique digest. Wheres "metadata" seems to indicate immutable information about the repository.

Hmm... happy to hear your thought.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "ImageMetadata" is still referring to the image, not to the repository.
But I'm also starting to get the feeling that "ImageReference" is better. (Also because of already existing ImageRef)
Let me update the name.

Nice catch!

@nghialv
Copy link
Member Author

nghialv commented Jan 6, 2021

Btw, I am also thinking about changing the design of the model and the way of storing it inside DB.

Currently, we only save the latest one, but we can change to save all of the entities as normal and use the list(ordered by created_at, limit = 1) to get the latest one.

pros

  • make a better feeling because of unifying DB design with other entities (🤣 )
  • easier to scale/update in the future (e.g. the old entity still be in the database so we may need them in the future)
  • keep all entities to be immutable
  • able to avoid Latest prefix

cons

  • unnecessary data is constantly increasing (but not much because the entity's size is small)
  • have to use list instead of get to find the latest one (but not a problem because we have just 1 call per 5m, and we can improve by using list to find all imagemetadatas for a specific project at piped or using cache at API if needed)

So could you @nakabonne @khanhtc1202 vote which way do you want to apply?

👎 for current way
👍 for the new way

@nakabonne
Copy link
Member

the old entity still be in the database so we may need them in the future

We're on the same page. The truth is I felt that may be going to be needed after supporting the Docker Registry HTTP API V2 (#1350)

As you may know the polling interval can be quite short as that's configurable. But it's nothing, all cons look trivial to me.

@khanhtc1202
Copy link
Member

We're on the same page. The truth is I felt that may be going to be needed after supporting the Docker Registry HTTP API V2 (#1350)

Nice to hear, I suppose our user would need this #1350 in the near future, so +1 for saving all metadata.

@nakabonne
Copy link
Member

@khanhtc1202 Actually not sure we're really able to implement that (a couple of considerations exist including authentication). But worth looking into!

@nghialv
Copy link
Member Author

nghialv commented Jan 6, 2021

Thanks for the thoughts from both of you. Let me apply it.

@pipecd-bot
Copy link
Collaborator

COVERAGE

Code coverage for golang is 34.04%. This pull request increases coverage by 0.03%.

File Function Base Head Diff
pkg/app/api/grpcapi/api.go API.PushImageMetadata -- 0.00% +0.00%
pkg/datastore/imagemetadata.go NewImageMetadataStore -- 100.00% +100.00%
pkg/datastore/imagemetadata.go imageMetadataStore.AddImageMetadata -- 100.00% +100.00%

@nghialv
Copy link
Member Author

nghialv commented Jan 6, 2021

Updated!
Nice review. Thank you.

@pipecd-bot
Copy link
Collaborator

COVERAGE

Code coverage for golang is 34.04%. This pull request increases coverage by 0.03%.

File Function Base Head Diff
pkg/app/api/grpcapi/api.go API.PushImageReference -- 0.00% +0.00%
pkg/datastore/imagereference.go NewImageReferenceStore -- 100.00% +100.00%
pkg/datastore/imagereference.go imageReferenceStore.AddImageReference -- 100.00% +100.00%

wantErr bool
}{
{
name: "Invalid image metadata",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
name: "Invalid image metadata",
name: "Invalid image reference",

// The image digest.
string digest = 3;
// The image tag.
string tag = 4 [(validate.rules).string.min_len = 1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For our initial use case, a string is enough, but technically, multiple tags can be associated with a single image reference. For future use, how about making it a comma-separeted list?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, we currently allow the digest to be not unique. So we can keep it string as-is, and deal with it as other references. Performing docker images can represent my mind well.

$ docker images
REPOSITORY                                                         TAG                 IMAGE ID            CREATED             SIZE
nakabonne-public-test                                              0.1.0               0a0da5a852e7        2 weeks ago         194MB
nakabonne-public-test                                              latest              0a0da5a852e7        2 weeks ago         194MB

(Those IMAGE ID is the same but represents as another image reference)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, if we want references with the same digest, we just need to perform List(digest=0a0da5a852e7). Then we will have:

  • nakabonne-public-test:0.1.0
  • nakabonne-public-test:latest
  • gcr.io/nakabonne/nakabonne-test:0.1.0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using array or comma-separeted string can lead to difficulty for our query at API.
Especially in our case, we support multiple DB backends.
So I want it to be as simple as possible.
The user can store multiple tags on separate entities if they want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, let's do so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After rethinking about this, I realized that ImageWatcher only needs to use "repo_name", not "tag" in the query.
And one more reason, by storing them in the same entity we can avoid using transaction/batched-write while adding multiple entities at the same time.
So let me change this too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Yeah, it definitely the best design I think.

// The repository name.
string repo_name = 2 [(validate.rules).string.min_len = 1];
// The image digest.
string digest = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a comment: This must be unique, so I initially thought it's better to use this as an id, while it's enough to be an optional field. Never mind.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that the digest must not be unique. Unlike other general container registries, we have to handle multiple repositories (including gcr.io, ecr.amazonaws.com and docker hub). So it often happens that multiple references with the same digest to be registered.

$ docker images
REPOSITORY                                                         TAG                 IMAGE ID            CREATED             SIZE
nakabonne-test                                                     foo                 87c41dfd667d        13 days ago         194MB
000000000000.dkr.ecr.ap-northeast-1.amazonaws.com/nakabonne-test   foo                 87c41dfd667d        13 days ago         194MB

Really never mind!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as far as I know, the combination of repository and tag is used to identify container images, the digest just used to compare 2 images, which have the same digest means they are the same. Image digest is hard to read/use so they do use repository and tag combination instead on refer to an image. In our case, digest should be just saved as external metadata to be used for future purposes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the digest is optional?

Because we don't actually need this value, and if it's required then the user will have to get that value to set.
Our user does not only use the docker they may use bazel and other tools to build their container images.

Why not use the digest as the ID?

An image digest is a hash of an image. So It depends on the image's manifest & layer.

docker images --digests
REPOSITORY                            TAG                   DIGEST                                                                    IMAGE ID       CREATED         SIZE
alpine                                latest                sha256:3c7497bf0c7af93428242d6176e8f7905f2201d8fc5861f45be7a346b5f23436   389fef711851   3 weeks ago     5.58MB

The same image (same digest) can be stored in multiple registries.

@nakabonne
Copy link
Member

Except for a nit comment I left, LGTM
/lgtm

@pipecd-bot pipecd-bot added the lgtm label Jan 7, 2021
@khanhtc1202
Copy link
Member

🚀
/approve


message PushImageReferenceRequest {
string repo_name = 1 [(validate.rules).string.min_len = 1];
string tag = 2 [(validate.rules).string.min_len = 1];
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although we should keep the tag in the model as a single string, at least this should be an array.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right

@nghialv
Copy link
Member Author

nghialv commented Jan 7, 2021

/hold

@nakabonne
Copy link
Member

/approve cancel

@pipecd-bot pipecd-bot removed the lgtm label Jan 7, 2021
@nghialv
Copy link
Member Author

nghialv commented Jan 7, 2021

/hold cancel

@nakabonne
Copy link
Member

/approve

@pipecd-bot
Copy link
Collaborator

APPROVE

This pull request is APPROVED by nakabonne.

Approvers can cancel the approval by writing /approve cancel in a comment. Any additional commits also will change this pull request to be not-approved.

@pipecd-bot pipecd-bot merged commit 57b9ed1 into master Jan 7, 2021
@pipecd-bot pipecd-bot deleted the imagemetadata branch January 7, 2021 03:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add gRPC API to push metadata of container image

5 participants