Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All timestamped images push to Docker Hub on merge to main #759

Closed
kylos101 opened this issue Mar 3, 2022 · 6 comments
Closed

All timestamped images push to Docker Hub on merge to main #759

kylos101 opened this issue Mar 3, 2022 · 6 comments
Assignees
Labels
bug Something isn't working team: workspace

Comments

@kylos101
Copy link
Collaborator

kylos101 commented Mar 3, 2022

Bug description

Given this action run
When we inspect the images pushed we push to DockerHub
Then we expect full, and anything depending on it to change (because rust chunk is part of full combination)
and we expect workspace-rust image to have changed (because the chunk changed)
and we expect other images to not have changed (because they do not depend on rust or full) <-- this is the issue, unexpected images were pushed.

Why is this an issue? It wastes time and storage to upload images to Docker Hub which have not really changed.

Steps to reproduce

Run the merge to main action

Expected output

Ideally, only images which have changed (because we cache in GAR) will be uploaded to Docker Hub.

Why are we unnecessarily pushing images to Docker Hub when the content hasn't changed? How can we do better?

Refer to the Anything Else section (below) for supporting details.

Example repository

n/a

Anything else?

This shows two images are the same, but the digest's vary, and well they both exist in Docker Hub.

#!/bin/bash
go install github.com/csweichel/oci-tool@latest

oci-tool layer --platform=linux-amd64 list docker.io/gitpod/workspace-node:2022-03-03-19-59-28 | jq .[].digest | sort > one.txt
oci-tool layer --platform=linux-amd64 list docker.io/gitpod/workspace-node:2022-03-03-17-15-56 | jq .[].digest | sort > two.txt

diff one.txt two.txt

I'm not sure if this is something we'll have to live with, if there's an improvement we should make to our actions (like limit when we run combine), or if its a change we should make in dazzle for combine itself.

@kylos101 kylos101 added the bug Something isn't working label Mar 3, 2022
@kylos101 kylos101 changed the title All images pushed to Docker Hub on merge to main All timestamped images push to Docker Hub on merge to main Mar 3, 2022
@kylos101 kylos101 moved this to Scheduled in 🌌 Workspace Team Mar 3, 2022
@kylos101
Copy link
Collaborator Author

kylos101 commented Mar 3, 2022

@princerachit as a heads up, I added this to our improvements epic.

@princerachit
Copy link
Contributor

I think this has something to do with combine, although the chunks haven't changed, for some reason combine output is a new image

@princerachit
Copy link
Contributor

This behaviour of creation of a new image is tied with dazzle's combine step.
It creates a new image everytime because the sha is updated based on current time:

https://github.com/gitpod-io/dazzle/blob/9a56c6d5caddff2796508852076d91aa6dba8f8b/pkg/dazzle/combiner.go#L148-L150

And digest is created here from above:

https://github.com/gitpod-io/dazzle/blob/9a56c6d5caddff2796508852076d91aa6dba8f8b/pkg/dazzle/combiner.go#L191-L196

Since this is a new image there is no way to distinguish it with existing image.

We can change this behaviour of new image creation in two steps:

  1. Remove the variables (I think time is the only suitable candidate as of now) from image config so it generates same digest
  2. Modify the combine step to look for an image with calculated digest before building it
  3. Do NOT create an image if it already exists

However, this would still not solve us the problem of multiple tags. We would need to write scripts in our action to NOT tag an image with new timestamp if it was not updated in say last X minutes.

IMO this is complicated and does not benefit us compare to the effort we put, the reason is:

  1. Image tags may be pushed to Dockerhub but the layers that they are made up of do not change so a tag X and Y may have different sha but their layers have the same sha so it is just reused.
  2. The tag step is run in parallel for each step[1], [2], so we are not wasting any extra time in tagging the images.
  3. The sync step takes at most 4s to verify that an image already exists in the Dockerhub before skiping the sync for it. This is NOT done in parallel and will increase with the number of tags that we have.

I suggest that we should rather have an action which deletes old image tags from GAR to reduce the time spent in (3) rd point above

@princerachit princerachit self-assigned this Mar 7, 2022
@kylos101
Copy link
Collaborator Author

kylos101 commented Mar 7, 2022

Hi @princerachit , please avoid this issue, I need to research what you shared and dig in more to better understand.

@kylos101 kylos101 moved this from Scheduled to Blocked in 🌌 Workspace Team Mar 7, 2022
@kylos101
Copy link
Collaborator Author

kylos101 commented Mar 7, 2022

Moved to blocked for now and assigning myself.

@kylos101 kylos101 assigned kylos101 and unassigned princerachit Mar 7, 2022
@kylos101
Copy link
Collaborator Author

kylos101 commented Apr 4, 2022

I am going to close this for now. I've changed our main build, so that it authenticates with GAR (again) when pushing to Docker Hub, to avoid the 1h timeout.

@kylos101 kylos101 closed this as completed Apr 4, 2022
Repository owner moved this from Blocked to Done in 🌌 Workspace Team Apr 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working team: workspace
Projects
No open projects
Archived in project
Development

No branches or pull requests

2 participants