Set retention for staging images #525

stp-ip · 2019-12-19T15:12:01Z

We currently set a 60d retention on staging storage and staging GCB storage, but don't enforce any retention for images.

Use of staging images should be discouraged, so adding a retention policy would help set the right expectations and keep our storage needs lower in the long run.

I am proposing the same 60d retention to keep things the same across all staging retention settings. Happy for other suggestions.

Additional notes:
Currently GCR itself doesn't provide retention settings. We could set the retention on the GCR-created bucket, but I assume this could lead to weird issues.
The other option would be to run a prow job every week to clean up older images.
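
As a rough illustration of what such a weekly cleanup job would have to decide, here is a minimal sketch in Python. It only covers the selection step. The `images` list and its `digest` and `uploaded` keys are hypothetical stand-ins for whatever a real job would read from the registry, and the 60 days is the retention proposed above.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=60)  # the 60d retention proposed in this issue


def expired_digests(images, now, retention=RETENTION):
    """Return the digests of images uploaded longer ago than `retention`.

    `images` is a hypothetical list of dicts with "digest" and "uploaded"
    (timezone-aware datetime) keys; a real job would fill it from the registry.
    """
    cutoff = now - retention
    return [img["digest"] for img in images if img["uploaded"] < cutoff]


now = datetime(2019, 12, 19, tzinfo=timezone.utc)
images = [
    {"digest": "sha256:aaa", "uploaded": now - timedelta(days=90)},
    {"digest": "sha256:bbb", "uploaded": now - timedelta(days=10)},
]
print(expired_digests(images, now))  # only the 90-day-old image is selected
```

The actual deletion would be a separate step, so the job could first run in a report-only mode.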

"Manual" removal script example: https://gist.github.com/ahmetb/7ce6d741bd5baa194a3fac6b1fec8bb7

rajibmitra · 2020-02-14T20:29:16Z

I would like to work on a prow job that will clean up the older images.

rajibmitra · 2020-02-15T04:09:58Z

/assign fiorm

fejta-bot · 2020-05-18T07:15:24Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2020-06-17T07:58:52Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot · 2020-07-17T08:39:00Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot · 2020-07-17T08:39:16Z

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

spiffxp · 2020-08-03T18:10:23Z

/reopen

k8s-ci-robot · 2020-08-03T18:10:36Z

@spiffxp: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

spiffxp · 2020-08-03T18:14:46Z

I agree 60d for GCR is reasonable. Staging GCR repos have images older than 60d. They should not, or people are going to assume they can use them in perpetuity.

This came up because @msau42 mentioned CSI images were close to 60d and was concerned they would expire and break kubernetes CI. They won't.

$ export prj=k8s-staging-csi; for b in $prj $prj-gcb artifacts.$prj.appspot.com; do echo $b: $(gsutil lifecycle get gs://$b); done
k8s-staging-csi: {"rule": [{"action": {"type": "Delete"}, "condition": {"age": 60}}]}
k8s-staging-csi-gcb: {"rule": [{"action": {"type": "Delete"}, "condition": {"age": 60}}]}
artifacts.k8s-staging-csi.appspot.com: gs://artifacts.k8s-staging-csi.appspot.com/ has no lifecycle configuration.

We should give SIG Storage time to promote their images to k8s.gcr.io and update tests to use them. Then I think we should implement this.

/assign @thockin @dims @bartsmykla
as a heads up
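
The check above can also be done programmatically. The following is a minimal sketch, assuming the text printed by `gsutil lifecycle get` is either a JSON document like the ones shown or a plain "has no lifecycle configuration" message. The function name `delete_age_days` is made up for this example.

```python
import json


def delete_age_days(lifecycle_output):
    """Return the smallest age (in days) of any Delete rule, or None.

    `lifecycle_output` is the text printed for one bucket. Anything that is
    not a JSON object is treated as "no lifecycle configuration".
    """
    try:
        config = json.loads(lifecycle_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(config, dict):
        return None
    ages = [
        rule["condition"]["age"]
        for rule in config.get("rule", [])
        if rule.get("action", {}).get("type") == "Delete"
        and "age" in rule.get("condition", {})
    ]
    return min(ages) if ages else None


staging = '{"rule": [{"action": {"type": "Delete"}, "condition": {"age": 60}}]}'
gcr = "gs://artifacts.k8s-staging-csi.appspot.com/ has no lifecycle configuration."
print(delete_age_days(staging))  # 60
print(delete_age_days(gcr))      # None
```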

thockin · 2020-08-03T18:26:56Z

I don't object in theory. There isn't a good mechanism to do it, short of writing our own daily job that loops over every staging repo and nukes old images.

bartsmykla · 2020-08-04T07:51:26Z

I can help with our own solution :-)

msau42 · 2020-08-05T21:10:01Z

Is there a recommended way to do canary testing with the 60d removal? For example, in csi, our periodic canary testing tests multiple repos' canary images in one job. But some repos are more active than others, and the inactive ones may not have any merges for > 60d. Is there a way we can keep the canary images around to facilitate this workflow without having to promote the canary tag?

thockin · 2020-08-05T21:27:33Z

Define canary? If you need them long-term, why not promote them?


msau42 · 2020-08-05T21:44:39Z

In our case, canary means images with a "canary" tag. Every time a PR merges, we build and re-push the "canary" tag. We have a specific canary job that's configured to test using images with the canary tag. We do end up promoting those images with official release tags, and we have separate jobs that test using release images, but we will still have a canary job that tests against the head of everything.

thockin · 2020-08-05T23:33:54Z

Yeah, you would not want to promote those, and tags are not mutable in prod anyway. So the goal is to keep the last N builds (N may be 1), regardless of age or whether that build was promoted or not? That seems like a recipe for flakes, no? If we were building our own retirement, we could (for example) always leave a specific tag or something, I guess? But I am still shaky on the idea.
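
To make the "always leave a specific tag" idea concrete, here is a minimal sketch of a selection rule that deletes by age but skips any image carrying a protected tag. It is only an illustration under assumptions: the image records with `digest`, `uploaded`, and `tags` keys are hypothetical, and the protected tag set defaults to "canary" only because that is the tag discussed here.

```python
from datetime import datetime, timedelta, timezone


def images_to_delete(images, now, max_age_days=60, keep_tags=frozenset({"canary"})):
    """Return digests older than `max_age_days` that carry no protected tag.

    `images` is a hypothetical list of dicts with "digest", "uploaded"
    (timezone-aware datetime) and "tags" (list of str) keys.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [
        img["digest"]
        for img in images
        if img["uploaded"] < cutoff and not keep_tags.intersection(img["tags"])
    ]


now = datetime(2020, 8, 5, tzinfo=timezone.utc)
images = [
    {"digest": "sha256:old-canary", "uploaded": now - timedelta(days=100), "tags": ["canary"]},
    {"digest": "sha256:old-build", "uploaded": now - timedelta(days=100), "tags": []},
    {"digest": "sha256:new-build", "uploaded": now - timedelta(days=2), "tags": []},
]
print(images_to_delete(images, now))  # only the old, unprotected image
```

Because the "canary" tag moves on every push, at most one image per repo is protected at any time under this rule.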


msau42 · 2020-08-06T02:58:27Z

If there's another way we could achieve "test against latest for all images, even if some of the repos are inactive", I'm open to suggestions.

pohly · 2020-08-06T07:08:40Z

Perhaps we can add a periodic job which rebuilds canary images once a month? The added bonus is that we'll notice if something breaks in the build environment (shouldn't happen, but one never knows...) before actually trying to build a proper release.

thockin · 2020-08-06T16:44:57Z

Why do you want to test against head as opposed to the latest release? Even if that release is explicitly a "daily snapshot" or something? I guess I don't fully know what all the images are or what you are testing...


msau42 · 2020-08-06T16:52:52Z

I think a daily snapshot also works. It's similar to Patrick's suggestion but more frequent. We'll need to redo our tooling to be able to query/find the daily rc, but it's feasible.

I'm curious what other projects are doing, because I can't imagine we're the only ones.

thockin · 2020-08-06T16:59:19Z

Most don't seem to be doing much, yet, or they are releasing all components together.


pohly · 2020-08-06T17:18:45Z

"We'll need to redo our tooling to be able to query/find the daily rc, but it's feasible."

So you want "canary-" tags because then a test failure can be reproduced locally with the exact same images? I'm not sure how important that is. If it's a genuine issue that hasn't been fixed yet, a more recent "canary" should still expose it, and if it doesn't, would we really dig into such a failure when it no longer occurs?

pohly · 2020-08-06T18:14:08Z

"or they are releasing all components together"

Built from the same repo? That works because the tip-of-branch components can all be built in the same test job.

But Kubernetes-CSI uses several different repos and then needs to collect the output from different build jobs for a combined test job.

pohly · 2020-08-07T10:56:46Z

I guess we could set up a canary job which checks out all of the relevant repos and then builds everything anew each time it runs. 🤷

thockin · 2020-08-07T15:40:56Z

So there are several options here, and I can't know which makes the most sense for your goals on this. IF we NEED to, we could maybe add one label that we always retain, but IMO that suggests something else is fishy. Why would you keep a "canary" for a long time and NOT promote it or replace it?


msau42 · 2020-08-07T16:00:28Z

We do promote our canaries. The main challenge here is we have two test jobs, one configured to only run against canaries, and one configured to run against release images. The one configured to run against canary is going to be prone to expiration of staging images for inactive repos unless we have something that will periodically build new images.
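
One way to picture "something that will periodically build new images" is a check that flags repos whose last canary push is getting close to the retention limit. This is a minimal sketch only. The repo names, the `last_canary_push` mapping, and the 7-day safety margin are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def repos_needing_rebuild(last_canary_push, now, retention_days=60, margin_days=7):
    """Return repos whose canary image is within `margin_days` of expiring.

    `last_canary_push` is a hypothetical mapping of repo name to the
    timezone-aware datetime of the most recent "canary" push.
    """
    threshold = timedelta(days=retention_days - margin_days)
    return sorted(
        repo for repo, pushed in last_canary_push.items() if now - pushed >= threshold
    )


now = datetime(2020, 8, 7, tzinfo=timezone.utc)
last_canary_push = {
    "repo-active": now - timedelta(days=3),     # hypothetical repo names
    "repo-inactive": now - timedelta(days=55),
}
print(repos_needing_rebuild(last_canary_push, now))  # the inactive repo is flagged
```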

spiffxp · 2020-10-27T16:37:59Z

/lifecycle frozen

pohly · 2020-10-28T18:55:11Z

"We do promote our canaries. ... The one configured to run against canary is going to be prone to expiration of staging images for inactive repos"

I think you meant "we don't promote our canaries", right?

"unless we have something that will periodically build new images"

Here's a PR which tentatively defines a job which refreshes "canary" for one repo: kubernetes/test-infra#19734

Release candidates are still problematic. We sometimes need those while preparing new sidecars for an upcoming Kubernetes release. On the other hand, the time period where we do need them might be small enough that the normal retention period is okay, so this might not be a problem?

msau42 · 2020-10-28T20:35:14Z

What I meant was we do promote canary builds to official release version tags. We don't promote the "canary" tag.

Yes, I think we can treat release candidates separately. We don't want to promote release candidates or merge any tests that depend on release candidates in k/k.

spiffxp · 2021-02-18T18:55:08Z

From https://cloud.google.com/container-registry/docs/managing#deleting_images:

"Do not apply Cloud Storage retention policies to storage buckets used by Container Registry. These policies do not work for managing images in Container Registry storage buckets." So we're following the recommended guidance by not setting these.
https://github.com/sethvargo/gcr-cleaner is not an official Google product, but it could accomplish this.

spiffxp · 2021-07-16T18:21:58Z

/milestone v1.23

ameukam · 2021-12-14T22:09:48Z

/milestone clear

ameukam · 2024-03-03T13:57:56Z

/milestone v1.32

stp-ip added the wg/k8s-infra label Dec 19, 2019

stp-ip mentioned this issue Jan 8, 2020

support community owned GCS buckets for uploading results to TestGrid #332

Closed

k8s-ci-robot assigned rajibmitra Feb 15, 2020

k8s-ci-robot added the lifecycle/stale label May 18, 2020

k8s-ci-robot added lifecycle/rotten and removed lifecycle/stale labels Jun 17, 2020

k8s-ci-robot closed this as completed Jul 17, 2020

k8s-ci-robot reopened this Aug 3, 2020

k8s-ci-robot assigned bartsmykla, dims and thockin Aug 3, 2020

thockin removed the lifecycle/rotten label Aug 3, 2020

pohly mentioned this issue Sep 2, 2020

Moving images away from quay.io? kubernetes-csi/csi-release-tools#86

Closed

k8s-ci-robot added the lifecycle/frozen label Oct 27, 2020

spiffxp mentioned this issue Oct 29, 2020

[Umbrella Issue] Create a Image Promotion process #157

Closed

spiffxp added the priority/important-longterm label Jan 22, 2021

spiffxp mentioned this issue Apr 1, 2021

dl.k8s.io: Redirect CI URIs to Kubernetes Community infra #1857

Merged

k8s-ci-robot added this to the v1.23 milestone Jul 16, 2021

k8s-ci-robot added sig/k8s-infra and removed wg/k8s-infra labels Sep 29, 2021

k8s-ci-robot removed this from the v1.23 milestone Dec 14, 2021

dims assigned ameukam and unassigned dims Jan 31, 2022

k8s-ci-robot added this to the v1.32 milestone Mar 3, 2024
