ROX-15523: Copy expiration to cloud resources (GKE) #895
A single node development cluster (infra-pr-895) was allocated in production infra for this PR. CI will attempt to deploy us.gcr.io/stackrox-infra/infra-server:0.7.8-47-gf4fdcf80c2 to it. 🔌 You can connect to this cluster with:
🛠️ And pull infractl from the deployed dev infra-server with:
🚲 You can then use the dev infra instance e.g.:
Further Development
☕ If you make changes, you can commit and push and CI will take care of updating the development cluster. 🚀 If you only modify configuration (chart/infra-server/configuration) or templates (chart/infra-server/{static,templates}), you can get a faster update with:
Logs
Logs for the development infra depending on your @stackrox.com authuser: Or:
This reverts commit f1debcd.
to prevent missed updates
Copy the Infra cluster lifespan (expiration) metadata from the workflow to the GKE cluster. Applying an expiration label to cloud resources lets janitor find expired, unneeded resources without interacting with Infra.
The Infra lifespan is recorded as a duration string in an Argo workflow annotation (a field within the custom resource).
Applying the label to cloud resources is flavor-specific, so I think it belongs in the flavor-specific workflow or image rather than in the Infra code. (Adding it for GKE first.)
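To illustrate the conversion described above, here is a minimal Python sketch that turns a lifespan duration string plus a creation time into a value usable as a GKE label. It is not the code in this PR. The function names and the timestamp layout are hypothetical, and the parser assumes the duration uses only `h`, `m`, and `s` units (for example `3h0m0s`). The output is kept to lowercase letters, digits, and dashes because GKE label values do not allow colons or uppercase characters.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: only hour/minute/second units appear in the lifespan string.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)([hms])")


def parse_duration(text: str) -> timedelta:
    """Parse a duration string such as '3h0m0s' or '1h30m'."""
    pos = 0
    total = 0.0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:
            raise ValueError(f"unsupported duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(seconds=total)


def expiration_label(created: datetime, lifespan: str) -> str:
    """Return the expiry time in UTC, formatted with label-safe characters."""
    expires = created + parse_duration(lifespan)
    # Hypothetical layout: dashes instead of colons, lowercase 't' and 'z'.
    return expires.astimezone(timezone.utc).strftime("%Y-%m-%dt%H-%M-%Sz")
```

A workflow step could compute this value and pass it to whatever tool applies labels to the cluster. Which label key janitor looks for is not specified here.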
Infra expects workflows to suspend once cluster creation is complete and to destroy the cluster when the workflow resumes. No hooks or retries execute while a workflow is suspended. This change therefore adds a loop to the GKE workflow that checks whether the lifespan has changed, plus code to stop the workflow when it is flagged as not requiring a resume (that is, when cluster destroy runs in a workflow onExit hook).
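The shape of that loop can be sketched in Python as follows. This is only an illustration of the logic, not the workflow step added by this PR. All names are hypothetical, and the three callables stand in for reading the workflow annotation, labeling the cluster, and checking the stop flag.

```python
import time
from typing import Callable


def watch_lifespan(
    read_lifespan: Callable[[], str],
    apply_label: Callable[[str], None],
    should_stop: Callable[[], bool],
    interval_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll the lifespan and re-apply the label whenever it changes.

    Returns the number of times the label was applied.
    """
    last = None
    updates = 0
    while not should_stop():
        current = read_lifespan()
        if current != last:
            # The first poll always applies the label; later polls only on change.
            apply_label(current)
            last = current
            updates += 1
        sleep(interval_seconds)
    return updates
```

Passing `sleep` as a parameter keeps the loop testable without waiting in real time. The polling interval trades label freshness against load on the API being queried.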
Questions:
[ ] Why aren't all destroys set as onExit? Cluster destroys can be performed onExit for suspended and not-suspended workflows. It appears some were onExit in older Infra workflows.