OCPBUGS-9745: add graph-data image annotation to operand pod #164
Conversation
controllers/names.go (outdated):

```go
func nameGraphDataPod(instance *cv1.UpdateService) string {
	return instance.Name + "-graph-data-init-sha"
}
```
can this be -tag-digest or something, to avoid being longer than the 14-character -policy-engine or -graph-builder? I know we've had name-is-too-long issues in the past, and I'm not clear on what happens if the name returned by these helpers ends up growing long.
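For illustration, a minimal sketch of the shorter suffix suggested here; the helper name is hypothetical and not part of the PR:

```go
// Hypothetical variant: "-tag-digest" is no longer than the 14-character
// "-policy-engine" / "-graph-builder" suffixes, so it should not make the
// generated name any more prone to the length issues mentioned above.
func nameGraphDataTagDigestPod(instance *cv1.UpdateService) string {
	return instance.Name + "-tag-digest"
}
```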
Also, handling this per-instance means that if we have multiple UpdateService sharing the same graph-data pullspec, we'll be launching redundant Pods to figure out that digest. In practice, maybe that situation is rare enough that we don't have to worry about trying to centralize tag-to-digest lookup at the operator level where results can be shared between multiple instances.
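If that lookup were ever centralized at the operator level, a sketch of sharing results between instances could be as simple as a cache keyed by pullspec; the names here are illustrative, not existing operator code:

```go
import "sync"

// digestCache is a hypothetical operator-level store so multiple
// UpdateService instances referencing the same by-tag graph-data pullspec
// can share one tag-to-digest resolution instead of launching redundant pods.
type digestCache struct {
	mu      sync.Mutex
	digests map[string]string // by-tag pullspec -> resolved digest
}

func (c *digestCache) lookup(pullspec string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	digest, ok := c.digests[pullspec]
	return digest, ok
}

func (c *digestCache) store(pullspec, digest string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.digests == nil {
		c.digests = map[string]string{}
	}
	c.digests[pullspec] = digest
}
```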
```go
if err != nil && apiErrors.IsNotFound(err) || found.Status.Phase == "Succeeded" {
	reqLogger.Info("Creating Pod", "Namespace", pod.Namespace, "Name", pod.Name)
	err := r.Client.Create(ctx, pod)
```
Probably need a Delete in the Succeeded case. And also maybe a delete in the "we used to be a by-tag reference, but now the graph-data pullspec is a by-digest reference, so we don't need to bother launching Pods anymore" case?
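A sketch of that delete-on-success path, building on the snippet above and assuming the helper returns `(string, error)` as the later comments indicate (`corev1` is `k8s.io/api/core/v1`):

```go
// If the Get succeeded and the digest-lookup pod has completed, delete it so
// a later reconcile can recreate it; by-digest pullspecs would skip the pod
// entirely (handled elsewhere).
if err == nil && found.Status.Phase == corev1.PodSucceeded {
	reqLogger.Info("Deleting completed Pod", "Namespace", found.Namespace, "Name", found.Name)
	if delErr := r.Client.Delete(ctx, found); delErr != nil && !apiErrors.IsNotFound(delErr) {
		return "", delErr
	}
}
```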
Added a delete on the success condition.
Is there a chance of racing with someone else (is it possible that we run multiple replicas?) creating the pod? It is possible that we Get, see nothing, something else creates the pod, we create and fail with conflict... I'm fine if we say that that cannot happen realistically.
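If that race ever did matter, a common way to tolerate it is to treat an AlreadyExists error from Create as benign; a sketch using the same variables as the snippet above:

```go
// Another actor (e.g. a second operator replica) may have created the pod
// between our Get and Create; IsAlreadyExists makes that conflict harmless,
// and the next reconcile picks the pod up normally.
if createErr := r.Client.Create(ctx, pod); createErr != nil {
	if apiErrors.IsAlreadyExists(createErr) {
		return "", nil
	}
	return "", createErr
}
```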
The way we're avoiding that here is:
- Start a pod with a time.wait for 5 minutes.
- Once the wait finishes, the pod goes into the Succeeded condition.
- We reconcile every 5 minutes, and when we see that the pod is completed, we just delete it.
- Deleting will kick off the reconcile again to create the pod.

Because we're creating a pod with a fixed name, only one pod with that name can be running at a time.
> Because we're creating a pod with a fixed name, only one pod with that name can be running at a time.
Petr was pointing out that other actors (e.g. a runaway second cincinnati-operator replica) might also create a pod with the same name. And actually, because ensureGraphDataSHA is checking a single UpdateService, it seems like an operator managing several by-tag UpdateService would probably struggle like:

- Reconciling UpdateService A, create a graph-data-tag-digest pod.
- Reconciling UpdateService B, pod is still running, return "", nil.
- Reconciling UpdateService A, pod is still running, return "", nil.
- Reconciling UpdateService B, pod is Succeeded, return the image ID. But this is bad, because that image ID was for the by-tag pullspec from UpdateService A, not the pullspec from B.

Instead, you'll want to somehow uniquify by pullspec-under-test. Perhaps SetControllerReference allows you to clearly associate a pod with the appropriate UpdateService. And then Kube's built-in garbage collection will handle deleting any pods whose UpdateService is deleted.
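A sketch of what that could look like; the name scheme and helper are hypothetical, and `controllerutil` is `sigs.k8s.io/controller-runtime/pkg/controller/controllerutil`:

```go
// newTagDigestPod uniquifies the lookup pod by the pullspec under test and
// ties its lifetime to the owning UpdateService via an owner reference, so
// Kube's garbage collector removes it when the UpdateService is deleted.
func newTagDigestPod(instance *cv1.UpdateService, pullspec string, scheme *runtime.Scheme) (*corev1.Pod, error) {
	sum := sha256.Sum256([]byte(pullspec))
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			// Suffix derived from the pullspec, so two UpdateService
			// objects watching different tags never share a pod.
			Name:      fmt.Sprintf("%s-tag-digest-%x", instance.Name, sum[:4]),
			Namespace: instance.Namespace,
		},
	}
	if err := controllerutil.SetControllerReference(instance, pod, scheme); err != nil {
		return nil, err
	}
	return pod, nil
}
```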
Force-pushed: 68c2f8a to d2fbb66.
/hold
Putting a temporary hold to make sure it does not get merged without my review comments.
Force-pushed: 8221393 to 49cefa5.
Force-pushed: e7a125d to 6a19659.
@PratikMahajan: This pull request references Jira Issue OCPBUGS-9745, which is invalid. The bug has been updated to refer to the pull request using the external bug tracker.
/jira refresh
@wking: This pull request references Jira Issue OCPBUGS-9745, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug. Requesting review from QA contact.
/hold cancel
/hold
/hold cancel
We're spinning up a graph-data pod that refreshes every 5 minutes and fetches the latest graph-data image digest, which we use to track whether the image tag was updated. Logic:
1. We reconcile every 5 minutes.
2. We spin up a graph-data pod which runs for 5 minutes.
3. After the graph-data pod reaches the Succeeded condition, the reconcile loop deletes it and creates a new one. This is where we get the latest digest for the image.
4. We add this digest as an annotation on the deployment, so every time the annotation/image changes, new operand pods are rolled out.

We do not create the graph-data pod if the image is referenced by digest instead of tag. We create just one graph-data pod for OSUS. A sketch of the annotation step follows below.
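A sketch of that annotation step under the assumptions above; the annotation key and helper name are illustrative, not necessarily the operator's actual code (`appsv1` is `k8s.io/api/apps/v1`):

```go
// Hypothetical key; the operator's real annotation name may differ.
const graphDataImageAnnotation = "updateservice.operator.openshift.io/graph-data-image"

// setGraphDataAnnotation stamps the resolved digest onto the Deployment's
// pod template. Changing a pod-template annotation changes the template
// hash, so the Deployment controller rolls out new operand pods.
func setGraphDataAnnotation(deployment *appsv1.Deployment, digest string) {
	if deployment.Spec.Template.ObjectMeta.Annotations == nil {
		deployment.Spec.Template.ObjectMeta.Annotations = map[string]string{}
	}
	deployment.Spec.Template.ObjectMeta.Annotations[graphDataImageAnnotation] = digest
}
```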
Force-pushed: 6a19659 to 8f7c466.
petr-muller left a comment:
/retest
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: petr-muller, PratikMahajan.
quay is sad, failures are likely related.
/retest
@PratikMahajan: The following test failed. Full PR test history. Your PR dashboard.
/override ci/prow/operator-e2e-latest-osus-414 ci/prow/operator-e2e-latest-osus-414
@PratikMahajan: Overrode contexts on behalf of PratikMahajan: ci/prow/operator-e2e-latest-osus-414
/override ci/prow/operator-e2e-414
@PratikMahajan: Overrode contexts on behalf of PratikMahajan: ci/prow/operator-e2e-414
@PratikMahajan: Jira Issue OCPBUGS-9745: All pull requests linked via external trackers have merged. Jira Issue OCPBUGS-9745 has been moved to the MODIFIED state.
No description provided.