OCPBUGS-9745: add graph-data image annotation to operand pod #164
Conversation
controllers/names.go (outdated):

```go
func nameGraphDataPod(instance *cv1.UpdateService) string {
	return instance.Name + "-graph-data-init-sha"
}
```
can this be -tag-digest or something, to avoid being longer than the 14-character -policy-engine or -graph-builder? I know we've had name-is-too-long issues in the past, and I'm not clear on what happens if the name returned by these helpers ends up growing long.
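For illustration, a minimal sketch of the shorter suffix suggested here; the helper name is hypothetical and not part of the PR:

```go
// Hypothetical variant: "-tag-digest" is no longer than the 14-character
// "-policy-engine" / "-graph-builder" suffixes, so it should not make the
// generated name any more prone to the length issues mentioned above.
func nameGraphDataTagDigestPod(instance *cv1.UpdateService) string {
	return instance.Name + "-tag-digest"
}
```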
Also, handling this per-instance means that if we have multiple UpdateService sharing the same graph-data pullspec, we'll be launching redundant Pods to figure out that digest. In practice, maybe that situation is rare enough that we don't have to worry about trying to centralize tag-to-digest lookup at the operator level where results can be shared between multiple instances.
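If that lookup were ever centralized at the operator level, a sketch of sharing results between instances could be as simple as a cache keyed by pullspec; the names here are illustrative, not existing operator code:

```go
import "sync"

// digestCache is a hypothetical operator-level store so multiple
// UpdateService instances referencing the same by-tag graph-data pullspec
// can share one tag-to-digest resolution instead of launching redundant pods.
type digestCache struct {
	mu      sync.Mutex
	digests map[string]string // by-tag pullspec -> resolved digest
}

func (c *digestCache) lookup(pullspec string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	digest, ok := c.digests[pullspec]
	return digest, ok
}

func (c *digestCache) store(pullspec, digest string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.digests == nil {
		c.digests = map[string]string{}
	}
	c.digests[pullspec] = digest
}
```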
```go
if err != nil && apiErrors.IsNotFound(err) || found.Status.Phase == "Succeeded" {
	reqLogger.Info("Creating Pod", "Namespace", pod.Namespace, "Name", pod.Name)
	err := r.Client.Create(ctx, pod)
```
Probably need a Delete in the Succeeded case. And also maybe a delete in the "we used to be a by-tag reference, but now the graph-data pullspec is a by-digest reference, so we don't need to bother launching Pods anymore" case?
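A sketch of that delete-on-success path, building on the snippet above and assuming the helper returns `(string, error)` as the later comments indicate (`corev1` is `k8s.io/api/core/v1`):

```go
// If the Get succeeded and the digest-lookup pod has completed, delete it so
// a later reconcile can recreate it; by-digest pullspecs would skip the pod
// entirely (handled elsewhere).
if err == nil && found.Status.Phase == corev1.PodSucceeded {
	reqLogger.Info("Deleting completed Pod", "Namespace", found.Namespace, "Name", found.Name)
	if delErr := r.Client.Delete(ctx, found); delErr != nil && !apiErrors.IsNotFound(delErr) {
		return "", delErr
	}
}
```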
Added a delete on the success condition.
Is there a chance of racing with someone else (is it possible that we run multiple replicas?) creating the pod? It is possible that we Get, see nothing, something else creates the pod, we create and fail with conflict... I'm fine if we say that that cannot happen realistically.
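If that race ever did matter, a common way to tolerate it is to treat an AlreadyExists error from Create as benign; a sketch using the same variables as the snippet above:

```go
// Another actor (e.g. a second operator replica) may have created the pod
// between our Get and Create; IsAlreadyExists makes that conflict harmless,
// and the next reconcile picks the pod up normally.
if createErr := r.Client.Create(ctx, pod); createErr != nil {
	if apiErrors.IsAlreadyExists(createErr) {
		return "", nil
	}
	return "", createErr
}
```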
The way we're avoiding that here is:
- Start a pod with a time.wait for 5 minutes.
- Once the wait finishes, the pod goes into the Succeeded condition.
- We reconcile every 5 minutes, and when we see that the pod is completed, we just delete it.
- Deleting will kick off the reconcile again to create the pod.

Because we're creating a pod with a fixed name, only one pod with that name can be running at a time.
> Because we're creating a pod with a fixed name, only one pod with that name can be running at a time.
Petr was pointing out that other actors (e.g. a runaway second cincinnati-operator replica) might also create a pod with the same name. And actually, because ensureGraphDataSHA is checking a single UpdateService, it seems like an operator managing several by-tag UpdateService would probably struggle like:

- Reconciling UpdateService A, create a graph-data-tag-digest pod.
- Reconciling UpdateService B, pod is still running, return "", nil.
- Reconciling UpdateService A, pod is still running, return "", nil.
- Reconciling UpdateService B, pod is Succeeded, return the image ID. But this is bad, because that image ID was for the by-tag pullspec from UpdateService A, not the pullspec from B.

Instead, you'll want to somehow uniquify by pullspec-under-test. Perhaps SetControllerReference allows you to clearly associate a pod with the appropriate UpdateService. And then Kube's built-in garbage collection will handle deleting any pods whose UpdateService is deleted.
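A sketch of what that could look like; the name scheme and helper are hypothetical, and `controllerutil` is `sigs.k8s.io/controller-runtime/pkg/controller/controllerutil`:

```go
// newTagDigestPod uniquifies the lookup pod by the pullspec under test and
// ties its lifetime to the owning UpdateService via an owner reference, so
// Kube's garbage collector removes it when the UpdateService is deleted.
func newTagDigestPod(instance *cv1.UpdateService, pullspec string, scheme *runtime.Scheme) (*corev1.Pod, error) {
	sum := sha256.Sum256([]byte(pullspec))
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			// Suffix derived from the pullspec, so two UpdateService
			// objects watching different tags never share a pod.
			Name:      fmt.Sprintf("%s-tag-digest-%x", instance.Name, sum[:4]),
			Namespace: instance.Namespace,
		},
	}
	if err := controllerutil.SetControllerReference(instance, pod, scheme); err != nil {
		return nil, err
	}
	return pod, nil
}
```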
Force-pushed: 68c2f8a to d2fbb66.
/hold
Putting a temporary hold to make sure it does not get merged without my review comments.
Force-pushed: 8221393 to 49cefa5.
Force-pushed: e7a125d to 6a19659.
@PratikMahajan: This pull request references Jira Issue OCPBUGS-9745, which is invalid. The bug has been updated to refer to the pull request using the external bug tracker.
/jira refresh
@wking: This pull request references Jira Issue OCPBUGS-9745, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug. Requesting review from QA contact.
/hold cancel
/hold
/hold cancel
We're spinning up a graph-data pod that refreshes every 5 minutes and fetches the latest graph-data image digest, which we use to track whether the image tag was updated. Logic:
1. We reconcile every 5 minutes.
2. We spin up a graph-data pod which runs for 5 minutes.
3. After the graph-data pod reaches the Succeeded condition, the reconcile loop deletes it and creates a new one. This is where we get the latest digest for the image.
4. We add this digest as an annotation on the deployment, so every time the annotation/image changes, new operand pods are rolled out.

We do not create the graph-data pod if the image is referenced by digest instead of tag. We create just one graph-data pod for OSUS. A sketch of the annotation step follows below.
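A sketch of that annotation step under the assumptions above; the annotation key and helper name are illustrative, not necessarily the operator's actual code (`appsv1` is `k8s.io/api/apps/v1`):

```go
// Hypothetical key; the operator's real annotation name may differ.
const graphDataImageAnnotation = "updateservice.operator.openshift.io/graph-data-image"

// setGraphDataAnnotation stamps the resolved digest onto the Deployment's
// pod template. Changing a pod-template annotation changes the template
// hash, so the Deployment controller rolls out new operand pods.
func setGraphDataAnnotation(deployment *appsv1.Deployment, digest string) {
	if deployment.Spec.Template.ObjectMeta.Annotations == nil {
		deployment.Spec.Template.ObjectMeta.Annotations = map[string]string{}
	}
	deployment.Spec.Template.ObjectMeta.Annotations[graphDataImageAnnotation] = digest
}
```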
Force-pushed: 6a19659 to 8f7c466.
petr-muller left a comment:
/retest
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: petr-muller, PratikMahajan.
quay is sad, failures are likely related.
/retest
@PratikMahajan: The following test failed. Full PR test history. Your PR dashboard.
/override ci/prow/operator-e2e-latest-osus-414 ci/prow/operator-e2e-latest-osus-414
@PratikMahajan: Overrode contexts on behalf of PratikMahajan: ci/prow/operator-e2e-latest-osus-414
/override ci/prow/operator-e2e-414
@PratikMahajan: Overrode contexts on behalf of PratikMahajan: ci/prow/operator-e2e-414
@PratikMahajan: Jira Issue OCPBUGS-9745: All pull requests linked via external trackers have merged. Jira Issue OCPBUGS-9745 has been moved to the MODIFIED state.
No description provided.