Conversation

@PratikMahajan
Contributor

No description provided.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 21, 2023
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 21, 2023
}

func nameGraphDataPod(instance *cv1.UpdateService) string {
	return instance.Name + "-graph-data-init-sha"
}

Member

can this be -tag-digest or something, to avoid being longer than the 14-character -policy-engine or -graph-builder? I know we've had name-is-too-long issues in the past, and I'm not clear on what happens if the name returned by these helpers ends up growing long.
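
For illustration only, a minimal sketch of the kind of guard that could cap generated names; the safeName helper and the choice of the 253-character DNS-subdomain limit are assumptions for this sketch, not code from this PR:

// Hypothetical helper (not part of this PR): keep a generated name under the
// Kubernetes object-name limit by falling back to a short hash of the base.
// Uses crypto/sha256 and fmt.
const maxNameLength = 253 // DNS-1123 subdomain limit for most object names

func safeName(base, suffix string) string {
	name := base + suffix
	if len(name) <= maxNameLength {
		return name
	}
	// Too long: hash the base so the result stays unique but bounded.
	sum := sha256.Sum256([]byte(base))
	return fmt.Sprintf("%x%s", sum[:4], suffix)
}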

Member

Also, handling this per-instance means that if we have multiple UpdateService sharing the same graph-data pullspec, we'll be launching redundant Pods to figure out that digest. In practice, maybe that situation is rare enough that we don't have to worry about trying to centralize tag-to-digest lookup at the operator level where results can be shared between multiple instances.


if err != nil && apiErrors.IsNotFound(err) || found.Status.Phase == "Succeeded" {
	reqLogger.Info("Creating Pod", "Namespace", pod.Namespace, "Name", pod.Name)
	err := r.Client.Create(ctx, pod)

Member

Probably need a Delete in the Succeeded case. And also maybe a delete in the "we used to be a by-tag reference, but now the graph-data pullspec is a by-digest reference, so we don't need to bother launching Pods anymore" case?
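
As a minimal sketch (reusing found, pod, and reqLogger from the snippet above; the return values are assumptions, not the PR's final code), the Succeeded case could delete before re-creating:

// Sketch: delete the completed probe pod so a fresh one can be created on a
// later reconcile, and only create when the pod is genuinely absent.
if err == nil && found.Status.Phase == corev1.PodSucceeded {
	reqLogger.Info("Deleting completed Pod", "Namespace", found.Namespace, "Name", found.Name)
	if err := r.Client.Delete(ctx, found); err != nil {
		return "", err
	}
	return "", nil // the next reconcile re-creates the pod
}
if err != nil && apiErrors.IsNotFound(err) {
	reqLogger.Info("Creating Pod", "Namespace", pod.Namespace, "Name", pod.Name)
	if err := r.Client.Create(ctx, pod); err != nil {
		return "", err
	}
}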

Contributor Author

added a delete on success condition

Member

Is there a chance of racing with someone else (is it possible that we run multiple replicas?) creating the pod? It is possible that we Get, see nothing, something else creates the pod, we create and fail with conflict... I'm fine if we say that that cannot happen realistically.

Contributor Author

The way we're avoiding that here is:

  1. Start a pod that waits (sleeps) for 5 minutes.
  2. Once the wait finishes, the pod goes into the Succeeded condition.
  3. We reconcile every 5 minutes, and when we see that the pod has completed, we just delete it.
  4. Deleting it kicks off the reconcile again to create the pod.

Because we're creating a pod, there can only be one pod running at a time with the given name. (A sketch of such a pod follows below.)
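
For illustration, a minimal sketch of the pod described in step 1; the container name, the plain "sleep 300" command, and taking the image from instance.Spec.GraphDataImage are assumptions, not necessarily the PR's exact manifest:

// Sketch: a bare pod that runs the graph-data image, sleeps for 5 minutes,
// and then exits so its phase becomes Succeeded.
pod := &corev1.Pod{
	ObjectMeta: metav1.ObjectMeta{
		Name:      nameGraphDataPod(instance),
		Namespace: instance.Namespace,
	},
	Spec: corev1.PodSpec{
		RestartPolicy: corev1.RestartPolicyNever,
		Containers: []corev1.Container{{
			Name:    "graph-data",
			Image:   instance.Spec.GraphDataImage,
			Command: []string{"sleep", "300"},
		}},
	},
}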

Member

> Because we're creating a pod, there can only be one pod running at a time with the given name.

Petr was pointing out that other actors (e.g. a runaway second cincinnati-operator replica) might also create a pod with the same name. And actually, because ensureGraphDataSHA is checking a single UpdateService, it seems like an operator managing several by-tag UpdateService would probably struggle like:

  1. Reconciling UpdateService A, create a graph-data-tag-digest pod.
  2. Reconciling UpdateService B, pod is still running, return "", nil.
  3. Reconciling UpdateService A, pod is still running, return "", nil.
  4. Reconciling UpdateService B, pod is Succeeded, return the image ID. But this is bad, because that image ID was for the by-tag pullspec from UpdateService A, not the pullspec from B.

Instead, you'll want to somehow uniquify by the pullspec under test. Perhaps SetControllerReference allows you to clearly associate a pod with the appropriate UpdateService, and then Kube's built-in garbage collection will handle deleting any pods whose UpdateService is deleted.
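
For illustration, a minimal sketch of that suggestion; the hash-suffix naming scheme and the r.Scheme field are assumptions, not necessarily how this PR implements it:

// Sketch: make the probe pod name unique per graph-data pullspec, and make
// the UpdateService the controller owner so Kubernetes garbage collection
// removes the pod when its UpdateService is deleted.
// Uses crypto/sha256, fmt, and sigs.k8s.io/controller-runtime/pkg/controller/controllerutil.
sum := sha256.Sum256([]byte(instance.Spec.GraphDataImage))
pod.Name = fmt.Sprintf("%s-tag-digest-%x", instance.Name, sum[:4])
if err := controllerutil.SetControllerReference(instance, pod, r.Scheme); err != nil {
	return "", err
}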

@PratikMahajan PratikMahajan force-pushed the graph-data-restart branch 3 times, most recently from 68c2f8a to d2fbb66 on March 1, 2023 at 18:51
@PratikMahajan PratikMahajan changed the title from "[wip] add graph-data image annotation to operand pod" to "add graph-data image annotation to operand pod" on Mar 1, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 1, 2023
@LalatenduMohanty
Member

/hold

Putting a temporary hold to make sure it does not get merged without my review comments.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 1, 2023


@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 6, 2023
@PratikMahajan PratikMahajan force-pushed the graph-data-restart branch 2 times, most recently from 8221393 to 49cefa5 on March 6, 2023 at 19:43
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 6, 2023
@PratikMahajan PratikMahajan force-pushed the graph-data-restart branch 2 times, most recently from e7a125d to 6a19659 on March 9, 2023 at 19:34
@PratikMahajan PratikMahajan changed the title from "add graph-data image annotation to operand pod" to "OCPBUGS-9745: add graph-data image annotation to operand pod" on Jun 1, 2023
@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels Jun 1, 2023
@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Jun 1, 2023
@openshift-ci-robot

@PratikMahajan: This pull request references Jira Issue OCPBUGS-9745, which is invalid:

  • expected the bug to target the "4.14.0" version, but it targets "4.13.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


@wking
Member

wking commented Jun 1, 2023

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 1, 2023
@openshift-ci-robot

@wking: This pull request references Jira Issue OCPBUGS-9745, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.14.0) matches configured target version for branch (4.14.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jiajliu


@openshift-ci openshift-ci bot requested a review from jiajliu June 1, 2023 16:34
@LalatenduMohanty
Member

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 8, 2023
@PratikMahajan
Contributor Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 8, 2023
@PratikMahajan
Contributor Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 26, 2023
We're spinning up a graph-data pod that refreshes every
5 minutes and fetches the latest graph-data image digest, which
we use to keep track of whether the image tag was updated.

Logic:
1. We reconcile every 5 minutes.
2. We spin up a graph-data pod which runs for 5 minutes.
3. After the graph-data pod reaches the Succeeded condition, the
reconcile loop deletes it and creates a new one. This is
where we get the latest digest for the image.
4. We add this image digest as an annotation to the deployment,
so every time the annotation/image changes, it will roll out
new operand pods.

We will not create the graph-data pod if the image is
referenced by digest instead of tag.
We create just one graph-data pod for OSUS.
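
To illustrate step 4, a minimal sketch; the annotation key and reading the digest from the probe pod's container status are assumptions for this sketch, not necessarily the PR's exact code:

// Sketch: read the resolved image ID from the completed probe pod and stamp
// it onto the operand Deployment's pod template, so a changed digest rolls
// out new operand pods.
digest := ""
if len(found.Status.ContainerStatuses) > 0 {
	digest = found.Status.ContainerStatuses[0].ImageID
}
if deployment.Spec.Template.ObjectMeta.Annotations == nil {
	deployment.Spec.Template.ObjectMeta.Annotations = map[string]string{}
}
// Hypothetical annotation key, for illustration only.
deployment.Spec.Template.ObjectMeta.Annotations["updateservice.operator.openshift.io/graph-data-image"] = digest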

@petr-muller petr-muller left a comment
Member

/retest

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 29, 2023
@openshift-ci
Contributor

openshift-ci bot commented Jun 29, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: petr-muller, PratikMahajan

Needs approval from an approver in each of these files:
  • OWNERS [PratikMahajan,petr-muller]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@petr-muller
Member

quay is sad, failures are likely related

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 25c8b26 and 2 for PR HEAD 8f7c466 in total

@PratikMahajan
Contributor Author

/retest

@openshift-ci
Contributor

openshift-ci bot commented Jul 5, 2023

@PratikMahajan: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/operator-e2e
Commit: 6a19659
Required: true
Rerun command: /test operator-e2e



@PratikMahajan
Contributor Author

/override ci/prow/operator-e2e-latest-osus-414 ci/prow/operator-e2e-latest-osus-414
Issues with the cluster coming up; tests on other OCP versions have passed. OSUS is OCP-version-agnostic, so it is safe to override the tests.

@openshift-ci
Contributor

openshift-ci bot commented Jul 6, 2023

@PratikMahajan: Overrode contexts on behalf of PratikMahajan: ci/prow/operator-e2e-latest-osus-414


@PratikMahajan
Contributor Author

/override ci/prow/operator-e2e-414

@openshift-ci
Contributor

openshift-ci bot commented Jul 6, 2023

@PratikMahajan: Overrode contexts on behalf of PratikMahajan: ci/prow/operator-e2e-414


@openshift-merge-robot openshift-merge-robot merged commit 20dfabf into openshift:master Jul 6, 2023
@openshift-ci-robot

@PratikMahajan: Jira Issue OCPBUGS-9745: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-9745 has been moved to the MODIFIED state.

