Conversation

@wking wking commented May 2, 2022

I'd dropped this in cc9292a (#400), claiming:

There's no object status for CRDs or DaemonSets that marks "we are really hurting". The v1.18.0 Kubernetes CRD and DaemonSet controllers do not set any conditions in their operand status (although the API for those conditions exists [2,3]). With this commit, we have very minimal wait logic for either. Sufficiently unhealthy DaemonSet should be reported on via their associated ClusterOperator, and sufficiently unhealthy CRD should be reported on when we fail to push any custom resources consuming them (Task.Run retries will give the API server time to ready itself after accepting a CRD update before the CVO fails its sync cycle).

But from upstream docs:

It might take a few seconds for the endpoint to be created. You can watch the Established condition of your CustomResourceDefinition to be true or watch the discovery information of the API server for your resource to show up.

So I was correct that we will hear about CRD issues when we fail to push a dependent custom resource. But I was not correct in claiming that the CRD controller set no conditions. And the code I removed in cc9292a was in fact looking at the Established condition already.

This commit restores the Established check, but without the previous PollImmediateUntil wait. We'll see how it plays in CI. If we spend too long in between failed sync cycles waiting on new CRDs to show up, we can look at restoring a short wait.
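
For reference, a minimal sketch of the kind of Established check being restored, using the upstream apiextensions/v1 types. This is illustrative only, not the CVO's actual resourcebuilder code; the helper name is made up, and the example values mirror the must-gather conditions shown later in this thread:

package main

import (
	"fmt"
	"os"

	apiextv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
)

// crdEstablished returns nil when the CustomResourceDefinition declares
// Established=True in status.conditions, and an error otherwise.
func crdEstablished(crd *apiextv1.CustomResourceDefinition) error {
	for _, condition := range crd.Status.Conditions {
		if condition.Type != apiextv1.Established {
			continue
		}
		if condition.Status == apiextv1.ConditionTrue {
			return nil
		}
		return fmt.Errorf("CustomResourceDefinition %s is not Established: %s: %s", crd.Name, condition.Reason, condition.Message)
	}
	return fmt.Errorf("CustomResourceDefinition %s does not declare an Established status condition", crd.Name)
}

func main() {
	// Example values mirror the must-gather output quoted below.
	crd := &apiextv1.CustomResourceDefinition{}
	crd.Name = "clusteroperators.config.openshift.io"
	crd.Status.Conditions = []apiextv1.CustomResourceDefinitionCondition{{
		Type:    apiextv1.Established,
		Status:  apiextv1.ConditionTrue,
		Reason:  "InitialNamesAccepted",
		Message: "the initial names have been accepted",
	}}
	if err := crdEstablished(crd); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("clusteroperators.config.openshift.io is Established")
}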

@openshift-ci openshift-ci bot requested review from jottofar and vrutkovs May 2, 2022 22:41
@openshift-ci openshift-ci bot added the approved label May 2, 2022
@wking wking force-pushed the restore-CRD-established-check branch 2 times, most recently from c1328ad to 4998c6c on May 3, 2022 17:22
wking commented May 3, 2022

Update:

Cluster did not complete upgrade: timed out waiting for the condition: Could not update customresourcedefinition "clusteroperators.config.openshift.io" (4 of 792)

From the CVO logs:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/771/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade/1521351550553821184/artifacts/e2e-agnostic-upgrade/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-5f5f9cd4b4-gwq8w_cluster-version-operator.log | grep customresourcedef | tail -n2
E0503 08:14:05.090838       1 task.go:112] error running apply for customresourcedefinition "clusteroperators.config.openshift.io" (4 of 792): CustomResourceDefinition clusteroperators.config.openshift.io does not declare an Established status condition
E0503 08:14:20.932028       1 task.go:112] error running apply for customresourcedefinition "clusteroperators.config.openshift.io" (4 of 792): CustomResourceDefinition clusteroperators.config.openshift.io does not declare an Established status condition

From the must-gather:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/771/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade/1521351550553821184/artifacts/e2e-agnostic-upgrade/gather-must-gather/artifacts/must-gather.tar | tar xOz registry-build03-ci-openshift-org-ci-op-sbcr982j-stable-initial-sha256-862bbab55fec772e032e10f2765e9221f8a743e6fbcdd2d376bbe10a9e875c79/cluster-scoped-resources/apiextensions.k8s.io/customresourcedefinitions/clusteroperators.config.openshift.io.yaml | yaml2json | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message' 
2022-05-03T05:07:37Z NamesAccepted=True NoConflicts: no conflicts found
2022-05-03T05:07:37Z Established=True InitialNamesAccepted: the initial names have been accepted

Looks like it's declaring Established to me. I've pushed c1328ad -> 4998c6c to log the conditions we did get.

wking commented May 3, 2022

The e2e-agnostic-upgrade job failed during cluster install, which is unrelated to this PR, because the installed version is built from the base branch.

/test e2e-agnostic-upgrade

wking commented May 4, 2022

New run failed the same way, and confirms that the CVO code is not seeing the conditions that show up in the must-gather:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/771/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade/1521622968680058880/artifacts/e2e-agnostic-upgrade/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-644b6dd49d-rkdzj_cluster-version-operator.log | grep 'does not declare' | tail -n2
E0504 03:05:53.570755       1 task.go:112] error running apply for customresourcedefinition "clusteroperators.config.openshift.io" (4 of 791): CustomResourceDefinition clusteroperators.config.openshift.io does not declare an Established status condition: []
I0504 03:06:13.358865       1 sync_worker.go:1098] Update error 4 of 791: UpdatePayloadFailed Could not update customresourcedefinition "clusteroperators.config.openshift.io" (4 of 791) (*errors.errorString: CustomResourceDefinition clusteroperators.config.openshift.io does not declare an Established status condition: [])
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/771/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade/1521622968680058880/artifacts/e2e-agnostic-upgrade/gather-must-gather/artifacts/must-gather.tar | tar xOz registry-build03-ci-openshift-org-ci-op-dwkpjrf0-stable-initial-sha256-862bbab55fec772e032e10f2765e9221f8a743e6fbcdd2d376bbe10a9e875c79/cluster-scoped-resources/apiextensions.k8s.io/customresourcedefinitions/clusteroperators.config.openshift.io.yaml | yaml2json | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2022-05-03T23:08:12Z NamesAccepted=True NoConflicts: no conflicts found
2022-05-03T23:08:12Z Established=True InitialNamesAccepted: the initial names have been accepted

@wking wking force-pushed the restore-CRD-established-check branch from 4998c6c to 11b7ecb on May 4, 2022 07:26
wking added a commit to wking/cluster-version-operator that referenced this pull request May 4, 2022
I'd dropped this in cc9292a (lib/resourcebuilder: Replace wait-for
with single-shot "is it alive now?", 2020-07-30, openshift#400), claiming:

  There's no object status for CRDs or DaemonSets that marks "we are
  really hurting".  The v1.18.0 Kubernetes CRD and DaemonSet
  controllers do not set any conditions in their operand status
  (although the API for those conditions exists [2,3]).  With this
  commit, we have very minimal wait logic for either.  Sufficiently
  unhealthy DaemonSet should be reported on via their associated
  ClusterOperator, and sufficiently unhealthy CRD should be reported
  on when we fail to push any custom resources consuming them
  (Task.Run retries will give the API server time to ready itself
  after accepting a CRD update before the CVO fails its sync cycle).

But from [1]:

  > It might take a few seconds for the endpoint to be created. You
  > can watch the Established condition of your
  > CustomResourceDefinition to be true or watch the discovery
  > information of the API server for your resource to show up.

So I was correct that we will hear about CRD issues when we fail to
push a dependent custom resource.  But I was not correct in claiming
that the CRD controller set no conditions.  I was probably confused by
e8ffccb (lib: Add autogeneration for some resource* functionality,
2020-07-29, openshift#420), which broke the health-check inputs as described in
002591d (lib/resourcebuilder: Use actual resource in check*Health
calls, 2022-05-03, openshift#771).  The code I removed in cc9292a was in
fact looking at the Established condition already.

This commit restores the Established check, but without the previous
PollImmediateUntil wait.

[1]: https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#create-a-customresourcedefinition
@wking wking force-pushed the restore-CRD-established-check branch from 11b7ecb to 38596fc on May 4, 2022 07:30
@wking wking force-pushed the restore-CRD-established-check branch from 38596fc to 978ac53 on May 4, 2022 07:31
@wking wking force-pushed the restore-CRD-established-check branch from 978ac53 to ebb1271 on May 4, 2022 07:41
@wking wking changed the title lib/resourcebuilder/apiext: Restore check for Established=True CRDs Bug 2081895: lib/resourcebuilder/apiext: Restore check for Established=True CRDs May 4, 2022
@openshift-ci openshift-ci bot added the bugzilla/severity-medium and bugzilla/valid-bug labels May 4, 2022
openshift-ci bot commented May 4, 2022

@wking: This pull request references Bugzilla bug 2081895, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.11.0) matches configured target release for branch (4.11.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @jiajliu

In response to this:

Bug 2081895: lib/resourcebuilder/apiext: Restore check for Established=True CRDs

@openshift-ci openshift-ci bot requested a review from jiajliu May 4, 2022 23:50
)

// ApplyAPIServicev1 applies the required API service to the cluster.
func ApplyAPIServicev1(ctx context.Context, client apiregclientv1.APIServicesGetter, required *apiregv1.APIService, reconciling bool) (*apiregv1.APIService, bool, error) {

Member Author:
Can we just drop DeleteAPIServicev1? Seems like it was added in 0afb8a8 (#438), but it has never had any consumers.

Contributor:
I'd leave it in so our resourcebuilder continues to support apply and delete for all the resources.

Contributor:
But looking at this further, this is saying the CVO will never apply this resource. Did the CVO ever apply the resource such that there could ever be an orphan? If not, then yeah, I suppose there's no reason for the delete.

Member Author:
Looks like we dropped support for APIService in 5681a70 (#566) saying no 4.8 manifests were using it. Auditing recent z-stream tips:

$ oc adm release extract --to 4.11 quay.io/openshift-release-dev/ocp-release-nightly@sha256:080e5cf5e3e043ac0877b8f545ba2b596016f3fd76f3f457d15060603b3615e1
$ oc adm release extract --to 4.10 quay.io/openshift-release-dev/ocp-release:4.10.13-x86_64
$ oc adm release extract --to 4.9 quay.io/openshift-release-dev/ocp-release:4.9.32-x86_64
$ oc adm release extract --to 4.8 quay.io/openshift-release-dev/ocp-release:4.8.39-x86_64
$ grep -r 'kind: APIService' 4.*
...no hits...

I guess I'll pivot to re-dropping support, and explain that^.

Member Author:
I've pushed ebb1271 -> 6e12b9f with a new commit that re-drops APIService handling.

-	if _, _, err := resourceapply.ApplyDeploymentv1(ctx, b.appsClientv1, typedObject, reconcilingMode); err != nil {
+	if actual, _, err := resourceapply.ApplyDeploymentv1(ctx, b.appsClientv1, typedObject, reconcilingMode); err != nil {
 		return err
+	} else if actual != nil {

Contributor:
So before your change, if a resource was IsCreateOnly, this would have segfaulted? Seems like a pretty big hole.

Should the nil check be in each of the functions instead?

Member Author:
No, before my change we were passing typedObject through. That manifest content wasn't nil, but it didn't have anything useful in status either. I'm agnostic about nil checks vs. panics inside the check*Health functions; either way it would be a pretty serious bug to pass nil through to the health checker, and either way I expect we'll hear about it.

Contributor:
ACK. Did not notice that param change.
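
For illustration, here is the shape of a health check that relies on the in-cluster object returned by the apply call, with the nil guard discussed above. This is a sketch, not the CVO's check*Health code; the package name, function name, and the Available-condition logic are assumptions:

package resourcehealth

import (
	"errors"
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// checkDeploymentHealth expects the in-cluster Deployment returned by the
// apply call; the release-manifest content carries no status, so there is
// nothing meaningful to check on it.
func checkDeploymentHealth(deployment *appsv1.Deployment) error {
	if deployment == nil {
		// Guard for callers with no in-cluster object to hand over,
		// e.g. create-only resources where the apply was skipped.
		return errors.New("no in-cluster Deployment to check")
	}
	for _, condition := range deployment.Status.Conditions {
		if condition.Type == appsv1.DeploymentAvailable && condition.Status == corev1.ConditionFalse {
			return fmt.Errorf("deployment %s is not Available: %s", deployment.Name, condition.Message)
		}
	}
	return nil
}

The caller would then match the new line in the hunk above: capture actual from ApplyDeploymentv1 and pass it into the check only when it is non-nil.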

jottofar commented May 9, 2022

/hold
/lgtm

@openshift-ci openshift-ci bot added the do-not-merge/hold label May 9, 2022
@openshift-ci openshift-ci bot added the lgtm label May 9, 2022
wking added 3 commits May 9, 2022 11:56
The original APIService reconciliation landed in 662e182 (handle
APIService, Service objects, 2018-09-27, openshift#26).  We definitely
reconcile a bunch of Service manifests (e.g. for serving operator
metrics, including the CVO's own Service), but it's not clear what the
use case was for APIService.

DeleteAPIServicev1 landed in 0afb8a8 (Add a manifest annotation to
be used for object deletion, 2020-08-17, openshift#438), but was never
consumed.  We dropped APIService reconciliation support in 5681a70
(Drop APIService support, 2021-06-10, openshift#566).  This commit drops the
unused DeleteAPIServicev1.  Auditing z-stream tips from 4.3 through
4.11:

  $ oc adm release extract --to 4.11 quay.io/openshift-release-dev/ocp-release-nightly@sha256:080e5cf5e3e043ac0877b8f545ba2b596016f3fd76f3f457d15060603b3615e1
  $ oc adm release extract --to 4.10 quay.io/openshift-release-dev/ocp-release:4.10.13-x86_64
  $ oc adm release extract --to 4.9 quay.io/openshift-release-dev/ocp-release:4.9.32-x86_64
  $ oc adm release extract --to 4.8 quay.io/openshift-release-dev/ocp-release:4.8.39-x86_64
  $ oc adm release extract --to 4.7 quay.io/openshift-release-dev/ocp-release:4.7.50-x86_64
  $ oc adm release extract --to 4.6 quay.io/openshift-release-dev/ocp-release:4.6.57-x86_64
  $ oc adm release extract --to 4.5 quay.io/openshift-release-dev/ocp-release:4.5.41-x86_64
  $ oc adm release extract --to 4.4 quay.io/openshift-release-dev/ocp-release:4.4.33-x86_64
  $ oc adm release extract --to 4.3 quay.io/openshift-release-dev/ocp-release:4.3.40-x86_64
  $ grep -r 'kind: APIService' 4.*
  ...no hits...
I'd accidentally broken this in e8ffccb (lib: Add autogeneration
for some resource* functionality, 2020-07-29, openshift#420), when I moved the
health checks out of the Do methods and began passing typedObject into
them.  typedObject is the release manifest, which lacks the status
attributes we want the health checks to cover.

This also catches the generation script up with some past manual
changes:

* 0afb8a8 (Add a manifest annotation to be used for object
  deletion, 2020-08-17, openshift#438).
* 05e1af7 (Log resource diffs on update only in reconcile mode,
  2021-07-13, openshift#628).
I'd dropped this in cc9292a (lib/resourcebuilder: Replace wait-for
with single-shot "is it alive now?", 2020-07-30, openshift#400), claiming:

  There's no object status for CRDs or DaemonSets that marks "we are
  really hurting".  The v1.18.0 Kubernetes CRD and DaemonSet
  controllers do not set any conditions in their operand status
  (although the API for those conditions exists [2,3]).  With this
  commit, we have very minimal wait logic for either.  Sufficiently
  unhealthy DaemonSet should be reported on via their associated
  ClusterOperator, and sufficiently unhealthy CRD should be reported
  on when we fail to push any custom resources consuming them
  (Task.Run retries will give the API server time to ready itself
  after accepting a CRD update before the CVO fails its sync cycle).

But from [1]:

  > It might take a few seconds for the endpoint to be created. You
  > can watch the Established condition of your
  > CustomResourceDefinition to be true or watch the discovery
  > information of the API server for your resource to show up.

So I was correct that we will hear about CRD issues when we fail to
push a dependent custom resource.  But I was not correct in claiming
that the CRD controller set no conditions.  I was probably confused by
e8ffccb (lib: Add autogeneration for some resource* functionality,
2020-07-29, openshift#420), which broke the health-check inputs as described in
002591d (lib/resourcebuilder: Use actual resource in check*Health
calls, 2022-05-03, openshift#771).  The code I removed in cc9292a was in
fact looking at the Established condition already.

This commit restores the Established check, but without the previous
PollImmediateUntil wait.

[1]: https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#create-a-customresourcedefinition
@wking wking force-pushed the restore-CRD-established-check branch from ebb1271 to 6e12b9f on May 9, 2022 19:05
@openshift-ci openshift-ci bot removed the lgtm label May 9, 2022
jottofar commented May 9, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm label May 9, 2022
openshift-ci bot commented May 9, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jottofar, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

wking commented May 10, 2022

PDB/DNS stuff is unrelated.

/override ci/prow/e2e-agnostic-upgrade

wking commented May 10, 2022

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold label May 10, 2022
openshift-ci bot commented May 10, 2022

@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-upgrade

In response to this:

PDB/DNS stuff is unrelated.

/override ci/prow/e2e-agnostic-upgrade

openshift-ci bot commented May 10, 2022

@wking: all tests passed!

@openshift-merge-robot openshift-merge-robot merged commit ee40ed5 into openshift:master May 10, 2022
openshift-ci bot commented May 10, 2022

@wking: All pull requests linked via external trackers have merged:

Bugzilla bug 2081895 has been moved to the MODIFIED state.

In response to this:

Bug 2081895: lib/resourcebuilder/apiext: Restore check for Established=True CRDs

@wking wking deleted the restore-CRD-established-check branch May 10, 2022 03:38
wking added a commit to wking/cluster-version-operator that referenced this pull request May 13, 2022
I'd missed these Gets in 0833bba (lib/resourcebuilder: Use actual
resource in check*Health calls, 2022-05-03, openshift#771).  What I'd thought
was happening:

1. Pass typedObject into check*Health.
2. No status on typedObject, so check*Health passed, even if the
   in-cluster resource was sad.

What was actually happening:

1. Pass typedObject into check*Health.
2. check*Health calls Get to see what's going on in the cluster.
3. Health check appropriately checks the health of the in-cluster resource.

However, now that 0833bba is passing in the just-retrieved
in-cluster resource, the check*Health Get call is an API call that we
don't need to make.  Dropping it here saves us and the API server a
small amount of CPU cycles and network bandwidth.
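
As a schematic of the redundancy that commit removes (illustrative only, not the actual resourcebuilder code; the package and function names are made up): once the caller passes in the freshly applied in-cluster object, the health check no longer needs its own Get.

package crdhealth

import (
	"context"
	"fmt"

	apiextv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	apiextclientv1 "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset/typed/apiextensions/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// establishedErr returns an error unless the CRD declares Established=True.
func establishedErr(crd *apiextv1.CustomResourceDefinition) error {
	for _, condition := range crd.Status.Conditions {
		if condition.Type == apiextv1.Established && condition.Status == apiextv1.ConditionTrue {
			return nil
		}
	}
	return fmt.Errorf("CustomResourceDefinition %s does not declare Established=True", crd.Name)
}

// Before: the health check re-fetches the object it was asked about, which
// costs an extra API round trip for every checked resource.
func checkCRDHealthWithGet(ctx context.Context, client apiextclientv1.CustomResourceDefinitionsGetter, crd *apiextv1.CustomResourceDefinition) error {
	actual, err := client.CustomResourceDefinitions().Get(ctx, crd.Name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	return establishedErr(actual)
}

// After: the caller already holds the object its apply call returned, so the
// check can use its argument directly and the Get above can be dropped.
func checkCRDHealth(crd *apiextv1.CustomResourceDefinition) error {
	return establishedErr(crd)
}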