availableupdates: do not reset lastTransitionTime on unchanged status #964

petr-muller · 2023-09-05T16:56:06Z

The code in `evaluateConditionalUpdates` correctly uses `SetStatusCondition` to set conditions, which only updates the `LastTransitionTime` field when `Status` differs between the original and updated state. Previously, though, the original state always contained empty conditions, because conditional updates are always obtained from OSUS and the fresh structure was never updated with existing conditions from the in-cluster status.

petr-muller · 2023-09-06T12:40:38Z

/retest

petr-muller · 2023-09-07T12:00:34Z

/test e2e-agnostic-ovn-upgrade-out-of-change

DavidHurta

Overall looks good and functional!

My only concern is the used comparison when searching for an update in updates.

Other than that only nitpicks.

(Note: Only tested locally using the go test subcommand)

pkg/cvo/availableupdates.go

pkg/cvo/availableupdates_test.go

pkg/cvo/availableupdates.go

petr-muller · 2023-09-07T15:28:23Z

/hold
Thanks for the review! Will address.

petr-muller · 2023-09-08T13:10:20Z

@Davoska feedback addressed in https://github.com/openshift/cluster-version-operator/compare/db205e7f884786b064af3c6e1bce8c43be253907..5606ae947e2b3f0e37a08e1ee0a0ce5cb26a849a, PTAL

/hold cancel

DavidHurta

I really like the new changes, thanks! 💪

Just one smallest nitpick of all nitpicks, feel free to unhold or address.

/hold
/lgtm

pkg/cvo/availableupdates_test.go


DavidHurta · 2023-09-08T15:06:35Z

/lgtm

DavidHurta · 2023-09-08T15:11:01Z

/unhold

DavidHurta

I apologize for the late re-review after accepting the changes.

/hold

DavidHurta · 2023-09-09T00:28:21Z

pkg/cvo/availableupdates_test.go

+							Type:   "PromQL",
+							PromQL: &configv1.PromQLClusterCondition{PromQL: string(evalToYes)},
+						},
+					},
+				},
+			},
+			Conditions: []metav1.Condition{
+				{
+					Type:    "Recommended",
+					Status:  metav1.ConditionFalse,


Here we hard-code the field `Type` to `"PromQL"`, the field `PromQL` to `&configv1.PromQLClusterCondition{PromQL: string(evalToYes)}`, and the field `Status` to `metav1.ConditionFalse`, even though the first two should come from the function's parameters `ruleType` and `promql`, and `Status` should be the value expected for the match result.

I think Petr has things the way he does so he can exercise "what we already know about a conditional update for a particular target, and the new update service response comes in with new information for that same image?". So maybe we move from PromQL to Always like openshift/cincinnati-graph-data#3590. The test suite should confirm that lastTransitionTime is being preserved if we transition from "outgoing PromQL matched and so does the incoming Always" and that the transition time resets if we transition from "outgoing PromQL did not match but the incoming Always does".

> I think Petr has things the way he does so he can exercise "what we already know about a conditional update for a particular target, and the new update service response comes in with new information for that same image?".

Oh, I didn't think of it like that, good point! Any of the options seems reasonable.

Unfortunately I cannot claim the intent that Trevor assumes. The truth is that I finished the testing code knowing it was probably a bit rough, but I called it a day thinking it was not worth the effort to polish it entirely. David is right that this method pretends to be a reusable, configurable helper that can prepare fixtures for different situations, but in reality it is not. I'll spend some time looking at how I could clean it up even further, maybe taking Trevor's other review items into account as well.

Thanks for being thorough!

I reworked the tests using the existing mock that Trevor suggested, which made some of the code above obsolete. I have dropped osusWithSingleConditionalEdge's illusion of being reusable and hardcoded its content (the value of this helper is that it creates the complicated but consistent fixture data). When someone wants to add tests, they can generalize the code to fit their future use case.

pkg/cvo/availableupdates_test.go

wking · 2023-09-09T06:42:46Z

pkg/cvo/availableupdates_test.go

+}
+
+func osusWithSingleConditionalEdge(from, to string, ruleType clusterConditionRuleType, promql clusterConditionRuleFakePromql) ([]configv1.ConditionalUpdate, *httptest.Server) {
+	osus := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {


nit: we could possibly simplify this test harness a bit if we pull this out into a new function, because the upstream defaulting, available-updates throttling, and proxy transport lookup don't seem as interesting to cover in tests. You could create a dummy client to pass through to calculateAvailableUpdatesStatus like:

type MockRoundTripper func(r *http.Request) *http.Response

func (f MockRoundTripper) RoundTrip(r *http.Request) (*http.Response, error) {
	return f(r), nil
}

and:

httpClient := &http.Client{
	Transport: MockRoundTripper(func(r *http.Request) *http.Response {
		return &http.Response{
			StatusCode: http.StatusOK,
			Body: io.NopCloser(strings.NewReader(fmt.Sprintf(`{
  "nodes": [{"version": "%s", "payload": "payload/%s"}, {"version": "%s", "payload": "payload/%s"}],
  "conditionalEdges": [
    {
      "edges": [{"from": "%s", "to": "%s"}],
      "risks": [
        {
          "url": "https://example.com/%s",
          "name": "FourFiveSix",
          "message": "Four Five Five is just fine",
          "matchingRules": [{"type": "%s", "promql": {"promql": "%s"}}]
        }
      ]
    }
  ]
}
`, from, from, to, to, from, to, to, ruleType, promql))),
		}
	}),
}

or some such without needing to set up an HTTP server that needs to be closed later.
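For readers who want to try the idea in isolation, here is a small self-contained sketch of the same pattern using only the standard library. `newStubClient`, `fetch`, and the `osus.invalid` URL are hypothetical names chosen for the example; nothing here calls into the CVO code.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// MockRoundTripper adapts a plain function to the http.RoundTripper interface.
type MockRoundTripper func(r *http.Request) *http.Response

func (f MockRoundTripper) RoundTrip(r *http.Request) (*http.Response, error) {
	return f(r), nil
}

// newStubClient returns a client that answers every request with the given
// body and never touches the network.
func newStubClient(body string) *http.Client {
	return &http.Client{
		Transport: MockRoundTripper(func(r *http.Request) *http.Response {
			return &http.Response{
				StatusCode: http.StatusOK,
				Header:     make(http.Header),
				Body:       io.NopCloser(strings.NewReader(body)),
				Request:    r,
			}
		}),
	}
}

// fetch performs a GET and returns the whole response body as a string.
func fetch(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	return string(data), err
}

func main() {
	client := newStubClient(`{"nodes": [], "conditionalEdges": []}`)
	// The host is never resolved because the transport is stubbed out.
	body, err := fetch(client, "http://osus.invalid/graph")
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Because the transport is replaced, there is no listener to start or close, which is the simplification the comment above is pointing at.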

I'm not sure if it's worth the effort to pivot, now that you've already figured out all the plumbing to get this up to the existing level, but it might be worth poking at to see if the pivoting-this-pull cost seems like it might be worth reducing the onboarding-the-next-dev-who-needs-to-understand-this-test-suite cost.

I have considered doing something like that, but the tested functionality felt like it deserved a higher-level integration test, and test HTTP servers are cheap in Go with httptest. The real complexity is in preparing all the data and expected fixtures.

But I'll reconsider; after further development and feedback, the code is now different from when I started. Thanks for the suggestion!

I gave the pivot a quick shot and I think we're not really saving that much complexity. The annoying bits with "set up the input/output" would need to stay; we'd just need different plumbing.

But I actually discovered that I like using httptest here. It allows testing the full method and it quite explicitly communicates that the important input comes from a server. The plumbing needed is basically identical to mocking it in the client transport. I liked that I could get rid of the annoying queue stub though :)
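For comparison, a minimal standard-library sketch of the `httptest` approach. `newFakeOSUS` and `fetchGraph` are hypothetical helper names for this example and the served JSON is a placeholder, not the fixture used in the PR.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newFakeOSUS starts a local test server that replies to every request with
// the given graph JSON. The caller is responsible for closing it.
func newFakeOSUS(graph string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, graph)
	}))
}

// fetchGraph performs a GET against the server and returns the body.
func fetchGraph(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	return string(data), err
}

func main() {
	osus := newFakeOSUS(`{"nodes": [], "conditionalEdges": []}`)
	defer osus.Close() // the one extra piece of plumbing compared to a stubbed transport

	body, err := fetchGraph(osus.Client(), osus.URL)
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

The server makes it explicit that the important input arrives over HTTP, at the cost of having to close it when the test finishes.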

petr-muller · 2023-09-11T13:39:50Z

/label tide/merge-method-squash

pkg/cvo/availableupdates_test.go

petr-muller · 2023-09-12T13:30:57Z

[sig-arch] events should not repeat pathologically for ns/openshift-etcd
{  1 events happened too frequently  event happened 36 times, something is wrong: ns/openshift-etcd pod/etcd-guard-ci-op-3w2kvwtx-deec6-m6cvk-master-1 node/ci-op-3w2kvwtx-deec6-m6cvk-master-1 hmsg/cc0a8bd52a - pathological/true reason/ProbeError Readiness probe error: Get "https://10.0.0.6:9980/readyz": net/http: request canceled (Client.Timeout exceeded while awaiting headers) result=reject  body:

looks unrelated
/retest

petr-muller · 2023-09-12T13:45:41Z

/hold cancel

petr-muller · 2023-09-12T17:56:33Z

[sig-node] nodes should not go unready after being upgraded and go unready only once
{  1 nodes violated upgrade expectations:  Node ci-op-zf3sq2f7-deec6-ch4ln-worker-centralus2-vd5kz went unready multiple times: 2023-09-12T15:51:44Z, 2023-09-12T15:54:15Z Node ci-op-zf3sq2f7-deec6-ch4ln-worker-centralus2-vd5kz went ready multiple times: 2023-09-12T15:52:44Z, 2023-09-12T15:54:18Z

Looks unrelated, I'll override next unrelated failure :P
/retest

petr-muller · 2023-09-13T10:06:10Z

[sig-arch] events should not repeat pathologically for ns/openshift-etcd
{  2 events happened too frequently  event happened 29 times, something is wrong: ns/openshift-etcd pod/etcd-guard-ci-op-p9cdqlzn-deec6-nnqcs-master-1 node/ci-op-p9cdqlzn-deec6-nnqcs-master-1 hmsg/1004945f03 - pathological/true reason/ProbeError Readiness probe error: Get "https://10.0.0.7:9980/readyz": net/http: request canceled (Client.Timeout exceeded while awaiting headers) result=reject

/override ci/prow/e2e-agnostic-ovn-upgrade-into-change

openshift-ci · 2023-09-13T10:06:27Z

@petr-muller: Overrode contexts on behalf of petr-muller: ci/prow/e2e-agnostic-ovn-upgrade-into-change


shellyyang1989 · 2023-09-14T06:57:22Z

Pre-merge test using the dummy cincy json:

{
  "nodes": [
    {
      "version": "4.14.0-0.ci.test-2023-09-14-011048-ci-ln-774gjt2-latest",
      "payload": "registry.build02.ci.openshift.org/ci-ln-774gjt2/release@sha256:a69481840637cbcfdba8889628639c1a15a2fe916d9e179e790dd1100c62c526"
    },
    {
      "version": "4.15.0-0.nightly-2023-09-17-000000",
      "payload": "registry.ci.openshift.org/ocp/release@sha256:66c753e8b75d172f2a3f7ba13363383a76ecbc7ecdc00f3a423bef4ea8560405"
    },
    {
      "version": "4.15.0-0.nightly-2023-09-18-111111",
      "payload": "registry.ci.openshift.org/ocp/release@sha256:a5cd1b44e5b25b8a617d92a1f947297f56fc9bad104c117a8e452f932e1e2fd0"
    },
    {
      "version": "4.15.0-0.nightly-2023-09-19-222222",
      "payload": "registry.ci.openshift.org/ocp/release@sha256:e385a786f122c6c0e8848ecb9901f510676438f17af8a5c4c206807a9bc0bf28"
    }
  ],
  "edges": [
    [0,1],
    [0,2],
    [0,3]
  ],
  "conditionalEdges":[
    {
      "edges": [
        {"from": "4.14.0-0.ci.test-2023-09-14-011048-ci-ln-774gjt2-latest", "to": "4.15.0-0.nightly-2023-09-17-000000"}
      ],
      "risks": [
        {
          "url": "https://bug.example.com/a",
          "name": "SomeInvokerThing",
          "message": "On clusters on default invoker user, this imaginary bug can happen.",
          "matchingRules": [
            {
              "type": "PromQL",
              "promql": {
                "promql": "cluster_installer"
              }
            }
          ]
        },
        {
          "url": "https://bug.example.com/b",
          "name": "SomeChannelThing",
          "message": "On clusters with the channel set to 'buggy', this imaginary bug can happen.",
          "matchingRules": [
            {
              "type": "PromQL",
              "promql": {
                "promql": "group(cluster_version_available_updates{channel=\"buggy\"})\nor\n0 * group(cluster_version_available_updates{channel!=\"buggy\"})"
              }
            }
          ]
        }
      ]
    },
    {
      "edges": [
        {"from": "4.14.0-0.ci.test-2023-09-14-011048-ci-ln-774gjt2-latest", "to": "4.15.0-0.nightly-2023-09-18-111111"}
      ],
      "risks": [
        {
          "url": "https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634",
          "name": "ReleaseIsRejected",
          "message": "Too many CI failures on this release, so do not update to it",
          "matchingRules": [
            {
              "type": "Always"
            }
          ]
        }
      ]
    },
    {
      "edges": [
        {"from": "4.14.0-0.ci.test-2023-09-14-011048-ci-ln-774gjt2-latest", "to": "4.15.0-0.nightly-2023-09-19-222222"}
      ],
      "risks": [
        {
          "url": "https://bug.example.com/c",
          "name": "SomeInfrastructureThing",
          "message": "Clusters on nonexist provider, this imaginary bug can happen.",
          "matchingRules": [
            {
              "type": "PromQL",
              "promql": {
                "promql": "cluster_infrastructure_provider{type=~\"nonexist\"}\nor\n0 * cluster_infrastructure_provider"
              }
            }
          ]
        }
      ]
    }
  ]
}

Prior to the change, the lastTransitionTime of every conditionalUpdates entry changed every 5 minutes, regardless of whether the condition status changed.

// The lastTransitionTime of 3 conditionalUpdates was 2023-09-13T13:14:08Z after the upstream was patched

# oc get clusterversion -oyaml
    ...
    conditionalUpdates:
    - conditions:
      - lastTransitionTime: "2023-09-13T13:14:08Z"
        message: The update is recommended, because none of the conditional update
          risks apply to this cluster.
        reason: AsExpected
        status: "True"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:e385a786f122c6c0e8848ecb9901f510676438f17af8a5c4c206807a9bc0bf28
        version: 4.15.0-0.nightly-2023-09-19-222222
      risks:
      - matchingRules:
        - promql:
            promql: |-
              cluster_infrastructure_provider{type=~"nonexist"}
              or
              0 * cluster_infrastructure_provider
          type: PromQL
        message: Clusters on nonexist provider, this imaginary bug can happen.
        name: SomeInfrastructureThing
        url: https://bug.example.com/c
    - conditions:
      - lastTransitionTime: "2023-09-13T13:14:08Z"
        message: Too many CI failures on this release, so do not update to it https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
        reason: ReleaseIsRejected
        status: "False"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:a5cd1b44e5b25b8a617d92a1f947297f56fc9bad104c117a8e452f932e1e2fd0
        version: 4.15.0-0.nightly-2023-09-18-111111
      risks:
      - matchingRules:
        - type: Always
        message: Too many CI failures on this release, so do not update to it
        name: ReleaseIsRejected
        url: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
    - conditions:
      - lastTransitionTime: "2023-09-13T13:14:08Z"
        message: |-
          Could not evaluate exposure to update risk SomeInvokerThing (evaluation is throttled until 13:24:08Z)
            SomeInvokerThing description: On clusters on default invoker user, this imaginary bug can happen.
            SomeInvokerThing URL: https://bug.example.com/a

          Could not evaluate exposure to update risk SomeChannelThing (evaluation is throttled until 13:24:08Z)
            SomeChannelThing description: On clusters with the channel set to 'buggy', this imaginary bug can happen.
            SomeChannelThing URL: https://bug.example.com/b
        reason: EvaluationFailed
        status: Unknown
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:66c753e8b75d172f2a3f7ba13363383a76ecbc7ecdc00f3a423bef4ea8560405
        version: 4.15.0-0.nightly-2023-09-17-000000
      risks:
      - matchingRules:
        - promql:
            promql: cluster_installer
          type: PromQL
        message: On clusters on default invoker user, this imaginary bug can happen.
        name: SomeInvokerThing
        url: https://bug.example.com/a
      - matchingRules:
        - promql:
            promql: |-
              group(cluster_version_available_updates{channel="buggy"})
              or
              0 * group(cluster_version_available_updates{channel!="buggy"})
          type: PromQL
        message: On clusters with the channel set to 'buggy', this imaginary bug can
          happen.
        name: SomeChannelThing
        url: https://bug.example.com/b
    ...

// After 5 minutes, the lastTransitionTime of the 3 conditionalUpdates changed to 2023-09-13T13:19:09Z, but none of the statuses changed

# oc get clusterversion -oyaml
...
    conditionalUpdates:
    - conditions:
      - lastTransitionTime: "2023-09-13T13:19:09Z"
        message: The update is recommended, because none of the conditional update
          risks apply to this cluster.
        reason: AsExpected
        status: "True"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:e385a786f122c6c0e8848ecb9901f510676438f17af8a5c4c206807a9bc0bf28
        version: 4.15.0-0.nightly-2023-09-19-222222
      risks:
      - matchingRules:
        - promql:
            promql: |-
              cluster_infrastructure_provider{type=~"nonexist"}
              or
              0 * cluster_infrastructure_provider
          type: PromQL
        message: Clusters on nonexist provider, this imaginary bug can happen.
        name: SomeInfrastructureThing
        url: https://bug.example.com/c
    - conditions:
      - lastTransitionTime: "2023-09-13T13:19:09Z"
        message: Too many CI failures on this release, so do not update to it https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
        reason: ReleaseIsRejected
        status: "False"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:a5cd1b44e5b25b8a617d92a1f947297f56fc9bad104c117a8e452f932e1e2fd0
        version: 4.15.0-0.nightly-2023-09-18-111111
      risks:
      - matchingRules:
        - type: Always
        message: Too many CI failures on this release, so do not update to it
        name: ReleaseIsRejected
        url: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
    - conditions:
      - lastTransitionTime: "2023-09-13T13:19:09Z"
        message: |-
          Could not evaluate exposure to update risk SomeInvokerThing (evaluation is throttled until 13:24:08Z)
            SomeInvokerThing description: On clusters on default invoker user, this imaginary bug can happen.
            SomeInvokerThing URL: https://bug.example.com/a

          Could not evaluate exposure to update risk SomeChannelThing (evaluation is throttled until 13:24:08Z)
            SomeChannelThing description: On clusters with the channel set to 'buggy', this imaginary bug can happen.
            SomeChannelThing URL: https://bug.example.com/b
        reason: EvaluationFailed
        status: Unknown
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:66c753e8b75d172f2a3f7ba13363383a76ecbc7ecdc00f3a423bef4ea8560405
        version: 4.15.0-0.nightly-2023-09-17-000000
      risks:
      - matchingRules:
        - promql:
            promql: cluster_installer
          type: PromQL
        message: On clusters on default invoker user, this imaginary bug can happen.
        name: SomeInvokerThing
        url: https://bug.example.com/a
      - matchingRules:
        - promql:
            promql: |-
              group(cluster_version_available_updates{channel="buggy"})
              or
              0 * group(cluster_version_available_updates{channel!="buggy"})
          type: PromQL
        message: On clusters with the channel set to 'buggy', this imaginary bug can
          happen.
        name: SomeChannelThing
        url: https://bug.example.com/b
   ...

// After another 5 minutes, the lastTransitionTime of the 3 conditionalUpdates changed to 2023-09-13T13:24:11Z, but only one of the statuses changed

# oc get clusterversion -oyaml
...
    conditionalUpdates:
    - conditions:
      - lastTransitionTime: "2023-09-13T13:24:11Z"
        message: The update is recommended, because none of the conditional update
          risks apply to this cluster.
        reason: AsExpected
        status: "True"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:e385a786f122c6c0e8848ecb9901f510676438f17af8a5c4c206807a9bc0bf28
        version: 4.15.0-0.nightly-2023-09-19-222222
      risks:
      - matchingRules:
        - promql:
            promql: |-
              cluster_infrastructure_provider{type=~"nonexist"}
              or
              0 * cluster_infrastructure_provider
          type: PromQL
        message: Clusters on nonexist provider, this imaginary bug can happen.
        name: SomeInfrastructureThing
        url: https://bug.example.com/c
    - conditions:
      - lastTransitionTime: "2023-09-13T13:24:11Z"
        message: Too many CI failures on this release, so do not update to it https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
        reason: ReleaseIsRejected
        status: "False"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:a5cd1b44e5b25b8a617d92a1f947297f56fc9bad104c117a8e452f932e1e2fd0
        version: 4.15.0-0.nightly-2023-09-18-111111
      risks:
      - matchingRules:
        - type: Always
        message: Too many CI failures on this release, so do not update to it
        name: ReleaseIsRejected
        url: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
    - conditions:
      - lastTransitionTime: "2023-09-13T13:24:11Z"
        message: |-
          On clusters on default invoker user, this imaginary bug can happen. https://bug.example.com/a

          Could not evaluate exposure to update risk SomeChannelThing (evaluation is throttled until 13:34:11Z)
            SomeChannelThing description: On clusters with the channel set to 'buggy', this imaginary bug can happen.
            SomeChannelThing URL: https://bug.example.com/b
        reason: MultipleReasons
        status: "False"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:66c753e8b75d172f2a3f7ba13363383a76ecbc7ecdc00f3a423bef4ea8560405
        version: 4.15.0-0.nightly-2023-09-17-000000
      risks:
      - matchingRules:
        - promql:
            promql: cluster_installer
          type: PromQL
        message: On clusters on default invoker user, this imaginary bug can happen.
        name: SomeInvokerThing
        url: https://bug.example.com/a
      - matchingRules:
        - promql:
            promql: |-
              group(cluster_version_available_updates{channel="buggy"})
              or
              0 * group(cluster_version_available_updates{channel!="buggy"})
          type: PromQL
        message: On clusters with the channel set to 'buggy', this imaginary bug can
          happen.
        name: SomeChannelThing
        url: https://bug.example.com/b
    ...

After the change, lastTransitionTime is only changed when status is changed.

// The lastTransitionTime of the 3 conditions was 2023-09-14T02:09:21Z after the upstream was patched

...
conditionalUpdates:
    - conditions:
      - lastTransitionTime: "2023-09-14T02:09:21Z"
        message: The update is recommended, because none of the conditional update
          risks apply to this cluster.
        reason: AsExpected
        status: "True"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:e385a786f122c6c0e8848ecb9901f510676438f17af8a5c4c206807a9bc0bf28
        version: 4.15.0-0.nightly-2023-09-19-222222
      risks:
      - matchingRules:
        - promql:
            promql: |-
              cluster_infrastructure_provider{type=~"nonexist"}
              or
              0 * cluster_infrastructure_provider
          type: PromQL
        message: Clusters on nonexist provider, this imaginary bug can happen.
        name: SomeInfrastructureThing
        url: https://bug.example.com/c
    - conditions:
      - lastTransitionTime: "2023-09-14T02:09:21Z"
        message: Too many CI failures on this release, so do not update to it https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
        reason: ReleaseIsRejected
        status: "False"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:a5cd1b44e5b25b8a617d92a1f947297f56fc9bad104c117a8e452f932e1e2fd0
        version: 4.15.0-0.nightly-2023-09-18-111111
      risks:
      - matchingRules:
        - type: Always
        message: Too many CI failures on this release, so do not update to it
        name: ReleaseIsRejected
        url: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
    - conditions:
      - lastTransitionTime: "2023-09-14T02:09:21Z"
        message: |-
          Could not evaluate exposure to update risk SomeInvokerThing (evaluation is throttled until 02:19:21Z)
            SomeInvokerThing description: On clusters on default invoker user, this imaginary bug can happen.
            SomeInvokerThing URL: https://bug.example.com/a

          Could not evaluate exposure to update risk SomeChannelThing (evaluation is throttled until 02:19:21Z)
            SomeChannelThing description: On clusters with the channel set to 'buggy', this imaginary bug can happen.
            SomeChannelThing URL: https://bug.example.com/b
        reason: EvaluationFailed
        status: Unknown
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:66c753e8b75d172f2a3f7ba13363383a76ecbc7ecdc00f3a423bef4ea8560405
        version: 4.15.0-0.nightly-2023-09-17-000000
      risks:
      - matchingRules:
        - promql:
            promql: cluster_installer
          type: PromQL
        message: On clusters on default invoker user, this imaginary bug can happen.
        name: SomeInvokerThing
        url: https://bug.example.com/a
      - matchingRules:
        - promql:
            promql: |-
              group(cluster_version_available_updates{channel="buggy"})
              or
              0 * group(cluster_version_available_updates{channel!="buggy"})
          type: PromQL
        message: On clusters with the channel set to 'buggy', this imaginary bug can
          happen.
        name: SomeChannelThing
        url: https://bug.example.com/b
...

// After around 10 minutes, the status of the last condition changed from Unknown to False with reason MultipleReasons, and its lastTransitionTime changed to 2023-09-14T02:20:19Z. The lastTransitionTime of the remaining conditions stayed at 2023-09-14T02:09:21Z

...
 conditionalUpdates:
    - conditions:
      - lastTransitionTime: "2023-09-14T02:09:21Z"
        message: The update is recommended, because none of the conditional update
          risks apply to this cluster.
        reason: AsExpected
        status: "True"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:e385a786f122c6c0e8848ecb9901f510676438f17af8a5c4c206807a9bc0bf28
        version: 4.15.0-0.nightly-2023-09-19-222222
      risks:
      - matchingRules:
        - promql:
            promql: |-
              cluster_infrastructure_provider{type=~"nonexist"}
              or
              0 * cluster_infrastructure_provider
          type: PromQL
        message: Clusters on nonexist provider, this imaginary bug can happen.
        name: SomeInfrastructureThing
        url: https://bug.example.com/c
    - conditions:
      - lastTransitionTime: "2023-09-14T02:09:21Z"
        message: Too many CI failures on this release, so do not update to it https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
        reason: ReleaseIsRejected
        status: "False"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:a5cd1b44e5b25b8a617d92a1f947297f56fc9bad104c117a8e452f932e1e2fd0
        version: 4.15.0-0.nightly-2023-09-18-111111
      risks:
      - matchingRules:
        - type: Always
        message: Too many CI failures on this release, so do not update to it
        name: ReleaseIsRejected
        url: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
    - conditions:
      - lastTransitionTime: "2023-09-14T02:20:19Z"
        message: |-
          On clusters on default invoker user, this imaginary bug can happen. https://bug.example.com/a

          Could not evaluate exposure to update risk SomeChannelThing (evaluation is throttled until 02:30:19Z)
            SomeChannelThing description: On clusters with the channel set to 'buggy', this imaginary bug can happen.
            SomeChannelThing URL: https://bug.example.com/b
        reason: MultipleReasons
        status: "False"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:66c753e8b75d172f2a3f7ba13363383a76ecbc7ecdc00f3a423bef4ea8560405
        version: 4.15.0-0.nightly-2023-09-17-000000
      risks:
      - matchingRules:
        - promql:
            promql: cluster_installer
          type: PromQL
        message: On clusters on default invoker user, this imaginary bug can happen.
        name: SomeInvokerThing
        url: https://bug.example.com/a
      - matchingRules:
        - promql:
            promql: |-
              group(cluster_version_available_updates{channel="buggy"})
              or
              0 * group(cluster_version_available_updates{channel!="buggy"})
          type: PromQL
        message: On clusters with the channel set to 'buggy', this imaginary bug can
          happen.
        name: SomeChannelThing
        url: https://bug.example.com/b
...

// The lastTransitionTime of the last condition stayed at 2023-09-14T02:20:19Z even though the reason changed to SomeInvokerThing

...
conditionalUpdates:
    - conditions:
      - lastTransitionTime: "2023-09-14T02:09:21Z"
        message: The update is recommended, because none of the conditional update
          risks apply to this cluster.
        reason: AsExpected
        status: "True"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:e385a786f122c6c0e8848ecb9901f510676438f17af8a5c4c206807a9bc0bf28
        version: 4.15.0-0.nightly-2023-09-19-222222
      risks:
      - matchingRules:
        - promql:
            promql: |-
              cluster_infrastructure_provider{type=~"nonexist"}
              or
              0 * cluster_infrastructure_provider
          type: PromQL
        message: Clusters on nonexist provider, this imaginary bug can happen.
        name: SomeInfrastructureThing
        url: https://bug.example.com/c
    - conditions:
      - lastTransitionTime: "2023-09-14T02:09:21Z"
        message: Too many CI failures on this release, so do not update to it https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
        reason: ReleaseIsRejected
        status: "False"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:a5cd1b44e5b25b8a617d92a1f947297f56fc9bad104c117a8e452f932e1e2fd0
        version: 4.15.0-0.nightly-2023-09-18-111111
      risks:
      - matchingRules:
        - type: Always
        message: Too many CI failures on this release, so do not update to it
        name: ReleaseIsRejected
        url: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
    - conditions:
      - lastTransitionTime: "2023-09-14T02:20:19Z"
        message: On clusters on default invoker user, this imaginary bug can happen.
          https://bug.example.com/a
        reason: SomeInvokerThing
        status: "False"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:66c753e8b75d172f2a3f7ba13363383a76ecbc7ecdc00f3a423bef4ea8560405
        version: 4.15.0-0.nightly-2023-09-17-000000
      risks:
      - matchingRules:
        - promql:
            promql: cluster_installer
          type: PromQL
        message: On clusters on default invoker user, this imaginary bug can happen.
        name: SomeInvokerThing
        url: https://bug.example.com/a
      - matchingRules:
        - promql:
            promql: |-
              group(cluster_version_available_updates{channel="buggy"})
              or
              0 * group(cluster_version_available_updates{channel!="buggy"})
          type: PromQL
        message: On clusters with the channel set to 'buggy', this imaginary bug can
          happen.
        name: SomeChannelThing
        url: https://bug.example.com/b
...

The test result looks good to me.

Petr, please let me know if any additional testing is needed.
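For readers following along, the behavior verified above can be illustrated with a minimal, self-contained sketch. It does not use the real `SetStatusCondition` helper; the `Condition` type, the `setCondition` function, its explicit `now` parameter, and the `EarlierReason` placeholder are all assumptions made for illustration. It only mimics the rule described in this PR: the timestamp moves when `Status` changes, while the reason and message are refreshed every time.

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a hypothetical, trimmed-down stand-in for a status condition.
type Condition struct {
	Type               string
	Status             string
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

// setCondition mimics the rule discussed in this PR: LastTransitionTime moves
// only when Status changes; Reason and Message are always refreshed.
func setCondition(conds *[]Condition, next Condition, now time.Time) {
	for i := range *conds {
		cur := &(*conds)[i]
		if cur.Type != next.Type {
			continue
		}
		if cur.Status != next.Status {
			cur.Status = next.Status
			cur.LastTransitionTime = now
		}
		cur.Reason = next.Reason
		cur.Message = next.Message
		return
	}
	// No condition of this type yet: append it with the current time.
	next.LastTransitionTime = now
	*conds = append(*conds, next)
}

// demo replays the scenario from the test output: Status stays "False" while
// the reason changes. "EarlierReason" is a placeholder, not a value from the PR.
func demo() (before, after Condition) {
	t0 := time.Date(2023, 9, 14, 2, 20, 19, 0, time.UTC)
	conds := []Condition{}
	setCondition(&conds, Condition{Type: "Recommended", Status: "False", Reason: "EarlierReason"}, t0)
	before = conds[0]
	setCondition(&conds, Condition{Type: "Recommended", Status: "False", Reason: "SomeInvokerThing"}, t0.Add(10*time.Minute))
	after = conds[0]
	return before, after
}

func main() {
	before, after := demo()
	fmt.Println(before.Reason, before.LastTransitionTime.Format(time.RFC3339))
	fmt.Println(after.Reason, after.LastTransitionTime.Format(time.RFC3339))
}
```

Both printed lines carry the same timestamp, which matches what the output above shows for the `SomeInvokerThing` condition.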

petr-muller · 2023-09-14T07:43:23Z

@shellyyang1989 this looks good, thanks for testing!

shellyyang1989 · 2023-09-14T07:45:07Z

Thanks for confirming, then

/label qe-approved

wking

/lgtm

openshift-ci · 2023-09-14T16:36:54Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: petr-muller, wking

Needs approval from an approver in each of these files:

~~OWNERS~~ [petr-muller,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2023-09-18T14:02:21Z

/retest-required

Remaining retests: 0 against base HEAD 16108ef and 2 for PR HEAD 9bd912c in total

openshift-ci · 2023-09-18T20:38:47Z

@petr-muller: all tests passed!

…openshift#964)

* availableupdates: do not reset lastTransitionTime on unchanged status

  The code in `evaluateConditionalUpdates` correctly uses `SetStatusCondition` to set conditions, which only updates the `LastTransitionTime` field when `Status` differs between the original and updated state. Previously though, the original state always contained empty conditions, because conditional updates are always obtained from OSUS and the fresh structure was never updated with existing conditions from the in-cluster status.

* review: use existing mock condition instead of new code

* review: use real queue instead of a mock
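The idea in the commit message above can be sketched roughly as follows. This is not the code from the PR: the types, the `carryOverConditions` helper, and the choice to match updates by release image are assumptions made for illustration. The point is only that freshly fetched conditional updates arrive with empty conditions, so the previously known conditions have to be copied over before any evaluation can tell whether the status really changed.

```go
package main

import "fmt"

// Condition is a hypothetical, simplified condition; the timestamp is kept as
// an RFC 3339 string for brevity.
type Condition struct {
	Type               string
	Status             string
	Reason             string
	LastTransitionTime string
}

// ConditionalUpdate is a hypothetical stand-in for a conditional update entry.
type ConditionalUpdate struct {
	Image      string
	Conditions []Condition
}

// carryOverConditions copies known conditions from existing (the in-cluster
// status) into fresh (the newly fetched updates). Matching by release image
// is an assumption of this sketch.
func carryOverConditions(fresh, existing []ConditionalUpdate) {
	byImage := make(map[string][]Condition, len(existing))
	for _, e := range existing {
		byImage[e.Image] = e.Conditions
	}
	for i := range fresh {
		if conds, ok := byImage[fresh[i].Image]; ok {
			// Copy the slice so later edits do not alias the existing status.
			fresh[i].Conditions = append([]Condition(nil), conds...)
		}
	}
}

func main() {
	existing := []ConditionalUpdate{{
		Image: "example.com/release@sha256:aaa",
		Conditions: []Condition{{
			Type:               "Recommended",
			Status:             "False",
			Reason:             "SomeInvokerThing",
			LastTransitionTime: "2023-09-14T02:20:19Z",
		}},
	}}
	fresh := []ConditionalUpdate{
		{Image: "example.com/release@sha256:aaa"},
		{Image: "example.com/release@sha256:bbb"},
	}
	carryOverConditions(fresh, existing)
	fmt.Println(len(fresh[0].Conditions), len(fresh[1].Conditions))
}
```

Updates that were not present before keep empty conditions, so their first evaluation still records a new transition time.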

openshift-ci bot requested review from DavidHurta and LalatenduMohanty September 5, 2023 16:56

openshift-ci bot added the approved label Sep 5, 2023

petr-muller mentioned this pull request Sep 5, 2023

OCPBUGS-9050: Alert on failing conditional update risk evaluation #961

Merged

petr-muller force-pushed the do-not-reset-last-transition-time branch from f3d24ee to db205e7 September 5, 2023 16:59

This was referenced Sep 7, 2023

OCPBUGS-18454: Match restrictions on ConditionalUpdateRisk name with Condition reason openshift/api#1577

Closed

OCPBUGS-18454: Avoid using risk names as condition reasons when invalid #962

Merged

DavidHurta suggested changes Sep 7, 2023

openshift-ci bot assigned DavidHurta Sep 7, 2023

openshift-ci bot added the do-not-merge/hold label Sep 7, 2023

petr-muller force-pushed the do-not-reset-last-transition-time branch from db205e7 to 5606ae9 September 8, 2023 13:09

openshift-ci bot removed the do-not-merge/hold label Sep 8, 2023

DavidHurta approved these changes Sep 8, 2023

openshift-ci bot added the do-not-merge/hold and lgtm labels Sep 8, 2023

DavidHurta reviewed Sep 8, 2023

pkg/cvo/availableupdates_test.go (outdated)

petr-muller force-pushed the do-not-reset-last-transition-time branch from 5606ae9 to 0375904 September 8, 2023 15:00

openshift-ci bot removed the lgtm label Sep 8, 2023

openshift-ci bot added the lgtm label Sep 8, 2023

openshift-ci bot removed the do-not-merge/hold label Sep 8, 2023

DavidHurta suggested changes Sep 9, 2023

openshift-ci bot added the do-not-merge/hold label and removed the lgtm label Sep 9, 2023

wking reviewed Sep 9, 2023

pkg/cvo/availableupdates_test.go (outdated)

wking reviewed Sep 9, 2023

review: use existing mock condition instead of new code

b557d0d

openshift-ci bot added the tide/merge-method-squash label Sep 11, 2023

wking reviewed Sep 11, 2023

pkg/cvo/availableupdates_test.go (outdated)

review: use real queue instead of a mock

9bd912c

openshift-ci bot removed the do-not-merge/hold label Sep 12, 2023

openshift-ci bot added the qe-approved label Sep 14, 2023

wking approved these changes Sep 14, 2023

openshift-ci bot assigned wking Sep 14, 2023

openshift-ci bot added the lgtm label Sep 14, 2023

openshift-merge-robot merged commit 440aed1 into openshift:master Sep 18, 2023

petr-muller deleted the do-not-reset-last-transition-time branch September 18, 2023 21:04

wking mentioned this pull request Sep 25, 2023

OCPBUGS-19737: pkg/clusterconditions/promql: Warm cache with 1s delay #973

Merged
