
@petr-muller
Member

The code in evaluateConditionalUpdates correctly uses SetStatusCondition to set conditions, which only updates the LastTransitionTime field when Status differs between the original and updated state. Previously, though, the original state always contained empty conditions, because conditional updates are always obtained from OSUS and the fresh structure was never updated with existing conditions from the in-cluster status.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 5, 2023
@petr-muller petr-muller force-pushed the do-not-reset-last-transition-time branch from f3d24ee to db205e7 Compare September 5, 2023 16:59
@petr-muller
Member Author

/retest

@petr-muller
Member Author

/test e2e-agnostic-ovn-upgrade-out-of-change

Contributor

@DavidHurta DavidHurta left a comment

Overall looks good and functional!

My only concern is the used comparison when searching for an update in updates.

Other than that only nitpicks.

(Note: Only tested locally using the go test subcommand)

@petr-muller
Member Author

/hold
Thanks for the review! Will address.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 7, 2023
@petr-muller petr-muller force-pushed the do-not-reset-last-transition-time branch from db205e7 to 5606ae9 Compare September 8, 2023 13:09
@petr-muller
Member Author

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 8, 2023
Contributor

@DavidHurta DavidHurta left a comment

I really like the new changes, thanks! 💪 :shipit:

Just one smallest nitpick of all nitpicks, feel free to unhold or address.

/hold
/lgtm

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Sep 8, 2023
@petr-muller petr-muller force-pushed the do-not-reset-last-transition-time branch from 5606ae9 to 0375904 Compare September 8, 2023 15:00
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Sep 8, 2023
@DavidHurta
Contributor

DavidHurta commented Sep 8, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 8, 2023
@DavidHurta
Contributor

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 8, 2023
Contributor

@DavidHurta DavidHurta left a comment

I apologize for the late re-review after accepting the changes.

/hold

Comment on lines 170 to 179
Type: "PromQL",
PromQL: &configv1.PromQLClusterCondition{PromQL: string(evalToYes)},
},
},
},
},
Conditions: []metav1.Condition{
{
Type: "Recommended",
Status: metav1.ConditionFalse,
Contributor

@DavidHurta DavidHurta Sep 9, 2023

Here we hard-code the field Type to the value of "PromQL", the field PromQL to the value of &configv1.PromQLClusterCondition{PromQL: string(evalToYes)}, and the field Status to the value of metav1.ConditionFalse even though they should be set by the function's parameters ruleType, promql, and by the expected value for the Status depending on the Match result.

Member

I think Petr has things the way he does so he can exercise "what we already know about a conditional update for a particular target, and the new update service response comes in with new information for that same image?". So maybe we could move from PromQL to Always like openshift/cincinnati-graph-data#3590. The test suite should confirm that lastTransitionTime is preserved if we transition from "outgoing PromQL matched and so does the incoming Always" and that the transition time resets if we transition from "outgoing PromQL did not match but the incoming Always does".

Contributor

@DavidHurta DavidHurta Sep 11, 2023

I think Petr has things the way he does so he can exercise "what we already know about a conditional update for a particular target, and the new update service response comes in with new information for that same image?".

Oh, I didn't think of it like that, good point! Any of the options seems reasonable.

Member Author

Unfortunately I cannot claim the intent that Trevor assumes. The truth is that I finished the testing code knowing it was probably a bit rough, but I called it a day thinking it was not worth the effort to polish it entirely. David is right that this method pretends to be a reusable, configurable helper that can prepare fixtures for different situations, but in reality it does not. I'll spend some time looking at how I could clean it up further, maybe taking Trevor's other review items into account as well.

Thanks for being thorough!

Member Author

I reworked the tests using the existing mock that Trevor suggested, which made some of the code above obsolete. I have dropped osusWithSingleConditionalEdge's illusion of being reusable and hardcoded its content (the value of this helper is that it creates complicated but consistent fixture data). When someone wants to add tests, they can generalize the code to fit their use case.

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. and removed lgtm Indicates that a PR is ready to be merged. labels Sep 9, 2023
}

func osusWithSingleConditionalEdge(from, to string, ruleType clusterConditionRuleType, promql clusterConditionRuleFakePromql) ([]configv1.ConditionalUpdate, *httptest.Server) {
osus := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
Member

nit: we could possibly simplify this test harness a bit if we pull this out into a new function, because the upstream defaulting, available-updates throttling, and proxy transport lookup don't seem as interesting to cover in tests. You could create a dummy client to pass through to calculateAvailableUpdatesStatus like:

type MockRoundTripper func(r *http.Request) *http.Response

func (f MockRoundTripper) RoundTrip(r *http.Request) (*http.Response, error) {
    return f(r), nil
}

and:

httpClient := &http.Client{
  Transport: MockRoundTripper(func(r *http.Request) *http.Response {
    return &http.Response{
      StatusCode: http.StatusOK,
      Body:       io.NopCloser(strings.NewReader(fmt.Sprintf(`{
  "nodes": [{"version": "%s", "payload": "payload/%s"}, {"version": "%s", "payload": "payload/%s"}],
  "conditionalEdges": [
    {
      "edges": [{"from": "%s", "to": "%s"}],
      "risks": [
        {
          "url": "https://example.com/%s",
          "name": "FourFiveSix",
          "message": "Four Five Five is just fine",
          "matchingRules": [{"type": "%s", "promql": { "promql": "%s"}}]
        }
      ]
    }
  ]
}
`, from, from, to, to, from, to, to, ruleType, promql))),
    }
  }),
}

or some such without needing to set up an HTTP server that needs to be closed later.
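A runnable, stdlib-only version of that suggestion looks roughly like this. `newMockClient` and `fetch` are illustrative helpers that are not in the codebase, and the canned body stands in for the Cincinnati graph fixture; no listener is opened, so there is nothing to close afterwards.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// MockRoundTripper lets a plain function act as an http.RoundTripper.
type MockRoundTripper func(r *http.Request) *http.Response

func (f MockRoundTripper) RoundTrip(r *http.Request) (*http.Response, error) {
	return f(r), nil
}

// newMockClient returns a client that answers every request with body and 200 OK.
func newMockClient(body string) *http.Client {
	return &http.Client{
		Transport: MockRoundTripper(func(r *http.Request) *http.Response {
			return &http.Response{
				StatusCode: http.StatusOK,
				Header:     http.Header{"Content-Type": []string{"application/json"}},
				Body:       io.NopCloser(strings.NewReader(body)),
				Request:    r,
			}
		}),
	}
}

// fetch issues a GET through client and returns the response body.
func fetch(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

func main() {
	client := newMockClient(`{"nodes": [], "conditionalEdges": []}`)
	body, err := fetch(client, "https://osus.example.com/graph")
	fmt.Println(body, err)
}
```

The hostname in the URL never resolves; the custom Transport intercepts the request before any network activity.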

I'm not sure if it's worth the effort to pivot now that you've already figured out all the plumbing to get this up to the existing level, but it might be worth poking at to see whether the pivoting-this-pull cost is worth paying to reduce the onboarding-the-next-dev-who-needs-to-understand-this-test-suite cost.

Member Author

I have considered doing something like that, but the tested functionality felt like it deserved a higher-level integration test, and test HTTP servers are cheap in Go with httptest. The real complexity is in preparing all the data and expected fixtures.

But I'll reconsider; with further development and feedback, the code is now different from when I started. Thanks for the suggestion!

Member Author

I gave the pivot a quick shot and I think we're not really saving that much complexity: the annoying "set up the input/output" bits would need to stay, and we'd just need different plumbing.

But I actually discovered that I like using httptest here. It allows testing the full method and it quite explicitly communicates that the important input comes from a server. The plumbing needed is basically identical to mocking it in the client transport. I liked that I could get rid of the annoying queue stub though :)
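For comparison with the transport mock, the httptest approach described here looks roughly like the following. It is a sketch: `newGraphServer` and `fetchGraph` are illustrative names, not the PR's helpers, and the graph body stands in for the real fixture. The request travels over a real local HTTP connection, which is what makes the "input comes from a server" explicit.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newGraphServer starts a local test server that answers every request with
// graph, standing in for OSUS. Callers must Close the returned server.
func newGraphServer(graph string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, graph)
	}))
}

// fetchGraph performs a real HTTP round trip against url using client.
func fetchGraph(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

func main() {
	osus := newGraphServer(`{"nodes": [], "conditionalEdges": []}`)
	defer osus.Close()
	body, err := fetchGraph(osus.Client(), osus.URL)
	fmt.Println(body, err)
}
```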

@petr-muller
Member Author

/label tide/merge-method-squash

@openshift-ci openshift-ci bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Sep 11, 2023
@petr-muller
Member Author

[sig-arch] events should not repeat pathologically for ns/openshift-etcd
{  1 events happened too frequently  event happened 36 times, something is wrong: ns/openshift-etcd pod/etcd-guard-ci-op-3w2kvwtx-deec6-m6cvk-master-1 node/ci-op-3w2kvwtx-deec6-m6cvk-master-1 hmsg/cc0a8bd52a - pathological/true reason/ProbeError Readiness probe error: Get "https://10.0.0.6:9980/readyz": net/http: request canceled (Client.Timeout exceeded while awaiting headers) result=reject  body:

looks unrelated
/retest

@petr-muller
Member Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 12, 2023
@petr-muller
Member Author

[sig-node] nodes should not go unready after being upgraded and go unready only once
{  1 nodes violated upgrade expectations:  Node ci-op-zf3sq2f7-deec6-ch4ln-worker-centralus2-vd5kz went unready multiple times: 2023-09-12T15:51:44Z, 2023-09-12T15:54:15Z Node ci-op-zf3sq2f7-deec6-ch4ln-worker-centralus2-vd5kz went ready multiple times: 2023-09-12T15:52:44Z, 2023-09-12T15:54:18Z

Looks unrelated, I'll override next unrelated failure :P
/retest

@petr-muller
Member Author

: [sig-arch] events should not repeat pathologically for ns/openshift-etcd
{  2 events happened too frequently  event happened 29 times, something is wrong: ns/openshift-etcd pod/etcd-guard-ci-op-p9cdqlzn-deec6-nnqcs-master-1 node/ci-op-p9cdqlzn-deec6-nnqcs-master-1 hmsg/1004945f03 - pathological/true reason/ProbeError Readiness probe error: Get "https://10.0.0.7:9980/readyz": net/http: request canceled (Client.Timeout exceeded while awaiting headers) result=reject

/override ci/prow/e2e-agnostic-ovn-upgrade-into-change

@openshift-ci
Contributor

openshift-ci bot commented Sep 13, 2023

@petr-muller: Overrode contexts on behalf of petr-muller: ci/prow/e2e-agnostic-ovn-upgrade-into-change


@shellyyang1989
Contributor

Pre-merge test using the dummy Cincinnati JSON:

{
  "nodes": [
    {
      "version": "4.14.0-0.ci.test-2023-09-14-011048-ci-ln-774gjt2-latest",
      "payload": "registry.build02.ci.openshift.org/ci-ln-774gjt2/release@sha256:a69481840637cbcfdba8889628639c1a15a2fe916d9e179e790dd1100c62c526"
    },
    {
      "version": "4.15.0-0.nightly-2023-09-17-000000",
      "payload": "registry.ci.openshift.org/ocp/release@sha256:66c753e8b75d172f2a3f7ba13363383a76ecbc7ecdc00f3a423bef4ea8560405"
    },
    {
      "version": "4.15.0-0.nightly-2023-09-18-111111",
      "payload": "registry.ci.openshift.org/ocp/release@sha256:a5cd1b44e5b25b8a617d92a1f947297f56fc9bad104c117a8e452f932e1e2fd0"
    },
    {
      "version": "4.15.0-0.nightly-2023-09-19-222222",
      "payload": "registry.ci.openshift.org/ocp/release@sha256:e385a786f122c6c0e8848ecb9901f510676438f17af8a5c4c206807a9bc0bf28"
    }
  ],
  "edges": [
    [0,1],
    [0,2],
    [0,3]
  ],
  "conditionalEdges":[
    {
      "edges": [
        {"from": "4.14.0-0.ci.test-2023-09-14-011048-ci-ln-774gjt2-latest", "to": "4.15.0-0.nightly-2023-09-17-000000"}
      ],
      "risks": [
        {
          "url": "https://bug.example.com/a",
          "name": "SomeInvokerThing",
          "message": "On clusters on default invoker user, this imaginary bug can happen.",
          "matchingRules": [
            {
              "type": "PromQL",
              "promql": {
                "promql": "cluster_installer"
              }
            }
          ]
        },
        {
          "url": "https://bug.example.com/b",
          "name": "SomeChannelThing",
          "message": "On clusters with the channel set to 'buggy', this imaginary bug can happen.",
          "matchingRules": [
            {
              "type": "PromQL",
              "promql": {
                "promql": "group(cluster_version_available_updates{channel=\"buggy\"})\nor\n0 * group(cluster_version_available_updates{channel!=\"buggy\"})"
              }
            }
          ]
        }
      ]
    },
    {
      "edges": [
        {"from": "4.14.0-0.ci.test-2023-09-14-011048-ci-ln-774gjt2-latest", "to": "4.15.0-0.nightly-2023-09-18-111111"}
      ],
      "risks": [
        {
          "url": "https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634",
          "name": "ReleaseIsRejected",
          "message": "Too many CI failures on this release, so do not update to it",
          "matchingRules": [
            {
              "type": "Always"
            }
          ]
        }
      ]
    },
    {
      "edges": [
        {"from": "4.14.0-0.ci.test-2023-09-14-011048-ci-ln-774gjt2-latest", "to": "4.15.0-0.nightly-2023-09-19-222222"}
      ],
      "risks": [
        {
          "url": "https://bug.example.com/c",
          "name": "SomeInfrastructureThing",
          "message": "Clusters on nonexist provider, this imaginary bug can happen.",
          "matchingRules": [
            {
              "type": "PromQL",
              "promql": {
                "promql": "cluster_infrastructure_provider{type=~\"nonexist\"}\nor\n0 * cluster_infrastructure_provider"
              }
            }
          ]
        }
      ]
    }
  ]
}

Prior to the change, the lastTransitionTime of conditionalUpdates was reset every 5 minutes, regardless of whether the condition status changed.

// The lastTransitionTime of 3 conditionalUpdates was 2023-09-13T13:14:08Z after the upstream was patched

# oc get clusterversion -oyaml
    ...
    conditionalUpdates:
    - conditions:
      - lastTransitionTime: "2023-09-13T13:14:08Z"
        message: The update is recommended, because none of the conditional update
          risks apply to this cluster.
        reason: AsExpected
        status: "True"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:e385a786f122c6c0e8848ecb9901f510676438f17af8a5c4c206807a9bc0bf28
        version: 4.15.0-0.nightly-2023-09-19-222222
      risks:
      - matchingRules:
        - promql:
            promql: |-
              cluster_infrastructure_provider{type=~"nonexist"}
              or
              0 * cluster_infrastructure_provider
          type: PromQL
        message: Clusters on nonexist provider, this imaginary bug can happen.
        name: SomeInfrastructureThing
        url: https://bug.example.com/c
    - conditions:
      - lastTransitionTime: "2023-09-13T13:14:08Z"
        message: Too many CI failures on this release, so do not update to it https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
        reason: ReleaseIsRejected
        status: "False"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:a5cd1b44e5b25b8a617d92a1f947297f56fc9bad104c117a8e452f932e1e2fd0
        version: 4.15.0-0.nightly-2023-09-18-111111
      risks:
      - matchingRules:
        - type: Always
        message: Too many CI failures on this release, so do not update to it
        name: ReleaseIsRejected
        url: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
    - conditions:
      - lastTransitionTime: "2023-09-13T13:14:08Z"
        message: |-
          Could not evaluate exposure to update risk SomeInvokerThing (evaluation is throttled until 13:24:08Z)
            SomeInvokerThing description: On clusters on default invoker user, this imaginary bug can happen.
            SomeInvokerThing URL: https://bug.example.com/a

          Could not evaluate exposure to update risk SomeChannelThing (evaluation is throttled until 13:24:08Z)
            SomeChannelThing description: On clusters with the channel set to 'buggy', this imaginary bug can happen.
            SomeChannelThing URL: https://bug.example.com/b
        reason: EvaluationFailed
        status: Unknown
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:66c753e8b75d172f2a3f7ba13363383a76ecbc7ecdc00f3a423bef4ea8560405
        version: 4.15.0-0.nightly-2023-09-17-000000
      risks:
      - matchingRules:
        - promql:
            promql: cluster_installer
          type: PromQL
        message: On clusters on default invoker user, this imaginary bug can happen.
        name: SomeInvokerThing
        url: https://bug.example.com/a
      - matchingRules:
        - promql:
            promql: |-
              group(cluster_version_available_updates{channel="buggy"})
              or
              0 * group(cluster_version_available_updates{channel!="buggy"})
          type: PromQL
        message: On clusters with the channel set to 'buggy', this imaginary bug can
          happen.
        name: SomeChannelThing
        url: https://bug.example.com/b
    ...

// After 5 minutes, the lastTransitionTime of the 3 conditionalUpdates was changed to 2023-09-13T13:19:09Z, although none of their statuses changed

# oc get clusterversion -oyaml
...
    conditionalUpdates:
    - conditions:
      - lastTransitionTime: "2023-09-13T13:19:09Z"
        message: The update is recommended, because none of the conditional update
          risks apply to this cluster.
        reason: AsExpected
        status: "True"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:e385a786f122c6c0e8848ecb9901f510676438f17af8a5c4c206807a9bc0bf28
        version: 4.15.0-0.nightly-2023-09-19-222222
      risks:
      - matchingRules:
        - promql:
            promql: |-
              cluster_infrastructure_provider{type=~"nonexist"}
              or
              0 * cluster_infrastructure_provider
          type: PromQL
        message: Clusters on nonexist provider, this imaginary bug can happen.
        name: SomeInfrastructureThing
        url: https://bug.example.com/c
    - conditions:
      - lastTransitionTime: "2023-09-13T13:19:09Z"
        message: Too many CI failures on this release, so do not update to it https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
        reason: ReleaseIsRejected
        status: "False"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:a5cd1b44e5b25b8a617d92a1f947297f56fc9bad104c117a8e452f932e1e2fd0
        version: 4.15.0-0.nightly-2023-09-18-111111
      risks:
      - matchingRules:
        - type: Always
        message: Too many CI failures on this release, so do not update to it
        name: ReleaseIsRejected
        url: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
    - conditions:
      - lastTransitionTime: "2023-09-13T13:19:09Z"
        message: |-
          Could not evaluate exposure to update risk SomeInvokerThing (evaluation is throttled until 13:24:08Z)
            SomeInvokerThing description: On clusters on default invoker user, this imaginary bug can happen.
            SomeInvokerThing URL: https://bug.example.com/a

          Could not evaluate exposure to update risk SomeChannelThing (evaluation is throttled until 13:24:08Z)
            SomeChannelThing description: On clusters with the channel set to 'buggy', this imaginary bug can happen.
            SomeChannelThing URL: https://bug.example.com/b
        reason: EvaluationFailed
        status: Unknown
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:66c753e8b75d172f2a3f7ba13363383a76ecbc7ecdc00f3a423bef4ea8560405
        version: 4.15.0-0.nightly-2023-09-17-000000
      risks:
      - matchingRules:
        - promql:
            promql: cluster_installer
          type: PromQL
        message: On clusters on default invoker user, this imaginary bug can happen.
        name: SomeInvokerThing
        url: https://bug.example.com/a
      - matchingRules:
        - promql:
            promql: |-
              group(cluster_version_available_updates{channel="buggy"})
              or
              0 * group(cluster_version_available_updates{channel!="buggy"})
          type: PromQL
        message: On clusters with the channel set to 'buggy', this imaginary bug can
          happen.
        name: SomeChannelThing
        url: https://bug.example.com/b
   ...

// After 5 minutes, the lastTransitionTime of the 3 conditionalUpdates was changed to 2023-09-13T13:24:11Z, although only one of their statuses changed

# oc get clusterversion -oyaml
...
    conditionalUpdates:
    - conditions:
      - lastTransitionTime: "2023-09-13T13:24:11Z"
        message: The update is recommended, because none of the conditional update
          risks apply to this cluster.
        reason: AsExpected
        status: "True"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:e385a786f122c6c0e8848ecb9901f510676438f17af8a5c4c206807a9bc0bf28
        version: 4.15.0-0.nightly-2023-09-19-222222
      risks:
      - matchingRules:
        - promql:
            promql: |-
              cluster_infrastructure_provider{type=~"nonexist"}
              or
              0 * cluster_infrastructure_provider
          type: PromQL
        message: Clusters on nonexist provider, this imaginary bug can happen.
        name: SomeInfrastructureThing
        url: https://bug.example.com/c
    - conditions:
      - lastTransitionTime: "2023-09-13T13:24:11Z"
        message: Too many CI failures on this release, so do not update to it https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
        reason: ReleaseIsRejected
        status: "False"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:a5cd1b44e5b25b8a617d92a1f947297f56fc9bad104c117a8e452f932e1e2fd0
        version: 4.15.0-0.nightly-2023-09-18-111111
      risks:
      - matchingRules:
        - type: Always
        message: Too many CI failures on this release, so do not update to it
        name: ReleaseIsRejected
        url: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
    - conditions:
      - lastTransitionTime: "2023-09-13T13:24:11Z"
        message: |-
          On clusters on default invoker user, this imaginary bug can happen. https://bug.example.com/a

          Could not evaluate exposure to update risk SomeChannelThing (evaluation is throttled until 13:34:11Z)
            SomeChannelThing description: On clusters with the channel set to 'buggy', this imaginary bug can happen.
            SomeChannelThing URL: https://bug.example.com/b
        reason: MultipleReasons
        status: "False"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:66c753e8b75d172f2a3f7ba13363383a76ecbc7ecdc00f3a423bef4ea8560405
        version: 4.15.0-0.nightly-2023-09-17-000000
      risks:
      - matchingRules:
        - promql:
            promql: cluster_installer
          type: PromQL
        message: On clusters on default invoker user, this imaginary bug can happen.
        name: SomeInvokerThing
        url: https://bug.example.com/a
      - matchingRules:
        - promql:
            promql: |-
              group(cluster_version_available_updates{channel="buggy"})
              or
              0 * group(cluster_version_available_updates{channel!="buggy"})
          type: PromQL
        message: On clusters with the channel set to 'buggy', this imaginary bug can
          happen.
        name: SomeChannelThing
        url: https://bug.example.com/b
    ...

After the change, lastTransitionTime changes only when the status changes.
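That expectation is an invariant that can be checked mechanically between two `oc get clusterversion -oyaml` snapshots: lastTransitionTime should move if and only if the Recommended status moves. A small sketch, assuming each snapshot has been reduced by hand to target version, status, and timestamp (`entry` and `violations` are illustrative names); the values below are transcribed from the post-fix snapshots in this comment.

```go
package main

import (
	"fmt"
	"sort"
)

// entry summarises the Recommended condition of one conditionalUpdates item.
type entry struct {
	Status             string
	LastTransitionTime string
}

// violations returns target versions whose lastTransitionTime moved without a
// Status change, or whose Status changed without the time moving.
func violations(before, after map[string]entry) []string {
	var out []string
	for version, cur := range after {
		prev, ok := before[version]
		if !ok {
			continue // new target: nothing to compare against
		}
		statusChanged := prev.Status != cur.Status
		timeChanged := prev.LastTransitionTime != cur.LastTransitionTime
		if statusChanged != timeChanged {
			out = append(out, version)
		}
	}
	sort.Strings(out) // map iteration order is random
	return out
}

func main() {
	before := map[string]entry{
		"4.15.0-0.nightly-2023-09-19-222222": {"True", "2023-09-14T02:09:21Z"},
		"4.15.0-0.nightly-2023-09-18-111111": {"False", "2023-09-14T02:09:21Z"},
		"4.15.0-0.nightly-2023-09-17-000000": {"Unknown", "2023-09-14T02:09:21Z"},
	}
	after := map[string]entry{
		"4.15.0-0.nightly-2023-09-19-222222": {"True", "2023-09-14T02:09:21Z"},
		"4.15.0-0.nightly-2023-09-18-111111": {"False", "2023-09-14T02:09:21Z"},
		"4.15.0-0.nightly-2023-09-17-000000": {"False", "2023-09-14T02:20:19Z"},
	}
	fmt.Println(len(violations(before, after))) // 0
}
```

Fed the pre-fix snapshots (13:14:08Z to 13:19:09Z, statuses unchanged), the same check flags all three targets.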

// The lastTransitionTime of the 3 conditions was 2023-09-14T02:09:21Z after the upstream was patched

...
conditionalUpdates:
    - conditions:
      - lastTransitionTime: "2023-09-14T02:09:21Z"
        message: The update is recommended, because none of the conditional update
          risks apply to this cluster.
        reason: AsExpected
        status: "True"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:e385a786f122c6c0e8848ecb9901f510676438f17af8a5c4c206807a9bc0bf28
        version: 4.15.0-0.nightly-2023-09-19-222222
      risks:
      - matchingRules:
        - promql:
            promql: |-
              cluster_infrastructure_provider{type=~"nonexist"}
              or
              0 * cluster_infrastructure_provider
          type: PromQL
        message: Clusters on nonexist provider, this imaginary bug can happen.
        name: SomeInfrastructureThing
        url: https://bug.example.com/c
    - conditions:
      - lastTransitionTime: "2023-09-14T02:09:21Z"
        message: Too many CI failures on this release, so do not update to it https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
        reason: ReleaseIsRejected
        status: "False"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:a5cd1b44e5b25b8a617d92a1f947297f56fc9bad104c117a8e452f932e1e2fd0
        version: 4.15.0-0.nightly-2023-09-18-111111
      risks:
      - matchingRules:
        - type: Always
        message: Too many CI failures on this release, so do not update to it
        name: ReleaseIsRejected
        url: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
    - conditions:
      - lastTransitionTime: "2023-09-14T02:09:21Z"
        message: |-
          Could not evaluate exposure to update risk SomeInvokerThing (evaluation is throttled until 02:19:21Z)
            SomeInvokerThing description: On clusters on default invoker user, this imaginary bug can happen.
            SomeInvokerThing URL: https://bug.example.com/a

          Could not evaluate exposure to update risk SomeChannelThing (evaluation is throttled until 02:19:21Z)
            SomeChannelThing description: On clusters with the channel set to 'buggy', this imaginary bug can happen.
            SomeChannelThing URL: https://bug.example.com/b
        reason: EvaluationFailed
        status: Unknown
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:66c753e8b75d172f2a3f7ba13363383a76ecbc7ecdc00f3a423bef4ea8560405
        version: 4.15.0-0.nightly-2023-09-17-000000
      risks:
      - matchingRules:
        - promql:
            promql: cluster_installer
          type: PromQL
        message: On clusters on default invoker user, this imaginary bug can happen.
        name: SomeInvokerThing
        url: https://bug.example.com/a
      - matchingRules:
        - promql:
            promql: |-
              group(cluster_version_available_updates{channel="buggy"})
              or
              0 * group(cluster_version_available_updates{channel!="buggy"})
          type: PromQL
        message: On clusters with the channel set to 'buggy', this imaginary bug can
          happen.
        name: SomeChannelThing
        url: https://bug.example.com/b
...

// After around 10 minutes, the status of the last condition changed from Unknown to False with reason MultipleReasons, and its lastTransitionTime changed to 2023-09-14T02:20:19Z. The lastTransitionTime of the remaining conditions stayed at 2023-09-14T02:09:21Z

...
 conditionalUpdates:
    - conditions:
      - lastTransitionTime: "2023-09-14T02:09:21Z"
        message: The update is recommended, because none of the conditional update
          risks apply to this cluster.
        reason: AsExpected
        status: "True"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:e385a786f122c6c0e8848ecb9901f510676438f17af8a5c4c206807a9bc0bf28
        version: 4.15.0-0.nightly-2023-09-19-222222
      risks:
      - matchingRules:
        - promql:
            promql: |-
              cluster_infrastructure_provider{type=~"nonexist"}
              or
              0 * cluster_infrastructure_provider
          type: PromQL
        message: Clusters on nonexist provider, this imaginary bug can happen.
        name: SomeInfrastructureThing
        url: https://bug.example.com/c
    - conditions:
      - lastTransitionTime: "2023-09-14T02:09:21Z"
        message: Too many CI failures on this release, so do not update to it https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
        reason: ReleaseIsRejected
        status: "False"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:a5cd1b44e5b25b8a617d92a1f947297f56fc9bad104c117a8e452f932e1e2fd0
        version: 4.15.0-0.nightly-2023-09-18-111111
      risks:
      - matchingRules:
        - type: Always
        message: Too many CI failures on this release, so do not update to it
        name: ReleaseIsRejected
        url: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
    - conditions:
      - lastTransitionTime: "2023-09-14T02:20:19Z"
        message: |-
          On clusters on default invoker user, this imaginary bug can happen. https://bug.example.com/a

          Could not evaluate exposure to update risk SomeChannelThing (evaluation is throttled until 02:30:19Z)
            SomeChannelThing description: On clusters with the channel set to 'buggy', this imaginary bug can happen.
            SomeChannelThing URL: https://bug.example.com/b
        reason: MultipleReasons
        status: "False"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:66c753e8b75d172f2a3f7ba13363383a76ecbc7ecdc00f3a423bef4ea8560405
        version: 4.15.0-0.nightly-2023-09-17-000000
      risks:
      - matchingRules:
        - promql:
            promql: cluster_installer
          type: PromQL
        message: On clusters on default invoker user, this imaginary bug can happen.
        name: SomeInvokerThing
        url: https://bug.example.com/a
      - matchingRules:
        - promql:
            promql: |-
              group(cluster_version_available_updates{channel="buggy"})
              or
              0 * group(cluster_version_available_updates{channel!="buggy"})
          type: PromQL
        message: On clusters with the channel set to 'buggy', this imaginary bug can
          happen.
        name: SomeChannelThing
        url: https://bug.example.com/b
...

// The lastTransitionTime of the last condition stayed at 2023-09-14T02:20:19Z even though the reason changed from MultipleReasons to SomeInvokerThing, because the status remained False.

...
conditionalUpdates:
    - conditions:
      - lastTransitionTime: "2023-09-14T02:09:21Z"
        message: The update is recommended, because none of the conditional update
          risks apply to this cluster.
        reason: AsExpected
        status: "True"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:e385a786f122c6c0e8848ecb9901f510676438f17af8a5c4c206807a9bc0bf28
        version: 4.15.0-0.nightly-2023-09-19-222222
      risks:
      - matchingRules:
        - promql:
            promql: |-
              cluster_infrastructure_provider{type=~"nonexist"}
              or
              0 * cluster_infrastructure_provider
          type: PromQL
        message: Clusters on nonexist provider, this imaginary bug can happen.
        name: SomeInfrastructureThing
        url: https://bug.example.com/c
    - conditions:
      - lastTransitionTime: "2023-09-14T02:09:21Z"
        message: Too many CI failures on this release, so do not update to it https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
        reason: ReleaseIsRejected
        status: "False"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:a5cd1b44e5b25b8a617d92a1f947297f56fc9bad104c117a8e452f932e1e2fd0
        version: 4.15.0-0.nightly-2023-09-18-111111
      risks:
      - matchingRules:
        - type: Always
        message: Too many CI failures on this release, so do not update to it
        name: ReleaseIsRejected
        url: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-11-24-075634
    - conditions:
      - lastTransitionTime: "2023-09-14T02:20:19Z"
        message: On clusters on default invoker user, this imaginary bug can happen.
          https://bug.example.com/a
        reason: SomeInvokerThing
        status: "False"
        type: Recommended
      release:
        image: registry.ci.openshift.org/ocp/release@sha256:66c753e8b75d172f2a3f7ba13363383a76ecbc7ecdc00f3a423bef4ea8560405
        version: 4.15.0-0.nightly-2023-09-17-000000
      risks:
      - matchingRules:
        - promql:
            promql: cluster_installer
          type: PromQL
        message: On clusters on default invoker user, this imaginary bug can happen.
        name: SomeInvokerThing
        url: https://bug.example.com/a
      - matchingRules:
        - promql:
            promql: |-
              group(cluster_version_available_updates{channel="buggy"})
              or
              0 * group(cluster_version_available_updates{channel!="buggy"})
          type: PromQL
        message: On clusters with the channel set to 'buggy', this imaginary bug can
          happen.
        name: SomeChannelThing
        url: https://bug.example.com/b
...

The test result looks good to me.

Petr, please let me know if any additional testing is needed.

@petr-muller
Member Author

@shellyyang1989 this looks good, thanks for testing!

@shellyyang1989
Contributor

shellyyang1989 commented Sep 14, 2023

Thanks for confirming, then

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Sep 14, 2023
Member

@wking wking left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 14, 2023
@openshift-ci
Contributor

openshift-ci bot commented Sep 14, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: petr-muller, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 16108ef and 2 for PR HEAD 9bd912c in total

@openshift-ci
Contributor

openshift-ci bot commented Sep 18, 2023

@petr-muller: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 440aed1 into openshift:master Sep 18, 2023
@petr-muller petr-muller deleted the do-not-reset-last-transition-time branch September 18, 2023 21:04
wking pushed a commit to wking/cluster-version-operator that referenced this pull request Sep 25, 2023
…openshift#964)

* availableupdates: do not reset lastTransitionTime on unchanged status

The code in `evaluateConditionalUpdates` correctly uses
`SetStatusCondition` to set conditions, which only updates the
`LastTransitionTime` field when `Status` differs between the original
and updated state. Previously though, the original state always
contained empty conditions, because conditional updates are always
obtained from OSUS and the fresh structure was never updated with
existing conditions from the in-cluster status.

* review: use existing mock condition instead of new code

* review: use real queue instead of a mock
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/cluster-version-operator that referenced this pull request Sep 27, 2023
…openshift#964)
wking pushed a commit to wking/cluster-version-operator that referenced this pull request Oct 25, 2023
…openshift#964)
wking pushed a commit to wking/cluster-version-operator that referenced this pull request Nov 2, 2023
…openshift#964)
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

6 participants