@openshift-cherrypick-robot

This is an automated cherry-pick of #896

/assign petr-muller

petr-muller and others added 4 commits March 20, 2023 13:20
When we separated payload load from payload apply (openshift#683), the
context used for the retrieval changed as well. It went from one
constrained by syncTimeout (2-4 minutes) [1] to the unconstrained
shutdownContext [2]. However, if "force" is specified, RetrievePayload
explicitly sets a 2-minute timeout. This commit creates a new context
with a reasonable timeout for RetrievePayload regardless of "force".

[1]
https://github.com/openshift/cluster-version-operator/blob/57ffa7c610fb92ef4ccd9e9c49e75915e86e9296/pkg/cvo/sync_worker.go#L605

[2]
https://github.com/openshift/cluster-version-operator/blob/57ffa7c610fb92ef4ccd9e9c49e75915e86e9296/pkg/cvo/cvo.go#L413

`RetrievePayload` performs two operations: verification and download.
Both can take a non-trivial amount of time to finish, and may even
"hang" until the CVO needs to abort the operation. The verification
result can be ignored when the upgrade is forced. The CVO calls
`RetrievePayload` with a context that does not set a deadline, so
`RetrievePayload` previously set its own internal deadline, shared by
both operations. This led to suboptimal behavior on forced upgrades: a
"hanging" verification could consume the whole timeout budget and get
cancelled, only for its result to be ignored (because of force). The
code then tried to proceed with the download, which aborted immediately
because the context had already expired.

Improve timeouts in `RetrievePayload` for both input context states:
with and without a deadline. If the input context sets a deadline, it is
respected. If it does not, separate default deadlines are applied to
the two operations. In both cases, the code makes sure a hanging
verification never spends the whole budget. When verification finishes
quickly, the rest of its allotted time is given to the download
operation.
@openshift-ci-robot
Contributor

@openshift-cherrypick-robot: Bugzilla Bug 2090680 has been cloned as Jira Issue OCPBUGS-10565. Retitling PR to link against new bug.
/retitle [release-4.12] OCPBUGS-10565: RetrievePayload: Improve timeouts and cover behavior with tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot changed the title [release-4.12] Bug 2090680: RetrievePayload: Improve timeouts and cover behavior with tests [release-4.12] OCPBUGS-10565: RetrievePayload: Improve timeouts and cover behavior with tests Mar 20, 2023
@openshift-ci
Contributor

openshift-ci bot commented Mar 20, 2023

@openshift-cherrypick-robot: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 20, 2023
@openshift-ci openshift-ci bot requested a review from LalatenduMohanty March 20, 2023 13:21
@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Mar 20, 2023
@openshift-ci openshift-ci bot requested a review from wking March 20, 2023 13:21
@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Mar 20, 2023
@openshift-ci-robot
Contributor

@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-10565, which is valid. The bug has been moved to the POST state.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.12.z) matches configured target version for branch (4.12.z)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
  • dependent bug Bugzilla Bug 2090680 is in the state VERIFIED, which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE))
  • dependent Bugzilla Bug 2090680 targets the "4.13.0" version, which is one of the valid target versions: 4.13.0
  • bug has dependents

Requesting review from QA contact:
/cc @jiajliu

The bug has been updated to refer to the pull request using the external bug tracker.

@openshift-ci openshift-ci bot requested a review from jiajliu March 20, 2023 13:21
@petr-muller
Member

/test unit
TestCVO_UpgradeFailedPayloadLoadWithCapsChanges is a flake

/override ci/prow/e2e-agnostic-ovn-upgrade-out-of-change

alert TelemeterClientFailures fired for 967 seconds with labels: {namespace="openshift-monitoring", severity="warning"} Failure Mar 20 15:33:53.074: Unexpected alerts fired or pending during the upgrade:
alert TelemeterClientFailures fired for 967 seconds with labels: {namespace="openshift-monitoring", severity="warning"}

^^^ is the only failure and it is unrelated to CVO

@openshift-ci
Contributor

openshift-ci bot commented Mar 20, 2023

@petr-muller: petr-muller unauthorized: /override is restricted to Repo administrators, approvers in top level OWNERS file, and the following github teams:openshift: openshift-release-oversight.

@petr-muller
Member

Haha OWNERS files in release branches 🤣
/test all

@petr-muller
Member

/test e2e-agnostic-operator

Member

@LalatenduMohanty left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 28, 2023
@LalatenduMohanty
Member

/label backport-risk-assessed

@openshift-ci
Contributor

openshift-ci bot commented Mar 28, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LalatenduMohanty, openshift-cherrypick-robot

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. labels Mar 28, 2023
@evakhoni
Contributor

pre-merge verified in https://issues.redhat.com/browse/OCPBUGS-10565#comment-22011696
/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Mar 29, 2023
@petr-muller
Member

@jianlinliu @jiajliu can we get a cherry-pick-approved here to merge this?

@jianlinliu

/label cherry-pick-approved

@openshift-ci openshift-ci bot added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Mar 30, 2023
@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD c6d9baa and 2 for PR HEAD c484ef9 in total

@evakhoni
Contributor

/retest-required

@openshift-ci
Contributor

openshift-ci bot commented Mar 30, 2023

@openshift-cherrypick-robot: all tests passed!

Full PR test history. Your PR dashboard.

@openshift-merge-robot openshift-merge-robot merged commit a5c4031 into openshift:release-4.12 Mar 30, 2023
@openshift-ci-robot
Contributor

@openshift-cherrypick-robot: Jira Issue OCPBUGS-10565: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-10565 has been moved to the MODIFIED state.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build cluster-version-operator-container-v4.12.0-202305022015.p0.ga5c4031.assembly.stream for distgit cluster-version-operator.
All builds following this will include this PR.

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • backport-risk-assessed: Indicates a PR to a release branch has been evaluated and considered safe to accept.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • cherry-pick-approved: Indicates a cherry-pick PR into a release branch has been approved by the release branch manager.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • qe-approved: Signifies that QE has signed off on this PR.

9 participants