@petr-muller
Member

This is a cherry-pick of #27645 and #27678

  • upgrade/adminack: guarantee one admin ack check post-upgrade
  • upgrade/adminack: optimize the post-upgrade check
  • upgrade/adminack: simplify polling and unblock "guaranteed" post-upgrade check
  • upgrade/adminack: wait up to 4m until gate propagates to upgradeable

/hold

While looking into OCPBUGS-5505 I discovered that some 4.10->4.11
upgrade job runs perform an Admin Ack check, while others do not. 4.11 has
an `ack-4.11-kube-1.25-api-removals-in-4.12` gate, so these upgrade jobs
sometimes test that `Upgradeable` goes `false` after the upgrade, and
sometimes they do not. The outcome is decided purely by a race in the
polling: the check runs once every 10 minutes, and we cancel the
polling as soon as the upgrade completes. In some runs we are lucky and
manage to run one check on the target version before the cancel; in others
we are not and only check while still on the base version.

Add a guaranteed single check execution after the upgrade, so that admin
ack is always checked at least once on the upgrade target version.
Running checks after `done` is signalled has prior art in the alert test.
The `done` signal means either a timeout or "upgrade finished, stop testing"; we do not need to perform the last check in the former case. Track which versions have been checked, and when the signal arrives, check whether the current version was checked at least once; if not, check it before terminating.
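
A minimal Go sketch of the version-tracking idea described above. It uses only the standard library, and every name in it (`versionTracker`, `onDone`, and the `upgradeFinished` flag that separates "upgrade finished" from a timeout) is hypothetical rather than taken from the origin test code:

```go
package main

import (
	"fmt"
	"sync"
)

// versionTracker is a hypothetical helper that records which cluster
// versions the admin-ack check has already run against.
type versionTracker struct {
	mu      sync.Mutex
	checked map[string]bool
}

func newVersionTracker() *versionTracker {
	return &versionTracker{checked: map[string]bool{}}
}

// record notes that the check ran while the cluster reported version v.
func (t *versionTracker) record(v string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.checked[v] = true
}

// needsFinalCheck reports whether version v has never been checked.
func (t *versionTracker) needsFinalCheck(v string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return !t.checked[v]
}

// onDone runs the check one last time if the current version was never
// covered by the periodic checks. upgradeFinished (an assumption of this
// sketch) is false when done was signalled by a timeout, in which case no
// final check is performed. It returns true if a final check ran.
func onDone(t *versionTracker, current string, upgradeFinished bool, check func(string)) bool {
	if !upgradeFinished || !t.needsFinalCheck(current) {
		return false
	}
	check(current)
	t.record(current)
	return true
}

func main() {
	t := newVersionTracker()
	t.record("4.10") // the periodic checks only ever saw the base version
	ran := onDone(t, "4.11", true, func(v string) { fmt.Println("admin ack check on", v) })
	fmt.Println("final check ran:", ran)
}
```

Keeping the final check behind `needsFinalCheck` means that runs which already happened to poll on the target version do not pay for an extra check.
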
…ade check

openshift#27645 intended to add a guaranteed post-upgrade check, but I overlooked how exactly the polling is implemented and terminated, so the post-upgrade check never actually executed.

Previously the test used `PollImmediateWithContext` for the check that runs every 10 minutes. The `ConditionFunc` never returned `true` or a non-nil `err`, so `PollImmediateWithContext` was never terminated by the `ConditionFunc`: it was always terminated by the `ctx.Done()` that the framework triggers when the upgrade finishes (or on a test timeout). This means that `PollImmediateWithContext` always returned `err=wait.ErrWaitTimeout` and the `Test` method immediately returned, so the "guaranteed" check code was never reached.

Given that our `ConditionFunc` never terminates the polling, we can simplify and use `wait.UntilWithContext` instead: a simpler function that implements exactly the desired loop (poll until the context is done).

During testing of OCPBUGS-5505, it was discovered that even with
a shortened CVO cache TTL, the CVO may still only update `Upgradeable`
on its sync interval, which may be as long as 4 minutes. Hence the
test needs to wait for that long (I added a 5-second buffer on top of
that).
@openshift-ci-robot openshift-ci-robot added the jira/severity-moderate, jira/valid-reference, and jira/invalid-bug labels Jan 26, 2023
@openshift-ci-robot

@petr-muller: This pull request references Jira Issue OCPBUGS-6503, which is invalid:

  • expected the bug to target the "4.12.z" version, but it targets "4.13.0" instead
  • expected Jira Issue OCPBUGS-6503 to depend on a bug targeting a version in 4.13.0 and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Contributor

openshift-ci bot commented Jan 26, 2023

@petr-muller: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

OCPBUGS-6503: upgrade/adminack: simplify polling and unblock "guaranteed" post-upgrade check

@openshift-ci openshift-ci bot added the do-not-merge/hold label Jan 26, 2023
@petr-muller petr-muller changed the title OCPBUGS-6503: upgrade/adminack: simplify polling and unblock "guaranteed" post-upgrade check upgrade/adminack: simplify polling and unblock "guaranteed" post-upgrade check Jan 26, 2023
@openshift-ci-robot openshift-ci-robot removed the jira/severity-moderate, jira/valid-reference, and jira/invalid-bug labels Jan 26, 2023
@openshift-ci-robot

@petr-muller: No Jira issue is referenced in the title of this pull request.
To reference a jira issue, add 'XYZ-NNN:' to the title of this pull request and request another refresh with /jira refresh.

@openshift-ci
Contributor

openshift-ci bot commented Jan 26, 2023

@petr-muller: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

upgrade/adminack: simplify polling and unblock "guaranteed" post-upgrade check

@petr-muller petr-muller changed the title upgrade/adminack: simplify polling and unblock "guaranteed" post-upgrade check upgrade/adminack: guarantee one admin ack check post-upgrade Jan 26, 2023
@openshift-ci
Contributor

openshift-ci bot commented Jan 26, 2023

@petr-muller: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

upgrade/adminack: guarantee one admin ack check post-upgrade

@petr-muller petr-muller changed the title upgrade/adminack: guarantee one admin ack check post-upgrade [release-4.12] upgrade/adminack: guarantee one admin ack check post-upgrade Jan 26, 2023
@openshift-ci
Contributor

openshift-ci bot commented Jan 26, 2023

@petr-muller: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

[release-4.12] upgrade/adminack: guarantee one admin ack check post-upgrade

@petr-muller
Member Author

/retest

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 27, 2023

@petr-muller: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-aws-ovn-single-node-upgrade | d1f81bb | link | false | /test e2e-aws-ovn-single-node-upgrade |
| ci/prow/e2e-openstack-ovn | d1f81bb | link | false | /test e2e-openstack-ovn |
| ci/prow/e2e-aws-ovn-etcd-scaling | d1f81bb | link | false | /test e2e-aws-ovn-etcd-scaling |
| ci/prow/e2e-aws-csi | d1f81bb | link | false | /test e2e-aws-csi |

Full PR test history. Your PR dashboard.

@petr-muller
Member Author

/jira cherrypick OCPBUGS-6503

@openshift-ci-robot

@petr-muller: Jira Issue OCPBUGS-6503 has been cloned as Jira Issue OCPBUGS-6850. Retitling PR to link against new bug.
/retitle OCPBUGS-6850: [release-4.12] upgrade/adminack: guarantee one admin ack check post-upgrade

@openshift-ci openshift-ci bot changed the title [release-4.12] upgrade/adminack: guarantee one admin ack check post-upgrade OCPBUGS-6850: [release-4.12] upgrade/adminack: guarantee one admin ack check post-upgrade Jan 31, 2023
@openshift-ci-robot openshift-ci-robot added the jira/severity-moderate and jira/valid-reference labels Jan 31, 2023
@openshift-ci openshift-ci bot requested a review from jiajliu February 6, 2023 10:02
Member

@LalatenduMohanty LalatenduMohanty left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Feb 6, 2023
@openshift-ci
Contributor

openshift-ci bot commented Feb 6, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LalatenduMohanty, petr-muller

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Feb 6, 2023
@stbenjam
Member

stbenjam commented Feb 7, 2023

/label backport-risk-assessed
/label cherry-pick-approved

@openshift-ci openshift-ci bot added the backport-risk-assessed and cherry-pick-approved labels Feb 7, 2023
@petr-muller
Member Author

/uncc @jiajliu

@openshift-ci openshift-ci bot removed the request for review from jiajliu February 7, 2023 20:18
@openshift-merge-robot openshift-merge-robot added the needs-rebase label Feb 7, 2023
@openshift-merge-robot
Contributor

PR needs rebase.

@petr-muller
Member Author

/unassign @prietyc123

@petr-muller
Member Author

GitHub is acting funny: the commits from this PR are merged, including the merge commit, but for some reason the PR stays open.

@petr-muller petr-muller closed this Feb 7, 2023
@openshift-ci-robot

@petr-muller: This pull request references Jira Issue OCPBUGS-6850. The bug has been updated to no longer refer to the pull request using the external bug tracker. All external bug links have been closed. The bug has been moved to the NEW state.

@petr-muller
Member Author

35f8c05
/shrug

@openshift-ci openshift-ci bot added the ¯\_(ツ)_/¯ label Feb 7, 2023
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/origin that referenced this pull request Feb 9, 2023
…-to-release-4.12

OCPBUGS-6850: [release-4.12] upgrade/adminack: guarantee one admin ack check post-upgrade

Labels

approved, backport-risk-assessed, bugzilla/valid-bug, cherry-pick-approved, jira/severity-moderate, jira/valid-bug, jira/valid-reference, lgtm, needs-rebase, ¯\_(ツ)_/¯
