
@petr-muller
Member

@petr-muller petr-muller commented Jan 23, 2023

#27645 intended to add a guaranteed post-upgrade check, but I overlooked how exactly the polling is implemented and terminated, so the post-upgrade check never actually executed.

Previously, the test used PollImmediateWithContext for the every-10-minutes check. The ConditionFunc never returned true or a non-nil err, so PollImmediateWithContext was never terminated by the ConditionFunc: it was always terminated by ctx.Done(), which fires when the framework cancels the context after the upgrade finishes (or when the test times out). As a result, PollImmediateWithContext always returned err=wait.ErrWaitTimeout and the Test method immediately returned, so the "guaranteed" check code was never reached.
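
To illustrate the termination problem, here is a minimal, self-contained Go sketch. It deliberately does not import k8s.io/apimachinery: `pollImmediateUntilDone`, `errWaitTimeout`, and `buggyTest` are hypothetical stand-ins that model the termination rules described above (condition checked immediately and then once per interval, a timeout error once the context is done). They are not the real library code or the actual test.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// errWaitTimeout is a hypothetical stand-in for wait.ErrWaitTimeout.
var errWaitTimeout = errors.New("timed out waiting for the condition")

// pollImmediateUntilDone is a simplified, hypothetical model of how
// PollImmediateWithContext terminates: the condition runs immediately and then
// once per interval; polling stops when the condition returns true (nil error),
// returns an error, or when ctx is done (errWaitTimeout).
func pollImmediateUntilDone(ctx context.Context, interval time.Duration, cond func(context.Context) (bool, error)) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		done, err := cond(ctx)
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		select {
		case <-ctx.Done():
			return errWaitTimeout
		case <-ticker.C:
		}
	}
}

// buggyTest mirrors the shape of the old test (names are hypothetical): the
// periodic check never reports done, so the poll can only end with
// errWaitTimeout and the function returns before the post-upgrade check.
func buggyTest(timeoutMs int) (checks int, postCheckRan bool) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()

	err := pollImmediateUntilDone(ctx, 10*time.Millisecond, func(context.Context) (bool, error) {
		checks++ // the periodic Upgradeable check would go here
		return false, nil
	})
	if err != nil {
		return checks, false // always taken: the "guaranteed" check is skipped
	}
	return checks, true // unreachable in practice
}
```

Calling `buggyTest(50)` runs the periodic check at least once but always reports that the post-upgrade check did not run, which is the behavior described above.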

Given that our ConditionFunc never terminates the polling, we can simplify and use wait.UntilWithContext instead, which is a simpler function that implements exactly the desired loop (poll until the context is done).
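
A matching sketch of the restructured flow, again stdlib-only. `untilDone` is a hypothetical, simplified model of wait.UntilWithContext (call `f` immediately, then again one period after each call returns, until the context is done); it is not the library implementation, and `fixedTest` is a hypothetical stand-in for the test. Because the loop returns nothing, there is no error path that can skip the code after it.

```go
package main

import (
	"context"
	"time"
)

// untilDone is a simplified, hypothetical model of wait.UntilWithContext: it
// calls f immediately and then again one period after each call returns,
// until ctx is done. It returns nothing, so callers have no error to bail on.
func untilDone(ctx context.Context, f func(context.Context), period time.Duration) {
	for {
		select {
		case <-ctx.Done():
			return
		default:
		}
		f(ctx)
		timer := time.NewTimer(period)
		select {
		case <-ctx.Done():
			timer.Stop()
			return
		case <-timer.C:
		}
	}
}

// fixedTest mirrors the restructured test (names are hypothetical): periodic
// checks run until the context is cancelled, and then the post-upgrade check
// always runs.
func fixedTest(timeoutMs int) (checks int, postCheckRan bool) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()

	untilDone(ctx, func(context.Context) {
		checks++ // the periodic Upgradeable check would go here
	}, 10*time.Millisecond)

	// Reached as soon as ctx is done, so this check is guaranteed to run.
	postCheckRan = true
	return checks, postCheckRan
}
```

With the same 50 ms budget, `fixedTest(50)` runs the periodic check at least once and then always reaches the post-upgrade check.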


During testing of OCPBUGS-5505, it was discovered that even with a shortened CVO cache TTL, the CVO may still only update Upgradeable once per sync interval, which may be as long as 4 minutes. Hence the test needs to wait that long (I added a 5-second buffer on top of that).
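
For concreteness, the resulting wait works out as below. The 4-minute interval and 5-second buffer come from the paragraph above; the constant and function names are hypothetical and not taken from the actual test code.

```go
package main

import "time"

// Values from the description above; the constant names are hypothetical.
const (
	// The CVO may only refresh Upgradeable once per sync interval.
	maxCVOSyncInterval = 4 * time.Minute
	// Extra slack on top of the sync interval.
	upgradeableBuffer = 5 * time.Second
)

// upgradeableWait returns how long a test should be prepared to wait for
// Upgradeable to reflect a change: one full sync interval plus the buffer.
func upgradeableWait() time.Duration {
	return maxCVOSyncInterval + upgradeableBuffer
}
```

`upgradeableWait()` evaluates to 4m5s, i.e. 245 seconds.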

…ade check

openshift#27645 intended to add a guaranteed post-upgrade check, but I overlooked how exactly the polling is implemented and terminated, so the post-upgrade check never actually executed.

Previously, the test used `PollImmediateWithContext` for the every-10-minutes check. The `ConditionFunc` never returned `true` or a non-nil `err`, so `PollImmediateWithContext` was never terminated by the `ConditionFunc`: it was always terminated by `ctx.Done()`, which fires when the framework cancels the context after the upgrade finishes (or when the test times out). As a result, `PollImmediateWithContext` always returned `err=wait.ErrWaitTimeout` and the `Test` method immediately returned, so the "guaranteed" check code was never reached.

Given that our `ConditionFunc` never terminates the polling, we can simplify and use `wait.UntilWithContext` instead, which is a simpler function that implements exactly the desired loop (poll until the context is done).
@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jan 23, 2023
@openshift-ci-robot

@petr-muller: This pull request references Jira Issue OCPBUGS-6503, which is invalid:

  • expected the bug to target the "4.13.0" version, but it targets "4.13.z" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

@petr-muller
Member Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jan 23, 2023
@openshift-ci-robot

@petr-muller: This pull request references Jira Issue OCPBUGS-6503, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.0) matches configured target version for branch (4.13.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jiajliu

@openshift-ci openshift-ci bot requested a review from jiajliu January 23, 2023 15:37
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 23, 2023
@petr-muller
Member Author

/retest

@petr-muller
Member Author

/retest

CI registry is unhappy:

error: unable to read image registry.build04.ci.openshift.org/ci-op-9xtqjngg/stable@sha256:4a6ecaf7b68d463869afb275f56d4c2cbfea2bbf295d8a599bc15399ba8e3406: received unexpected HTTP status: 500 Internal Server Error 

@petr-muller
Member Author

/retest

During testing of OCPBUGS-5505, it was discovered that even with a
shortened CVO cache TTL, the CVO may still only update `Upgradeable`
once per sync interval, which may be as long as 4 minutes. Hence the
test needs to wait that long (I added a 5-second buffer on top of
that).
@openshift-ci-robot

@petr-muller: This pull request references Jira Issue OCPBUGS-6503, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.0) matches configured target version for branch (4.13.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jiajliu

@petr-muller
Member Author

/retest

@petr-muller
Member Author

GCP jobs are affected by OHSS-18195; hopefully the AWS ones give me something to work with.

@petr-muller
Member Author

/retest

@petr-muller
Member Author

/retest

Member

@wking wking left a comment

/lgtm

@petr-muller
Member Author

/test e2e-aws-ovn-fips

@petr-muller
Member Author

/retest-required

@petr-muller
Member Author

/skip

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 56b7201 and 1 for PR HEAD c3184c4 in total

@petr-muller
Member Author

/test e2e-aws-ovn-fips

@petr-muller
Member Author

[bz-OLM][invariant] alert/KubePodNotReady should not be at or above info in ns/openshift-marketplace 😐

/test e2e-aws-ovn-fips

@petr-muller
Member Author

/test e2e-aws-ovn-fips

@petr-muller
Member Author

/test ci/prow/e2e-aws-ovn-fips

@openshift-ci
Contributor

openshift-ci bot commented Jan 31, 2023

@petr-muller: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test e2e-aws-jenkins
  • /test e2e-aws-ovn-fips
  • /test e2e-aws-ovn-image-registry
  • /test e2e-aws-ovn-serial
  • /test e2e-gcp-ovn
  • /test e2e-gcp-ovn-builds
  • /test e2e-gcp-ovn-image-ecosystem
  • /test e2e-gcp-ovn-upgrade
  • /test extended_gssapi
  • /test extended_ldap_groups
  • /test extended_networking
  • /test images
  • /test lint
  • /test unit
  • /test verify
  • /test verify-deps

The following commands are available to trigger optional jobs:

  • /test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback
  • /test e2e-agnostic-ovn-cmd
  • /test e2e-aws
  • /test e2e-aws-csi
  • /test e2e-aws-csi-migration
  • /test e2e-aws-disruptive
  • /test e2e-aws-multitenant
  • /test e2e-aws-ovn
  • /test e2e-aws-ovn-cgroupsv2
  • /test e2e-aws-ovn-etcd-scaling
  • /test e2e-aws-ovn-single-node
  • /test e2e-aws-ovn-single-node-serial
  • /test e2e-aws-ovn-single-node-upgrade
  • /test e2e-aws-ovn-upgrade
  • /test e2e-aws-proxy
  • /test e2e-azure
  • /test e2e-azure-ovn-etcd-scaling
  • /test e2e-gcp-csi
  • /test e2e-gcp-disruptive
  • /test e2e-gcp-fips-serial
  • /test e2e-gcp-ovn-etcd-scaling
  • /test e2e-gcp-ovn-rt-upgrade
  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-ovn-ipv6
  • /test e2e-metal-ipi-sdn
  • /test e2e-metal-ipi-serial
  • /test e2e-metal-ipi-serial-ovn-ipv6
  • /test e2e-metal-ipi-virtualmedia
  • /test e2e-openstack-kuryr
  • /test e2e-openstack-ovn
  • /test e2e-openstack-serial
  • /test e2e-vsphere
  • /test e2e-vsphere-ovn-etcd-scaling
  • /test okd-e2e-gcp

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-origin-master-e2e-agnostic-ovn-cmd
  • pull-ci-openshift-origin-master-e2e-aws-csi
  • pull-ci-openshift-origin-master-e2e-aws-ovn-cgroupsv2
  • pull-ci-openshift-origin-master-e2e-aws-ovn-etcd-scaling
  • pull-ci-openshift-origin-master-e2e-aws-ovn-fips
  • pull-ci-openshift-origin-master-e2e-aws-ovn-serial
  • pull-ci-openshift-origin-master-e2e-aws-ovn-single-node
  • pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-serial
  • pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade
  • pull-ci-openshift-origin-master-e2e-aws-ovn-upgrade
  • pull-ci-openshift-origin-master-e2e-azure-ovn-etcd-scaling
  • pull-ci-openshift-origin-master-e2e-gcp-csi
  • pull-ci-openshift-origin-master-e2e-gcp-ovn
  • pull-ci-openshift-origin-master-e2e-gcp-ovn-builds
  • pull-ci-openshift-origin-master-e2e-gcp-ovn-etcd-scaling
  • pull-ci-openshift-origin-master-e2e-gcp-ovn-rt-upgrade
  • pull-ci-openshift-origin-master-e2e-gcp-ovn-upgrade
  • pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-ipv6
  • pull-ci-openshift-origin-master-e2e-metal-ipi-sdn
  • pull-ci-openshift-origin-master-e2e-openstack-ovn
  • pull-ci-openshift-origin-master-e2e-vsphere-ovn-etcd-scaling
  • pull-ci-openshift-origin-master-images
  • pull-ci-openshift-origin-master-lint
  • pull-ci-openshift-origin-master-unit
  • pull-ci-openshift-origin-master-verify
  • pull-ci-openshift-origin-master-verify-deps

@petr-muller
Member Author

/test e2e-aws-ovn-fips

@petr-muller
Member Author

/override
ci/prow/e2e-aws-ovn-fips

@petr-muller
Member Author

/override ci/prow/e2e-aws-ovn-fips

@openshift-ci
Contributor

openshift-ci bot commented Jan 31, 2023

@petr-muller: petr-muller unauthorized: /override is restricted to Repo administrators, approvers in top level OWNERS file, and the following github teams:openshift: openshift-release-oversight.

@petr-muller
Member Author

/test e2e-aws-ovn-fips

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD c786e01 and 0 for PR HEAD c3184c4 in total

@openshift-ci-robot

/hold

Revision c3184c4 was retested 3 times: holding

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 1, 2023
@petr-muller
Member Author

/hold cancel
/retest

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 1, 2023
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD c786e01 and 2 for PR HEAD c3184c4 in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 79d9b29 and 1 for PR HEAD c3184c4 in total

@petr-muller
Member Author

/retest

@openshift-ci
Contributor

openshift-ci bot commented Feb 2, 2023

@petr-muller: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                Commit   Details  Required  Rerun command
ci/prow/e2e-aws-ovn-single-node          c3184c4  link     false     /test e2e-aws-ovn-single-node
ci/prow/e2e-gcp-ovn-etcd-scaling         c3184c4  link     false     /test e2e-gcp-ovn-etcd-scaling
ci/prow/e2e-azure-ovn-etcd-scaling       c3184c4  link     false     /test e2e-azure-ovn-etcd-scaling
ci/prow/e2e-aws-ovn-single-node-upgrade  c3184c4  link     false     /test e2e-aws-ovn-single-node-upgrade
ci/prow/e2e-aws-ovn-single-node-serial   c3184c4  link     false     /test e2e-aws-ovn-single-node-serial

Full PR test history. Your PR dashboard.

@petr-muller
Member Author

/retest

@openshift-merge-robot openshift-merge-robot merged commit 73d3250 into openshift:master Feb 2, 2023
@openshift-ci-robot

@petr-muller: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull requests must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-6503 has not been moved to the MODIFIED state.

@petr-muller
Member Author

/jira refresh

@openshift-ci-robot

@petr-muller: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-6503 has been moved to the MODIFIED state.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

4 participants