Skip to content

Conversation

@vrutkovs
Copy link
Contributor

- What I did
Cherrypicked #1552 and #1648 to release-4.4

- How to verify it
Run install / upgrade. etcd-quorum-guard pods should not fail or leak memory

- Description for the changelog
Cherrypicked etcd-quorum-guard refactoring on 4.4

Vadim Rutkovsky added 3 commits April 15, 2020 14:10
Prepare the script once so that readinessProbe command would not look for certificate path every time. This also uses downward API to avoid looking for certificates and fetch them from defined locations
Avoid setting hostNetwork for etcd-quorum-guard
Ensure `curl` command has NSS_SDB_USE_CACHE env var set
@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Apr 15, 2020
@openshift-ci-robot
Copy link
Contributor

@vrutkovs: This pull request references Bugzilla bug 1824137, which is invalid:

  • expected dependent Bugzilla bug 1823677 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 1824137: cherrypick etcd-quorum-guard refactoring to 4.4

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@yuqi-zhang
Copy link
Contributor

/retest
/approve

@openshift-ci-robot
Copy link
Contributor

@yuqi-zhang: The /retest command does not accept any targets.
The following commands are available to trigger jobs:

  • /test e2e-aws
  • /test e2e-aws-disruptive
  • /test e2e-aws-scaleup-rhel7
  • /test e2e-gcp-op
  • /test e2e-gcp-upgrade
  • /test e2e-metal-ipi
  • /test e2e-openstack
  • /test e2e-ovirt
  • /test e2e-vsphere
  • /test images
  • /test unit
  • /test verify

Use /test all to run all jobs.

Details

In response to this:

/retest
/approve

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 15, 2020
@vrutkovs
Copy link
Contributor Author

/retest

@kikisdeliveryservice
Copy link
Contributor

/bugzilla refresh

@openshift-ci-robot
Copy link
Contributor

@kikisdeliveryservice: This pull request references Bugzilla bug 1824137, which is invalid:

  • expected dependent Bugzilla bug 1823677 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kikisdeliveryservice kikisdeliveryservice changed the title Bug 1824137: cherrypick etcd-quorum-guard refactoring to 4.4 [release-4.4] Bug 1824137: cherrypick etcd-quorum-guard refactoring to 4.4 Apr 16, 2020
@kikisdeliveryservice
Copy link
Contributor

/retest

@kikisdeliveryservice
Copy link
Contributor

/assign @hexfusion

@runcom
Copy link
Member

runcom commented Apr 17, 2020

/retest
/lgtm

@openshift-ci-robot
Copy link
Contributor

@runcom: The /retest command does not accept any targets.
The following commands are available to trigger jobs:

  • /test e2e-aws
  • /test e2e-aws-disruptive
  • /test e2e-aws-scaleup-rhel7
  • /test e2e-gcp-op
  • /test e2e-gcp-upgrade
  • /test e2e-metal-ipi
  • /test e2e-openstack
  • /test e2e-ovirt
  • /test e2e-vsphere
  • /test images
  • /test unit
  • /test verify

Use /test all to run all jobs.

Details

In response to this:

/retest
/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 17, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: runcom, vrutkovs, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

4 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-ci-robot
Copy link
Contributor

@openshift-bot: This pull request references Bugzilla bug 1824137, which is invalid:

  • expected dependent Bugzilla bug 1823677 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mfojtik mfojtik added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Apr 20, 2020
@eparis
Copy link
Member

eparis commented Apr 20, 2020

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 20, 2020
@eparis
Copy link
Member

eparis commented Apr 20, 2020

This PR conflates 2 distinct issue.

  1. a cosmetic issue about traffic counts that, as best I can tell, hurts nothing
  2. an oddity in NSS/kernel negative dentries which does have a negative impact.

Can we at least get this PR and bug to talk (primarily) about #2 instead of #1?

Let's not combine unrelated things in backports in the future, please. i know it's more 'paperwork' but please don't.

Also, how does moving from hostNetwork to SDN affect things if the SDN is down? previously I didn't need SDN to get quorum guard running. Now I will. What affects is that going to have? How do we test is this change is detrimental?

After that you may remove the hold.

Please also ping Bob Relyea and Nikos about the upstream nss/kernel issue to get it back on their radar.

@vrutkovs
Copy link
Contributor Author

a cosmetic issue about traffic counts that, as best I can tell, hurts nothing

Correct

an oddity in NSS/kernel negative dentries which does have a negative impact.

There is no negative impact of this in 4.4 (there has been in 4.5). export NSS... needs to be moved in the script which is being run during health checks.

Also, how does moving from hostNetwork to SDN affect things if the SDN is down?

etcd-quorum-guard would be restarted if SDN on master is down. If its down then API container is not reachable anyway

What affects is that going to have? How do we test is this change is detrimental?

I tested it with running minor and major upgrade - SDN gets disrupted and etcd-quorum-guard survives that.

NSS env var is orthogonal to this PR - it merely moves it in the script which runs curl as this part is being refactored

@hexfusion
Copy link
Contributor

Given the above, while I see this as a possible improvement in networking metrics moving quorum-guard to SDN. But I feel like we are missing coverage on full understanding on corner cases. With the CI data that exists, we should be able to better understand this.

So let's spend more time in that validation and this can drop into z-stream if we see the net gain.

@eparis eparis removed the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Apr 20, 2020
@ashcrow
Copy link
Member

ashcrow commented Apr 22, 2020

/retest

@sdodson
Copy link
Member

sdodson commented Apr 24, 2020

/bugzilla refresh
(to pick up retargetting to 4.4.z)

@openshift-ci-robot openshift-ci-robot added bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. and removed bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Apr 24, 2020
@openshift-ci-robot
Copy link
Contributor

@sdodson: This pull request references Bugzilla bug 1825976, which is invalid:

  • expected the bug to target the "4.4.0" release, but it targets "4.4.z" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

/bugzilla refresh
(to pick up retargetting to 4.4.z)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. and removed bugzilla/high labels Apr 29, 2020
@runcom
Copy link
Member

runcom commented May 7, 2020

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label May 7, 2020
@openshift-ci-robot
Copy link
Contributor

@runcom: This pull request references Bugzilla bug 1825976, which is valid.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.4.z) matches configured target release for branch (4.4.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 1825967 is in the state VERIFIED, which is one of the valid states (VERIFIED, RELEASE_PENDING, CLOSED (ERRATA))
  • dependent Bugzilla bug 1825967 targets the "4.5.0" release, which is one of the valid target releases: 4.5.0, 4.5.z
  • bug has dependents
Details

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label May 7, 2020
@runcom
Copy link
Member

runcom commented May 7, 2020

@hexfusion bz is now valid, pending your hold and approval to get in for the z stream

@LorbusChris
Copy link
Contributor

/cherry-pick fcos

@openshift-cherrypick-robot

@LorbusChris: once the present PR merges, I will cherry-pick it on top of fcos in a new PR and assign it to you.

Details

In response to this:

/cherry-pick fcos

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@LorbusChris
Copy link
Contributor

/retest

3 similar comments
@vrutkovs
Copy link
Contributor Author

/retest

@ashcrow
Copy link
Member

ashcrow commented May 11, 2020

/retest

@sinnykumari
Copy link
Contributor

/retest

@openshift-ci-robot
Copy link
Contributor

@vrutkovs: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-gcp-upgrade 6836bb6 link /test e2e-gcp-upgrade

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sdodson
Copy link
Member

sdodson commented May 13, 2020

My understanding is that our intent is to close this and revert the upstream change. As such not giving cherry-pick-approved despite this bug being marked high severity.

@hexfusion
Copy link
Contributor

/close

After conversations with @eparis this is not what we want. Essentially this would run quorum-guard checks through SDN.

@openshift-ci-robot
Copy link
Contributor

@hexfusion: Closed this PR.

Details

In response to this:

/close

After conversations with @eparis this is not what we want. Essentially this would run quorum-guard checks through SDN.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.