Bug 1825976: cherrypick etcd-quorum-guard refactoring to 4.4 #1655
Prepare the script once so that the readinessProbe command does not look up the certificate path every time. This also uses the downward API to avoid searching for certificates and instead fetches them from defined locations.
Avoid setting hostNetwork for etcd-quorum-guard.
Ensure the `curl` command has the NSS_SDB_USE_CACHE env var set.
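As a rough illustration of the first and third points, the sketch below renders a readiness-probe script once, deriving certificate paths from a node name that the downward API could inject (for example from `spec.nodeName`), and exports `NSS_SDB_USE_CACHE` before running `curl`. Everything concrete here is a hypothetical placeholder, not taken from the actual manifests: the certificate directory and file names, the `NODE_NAME` variable, the value `no`, and the etcd URL.

```python
import os
from string import Template

# Hypothetical certificate layout; the real paths are not given in this PR.
CERT_DIR_TEMPLATE = Template("/etc/ssl/etcd/$node")

# Probe script rendered once at start-up, so the readinessProbe only executes
# it instead of searching for certificate paths on every invocation.
PROBE_TEMPLATE = Template("""#!/bin/bash
# Assumed value for the NSS database cache setting.
export NSS_SDB_USE_CACHE=no
exec curl --silent --fail --max-time 2 \\
  --cacert "$ca" --cert "$cert" --key "$key" \\
  "$url"
""")


def render_probe_script(node_name: str, url: str) -> str:
    """Render the probe script for one node (hypothetical file names)."""
    cert_dir = CERT_DIR_TEMPLATE.substitute(node=node_name)
    return PROBE_TEMPLATE.substitute(
        ca=f"{cert_dir}/ca.crt",
        cert=f"{cert_dir}/peer-{node_name}.crt",
        key=f"{cert_dir}/peer-{node_name}.key",
        url=url,
    )


if __name__ == "__main__":
    # NODE_NAME would be injected via the downward API (fieldPath: spec.nodeName).
    node = os.environ.get("NODE_NAME", "master-0")
    print(render_probe_script(node, "https://etcd.example:2379/health"))
```

Putting the `export` inside the generated script, rather than in the pod spec, matches the later comment that the PR only moves the env var into the script that runs `curl`.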
@vrutkovs: This pull request references Bugzilla bug 1824137, which is invalid:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/retest
@yuqi-zhang: The
Use
/retest
/bugzilla refresh
@kikisdeliveryservice: This pull request references Bugzilla bug 1824137, which is invalid:
/retest
/assign @hexfusion
/retest
@runcom: The
Use
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: runcom, vrutkovs, yuqi-zhang
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest
Please review the full test history for this PR and help us cut down flakes.
/bugzilla refresh
Recalculating validity in case the underlying Bugzilla bug has changed.
@openshift-bot: This pull request references Bugzilla bug 1824137, which is invalid:
/hold
This PR conflates two distinct issues.
Can we at least get this PR and bug to talk (primarily) about #2 instead of #1? Let's not combine unrelated things in backports in the future, please. I know it's more 'paperwork', but please don't. Also, how does moving from hostNetwork to the SDN affect things if the SDN is down? Previously I didn't need the SDN to get quorum guard running. Now I will. What effect is that going to have? How do we test whether this change is detrimental? After that you may remove the hold. Please also ping Bob Relyea and Nikos about the upstream nss/kernel issue to get it back on their radar.
Correct.
There is no negative impact of this in 4.4 (there has been in 4.5).
etcd-quorum-guard would be restarted if the SDN on the master is down. If it's down, the API container is not reachable anyway.
I tested it by running minor and major upgrades: the SDN gets disrupted and etcd-quorum-guard survives that. The NSS env var is orthogonal to this PR; the PR merely moves it into the script that runs curl, because this part is being refactored.
Given the above, I see moving quorum-guard to the SDN as a possible improvement in networking metrics, but I feel we are missing a full understanding of the corner cases. With the CI data that exists, we should be able to understand this better. So let's spend more time on that validation, and this can drop into a z-stream if we see a net gain.
|
/retest |
|
/bugzilla refresh |
|
@sdodson: This pull request references Bugzilla bug 1825976, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh |
|
@runcom: This pull request references Bugzilla bug 1825976, which is valid. 6 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@hexfusion The BZ is now valid; this is pending your hold removal and approval to get into the z-stream.
|
/cherry-pick fcos |
@LorbusChris: Once the present PR merges, I will cherry-pick it on top of fcos in a new PR and assign it to you.
/retest
@vrutkovs: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
My understanding is that our intent is to close this and revert the upstream change. As such, I am not giving cherry-pick-approved even though this bug is marked high severity.
/close
After conversations with @eparis, this is not what we want. Essentially, this would run quorum-guard checks through the SDN.
@hexfusion: Closed this PR.
- What I did
Cherry-picked #1552 and #1648 to release-4.4.
- How to verify it
Run an install or upgrade. etcd-quorum-guard pods should not fail or leak memory.
- Description for the changelog
Cherry-picked etcd-quorum-guard refactoring to 4.4.