Bug 1829923: Fix quorum-guard timeouts #1797
Before this change, the quorum-guard `timeoutSeconds` and `failureThreshold` values were left unspecified in the manifest, so they took their defaults. The default value for `timeoutSeconds` is 1, while the probe itself enforces a 2-second timeout. As a result, even when the probe command would succeed (for example, when the health check takes 1.5 seconds), Kubernetes considers the probe failed because the timeout in the probe specification is stricter. The effect is that the probe sporadically reports false negatives. This change increases the `timeoutSeconds` value to exceed the probe logic's internal timeout, so that the probe command is the source of truth with regard to timeouts. This change also makes the `failureThreshold` value explicit, but preserves the default value because I don't have a clear reason yet to change it.
@ironcladlou: This pull request references Bugzilla bug 1829923, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

lgtm

/lgtm nice one @ironcladlou!

/assign @kikisdeliveryservice

[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: hexfusion, ironcladlou, yuqi-zhang

/cherry-pick release-4.5

@ironcladlou: once the present PR merges, I will cherry-pick it on top of release-4.5 in a new PR and assign it to you.

/cherry-pick release-4.4

@ironcladlou: once the present PR merges, I will cherry-pick it on top of release-4.4 in a new PR and assign it to you.

@ironcladlou: All pull requests linked via external trackers have merged: openshift/machine-config-operator#1797. Bugzilla bug 1829923 has been moved to the MODIFIED state.

@ironcladlou: new pull request created: #1798

@ironcladlou: new pull request created: #1799

This didn't actually change the quorum guard spec: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1797/pull-ci-openshift-machine-config-operator-master-e2e-aws/7596/artifacts/e2e-aws/gather-extra/pods.json
The new properties are under

🤦 |