@ironcladlou
Contributor

Before this change, the quorum-guard timeoutSeconds and failureThreshold values were
left unspecified in the manifest, and were defaulted.

The default value for timeoutSeconds is 1, while the probe itself enforces a
2-second timeout. This means that when the probe command would succeed after
more than 1 second but within its own 2-second limit, Kube still considers the
probe failed because of the stricter timeout on the probe specification.

The effect is that the probe sporadically reports false negatives.
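
To make the mismatch concrete, here is a minimal Python sketch of the two timeouts described above. It is an illustration only: the function name `kubelet_verdict` and the modelling of a probe run as a single duration are assumptions, not the kubelet's or the quorum guard's actual code.

```python
def kubelet_verdict(probe_duration_s: float,
                    timeout_seconds: float = 1.0,
                    internal_timeout_s: float = 2.0) -> str:
    """Classify one probe run under a simplified timeout model.

    Assumption: the probe command succeeds if it finishes within its own
    internal timeout, and Kubernetes treats the probe as failed if the
    command runs longer than timeoutSeconds.
    """
    command_ok = probe_duration_s <= internal_timeout_s
    within_spec = probe_duration_s <= timeout_seconds
    if command_ok and not within_spec:
        # The command would have succeeded, but the spec timeout cut it off.
        return "false negative"
    if command_ok:
        return "success"
    return "failure"
```

With the defaults above, a run that takes 1.5 seconds falls in the gap between the two timeouts and is classified as a false negative; raising `timeout_seconds` above the internal timeout closes that gap.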

This change increases the timeoutSeconds value to exceed the probe logic's
internal timeout so that the probe command is the source of truth with regard
to timeouts.

This change also makes the failureThreshold value explicit, but the default value
is preserved because I don't have a clear reason yet to change it.
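
As a rough illustration of what "defaulted" means here, the sketch below fills unspecified probe fields with the documented Kubernetes probe defaults (timeoutSeconds 1, failureThreshold 3). The helper `effective_probe` and the placeholder command are hypothetical, and the explicit timeoutSeconds of 3 is only an example value chosen to exceed 2 seconds, not necessarily the value used in this PR.

```python
# Documented Kubernetes defaults for probe fields.
PROBE_DEFAULTS = {
    "initialDelaySeconds": 0,
    "periodSeconds": 10,
    "timeoutSeconds": 1,
    "successThreshold": 1,
    "failureThreshold": 3,
}


def effective_probe(probe: dict) -> dict:
    """Return the probe with unspecified fields filled from the defaults."""
    return {**PROBE_DEFAULTS, **probe}


# Hypothetical probe specs; the command is a placeholder, not the real probe.
before = {"exec": {"command": ["/bin/true"]}}
after = {
    "exec": {"command": ["/bin/true"]},
    "timeoutSeconds": 3,    # example value above the 2-second internal timeout
    "failureThreshold": 3,  # explicit, but equal to the default
}
```

Comparing `effective_probe(before)` with `effective_probe(after)` shows that only the effective timeoutSeconds differs; making failureThreshold explicit at its default leaves behavior unchanged.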

@openshift-ci-robot added the bugzilla/severity-medium label (Referenced Bugzilla bug's severity is medium for the branch this PR is targeting) on Jun 9, 2020
@openshift-ci-robot
Contributor

@ironcladlou: This pull request references Bugzilla bug 1829923, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
Details

In response to this:

Bug 1829923: Fix quorum-guard timeouts

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot added the bugzilla/valid-bug label (Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting) on Jun 9, 2020
@rphillips
Contributor

lgtm

@hexfusion
Contributor

/lgtm

nice one @ironcladlou !

@openshift-ci-robot added the lgtm label (Indicates that a PR is ready to be merged) on Jun 9, 2020
@rphillips
Contributor

/assign @kikisdeliveryservice

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hexfusion, ironcladlou, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files) on Jun 9, 2020
@ironcladlou
Contributor Author

/cherry-pick release-4.5

@openshift-cherrypick-robot

@ironcladlou: once the present PR merges, I will cherry-pick it on top of release-4.5 in a new PR and assign it to you.

@ironcladlou
Contributor Author

/cherry-pick release-4.4

@openshift-cherrypick-robot

@ironcladlou: once the present PR merges, I will cherry-pick it on top of release-4.4 in a new PR and assign it to you.

@openshift-merge-robot merged commit 4c70d3a into openshift:master on Jun 9, 2020
@openshift-ci-robot
Contributor

@ironcladlou: All pull requests linked via external trackers have merged: openshift/machine-config-operator#1797. Bugzilla bug 1829923 has been moved to the MODIFIED state.

@openshift-cherrypick-robot

@ironcladlou: new pull request created: #1798

@openshift-cherrypick-robot

@ironcladlou: new pull request created: #1799

@wking
Member

wking commented Jun 11, 2020

This didn't actually change the quorum guard spec...

The new properties are under exec in this PR, when they should have been under the parent readinessProbe.
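
The misplacement described above can be sketched as a small check. In a Kubernetes probe, fields such as timeoutSeconds and failureThreshold are siblings of exec, while the exec action itself only carries the command. The dict shapes, the placeholder command, the example values, and the helper name `misplaced_probe_fields` are illustrative assumptions, not the contents of the actual manifest.

```python
# Fields that belong on the probe itself, next to the exec action.
PROBE_LEVEL_FIELDS = {
    "initialDelaySeconds",
    "periodSeconds",
    "timeoutSeconds",
    "successThreshold",
    "failureThreshold",
}


def misplaced_probe_fields(probe: dict) -> list:
    """List probe-level fields that were nested under the exec action."""
    exec_action = probe.get("exec", {})
    return sorted(PROBE_LEVEL_FIELDS & set(exec_action))


# Hypothetical shapes: fields nested under exec (wrong) versus on the probe (right).
wrong = {
    "exec": {
        "command": ["/bin/true"],
        "timeoutSeconds": 3,
        "failureThreshold": 3,
    }
}
right = {
    "exec": {"command": ["/bin/true"]},
    "timeoutSeconds": 3,
    "failureThreshold": 3,
}
```

Here `misplaced_probe_fields(wrong)` reports both fields, while `misplaced_probe_fields(right)` returns an empty list.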

@ironcladlou
Copy link
Contributor Author

🤦
