@Miciah Miciah commented Feb 11, 2021

Set CoreDNS readiness probe period and timeout to 3 seconds

Change the readiness probe for CoreDNS from using a 10-second period and timeout to using a 3-second period and timeout in order to reduce the time to remove the endpoint if CoreDNS becomes unresponsive.

  • assets/dns/daemonset.yaml: Set readiness probe period and timeout to 3 seconds.
  • pkg/manifests/bindata.go: Regenerate.
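
As a rough illustration of why the shorter period and timeout help, the sketch below estimates how long a pod can stay in the endpoints after CoreDNS stops answering. The `readinessProbe` type and `unreadyDelay` function are hypothetical names, not code from the operator, and the failure threshold of 3 is an assumption (it is the Kubernetes default; this PR does not state the value used). The estimate ignores kubelet jitter and endpoint propagation delay.

```go
package main

import (
	"fmt"
	"time"
)

// readinessProbe mirrors the timing fields of a Kubernetes readiness probe.
// Hypothetical local type for illustration only.
type readinessProbe struct {
	PeriodSeconds    int32
	TimeoutSeconds   int32
	FailureThreshold int32 // assumed to be 3; not stated in the PR
}

// unreadyDelay gives a rough estimate of the time from the start of the first
// failing probe until the pod is marked unready: the failing probes start one
// period apart, and the last one may take up to the full timeout to fail.
func unreadyDelay(p readinessProbe) time.Duration {
	if p.FailureThreshold < 1 {
		return 0
	}
	secs := (p.FailureThreshold-1)*p.PeriodSeconds + p.TimeoutSeconds
	return time.Duration(secs) * time.Second
}

func main() {
	before := readinessProbe{PeriodSeconds: 10, TimeoutSeconds: 10, FailureThreshold: 3}
	after := readinessProbe{PeriodSeconds: 3, TimeoutSeconds: 3, FailureThreshold: 3}
	fmt.Println("before:", unreadyDelay(before)) // before: 30s
	fmt.Println("after:", unreadyDelay(after))   // after: 9s
}
```

Under these assumptions the estimate drops from about 30 seconds to about 9 seconds.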

Reconcile all readiness probe parameters

Check all parameters of the CoreDNS daemonset's readiness probe, and update the daemonset if any of the probe's parameters differ from the expected values.

Before this commit, the DNS controller only reconciled the readiness probe's handler and ignored changes to other parameters of the probe.

  • pkg/operator/controller/controller_dns_daemonset.go (daemonsetConfigChanged): Check if any of the readiness probe's parameters changed and update them if they did.
  • pkg/operator/controller/controller_dns_daemonset_test.go (TestDaemonsetConfigChanged): Add test case to verify that daemonsetConfigChanged detects changes to the readiness probe's periodSeconds parameter.
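
The reconciliation idea can be sketched as follows. This is a minimal, self-contained approximation, not the operator's actual `daemonsetConfigChanged`: the `probe` type, the `probeChanged` function, and the path and port values are hypothetical stand-ins chosen for illustration. The point is that the whole probe is compared, so a drift in `PeriodSeconds` alone is detected, and the small table in `main` mirrors the kind of test case described above.

```go
package main

import "fmt"

// probe is a hypothetical stand-in for a readiness probe. Path and Port
// represent the handler; the remaining fields are the other parameters.
type probe struct {
	Path                string
	Port                int32
	InitialDelaySeconds int32
	PeriodSeconds       int32
	TimeoutSeconds      int32
	SuccessThreshold    int32
	FailureThreshold    int32
}

// probeChanged compares every field of the current probe with the expected
// one. If any field differs, it reports true and returns the expected probe
// as the value to apply; otherwise it returns the current probe unchanged.
func probeChanged(current, expected probe) (bool, probe) {
	if current == expected {
		return false, current
	}
	return true, expected
}

func main() {
	// Illustrative values only; the real manifest may differ.
	expected := probe{Path: "/ready", Port: 8181, InitialDelaySeconds: 10,
		PeriodSeconds: 3, TimeoutSeconds: 3, SuccessThreshold: 1, FailureThreshold: 3}

	stalePeriod := expected
	stalePeriod.PeriodSeconds = 10 // only a non-handler parameter differs

	cases := []struct {
		name    string
		current probe
	}{
		{"no change", expected},
		{"periodSeconds changed", stalePeriod},
	}
	for _, tc := range cases {
		changed, updated := probeChanged(tc.current, expected)
		fmt.Printf("%s: changed=%t periodSeconds=%d\n", tc.name, changed, updated.PeriodSeconds)
	}
}
```

Comparing the struct as a whole, rather than field by field, means a parameter added later is covered without touching the comparison.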

This commit is related to bug 1919737.

https://bugzilla.redhat.com/show_bug.cgi?id=1919737

@openshift-ci-robot
Contributor

@Miciah: This pull request references Bugzilla bug 1919737, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1919737: Set CoreDNS readiness probe period and timeout each to 3 seconds

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot added labels Feb 11, 2021:
  • bugzilla/severity-urgent: Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.

knobunc commented Feb 11, 2021

I like this. I'd lgtm, but want other people to consider too.

Miciah commented Feb 12, 2021

I see some lookup failures in the last e2e-upgrade CI job run, and pods.json shows that one of the DNS pods was oom-killed. However, several other pods also report having been oom-killed, so DNS may have been a victim and not a culprit here.
/test e2e-upgrade

Meanwhile, must-gather failed on the last e2e-aws CI job run because DNS wasn't working. Must-gather is known to be flaky, but the DNS failures are concerning, so let's keep an eye on that.
/test e2e-aws

Miciah commented Feb 12, 2021

e2e-upgrade passed, and e2e-aws failed with errors related to sandbox initialization and nothing obviously related to DNS.
/test e2e-aws

@sgreene570
Contributor

/lgtm
looks good.

@openshift-ci-robot added the lgtm label Feb 12, 2021: Indicates that a PR is ready to be merged.
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Miciah, sgreene570

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Miciah commented Feb 12, 2021

/cherry-pick release-4.7

@openshift-cherrypick-robot

@Miciah: once the present PR merges, I will cherry-pick it on top of release-4.7 in a new PR and assign it to you.

Miciah commented Feb 12, 2021

/cherry-pick release-4.6

@openshift-cherrypick-robot

@Miciah: once the present PR merges, I will cherry-pick it on top of release-4.6 in a new PR and assign it to you.

@openshift-merge-robot merged commit cd9190c into openshift:master Feb 12, 2021
@openshift-ci-robot
Contributor

@Miciah: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull requests must merge or be unlinked from the Bugzilla bug in order for it to move to the next state. Once unlinked, request a bug refresh with /bugzilla refresh.

Bugzilla bug 1919737 has not been moved to the MODIFIED state.

@openshift-cherrypick-robot

@Miciah: new pull request created: #235

@openshift-cherrypick-robot

@Miciah: #234 failed to apply on top of branch "release-4.6":

Applying: Set CoreDNS readiness probe period and timeout to 3
Using index info to reconstruct a base tree...
M	assets/dns/daemonset.yaml
M	pkg/manifests/bindata.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/manifests/bindata.go
CONFLICT (content): Merge conflict in pkg/manifests/bindata.go
Auto-merging assets/dns/daemonset.yaml
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Set CoreDNS readiness probe period and timeout to 3
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
