BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at idle #675
@Danil-Grigorev: This pull request references Bugzilla bug 1858400, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@Danil-Grigorev: This pull request references Bugzilla bug 1858400, which is valid. 3 validation(s) were run on this bug
michaelgugino left a comment
Looks like we should also set RenewDeadline https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/manager/manager.go#L163 to a sensible value
a405869 to 9ffe27a
Set this on
Prevent machine controllers from writing in etcd at idle too often by setting 30s retry and 90s deadline on all renewals. BZ 1858403
Prevent machine controllers from writing in etcd at idle too often by setting 60s retry and delay on all renewals. BZ 1858403
9ffe27a to 48d9cce
After some realization, set it on
michaelgugino left a comment
Let's do:
LeaseDuration = 180 seconds
RenewDeadline = 120 seconds
RetryPeriod = 90 seconds
RenewDeadline needs to be less than LeaseDuration according to the examples.
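The ordering rule in this comment can be checked mechanically. client-go's leader elector, which controller-runtime's manager builds on, rejects a config unless LeaseDuration > RenewDeadline and RenewDeadline > 1.2 × RetryPeriod, where 1.2 is client-go's `JitterFactor`. The sketch below re-implements those checks with the standard library only; `leaseTimings`, `fromSeconds` and `validate` are hypothetical names for illustration, not part of client-go or controller-runtime.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// jitterFactor mirrors client-go's leaderelection.JitterFactor (1.2), which
// scales RetryPeriod when a leader-election config is validated.
const jitterFactor = 1.2

// leaseTimings is a hypothetical container for the three durations under review.
type leaseTimings struct {
	LeaseDuration time.Duration
	RenewDeadline time.Duration
	RetryPeriod   time.Duration
}

// fromSeconds builds a leaseTimings value from whole seconds.
func fromSeconds(lease, renew, retry int64) leaseTimings {
	return leaseTimings{
		LeaseDuration: time.Duration(lease) * time.Second,
		RenewDeadline: time.Duration(renew) * time.Second,
		RetryPeriod:   time.Duration(retry) * time.Second,
	}
}

// validate applies the ordering rules client-go enforces before starting a
// leader elector, plus a check that every duration is positive.
func validate(t leaseTimings) error {
	if t.LeaseDuration <= t.RenewDeadline {
		return errors.New("leaseDuration must be greater than renewDeadline")
	}
	if t.RenewDeadline <= time.Duration(jitterFactor*float64(t.RetryPeriod)) {
		return errors.New("renewDeadline must be greater than retryPeriod*jitterFactor")
	}
	if t.LeaseDuration <= 0 || t.RenewDeadline <= 0 || t.RetryPeriod <= 0 {
		return errors.New("all durations must be greater than zero")
	}
	return nil
}

func main() {
	// LeaseDuration/RenewDeadline/RetryPeriod in seconds.
	for _, c := range [][3]int64{{180, 120, 90}, {120, 110, 90}, {90, 90, 30}} {
		fmt.Printf("%d/%d/%d: %v\n", c[0], c[1], c[2], validate(fromSeconds(c[0], c[1], c[2])))
	}
}
```

Under these rules both 180/120/90 and 120/110/90 pass (1.2 × 90s = 108s), while a renew deadline equal to the lease duration, such as the hypothetical 90/90/30 above, is rejected.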
in general i think this is a good patch, but i agree with @michaelgugino that the values should be higher.
Okay, after some more discussion, we determined that 120/110/90 (LeaseDuration/RenewDeadline/RetryPeriod, in seconds) might be a better fit. We don't want the lease to be too long: if the pod gets moved (e.g., during upgrades), we don't want operations suspended for too long. 120 seconds would be adequate.
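A rough back-of-the-envelope for this tradeoff, assuming an idle leader issues about one lock update per RetryPeriod and that a replacement pod cannot take over an unreleased lease until at least LeaseDuration has passed. The 15/10/2 row uses controller-runtime's default LeaseDuration/RenewDeadline/RetryPeriod; `profile` and `renewalsPerHour` are hypothetical helpers, and the figures are estimates, not measurements.

```go
package main

import "fmt"

// profile holds LeaseDuration and RetryPeriod in whole seconds.
type profile struct {
	name         string
	leaseSeconds int64
	retrySeconds int64
}

// renewalsPerHour estimates the lock updates an idle leader issues per hour,
// assuming one successful renewal every RetryPeriod.
func renewalsPerHour(retrySeconds int64) int64 {
	return 3600 / retrySeconds
}

func main() {
	profiles := []profile{
		{"controller-runtime defaults 15/10/2", 15, 2},
		{"suggested 180/120/90", 180, 90},
		{"agreed 120/110/90", 120, 90},
	}
	for _, p := range profiles {
		// A crashed leader's lease blocks takeover for at least leaseSeconds.
		fmt.Printf("%-36s ~%4d renewals/hour, takeover after >= %ds\n",
			p.name, renewalsPerHour(p.retrySeconds), p.leaseSeconds)
	}
}
```

Both reviewed profiles share the 90s RetryPeriod, so they write equally often (about 40 times an hour versus about 1800 with the defaults); dropping LeaseDuration from 180s to 120s only shortens the minimum takeover wait.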
elmiko left a comment
thanks for the changes Danil, this looks good to me but i just have a quick question.
  "--v=3",
  "--leader-elect=true",
- "--leader-elect-lease-duration=90s",
+ "--leader-elect-lease-duration=120s",
this is the only confusing part to me. we set the lease duration in the controller-runtime config options, do we also need to set it on the command line?
We use the default if not specified on the CLI. This is useful for development/debugging.
ahh, perfect. thanks for the explanation Mike!
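The "default if not specified on the CLI" behaviour can be sketched with Go's standard `flag` package. Only the flag name `--leader-elect-lease-duration` comes from the diff in this thread; the 120s compiled-in default, the flag-set name, the usage text and `parseLeaseDuration` are assumptions for illustration, not the machine-api-operator's actual code.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// defaultLeaseDuration is a hypothetical compiled-in default used when the
// flag is absent from the command line.
const defaultLeaseDuration = 120 * time.Second

// parseLeaseDuration parses args the way a controller binary might, falling
// back to defaultLeaseDuration when the flag is not given.
func parseLeaseDuration(args []string) (time.Duration, error) {
	fs := flag.NewFlagSet("machine-controller", flag.ContinueOnError)
	lease := fs.Duration("leader-elect-lease-duration", defaultLeaseDuration,
		"how long non-leader candidates wait before trying to acquire leadership")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return *lease, nil
}

func main() {
	d, _ := parseLeaseDuration(nil) // no flag: the default applies
	fmt.Println(d)
	d, _ = parseLeaseDuration([]string{"--leader-elect-lease-duration=30s"}) // override, e.g. while debugging
	fmt.Println(d)
}
```

Keeping the default in code and the override on the command line lets a developer shorten the lease locally without touching the deployed manifest.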
michaelgugino left a comment
/lgtm
elmiko left a comment
/approve
[APPROVALNOTIFIER] This PR is APPROVED.
This pull-request has been approved by: elmiko
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@Danil-Grigorev: The following tests failed, say
Full PR test history. Your PR dashboard.
@Danil-Grigorev: Some pull requests linked via external trackers have merged: openshift/machine-api-operator#675, openshift/machine-api-operator#649, openshift/cluster-api-provider-ovirt#56, openshift/cluster-api-provider-openstack#109. The following pull requests linked via external trackers have not merged:
Reduce the default lease retry period to 30s, which will prevent heavy writes into etcd at idle, and constrain the renew deadline to 90s.
Inspired by openshift/cloud-credential-operator#231