add check of lease length to allow 60s of kube-apiserver communication disruption#26215

Closed
deads2k wants to merge 1 commit into openshift:master from deads2k:leases

deads2k commented Jun 9, 2021

We cannot find every too-short lease perfectly, but we can find those that are obviously broken.

There will need to be a library-go PR.

openshift-ci bot added the approved label Jun 9, 2021

if err := json.Unmarshal([]byte(leaderElection), leaderElectionRecord); err != nil {
o.Expect(err).NotTo(o.HaveOccurred())
}
if leaderElectionRecord.LeaseDurationSeconds < 90 {

romfreiman commented Jun 9, 2021

Make 90 a named constant.
Didn't we say 60?
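
A minimal sketch of what the named-constant suggestion could look like. The names `disruptionSeconds`, `minLeaseDurationSeconds`, and `isLeaseTooShort` are hypothetical and not taken from the PR. The thread does not say why the threshold is 90 rather than 60, so the sketch only gives each number its own name without asserting a relationship between them.

```go
package main

import "fmt"

const (
	// disruptionSeconds is the kube-apiserver disruption length named in the PR title.
	disruptionSeconds = 60
	// minLeaseDurationSeconds is the threshold the snippet above compares against.
	// Hypothetical name; the PR uses the literal 90.
	minLeaseDurationSeconds = 90
)

// isLeaseTooShort reports whether a lease duration is below the threshold.
func isLeaseTooShort(leaseDurationSeconds int) bool {
	return leaseDurationSeconds < minLeaseDurationSeconds
}

func main() {
	for _, d := range []int{60, 89, 90, 99} {
		fmt.Printf("leaseDuration=%ds tooShort=%v (target disruption: %ds)\n",
			d, isLeaseTooShort(d), disruptionSeconds)
	}
}
```

With both values named, a reviewer can see at a glance which number is the disruption target and which is the enforced minimum.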

@@ -0,0 +1,77 @@
package operators

import (

Do we want to start with an exclude list so the test will be green?

Contributor Author (deads2k):

> Do we want to start with an exclude list so the test will be green?

Given how far off we are, let's close a few and then add the list. Many ought to be easy.
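
For illustration only, an exclude list of the kind discussed here might be sketched as below. The map name `knownShortLeases`, the key format, and the entries are hypothetical placeholders; the PR does not define any of them.

```go
package main

import "fmt"

// knownShortLeases is a hypothetical exclude list keyed by "kind/namespace/name".
// The entry is a placeholder, not a real component.
var knownShortLeases = map[string]bool{
	"configmap/example-namespace/example-lock": true,
}

// unexpectedShortLeases returns the findings that are not on the exclude list,
// so the test fails only for locks nobody has acknowledged yet.
func unexpectedShortLeases(found []string) []string {
	var out []string
	for _, key := range found {
		if !knownShortLeases[key] {
			out = append(out, key)
		}
	}
	return out
}

func main() {
	found := []string{
		"configmap/example-namespace/example-lock",
		"lease/other-namespace/other-lock",
	}
	fmt.Println(unexpectedShortLeases(found))
}
```

Entries would be removed from the list as each component lengthens its lease, which matches the "close a few, then add the list" ordering proposed in the reply.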

o.Expect(err).NotTo(o.HaveOccurred())
}
if leaderElectionRecord.LeaseDurationSeconds < 90 {
shortLeases = append(shortLeases, fmt.Sprintf("configmap/%s used by %q, has too short a lease to span 60s kube-apiserver disruption. Try 99s leaseDuration with 13s retryPeriod and a 17s renewDeadline. Be sure you have the graceful release properly wired.", endpoint.Name, leaderElectionRecord.HolderIdentity))

endpoints, not configmap

endpoints, err := kubeClient.CoreV1().Endpoints("").List(ctx, metav1.ListOptions{})
o.Expect(err).NotTo(o.HaveOccurred())
for _, endpoint := range endpoints.Items {
leaderElection, ok := endpoint.Annotations[resourcelock.LeaderElectionRecordAnnotationKey]

romfreiman commented Jun 9, 2021

Maybe make this a function and call it for both ConfigMaps and Endpoints?
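
One possible shape for such a shared helper, as a standard-library-only sketch. The function name `checkAnnotatedLock` and the local `leaderElectionRecord` struct are hypothetical; the struct mirrors only the two JSON fields of client-go's `LeaderElectionRecord` that the check reads, and the constant holds the string value of `resourcelock.LeaderElectionRecordAnnotationKey`. Passing the resource kind as a parameter also keeps the failure message from calling an Endpoints object a configmap.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// leaderAnnotationKey is the string value of client-go's
// resourcelock.LeaderElectionRecordAnnotationKey.
const leaderAnnotationKey = "control-plane.alpha.kubernetes.io/leader"

// minLeaseDurationSeconds is the threshold used by the check in this PR.
const minLeaseDurationSeconds = 90

// leaderElectionRecord is a local stand-in carrying only the fields read here.
type leaderElectionRecord struct {
	HolderIdentity       string `json:"holderIdentity"`
	LeaseDurationSeconds int    `json:"leaseDurationSeconds"`
}

// checkAnnotatedLock inspects the leader election annotation on an object of
// the given kind. It returns a failure message when the lease is too short,
// and an empty message when the annotation is absent or the lease is long enough.
func checkAnnotatedLock(kind, name string, annotations map[string]string) (string, error) {
	raw, ok := annotations[leaderAnnotationKey]
	if !ok {
		return "", nil
	}
	var rec leaderElectionRecord
	if err := json.Unmarshal([]byte(raw), &rec); err != nil {
		return "", fmt.Errorf("%s/%s: parsing leader election record: %w", kind, name, err)
	}
	if rec.LeaseDurationSeconds < minLeaseDurationSeconds {
		return fmt.Sprintf("%s/%s used by %q has too short a lease (%ds)",
			kind, name, rec.HolderIdentity, rec.LeaseDurationSeconds), nil
	}
	return "", nil
}

func main() {
	annotations := map[string]string{
		leaderAnnotationKey: `{"holderIdentity":"example-holder","leaseDurationSeconds":15}`,
	}
	// The same helper serves both annotation-based lock types.
	for _, kind := range []string{"configmap", "endpoints"} {
		msg, err := checkAnnotatedLock(kind, "example-lock", annotations)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(msg)
	}
}
```

In the real test, the two list loops would each call the helper with the object's name and annotations and append any non-empty message to `shortLeases`.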

o.Expect(err).NotTo(o.HaveOccurred())
for _, lease := range leases.Items {
if lease.Spec.LeaseDurationSeconds != nil && *lease.Spec.LeaseDurationSeconds < 90 {
shortLeases = append(shortLeases, fmt.Sprintf("configmap/%s used by %q, has too short a lease to span 60s kube-apiserver disruption. Try 99s leaseDuration with 13s retryPeriod and a 17s renewDeadline. Be sure you have the graceful release properly wired.", lease.Name, lease.Spec.HolderIdentity))

leases, not configmaps

openshift-ci bot commented Jun 10, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k

openshift-ci bot commented Jun 25, 2021

@deads2k: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-gcp-disruptive | 5b0e26a | link | /test e2e-gcp-disruptive |
| ci/prow/e2e-gcp | 5b0e26a | link | /test e2e-gcp |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | 5b0e26a | link | /test e2e-metal-ipi-ovn-ipv6 |
| ci/prow/e2e-gcp-upgrade | 5b0e26a | link | /test e2e-gcp-upgrade |
| ci/prow/e2e-aws-fips | 5b0e26a | link | /test e2e-aws-fips |
| ci/prow/e2e-aws-serial | 5b0e26a | link | /test e2e-aws-serial |
| ci/prow/e2e-aws-disruptive | 5b0e26a | link | /test e2e-aws-disruptive |
| ci/prow/e2e-aws-jenkins | 5b0e26a | link | /test e2e-aws-jenkins |


shortLeases := []string{}

configMaps, err := kubeClient.CoreV1().ConfigMaps("").List(ctx, metav1.ListOptions{})

How about using a paginated list instead?
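
The paginated pattern being suggested can be sketched without a cluster. In client-go, `metav1.ListOptions` has `Limit` and `Continue` fields and each list response carries a continue token in its metadata; the `page`, `listFunc`, `listAll`, and `fakeList` names below are hypothetical stand-ins for that mechanism, not client-go APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// page is a stand-in for one list response: some items plus an opaque
// continue token that is empty on the last page.
type page struct {
	Items    []string
	Continue string
}

// listFunc is a hypothetical stand-in for a List call that accepts a page
// size and a continue token.
type listFunc func(limit int, continueToken string) (page, error)

// listAll keeps requesting pages until the server returns no continue token.
func listAll(list listFunc, limit int) ([]string, error) {
	if limit <= 0 {
		return nil, errors.New("limit must be positive")
	}
	var all []string
	token := ""
	for {
		p, err := list(limit, token)
		if err != nil {
			return nil, err
		}
		all = append(all, p.Items...)
		if p.Continue == "" {
			return all, nil
		}
		token = p.Continue
	}
}

// fakeList serves a fixed slice in chunks and uses the next start index as
// its continue token. It exists only to exercise listAll.
func fakeList(data []string) listFunc {
	return func(limit int, token string) (page, error) {
		start := 0
		if token != "" {
			if _, err := fmt.Sscanf(token, "%d", &start); err != nil {
				return page{}, fmt.Errorf("bad continue token %q: %w", token, err)
			}
		}
		if start < 0 || start > len(data) {
			return page{}, fmt.Errorf("continue token %q out of range", token)
		}
		end := start + limit
		if end >= len(data) {
			return page{Items: data[start:]}, nil
		}
		return page{Items: data[start:end], Continue: fmt.Sprint(end)}, nil
	}
}

func main() {
	items, err := listAll(fakeList([]string{"a", "b", "c", "d", "e"}), 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(items), "items:", items)
}
```

Listing ConfigMaps across all namespaces can return a large response on a busy cluster, so fetching in bounded chunks keeps each request small. client-go also ships a `k8s.io/client-go/tools/pager` package that wraps this loop.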

}
}

leases, err := kubeClient.CoordinationV1().Leases("").List(ctx, metav1.ListOptions{})

It looks like the kube-apiservers use leases; we could filter those out.

romfreiman commented:

@deads2k will this test run as part of the FIPS suite?

openshift-bot commented:

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci bot added the lifecycle/stale label Oct 18, 2021

romfreiman commented:

@deads2k don't we want to merge this test?

romfreiman commented:

/remove-lifecycle stale

openshift-ci bot removed the lifecycle/stale label Oct 31, 2021
openshift-ci bot added the lifecycle/stale label Jan 29, 2022

openshift-bot commented:

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci bot removed the lifecycle/stale label Feb 28, 2022
openshift-ci bot added the lifecycle/rotten label Feb 28, 2022

openshift-bot commented:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci bot closed this Mar 30, 2022

openshift-ci bot commented Mar 30, 2022

@openshift-bot: Closed this PR.

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

4 participants