UPSTREAM: 80004: Prefer to delete doubled-up pods of a ReplicaSet by Miciah · Pull Request #23806 · openshift/origin

Miciah · 2019-09-16T21:44:04Z

When scaling down a ReplicaSet, delete doubled up replicas first, where a "doubled up replica" is defined as one that is on the same node as an active replica belonging to a related ReplicaSet. ReplicaSets are considered "related" if they have a common controller (typically a Deployment).

My intention with this change is to make a rolling update of a Deployment scale down the old ReplicaSet as it scales up the new ReplicaSet by deleting pods from the old ReplicaSet that are colocated with ready pods of the new ReplicaSet. This change in the behavior of rolling updates can be combined with pod affinity rules to preserve the locality of a Deployment's pods over rollout.

A specific scenario that benefits from this change is when a Deployment's pods are exposed by a Service that has type "LoadBalancer" and external traffic policy "Local". In this scenario, the load balancer uses health checks to determine whether it should forward traffic for the Service to a particular node. If the node has no local endpoints for the Service, the health check will fail for that node. Eventually, the load balancer will stop forwarding traffic to that node. In the meantime, the service proxy drops traffic for that Service. Thus, in order to reduce the risk of dropping traffic during a rolling update, it is desirable to preserve node locality of endpoints.

vendor/k8s.io/kubernetes/pkg/controller/controller_utils.go (ActivePodsWithRanks): New type to sort pods using a given ranking.
vendor/k8s.io/kubernetes/pkg/controller/replicaset/replica_set.go (getReplicaSetsWithSameController): New method. Given a ReplicaSet, return all ReplicaSets that have the same owner.
(manageReplicas): Call getIndirectlyRelatedPods, and pass its result to getPodsToDelete.
(getIndirectlyRelatedPods): New method. Given a ReplicaSet, return all pods that are owned by any ReplicaSet with the same owner.
(getPodsToDelete): Add an argument for related pods. Use related pods and ActivePodsWithRanks to take into account whether the pod is doubled up.
vendor/k8s.io/kubernetes/pkg/controller/replicaset/replica_set_test.go (newReplicaSet): Set OwnerReferences on the ReplicaSet.
(newPod): Set a unique UID on the pod.
(byName): New type to sort pods by name.
(TestRelatedPodsLookup): New test for getIndirectlyRelatedPods.
(TestGetPodsToDelete): Add a "ready and colocated with another ready pod vs not colocated, diff < len(pods)" test case to verify that a doubled-up pod gets preferred for deletion. Augment the "various pod phases and conditions, diff = len(pods)" test case to ensure that scale-down still selects doubled-up pods if there are not enough other pods to scale down. Augment the "various pod phases and conditions, diff < len(pods)" test case to ensure that not-ready pods are preferred over ready but doubled-up pods.
vendor/k8s.io/kubernetes/pkg/controller/replicaset/BUILD: Regenerate.
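
To illustrate the "related pods" lookup described for getReplicaSetsWithSameController and getIndirectlyRelatedPods, here is a minimal sketch. It is not the code from this PR: the `replicaSet` and `pod` structs, their fields, and both function names are simplified, hypothetical stand-ins for the real Kubernetes types and listers, and returning nothing for a ReplicaSet without a controller is an assumption of the sketch.

```go
package main

import "fmt"

// replicaSet is a hypothetical, simplified stand-in for a ReplicaSet.
type replicaSet struct {
	Name          string
	UID           string
	ControllerUID string // UID of the controlling owner (e.g. a Deployment); empty if none
}

// pod is a hypothetical, simplified stand-in for a Pod.
type pod struct {
	Name     string
	OwnerUID string // UID of the ReplicaSet that owns this pod
}

// replicaSetsWithSameController returns every ReplicaSet in all (including rs
// itself) that shares rs's controller. Assumption: a ReplicaSet without a
// controller has no related ReplicaSets.
func replicaSetsWithSameController(rs replicaSet, all []replicaSet) []replicaSet {
	if rs.ControllerUID == "" {
		return nil
	}
	var related []replicaSet
	for _, other := range all {
		if other.ControllerUID == rs.ControllerUID {
			related = append(related, other)
		}
	}
	return related
}

// indirectlyRelatedPods returns all pods owned by any ReplicaSet that has the
// same controller as rs.
func indirectlyRelatedPods(rs replicaSet, allRS []replicaSet, allPods []pod) []pod {
	owners := map[string]bool{}
	for _, r := range replicaSetsWithSameController(rs, allRS) {
		owners[r.UID] = true
	}
	var out []pod
	for _, p := range allPods {
		if owners[p.OwnerUID] {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	oldRS := replicaSet{Name: "web-old", UID: "rs-old", ControllerUID: "dep-1"}
	allRS := []replicaSet{
		oldRS,
		{Name: "web-new", UID: "rs-new", ControllerUID: "dep-1"},
		{Name: "other", UID: "rs-x", ControllerUID: "dep-2"},
	}
	allPods := []pod{
		{Name: "p1", OwnerUID: "rs-old"},
		{Name: "p2", OwnerUID: "rs-new"},
		{Name: "p3", OwnerUID: "rs-x"},
	}
	for _, p := range indirectlyRelatedPods(oldRS, allRS, allPods) {
		fmt.Println(p.Name)
	}
}
```

In this sketch the related set includes the pods of the ReplicaSet being scaled down as well as those of its siblings, matching the description "all pods that are owned by any ReplicaSet with the same owner".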

When scaling down a ReplicaSet, delete doubled up replicas first, where a "doubled up replica" is defined as one that is on the same node as an active replica belonging to a related ReplicaSet. ReplicaSets are considered "related" if they have a common controller (typically a Deployment).

The intention of this change is to make a rolling update of a Deployment scale down the old ReplicaSet as it scales up the new ReplicaSet by deleting pods from the old ReplicaSet that are colocated with ready pods of the new ReplicaSet. This change in the behavior of rolling updates can be combined with pod affinity rules to preserve the locality of a Deployment's pods over rollout.

A specific scenario that benefits from this change is when a Deployment's pods are exposed by a Service that has type "LoadBalancer" and external traffic policy "Local". In this scenario, the load balancer uses health checks to determine whether it should forward traffic for the Service to a particular node. If the node has no local endpoints for the Service, the health check will fail for that node. Eventually, the load balancer will stop forwarding traffic to that node. In the meantime, the service proxy drops traffic for that Service. Thus, in order to reduce the risk of dropping traffic during a rolling update, it is desirable to preserve node locality of endpoints.

* vendor/k8s.io/kubernetes/pkg/controller/controller_utils.go (ActivePodsWithRanks): New type to sort pods using a given ranking.
* vendor/k8s.io/kubernetes/pkg/controller/controller_utils_test.go (TestSortingActivePodsWithRanks): New test for ActivePodsWithRanks.
* vendor/k8s.io/kubernetes/pkg/controller/replicaset/replica_set.go (getReplicaSetsWithSameController): New method. Given a ReplicaSet, return all ReplicaSets that have the same owner.
(manageReplicas): Call getIndirectlyRelatedPods, and pass its result to getPodsToDelete.
(getIndirectlyRelatedPods): New method. Given a ReplicaSet, return all pods that are owned by any ReplicaSet with the same owner.
(getPodsToDelete): Add an argument for related pods. Use related pods and the new getPodsRankedByRelatedPodsOnSameNode function to take into account whether the pod is doubled up when sorting pods for deletion.
(getPodsRankedByRelatedPodsOnSameNode): New function. Return an ActivePodsWithRanks value that wraps the given slice of pods and computes ranks where each pod's rank is equal to the number of active related pods that are colocated on the same node.
* vendor/k8s.io/kubernetes/pkg/controller/replicaset/replica_set_test.go (newReplicaSet): Set OwnerReferences on the ReplicaSet.
(newPod): Set a unique UID on the pod.
(byName): New type to sort pods by name.
(TestRelatedPodsLookup): New test for getIndirectlyRelatedPods.
(TestGetPodsToDelete): Augment the "various pod phases and conditions, diff = len(pods)" test case to ensure that scale-down still selects doubled-up pods if there are not enough other pods to scale down. Add a "various pod phases and conditions, diff = len(pods), relatedPods empty" test case to verify that getPodsToDelete works even if related pods could not be determined. Add a "ready and colocated with another ready pod vs not colocated, diff < len(pods)" test case to verify that a doubled-up pod gets preferred for deletion. Augment the "various pod phases and conditions, diff < len(pods)" test case to ensure that not-ready pods are preferred over ready but doubled-up pods.
* vendor/k8s.io/kubernetes/pkg/controller/replicaset/BUILD: Regenerate.
* vendor/k8s.io/kubernetes/test/e2e/apps/deployment.go (testRollingUpdateDeploymentWithLocalTrafficLoadBalancer): New end-to-end test. Create a deployment with a rolling update strategy and affinity rules and a load balancer with "Local" external traffic policy, and verify that the set of nodes with local endpoints for the service remains unchanged during rollouts.
(setAffinity): New helper, used by testRollingUpdateDeploymentWithLocalTrafficLoadBalancer.
* vendor/k8s.io/kubernetes/test/e2e/apps/types.go (AgnhostImageName) (AgnhostImage): New constants for the agnhost image.
* vendor/k8s.io/kubernetes/test/e2e/framework/service/jig.go (GetEndpointNodes): Factor building the set of node names out...
(GetEndpointNodeNames): ...into this new method.
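
The ranking idea behind getPodsRankedByRelatedPodsOnSameNode and getPodsToDelete can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the vendored implementation: the `pod` struct and the `rankByColocation` and `podsToDelete` functions are hypothetical, `related` is assumed to already contain only active pods, and the ordering models just the two preferences named in the test cases above (not-ready pods first, then doubled-up pods), whereas the real sort may consider further criteria.

```go
package main

import (
	"fmt"
	"sort"
)

// pod is a hypothetical, simplified stand-in for a Kubernetes Pod.
type pod struct {
	Name  string
	Node  string
	Ready bool
}

// rankByColocation returns one rank per candidate: the number of related pods
// on the candidate's node. Assumption: related holds only active pods.
func rankByColocation(candidates, related []pod) []int {
	perNode := map[string]int{}
	for _, p := range related {
		if p.Node != "" {
			perNode[p.Node]++
		}
	}
	ranks := make([]int, len(candidates))
	for i, p := range candidates {
		ranks[i] = perNode[p.Node]
	}
	return ranks
}

// podsToDelete picks diff candidates, preferring not-ready pods and then pods
// with a higher colocation rank (that is, doubled-up pods).
func podsToDelete(candidates, related []pod, diff int) []pod {
	ranks := rankByColocation(candidates, related)
	idx := make([]int, len(candidates))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool {
		pa, pb := candidates[idx[a]], candidates[idx[b]]
		if pa.Ready != pb.Ready {
			return !pa.Ready // not-ready pods sort first
		}
		return ranks[idx[a]] > ranks[idx[b]] // then higher rank first
	})
	if diff < 0 {
		diff = 0
	}
	if diff > len(idx) {
		diff = len(idx)
	}
	out := make([]pod, 0, diff)
	for _, i := range idx[:diff] {
		out = append(out, candidates[i])
	}
	return out
}

func main() {
	oldPods := []pod{
		{Name: "old-a", Node: "node1", Ready: true},
		{Name: "old-b", Node: "node2", Ready: true},
	}
	newPods := []pod{{Name: "new-a", Node: "node2", Ready: true}}
	for _, p := range podsToDelete(oldPods, newPods, 1) {
		fmt.Println("delete:", p.Name)
	}
}
```

In the usage above, old-b shares node2 with a pod of the new ReplicaSet, so it is the one chosen; node1 keeps its only endpoint, which is the locality property the commit message is after.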

Miciah · 2019-10-18T19:23:23Z

Looks like problems pulling from the internal registry.
/test e2e-aws-upgrade

ironcladlou · 2019-10-23T23:47:11Z

@smarterclayton PTAL, we need to get this in soon.

ironcladlou · 2019-10-23T23:47:30Z

/lgtm

smarterclayton · 2019-10-24T03:15:23Z

/retest

smarterclayton · 2019-10-24T03:15:43Z

/approve

openshift-ci-robot · 2019-10-24T03:19:14Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ironcladlou, Miciah, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~vendor/OWNERS~~ [smarterclayton]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-bot · 2019-10-24T03:50:47Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2019-10-24T05:34:33Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2019-10-24T08:50:44Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2019-10-24T10:20:07Z

/retest

Please review the full test history for this PR and help us cut down flakes.

tnozicka · 2019-10-29T12:48:58Z

this broke https://prow.svc.ci.openshift.org/job-history/origin-ci-test/logs/release-openshift-openshift-ansible-e2e-aws-scaleup-rhel7-4.3

p0lyn0mial · 2019-10-29T13:01:44Z

I lost that commit during rebase, let me know if we want to bring it back.

Miciah · 2019-10-29T17:30:25Z

> this broke https://prow.svc.ci.openshift.org/job-history/origin-ci-test/logs/release-openshift-openshift-ansible-e2e-aws-scaleup-rhel7-4.3

Can you be more specific? What failure are you saying this PR caused, and why?

Looking at a half dozen failures, the most common symptom that I see is that new nodes do not become ready but rather report, "runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network". Is that the error you are saying this PR caused? How did you trace it to this PR?

Miciah · 2019-10-29T22:42:01Z

Ah, are you referring to https://bugzilla.redhat.com/show_bug.cgi?id=1765756?

tnozicka · 2019-10-30T08:33:42Z

> Ah, are you referring to https://bugzilla.redhat.com/show_bug.cgi?id=1765756?

correct, https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-openshift-ansible-e2e-aws-scaleup-rhel7-4.3/184#1:build-log.txt%3A5981

Miciah · 2019-10-30T18:00:19Z

#24057 brings this commit back and adds a fix for https://bugzilla.redhat.com/show_bug.cgi?id=1765756.

openshift-ci-robot added the size/L label Sep 16, 2019

openshift-ci-robot requested review from deads2k and smarterclayton September 16, 2019 21:44

Miciah force-pushed the UPSTREAM-80004-prefer-to-delete-doubled-up-pods-of-a-replicaset branch from 7a53570 to 507bcd8 Compare October 2, 2019 02:00

openshift-ci-robot added size/XL and removed size/L labels Oct 2, 2019

Miciah force-pushed the UPSTREAM-80004-prefer-to-delete-doubled-up-pods-of-a-replicaset branch 4 times, most recently from cb2160e to 3a71e71 Compare October 7, 2019 16:47

Miciah force-pushed the UPSTREAM-80004-prefer-to-delete-doubled-up-pods-of-a-replicaset branch from 3a71e71 to aa70dc2 Compare October 7, 2019 22:20

openshift-ci-robot assigned ironcladlou Oct 23, 2019

openshift-ci-robot added the lgtm label Oct 23, 2019

openshift-ci-robot added the approved label Oct 24, 2019

openshift-merge-robot merged commit 386cd8c into openshift:master Oct 24, 2019

tnozicka referenced this pull request Oct 29, 2019

bump(origin-4.3-kubernetes-1.16.2)

4756173

p0lyn0mial mentioned this pull request Oct 29, 2019

UPSTREAM: 80004: Prefer to delete doubled-up pods of a ReplicaSet #24047

Closed

Miciah mentioned this pull request Nov 1, 2019

Bug 1765756: UPSTREAM: 80004: Prefer to delete doubled-up pods of a ReplicaSet #24057

Merged

8 participants