
Enable deployment strategies to tolerate concurrent external scaling events #14062

Closed · ironcladlou opened this issue Sep 16, 2015 · 13 comments
Labels: area/app-lifecycle, priority/backlog

Comments

@ironcladlou
Contributor

The rolling updater should be tolerant of external scaling events concurrent with the update, but lacks the required context to reevaluate its internal state. As a result, updates can conflict with manual scaling and the autoscaler in unpredictable ways.

The rolling updater computes the desired replica count for the target replication controller only once, during initialization, based on a snapshot of the target replication controller's state. If the target's spec.replicas is modified externally (via scale operations) during the rolling update, the rolling updater cannot deduce the intent of the change, and so cannot recompute the desired replica count for the replication controller.

A new replication controller field such as spec.trendingToReplicas could be added and considered a formalization of the kubectl.kubernetes.io/desired-replicas annotation. This field would provide a means for deployment strategies and scaling systems to set a desired replica count for the replication controller itself in the context of replication controller transitions (such as a rolling update).

The rolling updater (and any other strategies) would drive the replication controller spec.replicas towards spec.trendingToReplicas. Any changes to spec.trendingToReplicas could safely result in recalculating the scaling internals.

Scaling systems and other API calls that ultimately want to mutate spec.replicas could then work safely with deployment strategies: if spec.trendingToReplicas is set, update it and let rolling updates (or other deployment strategies) adapt; if it is unset, update the RC's spec.replicas directly.

cc @nikhiljindal @bgrant0607 @smarterclayton @Kargakis

@bgrant0607
Member

Related: #11874

@bgrant0607 bgrant0607 added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Sep 16, 2015
@bgrant0607 bgrant0607 added this to the v1.2-candidate milestone Sep 16, 2015
@bgrant0607
Member

Also relevant:
#2863 (comment)
#1743
#1629

In the case of Deployment, the desired replicas needn't be stored in any of the underlying ReplicationControllers.

If one were to attach the desired replicas to a single RC, one would need to handle the case of creating a new RC with a new desired replica count while an existing rollout is in flight.

A new field would mean that a newer client wouldn't work with older apiservers, and the new rolling update wouldn't be compatible with older clients performing scaling.

Keeping track of the changes made by the rolling update, rather than the desired replica count, would enable the desired replicas to be calculated from the underlying RCs in the presence of simultaneous scaling. Here's an example to see how complicated this would be (a short sketch of the arithmetic follows the walkthrough).

  1. Start with 10 replicas in RC_orig.
  2. Start rolling update with MaxUnavailable = 2, MaxSurge = 3. RC_new has 0 replicas. RC_orig and RC_new both have deltas of 0.
  3. Increase RC_new to 3 replicas. Its delta is +3.
  4. Decrease RC_orig to 8 replicas. Its delta is -2.
  5. Increase RC_new to 5 replicas. Its delta is +5. The net delta can be computed by summing the deltas: +5 + -2 = 3. The number of available replicas is determined by monitoring available replicas from all RCs, currently 8. Min. available replicas can be computed by summing the number of replicas, subtracting the net delta and subtracting MaxUnavailable: 5 + 8 - 3 - 2 = 8.
  6. RC_orig is still the larger RC, so let's say it's scaled to 9 replicas, which increased the total desired replicas (not explicitly recorded) to 11. Its delta is still -2. Now min. available replicas is 5 + 9 - 3 - 2 = 9, current available replicas is still 8, and the current surge is equal to the net delta, 3. So the update needs to wait. Seems reasonable to me.
  7. When, say, 2 of the new replicas become available, decrease RC_orig from 9 to 7 replicas. Its delta becomes -4. The net delta is +5 + -4 = 1.
  8. Let's say the client is killed at this point. In the meantime, the remaining pods from RC_new become available, as determined by Ready transition times.
  9. The update is resumed. RC_new has 5 replicas and delta +5. RC_orig has 7 replicas and delta -4. Net delta is +5 - 4 = 1. Number of replicas is 5 + 7 = 12, which are all available. Current surge is still 1, which makes sense, since 12 - 1 = 11. RC_new can be updated to 7 replicas, delta +7, increasing the net delta to 3 again.
  10. Number available is 12 and min. number available is 7 + 7 - 3 - 2 = 9. RC_orig can be updated to 4 replicas, delta -7. Number available is 9. Net delta is +7 + -7 = 0. Total replicas is 7 + 4 = 11, so that matches.
  11. RC_new can be updated to 10 replicas, delta +10. Total replicas is 14. Net delta is +10 + -7 = 3.
  12. Let's say 2 more replicas become available. RC_orig can be updated to 2 replicas, delta -9. Net delta is +10 + -9 = 1. Total replicas is 10 + 2 = 12. Available replicas is still 9.
  13. RC_new can be updated to 11 replicas, which is total replicas - net delta, and delta +11.
  14. When 2 more replicas become available, RC_orig can be updated to 0.

Done.

Seems to work.

@ironcladlou
Contributor Author

Looks like you're right; delta tracking would do the trick. Thanks for the thorough analysis.

@nikhiljindal
Contributor

How are we keeping track of the deltas? Annotations on the RCs?

@nikhiljindal
Contributor

cc @jszczepkowski, as he is interested in getting Deployments to work with the autoscaler.

@jszczepkowski
Contributor

When working with Deployments, the Horizontal Pod Autoscaler should resize the Deployment, not an RC that is part of it. So the situation @ironcladlou mentioned shouldn't happen.

@ironcladlou
Contributor Author

Can this issue be closed? Sounds like it's not going to be a problem.

@bgrant0607
Member

I don't think this can be closed unless/until we delete the replica-count annotations from the rolling-update implementation.

@bgrant0607
Member

We need to decouple scaling from rolling update and get rid of original-replicas. The solution to #14669 made rolling updates even less tolerant of simultaneous scaling. Rolling update shouldn't change the number of replicas.

@bgrant0607
Member

Deployment also needs to work with simultaneous scaling and rolling update. It shouldn't generate a new RC just to scale.

@bgrant0607
Member

We need these annotations in order to facilitate re-adoption after orphaning.

@ironcladlou
Contributor Author

Doesn't the new deployment controller's rolling update design invalidate my original issue? The autoscaler (and scaling endpoint) should be mutating only the deployment spec replicas (not the RCs directly). The reconciler will consider the latest deployment spec replicas each interval.

@0xmichalis
Contributor

The autoscaler (and scaling endpoint) should be mutating only the deployment spec replicas (not the RCs directly). The reconciler will consider the latest deployment spec replicas each interval.

I think this is correct.

@bgrant0607 bgrant0607 modified the milestones: v1.2-candidate, v1.2 Nov 19, 2015
@bgrant0607 bgrant0607 modified the milestones: v1.2-candidate, v1.2 Dec 1, 2015
@bgrant0607 bgrant0607 added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Dec 1, 2015