
Enable deployment strategies to tolerate concurrent external scaling events #14062

Closed · ironcladlou opened this issue Sep 16, 2015 · 13 comments
Labels: area/app-lifecycle, priority/backlog

Comments

@ironcladlou
Contributor

The rolling updater should be tolerant of external scaling events concurrent with the update, but lacks the required context to reevaluate its internal state. As a result, updates can conflict with manual scaling and the autoscaler in unpredictable ways.

The rolling updater computes the desired replica count for the target replication controller only once, during initialization, based on a snapshot of the target replication controller's state. If the target's spec.replicas is modified externally (via scale operations) during the rolling update, the rolling updater cannot deduce the intent of the change, and so cannot recompute the desired replica count for the replication controller.

A new replication controller field such as spec.trendingToReplicas could be added and considered a formalization of the kubectl.kubernetes.io/desired-replicas annotation. This field would provide a means for deployment strategies and scaling systems to set a desired replica count for the replication controller itself in the context of replication controller transitions (such as a rolling update).

The rolling updater (and any other strategies) would drive the replication controller spec.replicas towards spec.trendingToReplicas. Any changes to spec.trendingToReplicas could safely result in recalculating the scaling internals.

Scaling systems and other API calls that ultimately want to mutate spec.replicas could then work safely with deployment strategies: if spec.trendingToReplicas is set, update it and let rolling updates (or other deployment strategies) adapt; if it is unset, update the RC's spec.replicas directly.

cc @nikhiljindal @bgrant0607 @smarterclayton @Kargakis

@bgrant0607
Member

Related: #11874

@bgrant0607 bgrant0607 added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Sep 16, 2015
@bgrant0607 bgrant0607 added this to the v1.2-candidate milestone Sep 16, 2015
@bgrant0607
Member

Also relevant:
#2863 (comment)
#1743
#1629

In the case of Deployment, the desired replicas needn't be stored in any of the underlying ReplicationControllers.

If one were to attach the desired replicas to a single RC, one would need to handle the case of creating a new RC with a new desired replica count while an existing rollout is in flight.

A new field would mean that a newer client wouldn't work with older apiservers, and the new rolling update wouldn't be compatible with older clients performing scaling.

Keeping track of the changes made by the rolling update, rather than the desired replica count, would enable the desired replicas to be calculated from the underlying RCs in the presence of simultaneous scaling. Here's an example to see how complicated this would be (a short sketch of the arithmetic follows the walkthrough).

  1. Start with 10 replicas in RC_orig.
  2. Start rolling update with MaxUnavailable = 2, MaxSurge = 3. RC_new has 0 replicas. RC_orig and RC_new both have deltas of 0.
  3. Increase RC_new to 3 replicas. Its delta is +3.
  4. Decrease RC_orig to 8 replicas. Its delta is -2.
  5. Increase RC_new to 5 replicas. Its delta is +5. The net delta can be computed by summing the deltas: +5 + -2 = 3. The number of available replicas is determined by monitoring available replicas from all RCs, currently 8. Min. available replicas can be computed by summing the number of replicas, subtracting the net delta and subtracting MaxUnavailable: 5 + 8 - 3 - 2 = 8.
  6. RC_orig is still the larger RC, so let's say it's scaled to 9 replicas, which increased the total desired replicas (not explicitly recorded) to 11. Its delta is still -2. Now min. available replicas is 5 + 9 - 3 - 2 = 9, current available replicas is still 8, and the current surge is equal to the net delta, 3. So the update needs to wait. Seems reasonable to me.
  7. When, say, 2 of the new replicas become available, decrease RC_orig from 9 to 7 replicas. Its delta becomes -4. The net delta is +5 + -4 = 1.
  8. Let's say the client is killed at this point. In the meantime, the remaining pods from RC_new become available, as determined by Ready transition times.
  9. The update is resumed. RC_new has 5 replicas and delta +5. RC_orig has 7 replicas and delta -4. Net delta is +5 - 4 = 1. Number of replicas is 5 + 7 = 12, which are all available. Current surge is still 1, which makes sense, since 12 - 1 = 11. RC_new can be updated to 7 replicas, delta +7, increasing the net delta to 3 again.
  10. Number available is 12 and min. number available is 7 + 7 - 3 - 2 = 9. RC_orig can be updated to 4 replicas, delta -7. Number available is 9. Net delta is +7 + -7 = 0. Total replicas is 7 + 4 = 11, so that matches.
  11. RC_new can be updated to 10 replicas, delta +10. Total replicas is 14. Net delta is +10 + -7 = 3.
  12. Let's say 2 more replicas become available. RC_orig can be updated to 2 replicas, delta -9. Net delta is +10 + -9 = 1. Total replicas is 10 + 2 = 12. Available replicas is still 9.
  13. RC_new can be updated to 11 replicas, which is total replicas - net delta, and delta +11.
  14. When 2 more replicas become available, RC_orig can be updated to 0.

Done.

Seems to work.

@ironcladlou
Contributor Author

Looks like you're right; delta tracking would do the trick. Thanks for the thorough analysis.

@nikhiljindal
Contributor

How are we keeping track of the deltas? Annotations on the RCs?

@nikhiljindal
Contributor

cc @jszczepkowski, as he is interested in getting Deployments to work with the autoscaler.

@jszczepkowski
Contributor

When working with Deployments, the Horizontal Pod Autoscaler should resize the Deployment, not an RC that is part of it. So the situation @ironcladlou mentioned shouldn't happen.

@ironcladlou
Contributor Author

Can this issue be closed? Sounds like it's not going to be a problem.

@bgrant0607
Member

I don't think this can be closed unless/until we delete the replica-count annotations from the rolling-update implementation.

@bgrant0607
Member

We need to decouple scaling from rolling update and get rid of original-replicas. The solution to #14669 made rolling updates even less tolerant of simultaneous scaling. Rolling update shouldn't change the number of replicas.

@bgrant0607
Member

Deployment also needs to work with simultaneous scaling and rolling update. It shouldn't generate a new RC just to scale.

@bgrant0607
Member

We need these annotations in order to facilitate re-adoption after orphaning.

@ironcladlou
Contributor Author

Doesn't the new deployment controller's rolling update design invalidate my original issue? The autoscaler (and scaling endpoint) should be mutating only the deployment spec replicas (not the RCs directly). The reconciler will consider the latest deployment spec replicas each interval.

@0xmichalis
Contributor

The autoscaler (and scaling endpoint) should be mutating only the deployment spec replicas (not the RCs directly). The reconciler will consider the latest deployment spec replicas each interval.

I think this is correct.

@bgrant0607 bgrant0607 modified the milestones: v1.2-candidate, v1.2 Nov 19, 2015
@bgrant0607 bgrant0607 modified the milestones: v1.2-candidate, v1.2 Dec 1, 2015
@bgrant0607 bgrant0607 added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Dec 1, 2015