roundrobin: randomize starting address when rebuilding the picker #2579
Conversation
cc @ejona86 @markdroth FYI
The more I think about this, the less convinced I am that randomization is the right behavior. As we've discussed, there are two cases here: (1) the picker is rebuilt because an individual subchannel changed state, and (2) the picker is rebuilt because the resolver or balancer delivered a new address list.
In case 2, I think starting at zero is fine. Starting at a random index could be harmful if there is a small rate of very expensive RPCs and a resolver or balancer gave us addresses in a particular order and expected us to follow that order. Admittedly, that's probably a rare case, so we could probably get away with using a random index here, but since starting at zero is probably easier to implement than randomizing, I still think it's the better choice.

Case 1 is the more interesting case. Starting at a random index is strictly better than starting over at zero, for the reasons you note in this PR. However, I still think the right behavior is to keep going from the index we were at previously. Otherwise, any time an individual backend is flapping, we will be constantly skipping randomly throughout the list. Even worse, all clients will be doing that random skipping, which means that round_robin winds up degenerating into pick_random.
That doesn't sound that bad, as a degenerate case. Being the literal worst case, degenerating to pick_random... meh. Like, I have trouble caring. To change that behavior would require a larger change in how subchannels are passed to the picker. So it sounds like this PR should still be merged, even if it is expected to be replaced later.
```
@@ -47,12 +48,19 @@ type rrPickerBuilder struct{}

func (*rrPickerBuilder) Build(readySCs map[resolver.Address]balancer.SubConn) balancer.Picker {
	grpclog.Infof("roundrobinPicker: newPicker called with readySCs: %v", readySCs)
	if len(readySCs) == 0 {
		return base.NewErrPicker(balancer.ErrNoSubConnAvailable)
	}
	var scs []balancer.SubConn
	for _, sc := range readySCs {
```
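The excerpt above cuts off before the added lines. Conceptually, the change amounts to picking a random starting index whenever the picker is rebuilt; below is a rough sketch under that assumption (types simplified, and math/rand standing in for whatever random source the real code uses), not the exact diff:

```go
package roundrobin

import (
	"context"
	"math/rand"
	"sync"

	"google.golang.org/grpc/balancer"
	"google.golang.org/grpc/balancer/base"
	"google.golang.org/grpc/resolver"
)

// Simplified stand-ins for the types in the surrounding file.
type rrPickerBuilder struct{}

type rrPicker struct {
	mu       sync.Mutex
	subConns []balancer.SubConn
	next     int
}

// Build collects the ready subchannels and starts the round-robin position at
// a random index instead of 0, so frequent rebuilds (e.g. a flapping backend)
// do not funnel every client's first picks to the same backend.
func (*rrPickerBuilder) Build(readySCs map[resolver.Address]balancer.SubConn) balancer.Picker {
	if len(readySCs) == 0 {
		return base.NewErrPicker(balancer.ErrNoSubConnAvailable)
	}
	var scs []balancer.SubConn
	for _, sc := range readySCs {
		scs = append(scs, sc)
	}
	return &rrPicker{
		subConns: scs,
		next:     rand.Intn(len(scs)), // randomize the starting address
	}
}

// Pick cycles through the subchannels from the randomized starting point.
func (p *rrPicker) Pick(ctx context.Context, opts balancer.PickOptions) (balancer.SubConn, func(balancer.DoneInfo), error) {
	p.mu.Lock()
	sc := p.subConns[p.next]
	p.next = (p.next + 1) % len(p.subConns)
	p.mu.Unlock()
	return sc, nil, nil
}
```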
So Go is iterating through the subchannels in a shuffled order?
Effectively, yes. They are ordered by how they happen to be in the readySCs map, keyed by the address (as a string, plus a "type" and a pointer to optional user metadata). This is also not right, but I do think randomizing here is better than not randomizing, so we should merge this PR for now and file an issue to improve further.
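For reference, Go deliberately leaves map iteration order unspecified, which is why ranging over readySCs effectively shuffles the subchannels on every rebuild; a toy illustration (the addresses are made up):

```go
package main

import "fmt"

func main() {
	// Iteration order over a map is unspecified and varies between runs
	// (and often between iterations), so nothing here is stable.
	readyAddrs := map[string]int{
		"10.0.0.1:50051": 0,
		"10.0.0.2:50051": 1,
		"10.0.0.3:50051": 2,
	}
	for i := 0; i < 3; i++ {
		for addr := range readyAddrs {
			fmt.Print(addr, " ")
		}
		fmt.Println()
	}
}
```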
Yes, I didn't want to block this PR.
A task flapping is not that unusual of a case. I think this is common enough that it's something we should handle gracefully.
I'm not sure exactly how Go (or Java) passes the subchannels to the picker, but it's not obvious to me why that has to change. Presumably, the picker stores the index into its list. So couldn't we have the client channel let the new picker get some state from the old picker before swapping the new picker in? That would give us the ability to retain the last index used in the previous picker.
If this PR only adds randomization for case 1, not case 2, then I agree that it can go forward, since it's a strict improvement over resetting to 0 every time a subchannel's state changes. But we should follow up with a better fix in a separate PR.
The problem is that the lists will be of different sizes. Go and Java pass a list of size equal to the number of ready subchannels. In addition, because there is no locking around uses of the Picker the old and new picker are used simultaneously (or rather, old calls to the old picker may still be in progress when new calls to the new picker begin). So "get state from the old picker" is probably not a solution like it is in C. (Instead, we would probably share data between the two, but we have a GC so that doesn't cause lifetime issues.)
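A rough sketch of what "share data between the two pickers" could look like; sharedRRPicker and newSharedRRPicker are hypothetical names, not existing grpc-go API, and this assumes the counter is the only state worth carrying over:

```go
package roundrobin

import (
	"sync/atomic"

	"google.golang.org/grpc/balancer"
)

// sharedRRPicker keeps its position in a counter that is shared across picker
// generations, so rebuilding the picker does not reset the position even
// though each generation has its own (possibly differently sized) list.
type sharedRRPicker struct {
	subConns []balancer.SubConn
	next     *uint64 // shared with the previous and next picker generations
}

func newSharedRRPicker(scs []balancer.SubConn, prev *sharedRRPicker) *sharedRRPicker {
	next := new(uint64)
	if prev != nil {
		next = prev.next // keep the old position instead of starting over
	}
	return &sharedRRPicker{subConns: scs, next: next}
}

// pick advances the counter atomically; the old and new pickers may be in use
// concurrently, and the GC keeps the shared counter alive for both.
func (p *sharedRRPicker) pick() balancer.SubConn {
	n := atomic.AddUint64(p.next, 1) - 1
	return p.subConns[n%uint64(len(p.subConns))]
}
```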
It would randomize in both cases. Case 2 is rare; while I can agree it'd be good to start at the beginning, I don't think it practically matters. The current implementation for case 1 is dire; it degrades to all RPCs being sent to one backend. It seems weird to me to block an obvious "this could cause an outage" fix because of an "occasionally we will do things in a slightly different order, but nobody will probably even notice".
Correct. If maintaining the position in the list is a requirement, probably what I'd do is make it so there actually is only one picker permanently, and update its state atomically/with a lock when a change needs to be made. Either way, I think this is a real improvement and the full fix is going to be lower priority for us. I will file an issue for the follow-up changes. In that, it would be nice if we could define the required/desired/acceptable behaviors for a RR balancer in gRPC in case we can find a user that would like to contribute a fix. Also note that in case 2 we aren't even starting at the top today - as @ejona86 observed, we are shuffling our list of addresses due to the map from addresses to subchannels.
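In the same spirit, a hypothetical sketch of the "only one picker, updated in place" approach; stableRRPicker and its methods are illustrative, not part of grpc-go:

```go
package roundrobin

import (
	"sync"

	"google.golang.org/grpc/balancer"
)

// stableRRPicker is never replaced: the subchannel list is swapped under a
// lock while the round-robin index is retained, so position is not lost when
// the set of ready subchannels changes.
type stableRRPicker struct {
	mu       sync.Mutex
	subConns []balancer.SubConn
	next     int
}

// update installs a new list of ready subchannels without resetting the index.
func (p *stableRRPicker) update(scs []balancer.SubConn) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.subConns = scs
	if len(scs) > 0 {
		p.next %= len(scs) // clamp the old position to the new length
	}
}

// pick returns the next subchannel in round-robin order.
func (p *stableRRPicker) pick() (balancer.SubConn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.subConns) == 0 {
		return nil, balancer.ErrNoSubConnAvailable
	}
	sc := p.subConns[p.next]
	p.next = (p.next + 1) % len(p.subConns)
	return sc, nil
}
```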
And per my observation in #2580, starting at the top doesn't even seem like a reliable thing that can be tested or relied upon by users, given that the first RPC will go to the first-connected backend and not unconditionally to the first address in the list. |