- Author(s): Easwar Swaminathan (@easwars)
- Approver: @markdroth
- Status: Draft
- Implemented in:
- Last updated: 2023-04-20
- Discussion at: https://groups.google.com/g/grpc-io/c/uUf0V5zZvQc
This document describes a couple of changes being made to the `pick_first` LB
policy with regard to:

- Expected behavior when connections to all addresses fail, and
- Support for address order randomization
All gRPC implementations contain a simple load balancing policy named
`pick_first` that can be summarized as follows (a short usage sketch follows
the list):

- It takes a list of addresses from the name resolver and attempts to connect to those addresses one at a time, in order, until it finds one that is reachable.
- Once it finds a reachable address:
  - All RPCs sent on the gRPC channel will be sent to this address.
  - If this connection breaks at a later point in time, `pick_first` will not attempt to reconnect until the application requests that it do so, or makes an RPC.
- If none of the addresses are reachable, it applies an exponential backoff before attempting to reconnect.
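As a point of reference, the Go sketch below shows one way a client can
explicitly select `pick_first` through the default service config.
`pick_first` is already the default policy when no load balancing
configuration is present, so this only makes the choice explicit; the target
address is a placeholder.

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Explicitly select pick_first via the default service config.
	conn, err := grpc.Dial(
		"dns:///lb-demo.example.com:50051", // placeholder target
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"pick_first": {}}]}`),
	)
	if err != nil {
		log.Fatalf("grpc.Dial failed: %v", err)
	}
	defer conn.Close()
	// ... create stubs and issue RPCs on conn; all RPCs go to the single
	// address that pick_first connected to ...
}
```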
There are a few problems with the existing `pick_first` functionality, which
will be described in the following subsections.
When connections to all addresses fail, there are some similarities and some differences between the Java/Go implementations and the C-core implementation.
Similarities include:

- Reporting `TRANSIENT_FAILURE` as the connectivity state to the channel.
- Applying exponential backoff. During the backoff period, `wait_for_ready` RPCs are queued while other RPCs fail.
Differences show up after the backoff period ends:

- C-core remains in `TRANSIENT_FAILURE`, while Java/Go move to `IDLE`.
- C-core attempts to reconnect to the given addresses, while Java/Go rely on the client application to make an RPC or an explicit attempt to connect.
- C-core moves to `READY` only when a connection attempt succeeds.
This behavior of staying in `TRANSIENT_FAILURE` until it can report `READY` is
called sticky TRANSIENT_FAILURE, or sticky-TF.
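To make the distinction observable from application code, the following Go
sketch watches channel state transitions using the experimental `GetState` and
`WaitForStateChange` APIs on `grpc.ClientConn`. With sticky-TF, a channel whose
addresses are all unreachable is seen to remain in `TRANSIENT_FAILURE` rather
than repeatedly flipping back to `CONNECTING` between attempts.

```go
package sketch

import (
	"context"
	"log"

	"google.golang.org/grpc"
)

// watchState logs channel state transitions until ctx is done. With sticky-TF,
// a channel whose addresses are all unreachable stays in TRANSIENT_FAILURE
// instead of flipping back to CONNECTING between connection attempts.
func watchState(ctx context.Context, conn *grpc.ClientConn) {
	state := conn.GetState()
	for {
		log.Printf("channel state: %v", state)
		if !conn.WaitForStateChange(ctx, state) {
			return // ctx was canceled or timed out
		}
		state = conn.GetState()
	}
}
```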
Current `pick_first` implementations that don't provide sticky-TF have the
following shortcomings:

- When none of the received addresses are reachable, client applications
  experience long delays before their RPCs fail. This is because the channel
  does not spend enough time in `TRANSIENT_FAILURE` and goes back to the
  `CONNECTING` state while attempting to reconnect.
- The `priority` LB policy maintains an ordered list of child policies, and
  sends picks to the highest priority child reporting `READY` or `IDLE`. It
  expects child policies to support sticky-TF; if they don't, picks can be sent
  to a higher priority child with no reachable backends, instead of a lower
  priority child that is reporting `READY`. This comes up in xDS in the
  following scenario:
  - A `LOGICAL_DNS` cluster is used under an aggregate cluster, and the
    `LOGICAL_DNS` cluster is not the last cluster in the list.
  - Each cluster under the aggregate cluster is represented as a child policy
    under `priority`, and the leaf policy for a `LOGICAL_DNS` cluster is
    `pick_first`.
  - Without sticky-TF support in `pick_first`, the `priority` LB policy can
    continue to send picks to a higher priority `LOGICAL_DNS` cluster even when
    none of the addresses behind it are reachable, because `pick_first` doesn't
    report `TRANSIENT_FAILURE` as its connectivity state. See gRFC A37 for more
    details on aggregate clusters.
Because `pick_first` sends all requests to the same address, it is often used
for L4 load balancing by randomizing the order of the addresses used by each
client. In general, gRPC expects address ordering to be determined as part of
name resolution, not by the LB policy. For example, DNS servers may randomize
the order of addresses when there are multiple A/AAAA records, and the DNS
resolver in gRPC is expected to perform [RFC-6724][6724] address sorting.
However, there are some cases where DNS cannot randomize the address order,
either because the DNS server does not support that functionality or because it
is defeated by client-side DNS caching. To address such cases, it is desirable
to add a client-side mechanism for randomly shuffling the order of the
addresses.
There are cases where it is desirable to perform L4 load balancing using
`pick_first` when getting addresses via xDS instead of DNS. As a result, we
need a way to configure use of this LB policy via xDS.
Note that client-side address shuffling may be equally desirable in this case, since the xDS server may send the same EDS resource (with the same endpoints in the same order) to all clients.
- gRFC A37: xDS Aggregate and Logical DNS Clusters
- gRFC A52: gRPC xDS Custom Load Balancer Configuration
- gRFC A56: `priority_experimental` LB policy
Specific changes are described in their own subsections below.
Using sticky-TF by default in all `pick_first` implementations would enable us
to overcome the shortcomings described above. This would involve making the
following changes to `pick_first` implementations, once connections to all
addresses fail:
- Report `TRANSIENT_FAILURE` as the connectivity state.
- Attempt to reconnect to the addresses indefinitely until a connection
  succeeds (at which point, they should report `READY`), or there is no RPC
  activity on the channel for the specified `IDLE_TIMEOUT`.
All gRPC implementations should implement `IDLE_TIMEOUT` and have it enabled by
default. A default value of 30 minutes is recommended.
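The Go sketch below illustrates the proposed behavior; it is not any
implementation's actual code. The function parameters (`report`,
`tryConnectAll`, `idleFor`) stand in for internal plumbing of a real
`pick_first` policy, the backoff values are illustrative, and the transition to
`IDLE` after `IDLE_TIMEOUT` is assumed here as the usual channel idleness
behavior.

```go
package sketch

import (
	"time"

	"google.golang.org/grpc/connectivity"
	"google.golang.org/grpc/resolver"
)

// stickyTFLoop sketches the proposed behavior once connections to all
// addresses have failed. The function parameters are hypothetical stand-ins
// for internal plumbing of a pick_first implementation, not gRPC APIs.
func stickyTFLoop(
	addrs []resolver.Address,
	report func(connectivity.State), // report connectivity state to the channel
	tryConnectAll func([]resolver.Address) bool, // one in-order pass over addrs
	idleFor func() time.Duration, // time since the last RPC activity on the channel
	idleTimeout time.Duration, // IDLE_TIMEOUT; 30 minutes by default
) {
	// Sticky-TF: report TRANSIENT_FAILURE and stay there until a connection
	// attempt succeeds or the channel has been idle for IDLE_TIMEOUT.
	report(connectivity.TransientFailure)
	backoff := time.Second // illustrative; real implementations use gRPC's standard connection backoff
	for {
		if idleFor() >= idleTimeout {
			report(connectivity.Idle) // no RPC activity for IDLE_TIMEOUT (assumed transition)
			return
		}
		time.Sleep(backoff)
		if tryConnectAll(addrs) {
			report(connectivity.Ready) // a connection attempt succeeded
			return
		}
		backoff *= 2 // exponential backoff between passes over the address list
		if backoff > 2*time.Minute {
			backoff = 2 * time.Minute // illustrative cap
		}
	}
}
```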
At the time of this writing, `pick_first` implementations do not expect any
configuration to be passed to them. As part of this design, we will add a field
to the configuration that enables the policy to randomly shuffle the order of
the addresses it receives.
```json
{
  // If set to true, instructs the LB policy to shuffle the order of the
  // list of addresses received from the name resolver before attempting to
  // connect to them.
  "shuffleAddressList": boolean
}
```
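To illustrate how an implementation might consume this configuration, here is a
minimal Go sketch; the type and function names are hypothetical and not the
names used by any gRPC implementation.

```go
package sketch

import "encoding/json"

// pickFirstConfig is a hypothetical representation of the pick_first LB policy
// configuration shown above.
type pickFirstConfig struct {
	// ShuffleAddressList, when true, instructs the policy to randomly shuffle
	// the order of the addresses received from the name resolver.
	ShuffleAddressList bool `json:"shuffleAddressList"`
}

// parsePickFirstConfig unmarshals the JSON configuration for pick_first.
func parsePickFirstConfig(raw json.RawMessage) (*pickFirstConfig, error) {
	cfg := &pickFirstConfig{}
	if err := json.Unmarshal(raw, cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}
```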
In a gRPC implementation that supports this feature, when the
`shuffleAddressList` option is enabled, the `pick_first` LB policy will
randomly shuffle the order of the addresses. This shuffling will be done when
the LB policy receives an updated address list from its parent.
Note that existing gRPC implementations that do not support this feature will ignore this field, so their behavior will remain unchanged.
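As a minimal sketch of that behavior, reusing the hypothetical
`pickFirstConfig` type from the sketch above and a hypothetical helper name, an
implementation might shuffle a copy of the address list whenever a new list
arrives:

```go
package sketch

import (
	"math/rand"

	"google.golang.org/grpc/resolver"
)

// maybeShuffle is a hypothetical helper invoked when pick_first receives an
// updated address list from its parent. When shuffleAddressList is enabled in
// the policy's configuration, it randomizes the order in which addresses are
// tried; otherwise the resolver-provided order is kept.
func maybeShuffle(cfg *pickFirstConfig, addrs []resolver.Address) []resolver.Address {
	if cfg == nil || !cfg.ShuffleAddressList {
		return addrs
	}
	shuffled := make([]resolver.Address, len(addrs))
	copy(shuffled, addrs)
	rand.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	return shuffled
}
```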
gRPC recently added support for custom load balancer configuration to be specified by the xDS server. See gRFC A52 for more details.
To enable the xDS server to specify `pick_first` using this mechanism, an
extension configuration message was added as part of Envoy PR #26952.
gRPC's Custom LB policy functionality will be enhanced to support this new
extension and will result in the `pick_first` LB policy being used as the
locality and endpoint picking policy.
During initial development, the `GRPC_EXPERIMENTAL_PICKFIRST_LB_CONFIG`
environment variable will guard the following:

- The `shuffleAddressList` configuration knob in the `pick_first` LB policy
- Accepting the PickFirst config message as a Custom LB policy in xDS
Will be implemented in C-core, Java, Go, and Node.