A62: pick_first: sticky TRANSIENT_FAILURE and address order randomization #357

Merged
merged 7 commits into from
May 17, 2023
83 changes: 48 additions & 35 deletions A62-pick-first.md
@@ -1,4 +1,4 @@
A62: `pick_first` LB policy
A62: `pick_first`: sticky TRANSIENT_FAILURE and address order randomization
----
* Author(s): Easwar Swaminathan (@easwars)
* Approver: @markdroth
@@ -9,63 +9,68 @@ A62: `pick_first` LB policy

## Abstract

This design specifies a configuration option for the `pick_first` LB policy that
would enable it to randomly shuffle the order of the addresses it receives
before it starts to attempt to connect.

Given that `pick_first` is the default LB policy in all gRPC implementations, it
is necessary to have a written specification that details its behavior. This
document provides that specification.
This document describes two changes being made to the `pick_first` LB
policy with regard to:
- Expected behavior when connections to all addresses fail, and
- Support for address order randomization

## Background

All gRPC implementations contain a simple load balancing policy named
`pick_first` that can be summarized as follows:
- It takes a list of addresses from the name resolver and attempts to connect to
those addresses one at a time, in order, until it finds one that is reachable.
All RPCs sent on the gRPC channel will be sent to this address.
- If this connection breaks at a later point in time, pick_first will not
attempt to reconnect until the application requests that it does so.
- All RPCs sent on the gRPC channel will be sent to this address.
- If this connection breaks at a later point in time, `pick_first` will not
attempt to reconnect until the application requests that it does so, or makes
an RPC.
- If none of the addresses are reachable, it applies an exponential backoff
before attempting to reconnect.
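
The summary above can be illustrated with a minimal sketch. This is not taken from any gRPC implementation: the names `pick_first`, `is_reachable`, and `backoff_delays`, as well as the backoff parameters, are hypothetical and chosen only for illustration, and jitter is deliberately left out.

```python
from typing import Callable, Iterator, Optional, Sequence


def pick_first(addresses: Sequence[str],
               is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable address, trying them one at a time in order."""
    for address in addresses:
        if is_reachable(address):
            return address
    return None  # every address failed; the caller should back off and retry


def backoff_delays(initial: float, multiplier: float,
                   max_delay: float, attempts: int) -> Iterator[float]:
    """Yield exponentially growing delays, capped at max_delay (no jitter)."""
    delay = initial
    for _ in range(attempts):
        yield min(delay, max_delay)
        delay *= multiplier
```

In this sketch a caller would invoke `pick_first` once per pass over the address list and, whenever it returns `None`, sleep for the next value from `backoff_delays` before trying again.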

Since `pick_first` was implemented prior to the first stable version of any gRPC
language, no gRFC exists for it. Although it is a simple LB policy whose
behavior can be summarized in a few sentences, as done above, there are some
subtle differences in implementation across the various languages. This document
aims to provide a more detailed description of its behavior, with the aim of
bringing convergence among the different implementations.
When connections to all addresses fail, there are some similarities and some
differences between the Java/Go implementations and the C-core implementation.
What's the behavior in node? @murgatroid99

Node's current behavior is similar to Java and Go in regards to TF reporting. Node also does not currently implement IDLE_TIMEOUT.


## Proposal
Similarities include:
- Reporting `TRANSIENT_FAILURE` as the connectivity state to the channel.
- Applying exponential backoff. During the backoff period, `wait_for_ready` RPCs
are queued while other RPCs fail.

Differences show up after the backoff period ends:
- C-core remains in `TRANSIENT_FAILURE` while Java/Go move to `IDLE`.
- C-core attempts to reconnect to the given addresses, while Java/Go rely on the
client application to make an RPC or an explicit attempt to connect.
- C-core moves to `READY` only when a connection attempt succeeds.

Specific scenarios are described in their own subsections below.
This behavior of staying in `TRANSIENT_FAILURE` until it can report `READY` is
called sticky-TransientFailure, or sticky-TF.

### When connections to all addresses fail
## Proposal

Specific changes are described in their own subsections below.

`pick_first` LB policy attempts to connect to the given addresses in order, and
when none of the addresses are reachable, it must:
- report `TransientFailure` as the connectivity state to the channel
- apply exponential backoff (all RPCs attempted during this period will fail)
- attempt to reconnect to the given addresses in order
- continue to report `TransientFailure` as the connectivity state, until a
connection succeeds, at which point, it must report `Ready`
### Support Sticky-TF

This behavior of staying in `TransientFailure` until it can report `Ready` is
called sticky-TransientFailure.
All `pick_first` implementations should support sticky-TF. Once connections to
all addresses fail, they should:
- Report `TRANSIENT_FAILURE` as the connectivity state.
- Attempt to reconnect to the addresses indefinitely until a connection succeeds
(at which point, they should report `READY`), or the channel becomes idle (see
[gRPC Connectivity Semantics][1] for more details about idleness).
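
To make the contrast concrete, the following sketch traces the connectivity states reported to the channel after every address has failed once. It is an illustrative model only, not code from any gRPC implementation: the `State` enum, the `state_trace` function, and the `sticky` flag are hypothetical names, the non-sticky branch assumes the application triggers a reconnect right after each backoff period, and channel idleness is not modeled.

```python
from enum import Enum
from typing import List, Sequence


class State(Enum):
    IDLE = "IDLE"
    CONNECTING = "CONNECTING"
    READY = "READY"
    TRANSIENT_FAILURE = "TRANSIENT_FAILURE"


def state_trace(reconnect_results: Sequence[bool], sticky: bool) -> List[State]:
    """List the states reported after all addresses have failed once.

    Each entry of reconnect_results says whether one later pass over the
    address list managed to connect.
    """
    trace = [State.TRANSIENT_FAILURE]
    for connected in reconnect_results:
        if not sticky:
            # Non-sticky model: go IDLE after backoff, then CONNECTING
            # once the application makes an RPC or asks to connect.
            trace.append(State.IDLE)
            trace.append(State.CONNECTING)
        if connected:
            trace.append(State.READY)
            break
        if not sticky:
            trace.append(State.TRANSIENT_FAILURE)
        # Sticky-TF: stay in TRANSIENT_FAILURE, so nothing new is reported.
    return trace
```

With `sticky=True` the trace contains only `TRANSIENT_FAILURE` followed, on success, by `READY`; with `sticky=False` each retry passes through `IDLE` and `CONNECTING` first.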

`pick_first` LB policy should indefinitely attempt to reconnect, in a bid to
move out of `TransientFailure` and into `Ready`, until the channel enters idle
mode due to inactivity, at which point, the channel will shut down the name
resolver and the LB policy. See [gRPC Connectivity Semantics][1] for more
details about idleness.
Supporting sticky-TF has the following advantages:
- Avoids long delays before failing RPCs, which would otherwise occur when the
channel goes back to the `CONNECTING` state while attempting to reconnect.
- Allows `pick_first` to work as a child of the [priority LB policy][2].

[1]: https://github.com/grpc/grpc/blob/master/doc/connectivity-semantics-and-api.md
[2]: https://github.com/grpc/proposal/blob/master/A56-priority-lb-policy.md

### Enable random shuffling of address list

At the time of this writing, `pick_first` implementations do not expect any
configuration to be passed to it. As part of this design, we will add a field to
its configuration.
its configuration that would enable it to randomly shuffle the order of the
addresses it receives before it starts to attempt to connect.
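
The effect of such an option can be sketched as follows. The function name `order_addresses` and the `shuffle` parameter are hypothetical stand-ins and do not reflect the actual configuration field name or any implementation's API; the optional `rng` argument exists only to make the sketch testable.

```python
import random
from typing import List, Optional, Sequence


def order_addresses(addresses: Sequence[str], shuffle: bool,
                    rng: Optional[random.Random] = None) -> List[str]:
    """Return the connection order: resolver order, or a random permutation."""
    ordered = list(addresses)  # copy, so the resolver's list is left untouched
    if shuffle:
        # Fall back to the module-level generator when no rng is supplied.
        (rng or random).shuffle(ordered)
    return ordered
```

Shuffling a copy once, before the first connection attempt, keeps the resolver's original order available and means each client walks its own permutation of the same address set.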

```
{
@@ -81,6 +86,14 @@ not support this configuration should continue to work and their behavior should
remain unchanged. Implementations that support the new field should shuffle the
received address list at random before attempting to connect to them.

Our general philosophy is that the address order is to be determined by the name
resolver server, or by the name resolver client performing sorting as described
in [RFC-8305 section 4](https://www.rfc-editor.org/rfc/rfc8305#section-4). But
having this option in `pick_first` can be beneficial in some DNS configurations
where all clients get the addresses in the same order (e.g., either because the
authoritative server does that or because of caching) but where it is desirable
to randomize the order of the addresses to provide better load balancing.

### Temporary environment variable protection

During initial development, random shuffling of the address list will be enabled by