add retry classifier customization RFC #3018

Merged
merged 12 commits into from
Oct 10, 2023
1 change: 1 addition & 0 deletions design/src/SUMMARY.md
@@ -63,6 +63,7 @@
- [RFC-0034: Smithy Orchestrator](./rfcs/rfc0034_smithy_orchestrator.md)
- [RFC-0035: Collection Defaults](./rfcs/rfc0035_collection_defaults.md)
- [RFC-0036: HTTP Dependency Exposure](./rfcs/rfc0036_http_dep_elimination.md)
- [RFC-0037: User-configurable retry classification](./rfcs/rfc0037_retry_classifier_customization.md)

- [Contributing](./contributing/overview.md)
- [Writing and debugging a low-level feature that relies on HTTP](./contributing/writing_and_debugging_a_low-level_feature_that_relies_on_HTTP.md)
240 changes: 240 additions & 0 deletions design/src/rfcs/rfc0037_retry_classifier_customization.md
@@ -0,0 +1,240 @@
# RFC: User-configurable retry classification

> Status: RFC
>
> Applies to: client

For a summarized list of proposed changes, see the [Changes Checklist](#changes-checklist) section.

This RFC defines the user experience and implementation of user-configurable retry classification. Custom retry classifiers let users change which errors are retried while still relying on the defaults set by SDK authors when desired.

## Terminology

- **Smithy Service**: An HTTP service, whose API is modeled with the [Smithy IDL](https://www.smithy.io).
- **Smithy Client**: An HTTP client generated by smithy-rs from a `.smithy` model file.
- **AWS SDK**: A **smithy client** that's specifically configured to work with an AWS service.
- **Operation**: A modeled interaction with a service, defining the proper input and expected output types, as well as important metadata related to request construction. "Sending" an operation implies sending one or more HTTP requests to a **smithy service**, and then receiving an output or error in response.
- **Orchestrator**: The client code which manages the request/response pipeline. The orchestrator is responsible for:
  - Constructing, serializing, and sending requests.
  - Receiving and deserializing responses, and (optionally) retrying requests.
  - Running interceptors *(not covered in this RFC)* and handling errors.
- **Runtime Component**: A part of the orchestrator responsible for a specific function. Runtime components are *always* required and may depend on specific configuration. Examples include the endpoint resolver, retry strategy, and request signer.
- **Runtime Plugin**: Code responsible for setting **runtime components** and related configuration. Runtime plugins defined by codegen are responsible for setting default configuration and altering the behavior of **smithy clients** including the **AWS SDKs**.
- **Retry Strategy**: The process by which the orchestrator determines when and how to retry failed requests. Only one retry strategy may be set. The retry strategy depends upon the **retry classifier** to interpret responses and determine if they are retryable.
- **Retry Classifier**: Code responsible for inspecting the state of the request/response pipeline and determining if a retry is necessary. Multiple retry classifiers may be combined into a **retry classifier chain**.
- **Retry Classifier Chain**: Requests can fail for any number of reasons, but retry classifiers are specific: each one targets a narrow range of possibilities. Rather than define a single retry classifier to handle all possible outcomes, multiple classifiers are chained together. Each classifier is run in turn, and the chain ends when a classifier decides that a response is retryable, when a classifier decides that a response must not be retried, or after all classifiers have run without deciding that a retry is necessary.
- **Retry Classifier Priority**: Retry classifiers are set in different places and at different times before a request is sent. Each classifier has a defined priority that enables the classifiers to be sorted correctly. When implementing your own classifier, you may set your own priority.
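
The chain described above can be sketched in a few lines. This is only an illustration under stated assumptions: `Ctx`, `run_chain`, and the two example classifiers are hypothetical stand-ins, and the trait is a trimmed copy of the one proposed later in this RFC so that the sketch compiles without the SDK. In this sketch every classifier still runs, but each one forwards an earlier decision instead of replacing it.

```rust
use std::time::Duration;

// Stand-ins for the RFC's types so this sketch compiles without the SDK.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ErrorKind {
    ServerError,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum RetryClassifierResult {
    Error(ErrorKind),
    Explicit(Duration),
    DontRetry,
}

/// Hypothetical, heavily simplified context: just the last HTTP status.
pub struct Ctx {
    pub status: u16,
}

pub trait ClassifyRetry {
    fn classify_retry(
        &self,
        ctx: &Ctx,
        preceding: Option<RetryClassifierResult>,
    ) -> Option<RetryClassifierResult>;
}

/// Example classifier: HTTP 501 must never be retried.
pub struct NotImplementedIsFinal;

impl ClassifyRetry for NotImplementedIsFinal {
    fn classify_retry(
        &self,
        ctx: &Ctx,
        preceding: Option<RetryClassifierResult>,
    ) -> Option<RetryClassifierResult> {
        // Forward an earlier decision; otherwise only speak up for 501.
        preceding.or_else(|| (ctx.status == 501).then_some(RetryClassifierResult::DontRetry))
    }
}

/// Example classifier: any 5xx status is a retryable server error.
pub struct ServerErrorsAreRetryable;

impl ClassifyRetry for ServerErrorsAreRetryable {
    fn classify_retry(
        &self,
        ctx: &Ctx,
        preceding: Option<RetryClassifierResult>,
    ) -> Option<RetryClassifierResult> {
        preceding.or_else(|| {
            (ctx.status >= 500).then_some(RetryClassifierResult::Error(ErrorKind::ServerError))
        })
    }
}

/// Runs the classifiers in order, handing each one the decision made so far.
/// `None` at the end means no classifier asked for a retry.
pub fn run_chain(chain: &[Box<dyn ClassifyRetry>], ctx: &Ctx) -> Option<RetryClassifierResult> {
    chain
        .iter()
        .fold(None, |decision, classifier| classifier.classify_retry(ctx, decision))
}
```

Because `NotImplementedIsFinal` would run first in a chain built as `[NotImplementedIsFinal, ServerErrorsAreRetryable]`, a 501 response ends up as `DontRetry` even though the second classifier would otherwise treat it as a server error.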

## The user experience if this RFC is implemented

In the current version of the SDK, users are unable to configure retry classification, except by defining a custom retry strategy. Once this RFC is implemented, users will be able to define their own classifiers and set them at the service level and/or the operation level.

### Defining a custom retry classifier

```rust
use aws_smithy_runtime::client::retries::{ClassifyRetry, RetryClassifierResult};
use aws_smithy_runtime::client::interceptors::context::InterceptorContext;

#[derive(Debug)]
struct CustomRetryClassifier;

impl ClassifyRetry for CustomRetryClassifier {
    fn classify_retry(
        &self,
        ctx: &InterceptorContext,
        result_of_preceding_classifier: Option<RetryClassifierResult>,
    ) -> Option<RetryClassifierResult> {
        // It's typical, but not required, to respect the judgement of the
        // preceding classifier and forward it on.
        if let Some(result) = result_of_preceding_classifier {
            return Some(result);
        }

        todo!("inspect the interceptor context to determine if a retry attempt should be made.")
    }

    fn name(&self) -> &'static str { "my custom retry classifier" }
}
```
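
To make the `todo!()` above more concrete, here is a minimal sketch of what the body of a throttling-aware classifier might look like. It is written as a free function over a hypothetical `AttemptContext` (a stand-in for `InterceptorContext` that exposes only an HTTP status and an optional retry-after hint), and the result types are local copies so the sketch compiles without the SDK. None of these names are part of the proposed API.

```rust
use std::time::Duration;

// Local copies of the RFC's result types, trimmed to what this sketch needs.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ErrorKind {
    ThrottlingError,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum RetryClassifierResult {
    Error(ErrorKind),
    Explicit(Duration),
    DontRetry,
}

/// Hypothetical stand-in for `InterceptorContext`.
pub struct AttemptContext {
    pub status: u16,
    /// Seconds the server asked us to wait, if it said so.
    pub retry_after_secs: Option<u64>,
}

/// What the body of `classify_retry` could look like for a classifier that
/// only cares about HTTP 429 responses.
pub fn classify_throttling(
    ctx: &AttemptContext,
    result_of_preceding_classifier: Option<RetryClassifierResult>,
) -> Option<RetryClassifierResult> {
    // Respect the judgement of the preceding classifier and forward it on.
    if let Some(result) = result_of_preceding_classifier {
        return Some(result);
    }

    // Anything other than 429 is left for later classifiers to judge.
    if ctx.status != 429 {
        return None;
    }

    match ctx.retry_after_secs {
        // The server told us how long to wait.
        Some(secs) => Some(RetryClassifierResult::Explicit(Duration::from_secs(secs))),
        // Throttled, but without a hint: let the retry strategy pick the delay.
        None => Some(RetryClassifierResult::Error(ErrorKind::ThrottlingError)),
    }
}
```

Returning `None` for everything the classifier does not recognize is what keeps it narrow: later classifiers in the chain remain free to decide.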

### Customizing retry classification for a service

```rust
#[tokio::main]
async fn main() -> Result<(), aws_sdk_s3::Error> {
    let sdk_config = aws_config::load_from_env().await;
    let service_config = aws_sdk_s3::Config::from(&sdk_config)
        .to_builder()
        .retry_classifier(CustomRetryClassifier)
        .build();
    // `from_conf` takes the service config by value.
    let client = aws_sdk_s3::Client::from_conf(service_config);

    let res = client
        .list_buckets()
        .send()
        .await?;

    println!("your buckets: {res:?}");

    Ok(())
}
```

### Customizing retry classification for an operation

```rust
#[tokio::main]
async fn main() -> Result<(), aws_sdk_s3::Error> {
    let sdk_config = aws_config::load_from_env().await;
    let client = aws_sdk_s3::Client::new(&sdk_config);

    let res = client
        .list_buckets()
        .customize()
        .await
        .unwrap()
        .config_override(
            aws_sdk_s3::Config::builder()
                .retry_classifier(CustomRetryClassifier)
        )
        .send()
        .await?;

    println!("your buckets: {res:?}");

    Ok(())
}
```

## How to actually implement this RFC

In order to implement this feature, we must:
- Update the current retry classification system so that individual classifiers as well as collections of classifiers can be easily composed together.
- Create two new configuration mechanisms for users that allow them to customize retry classification at the service level and at the operation level.
- Update retry classifiers so that they may 'short-circuit' the chain, ending retry classification immediately.

### The `ClassifyRetry` trait

```rust
use std::fmt;
use std::time::Duration;

/// The result of running a [`ClassifyRetry`] on an [`InterceptorContext`].
#[non_exhaustive]
#[derive(Clone, Eq, PartialEq, Debug)]
pub enum RetryClassifierResult {
    /// "A retryable error was received. This is what kind of error it was,
    /// in case that's important."
    Error(ErrorKind),
    /// "The server told us to wait this long before retrying the request."
    Explicit(Duration),
    /// "This response should not be retried."
    DontRetry,
}

/// Classifies what kind of retry is needed for a given [`InterceptorContext`].
pub trait ClassifyRetry: Send + Sync + fmt::Debug {
    /// Run this classifier on the [`InterceptorContext`] to determine if the previous request
    /// should be retried. Returns a [`RetryClassifierResult`].
    fn classify_retry(
        &self,
        ctx: &InterceptorContext,
        result_of_preceding_classifier: Option<RetryClassifierResult>,
    ) -> Option<RetryClassifierResult>;

    /// The name of this retry classifier.
    ///
    /// Used for debugging purposes
    fn name(&self) -> &'static str;

    /// The priority of this retry classifier. Classifiers with a higher priority will run before
    /// classifiers with a lower priority. Classifiers with equal priorities make no guarantees
    /// about which will run first.
    fn priority(&self) -> RetryClassifierPriority {
        RetryClassifierPriority::default()
    }
}
```
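
The RFC does not spell out what `RetryClassifierPriority` looks like, so the following is only a sketch of one possible shape: a hypothetical newtype over `u8` with an arbitrary default of 100, plus a hypothetical `run_order` helper showing how "higher priority runs first" could be implemented. `NamedClassifier` is a stand-in for a boxed `ClassifyRetry` that exposes only `name()` and `priority()`.

```rust
use std::cmp::Reverse;

/// Hypothetical shape for the priority type; the numeric range and the
/// default value are assumptions made for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct RetryClassifierPriority(pub u8);

impl Default for RetryClassifierPriority {
    fn default() -> Self {
        RetryClassifierPriority(100)
    }
}

/// Stand-in for a classifier: only the parts that matter for ordering.
pub struct NamedClassifier {
    pub name: &'static str,
    pub priority: RetryClassifierPriority,
}

/// Returns classifier names in the order they would run: highest priority
/// first. The sort is stable, so ties keep their registration order here,
/// although the trait documentation makes no such guarantee.
pub fn run_order(mut classifiers: Vec<NamedClassifier>) -> Vec<&'static str> {
    classifiers.sort_by_key(|classifier| Reverse(classifier.priority));
    classifiers.into_iter().map(|classifier| classifier.name).collect()
}
```

A user-defined classifier that keeps the default priority would, under these assumptions, run between a high-priority classifier for modeled errors and a low-priority fallback such as an HTTP status code check.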

### Chaining retry classifiers

Multiple retry classifiers are chained by wrapping classifiers inside one another. When classifiers must be wrapped in a specific order, use a specific type for the inner classifier. When classifiers must be composable in any order, use a referenced trait object. This approach is demonstrated in the following example code:

```rust
// `ClassifyRetry` requires `Debug`, so the wrapper must implement it too.
#[derive(Debug)]
struct ExampleRetryClassifier<'a> {
    inner: Option<&'a dyn ClassifyRetry>
}

impl<'a> ClassifyRetry for ExampleRetryClassifier<'a> {
    fn classify_retry(
        &self,
        ctx: &InterceptorContext,
        result_of_preceding_classifier: Option<RetryClassifierResult>,
    ) -> Option<RetryClassifierResult> {
        // It's typical, but not required, to respect the judgement of the
        // preceding classifier and forward it on.
        if let Some(result) = result_of_preceding_classifier {
            return Some(result);
        }

        // Do retry classification here...
        // Assume that we found a retryable server error
        Some(RetryClassifierResult::Error(ErrorKind::ServerError))
    }

    // `name` has no default implementation, so it must be provided.
    fn name(&self) -> &'static str { "example retry classifier" }
}
```

When each classifier in the chain reports its own result, debugging the outcome of classification is easy:

```txt
running 'errors modeled as retryable' classifier resulted in 'continue'
running 'retryable smithy errors' classifier resulted in 'continue'
running 'http status code' classifier resulted in 'retry (server error)'
```
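
A trace like the one above could be produced by a loop along these lines. This is a sketch under assumptions: `NamedClassifier` (a name plus a plain function pointer) stands in for a boxed `ClassifyRetry`, the context is reduced to an HTTP status code, and the wording of each line is copied from the sample output rather than from any real logging code.

```rust
use std::time::Duration;

// Local copies of the RFC's result types, trimmed to what this sketch needs.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ErrorKind {
    ServerError,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum RetryClassifierResult {
    Error(ErrorKind),
    Explicit(Duration),
    DontRetry,
}

type Classify = fn(u16, Option<RetryClassifierResult>) -> Option<RetryClassifierResult>;

/// Hypothetical stand-in for a classifier with a `name()`.
pub struct NamedClassifier {
    pub name: &'static str,
    pub classify: Classify,
}

/// Human-readable form of a classification result.
fn describe(result: &Option<RetryClassifierResult>) -> String {
    match result {
        None => "continue".to_string(),
        Some(RetryClassifierResult::Error(ErrorKind::ServerError)) => {
            "retry (server error)".to_string()
        }
        Some(RetryClassifierResult::Explicit(delay)) => format!("retry after {delay:?}"),
        Some(RetryClassifierResult::DontRetry) => "don't retry".to_string(),
    }
}

/// Runs the chain and records one line per classifier.
pub fn trace_chain(chain: &[NamedClassifier], status: u16) -> Vec<String> {
    let mut decision = None;
    let mut lines = Vec::new();
    for classifier in chain {
        decision = (classifier.classify)(status, decision);
        lines.push(format!(
            "running '{}' classifier resulted in '{}'",
            classifier.name,
            describe(&decision)
        ));
    }
    lines
}

/// A classifier that never has an opinion of its own.
pub fn no_opinion(_status: u16, preceding: Option<RetryClassifierResult>) -> Option<RetryClassifierResult> {
    preceding
}

/// A classifier that treats any 5xx status as a retryable server error.
pub fn http_status(status: u16, preceding: Option<RetryClassifierResult>) -> Option<RetryClassifierResult> {
    preceding.or_else(|| (status >= 500).then_some(RetryClassifierResult::Error(ErrorKind::ServerError)))
}
```

In real code the lines would more likely go to a tracing or logging facility than into a `Vec<String>`; collecting them here just keeps the sketch easy to check.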

### The retry classifier state machine and `RetryClassifierResult`

It's up to each chained classifier to respect the decision made by earlier links in the chain. When properly chained, the classifiers can be thought of as a state machine:

```mermaid
flowchart TB;
RetryStrategy --calls-->
RetryClassifier[The next retry classifier\n in the chain] --SomeError--> Retry[make a retry attempt]
RetryClassifier --SomeExplicit--> RetryAfter[wait for the given duration\nbefore making a retry attempt]
RetryClassifier --SomeDontRetry--> DontRetry[don't make a retry attempt]
RetryClassifier --None--> RetryClassifier
```

It is possible for a wrapping classifier to ignore inner classifiers, but this is not considered typical behavior. The cases where an inner classifier would be ignored MUST be clearly documented.
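
Once the chain has finished, the retry strategy has to turn the final `Option<RetryClassifierResult>` into an action. The sketch below mirrors the terminal states of the diagram. `RetryAction` and `next_action` are hypothetical names, `backoff` stands for whatever delay the retry strategy computed on its own, and the result types are local copies so the sketch compiles without the SDK.

```rust
use std::time::Duration;

// Local copies of the RFC's result types, trimmed to what this sketch needs.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ErrorKind {
    ServerError,
    ThrottlingError,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum RetryClassifierResult {
    Error(ErrorKind),
    Explicit(Duration),
    DontRetry,
}

/// Hypothetical outcome type for the retry strategy.
#[derive(Debug, PartialEq, Eq)]
pub enum RetryAction {
    /// Make another attempt after waiting this long.
    RetryAfter(Duration),
    /// Return the current result to the caller.
    GiveUp,
}

/// Maps the chain's final result onto an action.
pub fn next_action(result: Option<RetryClassifierResult>, backoff: Duration) -> RetryAction {
    match result {
        // A retryable error: the strategy chooses the delay.
        Some(RetryClassifierResult::Error(_)) => RetryAction::RetryAfter(backoff),
        // The server named a delay: honor it instead of the computed backoff.
        Some(RetryClassifierResult::Explicit(delay)) => RetryAction::RetryAfter(delay),
        // Either a classifier vetoed the retry, or nobody asked for one.
        Some(RetryClassifierResult::DontRetry) | None => RetryAction::GiveUp,
    }
}
```

A real retry strategy would also check attempt counts and any retry budget before retrying; those concerns are left out to keep the mapping visible.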

### Setting a retry classifier with a runtime plugin

Wrapping retry classifiers may be set with a runtime plugin. When setting a classifier with this method, the runtime plugin is responsible for extracting any previously-set classifier and wrapping it.

```rust
use std::borrow::Cow;

impl<'a> ExampleRetryClassifier<'a> {
    pub fn new(inner: Option<&'a dyn ClassifyRetry>) -> Self {
        Self { inner }
    }
}

struct ExampleRetryClassifierRuntimePlugin;

impl RuntimePlugin for ExampleRetryClassifierRuntimePlugin {
    fn runtime_components(
        &self,
        current_components: &RuntimeComponentsBuilder,
    ) -> Cow<'_, RuntimeComponentsBuilder> {
        // Wrap whatever classifier was set by earlier plugins so that it
        // isn't lost when this plugin's classifier replaces it.
        let rcb = RuntimeComponentsBuilder::new("ExampleRetryClassifierRuntimePlugin")
            .with_retry_classifier(
                ExampleRetryClassifier::new(current_components.retry_classifier())
            );

        Cow::Owned(rcb)
    }
}
```

By default, newer runtime plugins will override previously-set plugins. This is important to consider when deciding how your classifier will wrap other classifiers.
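
The effect of plugin ordering can be illustrated with a small self-contained model. Everything here is a simplification made for the sketch: `Components` is a hypothetical stand-in for `RuntimeComponentsBuilder`, the trait takes only a status code, the result enum is trimmed to two payload-free variants, and the wrapper owns its inner classifier in a `Box` instead of borrowing it as in the example above.

```rust
/// Trimmed stand-in for the RFC's result type.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum RetryClassifierResult {
    Error,
    DontRetry,
}

/// Trimmed stand-in for the RFC's trait.
pub trait ClassifyRetry {
    fn classify_retry(&self, status: u16) -> Option<RetryClassifierResult>;
}

/// The "default" classifier an earlier plugin might set.
pub struct RetryServerErrors;

impl ClassifyRetry for RetryServerErrors {
    fn classify_retry(&self, status: u16) -> Option<RetryClassifierResult> {
        (status >= 500).then_some(RetryClassifierResult::Error)
    }
}

/// A wrapping classifier. It deliberately ignores the inner classifier for
/// HTTP 501 (documented here, as required) and defers to it otherwise.
pub struct NeverRetry501 {
    inner: Option<Box<dyn ClassifyRetry>>,
}

impl ClassifyRetry for NeverRetry501 {
    fn classify_retry(&self, status: u16) -> Option<RetryClassifierResult> {
        if status == 501 {
            return Some(RetryClassifierResult::DontRetry);
        }
        self.inner.as_ref().and_then(|inner| inner.classify_retry(status))
    }
}

/// Hypothetical stand-in for the runtime components being built.
#[derive(Default)]
pub struct Components {
    retry_classifier: Option<Box<dyn ClassifyRetry>>,
}

impl Components {
    /// An earlier plugin: sets the default classifier.
    pub fn apply_default_plugin(&mut self) {
        self.retry_classifier = Some(Box::new(RetryServerErrors));
    }

    /// A later plugin: takes the previously-set classifier and wraps it,
    /// so overriding does not discard the default behavior.
    pub fn apply_wrapping_plugin(&mut self) {
        let previous = self.retry_classifier.take();
        self.retry_classifier = Some(Box::new(NeverRetry501 { inner: previous }));
    }

    pub fn classify(&self, status: u16) -> Option<RetryClassifierResult> {
        self.retry_classifier
            .as_ref()
            .and_then(|classifier| classifier.classify_retry(status))
    }
}
```

If the later plugin set `NeverRetry501 { inner: None }` instead of wrapping the previous classifier, 503 responses would no longer be classified as retryable, which is the pitfall the paragraph above warns about.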

## Changes checklist

- [ ] Make retry classifiers composable by runtime plugins.
- [ ] Enable configuration of retry classifiers at the service level.
- [ ] Enable configuration of retry classifiers at the operation level.
- [ ] Replace `RetryReason` with `RetryClassifierResult`.
  - [ ] Add variant for `DontRetry`
  - [ ] Add variant for `Continue`