Should ResponseErrors be retriable? #303
Comments
This is a good question. We're investigating on our side to determine how the other SDKs behave and what the correct behavior should be. It's worth noting that it's possible to completely override the retry policy by using the low-level operation API +
As an aside, I'm curious about the framework or tooling you're using for fault generation; we'd like to use something similar in the SDK.
Ah yeah, it's not too complicated. It's closer to random chaos testing than it is to principled fault injection. Specifically we test that Materialize's S3 source is resilient to transient network failures by putting Toxiproxy between
Sounds good! For now we're just blindly retrying any errors that we see with some backoff, which works well enough: https://github.com/MaterializeInc/materialize/blob/dbf6c5df193f785e5bff364b5f21b1957a7c0322/src/dataflow/src/source/s3.rs#L316-L325. The downside is that we'll retry permanent errors a few times before giving up, but the upside is that we're guaranteed to retry every transient error.
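For readers who want to see what "blindly retrying with some backoff" can look like, here is a minimal sketch using only the standard library. It is not the code at the link above: the function name `retry_with_backoff`, its parameters, and the doubling delay are all assumptions made for illustration.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper (not part of the SDK or of Materialize): calls `op`
/// until it succeeds or `max_attempts` calls have been made, doubling the
/// delay after every failure. Every error is treated as retryable.
pub fn retry_with_backoff<T, E, F>(
    max_attempts: u32,
    initial_delay: Duration,
    mut op: F,
) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    assert!(max_attempts > 0, "max_attempts must be at least 1");
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            // Otherwise wait, double the delay, and try again.
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

A call such as `retry_with_backoff(5, Duration::from_millis(100), || list_objects())`, where `list_objects` is a placeholder for the real request, shows the trade-off described above: a permanent error costs up to five attempts, but no transient error is given up on after a single failure.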
Thanks again for reporting this. This should be retried automatically by the SDK, and we're working on a fix.
I think this was fixed in version 0.8.0 by smithy-lang/smithy-rs#1197. Are you still running into this?
Hmm, I'm not sure.

    let (err, response) = match err {
        Ok(_) => return RetryKind::Unnecessary,
        Err(SdkError::ServiceError { err, raw }) => (err, raw),
        Err(SdkError::DispatchFailure(err)) => {
            return if err.is_timeout() || err.is_io() {
                RetryKind::Error(ErrorKind::TransientError)
            } else if let Some(ek) = err.is_other() {
                RetryKind::Error(ek)
            } else {
                RetryKind::UnretryableFailure
            }
        }
        Err(_) => return RetryKind::UnretryableFailure,
    };

The issue is that
From discussion with @rcoh: The inner
I just noticed that ResponseErrors are not automatically retried by the SDK. The specific example I was surprised by was an S3 ListObjectsV2 request that failed due to an incomplete body:
(FWIW I'm specifically testing retry behavior with injected faults; this isn't an actual error I've observed in production.)
Is it intentional that these aren't automatically retried? I'm not actually sure what the other SDKs do in this situation. I guess it could be a problem for non-idempotent requests, because they'll get duplicated, but that can already happen with requests that are retried after a timeout, where the client doesn't know whether the server actually processed the request already.
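To make the question concrete, here is a rough sketch of a classifier that treats a response that could not be read in full the same way as a timeout. The enums are simplified, hypothetical stand-ins that only borrow names from the SDK (`SdkError`, `RetryKind`, `ErrorKind`); the real types carry more data, and the rule that 5xx service errors are retryable is an assumption of this sketch, not something stated in this issue.

```rust
/// Simplified, hypothetical stand-ins for the SDK's types.
#[derive(Debug, PartialEq)]
pub enum ErrorKind {
    TransientError,
    ServerError,
}

#[derive(Debug, PartialEq)]
pub enum RetryKind {
    Unnecessary,
    Error(ErrorKind),
    UnretryableFailure,
}

#[derive(Debug)]
pub enum SdkError {
    /// The request could not be built; retrying cannot help.
    ConstructionFailure,
    /// The request could not be sent (timeout, I/O failure, ...).
    DispatchFailure { is_timeout: bool, is_io: bool },
    /// A response arrived but could not be read or parsed,
    /// for example because the body was incomplete.
    ResponseError,
    /// The service returned a well-formed error response.
    ServiceError { status: u16 },
}

/// Sketch of a classifier in which `ResponseError` is retried like a timeout.
pub fn classify<T>(result: &Result<T, SdkError>) -> RetryKind {
    match result {
        Ok(_) => RetryKind::Unnecessary,
        Err(SdkError::DispatchFailure { is_timeout, is_io }) if *is_timeout || *is_io => {
            RetryKind::Error(ErrorKind::TransientError)
        }
        // As with a timeout, the client cannot tell whether the server
        // processed the request, so a retry may duplicate a
        // non-idempotent operation.
        Err(SdkError::ResponseError) => RetryKind::Error(ErrorKind::TransientError),
        // Assumption of this sketch: 5xx responses are worth retrying.
        Err(SdkError::ServiceError { status }) if *status >= 500 => {
            RetryKind::Error(ErrorKind::ServerError)
        }
        Err(_) => RetryKind::UnretryableFailure,
    }
}
```

Under this scheme an incomplete ListObjectsV2 body would be classified as `RetryKind::Error(ErrorKind::TransientError)`, with the same duplicate-request caveat that already applies to retries after a timeout.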