AssumeRoleProvider might not be retrying throttling errors #524
Comments
I haven't been able to catch a throttling error from STS in the wild, so I've been testing with a faked throttling error by swapping out the connector with one that fakes a throttling error response 50% of the time. With this, I was able to instrument the retry logic with more logging and confirm that retry is working as expected. I tested the STS client directly, and also the
If the SDK is correctly identifying the real world STS throttling error as a throttling error (as opposed to an unretryable error), then the retry should work. I need to see an actual STS throttling response to make any further progress investigating this.
Tested with this fake throttling response:
http::Response::builder()
.status(400)
.body(SdkBody::from(Bytes::from_static(
br#"
<ErrorResponse>
<Error>
<Type>ErrorMessage</Type>
<Code>Throttling</Code>
<Message>Rate exceeded</Message>
</Error>
</ErrorResponse>
"#,
)))
.unwrap()
All this testing was done on
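The test setup described above can be sketched roughly as follows. This is a minimal, std-only illustration and not the SDK's connector or retry code: `FakeResponse`, `FlakyConnector`, and `call_with_retry` are hypothetical names, and the connector alternates deterministically between throttled and successful responses (an assumption made so the sketch is reproducible, in place of a random 50% failure rate).

```rust
/// Hypothetical stand-in for an HTTP response: status code plus body.
#[derive(Debug, Clone, PartialEq)]
struct FakeResponse {
    status: u16,
    body: &'static str,
}

/// The faked throttling body from the comment above, on one line.
const THROTTLED_BODY: &str = "<ErrorResponse><Error><Type>ErrorMessage</Type>\
<Code>Throttling</Code><Message>Rate exceeded</Message></Error></ErrorResponse>";

/// Hypothetical test connector. Odd-numbered calls are throttled and
/// even-numbered calls succeed, which approximates a 50% failure rate.
struct FlakyConnector {
    calls: u32,
}

impl FlakyConnector {
    fn call(&mut self) -> FakeResponse {
        self.calls += 1;
        if self.calls % 2 == 1 {
            FakeResponse { status: 400, body: THROTTLED_BODY }
        } else {
            FakeResponse { status: 200, body: "<AssumeRoleResponse/>" }
        }
    }
}

/// Calls the connector until a non-throttled response arrives or
/// `max_attempts` is reached. Returns the last response and the number
/// of attempts made. Backoff delays are omitted to keep the sketch short.
fn call_with_retry(conn: &mut FlakyConnector, max_attempts: u32) -> (FakeResponse, u32) {
    let mut attempts = 0;
    loop {
        attempts += 1;
        let resp = conn.call();
        let throttled = resp.status == 400 && resp.body.contains("<Code>Throttling</Code>");
        if !throttled || attempts >= max_attempts {
            return (resp, attempts);
        }
        // Extra logging of the kind used to confirm that a retry happened.
        eprintln!("attempt {} was throttled, retrying", attempts);
    }
}
```

Calling `call_with_retry(&mut FlakyConnector { calls: 0 }, 3)` makes one throttled attempt, logs it, and then returns the successful response from the second attempt.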
Got a hold of a real throttling error from STS and replicated it in the test connector. For posterity, this is what a real throttling error looks like:
http::Response::builder()
.status(400)
.header("content-type", "text/xml")
.body(SdkBody::from(Bytes::from_static(
br#"
<ErrorResponse xmlns="https://sts.amazonaws.com/doc/2011-06-15/">
<Error>
<Type>Sender</Type>
<Code>Throttling</Code>
<Message>Rate exceeded</Message>
</Error>
<RequestId>bf2eb53e-56c1-4c87-970f-0f7cf4342b93</RequestId>
</ErrorResponse>
"#,
)))
.unwrap()
The retry is working correctly as far as I can tell. I will add a debug event to the retry logic so that the logs make it clear when a retry happens.
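To illustrate what "identifying the error as a throttling error" means for a body like the one above, here is a minimal sketch that pulls the `<Code>` element out of the XML and compares it against the code seen in the real response. The helper names `extract_tag` and `is_throttling` are hypothetical, the naive string search stands in for a real XML parser, and treating only `Throttling` as a throttling code is an assumption; this is not how the SDK itself classifies errors.

```rust
/// Returns the trimmed text between the first `<tag>` and the following
/// `</tag>`. A naive string search, adequate only for simple bodies like
/// the STS error response shown above.
fn extract_tag<'a>(xml: &'a str, tag: &str) -> Option<&'a str> {
    let open = format!("<{}>", tag);
    let close = format!("</{}>", tag);
    let start = xml.find(&open)? + open.len();
    let end = start + xml[start..].find(&close)?;
    Some(xml[start..end].trim())
}

/// Assumption: only the `Throttling` code observed in the real STS
/// response is treated as a throttling (and therefore retryable) error.
fn is_throttling(body: &str) -> bool {
    extract_tag(body, "Code") == Some("Throttling")
}
```

With the real STS body above, `is_throttling` returns `true`, while a body whose `<Code>` is something else, or that has no `<Code>` element at all, is treated as not throttled.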
Describe the bug
When an application calls S3 on startup using the AssumeRoleProvider and several instances of it are started in rapid succession, STS throttles the AssumeRole calls. It looks like the SDK isn't detecting this throttling and retrying with backoff, and is instead immediately failing the call to S3. This hasn't been confirmed, though, since the current retry implementation doesn't log enough to say for certain.
Expected Behavior
The AssumeRoleProvider should retry throttling errors with exponential backoff.
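As a rough illustration of the expected behavior, the delay schedule for exponential backoff can be sketched as below. The function name `backoff_delay` and its `base` and `max` parameters are hypothetical, and jitter is left out for brevity; the values the SDK actually uses are not stated in this issue.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`,
/// capped at `max`. Saturating arithmetic avoids overflow for large
/// attempt numbers.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}
```

With a 100 ms base and a 20 s cap, successive retries would wait 100 ms, 200 ms, 400 ms, and so on until the cap is reached.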
Current Behavior
See bug description.
Reproduction Steps
See bug description.
Possible Solution
No response
Additional Information/Context
No response
Version
0.8.0
Environment details (OS name and version, etc.)
AL2
Logs
No response