AssumeRoleProvider might not be retrying throttling errors #524

Closed
jdisanti opened this issue Apr 27, 2022 · 3 comments

@jdisanti
Contributor

Describe the bug

When several instances of an application that calls S3 on startup through the AssumeRoleProvider are started in rapid succession, STS throttles the AssumeRole calls. It currently looks like the SDK isn't detecting this throttling and retrying with backoff, and instead immediately fails the call to S3. This hasn't been confirmed, though, because the current retry implementation doesn't log enough to say for certain.

Expected Behavior

The AssumeRoleProvider should retry throttling errors with exponential backoff.
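For reference, the exponential backoff part of this expectation can be sketched as a delay schedule that doubles per attempt up to a cap. This is a minimal illustration, not the SDK's retry implementation: `backoff_delay`, the 100 ms base, and the 2 s cap are assumptions chosen for the example, and the random jitter that AWS SDKs typically add is left out so the sketch stays std-only.

```rust
use std::time::Duration;

/// Hypothetical helper: delay before retry `attempt` (0-based), computed as
/// `base * 2^attempt` and capped at `max`. Jitter is intentionally omitted.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // checked_pow / checked_mul avoid overflow for large attempt counts;
    // any overflow simply falls back to the cap.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

fn main() {
    let base = Duration::from_millis(100);
    let max = Duration::from_secs(2);
    for attempt in 0..6 {
        println!("retry {attempt}: wait {:?}", backoff_delay(attempt, base, max));
    }
}
```

Capping the delay keeps a burst of throttled startups from waiting arbitrarily long, while the doubling spreads the retries from many instances out over time.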

Current Behavior

See bug description.

Reproduction Steps

See bug description.

Possible Solution

No response

Additional Information/Context

No response

Version

0.8.0

Environment details (OS name and version, etc.)

AL2 (Amazon Linux 2)

Logs

No response

@jdisanti added the bug and needs-triage labels on Apr 27, 2022
@jdisanti
Contributor Author

I haven't been able to catch a throttling error from STS in the wild, so I've been testing by swapping out the connector for one that returns a faked throttling error response 50% of the time. With this setup, I instrumented the retry logic with additional logging and confirmed that retries work as expected. I tested both the STS client directly and the AssumeRoleProvider with this connector, and both retry correctly.
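The testing approach described above (a connector that throttles half of the requests, plus extra logging in the retry loop) can be sketched in isolation. Everything here is hypothetical and std-only: `FlakyConnector`, `Response`, and `call_with_retry` are not SDK types, and the connector alternates deterministically between a throttle and a success instead of choosing randomly, so the outcome is repeatable.

```rust
use std::time::Duration;

/// Hypothetical outcome of one simulated STS call.
#[derive(Debug, PartialEq)]
enum Response {
    Throttled,
    Ok(&'static str),
}

/// Hypothetical stand-in for a swapped-out connector: every odd-numbered call
/// is throttled, so 50% of calls fail, in a deterministic order.
struct FlakyConnector {
    calls: u32,
}

impl FlakyConnector {
    fn send(&mut self) -> Response {
        self.calls += 1;
        if self.calls % 2 == 1 {
            Response::Throttled
        } else {
            Response::Ok("credentials")
        }
    }
}

/// Retries throttled calls up to `max_attempts` times, pushing one debug line
/// per retry into `log` so the retry behaviour is visible afterwards.
fn call_with_retry(
    conn: &mut FlakyConnector,
    max_attempts: u32,
    log: &mut Vec<String>,
) -> Result<&'static str, String> {
    for attempt in 1..=max_attempts {
        match conn.send() {
            Response::Ok(body) => return Ok(body),
            Response::Throttled if attempt < max_attempts => {
                // Exponential backoff: 10 ms, 20 ms, 40 ms, ...
                let delay = Duration::from_millis(10 * 2u64.pow(attempt - 1));
                log.push(format!("attempt {attempt} throttled; retrying after {delay:?}"));
                std::thread::sleep(delay);
            }
            // Throttled on the final attempt: fall through to the error below.
            Response::Throttled => {}
        }
    }
    Err(format!("throttled on all {max_attempts} attempts"))
}

fn main() {
    let mut conn = FlakyConnector { calls: 0 };
    let mut log = Vec::new();
    let result = call_with_retry(&mut conn, 3, &mut log);
    for line in &log {
        println!("{line}");
    }
    println!("{result:?}");
}
```

With this connector, the first call is throttled and retried once, so the log shows exactly one retry before the call succeeds.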

If the SDK correctly classifies the real-world STS throttling error as a throttling error (rather than as a non-retryable error), retries should work. I need to see an actual STS throttling response to make further progress on this.
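The classification question matters because STS reports throttling as an HTTP 400, so a classifier that looked only at the status code would treat it as a non-retryable client error. A minimal sketch of classifying by error code first follows. `RetryKind`, `classify`, and `THROTTLING_CODES` are hypothetical names, and the code list is an illustrative subset assumed for this example, not the SDK's authoritative table.

```rust
/// Hypothetical retry classification used only in this sketch.
#[derive(Debug, PartialEq)]
enum RetryKind {
    Throttling,
    ServerError,
    NotRetryable,
}

/// Illustrative subset of error codes commonly treated as throttling (assumed).
const THROTTLING_CODES: &[&str] = &[
    "Throttling",
    "ThrottlingException",
    "TooManyRequestsException",
    "RequestLimitExceeded",
    "SlowDown",
];

/// Checks the modeled error code before the HTTP status, because STS
/// returns throttling errors with status 400.
fn classify(status: u16, code: Option<&str>) -> RetryKind {
    match code {
        Some(c) if THROTTLING_CODES.contains(&c) => RetryKind::Throttling,
        _ if status == 429 => RetryKind::Throttling,
        _ if (500..=599).contains(&status) => RetryKind::ServerError,
        _ => RetryKind::NotRetryable,
    }
}

fn main() {
    println!("{:?}", classify(400, Some("Throttling")));
    println!("{:?}", classify(400, Some("AccessDenied")));
    println!("{:?}", classify(503, None));
}
```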

Tested with this fake throttling response:

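```rust
// Assumed import paths for the smithy-rs crates used by SDK v0.10.x.
use aws_smithy_http::body::SdkBody;
use bytes::Bytes;

// Faked STS throttling response; the test connector returns this for half of the requests.
http::Response::builder()
    .status(400)
    .body(SdkBody::from(Bytes::from_static(
        br#"
        <ErrorResponse>
            <Error>
                <Type>ErrorMessage</Type>
                <Code>Throttling</Code>
                <Message>Rate exceeded</Message>
            </Error>
        </ErrorResponse>
        "#,
    )))
    .unwrap()
```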

All this testing was done on v0.10.1.

@Velfi added the investigating label and removed the needs-triage label on Apr 29, 2022
@jdisanti
Contributor Author

Got hold of a real throttling error from STS and replicated it in the test connector. For posterity, here is what a real throttling error looks like:

```rust
http::Response::builder()
    .status(400)
    .header("content-type", "text/xml")
    .body(SdkBody::from(Bytes::from_static(
        br#"
        <ErrorResponse xmlns="https://sts.amazonaws.com/doc/2011-06-15/">
                <Error>
                        <Type>Sender</Type>
                        <Code>Throttling</Code>
                        <Message>Rate exceeded</Message>
                </Error>
                <RequestId>bf2eb53e-56c1-4c87-970f-0f7cf4342b93</RequestId>
        </ErrorResponse>
        "#,
    )))
    .unwrap()
```
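Comparing this real response with the faked one, the `<Type>` differs (`Sender` vs `ErrorMessage`) but the `<Code>` is `Throttling` in both, so a classifier keyed on the error code should treat them the same. The sketch below pulls those fields out of the real body. `extract_tag` is a hypothetical helper that does naive substring search (no attributes, CDATA, or entity handling), so it is only suitable for illustration; an actual client should use a real XML parser.

```rust
/// The real STS throttling body quoted above.
const REAL_STS_THROTTLE: &str = r#"
        <ErrorResponse xmlns="https://sts.amazonaws.com/doc/2011-06-15/">
                <Error>
                        <Type>Sender</Type>
                        <Code>Throttling</Code>
                        <Message>Rate exceeded</Message>
                </Error>
                <RequestId>bf2eb53e-56c1-4c87-970f-0f7cf4342b93</RequestId>
        </ErrorResponse>
"#;

/// Hypothetical helper: returns the trimmed text between the first `<tag>`
/// and the following `</tag>`, or `None` if either is missing.
fn extract_tag<'a>(xml: &'a str, tag: &str) -> Option<&'a str> {
    let open = format!("<{tag}>");
    let close = format!("</{tag}>");
    let start = xml.find(&open)? + open.len();
    let len = xml[start..].find(&close)?;
    Some(xml[start..start + len].trim())
}

fn main() {
    for tag in ["Type", "Code", "Message", "RequestId"] {
        println!("{tag}: {:?}", extract_tag(REAL_STS_THROTTLE, tag));
    }
}
```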

As far as I can tell, retry is working correctly. I'll add a debug event to the retry logic so that the logs make it clear when a retry happens.

@jdisanti removed the bug and investigating labels on Apr 29, 2022
@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
