-
Notifications
You must be signed in to change notification settings - Fork 519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pluto: add more retries for requests out to AWS #3214
pluto: add more retries for requests out to AWS #3214
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me. Is this the same timeout that we were getting by default before?
const IMDS_CONNECT_TIMEOUT: Duration = Duration::from_secs(2);
The default timeout for the IMDS clients are set to 1 second. https://github.com/awslabs/smithy-rs/blob/7ed51b21290aba818fcfd7a5501fe7035cde5c24/aws/rust-runtime/aws-config/src/imds/client.rs#L47 |
sources/api/pluto/src/aws.rs
Outdated
|
||
// Max request retry attempts | ||
const MAX_ATTEMPTS: u32 = 10; | ||
const IMDS_CONNECT_TIMEOUT: Duration = Duration::from_secs(2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this 3 seconds to align with the AWS SDK?
sources/api/pluto/src/aws.rs
Outdated
pub(crate) async fn sdk_imds_client() -> Result<imds::Client> { | ||
imds::Client::builder() | ||
.max_attempts(MAX_ATTEMPTS) | ||
.connect_timeout(IMDS_CONNECT_TIMEOUT) | ||
.build() | ||
.await | ||
.context(SdkImdsSnafu) | ||
} | ||
|
||
pub(crate) fn sdk_retry_config() -> RetryConfig { | ||
RetryConfigBuilder::new().max_attempts(MAX_ATTEMPTS).build() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these need to be pub(crate)
?
pub(crate) async fn sdk_imds_client() -> Result<imds::Client> { | |
imds::Client::builder() | |
.max_attempts(MAX_ATTEMPTS) | |
.connect_timeout(IMDS_CONNECT_TIMEOUT) | |
.build() | |
.await | |
.context(SdkImdsSnafu) | |
} | |
pub(crate) fn sdk_retry_config() -> RetryConfig { | |
RetryConfigBuilder::new().max_attempts(MAX_ATTEMPTS).build() | |
} | |
async fn sdk_imds_client() -> Result<imds::Client> { | |
imds::Client::builder() | |
.max_attempts(MAX_ATTEMPTS) | |
.connect_timeout(IMDS_CONNECT_TIMEOUT) | |
.build() | |
.await | |
.context(SdkImdsSnafu) | |
} | |
fn sdk_retry_config() -> RetryConfig { | |
RetryConfigBuilder::new().max_attempts(MAX_ATTEMPTS).build() | |
} |
5a49daf
to
37e6aa9
Compare
Pushes above and below addresses @bcressey 's comment.
|
Enforces a 6 minute timeout on settings generator execution so as to not hang the boot for too long before erroring out.
This adds more resiliency when pluto makes API requests out to EC2 and EKS. We choose a large number of retries to force the SDK to keep retrying until we timeout on the overall request. It's now less likely for the request to overall fail due to transient connection errors.
37e6aa9
to
21876a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updates look good to me.
Issue number:
N/A
Description of changes:
Testing done:
On a host where 25% of all outbound packets get dropped, I was able to call
pluto private-dns-name
where it retried the request 5 times (beyond the default 3 tries set by the SDK).Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.