Update Agent to be more resilient in case of unauthenticated timeouts with IMDS #3795
Summary
There is an edge case in which the credentials the agent uses to access internal resources become incorrect because other processes restart during a certain time interval. When this happens, the agent can still treat the credentials as valid and unexpired. This change prevents that by refreshing the credentials at the first place the issue would surface, which is RegisterContainerInstance in the ecsclient.
Implementation details
The issue occurs in the RegisterContainerInstance function in ecsclient's client.go. RegisterContainerInstance calls IMDS from the SetInstanceIdentity function, where the client for communicating with IMDS is used through the call client.ec2metadata.GetDynamicData(...). The new IMDS process does not work with the agent's current credentials.
In the fix, this line is replaced with a retry strategy. When the call returns an error, the agent's current credentials are forced to expire before the retry, and a GetCredentials call is made to refresh them, with multiple retries if the call fails again. This call refreshes the global instance credentials in the agent, so other calls the agent makes to IMDS are no longer affected.
Testing
To test, a new case was added to RegisterContainerInstanceTest. It is identical to the basic success case, except that the first call to client.ec2metadata.GetDynamicData(...) is simulated to fail and the retry call to succeed. The rest of the test validations are unchanged, which confirms that the retries do not prevent any other step from happening.
New tests cover the changes: yes
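For readers who want the shape of the retry strategy and of the fail-then-succeed test scenario, here is a minimal sketch in Go. It is not the code from this pull request: the metadataClient and credentialsRefresher interfaces, the getDynamicDataWithRetry helper, the maxAttempts parameter, and the fakes are all hypothetical names introduced for illustration, and the path string passed in main is only an example argument. A real implementation would probably also wait between attempts, which this sketch leaves out.

```go
package main

import (
	"errors"
	"fmt"
)

// metadataClient is a hypothetical stand-in for the agent's IMDS client.
type metadataClient interface {
	GetDynamicData(path string) (string, error)
}

// credentialsRefresher is a hypothetical stand-in for the agent's
// global instance credentials.
type credentialsRefresher interface {
	Expire()               // force the cached credentials to be treated as expired
	GetCredentials() error // fetch fresh credentials
}

// getDynamicDataWithRetry (hypothetical helper) calls IMDS and, on error,
// expires and refreshes the credentials before trying again.
func getDynamicDataWithRetry(md metadataClient, creds credentialsRefresher, path string, maxAttempts int) (string, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		data, err := md.GetDynamicData(path)
		if err == nil {
			return data, nil
		}
		lastErr = err
		if attempt == maxAttempts {
			break // no retry left, so skip the refresh
		}
		creds.Expire()
		if refreshErr := creds.GetCredentials(); refreshErr != nil {
			lastErr = refreshErr // remember the failure, still retry
		}
	}
	return "", fmt.Errorf("IMDS call failed after %d attempts: %w", maxAttempts, lastErr)
}

// flakyMetadata is a test fake that fails its first `failures` calls.
type flakyMetadata struct {
	failures int
	calls    int
}

func (f *flakyMetadata) GetDynamicData(path string) (string, error) {
	f.calls++
	if f.calls <= f.failures {
		return "", errors.New("simulated IMDS failure")
	}
	return "document for " + path, nil
}

// countingCreds is a test fake that counts credential refreshes.
type countingCreds struct{ refreshes int }

func (c *countingCreds) Expire()               {}
func (c *countingCreds) GetCredentials() error { c.refreshes++; return nil }

func main() {
	// Mirrors the test scenario: the first call fails, the retry succeeds.
	md := &flakyMetadata{failures: 1}
	creds := &countingCreds{}
	data, err := getDynamicDataWithRetry(md, creds, "instance-identity/document", 3)
	fmt.Println(data, err, md.calls, creds.refreshes)
}
```

Keeping the refresh inside the retry loop means the credentials are only expired after an observed failure, so the normal success path makes a single IMDS call and never touches them.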
Description for the changelog
Files:
were changed as described in the implementation and testing details
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.