Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

imdsclient: better retries with tokio retry and timeout #1841

Merged
merged 2 commits into from
Dec 15, 2021

Conversation

jpculp
Copy link
Member

@jpculp jpculp commented Nov 24, 2021

Description of changes:

To mitigate things like closed connections, IMDSv2 token and data
fetches will continue to retry until success or until the timeout has
been reached.

The default timeout is 300s to match the ifup timeout in wicked, but can
be set manually using .with_timeout when creating a new ImdsClient.

Also moves the token fetch to just before the IMDS data fetch so
that building a new ImdsClient no longer needs an await.

refresh_token has been replaced by clear_token and write_token.

Testing done:
In addition to the unit testing...

  • Built aws-k8s-1.21 ami and launched instance.
  • Instance connected to eks cluster.
  • Connected to control container via ssm session.
  • Verified that host-containers.admin.user-data contained a base64-encoded block.
  • Connected to admin container via ssh.
  • Verified that /.bottlerocket/host-containers/admin/user-data contained JSON.
  • Ran sudo sheltie to verify root shell was still available.
  • Checked for failed systemd units.
  • Ran pluto with it's sub-commands to verify functionality.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

sources/imdsclient/src/lib.rs Outdated Show resolved Hide resolved
sources/imdsclient/src/lib.rs Outdated Show resolved Hide resolved
sources/imdsclient/src/lib.rs Outdated Show resolved Hide resolved
sources/imdsclient/src/lib.rs Outdated Show resolved Hide resolved
sources/imdsclient/src/lib.rs Show resolved Hide resolved
@jpculp
Copy link
Member Author

jpculp commented Dec 7, 2021

  • Reduced initial token retry duration from 1000ms to 500ms.
  • Reduced factor from 3 to 1.
  • Added comments to clarify where timings are coming from.

@jpculp
Copy link
Member Author

jpculp commented Dec 7, 2021

  • Fixed incorrect test. (Correct test timeout was reduced from 10s in the initial commit to 2s)

@jpculp jpculp requested a review from tjkirch December 7, 2021 00:19
sources/imdsclient/src/lib.rs Outdated Show resolved Hide resolved
sources/imdsclient/src/lib.rs Show resolved Hide resolved
sources/imdsclient/src/lib.rs Outdated Show resolved Hide resolved
sources/imdsclient/src/lib.rs Show resolved Hide resolved
@jpculp jpculp changed the title imdsclient: better session token retries with tokio retry and timeout imdsclient: better retries with tokio retry and timeout Dec 10, 2021
@jpculp
Copy link
Member Author

jpculp commented Dec 10, 2021

In this latest push rather than just using tokio retry and timeout on the token fetch, they are used for fetching IMDS data as well. This introduces things like RwLock to allow for the token to be refreshed within the timeout/retry logic.

In a second commit I've also moved the token fetch out of ImdsClient::new() so that it no longer requires an await, but we can move that to a separate PR if folks are concerned it is out of scope.

@jpculp jpculp requested a review from webern December 10, 2021 19:55
sources/imdsclient/src/lib.rs Outdated Show resolved Hide resolved
sources/imdsclient/src/lib.rs Outdated Show resolved Hide resolved
sources/imdsclient/src/lib.rs Show resolved Hide resolved
@jpculp
Copy link
Member Author

jpculp commented Dec 10, 2021

  • Adjusted logic to avoid triggering a retry on first run.
  • Added comment about clone in read_token.
  • Start backoff with 250ms for even quicker retries.

@jpculp jpculp requested a review from cbgbt December 10, 2021 23:34
sources/imdsclient/README.md Outdated Show resolved Hide resolved
}

async fn new_impl(imds_base_uri: String) -> Result<Self> {
fn new_impl(imds_base_uri: String) -> Result<Self> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not allowing the user to override imds_base_uri anywhere, so this method probably doesn't need to take it as an argument.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new_impl is just for unit testing so that we can point the client to the server httptest creates.

sources/imdsclient/src/lib.rs Outdated Show resolved Hide resolved
sources/imdsclient/src/lib.rs Outdated Show resolved Hide resolved
sources/imdsclient/src/lib.rs Show resolved Hide resolved
pub enum Error {
#[snafu(display("Response '{}' from '{}': {}", get_status_code(source), uri, source))]
BadResponse { uri: String, source: reqwest::Error },

#[snafu(display("Fetched IMDSv2 session token"))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say something about the token being empty? Or am I reading this wrong?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the token being fetched because it was empty, but actually this error isn't used anymore so I've removed it in this latest push.

@@ -422,6 +474,9 @@ mod error {
#[snafu(display("Response was not UTF-8: {}", source))]
NonUtf8Response { source: std::string::FromUtf8Error },

#[snafu(display("IMDSv2 session token was refreshed."))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message is a little confusing to me. The error is UnauthorizedTokenRefreshed, but the message says the token was refreshed. Maybe I'm missing some context here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original line of thinking was the token was refreshed because the query returned a 401, but I went ahead and renamed this to TokenRefreshed.

#[snafu(display("Failed to fetch IMDSv2 session token"))]
FailedFetchToken,

#[snafu(display("Failed to read token within ImdsClient"))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Since these are "internal" implementation detail errors, let's make the message a bit clearer. Something like "Failed to read token from RwLock..."? (I'm not sold on that wording in particular)

sources/api/early-boot-config/src/provider/aws.rs Outdated Show resolved Hide resolved
To mitigate things like closed connections, IMDSv2 token and data
fetches will continue to retry until success or until the timeout has
been reached.

The default timeout is 300s to match the ifup timeout in wicked, but can
be set manually using `.with_timeout` when creating a new ImdsClient.
@jpculp
Copy link
Member Author

jpculp commented Dec 13, 2021

  • Replaced Result<Self> with Self in ImdsClient::new() (and added Default for clippy).
  • Made the 300s retry_timeout a constant (RETRY_TIMEOUT_SECS).
  • Renamed UnauthorizedTokenRefreshed to TokenRefreshed to be less confusing.
  • Removed outdated comment about fetching the token on instantiation.
  • Other small fixes for increased clarity.

@jpculp jpculp requested review from cbgbt and zmrow December 13, 2021 22:21
@jpculp jpculp requested a review from webern December 13, 2021 22:21
This moves the token fetch to just before the IMDS data fetch so
that building a new ImdsClient no longer needs an await, nor does it
return a result.

`refresh_token` has been replaced by `clear_token` and `write_token`.
@jpculp
Copy link
Member Author

jpculp commented Dec 14, 2021

  • Implemented @bcressey's suggestion to clear the token on 401/Unauthorized.
  • Fixed some comment tweaks I missed in the previous push.

@jpculp jpculp requested a review from bcressey December 14, 2021 02:06
@jpculp jpculp merged commit 4db4b5a into bottlerocket-os:develop Dec 15, 2021
@jpculp jpculp deleted the imds-token-tokio-retry branch December 15, 2021 01:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants