-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSHARP-5017: Retry KMS requests on transient errors #1541
base: main
Are you sure you want to change the base?
Conversation
} | ||
} | ||
catch (Exception ex) when (ex is IOException or SocketException) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure here which other kind of network-errors related exceptions we could get
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw some comment on the Drivers ticket for this feature that it only pertains to HTTP errors so for socket-level errors do we also retry on those? Other comments in the ticket seems to suggest we'll need to tell libmongocrypt to reset its state if we encounter socket errors while reading a kms response. Actually are http errors thrown as socket or IOException errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that comment is a little bit misleading. So, as far as I have understood the situation is like this:
- The HTTP error are actually handled internally by libmongocrypt when we "feed" the response bytes to it. It reads the response and understands if it's an HTTP response with an error code.
- The network errors instead need to be recognized by us. When we do, we need to call
mongocrypt_kms_ctx_fail
(that is used inside the newFail
method) to notify libmongocrypt of the network error. - Internally, the handling for both errors is the same, as it retries the request a certain number of times
This PR is still in a rough-ish state, and it's here mostly to get feedback on the approach used.
|
@@ -210,6 +210,13 @@ static Library() | |||
() => __loader.Value.GetFunction<Delegates.mongocrypt_ctx_destroy>(("mongocrypt_ctx_destroy")), true); | |||
_mongocrypt_kms_ctx_get_kms_provider = new Lazy<Delegates.mongocrypt_kms_ctx_get_kms_provider>( | |||
() => __loader.Value.GetFunction<Delegates.mongocrypt_kms_ctx_get_kms_provider>(("mongocrypt_kms_ctx_get_kms_provider")), true); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For these and the following methods/variables, is there a specific ordering? It does not seem to be completely alphabetical so for now I've put the new things on the bottom
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't a specific order for the methods, but many of them are grouped together. For instance, you'll notice that most of the setopt methods are clustered. For now, I recommend placing the mongocrypt_setopt_retry_kms method alongside the other setopt methods and leaving the other new methods at the bottom. You can add a TODO comment above the static constructor to organize the methods later (so you don't have unrelated changes to your PR). I'll handle that when addressing technical debt in Libmongocrypt later this quarter.
@@ -170,6 +170,13 @@ public KmsRequestCollection GetKmsMessageRequests() | |||
return new KmsRequestCollection(requests, this); | |||
} | |||
|
|||
//TODO I think we should remove the previous method and use this | |||
public KmsRequest GetNextKmsMessageRequest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done :)
|
||
if (failureType == "network") | ||
{ | ||
await SetFailure("network", 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: could probably just use failureType
since we know it is network
after the if statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it but I would prefer to keep it this way to make it easier to read
{ | ||
{ "region", "foo" }, | ||
{ "key", "bar" }, | ||
{ "endpoint", "127.0.0.1:9003" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: replace "127.0.0.1:9003" here and other uses below with endpoint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely
@adelinowona I've fixed also the sync part of the test and done some cleanup. |
No description provided.