-
Notifications
You must be signed in to change notification settings - Fork 15
Adapt retries to new PhysicalIO #340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ozkoca
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Added a few minor requests
...src/main/java/software/amazon/s3/analyticsaccelerator/common/telemetry/DefaultTelemetry.java
Show resolved
Hide resolved
common/src/main/java/software/amazon/s3/analyticsaccelerator/common/telemetry/Telemetry.java
Show resolved
Hide resolved
common/src/main/java/software/amazon/s3/analyticsaccelerator/common/telemetry/Telemetry.java
Show resolved
Hide resolved
| * @param durationInMillis Timeout duration for reading from storage | ||
| * @param retryCount Number of times to retry if Timeout Exceeds | ||
| */ | ||
| void timeout(long durationInMillis, int retryCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can be renamed sthg like setTimeoutPolicy()
ahmarsuhail
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @fuatbasik really like how simple this is becoming. left a few questions.
Also - Do we already have an ITest for the retry behaviour from a previous PR/or is there another test that covers this behaviour?
It would be good to test the number of test GET requests made when there is a timeout (maybe the gray failure ones cover this?). And also that when timeout is set in openStream information, that overrides the physicalIo configuration (sorry if these tests already exist and I've just missed them!)
| boolean shouldEvict = false; | ||
|
|
||
| // Check for IO errors while reading data | ||
| if (e instanceof IOException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we no longer evicting on a tiemout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
retry should retry the timeout blocks. I am not sure if Timeout should be considered as a persistent change (like etag has changed) therefore, evicting already existing blocks sound wrong to me.
| } catch (IOException e) { | ||
| LOG.error("IOException while reading blocks", e); | ||
| setErrorOnBlocksAndRemove(blocks, e); | ||
| if (objectContent == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not related to your PR, but in what scenario do we expect ObjectContent to be null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually we dont. i think this is just and additional check to avoid NPE.
| provided = new DefaultRetryStrategyImpl(); | ||
| } | ||
|
|
||
| if (this.physicalIOConfiguration.getBlockReadTimeout() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok so the major change is that failsafe handles the timeout for us, if processReadTask() takes longer than X seconds for whatever reason, it gets retried. this is so much simpler now, nice :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
➕
| ArgumentCaptor<GetRequest> requestCaptor = ArgumentCaptor.forClass(GetRequest.class); | ||
| verify(objectClient, timeout(1_000).times(3)).getObject(requestCaptor.capture(), any()); | ||
|
|
||
| List<GetRequest> getRequestList = requestCaptor.getAllValues(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you need to update this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this test we know we will make 3 reads (2 5MB and 1 3MB) but we do not know their order due to async. behaviour. Therefore, updated the tests.
| openStreamInfo); | ||
|
|
||
| // Asserting there is timeout | ||
| assertThrows(IOException.class, overrideStream::read); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to also assert that the cause of the IoException is Timeout
ahmarsuhail
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM
Description of change
This is adopting Retry logic to the new physical IO. Now, we are merging timeout and retries into a unified place.
With this change we are letting consumers also to pass Timeout setting as a part of Retry strategy. If passed, AAL will not use timeout values from PhysicalIO configuration.
I also removed timeouts from Telemetry as they are no longer relevant. Failsafe now manages retries and timeouts.
Relevant issues
N/A
Does this contribution introduce any breaking changes to the existing APIs or behaviors?
No.
Does this contribution introduce any new public APIs or behaviors?
Yes, Please see description Javadocs are updated accordingly.
How was the contribution tested?
Added new unit tests, and modified existing unit tests.
Does this contribution need a changelog entry?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).