Adds test cases for readVectored()#284
Conversation
SanjayMarreddi
left a comment
There was a problem hiding this comment.
Thanks a lot for the PR!
I just want to see your thoughts on if we can move some of these tests to unit-tests as they test a single logical unit of functionality ( like validating the ranges )
| } | ||
|
|
||
| @Test | ||
| void testNullRangeList() throws IOException { |
There was a problem hiding this comment.
Should we make testNullRangeList a unit test at S3SeekableInputStream level as it just checks the arguments at the same level?
...rc/integrationTest/java/software/amazon/s3/analyticsaccelerator/access/ReadVectoredTest.java
Show resolved
Hide resolved
| import software.amazon.s3.analyticsaccelerator.S3SeekableInputStream; | ||
| import software.amazon.s3.analyticsaccelerator.common.ObjectRange; | ||
|
|
||
| public class ReadVectoredTest extends IntegrationTestBase { |
There was a problem hiding this comment.
Not related to this change, but can we make sure these tests are not using the default seekable stream configurations as they can cause out of memory errors, can you check read correctness or concurrency correctness tests I had addressed this issue in those.
...rc/integrationTest/java/software/amazon/s3/analyticsaccelerator/access/ReadVectoredTest.java
Show resolved
Hide resolved
...rc/integrationTest/java/software/amazon/s3/analyticsaccelerator/access/ReadVectoredTest.java
Show resolved
Hide resolved
...rc/integrationTest/java/software/amazon/s3/analyticsaccelerator/access/ReadVectoredTest.java
Show resolved
Hide resolved
| import software.amazon.s3.analyticsaccelerator.S3SeekableInputStream; | ||
| import software.amazon.s3.analyticsaccelerator.common.ObjectRange; | ||
|
|
||
| public class ReadVectoredTest extends IntegrationTestBase { |
There was a problem hiding this comment.
It might be good to consider certain test cases that will test known underlying interactions here. For example, test cases where ranges will span multiple blocks or where multiple ranges are accessing to same block.
There was a problem hiding this comment.
I feel like we really need to think critically about test cases for read vectored including example i gave before. We should consider different failure modes in detail and make sure that we are protected against them.
fuatbasik
left a comment
There was a problem hiding this comment.
Thanks @ahmarsuhail please see my comments inline.
f83ba39 to
a55f484
Compare
| private boolean shouldExtendRequest(ReadMode readMode) { | ||
| // Do not apply sequential prefetching when the read is coming from parquet prefetcher or the | ||
| // readVectored(), | ||
| // in this case we know exactly the ranges we want, and don't want to extend them further. | ||
| if (readMode == ReadMode.READ_VECTORED | ||
| || readMode == ReadMode.REMAINING_COLUMN_PREFETCH | ||
| || readMode == ReadMode.COLUMN_PREFETCH | ||
| || readMode == ReadMode.DICTIONARY_PREFETCH | ||
| || readMode == ReadMode.PREFETCH_TAIL) { | ||
| return false; | ||
| } | ||
|
|
||
| return true; | ||
| } |
There was a problem hiding this comment.
May be a nit, but I think this logic can be simplified if we update our ReadMode enum class to something like below: ( The boolean values assigned may be wrong, but syntax should be like that ). Then in this blockManager, we could just use readMode.allowsSequentialPrefetch() value and remove the above method. In the future, if we have more readModes, we dont have to make changes in both enum class and this method.
public enum ReadMode {
SYNC(true),
ASYNC(true),
SMALL_OBJECT_PREFETCH(true),
SEQUENTIAL_FILE_PREFETCH(true),
READ_VECTORED(false),
REMAINING_COLUMN_PREFETCH(false),
COLUMN_PREFETCH(false),
DICTIONARY_PREFETCH(false),
PREFETCH_TAIL(false),
private final boolean allowsSequentialPrefetch;
ReadMode(boolean allowsSequentialPrefetch) {
this.allowsSequentialPrefetch = allowsSequentialPrefetch;
}
public boolean allowsSequentialPrefetch() {
return allowsSequentialPrefetch;
}
}
There was a problem hiding this comment.
thanks this is a good suggestion :) will do
| IOPlan ioPlan = new IOPlan(ranges); | ||
| // Create a non-empty IOPlan only if we have a valid range to work with | ||
| physicalIO.execute(ioPlan); | ||
| physicalIO.execute(ioPlan, ReadMode.PREFETCH_TAIL); |
There was a problem hiding this comment.
Can a single IOPlan have multiple ranges with different readmodes? If yes, this would not handle that case. If now, should we make ReadMode a property of IOPlan?
There was a problem hiding this comment.
no, single IoPlan should not have multiple readModes. so this is ok here
fuatbasik
left a comment
There was a problem hiding this comment.
thanks a lot @ahmarsuhail . I put 2-3 comments. I have a question about concurrency in one of my comments if you think that is OK my concern is not valid, i am happy to approve
| * | ||
| * @param objectRanges Vectored ranges to fetch | ||
| */ | ||
| private void makeReadVectoredRangesAvailable(List<ObjectRange> objectRanges) { |
There was a problem hiding this comment.
I am concerned about concurrency here. The scenario I have in mind 2 streams against the same S3 key calling readvectored concurrently with overlapping ranges. Would that be a problem?
fuatbasik
left a comment
There was a problem hiding this comment.
@ahmarsuhail thanks a lot! LGTM.
Description of change
Adds in more test cases for readVectored()
Relevant issues
Does this contribution introduce any breaking changes to the existing APIs or behaviors?
Does this contribution introduce any new public APIs or behaviors?
How was the contribution tested?
Does this contribution need a changelog entry?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).