Adds test cases for readVectored() #284

Merged

ahmarsuhail merged 9 commits into awslabs:main from ahmarsuhail:read-vectored-tests

Jun 23, 2025

Collaborator

ahmarsuhail commented Jun 5, 2025

Description of change

Adds more test cases for readVectored().

Relevant issues

Does this contribution introduce any breaking changes to the existing APIs or behaviors?

Does this contribution introduce any new public APIs or behaviors?

How was the contribution tested?

Does this contribution need a changelog entry?

I have updated the CHANGELOG or README if appropriate

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

ahmarsuhail had a problem deploying to integration-tests

June 5, 2025 10:12

— with

GitHub Actions Failure

ahmarsuhail had a problem deploying to integration-tests

June 5, 2025 10:14

— with

GitHub Actions Failure

SanjayMarreddi reviewed

Contributor

SanjayMarreddi left a comment

Thanks a lot for the PR!

I'd like your thoughts on whether we can move some of these tests to unit tests, since they test a single logical unit of functionality (such as validating the ranges).

...rc/integrationTest/java/software/amazon/s3/analyticsaccelerator/access/ReadVectoredTest.java Outdated

+                }
+                @Test
+                void testNullRangeList() throws IOException {

Contributor

SanjayMarreddi Jun 5, 2025

Should we make testNullRangeList a unit test at the S3SeekableInputStream level, since it only checks arguments at that level?

Collaborator Author

ahmarsuhail Jun 12, 2025

done, thanks
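For readers following the thread, an argument check like this can be exercised without any S3 access. The sketch below is illustrative only: `Range`, `validateRanges` and `rejects` are hypothetical stand-ins, not the library's `ObjectRange` or the actual `S3SeekableInputStream` validation, and the exception types are assumptions.

```java
import java.util.List;
import java.util.Objects;

final class RangeArgumentCheck {
  // Hypothetical stand-in for the library's ObjectRange; only offset and length are modeled.
  static final class Range {
    final long offset;
    final int length;

    Range(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  // Hypothetical argument check of the kind a readVectored() entry point might perform.
  static void validateRanges(List<Range> ranges) {
    Objects.requireNonNull(ranges, "ranges must not be null");
    for (Range r : ranges) {
      Objects.requireNonNull(r, "range must not be null");
      if (r.offset < 0 || r.length < 0) {
        throw new IllegalArgumentException("offset and length must be non-negative");
      }
    }
  }

  // Returns true when validateRanges rejects the input with the expected exception type.
  static boolean rejects(List<Range> ranges, Class<? extends RuntimeException> expected) {
    try {
      validateRanges(ranges);
      return false;
    } catch (RuntimeException e) {
      return expected.isInstance(e);
    }
  }
}
```

Because the check never touches the network, a unit test can cover the null list, null element and negative-value cases directly.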

SanjayMarreddi reviewed

...rc/integrationTest/java/software/amazon/s3/analyticsaccelerator/access/ReadVectoredTest.java

rajdchak reviewed

...rc/integrationTest/java/software/amazon/s3/analyticsaccelerator/access/ReadVectoredTest.java

+              import software.amazon.s3.analyticsaccelerator.S3SeekableInputStream;
+              import software.amazon.s3.analyticsaccelerator.common.ObjectRange;
               public class ReadVectoredTest extends IntegrationTestBase {

Contributor

rajdchak Jun 5, 2025

Not related to this change, but can we make sure these tests are not using the default seekable stream configurations, as they can cause out-of-memory errors? Can you check the read correctness or concurrency correctness tests? I addressed this issue in those.

Collaborator Author

ahmarsuhail Jun 13, 2025

done, thanks!

fuatbasik reviewed

...rc/integrationTest/java/software/amazon/s3/analyticsaccelerator/access/ReadVectoredTest.java

fuatbasik reviewed

...rc/integrationTest/java/software/amazon/s3/analyticsaccelerator/access/ReadVectoredTest.java

+              import software.amazon.s3.analyticsaccelerator.S3SeekableInputStream;
+              import software.amazon.s3.analyticsaccelerator.common.ObjectRange;
               public class ReadVectoredTest extends IntegrationTestBase {

Collaborator

fuatbasik Jun 11, 2025

It might be good to consider test cases that exercise known underlying interactions here. For example, test cases where a range spans multiple blocks, or where multiple ranges access the same block.
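One way to pick ranges for such cases is to compute which blocks a range touches. The sketch below is a simplified model that assumes fixed-size blocks; `BLOCK_SIZE` and all names here are hypothetical and may not match how the library's BlockManager actually lays out blocks.

```java
import java.util.ArrayList;
import java.util.List;

final class BlockSpanSketch {
  // Assumed block size for illustration only; the library's real block size may differ.
  static final long BLOCK_SIZE = 8L * 1024 * 1024;

  // Returns the indices of all fixed-size blocks touched by [offset, offset + length).
  static List<Long> blocksFor(long offset, long length) {
    List<Long> blocks = new ArrayList<>();
    if (length <= 0) {
      return blocks; // an empty range touches no block
    }
    long first = offset / BLOCK_SIZE;
    long last = (offset + length - 1) / BLOCK_SIZE;
    for (long b = first; b <= last; b++) {
      blocks.add(b);
    }
    return blocks;
  }

  // True when a single range crosses at least one block boundary.
  static boolean spansMultipleBlocks(long offset, long length) {
    return blocksFor(offset, length).size() > 1;
  }

  // True when two ranges touch at least one common block.
  static boolean shareABlock(long offset1, long length1, long offset2, long length2) {
    List<Long> common = blocksFor(offset1, length1);
    common.retainAll(blocksFor(offset2, length2));
    return !common.isEmpty();
  }
}
```

Under this model, a two-byte range starting at `BLOCK_SIZE - 1` crosses a boundary, while two small ranges near offset 0 land in the same block.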

Collaborator

fuatbasik Jun 11, 2025

I feel like we really need to think critically about test cases for readVectored(), including the example I gave before. We should consider the different failure modes in detail and make sure that we are protected against them.

fuatbasik requested changes

Collaborator

fuatbasik left a comment

Thanks @ahmarsuhail, please see my comments inline.

ahmarsuhail had a problem deploying to integration-tests

June 13, 2025 13:10

— with

GitHub Actions Failure

ahmarsuhail had a problem deploying to integration-tests

June 13, 2025 15:45

— with

GitHub Actions Failure

ahmarsuhail had a problem deploying to integration-tests

June 13, 2025 16:34

— with

GitHub Actions Failure

ahmarsuhail had a problem deploying to integration-tests

June 13, 2025 16:38

— with

GitHub Actions Failure

ahmarsuhail added 6 commits

June 13, 2025 17:45


          adds test cases for readVectored()

e040892


          typo

866b6ae


          adds in additional tests

bc98337


          adds in readModes

b5d1624


          block manager tests

83f9fa2


          rename sequential read mode

a55f484

ahmarsuhail force-pushed the read-vectored-tests branch from f83ba39 to a55f484

June 13, 2025 16:49

ahmarsuhail temporarily deployed to integration-tests

June 13, 2025 16:49

— with

GitHub Actions Inactive

ahmarsuhail mentioned this pull request

Improve GET request annotation #195

Closed

1 task

SanjayMarreddi reviewed

...eam/src/main/java/software/amazon/s3/analyticsaccelerator/io/physical/data/BlockManager.java Outdated

Comment on lines +234 to +247

+                private boolean shouldExtendRequest(ReadMode readMode) {
+                  // Do not apply sequential prefetching when the read is coming from parquet prefetcher or the
+                  // readVectored(),
+                  // in this case we know exactly the ranges we want, and don't want to extend them further.
+                  if (readMode == ReadMode.READ_VECTORED
+                      || readMode == ReadMode.REMAINING_COLUMN_PREFETCH
+                      || readMode == ReadMode.COLUMN_PREFETCH
+                      || readMode == ReadMode.DICTIONARY_PREFETCH
+                      || readMode == ReadMode.PREFETCH_TAIL) {
+                    return false;
+                  }
+                  return true;
+                }

Contributor

SanjayMarreddi Jun 17, 2025

Maybe a nit, but I think this logic can be simplified if we update our ReadMode enum class to something like the one below. (The boolean values assigned may be wrong, but the syntax should be like that.) Then in the BlockManager we could just use the readMode.allowsSequentialPrefetch() value and remove the above method. In the future, if we add more ReadModes, we won't have to make changes in both the enum class and this method.

public enum ReadMode {
  SYNC(true),
  ASYNC(true),
  SMALL_OBJECT_PREFETCH(true),
  SEQUENTIAL_FILE_PREFETCH(true),
  READ_VECTORED(false),
  REMAINING_COLUMN_PREFETCH(false),
  COLUMN_PREFETCH(false),
  DICTIONARY_PREFETCH(false),
  PREFETCH_TAIL(false);

  private final boolean allowsSequentialPrefetch;

  ReadMode(boolean allowsSequentialPrefetch) {
    this.allowsSequentialPrefetch = allowsSequentialPrefetch;
  }

  public boolean allowsSequentialPrefetch() {
    return allowsSequentialPrefetch;
  }
}

Collaborator Author

ahmarsuhail Jun 17, 2025

thanks this is a good suggestion :) will do
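As a rough sketch of how the call site might look after this change: the enum below is an abbreviated copy of the proposal, the flag values are illustrative, and `requestEnd` is a hypothetical stand-in for whatever the BlockManager does when it decides to extend a request.

```java
final class SequentialPrefetchSketch {
  // Abbreviated copy of the proposed enum; the flag values are illustrative.
  enum ReadMode {
    SYNC(true),
    READ_VECTORED(false),
    PREFETCH_TAIL(false);

    private final boolean allowsSequentialPrefetch;

    ReadMode(boolean allowsSequentialPrefetch) {
      this.allowsSequentialPrefetch = allowsSequentialPrefetch;
    }

    boolean allowsSequentialPrefetch() {
      return allowsSequentialPrefetch;
    }
  }

  // Hypothetical stand-in for the extension decision: the request end is pushed out
  // only when the read mode allows sequential prefetching.
  static long requestEnd(long requestedEnd, long extension, ReadMode readMode) {
    return readMode.allowsSequentialPrefetch() ? requestedEnd + extension : requestedEnd;
  }
}
```

With the flag on the enum, adding a new read mode means choosing its value in one place, and the separate `shouldExtendRequest` helper is no longer needed.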

fuatbasik reviewed

...java/software/amazon/s3/analyticsaccelerator/io/logical/parquet/ParquetPrefetchTailTask.java

                           IOPlan ioPlan = new IOPlan(ranges);
                           // Create a non-empty IOPlan only if we have a valid range to work with
-                          physicalIO.execute(ioPlan);
+                          physicalIO.execute(ioPlan, ReadMode.PREFETCH_TAIL);

Collaborator

fuatbasik Jun 19, 2025

Can a single IOPlan have multiple ranges with different ReadModes? If yes, this would not handle that case. If not, should we make ReadMode a property of IOPlan?

Collaborator Author

ahmarsuhail Jun 19, 2025

No, a single IOPlan should not have multiple ReadModes, so this is OK here.
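For illustration only, the alternative raised in the question (one read mode carried by the plan itself) could look roughly like the sketch below. `Plan`, `Range` and the three-value `ReadMode` are hypothetical stand-ins, not the library's `IOPlan` API, and this is not what the PR implements.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

final class PlanWithModeSketch {
  // Abbreviated, hypothetical read modes.
  enum ReadMode { SYNC, READ_VECTORED, PREFETCH_TAIL }

  // Hypothetical inclusive byte range.
  static final class Range {
    final long start;
    final long end;

    Range(long start, long end) {
      this.start = start;
      this.end = end;
    }
  }

  // A plan that owns exactly one read mode, so every range in it shares that mode.
  static final class Plan {
    private final List<Range> ranges;
    private final ReadMode readMode;

    Plan(List<Range> ranges, ReadMode readMode) {
      // Defensive copy keeps the plan immutable after construction.
      this.ranges = Collections.unmodifiableList(new ArrayList<>(ranges));
      this.readMode = Objects.requireNonNull(readMode, "readMode must not be null");
    }

    List<Range> ranges() {
      return ranges;
    }

    ReadMode readMode() {
      return readMode;
    }
  }
}
```

The trade-off is that the mode travels with the plan instead of being passed as a second argument to `execute`, which makes a mixed-mode plan impossible to construct.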

fuatbasik reviewed

Collaborator

fuatbasik left a comment

Thanks a lot @ahmarsuhail. I left 2-3 comments. One of them raises a question about concurrency; if you think that case is OK and my concern is not valid, I am happy to approve.

...m/src/main/java/software/amazon/s3/analyticsaccelerator/io/physical/impl/PhysicalIOImpl.java

+                 *
+                 * @param objectRanges Vectored ranges to fetch
+                 */
+                private void makeReadVectoredRangesAvailable(List<ObjectRange> objectRanges) {

Collaborator

fuatbasik Jun 19, 2025

I am concerned about concurrency here. The scenario I have in mind is two streams against the same S3 key calling readVectored() concurrently with overlapping ranges. Would that be a problem?
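The shape of a test for this scenario can be sketched against an in-memory model. Everything below is an assumption for illustration: the shared block cache, the tiny `BLOCK_SIZE` and the fetch counter are hypothetical and do not reflect how `PhysicalIOImpl` or the BlockManager is implemented. The sketch only shows two readers hitting overlapping ranges at the same time and checking that both see correct bytes.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class OverlappingReadSketch {
  static final int BLOCK_SIZE = 16; // tiny assumed block size, for illustration only
  static final byte[] OBJECT = new byte[64]; // in-memory stand-in for the S3 object

  static {
    for (int i = 0; i < OBJECT.length; i++) {
      OBJECT[i] = (byte) i;
    }
  }

  // Block cache shared by both readers, keyed by block index.
  final ConcurrentHashMap<Integer, byte[]> cache = new ConcurrentHashMap<>();
  final AtomicInteger fetches = new AtomicInteger();

  // computeIfAbsent runs the loader at most once per key, even under contention.
  byte[] block(int index) {
    return cache.computeIfAbsent(index, i -> {
      fetches.incrementAndGet();
      return Arrays.copyOfRange(OBJECT, i * BLOCK_SIZE, (i + 1) * BLOCK_SIZE);
    });
  }

  // Copies [offset, offset + length) out of the cached blocks.
  byte[] read(int offset, int length) {
    byte[] out = new byte[length];
    for (int i = 0; i < length; i++) {
      int pos = offset + i;
      out[i] = block(pos / BLOCK_SIZE)[pos % BLOCK_SIZE];
    }
    return out;
  }

  // Runs two overlapping reads concurrently; true if both are correct and
  // each of the three touched blocks was fetched exactly once.
  static boolean runOverlappingReads() {
    OverlappingReadSketch store = new OverlappingReadSketch();
    ExecutorService pool = Executors.newFixedThreadPool(2);
    CountDownLatch start = new CountDownLatch(1);
    try {
      Future<byte[]> a = pool.submit(() -> { start.await(); return store.read(8, 24); });
      Future<byte[]> b = pool.submit(() -> { start.await(); return store.read(20, 24); });
      start.countDown(); // release both readers together
      byte[] ra = a.get(5, TimeUnit.SECONDS);
      byte[] rb = b.get(5, TimeUnit.SECONDS);
      return Arrays.equals(ra, Arrays.copyOfRange(OBJECT, 8, 32))
          && Arrays.equals(rb, Arrays.copyOfRange(OBJECT, 20, 44))
          && store.fetches.get() == 3;
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdownNow();
    }
  }
}
```

A real integration test would replace the in-memory store with two streams opened on the same key, but the assertions (correct bytes for both readers, no corrupted or duplicated blocks) would follow the same pattern.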


          review comments

cc7f477

ahmarsuhail had a problem deploying to integration-tests

June 23, 2025 09:36

— with

GitHub Actions Failure


          readVectored concurrent tests

adfe314

ahmarsuhail had a problem deploying to integration-tests

June 23, 2025 10:29

— with

GitHub Actions Failure


          spotlessApply

ahmarsuhail had a problem deploying to integration-tests

June 23, 2025 12:42

— with

GitHub Actions Failure

fuatbasik approved these changes

Collaborator

fuatbasik left a comment

@ahmarsuhail thanks a lot! LGTM.

ahmarsuhail merged commit bbc4aef into awslabs:main

2 of 3 checks passed

ozkoca pushed a commit that referenced this pull request


          Adds test cases for readVectored() (#284)

859da4e
