Adds test cases for readVectored()#284

Merged
ahmarsuhail merged 9 commits into awslabs:main from
ahmarsuhail:read-vectored-tests
Jun 23, 2025

Conversation

@ahmarsuhail
Collaborator

Description of change

Adds more test cases for readVectored().

Relevant issues

Does this contribution introduce any breaking changes to the existing APIs or behaviors?

Does this contribution introduce any new public APIs or behaviors?

How was the contribution tested?

Does this contribution need a changelog entry?

  • I have updated the CHANGELOG or README if appropriate

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

Contributor

@SanjayMarreddi SanjayMarreddi left a comment

Thanks a lot for the PR!

I'd like to hear your thoughts on whether we can move some of these tests to unit tests, since they test a single logical unit of functionality (like validating the ranges).
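As an illustration of the kind of logic that fits a unit test, here is a hypothetical, self-contained validator for vectored-read ranges. The `Range` type and the rules it enforces (reject a null list or null elements, negative values, ranges past the end of the object, and overlapping ranges; sort the rest by offset) are assumptions for this sketch, not the library's `ObjectRange` or its actual validation code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: not the library's ObjectRange or its validation logic.
final class RangeValidation {

  /** Minimal stand-in for a vectored-read range covering [offset, offset + length). */
  static final class Range {
    final long offset;
    final int length;

    Range(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /**
   * Validates ranges against an object of the given size and returns them sorted by offset.
   * Throws NullPointerException for a null list or element, IllegalArgumentException for
   * negative values, ranges past the end of the object, or overlapping ranges.
   */
  static List<Range> validateAndSort(List<Range> ranges, long objectSize) {
    Objects.requireNonNull(ranges, "ranges must not be null");
    List<Range> sorted = new ArrayList<>(ranges);
    for (Range r : sorted) {
      Objects.requireNonNull(r, "range must not be null");
      if (r.offset < 0 || r.length < 0) {
        throw new IllegalArgumentException("negative offset or length at " + r.offset);
      }
      if (r.offset + r.length > objectSize) {
        throw new IllegalArgumentException("range extends past end of object at " + r.offset);
      }
    }
    sorted.sort(Comparator.comparingLong(r -> r.offset));
    // Adjacent ranges (previous end == next offset) are allowed; true overlaps are not.
    for (int i = 1; i < sorted.size(); i++) {
      Range prev = sorted.get(i - 1);
      if (prev.offset + prev.length > sorted.get(i).offset) {
        throw new IllegalArgumentException("overlapping ranges at " + sorted.get(i).offset);
      }
    }
    return sorted;
  }
}
```

Because it performs no I/O, each rule can be exercised directly in a unit test, leaving the integration tests to cover the actual reads.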

}

@Test
void testNullRangeList() throws IOException {
Contributor

Should we make testNullRangeList a unit test at the S3SeekableInputStream level, since it only checks arguments at that level?

Collaborator Author

done, thanks
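A minimal sketch of why such a check can live in a unit test: if the stream validates its arguments before issuing any request, a fake backend can confirm both the exception and that no fetch happened. `FakeStream`, `Range`, and the `readVectored` signature shown here are hypothetical stand-ins, not `S3SeekableInputStream`'s API.

```java
import java.nio.ByteBuffer;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

// Hypothetical sketch: FakeStream and Range stand in for S3SeekableInputStream and
// ObjectRange; the readVectored signature here is an assumption.
final class NullRangeListCheck {

  static final class Range {
    final long offset;
    final int length;

    Range(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /** Counts backend fetches so a test can assert that no I/O was attempted. */
  static final class FakeStream {
    final AtomicInteger fetches = new AtomicInteger();

    void readVectored(List<Range> ranges, IntFunction<ByteBuffer> allocate) {
      // Validate arguments before any I/O, so the check is testable without S3.
      Objects.requireNonNull(ranges, "ranges must not be null");
      Objects.requireNonNull(allocate, "allocate must not be null");
      for (Range range : ranges) {
        allocate.apply(range.length); // buffer for this range
        fetches.incrementAndGet(); // stand-in for a ranged GET
      }
    }
  }

  /** Unit-level check: a null range list is rejected and no fetch is issued. */
  static boolean rejectsNullRangeListWithoutIo() {
    FakeStream stream = new FakeStream();
    try {
      stream.readVectored(null, ByteBuffer::allocate);
      return false; // should have thrown
    } catch (NullPointerException expected) {
      return stream.fetches.get() == 0;
    }
  }
}
```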

import software.amazon.s3.analyticsaccelerator.S3SeekableInputStream;
import software.amazon.s3.analyticsaccelerator.common.ObjectRange;

public class ReadVectoredTest extends IntegrationTestBase {
Contributor

Not related to this change, but can we make sure these tests don't use the default seekable stream configuration, since it can cause out-of-memory errors? I addressed this issue in the read correctness and concurrency correctness tests, so you can check those for reference.

Collaborator Author

done, thanks!

import software.amazon.s3.analyticsaccelerator.S3SeekableInputStream;
import software.amazon.s3.analyticsaccelerator.common.ObjectRange;

public class ReadVectoredTest extends IntegrationTestBase {
Collaborator

It might be good to consider test cases that exercise known underlying interactions here, for example ranges that span multiple blocks, or multiple ranges that access the same block.

Collaborator

I feel like we really need to think critically about test cases for readVectored(), including the example I gave before. We should consider the different failure modes in detail and make sure we are protected against them.
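The block-related cases mentioned above depend on how each range maps onto blocks. A hypothetical helper like the following can be used to construct test inputs that deliberately cross a block boundary or put several ranges in one block. It assumes fixed-size blocks aligned at offset 0, which may not match how the library actually lays out blocks; `BlockMapping` and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: assumes fixed-size blocks starting at offset 0.
final class BlockMapping {

  /** Returns the indices of the blocks that [offset, offset + length) touches. */
  static List<Long> blocksFor(long offset, long length, long blockSize) {
    List<Long> blocks = new ArrayList<>();
    if (length <= 0) {
      return blocks; // an empty range touches no block
    }
    long first = offset / blockSize;
    long last = (offset + length - 1) / blockSize; // block holding the range's last byte
    for (long b = first; b <= last; b++) {
      blocks.add(b);
    }
    return blocks;
  }

  /** Maps each block index to how many of the given {offset, length} ranges touch it. */
  static Map<Long, Integer> rangesPerBlock(long[][] ranges, long blockSize) {
    Map<Long, Integer> counts = new TreeMap<>();
    for (long[] r : ranges) {
      for (long b : blocksFor(r[0], r[1], blockSize)) {
        counts.merge(b, 1, Integer::sum);
      }
    }
    return counts;
  }
}
```

For example, with a block size of 100 bytes, a range at offset 90 with length 30 touches blocks 0 and 1, while ranges at offsets 10 and 40 (length 5 each) both land in block 0, so one helper can generate both kinds of test case.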

Collaborator

@fuatbasik fuatbasik left a comment

Thanks @ahmarsuhail please see my comments inline.

@ahmarsuhail ahmarsuhail force-pushed the read-vectored-tests branch from f83ba39 to a55f484 June 13, 2025 16:49
@ahmarsuhail ahmarsuhail temporarily deployed to integration-tests June 13, 2025 16:49 — with GitHub Actions Inactive
@ahmarsuhail ahmarsuhail mentioned this pull request Jun 13, 2025
Comment on lines +234 to +247
private boolean shouldExtendRequest(ReadMode readMode) {
// Do not apply sequential prefetching when the read comes from the Parquet prefetcher or
// from readVectored(): in these cases we know exactly which ranges we want and don't want
// to extend them further.
if (readMode == ReadMode.READ_VECTORED
|| readMode == ReadMode.REMAINING_COLUMN_PREFETCH
|| readMode == ReadMode.COLUMN_PREFETCH
|| readMode == ReadMode.DICTIONARY_PREFETCH
|| readMode == ReadMode.PREFETCH_TAIL) {
return false;
}

return true;
}
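The same predicate can also be written as a set lookup, which keeps the list of exact-range modes in one place. This is a sketch only: `ExtendPolicy` is a hypothetical wrapper, and the `ReadMode` constants are just the ones named in this PR, so the real enum may differ.

```java
import java.util.EnumSet;
import java.util.Set;

// Sketch: ReadMode lists only the constants named in this PR; the real enum may differ.
final class ExtendPolicy {

  enum ReadMode {
    SYNC, ASYNC, SMALL_OBJECT_PREFETCH, SEQUENTIAL_FILE_PREFETCH,
    READ_VECTORED, REMAINING_COLUMN_PREFETCH, COLUMN_PREFETCH, DICTIONARY_PREFETCH, PREFETCH_TAIL
  }

  // Modes whose ranges are already exact, so sequential prefetching must not extend them.
  private static final Set<ReadMode> EXACT_RANGE_MODES =
      EnumSet.of(
          ReadMode.READ_VECTORED,
          ReadMode.REMAINING_COLUMN_PREFETCH,
          ReadMode.COLUMN_PREFETCH,
          ReadMode.DICTIONARY_PREFETCH,
          ReadMode.PREFETCH_TAIL);

  static boolean shouldExtendRequest(ReadMode readMode) {
    return !EXACT_RANGE_MODES.contains(readMode);
  }
}
```

`EnumSet` is backed by a bit vector, so the lookup stays cheap, but a new mode still has to be added to the set by hand, which is the concern the enum-field suggestion in this thread addresses.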
Contributor

Maybe a nit, but I think this logic can be simplified if we update our ReadMode enum class to something like the one below (the boolean values assigned may be wrong, but the syntax should look like this). Then in the block manager we could just use readMode.allowsSequentialPrefetch() and remove the method above. In the future, if we add more read modes, we won't have to change both the enum class and this method.

public enum ReadMode {
  SYNC(true),
  ASYNC(true),
  SMALL_OBJECT_PREFETCH(true),
  SEQUENTIAL_FILE_PREFETCH(true),
  READ_VECTORED(false),
  REMAINING_COLUMN_PREFETCH(false),
  COLUMN_PREFETCH(false),
  DICTIONARY_PREFETCH(false),
  PREFETCH_TAIL(false);

  private final boolean allowsSequentialPrefetch;

  ReadMode(boolean allowsSequentialPrefetch) {
    this.allowsSequentialPrefetch = allowsSequentialPrefetch;
  }

  public boolean allowsSequentialPrefetch() {
    return allowsSequentialPrefetch;
  }
}

Collaborator Author

Thanks, this is a good suggestion :) Will do.
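If the enum suggested above is adopted, a one-off check like the following could confirm during the migration that `allowsSequentialPrefetch()` agrees with the existing `shouldExtendRequest` logic for every mode. The enum copies the reviewer's proposal, including flags the reviewer noted may be wrong; `ReadModeMigration`, `shouldExtendRequestOld`, and `firstMismatch` are hypothetical names.

```java
// Sketch: the enum mirrors the reviewer's proposal; the helper names are hypothetical.
final class ReadModeMigration {

  enum ReadMode {
    SYNC(true),
    ASYNC(true),
    SMALL_OBJECT_PREFETCH(true),
    SEQUENTIAL_FILE_PREFETCH(true),
    READ_VECTORED(false),
    REMAINING_COLUMN_PREFETCH(false),
    COLUMN_PREFETCH(false),
    DICTIONARY_PREFETCH(false),
    PREFETCH_TAIL(false);

    private final boolean allowsSequentialPrefetch;

    ReadMode(boolean allowsSequentialPrefetch) {
      this.allowsSequentialPrefetch = allowsSequentialPrefetch;
    }

    boolean allowsSequentialPrefetch() {
      return allowsSequentialPrefetch;
    }
  }

  // The original explicit check, kept only to compare against the enum flag.
  static boolean shouldExtendRequestOld(ReadMode readMode) {
    return !(readMode == ReadMode.READ_VECTORED
        || readMode == ReadMode.REMAINING_COLUMN_PREFETCH
        || readMode == ReadMode.COLUMN_PREFETCH
        || readMode == ReadMode.DICTIONARY_PREFETCH
        || readMode == ReadMode.PREFETCH_TAIL);
  }

  /** Returns the first mode where the flag and the old method disagree, or null if none. */
  static ReadMode firstMismatch() {
    for (ReadMode mode : ReadMode.values()) {
      if (mode.allowsSequentialPrefetch() != shouldExtendRequestOld(mode)) {
        return mode;
      }
    }
    return null;
  }
}
```

Once the enum field is the single source of truth, the constructor argument forces every new mode to state its prefetch behavior, and the old method (and this check) can be deleted.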

IOPlan ioPlan = new IOPlan(ranges);
// Create a non-empty IOPlan only if we have a valid range to work with
-physicalIO.execute(ioPlan);
+physicalIO.execute(ioPlan, ReadMode.PREFETCH_TAIL);
Collaborator

Can a single IOPlan have multiple ranges with different read modes? If yes, this would not handle that case. If not, should we make ReadMode a property of IOPlan?

Collaborator Author

No, a single IOPlan should not have multiple read modes, so this is OK here.
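For comparison, the alternative raised in the question above, making the read mode a property of the plan, could look roughly like this. `Plan`, `Range`, and the `ReadMode` subset are hypothetical illustrations rather than the library's `IOPlan`; the sketch only shows how binding one mode to each plan turns the "one read mode per IOPlan" rule into something the constructor enforces.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: Plan, Range and this ReadMode subset are not the library's types.
final class PlanWithMode {

  enum ReadMode { SYNC, READ_VECTORED, PREFETCH_TAIL }

  static final class Range {
    final long start;
    final long end; // inclusive end offset (an assumption for this sketch)

    Range(long start, long end) {
      if (start < 0 || end < start) {
        throw new IllegalArgumentException("invalid range: " + start + "-" + end);
      }
      this.start = start;
      this.end = end;
    }
  }

  /** A plan has exactly one read mode for all of its ranges, enforced by the constructor. */
  static final class Plan {
    final List<Range> ranges;
    final ReadMode readMode;

    Plan(List<Range> ranges, ReadMode readMode) {
      // Defensive copy so callers cannot add ranges after the mode was chosen.
      this.ranges = Collections.unmodifiableList(new ArrayList<>(Objects.requireNonNull(ranges)));
      this.readMode = Objects.requireNonNull(readMode, "readMode");
    }
  }
}
```

Passing the mode alongside the plan, as this PR does, works equally well as long as every caller follows the one-mode-per-plan rule; the trade-off is only where that rule is enforced.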

Collaborator

@fuatbasik fuatbasik left a comment

Thanks a lot @ahmarsuhail. I left 2-3 comments. One of them is a question about concurrency; if you think my concern is not valid, I am happy to approve.

*
* @param objectRanges Vectored ranges to fetch
*/
private void makeReadVectoredRangesAvailable(List<ObjectRange> objectRanges) {
Collaborator

I am concerned about concurrency here. The scenario I have in mind is two streams against the same S3 key calling readVectored() concurrently with overlapping ranges. Would that be a problem?
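One way to reason about this scenario is to ask whether overlapping requests for the same block share a single fetch. The sketch below assumes blocks for an S3 key are tracked in a map keyed by block index and shared between streams; it is a hypothetical illustration, not the library's block manager, and `SharedBlockFetcher` is an invented name. `ConcurrentHashMap.computeIfAbsent` applies its mapping function at most once per key, so two callers requesting overlapping block lists trigger only one fetch per block.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the library's block manager.
final class SharedBlockFetcher {

  private final Map<Long, CompletableFuture<byte[]>> blocks = new ConcurrentHashMap<>();
  final AtomicInteger fetchCount = new AtomicInteger();
  private final int blockSize;

  SharedBlockFetcher(int blockSize) {
    this.blockSize = blockSize;
  }

  /** Returns the (possibly shared) future for a block; computeIfAbsent is atomic per key. */
  CompletableFuture<byte[]> block(long index) {
    return blocks.computeIfAbsent(index, i -> {
      fetchCount.incrementAndGet(); // counts real fetches, not lookups
      return CompletableFuture.supplyAsync(() -> new byte[blockSize]); // stand-in for a GET
    });
  }

  /** Makes every listed block available, reusing fetches already started by other callers. */
  void makeAvailable(List<Long> blockIndices) {
    CompletableFuture<?>[] futures =
        blockIndices.stream().map(this::block).toArray(CompletableFuture[]::new);
    CompletableFuture.allOf(futures).join();
  }
}
```

A real implementation would also need to decide what happens to failed or evicted entries, which this sketch ignores; a test along these lines (two threads, overlapping block lists, assert one fetch per distinct block) is one way to cover the scenario.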

Collaborator

@fuatbasik fuatbasik left a comment

@ahmarsuhail thanks a lot! LGTM.

@ahmarsuhail ahmarsuhail merged commit bbc4aef into awslabs:main Jun 23, 2025
2 of 3 checks passed
ozkoca pushed a commit that referenced this pull request Jun 26, 2025
Labels

None yet

Projects

None yet

4 participants