Add request coalescing and relevant configuration #356
fuatbasik merged 4 commits into awslabs:main
Conversation
stubz151
left a comment
Thank you, this is awesome, just some smallish comments.
@Override
public String toString() {
  return String.format("offset: %d, length: %d", offset, length);
What do you think about adding some info about the future as well? For example, is there an id? Or we could even add its current state; it might come in handy down the line.
Thanks, I am now adding the future's toString output here.
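A minimal sketch of what including the future in `toString` could look like. The class name `RangeWithFuture`, the field name `data`, and the use of `CompletableFuture<byte[]>` are assumptions for illustration; the PR does not show the actual field or its type. `CompletableFuture.toString()` already reports the completion state, which is why simply formatting the future with `%s` may be enough.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical holder for a range and the future that will carry its bytes. */
final class RangeWithFuture {
  private final long offset;
  private final long length;
  // Assumption: the pending read is modelled as a CompletableFuture.
  private final CompletableFuture<byte[]> data;

  RangeWithFuture(long offset, long length, CompletableFuture<byte[]> data) {
    this.offset = offset;
    this.length = length;
    this.data = data;
  }

  @Override
  public String toString() {
    // The future's own toString includes its state (not completed, completed normally, ...).
    return String.format("offset: %d, length: %d, future: %s", offset, length, data);
  }

  public static void main(String[] args) {
    CompletableFuture<byte[]> pending = new CompletableFuture<>();
    RangeWithFuture range = new RangeWithFuture(0, 10, pending);
    System.out.println(range); // state shown while the read is still in flight
    pending.complete(new byte[10]);
    System.out.println(range); // state shown after the read finished
  }
}
```

The exact text of the state suffix comes from the JDK and includes an identity hash, so logs should be matched on substrings rather than compared verbatim.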
@@ -264,6 +264,45 @@ void testGetRequiredStringThrowsIfNotSet() {
      IllegalArgumentException.class, () -> configuration.getRequiredString("stringConfig1"));
}
Can we squash some of these tests down using parameterized tests?
/** Flag to enable request Coalescing */
@Builder.Default private boolean requestCoalesce = DEFAULT_COALESCE_REQUEST;

private static final String REQUEST_COALESCE_KEY = "request.coalesce";
Why add it as a config? When would we not want to do this?
    currentRange = nextRange;
  }
}
coalescedRanges.add(currentRange);
public void coalesce(long tolerance) {
  if (this.prefetchRanges.size() < 2) {
    return;
  }
  // Ensure ranges are ordered by their start position
  Collections.sort(this.prefetchRanges);
  int writeIndex = 0;
  Range currentRange = this.prefetchRanges.get(0);
  for (int i = 1; i < this.prefetchRanges.size(); i++) {
    Range nextRange = this.prefetchRanges.get(i);
    if (currentRange.getEnd() + tolerance >= nextRange.getStart()) {
      // Merge ranges
      currentRange =
          new Range(
              currentRange.getStart(), Math.max(currentRange.getEnd(), nextRange.getEnd()));
    } else {
      // Store the completed merged range
      this.prefetchRanges.set(writeIndex++, currentRange);
      currentRange = nextRange;
    }
  }
  this.prefetchRanges.set(writeIndex++, currentRange);
  // test this a lil.
  while (this.prefetchRanges.size() > writeIndex) {
    this.prefetchRanges.remove(this.prefetchRanges.size() - 1);
  }
}
What do you think of this? You don't need a second list to keep track of the merged ranges then, which I like. I also find it a bit easier to follow.
Thanks. I updated this part in rev2
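For readers who want to try the in-place approach from this thread outside the library, here is a self-contained sketch. The nested `Range` class is a hypothetical stand-in (the real type is not shown in this conversation), and trimming the tail with `subList(...).clear()` is a substitution for the suggested `while` loop, not what rev2 necessarily does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CoalesceSketch {
  /** Hypothetical stand-in for the library's Range; only start and end are modelled. */
  static final class Range {
    final long start;
    final long end;

    Range(long start, long end) {
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return "[" + start + "-" + end + "]";
    }
  }

  /** Merges ranges whose gap is at most tolerance bytes, reusing the input list. */
  static void coalesce(List<Range> ranges, long tolerance) {
    if (ranges.size() < 2) {
      return;
    }
    ranges.sort(Comparator.comparingLong((Range r) -> r.start));
    int writeIndex = 0; // always behind the read index i, so set() never clobbers unread entries
    Range current = ranges.get(0);
    for (int i = 1; i < ranges.size(); i++) {
      Range next = ranges.get(i);
      if (current.end + tolerance >= next.start) {
        current = new Range(current.start, Math.max(current.end, next.end));
      } else {
        ranges.set(writeIndex++, current);
        current = next;
      }
    }
    ranges.set(writeIndex++, current);
    // Drop the leftover tail in one call instead of removing elements one by one.
    ranges.subList(writeIndex, ranges.size()).clear();
  }

  public static void main(String[] args) {
    List<Range> ranges = new ArrayList<>();
    ranges.add(new Range(300, 400));
    ranges.add(new Range(0, 100));
    ranges.add(new Range(110, 200));
    coalesce(ranges, 20);
    System.out.println(ranges); // prints [[0-200], [300-400]]
  }
}
```

The write index can safely trail the read index because each iteration advances `i` by one and `writeIndex` by at most one, so a slot is only overwritten after it has been read.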
ahmarsuhail
left a comment
Looks good, just think we need to add some additional logic in that execute function
@@ -391,12 +404,12 @@ protected void testReadVectoredInSingleBlock(
protected void testReadVectoredForSequentialRanges(
I see that this tests range coalescing, but it would be good to add a range coalescing test as well that checks ranges within that 1MB distance, e.g. [0-4MB, 4.3MB-5MB, 5.9MB-7MB].
Thanks @ahmarsuhail. Do the tests under IOPlanTest cover your concern? I was trying to put the corner cases for the actual coalescing business logic there instead of here.
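As a worked version of the example ranges above, the sketch below checks how many requests remain for a few tolerance values. It assumes a 1MB tolerance (taken from the "1MB distance" mentioned in the comment, not a confirmed default), models ranges as `{start, end}` byte pairs, and uses the same `end + tolerance >= start` comparison as the suggested `coalesce` method. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ToleranceExample {
  static final long MB = 1024L * 1024L;

  /** Returns merged {start, end} pairs; the input must already be sorted by start. */
  static List<long[]> merge(long[][] sorted, long tolerance) {
    List<long[]> out = new ArrayList<>();
    for (long[] r : sorted) {
      long[] last = out.isEmpty() ? null : out.get(out.size() - 1);
      if (last != null && last[1] + tolerance >= r[0]) {
        last[1] = Math.max(last[1], r[1]); // extend the previous merged range
      } else {
        out.add(new long[] {r[0], r[1]}); // gap too large: start a new request
      }
    }
    return out;
  }

  public static void main(String[] args) {
    long[][] ranges = {
      {0, 4 * MB},
      {(long) (4.3 * MB), 5 * MB},
      {(long) (5.9 * MB), 7 * MB}
    };
    System.out.println(merge(ranges, MB).size()); // 1: both gaps (0.3MB, 0.9MB) are under 1MB
    System.out.println(merge(ranges, MB / 2).size()); // 2: only the 0.3MB gap is bridged
    System.out.println(merge(ranges, 0).size()); // 3: nothing is adjacent, so nothing merges
  }
}
```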
...m/src/main/java/software/amazon/s3/analyticsaccelerator/io/physical/impl/PhysicalIOImpl.java
input-stream/src/main/java/software/amazon/s3/analyticsaccelerator/io/physical/plan/IOPlan.java
c6cf086 to c6960a7
/** Flag to enable request Coalescing */
@Builder.Default private boolean requestCoalesce = DEFAULT_COALESCE_REQUEST;

private static final String REQUEST_COALESCE_KEY = "request.coalesce";
nit: We could rename this to something like request.coalesce.enabled
...m/src/main/java/software/amazon/s3/analyticsaccelerator/io/physical/impl/PhysicalIOImpl.java
d3fa0ba to 19c1be7
common/src/main/java/software/amazon/s3/analyticsaccelerator/request/ReadMode.java
ahmarsuhail
left a comment
Typo in the coalesce method: it uses the wrong variable.
### What changes were proposed in this pull request?

This PR aims to upgrade `analyticsaccelerator-s3` to 1.3.1 for Apache Spark 4.2.0 in line with Apache Hadoop 3.4.3 (HADOOP-19742).

- apache/hadoop#8093

### Why are the changes needed?

To bring the latest fixes.

- https://github.com/awslabs/analytics-accelerator-s3/releases/tag/v1.3.1
- awslabs/analytics-accelerator-s3#360
- awslabs/analytics-accelerator-s3#361
- awslabs/analytics-accelerator-s3#363
- awslabs/analytics-accelerator-s3#356
- awslabs/analytics-accelerator-s3#358

### Does this PR introduce _any_ user-facing change?

No behavior change.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54031 from dongjoon-hyun/SPARK-55254.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Description of change
With this change, AAL can coalesce requests that are in close proximity to each other. This takes spatial locality into account and reduces the request count. In my testing I observed that this is particularly important for vectored reads that read vectors near each other.
There are two configurations to manage this feature. The first enables or disables request coalescing: `request.coalesce`, which is set to `true` by default. The second is `request.coalesce.tolerance`, which is the number of bytes to tolerate (read even if not immediately needed) when merging ranges. Two ranges will be merged if the end of the first range is less than `tolerance` bytes away from the start of the second one.
Relevant issues
N/A
Does this contribution introduce any breaking changes to the existing APIs or behaviors?
No. It changes the behaviour of vectored reads by merging smaller requests into bigger requests. These bigger requests might later be split into smaller requests to make sure all requests are around `target.request.size`.
Does this contribution introduce any new public APIs or behaviors?
How was the contribution tested?
Added new unit tests. Extended existing IntegrationTests to run for both coalescing enabled and disabled.
Does this contribution need a changelog entry?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).