[New Physical IO] Added memory manager changes to new physical IO #288
ozkoca merged 4 commits into awslabs:physical-io
this.data = data;
dataReadyLatch.countDown();
this.indexCache.put(this.blockKey, this.blockKey.getRange().getLength());
this.aggregatingMetrics.add(MetricKey.MEMORY_USAGE, data.length);
Putting the block into the index cache and adding the metrics should be done before dataReadyLatch.countDown(). In the current logic, we do everything inside the data future:
this.source.thenApply(
    objectContent -> {
      try {
        byte[] bytes =
            StreamUtils.toByteArray(
                objectContent,
                this.blockKey.getObjectKey(),
                this.blockKey.getRange(),
                this.readTimeout);
        int blockRange = blockKey.getRange().getLength();
        this.aggregatingMetrics.add(MetricKey.MEMORY_USAGE, blockRange);
        this.indexCache.put(blockKey, blockRange);
        return bytes;
      } catch (IOException | TimeoutException e) {
        throw new RuntimeException(
            "Error while converting InputStream to byte array", e);
      }
    });
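The read starts only after we join on the data.
With the new ordering, a block can report that its data is ready before it appears in the index cache. This is likely to cause an issue in the cleanup logic, where we do: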
if (block.isDataReady() && !indexCache.contains(block.getBlockKey())) {
  try {
    iterator.remove();
    BlockKey blockKey = block.getBlockKey();
    aggregatingMetrics.reduce(MetricKey.MEMORY_USAGE, blockKey.getRange().getLength());
    LOG.debug(
        "Removed block with key {}-{}-{} from block store during cleanup",
        blockKey.getObjectKey().getS3URI(),
        blockKey.getRange().getStart(),
        blockKey.getRange().getEnd());
  } catch (Exception e) {
    LOG.error("Error in removing block {}", e.getMessage());
  }
}
Good catch. Addressed in the next revision.
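To illustrate the ordering concern in this thread, here is a minimal sketch, not the PR's actual code. The names `SketchBlock` and `BlockStoreSketch` are hypothetical stand-ins, and a plain `Set<String>` and `AtomicLong` stand in for the index cache and the memory-usage metric. The point is only that the bookkeeping runs before the latch is released, so the cleanup condition quoted above cannot see a ready block that is still missing from the index.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified stand-in for a block whose data arrives asynchronously.
class SketchBlock {
  final String key;
  private final CountDownLatch dataReadyLatch = new CountDownLatch(1);
  private volatile byte[] data;

  SketchBlock(String key) {
    this.key = key;
  }

  // Index and metric updates happen BEFORE countDown(), so a thread that sees
  // isDataReady() == true will also find the key in the index cache.
  void setData(byte[] bytes, Set<String> indexCache, AtomicLong memoryUsage) {
    this.data = bytes;
    indexCache.add(key);
    memoryUsage.addAndGet(bytes.length);
    dataReadyLatch.countDown();
  }

  boolean isDataReady() {
    return dataReadyLatch.getCount() == 0;
  }

  int length() {
    return data == null ? 0 : data.length;
  }
}

// Hypothetical store with a cleanup pass that mirrors the condition quoted above.
class BlockStoreSketch {
  final Set<String> indexCache = ConcurrentHashMap.newKeySet();
  final AtomicLong memoryUsage = new AtomicLong();
  final List<SketchBlock> blocks = new CopyOnWriteArrayList<>();

  // Removes blocks that are ready but no longer tracked by the index cache.
  int cleanUp() {
    int removed = 0;
    for (SketchBlock block : blocks) {
      if (block.isDataReady() && !indexCache.contains(block.key)) {
        blocks.remove(block);
        memoryUsage.addAndGet(-block.length());
        removed++;
      }
    }
    return removed;
  }
}
```

If `countDown()` ran first instead, a cleanup pass scheduled between the two steps could remove a freshly filled block and reduce a metric that had not been added yet.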
if (block.isPresent()) {
  aggregatingMetrics.add(MetricKey.CACHE_HIT, 1L);
} else {
  aggregatingMetrics.add(MetricKey.CACHE_MISS, 1L);
If we are adding the metrics while finding the missing blocks, do we need to publish the metrics from here? I don't think this is required.
This is required for positional reads, but in the next revision I will remove direct block access for positional reads and use makeRangeAvailable instead, to be consistent.
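The reviewer's concern can be sketched as follows, assuming hit/miss counters are updated in exactly one place (the missing-block scan) and the read path stays metric-free. `MissingBlockFinderSketch`, `findMissing`, and `getBlock` are hypothetical names for illustration, not the PR's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: count each requested block once, while scanning for missing blocks.
class MissingBlockFinderSketch {
  final Map<Integer, byte[]> blocks = new HashMap<>();
  final AtomicLong cacheHits = new AtomicLong();
  final AtomicLong cacheMisses = new AtomicLong();

  // Returns the indexes that still need to be fetched and records a hit or a miss per index.
  List<Integer> findMissing(List<Integer> requestedIndexes) {
    List<Integer> missing = new ArrayList<>();
    for (int index : requestedIndexes) {
      if (blocks.containsKey(index)) {
        cacheHits.incrementAndGet();
      } else {
        cacheMisses.incrementAndGet();
        missing.add(index);
      }
    }
    return missing;
  }

  // Read path: deliberately publishes no metrics, so a block is never counted twice.
  byte[] getBlock(int index) {
    return blocks.get(index);
  }
}
```

Routing every read through a single entry point, as the reply proposes with makeRangeAvailable, keeps the counters consistent without a second publishing site.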
## Description of change

This PR adds a new method optimizeReads to the RangeOptimiser class to improve read performance by intelligently grouping and splitting block indexes. The implementation reduces the complexity in DataBlockManager and makes the optimization logic more testable.

Changes are:
- Adds readAheadBytes logic
- Adds sequential prefetching logic
- Groups sequential block indexes together
- Splits large sequential groups into smaller chunks based on configuration parameters
- Refactored DataBlockManager to use the new method instead of implementing the logic itself
- Added comprehensive unit tests for the new method

Out of Scope
- Range coalescing will be implemented in a separate PR

#### Relevant issues

PR History: #286 #287 #288

#### Does this contribution introduce any breaking changes to the existing APIs or behaviors?

No

#### Does this contribution introduce any new public APIs or behaviors?

No

---

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the [Developer Certificate of Origin (DCO)](https://developercertificate.org/).
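The grouping and splitting steps listed in the description above can be sketched roughly as below. This is an illustration only: `RangeGroupingSketch`, `groupAndSplit`, and the single `maxGroupSize` parameter are assumptions, since the PR text does not name the configuration parameters or show the real optimizeReads signature. Read-ahead and sequential prefetching are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of grouping consecutive block indexes and capping group size.
class RangeGroupingSketch {

  // sortedIndexes must be sorted ascending; maxGroupSize is an assumed configuration value.
  static List<List<Integer>> groupAndSplit(List<Integer> sortedIndexes, int maxGroupSize) {
    if (maxGroupSize <= 0) {
      throw new IllegalArgumentException("maxGroupSize must be positive");
    }
    List<List<Integer>> groups = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    for (int index : sortedIndexes) {
      boolean sequential =
          !current.isEmpty() && current.get(current.size() - 1) + 1 == index;
      // Start a new group on a gap, or when the current group has reached the cap.
      if (!current.isEmpty() && (!sequential || current.size() == maxGroupSize)) {
        groups.add(current);
        current = new ArrayList<>();
      }
      current.add(index);
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }
}
```

For example, with `maxGroupSize = 2` the indexes 1, 2, 3, 4, 5, 8, 9 become the groups [1, 2], [3, 4], [5], and [8, 9]: the run 1 to 5 is split by the cap, and the gap before 8 starts a new group.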
## Description of change

This PR merges the new PhysicalIO changes into the Blob object and starts using the new implementation.

Next Steps:
- Range coalescing implementation
- Retry policy implementation

#### Relevant issues

PR History: #286 #287 #288 #289

#### Does this contribution introduce any breaking changes to the existing APIs or behaviors?

No

#### Does this contribution introduce any new public APIs or behaviors?

No

#### How was the contribution tested?

Unit test

#### Does this contribution need a changelog entry?

N/A
…slabs#288)

## Description of change

This PR adopts the memory manager changes in the new PhysicalIO.

#### Relevant issues

PR History:
- awslabs#286
- awslabs#287

#### Does this contribution introduce any breaking changes to the existing APIs or behaviors?

No

#### Does this contribution introduce any new public APIs or behaviors?

No

---------

Co-authored-by: Erdogan Ozkoca <ozkoca@amazon.com>
## Description of change

This PR adds the capability of retry to the new PhysicalIO

#### Relevant issues

#286 #287 #288 #289 #294 #316

#### Does this contribution introduce any breaking changes to the existing APIs or behaviors?

No

#### Does this contribution introduce any new public APIs or behaviors?

No

#### How was the contribution tested?

Unit tests
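The description does not say how the retry capability is implemented, so the following is only a generic sketch of bounded retries around a read operation. `RetrySketch`, `withRetries`, and `maxAttempts` are hypothetical, and the choice to retry only `UncheckedIOException` is an assumption made for the example.

```java
import java.io.UncheckedIOException;
import java.util.function.Supplier;

// Hypothetical bounded-retry helper; not the PR's retry policy.
class RetrySketch {

  // Runs the operation up to maxAttempts times, retrying only on UncheckedIOException.
  static <T> T withRetries(Supplier<T> operation, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    UncheckedIOException lastFailure = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return operation.get();
      } catch (UncheckedIOException e) {
        lastFailure = e; // assumed transient: remember it and try again
      }
    }
    throw lastFailure; // every attempt failed
  }
}
```

A production policy would usually also add a delay or backoff between attempts and a way to classify which failures are retryable. Both are omitted here to keep the sketch short.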
Added iostats get request callback to streamReader (awslabs#317)

This PR moves the IOStat callback request from Block to StreamReader. This needs to be done as part of the code rebase.

Telemetry support for physical IO (awslabs#318)

This PR adds telemetry measures for StreamReader and BlockManager.

Introducing retry policy to new PhysicalIO (awslabs#320)

This PR adds the capability of retry to the new PhysicalIO

Relevant issues: awslabs#286 awslabs#287 awslabs#288 awslabs#289 awslabs#294 awslabs#316

Does this contribution introduce any breaking changes to the existing APIs or behaviors? No

Does this contribution introduce any new public APIs or behaviors? No

How was the contribution tested? Unit tests
## Description of change

This PR rebases and integrates the changes in PR #321

#### Relevant issues

#286 #287 #288 #289 #294 #316 #320

#### Does this contribution introduce any breaking changes to the existing APIs or behaviors?

No

#### Does this contribution introduce any new public APIs or behaviors?

No

#### How was the contribution tested?

Unit tests
## Description of change

This PR changes the default read buffer size to 128KB for better performance

#### Relevant issues

#286 #287 #288 #289 #294 #316 #320 #323

#### Does this contribution introduce any breaking changes to the existing APIs or behaviors?

No

#### Does this contribution introduce any new public APIs or behaviors?

No

#### How was the contribution tested?

Unit tests

#### Does this contribution need a changelog entry?

N/A
## Description of change

This PR implements a new PhysicalIO design with key improvements:

**Fixed Block Size:** Previously, block sizes varied based on request ranges, requiring entire request ranges to complete before blocks became available. The new design uses fixed-size blocks that become ready as soon as individual blocks are filled, enabling faster data access and better parallelization.

**Direct Block Writing:** Eliminates an extra memory copy by writing S3 data directly into Block storage instead of copying from intermediate buffers, reducing memory overhead and CPU usage.

**Improved Concurrency:** Fixed-size blocks allow multiple blocks to be processed independently, improving throughput for concurrent read operations.

**Better Memory Management:** Predictable block sizes enable more efficient memory allocation and cache management strategies.

**Enhanced Read Performance:** Blocks become available for reading as soon as they're filled, rather than waiting for entire request ranges to complete, reducing read latency.

#### Relevant issues

#286 #287 #288 #289 #294 #316 #320 #323 #324

#### Does this contribution introduce any breaking changes to the existing APIs or behaviors?

No

#### Does this contribution introduce any new public APIs or behaviors?

No

#### How was the contribution tested?

Unit tests, microbenchmarks

#### Does this contribution need a changelog entry?

- [ ] I have updated the CHANGELOG or README if appropriate
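As a rough illustration of the fixed-block-size idea described above, the sketch below maps an inclusive byte range onto fixed-size block indexes. `FixedBlockSketch`, `blockIndexesFor`, and `blockStart` are hypothetical names, and the block size is passed in as a parameter because the description does not state the actual value used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: with a fixed block size, any byte range maps to a predictable set of blocks.
class FixedBlockSketch {

  // Returns the indexes of all blocks touched by the inclusive byte range [start, end].
  static List<Integer> blockIndexesFor(long start, long end, long blockSize) {
    if (blockSize <= 0 || start < 0 || end < start) {
      throw new IllegalArgumentException("invalid range or block size");
    }
    List<Integer> indexes = new ArrayList<>();
    for (long index = start / blockSize; index <= end / blockSize; index++) {
      indexes.add((int) index);
    }
    return indexes;
  }

  // First byte offset covered by the block at the given index.
  static long blockStart(int index, long blockSize) {
    return index * blockSize;
  }
}
```

Because every block covers the same number of bytes, each index returned here can be fetched, filled, and marked ready independently of the others. That independence is what allows a reader to proceed as soon as the blocks it needs are filled.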
Description of change
This PR adopts the memory manager changes in the new PhysicalIO.
Relevant issues
PR History:
Does this contribution introduce any breaking changes to the existing APIs or behaviors?
No
Does this contribution introduce any new public APIs or behaviors?
No
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).