Closed
Conversation
…alIO (#287) ## Description of change - This change adopts the changes from [PR](#283) to the new Physical IO implementation. - Updates comment in DataBlock object #### Relevant issues [OpenStremInformation PR](#283) [Initial version of Physical IO](#286) --- By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the [Developer Certificate of Origin (DCO)](https://developercertificate.org/). --------- Co-authored-by: Erdogan Ozkoca <ozkoca@amazon.com>
## Description of change This PR adopts the memory manager changes to new physicalIO/ #### Relevant issues PR History: - #286 - #287 #### Does this contribution introduce any breaking changes to the existing APIs or behaviors? No #### Does this contribution introduce any new public APIs or behaviors? No --- By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the [Developer Certificate of Origin (DCO)](https://developercertificate.org/). --------- Co-authored-by: Erdogan Ozkoca <ozkoca@amazon.com>
## Description of change This PR adds a new method optimizeReads to the RangeOptimiser class to improve read performance by intelligently grouping and splitting block indexes. The implementation reduces the complexity in DataBlockManager and makes the optimization logic more testable. Changes are: - Adds readAheadBytes logic - Adds sequential prefetching logic - Groups sequential block indexes together - Splits large sequential groups into smaller chunks based on configuration parameters - Refactored DataBlockManager to use the new method instead of implementing the logic itself - Added comprehensive unit tests for the new method Out of Scope - Range coalescing will be implemented in a separate PR #### Relevant issues PR History: #286 #287 #288 #### Does this contribution introduce any breaking changes to the existing APIs or behaviors? No #### Does this contribution introduce any new public APIs or behaviors? No --- By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the [Developer Certificate of Origin (DCO)](https://developercertificate.org/).
## Description of change This PR merges the new PhysicalIO changes to the Blob object and start to use the new implementation. Next Steps: - Range coalescing implementation - Retry policy implementation #### Relevant issues PR History: #286 #287 #288 #289 #### Does this contribution introduce any breaking changes to the existing APIs or behaviors? No #### Does this contribution introduce any new public APIs or behaviors? No #### How was the contribution tested? Unit test #### Does this contribution need a changelog entry? n/A --- By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the [Developer Certificate of Origin (DCO)](https://developercertificate.org/).
## Description of change <!-- Thank you for submitting a pull request!--> <!-- Please describe your contribution here. What and why? --> <!-- Please ensure your commit messages follow these [guidelines](https://chris.beams.io/posts/git-commit/). --> Support passing of sse_c customer key to pass this to s3 for encryption/decryption. S3A currently has this customer key as Optional String https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/delegation/EncryptionSecretOperations.java#L41 while Iceberg has this key as a String https://github.com/apache/iceberg/blob/f9cc62eb0d98e360b452a3ab8fdc6efdc4969f6e/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L499. So decided to accept this key as Optional String. #### Relevant issues <!-- Please add issue numbers. --> <!-- Please also link them to this PR. --> #### Does this contribution introduce any breaking changes to the existing APIs or behaviors? <!-- Please explain why this was necessary. --> #### Does this contribution introduce any new public APIs or behaviors? <!-- Please describe them and explain what scenarios they target. --> #### How was the contribution tested? <!-- Please describe how this contribution was tested. --> #### Does this contribution need a changelog entry? - [ ] I have updated the CHANGELOG or README if appropriate --- By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the [Developer Certificate of Origin (DCO)](https://developercertificate.org/).
## Description of change This PR modifies the GitHub Actions workflow to run integration tests automatically after PR merge, removing the manual approval requirement for post-merge cases. <!-- Thank you for submitting a pull request!--> <!-- Please describe your contribution here. What and why? --> <!-- Please ensure your commit messages follow these [guidelines](https://chris.beams.io/posts/git-commit/). --> #### Does this contribution introduce any breaking changes to the existing APIs or behaviors? No #### Does this contribution introduce any new public APIs or behaviors? No #### Does this contribution need a changelog entry? No --- By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the [Developer Certificate of Origin (DCO)](https://developercertificate.org/).
## Description of change We want to support `readFully` as a part of our ongoing effort to integrate with Iceberg S3FileIO by default. [RangeReadable in Iceberg](https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/io/RangeReadable.java#L47) **Note:** Here's the comparison with existing read method: | Aspect | `read(byte[] buffer, int offset, int length)` | `readFully(long position, byte[] buffer, int offset, int length)` | |--------|----------------------------------------------|-------------------------------------------------------------------| | Method Signature | Reads from current stream position | Reads from specified position | | Position Behavior | Advances the stream position by the number of bytes actually read | Does not modify the stream position (position-independent read) | | Return & Error Handling | Returns `int` indicating bytes read; returns -1 at end of stream; may return fewer bytes than requested | Returns `void`; throws `IOException` if unable to read exact number of requested bytes | #### Relevant issues Once this feature is released as `1.2.0`, we will update this [Iceberg PR](apache/iceberg#13361) #### Does this contribution introduce any breaking changes to the existing APIs or behaviors? No #### Does this contribution introduce any new public APIs or behaviors? Yes #### How was the contribution tested? Ran existing and new tests locally #### Does this contribution need a changelog entry? - [x] I have updated the CHANGELOG or README if appropriate --- By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the [Developer Certificate of Origin (DCO)](https://developercertificate.org/).
## Description of change This PR replaces all remaining references to DAT with AAL, reflecting the current and accurate name of the project. These legacy references have been updated for consistency and clarity. #### Relevant issues N/A #### Does this contribution introduce any breaking changes to the existing APIs or behaviors? No #### Does this contribution introduce any new public APIs or behaviors? No #### How was the contribution tested? No need to test #### Does this contribution need a changelog entry? N/A --- By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the [Developer Certificate of Origin (DCO)](https://developercertificate.org/).
This PR merges the new PhysicalIO changes to the Blob object and start to use the new implementation. Next Steps: - Range coalescing implementation - Retry policy implementation PR History: #286 #287 #288 #289 existing APIs or behaviors? No No Unit test n/A --- By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the [Developer Certificate of Origin (DCO)](https://developercertificate.org/).
Collaborator
Author
|
New PR #304 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description of change
// To be filled
Relevant issues
N/A
Does this contribution introduce any breaking changes to the existing APIs or behaviors?
Yes
Does this contribution introduce any new public APIs or behaviors?
No
How was the contribution tested?
Unit test, microbenchmarks
Does this contribution need a changelog entry?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).