Skip to content

New Physical IO Implementation#301

Closed
ozkoca wants to merge 15 commits intomainfrom
physical-io
Closed

New Physical IO Implementation#301
ozkoca wants to merge 15 commits intomainfrom
physical-io

Conversation

@ozkoca
Copy link
Collaborator

@ozkoca ozkoca commented Jun 26, 2025

Description of change

// To be filled

Relevant issues

N/A

Does this contribution introduce any breaking changes to the existing APIs or behaviors?

Yes

Does this contribution introduce any new public APIs or behaviors?

No

How was the contribution tested?

Unit test, microbenchmarks

Does this contribution need a changelog entry?

  • I have updated the CHANGELOG or README if appropriate

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

Erdogan Ozkoca and others added 9 commits June 3, 2025 13:25
…alIO (#287)

## Description of change
- This change adopts the changes from
[PR](#283) to
the new Physical IO implementation.
- Updates comment in DataBlock object

#### Relevant issues
[OpenStremInformation
PR](#283)
[Initial version of Physical
IO](#286)

---

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and I agree to the terms of
the [Developer Certificate of Origin
(DCO)](https://developercertificate.org/).

---------

Co-authored-by: Erdogan Ozkoca <ozkoca@amazon.com>
## Description of change
This PR adopts the memory manager changes to new physicalIO/

#### Relevant issues
PR History:
- #286
- #287

#### Does this contribution introduce any breaking changes to the
existing APIs or behaviors?
No

#### Does this contribution introduce any new public APIs or behaviors?
No

---

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and I agree to the terms of
the [Developer Certificate of Origin
(DCO)](https://developercertificate.org/).

---------

Co-authored-by: Erdogan Ozkoca <ozkoca@amazon.com>
## Description of change
This PR adds a new method optimizeReads to the RangeOptimiser class to
improve read performance by intelligently grouping and splitting block
indexes. The implementation reduces the complexity in DataBlockManager
and makes the optimization logic more testable.

Changes are: 
- Adds readAheadBytes logic
- Adds sequential prefetching logic
- Groups sequential block indexes together
- Splits large sequential groups into smaller chunks based on
configuration parameters
- Refactored DataBlockManager to use the new method instead of
implementing the logic itself
- Added comprehensive unit tests for the new method

Out of Scope
- Range coalescing will be implemented in a separate PR


#### Relevant issues
PR History:

#286
#287
#288

#### Does this contribution introduce any breaking changes to the
existing APIs or behaviors?
No

#### Does this contribution introduce any new public APIs or behaviors?
No



---

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and I agree to the terms of
the [Developer Certificate of Origin
(DCO)](https://developercertificate.org/).
## Description of change
This PR merges the new PhysicalIO changes to the Blob object and start
to use the new implementation.

Next Steps:
- Range coalescing implementation
- Retry policy implementation

#### Relevant issues
PR History:

#286
#287
#288
#289

#### Does this contribution introduce any breaking changes to the
existing APIs or behaviors?
No

#### Does this contribution introduce any new public APIs or behaviors?
No
#### How was the contribution tested?
Unit test

#### Does this contribution need a changelog entry?
n/A

---

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and I agree to the terms of
the [Developer Certificate of Origin
(DCO)](https://developercertificate.org/).
@ozkoca ozkoca had a problem deploying to integration-tests June 26, 2025 12:33 — with GitHub Actions Failure
rajdchak and others added 6 commits June 26, 2025 15:55
## Description of change
<!-- Thank you for submitting a pull request!-->
<!-- Please describe your contribution here. What and why? -->
<!-- Please ensure your commit messages follow these
[guidelines](https://chris.beams.io/posts/git-commit/). -->

Support passing of sse_c customer key to pass this to s3 for
encryption/decryption.

S3A currently has this customer key as Optional String
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/delegation/EncryptionSecretOperations.java#L41
while Iceberg has this key as a String
https://github.com/apache/iceberg/blob/f9cc62eb0d98e360b452a3ab8fdc6efdc4969f6e/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L499.
So decided to accept this key as Optional String.

#### Relevant issues
<!-- Please add issue numbers. -->
<!-- Please also link them to this PR. -->

#### Does this contribution introduce any breaking changes to the
existing APIs or behaviors?
<!-- Please explain why this was necessary. -->

#### Does this contribution introduce any new public APIs or behaviors?
<!-- Please describe them and explain what scenarios they target.  -->

#### How was the contribution tested?
<!-- Please describe how this contribution was tested. -->

#### Does this contribution need a changelog entry?
- [ ] I have updated the CHANGELOG or README if appropriate

---

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and I agree to the terms of
the [Developer Certificate of Origin
(DCO)](https://developercertificate.org/).
## Description of change
This PR modifies the GitHub Actions workflow to run integration tests
automatically after PR merge, removing the manual approval requirement
for post-merge cases.
<!-- Thank you for submitting a pull request!-->
<!-- Please describe your contribution here. What and why? -->
<!-- Please ensure your commit messages follow these
[guidelines](https://chris.beams.io/posts/git-commit/). -->


#### Does this contribution introduce any breaking changes to the
existing APIs or behaviors?
No

#### Does this contribution introduce any new public APIs or behaviors?
No

#### Does this contribution need a changelog entry?
No

---

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and I agree to the terms of
the [Developer Certificate of Origin
(DCO)](https://developercertificate.org/).
## Description of change
We want to support `readFully` as a part of our ongoing effort to
integrate with Iceberg S3FileIO by default. [RangeReadable in
Iceberg](https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/io/RangeReadable.java#L47)

**Note:** Here's the comparison with existing read method:

| Aspect | `read(byte[] buffer, int offset, int length)` |
`readFully(long position, byte[] buffer, int offset, int length)` |

|--------|----------------------------------------------|-------------------------------------------------------------------|
| Method Signature | Reads from current stream position | Reads from
specified position |
| Position Behavior | Advances the stream position by the number of
bytes actually read | Does not modify the stream position
(position-independent read) |
| Return & Error Handling | Returns `int` indicating bytes read; returns
-1 at end of stream; may return fewer bytes than requested | Returns
`void`; throws `IOException` if unable to read exact number of requested
bytes |

#### Relevant issues
Once this feature is released as `1.2.0`, we will update this [Iceberg
PR](apache/iceberg#13361)

#### Does this contribution introduce any breaking changes to the
existing APIs or behaviors?
No

#### Does this contribution introduce any new public APIs or behaviors?
Yes

#### How was the contribution tested?
Ran existing and new tests locally

#### Does this contribution need a changelog entry?
- [x] I have updated the CHANGELOG or README if appropriate

---

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and I agree to the terms of
the [Developer Certificate of Origin
(DCO)](https://developercertificate.org/).
## Description of change
This PR replaces all remaining references to DAT with AAL, reflecting
the current and accurate name of the project. These legacy references
have been updated for consistency and clarity.

#### Relevant issues
N/A

#### Does this contribution introduce any breaking changes to the
existing APIs or behaviors?
No

#### Does this contribution introduce any new public APIs or behaviors?
No

#### How was the contribution tested?
No need to test

#### Does this contribution need a changelog entry?
N/A

---

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and I agree to the terms of
the [Developer Certificate of Origin
(DCO)](https://developercertificate.org/).
This PR merges the new PhysicalIO changes to the Blob object and start
to use the new implementation.

Next Steps:
- Range coalescing implementation
- Retry policy implementation

PR History:

#286
#287
#288
#289

existing APIs or behaviors?
No

No
Unit test

n/A

---

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and I agree to the terms of
the [Developer Certificate of Origin
(DCO)](https://developercertificate.org/).
@ozkoca ozkoca had a problem deploying to integration-tests June 26, 2025 14:55 — with GitHub Actions Failure
@ozkoca ozkoca closed this Jun 26, 2025
@ozkoca
Copy link
Collaborator Author

ozkoca commented Jun 26, 2025

New PR #304

@ozkoca ozkoca deleted the physical-io branch August 7, 2025 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants