@aokolnychyi
Contributor

This PR optimizes our check for referenced data files in BaseRowDelta by pushing down the conflict detection filter. Previously, we would open manifests even when they belonged to partitions we were not interested in.
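
To make the effect of the pushdown concrete, here is a minimal toy sketch in plain Java. It is not Iceberg code: `Manifest`, `Result`, `check`, and the integer partition range are all hypothetical stand-ins, and the real filter is an expression rather than a range. The sketch only assumes that each manifest carries a partition summary that can be tested without opening the manifest.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: every type here is a hypothetical stand-in, not Iceberg's API.
final class ManifestPruningSketch {

  // A manifest summarized by the partition range it covers, plus the paths it marks as deleted.
  static final class Manifest {
    final int minPartition;
    final int maxPartition;
    final List<String> deletedPaths;

    Manifest(int minPartition, int maxPartition, String... deletedPaths) {
      this.minPartition = minPartition;
      this.maxPartition = maxPartition;
      this.deletedPaths = Arrays.asList(deletedPaths);
    }
  }

  // Outcome of the check: referenced files that went missing, and how many manifests were opened.
  static final class Result {
    final Set<String> missing = new HashSet<>();
    int manifestsOpened = 0;
  }

  // Scans manifests for deletions of referenced files.
  // A null partitionRange means no filter is pushed down; otherwise it is {low, high}, inclusive.
  static Result check(List<Manifest> manifests, Set<String> referenced, int[] partitionRange) {
    Result result = new Result();
    for (Manifest manifest : manifests) {
      boolean mayMatch = partitionRange == null
          || (manifest.maxPartition >= partitionRange[0]
              && manifest.minPartition <= partitionRange[1]);
      if (!mayMatch) {
        continue; // pruned from the summary alone, never opened
      }
      result.manifestsOpened++;
      for (String path : manifest.deletedPaths) {
        if (referenced.contains(path)) {
          result.missing.add(path);
        }
      }
    }
    return result;
  }

  // Three single-partition manifests, each recording one deleted file.
  static List<Manifest> sampleManifests() {
    return Arrays.asList(
        new Manifest(1, 1, "p1/a.parquet"),
        new Manifest(2, 2, "p2/b.parquet"),
        new Manifest(3, 3, "p3/c.parquet"));
  }

  public static void main(String[] args) {
    Set<String> referenced = new HashSet<>(Arrays.asList("p2/b.parquet"));
    Result unfiltered = check(sampleManifests(), referenced, null);
    Result filtered = check(sampleManifests(), referenced, new int[] {2, 2});
    System.out.println(unfiltered.manifestsOpened + " manifests opened without the filter");
    System.out.println(filtered.manifestsOpened + " manifests opened with the filter");
    System.out.println("same result: " + unfiltered.missing.equals(filtered.missing));
  }
}
```

In this toy setup both calls report the same missing file, but the filtered call opens one manifest instead of three.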

@github-actions github-actions bot added the core label Sep 3, 2021
  if (!referencedDataFiles.isEmpty()) {
-   validateDataFilesExist(base, startingSnapshotId, referencedDataFiles, !validateDeletes);
+   validateDataFilesExist(
+       base, startingSnapshotId, referencedDataFiles, !validateDeletes, conflictDetectionFilter);
Contributor Author

This does make the assumption that the conflict detection filter and referenced data files are related.

Contributor Author

I think that is a correct assumption, as the conflict detection filter is our scan condition and the referenced data files are the data files that were read.

Contributor

I think that's a valid assumption.
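
The assumption discussed in this thread can be illustrated with a small toy sketch. Everything in it is hypothetical (the class, the method, the map of deleted files to integer partitions) and none of it is Iceberg's API. It only shows why the filter and the referenced files need to be related: a check restricted to the filter cannot notice a deletion that falls outside it.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.IntPredicate;

// Toy model only: hypothetical names, not Iceberg's API.
final class FilterAssumptionSketch {

  // Returns the referenced paths found among concurrently deleted files,
  // looking only at deletions in partitions accepted by the filter.
  static Set<String> missingReferencedFiles(
      Map<String, Integer> deletedFileToPartition, Set<String> referenced, IntPredicate filter) {
    Set<String> missing = new HashSet<>();
    for (Map.Entry<String, Integer> deleted : deletedFileToPartition.entrySet()) {
      if (filter.test(deleted.getValue()) && referenced.contains(deleted.getKey())) {
        missing.add(deleted.getKey());
      }
    }
    return missing;
  }

  // Files removed by a concurrent commit, keyed by path, with the partition each belonged to.
  static Map<String, Integer> concurrentDeletes() {
    Map<String, Integer> deletes = new LinkedHashMap<>();
    deletes.put("p2/b.parquet", 2);
    deletes.put("p3/c.parquet", 3);
    return deletes;
  }

  public static void main(String[] args) {
    IntPredicate scanFilter = partition -> partition == 2;

    // The referenced file was read by the scan, so it lies inside the filter: deletion is seen.
    System.out.println(missingReferencedFiles(
        concurrentDeletes(), Collections.singleton("p2/b.parquet"), scanFilter));

    // A referenced file outside the filter would break the assumption: its deletion is not seen.
    System.out.println(missingReferencedFiles(
        concurrentDeletes(), Collections.singleton("p3/c.parquet"), scanFilter));
  }
}
```

Since the referenced files come from a scan that used the same condition, the second case should not arise in practice, which is the point made above.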

@aokolnychyi
Contributor Author

}

@Test
public void testValidateDataFilesExistWithConflictDetectionFilter() {
Member

Should we also add a test where the validation fails? It looks like this one just checks that you can do isolated operations but I think we should do a conflicting test as well.

Contributor Author

Sure, I'll add one.

Contributor Author

Added a negative test too.
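
The shape of the positive and negative cases discussed in this thread can be sketched with a toy model. This is not the test added in the PR: `validateReferencedFilesExist`, `commitSucceeds`, the integer partitions, and the use of `IllegalStateException` are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.function.IntPredicate;

// Toy model only: hypothetical names and exception type, not Iceberg's API or test suite.
final class ConflictTestSketch {

  // Throws if a referenced file was concurrently deleted in a partition the filter accepts.
  static void validateReferencedFilesExist(
      Map<String, Integer> concurrentlyDeleted, Set<String> referenced, IntPredicate filter) {
    for (Map.Entry<String, Integer> deleted : concurrentlyDeleted.entrySet()) {
      if (filter.test(deleted.getValue()) && referenced.contains(deleted.getKey())) {
        throw new IllegalStateException("Referenced data file is missing: " + deleted.getKey());
      }
    }
  }

  // Runs the validation for a writer that read p2/b.parquet with a filter on partition 2.
  static boolean commitSucceeds(Map<String, Integer> concurrentlyDeleted) {
    try {
      validateReferencedFilesExist(
          concurrentlyDeleted, Collections.singleton("p2/b.parquet"), partition -> partition == 2);
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Positive case: the concurrent commit touched only partition 1, so the operations are isolated.
    System.out.println(commitSucceeds(Collections.singletonMap("p1/a.parquet", 1)));

    // Negative case: the concurrent commit removed the file we referenced, so validation fails.
    System.out.println(commitSucceeds(Collections.singletonMap("p2/b.parquet", 2)));
  }
}
```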

Contributor

@rdblue rdblue left a comment

I think this is ready to go once @RussellSpitzer's suggestion to add a test case is addressed.

@aokolnychyi
Contributor Author

I'll merge this one to unblock subsequent PRs. I added the missing test.

@aokolnychyi aokolnychyi merged commit 838cc65 into apache:master Sep 13, 2021
@aokolnychyi
Contributor Author

Thanks for reviewing, @RussellSpitzer @rdblue!

kbendick pushed a commit to kbendick/iceberg that referenced this pull request Oct 27, 2021
…e#3071)

This change optimizes our check for referenced data files in BaseRowDelta by pushing down the conflict detection filter. Previously, we would open manifests even when they belonged to partitions we were not interested in.
kbendick pushed a commit to kbendick/iceberg that referenced this pull request Nov 1, 2021
…e#3071)

rdblue pushed a commit that referenced this pull request Nov 1, 2021
@kbendick kbendick added this to the Java 0.12.1 Release milestone Nov 1, 2021
@kbendick
Contributor

kbendick commented Nov 1, 2021

Added this as it was included in 0.12.1 (made cherry-picking easier).

izchen pushed a commit to izchen/iceberg that referenced this pull request Dec 7, 2021
…e#3071)

Initial-neko pushed a commit to Initial-neko/iceberg that referenced this pull request Dec 13, 2021
Initial-neko pushed a commit to Initial-neko/iceberg that referenced this pull request Dec 17, 2021