-
Notifications
You must be signed in to change notification settings - Fork 3k
Core: Optimize check for referenced data files in BaseRowDelta #3071
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| if (!referencedDataFiles.isEmpty()) { | ||
| validateDataFilesExist(base, startingSnapshotId, referencedDataFiles, !validateDeletes); | ||
| validateDataFilesExist( | ||
| base, startingSnapshotId, referencedDataFiles, !validateDeletes, conflictDetectionFilter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does make the assumption that the conflict detection filter and referenced data files are related.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is a correct assumption as the conflict detection filter is our scan condition and referenced data files are data files were read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a valid assumption.
e02d97b to
8afb26d
Compare
| } | ||
|
|
||
| @Test | ||
| public void testValidateDataFilesExistWithConflictDetectionFilter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also add a test where the validation fails? It looks like this one just checks that you can do isolated operations but I think we should do a conflicting test as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll add one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a negative test too.
rdblue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ready to go once @RussellSpitzer's suggestion to add a test case is fixed.
966cede to
e3b7c68
Compare
|
I'll merge this one to unblock subsequent PRs. I added the missing test. |
|
Thanks for reviewing, @RussellSpitzer @rdblue! |
…e#3071) This change optimizes our check for referenced data files in BaseRowDelta by pushing down the conflict detection filter. Previously, we would open manifests even though they belonged to partitions out of our interest.
…e#3071) This change optimizes our check for referenced data files in BaseRowDelta by pushing down the conflict detection filter. Previously, we would open manifests even though they belonged to partitions out of our interest.
This change optimizes our check for referenced data files in BaseRowDelta by pushing down the conflict detection filter. Previously, we would open manifests even though they belonged to partitions out of our interest.
|
Added this as it was included in 0.12.1 (made cherry-picking easier). |
…e#3071) This change optimizes our check for referenced data files in BaseRowDelta by pushing down the conflict detection filter. Previously, we would open manifests even though they belonged to partitions out of our interest.
This PR optimizes our check for referenced data files in
BaseRowDeltaby pushing down the conflict detection filter. Previously, we would open manifests even though they belonged to partitions out of our interest.