-
Notifications
You must be signed in to change notification settings - Fork 3k
API, Core: Don't persist useless file and position bounds #8360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API, Core: Don't persist useless file and position bounds #8360
Conversation
be8ca32 to
c7698f5
Compare
c7698f5 to
94cda10
Compare
94cda10 to
96dabf6
Compare
RussellSpitzer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for OSS
singhpk234
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as well, considering lower and upper bounds are optional in spec.
| private Metrics metrics() { | ||
| Metrics metrics = appender.metrics(); | ||
| if (referencedDataFiles.size() > 1) { | ||
| return MetricsUtil.copyWithoutFieldBounds(metrics, SINGLE_REFERENCED_FILE_BOUNDS_ONLY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the idea is only to drop the bounds for _file and _pos? I guess that would mean that we keep data column ranges that might be used for filtering?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not aware of any engines that persist deleted rows but I opted in for an incremental change in behavior to be safe.
| Assert.assertNull(deleteFile.lowerBounds()); | ||
| Assert.assertNull(deleteFile.upperBounds()); | ||
| } else { | ||
| Assert.assertEquals(2, deleteFile.lowerBounds().size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also assert that referencedDataFiles.size() == 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a check.
|
Looks good to me. |
|
Thanks for reviewing, @RussellSpitzer @singhpk234 @rdblue! |
This PR changes the metrics collection in position delete files to drop useless boundaries. As discussed earlier, loading and comparing lower and upper boundaries in
DeleteFileIndexis fairly expensive and should be done only if makes sense. Given the nature of generated data file paths, it only makes sense to persist the boundaries if a position delete file covers a single data file. Otherwise, it only harms the planning performance.