-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Core: Support combining position deletes during writes #11222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Support combining position deletes during writes #11222
Conversation
| } | ||
| } | ||
|
|
||
| public static <T extends StructLike> CharSequenceMap<PositionDeleteIndex> toPositionIndexes( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Purely to avoid breaking the API.
| private final Supplier<FileWriter<PositionDelete<T>, DeleteWriteResult>> writers; | ||
| private final DeleteGranularity granularity; | ||
| private final CharSequenceMap<Roaring64Bitmap> positionsByPath; | ||
| private final CharSequenceMap<PositionDeleteIndex> positionsByPath; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an area we'd want to explore using Map<String, PositionDeleteIndex> instead of the CharSequenceMap? Doesn't need to be in this PR, more so just wondering
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think it'll probably make more sense to look at that when I do the update to use location instead of the deprecated path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to keep this as CharSequenceMap as writers may use arbitrary CharSequence implementations and it is a bit different from DataFile/DeleteFile structs.
|
|
||
| try { | ||
| PositionDelete<T> positionDelete = PositionDelete.create(); | ||
| for (CharSequence path : sort(paths)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another aspect I'm curious about, have we ever compared with using a TreeMap instead of sorting? It'll be the same time complexity in the end but interested in seeing if there's any significant differences in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say TreeMap starts to make sense if we access the collection in sorted order more than once. Otherwise, paying the extra cost during inserts may not be worth it.
41fd3b0 to
19c1779
Compare
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aokolnychyi! Had minor comments but I think this looks great overall.
| } | ||
|
|
||
| private PositionDeleteIndex loadPreviousDeletes(CharSequence path) { | ||
| return loadPreviousDeletes != null ? loadPreviousDeletes.apply(path) : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Would it make sense to default loadPreviousDeletes to be a function implementation which just returns null (I think it'd be a one line lambda charSequence -> null? Then I think we could remove this helper method and directly use loadPreviousDeletes.apply(path) on line 150.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that, let me update.
| } | ||
|
|
||
| FileWriter<PositionDelete<T>, DeleteWriteResult> writer = writers.get(); | ||
| List<DeleteFile> rewrittenDeleteFile = Lists.newArrayList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rewrittenDeleteFiles?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
|
Thanks, @amogh-jahagirdar! |
This PR adds support for combing historical position deletes in writers, enabling sync maintenance.