[HUDI-5569] Fixing TableFileSystemView to detect early failed commits #7738
connectTableAndReloadMetaClient(tablePath);
HoodieTableFileSystemView fsView = new HoodieTableFileSystemView(metaClient,
    metaClient.getActiveTimeline().getCommitsTimeline().filterCompletedInstants(),
    TimelineUtils.getFirstNotCompleted(metaClient.getActiveTimeline().getCommitsTimeline()),
Would it be possible to just pass around metaClient.getActiveTimeline().getCommitsTimeline() and resolve the first not-completed instant inside the HoodieTableFileSystemView constructor? That would avoid many unnecessary changes and keep the code cleaner to maintain.
Yeah, thought about it. A couple of reasons:
- At times we send the file group in full to the timeline server, in which case we would be sending the entire timeline (not just the completed instants), which might take up more space.
- Also, we don't really need the entire timeline; we just need the first active instant, hence this route. Most of the FileSystemView code also assumes it is given a completed timeline, and I didn't want to change that assumption.
nsivabalan
left a comment
responded to comments
}
}

public static Option<String> getFirstNotCompleted(HoodieTimeline timeline) {
We should make this method getFirstInstant(timeline): essentially, irrespective of whether it's complete or inflight, it should return the first instant in the write timeline.
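The difference between the two helpers under discussion can be sketched as follows. This is a minimal illustration over plain Java types, not Hudi code: the `Instant` class and both method bodies are hypothetical stand-ins, and instant timestamps are assumed to sort lexicographically.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class FirstInstantSketch {

    /** Hypothetical stand-in for a timeline instant; this is not Hudi's HoodieInstant. */
    static final class Instant {
        final String timestamp;
        final boolean completed;

        Instant(String timestamp, boolean completed) {
            this.timestamp = timestamp;
            this.completed = completed;
        }
    }

    /** Earliest instant that has not completed, mirroring the helper name used in the diff. */
    static Optional<String> firstNotCompleted(List<Instant> timeline) {
        return timeline.stream()
            .filter(i -> !i.completed)
            .map(i -> i.timestamp)
            .min(Comparator.naturalOrder());
    }

    /** Earliest instant regardless of state, as suggested in the review comment. */
    static Optional<String> firstInstant(List<Instant> timeline) {
        return timeline.stream()
            .map(i -> i.timestamp)
            .min(Comparator.naturalOrder());
    }
}
```

The two helpers agree when the oldest instant in the write timeline is still pending (for example c1.inflight, c2.complete) and differ when it has completed (for example c1.complete, c2.inflight), where the first returns c2 and the second returns c1.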

public static HoodieFileGroup toFileGroup(FileGroupDTO dto, HoodieTableMetaClient metaClient) {
  HoodieFileGroup fileGroup =
      new HoodieFileGroup(dto.partition, dto.id, TimelineDTO.toTimeline(dto.timeline, metaClient));
we might need to fix FileGroupDTO to hold the first instant as well.
   * Initialize the view.
   */
-  protected void init(HoodieTableMetaClient metaClient, HoodieTimeline visibleActiveTimeline) {
+  protected void init(HoodieTableMetaClient metaClient, HoodieTimeline visibleActiveTimeline,Option<String> firstNotCompleted) {
nit: firstActiveInstant
// To get here:
// 1. the timestamp must be <= the last commit
// 2. not in the completed timeline
// 3. the timestamp must be >= the first active instant
Yeah, the #isBeforeTimelineStarts is only valid when the timeline is the whole complete active timeline.
…se it was filtering out cleaning instances
…System View correctly
and rebase
5cf1313 to
f2e3a99
Compare
codope
left a comment
@jonvex Can you please rebase and resolve conflicts?
@danny0405 @nsivabalan Is this a blocker for 0.13.1?
Closing this in favor of #8783
Change Logs
When the timeline has failed commits before the first successful commit, FS-based listing can return data from the failed commits. This is not an issue with a single writer; it can only happen when multi-writer is enabled, or when async table services are enabled along with Deltastreamer.
For example, if the timeline is:
c1.inflight, c2.complete, c3.complete
then when we query Hudi, data files from c1 are also returned. This patch fixes that.
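To make the failure mode concrete, here is a simplified model of the visibility check before and after the change. It is only a sketch under stated assumptions: the class and method names are hypothetical, instant times are plain strings that sort lexicographically, the completed list is sorted in ascending order, and the real logic in HoodieFileGroup and the file system view may differ in detail.

```java
import java.util.List;
import java.util.Optional;

final class EarlyFailedCommitSketch {

    // Simplified model of the old check: an instant counts as committed if it is in the
    // completed timeline, or if it is older than the first completed instant (in which
    // case it is assumed to have been archived).
    // Assumes completedInstants is sorted in ascending order.
    static boolean isCommittedOld(String instantTime, List<String> completedInstants) {
        if (completedInstants.contains(instantTime)) {
            return true;
        }
        return !completedInstants.isEmpty()
            && instantTime.compareTo(completedInstants.get(0)) < 0;
    }

    // Simplified model of the proposed check: an instant that is not completed and is at
    // or after the first not-completed instant is never treated as committed. Anything
    // older falls back to the old "before the timeline starts" rule.
    static boolean isCommittedNew(String instantTime,
                                  List<String> completedInstants,
                                  Optional<String> firstNotCompleted) {
        if (completedInstants.contains(instantTime)) {
            return true;
        }
        if (firstNotCompleted.isPresent()
            && instantTime.compareTo(firstNotCompleted.get()) >= 0) {
            return false;
        }
        return isCommittedOld(instantTime, completedInstants);
    }
}
```

For the timeline above, the completed instants are [c2, c3] and the first not-completed instant is c1. In this model, `isCommittedOld("c1", ...)` returns true because c1 sorts before c2 and is mistaken for an archived commit, while `isCommittedNew("c1", ...)` returns false.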
Impact
FS-based listing will no longer return data from failed commits.
Risk level (write none, low medium or high below)
low
Documentation Update
N/A
Contributor's checklist