Skip to content

Conversation

@jonvex
Copy link
Contributor

@jonvex jonvex commented Jan 23, 2023

Change Logs

When we have some failed commits in the timeline before first successful commit, FS based listing could return data from the failed commit. This is not an issue w/ single writer. Could only happen when multi-writers are enabled or when async table services are enabled along w/ deltastreamer.

For eg, if timeline is:
c1.inflight, c2.complete,c3.complete
when we query hudi, data files from c1 is also returned. Fixing it as part of this patch.

Impact

FS based listing will not return data from failed commit.

Risk level (write none, low medium or high below)

low

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@jonvex jonvex marked this pull request as ready for review January 23, 2023 20:06
@nsivabalan nsivabalan added the priority:blocker Production down; release blocker label Jan 23, 2023
@nsivabalan nsivabalan self-assigned this Jan 23, 2023
connectTableAndReloadMetaClient(tablePath);
HoodieTableFileSystemView fsView = new HoodieTableFileSystemView(metaClient,
metaClient.getActiveTimeline().getCommitsTimeline().filterCompletedInstants(),
TimelineUtils.getFirstNotCompleted(metaClient.getActiveTimeline().getCommitsTimeline()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any possibility just pass around the metaClient.getActiveTimeline().getCommitsTimeline() and resolve the first not completed instant inside the HoodieTableFileSystemView constructor. This can avoid many unnecessary changes and the code is cleaner to maintain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, thought about it. couple of reasons.

  1. at times we do send the fileGroup fully to timeline server on which case we are sending entire timeline (not just completed) and might take up more space.
  2. also, we really don't need entire timeline. we just need the first Active instant. and hence took this route. Also, most of FileSystemView code assume its completed timeline and hence didn't want to change that assumption.

Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

responded to comments

}
}

public static Option<String> getFirstNotCompleted(HoodieTimeline timeline) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should make this method getFirstInstant(timeline).
essentially irrespective fo whether its complete or inflight, first instant in write timeline.


public static HoodieFileGroup toFileGroup(FileGroupDTO dto, HoodieTableMetaClient metaClient) {
HoodieFileGroup fileGroup =
new HoodieFileGroup(dto.partition, dto.id, TimelineDTO.toTimeline(dto.timeline, metaClient));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might need to fix FileGroupDTO to hold the first instant as well.

* Initialize the view.
*/
protected void init(HoodieTableMetaClient metaClient, HoodieTimeline visibleActiveTimeline) {
protected void init(HoodieTableMetaClient metaClient, HoodieTimeline visibleActiveTimeline,Option<String> firstNotCompleted) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: firstActiveInstant

connectTableAndReloadMetaClient(tablePath);
HoodieTableFileSystemView fsView = new HoodieTableFileSystemView(metaClient,
metaClient.getActiveTimeline().getCommitsTimeline().filterCompletedInstants(),
TimelineUtils.getFirstNotCompleted(metaClient.getActiveTimeline().getCommitsTimeline()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, thought about it. couple of reasons.

  1. at times we do send the fileGroup fully to timeline server on which case we are sending entire timeline (not just completed) and might take up more space.
  2. also, we really don't need entire timeline. we just need the first Active instant. and hence took this route. Also, most of FileSystemView code assume its completed timeline and hence didn't want to change that assumption.

// To get here:
// 1. the timestamp must be <= the last commit
// 2. not in the completed timeline
// 3. the timestamp must be >= the first active instant
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the #isBeforeTimelineStarts is only valid when the timeline is the whole complete active timeline.

@nsivabalan nsivabalan added priority:critical Production degraded; pipelines stalled and removed priority:blocker Production down; release blocker labels Jan 25, 2023
@hudi-bot
Copy link
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

Copy link
Member

@codope codope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonvex Can you please rebase and resolve conflicts?
@danny0405 @nsivabalan Is this a blocker for 0.13.1?

@nsivabalan
Copy link
Contributor

Closing this in favor of #8783

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

priority:critical Production degraded; pipelines stalled release-0.14.0

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

6 participants