[HUDI-2761] Fixing unnecessary refreshing of timeline in Timelineserver #4800

nsivabalan · 2022-02-13T00:20:56Z

What is the purpose of the pull request

When serving remote requests, the timeline server refreshes its local view of the timeline based on a timeline hash. The client sends its timeline hash, the timeline server compares it with its own local timeline hash, and if they differ, the timeline is refreshed before the request is served. But this refresh gets triggered even when the client is behind and the server is already caught up. This could have a severe perf impact with async table services and Spark streaming pipeline use-cases where commit throughput is high. So, this PR adds a new value, lastUpdatedTime, maintained by the timeline, and the same is sent as a param with each remote request.

Fix: the timeline server triggers a refresh of its local timeline only if its lastUpdatedTime < the client's lastUpdatedTime.
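
To make the before/after behavior concrete, here is a minimal sketch of the two refresh checks. The class, method, and parameter names are hypothetical and are not taken from the patch; only the conditions themselves (hash mismatch, and server lastUpdatedTime < client lastUpdatedTime) come from the description above.

```java
// Sketch only: all names here are hypothetical, not Hudi's actual API.
final class TimelineRefreshSketch {

    // Behavior before the fix: any hash mismatch forces a refresh.
    static boolean shouldRefreshOld(String serverHash, String clientHash) {
        return !serverHash.equals(clientHash);
    }

    // Behavior with the fix: refresh only when the hashes differ
    // and the server's view is older than the client's.
    static boolean shouldRefreshNew(String serverHash, long serverLastUpdatedTime,
                                    String clientHash, long clientLastUpdatedTime) {
        return !serverHash.equals(clientHash)
            && serverLastUpdatedTime < clientLastUpdatedTime;
    }

    public static void main(String[] args) {
        // Client is behind (stale hash, older timestamp); server is already caught up.
        System.out.println(shouldRefreshOld("h2", "h1"));             // old check refreshes anyway
        System.out.println(shouldRefreshNew("h2", 200L, "h1", 100L)); // new check skips the refresh
        // Client is ahead of the server, so a refresh is still needed.
        System.out.println(shouldRefreshNew("h1", 100L, "h2", 200L));
    }
}
```

With a stale client, the old check returns true and the new one returns false, which is the unnecessary refresh this PR is trying to avoid.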

To discuss:
Clocks could differ between the timeline server and the executor (client), and there could be drift as well. So, not very sure if we can rely on an exact comparison of last updated time between client and server.
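
A small worked example of the clock concern, with made-up numbers and hypothetical names: if the server's clock runs ahead of the client's, the server can stamp an older view with a larger lastUpdatedTime and wrongly skip a refresh it needs.

```java
// Sketch only: names and numbers are illustrative assumptions.
final class ClockSkewSketch {

    // Same condition as the proposed fix.
    static boolean shouldRefresh(boolean hashesDiffer,
                                 long serverLastUpdatedTime,
                                 long clientLastUpdatedTime) {
        return hashesDiffer && serverLastUpdatedTime < clientLastUpdatedTime;
    }

    public static void main(String[] args) {
        long trueServerUpdate = 1_000L;   // server last loaded the timeline at "true" time 1000
        long trueClientUpdate = 1_500L;   // client saw a newer commit at "true" time 1500
        long serverClockAheadBy = 2_000L; // assumed skew between the two machines

        long serverStamp = trueServerUpdate + serverClockAheadBy; // 3000 on the server's clock
        long clientStamp = trueClientUpdate;                      // 1500 on the client's clock

        // The server is stale, but its skewed timestamp looks newer: refresh is skipped.
        System.out.println(shouldRefresh(true, serverStamp, clientStamp));
        // With synchronized clocks the same situation triggers a refresh.
        System.out.println(shouldRefresh(true, trueServerUpdate, trueClientUpdate));
    }
}
```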

Another option: I am wondering if we can rely on the lastKnownInstant (HoodieInstant) from the client, compare it with that of the timeline in the timeline server, and decide whether to refresh based on that instead of lastUpdatedTime.
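
A rough sketch of what the instant-based option could look like. It assumes instant times are fixed-width timestamp strings (for example yyyyMMddHHmmss), so plain string comparison matches chronological order. The class and method names are hypothetical, and this does not use the real HoodieInstant class.

```java
import java.util.Optional;

// Sketch only: instants are modeled as timestamp strings, not HoodieInstant objects.
final class InstantRefreshSketch {

    // Refresh only when the client has seen an instant newer than the server's latest.
    static boolean shouldRefresh(Optional<String> serverLastInstant,
                                 Optional<String> clientLastInstant) {
        if (!clientLastInstant.isPresent()) {
            return false; // client has seen nothing, so the server cannot be behind it
        }
        if (!serverLastInstant.isPresent()) {
            return true;  // server has an empty view but the client has seen an instant
        }
        // Assumes both strings have the same width, so lexicographic order is time order.
        return serverLastInstant.get().compareTo(clientLastInstant.get()) < 0;
    }

    public static void main(String[] args) {
        // Client is behind the server: no refresh.
        System.out.println(shouldRefresh(Optional.of("20220213002056"), Optional.of("20220213001500")));
        // Client is ahead of the server: refresh.
        System.out.println(shouldRefresh(Optional.of("20220213001500"), Optional.of("20220213002056")));
    }
}
```

Since instant times are written to the timeline by the writer and not stamped locally by each process, this comparison does not depend on the server and client clocks agreeing.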

Brief change log

Added lastUpdatedTime to HoodieTimeline which gets refreshed whenever the timeline is updated. The same value is sent via remote requests to Timeline server.
Timeline server triggers a refresh of its local timeline only if its lastUpdatedTime < client's lastUpdatedTime in addition to timeline hash mismatch.

Thanks to guanziyue who helped us with the fix.

Verify this pull request

A couple of users in the community actually tested this and contributed the patch.

Committer checklist

Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

nsivabalan · 2022-02-13T00:24:01Z

@bvaradar @xushiyan @n3nash @vinothchandar : Would appreciate if you folks can review this patch. We are making some tweaks to how we refresh local view in timeline server. Wanted to ensure I am not missing anything and there are no gaps.

hudi-bot · 2022-02-13T01:18:29Z

CI report:

5bccb23 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure re-run the last Azure build

n3nash · 2022-02-13T01:52:30Z

@nsivabalan Thanks for the fix. Quick comment before reviewing the diff: is there a particular reason for choosing lastUpdatedTime instead of the HoodieInstant itself, as you pointed out in your proposal? To keep this simpler to understand, I feel comparing HoodieInstants is the better choice, but I'd like to understand your reasoning.

danny0405 · 2022-02-13T02:49:47Z

> @bvaradar @xushiyan @n3nash @vinothchandar : Would appreciate if you folks can review this patch. We are making some tweaks to how we refresh local view in timeline server. Wanted to ensure I am not missing anything and there are no gaps.

Generally I think the hoodie instant time should be the only source of truth for timeline versioning. Previously, I found that the timeline service refreshes frequently when async table services such as cleaning and compaction change the timeline metadata.
For compaction the refresh is valid and necessary, but for cleaning most of the refreshes are invalid/unnecessary. Wondering whether we can resolve this issue in this PR.

xushiyan · 2022-02-13T12:16:11Z

> @nsivabalan Thanks for the fix. Quick comment before reviewing the diff: is there a particular reason for choosing lastUpdatedTime instead of the HoodieInstant itself, as you pointed out in your proposal? To keep this simpler to understand, I feel comparing HoodieInstants is the better choice, but I'd like to understand your reasoning.

@nsivabalan I had a similar view to what @n3nash was asking here. The problem boils down to a cache invalidation issue: the local timeline view is a cache, and we need to compare some timestamps to decide whether to invalidate the cache and reload the timeline view. So to avoid unnecessary complexity, is there any strong reason why instant time can't be used here?

nsivabalan · 2022-02-13T13:09:07Z

@xushiyan @n3nash :
I put up the patch as is, the way I got it from the community user, and I heard that it's already being run in prod, if I am not wrong. So, I went ahead and put up the patch. At least I wanted to have discussions on both approaches.
But as I mentioned in the description, I am also inclined towards using lastInstantTime, which makes sense. Will go ahead and fix the patch.

@danny0405 : we need to think more about cleaning not triggering any refresh. If I am not wrong, none of the APIs in FileSystemView (e.g., getLatestBaseFiles) know which operation they are being executed for. So, ignoring the timeline refresh just for cleaning would mean leaking such information to the FileSystemView, which needs some thinking. I will take a look at the code to see how this might pan out. Will keep you posted.
But thanks for bringing up a good point. Appreciate it.

nsivabalan · 2022-02-25T15:53:54Z

Closing this in favor of #4812

Fixing refreshing of timeline in Timelineserver based on last updated…

5bccb23

… time and timeline hash

nsivabalan changed the title ~~[HUDI-2400] Fixing refreshing of timeline in Timelineserver based on last updated time and timeline hash~~ [HUDI-2761] Fixing refreshing of timeline in Timelineserver based on last updated time and timeline hash Feb 13, 2022

nsivabalan changed the title ~~[HUDI-2761] Fixing refreshing of timeline in Timelineserver based on last updated time and timeline hash~~ [HUDI-2761] Fixing unnecessary refreshing of timeline in Timelineserver Feb 13, 2022

nsivabalan closed this Feb 25, 2022
