-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-2761] Fixing timeline server for repeated refreshes #4812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
373869c to
51b60eb
Compare
|
@danny0405 : I might need your help in this patch. There are few test failures in flink. can you take a look and see whats happening. If you can triage a fix and update the patch, would be of great help. Description of the patch has all the info/context. Failing tests: |
51b60eb to
d59ff5d
Compare
d59ff5d to
d67e7d1
Compare
| if (!localTimelineHash.equals(timelineHashFromClient)) { | ||
| // refresh if timeline hash mismatches and if local's last known instant < client's last known instant | ||
| if (!localTimelineHash.equals(timelineHashFromClient) | ||
| && HoodieTimeline.compareTimestamps(localLastKnownInstant, HoodieTimeline.LESSER_THAN, lastKnownInstantFromClient)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally i got the reason:
We have a start commit method that may generate a rollback instant with greater timestamp than the actual passed in instant time:
hudi/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
Line 862 in 907e60c
| public void startCommitWithTime(String instantTime, String actionType) { |
And unfortunately, flink uses that (spark uses that too), here is how the problem comes:
delta_commit compaction delta_commit rollback_commit
--- t1 --------------- t2 ------------ t3 ------------- t4 ------------
The t4 was created before t3 was created and it was with the highest timestamp t4, then the following sequence happens:
- the rollback action would then refresh the remote timeline service with the latest timestamp t4 (remember the fs view as V1)
- the t3 delta commit start to execute and commit, say the commit was successful
- then we want to trigger the compaction after the commit of t3
And the tricky things happens:
the compaction scheduler takes the client, the client uses the latest timestamps on timeline and tries to fetch all the fileslices, but because the client timestamp t4 equals with the remote timeline time t4, the view does not sync and we still got V1 fs view here and we can not find any compaction plan because there was no log files in the view.
Here is my fix patch to make sure the rollback timestamp not greater than the delta commit time.
danny0405
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
* Fixing timeline server for repeated refreshes
* Fixing timeline server for repeated refreshes
What is the purpose of the pull request
Timeline server when serving remote requests, has a logic to refresh its local view of the timeline based on timeline hash. Client sends a timeline hash and timeline server compares with its local timeline hash and if they differ, a refresh of timeline happens before serving the request. But this refresh gets triggered even if the client is behind, but the server is already caught up. This could have severe perf impact with async table services and spark streaming pipeline use-cases where commit throughput is high. So, adding a new value to be maintained by the timeline for lastUpdatedTime. and the same will be sent as param with remote request as well.
Fix: So, the fix ensures that timeline server triggers a refresh of local timeline only if its last known instant < client's last known instant.
Brief change log
Verify this pull request
Relying on existing tests. Will ask some users in the community to test out the patch.
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.