-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-4062] Only rollback the failed writes pre upgrade under optimis… #5535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Map<String, Option<HoodiePendingRollbackInfo>> pendingRollbacks = getPendingRollbackInfos(metaClient); | ||
| instantsToRollback.forEach(entry -> pendingRollbacks.putIfAbsent(entry, Option.empty())); | ||
| Map<String, Option<HoodiePendingRollbackInfo>> pendingRollbacks = getPendingRollbackInfos(metaClient); | ||
| instantsToRollback.forEach(entry -> pendingRollbacks.putIfAbsent(entry, Option.empty())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pre-upgrade rollback was introduced in #2374, with a pre-condition that when the write concurrency mode is based on optimistic lock.
While in #4114, the pre-condition was removed, which i think is a regression because we already do the rollback when starting new instant/commit before write. In single writer mode, there is no need to do redundant rollback again before upgrade.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@n3nash : change is ok with me. Do you see any issues as such.
CC @alexeykudinkin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One case that I can think of is when the CLEAN_FAILED_FILES policy is set to LAZY and writer config is set to a single writer - I believe there can be older inflight files lying around. In this case if the rollback is done only for Optimistic Concurrency, some older inflight metadata files may be lying around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some older inflight metadata files may be lying around
That's true, but what is the problem here in single writer upgrade ? It seems that the problem is not about upgrade but about the metadata table rollback, but shouldn't the metadata table also has MVCC/SI transaction semantics here ? The metadata table reader can not see these inflight meta logs right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danny0405 I'm not fully following your comment. From this PR, what I gather is that we want to special case rollback for Optimistic Concurrency only since for Single Writer this will be redundant given rollback is supposed to happen at the start of the commit. I'm merely saying that's not necessarily the case due to the LAZY cleaning policy, does that make sense ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean for LAZY cleaning strategy, we do not rollback failed commits at the start of commits. I totally agree with that.
I'm actually confused here whether there are any issues here for metadata table rollback for single-writer mode, if there is no problem, then this patch is an improvement ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @danny0405 : I am confused as to whats the regression here. I thought this is more of an optimization. From your first comment "While in #4114, the pre-condition was removed, which i think is a regression because we already do the rollback when starting new instant/commit before write. In single writer mode, there is no need to do redundant rollback again before upgrade."
can you help me understand why would be do redundant rollback. once we rolledback a given commit, next time if we check if something needs to be rolled back, the same partially failed commit should not appear right. I mean, if the rollback was complete the first time, it should be be part of the list of commits to rollback the 2nd time right. or am I missing something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @nsivabalan , i agree with your point, but at least, there is no need for redundant metadata checking if the rollback strategy is EAGER.
And still, no one answers the question why we need the rollback (for data set files and metadata files) before upgrade ? What is exactly the purpose here ?
@n3nash say that it is because we need to rollback the metadata table if it is enabled, but we can not see any clues that the rollback is only for metadata. And even if we do not rollback the metadata, the content read from the metadata should still be correct based on that the metadata table itself is a Hudi table and has SI visibility ?
Seems it is clear that the rollback here is not for data set files, correct me if i'm wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @nsivabalan @n3nash , i have refactored the code a little to
- move the failed writes cleaning all together to when starting the new commit
- the cleaning strategy for commit actions as tweaked: force to clean if metadata table is enabled in case we commit metadata table successfully but data set table failed
Please review again if you have time, thanks so much in advance ~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I analyze the code a little and find that we would do the metadata table sync when it is initialized, so for the case "commit metadata successfully but data set table failed", it should be fine.
Also i didn't find any upgrade/downgrade code that needs the data set/metadata files, from my perspective, the rollback is totally unnecessary, so i just remove it here.
|
Also cc @alexeykudinkin , can you review if you have time :) |
nsivabalan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets wait to hear from nishith
30ac93b to
35a1fed
Compare
|
@danny0405 can you please update PR description to capture the intent of this change. Adding context to the description helps tremendously in the review and beyond to understand the original intent. |
|
@danny0405 let's take a step back here: can you please outline what's the problem we're trying to solve for here? Do you see any issues introduced by #4114 or is this purely targeting an optimization? |
|
It is a optimization, but i think i made the logic more complicated, so i will close this PR. |
…tic concurrency
Tips
What is the purpose of the pull request
(For example: This pull request adds quick-start document.)
Brief change log
(for example:)
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.