[HUDI-2917] rollback insert data appended to log file when using Hbase Index #4840
@codope: can you review this patch?
@guanziyue: feel free to review the patch.
...c/test/java/org/apache/hudi/table/action/rollback/TestMergeOnReadRollbackActionExecutor.java
...-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java
...-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java
...t/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java
...c/test/java/org/apache/hudi/table/action/rollback/TestMergeOnReadRollbackActionExecutor.java
@codope: good to review again. I removed the unnecessary updates to the output workload stats and no longer populate them when they are not required.
codope left a comment:
LGTM. If the else block is not necessary then it's better to remove it. You can land it after that.
...t/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
Branch updated from 4385989 to 260c29f.
@hudi-bot run azure
…e Index (apache#4840) Co-authored-by: guanziyue <[email protected]>
What is the purpose of the pull request
Redo of #4446
Some data that should have been rolled back can still be found in the Hudi table.
Root cause:
Let's first recall how the rollback plan handles the log blocks written by a deltacommit. Hudi considers two cases:
Log files that have no base file consist entirely of inserted records, so they are deleted directly. This assumes that every inserted record is covered by this case.
For file IDs that were updated according to the inflight commit metadata of the instant being rolled back, a rollback command block is appended to their log files. This covers all updated records.
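The two cases above can be sketched as a small decision function. This is a simplified, hypothetical model, not Hudi's rollback planner: the names `RollbackPlanSketch`, `FileGroup`, `Action` and `planFor` are illustrative only, and "a log file with no base file" is assumed to mean a file group whose files were all written by the instant being rolled back.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the two rollback cases; not Hudi's actual classes.
public class RollbackPlanSketch {

    enum Action { DELETE_LOG_FILES, APPEND_ROLLBACK_BLOCK, NONE }

    // Assumption: logOnlyFromInstant == true models "log file with no base file",
    // i.e. a file group whose files were all written by the instant being rolled back.
    record FileGroup(String fileId, boolean logOnlyFromInstant) {}

    static Action planFor(FileGroup fg, Set<String> updatedFileIdsInInflightMeta) {
        // Case 1: log files without a base file hold only inserts, so delete them directly.
        if (fg.logOnlyFromInstant()) {
            return Action.DELETE_LOG_FILES;
        }
        // Case 2: file IDs recorded as updated in the inflight commit metadata
        // get a rollback command block appended to their log file.
        if (updatedFileIdsInInflightMeta.contains(fg.fileId())) {
            return Action.APPEND_ROLLBACK_BLOCK;
        }
        // Anything else is covered by neither case.
        return Action.NONE;
    }

    static Map<String, Action> plan(List<FileGroup> fileGroups, Set<String> updatedFileIds) {
        Map<String, Action> result = new LinkedHashMap<>();
        for (FileGroup fg : fileGroups) {
            result.put(fg.fileId(), planFor(fg, updatedFileIds));
        }
        return result;
    }

    public static void main(String[] args) {
        List<FileGroup> touched = List.of(
            new FileGroup("fg-new", true),
            new FileGroup("fg-updated", false));
        // prints {fg-new=DELETE_LOG_FILES, fg-updated=APPEND_ROLLBACK_BLOCK}
        System.out.println(plan(touched, Set.of("fg-updated")));
    }
}
```

Returning `NONE` explicitly makes it visible that a file group matching neither rule is left untouched by this plan.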
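However, the first assumption does not always hold. An index that can index log files (such as the HBase index) may write inserted records into an existing log file. In the current flow, the inflight hoodieCommitMeta is generated before these records are assigned to a specific file group. As a result, these file groups are missing from the metadata that the second case relies on, and the inserted records are not rolled back.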
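The ordering problem can be illustrated with a small simulation: the inflight metadata is captured from the updates known before assignment, and only afterwards are inserts routed to file groups. This is a hypothetical sketch; `InflightMetaGapSketch`, `Write`, `inflightUpdatedFileIds` and `missedByRollback` are illustrative names, not Hudi APIs, and a brand-new file group is assumed to be deleted outright on rollback.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the ordering gap; not Hudi's actual classes.
public class InflightMetaGapSketch {

    /** One write produced by the instant: target file group, update vs insert, new vs existing group. */
    record Write(String fileId, boolean isUpdate, boolean newFileGroup) {}

    /** Inflight metadata is built from the updates known before inserts are assigned to file groups. */
    static Set<String> inflightUpdatedFileIds(List<Write> knownBeforeAssignment) {
        Set<String> ids = new LinkedHashSet<>();
        for (Write w : knownBeforeAssignment) {
            if (w.isUpdate()) {
                ids.add(w.fileId());
            }
        }
        return ids;
    }

    /**
     * File groups that received data from the instant but are neither brand-new
     * (assumed deleted on rollback) nor recorded in the inflight metadata
     * (which would get a rollback command block). Their data survives the rollback.
     */
    static Set<String> missedByRollback(List<Write> allWrites, Set<String> inflightUpdated) {
        Set<String> missed = new LinkedHashSet<>();
        for (Write w : allWrites) {
            if (!w.newFileGroup() && !inflightUpdated.contains(w.fileId())) {
                missed.add(w.fileId());
            }
        }
        return missed;
    }

    public static void main(String[] args) {
        // Only the update to fg-1 is known when the inflight metadata is written.
        Set<String> inflight = inflightUpdatedFileIds(List.of(new Write("fg-1", true, false)));

        // After assignment: an insert lands in the existing log file of fg-2,
        // and another insert creates the new file group fg-3.
        List<Write> all = List.of(
            new Write("fg-1", true, false),
            new Write("fg-2", false, false),
            new Write("fg-3", false, true));

        System.out.println(missedByRollback(all, inflight)); // prints [fg-2]
    }
}
```

In this model, recording the file groups that receive inserts in the inflight metadata before it is written would leave `missedByRollback` empty.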
Brief change log
Verify this pull request
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.