-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-1072] Introduce REPLACE top level action #2048
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@vinothchandar @bvaradar FYI. There are few things that I'm not fully happy with. But would like to get initial feedback and get agreement on high level approach. |
d259d1f to
c864f50
Compare
bvaradar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@satishkotha : I made an initial pass and have given comments.
Regarding Metadata, Lets,
- Create HoodieReplaceMetadata : You need a json representation (active timeline) and an avro representation (for archived).
- Have HoodieReplaceMetadata extend HoodieCommitMetadata in json structure
- Ensure the avro representation has the same structure
- Add test-cases to ensure you are able to read committed replace metadata.
The above mechanism seems to be the cleanest possible way without boiling the ocean w.r.t commit logic().
- Add test-cases to ensure you are able to read
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java
Outdated
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieReplaceStat.java
Outdated
Show resolved
Hide resolved
...-common/src/main/java/org/apache/hudi/common/table/view/RemoteHoodieTableFileSystemView.java
Outdated
Show resolved
Hide resolved
hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/FileSystemViewHandler.java
Outdated
Show resolved
Hide resolved
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieCommonTestHarness.java
Outdated
Show resolved
Hide resolved
hudi-spark/src/main/java/org/apache/hudi/internal/HoodieDataSourceInternalWriter.java
Outdated
Show resolved
Hide resolved
...meline-service/src/main/java/org/apache/hudi/timeline/service/handlers/FileSliceHandler.java
Outdated
Show resolved
Hide resolved
...-common/src/main/java/org/apache/hudi/common/table/view/SpillableMapBasedFileSystemView.java
Outdated
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/table/view/TableFileSystemView.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of next steps after this PR,
- Support for DeltaStreamer.
- CLI commands to display replace stats.
- Documentation Update for Insert Overwrite.
Please file jira to track. Also add any other things that I have missed.
|
@satishkotha : How is incremental Reading handled.Are we going to support it ? If not, Are you going to throw exceptions ? As we are cleaning up replaced files during archiving only but perform normal cleanup during cleaner operations, there will be cases when file versions of new files getting cleaned up before the old file versions getting removed. |
hudi-client/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java
Outdated
Show resolved
Hide resolved
c864f50 to
1f5f737
Compare
aa5e4b6 to
28aad2c
Compare
f5b19ed to
6bd5bce
Compare
|
@bvaradar I addressed your comments.I also created followup items here https://issues.apache.org/jira/browse/HUDI-868 as you suggested. Please take another look when possible. |
00b151b to
6c97930
Compare
|
@vinothchandar @bvaradar Addressed most of your suggestions. Couple other followup items I need help from you on:
Either way, this is somewhat involved change, so would like to get your feedback before starting implementation. What do you guys think?
Let me know if you guys have any other comments. |
I think the suggestion was to simplify HoodieReplaceMetadata such that it only contains the extra information about replaced file groups. and use the HoodieCommitMetadata and its HoodieWriteStat for tracking the new file groups written. On cleaning vs archival, it would be good if we can implement this in cleaning. But can that be a follow-on item? Practically speaking, typical deployments don't configure cleaning that low. |
6c97930 to
f7e55e9
Compare
|
@vinothchandar As discussed, i added boolean in WriteStatus and removed HoodieReplaceStat. See this diff. I committed it as a separate git-sha because this still looks somewhat awkward IMO. Please take a look and I can revert or reimplement in a different way Also, created https://issues.apache.org/jira/browse/HUDI-1276 for cleaning replaced file during clean. I also renamed 'replace' to 'replacecommit' everywhere as you suggested. Please let me know if you have additional comments/suggestions |
vinothchandar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just getting started
hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
Outdated
Show resolved
Hide resolved
6c6f370 to
067c285
Compare
hudi-client/src/main/java/org/apache/hudi/client/HoodieWriteResult.java
Outdated
Show resolved
Hide resolved
hudi-client/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java
Outdated
Show resolved
Hide resolved
hudi-client/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java
Outdated
Show resolved
Hide resolved
hudi-client/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java
Outdated
Show resolved
Hide resolved
hudi-client/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java
Outdated
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieReplaceCommitMetadata.java
Outdated
Show resolved
Hide resolved
...n/src/main/java/org/apache/hudi/common/table/view/IncrementalTimelineSyncFileSystemView.java
Outdated
Show resolved
Hide resolved
9de32ee to
a546978
Compare
|
@satishkotha : Please ping me in the PR when you have updates and I can give incremental comments if needed. |
@bvaradar Incremental FileSystem resotre is the only big pending item. I'll get to it in later part of this week. |
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java
Outdated
Show resolved
Hide resolved
7f3de1b to
980fed7
Compare
bvaradar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@satishkotha : Please go ahead and add unit-tests.
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java
Outdated
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java
Outdated
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/util/CommitUtils.java
Outdated
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java
Outdated
Show resolved
Hide resolved
hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
Outdated
Show resolved
Hide resolved
hudi-client/src/main/java/org/apache/hudi/client/HoodieWriteClient.java
Outdated
Show resolved
Hide resolved
980fed7 to
c281f2e
Compare
hudi-client/src/main/java/org/apache/hudi/client/HoodieWriteClient.java
Outdated
Show resolved
Hide resolved
|
@satishkotha : when the conflicts are resolved and comments addressed let me know and I will take a final pass to land. |
c281f2e to
d8b2a3e
Compare
…write operation on top of replace action
d8b2a3e to
aeea5b6
Compare
bvaradar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @satishkotha Looks good assuming the follow-up tasks : https://issues.apache.org/jira/browse/HUDI-1042
…write operation on top of replace action (apache#2048)
What is the purpose of the pull request
Add replace a top level action. Implement insert_overwrite operation on top of replace action
Brief change log
There are two other things that needs to be addressed:
Verify this pull request
This change added tests. Verified basic actions using quick start and docker setup. Added tests for insertOverwrite and FileSystemView changes. Need to add additional tests for backward compatibility. Also need to change integration tests to include insertOverwrite operations. Created followup tickets here https://issues.apache.org/jira/browse/HUDI-868
This is an example .hoodie folder from quick start setup:
-rw-r--r-- 1 satishkotha wheel 1933 Aug 27 16:39 20200827163904.commit
-rw-r--r-- 1 satishkotha wheel 0 Aug 27 16:39 20200827163904.commit.requested
-rw-r--r-- 1 satishkotha wheel 1015 Aug 27 16:39 20200827163904.inflight
-rw-r--r-- 1 satishkotha wheel 2610 Aug 27 16:39 20200827163927.replace
-rw-r--r-- 1 satishkotha wheel 1024 Aug 27 16:39 20200827163927.replace.inflight
-rw-r--r-- 1 satishkotha wheel 0 Aug 27 16:39 20200827163927.replace.requested
drwxr-xr-x 2 satishkotha wheel 64 Aug 27 16:39 archived
-rw-r--r-- 1 satishkotha wheel 235 Aug 27 16:39 hoodie.properties
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.