-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-1072] Add replace metadata file to timeline #1853
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. The ASF licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
|
|
||
| /* | ||
| * Note that all 'replace' instants are read for every query | ||
| * So it is important to keep this small. Please be careful | ||
| * before tracking additional information in this file. | ||
| * This will be used for 'insert_overwrite' (RFC-18) and also 'clustering' (RFC-19) | ||
| */ | ||
| {"namespace": "org.apache.hudi.avro.model", | ||
| "type": "record", | ||
| "name": "HoodieReplaceMetadata", | ||
| "fields": [ | ||
| {"name": "totalFilesReplaced", "type": "int"}, | ||
| {"name": "command", "type": "string"}, | ||
| {"name": "partitionMetadata", "type": { | ||
| "type" : "map", "values" : { | ||
| "type": "array", | ||
| "items": "string" | ||
| } | ||
| } | ||
| }, | ||
| { | ||
| "name":"version", | ||
| "type":["int", "null"], | ||
| "default": 1 | ||
| } | ||
| ] | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -65,7 +65,8 @@ public class HoodieActiveTimeline extends HoodieDefaultTimeline { | |
| COMMIT_EXTENSION, INFLIGHT_COMMIT_EXTENSION, REQUESTED_COMMIT_EXTENSION, DELTA_COMMIT_EXTENSION, | ||
| INFLIGHT_DELTA_COMMIT_EXTENSION, REQUESTED_DELTA_COMMIT_EXTENSION, SAVEPOINT_EXTENSION, | ||
| INFLIGHT_SAVEPOINT_EXTENSION, CLEAN_EXTENSION, REQUESTED_CLEAN_EXTENSION, INFLIGHT_CLEAN_EXTENSION, | ||
| INFLIGHT_COMPACTION_EXTENSION, REQUESTED_COMPACTION_EXTENSION, INFLIGHT_RESTORE_EXTENSION, RESTORE_EXTENSION)); | ||
| INFLIGHT_COMPACTION_EXTENSION, REQUESTED_COMPACTION_EXTENSION, INFLIGHT_RESTORE_EXTENSION, RESTORE_EXTENSION, | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since the intention of this PR is to simply introduce a new action, do we need to change the VALID_ACTION in this PR ? Do the other methods you implemented in the ActiveTimeline require this ? I'm wondering if we can avoid this change in case we want to land this PR incrementally
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added this for tests. But can split this and related test out into a different review
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Its better to avoid this for rollout purpose. In case, this PR gets landed before the next and a release cut, then we need to worry about ordering of rollout between readers and writers. |
||
| INFLIGHT_REPLACE_EXTENSION, REPLACE_EXTENSION)); | ||
|
|
||
| private static final Logger LOG = LogManager.getLogger(HoodieActiveTimeline.class); | ||
| protected HoodieTableMetaClient metaClient; | ||
|
|
@@ -304,6 +305,22 @@ public HoodieInstant transitionCleanRequestedToInflight(HoodieInstant requestedI | |
| return inflight; | ||
| } | ||
|
|
||
| /** | ||
| * Transition Clean State from inflight to Committed. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. s/Clean/Replace
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. will update. thanks |
||
| * | ||
| * @param inflightInstant Inflight instant | ||
| * @param data Extra Metadata | ||
| * @return commit instant | ||
| */ | ||
| public HoodieInstant transitionReplaceInflightToComplete(HoodieInstant inflightInstant, Option<byte[]> data) { | ||
| ValidationUtils.checkArgument(inflightInstant.getAction().equals(HoodieTimeline.REPLACE_ACTION)); | ||
| ValidationUtils.checkArgument(inflightInstant.isInflight()); | ||
| HoodieInstant commitInstant = new HoodieInstant(State.COMPLETED, REPLACE_ACTION, inflightInstant.getTimestamp()); | ||
| // Then write to timeline | ||
| transitionState(inflightInstant, commitInstant, data); | ||
| return commitInstant; | ||
| } | ||
|
|
||
| private void transitionState(HoodieInstant fromInstant, HoodieInstant toInstant, Option<byte[]> data) { | ||
| transitionState(fromInstant, toInstant, data, false); | ||
| } | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -48,6 +48,7 @@ public interface HoodieTimeline extends Serializable { | |
| String CLEAN_ACTION = "clean"; | ||
| String ROLLBACK_ACTION = "rollback"; | ||
| String SAVEPOINT_ACTION = "savepoint"; | ||
| String REPLACE_ACTION = "replace"; | ||
| String INFLIGHT_EXTENSION = ".inflight"; | ||
| // With Async Compaction, compaction instant can be in 3 states : | ||
| // (compaction-requested), (compaction-inflight), (completed) | ||
|
|
@@ -57,7 +58,7 @@ public interface HoodieTimeline extends Serializable { | |
|
|
||
| String[] VALID_ACTIONS_IN_TIMELINE = {COMMIT_ACTION, DELTA_COMMIT_ACTION, | ||
| CLEAN_ACTION, SAVEPOINT_ACTION, RESTORE_ACTION, ROLLBACK_ACTION, | ||
| COMPACTION_ACTION}; | ||
| COMPACTION_ACTION, REPLACE_ACTION}; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same comment as above |
||
|
|
||
| String COMMIT_EXTENSION = "." + COMMIT_ACTION; | ||
| String DELTA_COMMIT_EXTENSION = "." + DELTA_COMMIT_ACTION; | ||
|
|
@@ -78,6 +79,8 @@ public interface HoodieTimeline extends Serializable { | |
| String INFLIGHT_COMPACTION_EXTENSION = StringUtils.join(".", COMPACTION_ACTION, INFLIGHT_EXTENSION); | ||
| String INFLIGHT_RESTORE_EXTENSION = "." + RESTORE_ACTION + INFLIGHT_EXTENSION; | ||
| String RESTORE_EXTENSION = "." + RESTORE_ACTION; | ||
| String INFLIGHT_REPLACE_EXTENSION = "." + REPLACE_ACTION + INFLIGHT_EXTENSION; | ||
| String REPLACE_EXTENSION = "." + REPLACE_ACTION; | ||
|
|
||
| String INVALID_INSTANT_TS = "0"; | ||
|
|
||
|
|
@@ -126,6 +129,13 @@ public interface HoodieTimeline extends Serializable { | |
| */ | ||
| HoodieTimeline getCommitsAndCompactionTimeline(); | ||
|
|
||
| /** | ||
| * Timeline to just include replace instants that have valid (commit/deltacommit) actions. | ||
| * | ||
| * @return | ||
| */ | ||
| HoodieTimeline getCompletedAndReplaceTimeline(); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. May be you have used this API in a different PR, but where exactly are we going to use this specific method ?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, this is used to build filesystem view in #1859, can move this to that PR
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does the return timeline contains only replace timeline ? The naming is confusing to me. How about getValidReplaceTimeline or anything else to reflect the intention |
||
|
|
||
| /** | ||
| * Filter this timeline to just include requested and inflight compaction instants. | ||
| * | ||
|
|
@@ -348,6 +358,14 @@ static String makeInflightRestoreFileName(String instant) { | |
| return StringUtils.join(instant, HoodieTimeline.INFLIGHT_RESTORE_EXTENSION); | ||
| } | ||
|
|
||
| static String makeReplaceFileName(String instant) { | ||
| return StringUtils.join(instant, HoodieTimeline.REPLACE_EXTENSION); | ||
| } | ||
|
|
||
| static String makeInflightReplaceFileName(String instant) { | ||
| return StringUtils.join(instant, HoodieTimeline.INFLIGHT_REPLACE_EXTENSION); | ||
| } | ||
|
|
||
| static String makeDeltaFileName(String instantTime) { | ||
| return instantTime + HoodieTimeline.DELTA_COMMIT_EXTENSION; | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the partitionMetadata contain partitionPath -> new file groups ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be obtained from related commit file. Every t.replace has a corresponding t.[delta]commit (we decided to write two files to reduce overhead of reading all commit files in the query path)
Plus, it helps on query side to keep replace metadata files small. I also didn't want to repeat same information in two places.
Let me know if you think its important to store new file group info here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High level Question : To make sure we are all on the same page : Is this metadata enough to achieve clustering ? Do you foresee any changes that needs to happen to this metadata to support clustering ? The PR mentions that this is for both clustering and overwrite. Hence, asking this question.