@zhuanshenbsj1 zhuanshenbsj1 commented Jun 22, 2023

Change Logs

Illustration of issue before fix

Incremental cleaning has a sliding window that bounds the partitions it cleans, as shown below. If there is an async table service action (see the illustration below), a partition might be skipped if that action falls behind the cleaning window.

image

Note: The illustration might not be entirely correct in how it determines the earliest commit to retain, but it does give a general picture of the sliding window.

Fix

In #7568, clean will be blocked by any pending action. As such, by factoring COMPACTION actions into the active timeline, the sliding window of the incremental range is bounded correctly, so that it does not move ahead of any pending writes made via compaction.

image
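
To make the bounding concrete, here is a minimal, self-contained sketch of the idea. It is not Hudi's `CleanerUtils` code: the `Instant` class, `earliestToRetain`, the action strings, and the sample timeline are all hypothetical stand-ins, and the retention rule is a simplified "keep latest N commits" model. It only illustrates how a pending compaction pulls the window back once compaction actions are visible to the cleaner.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

class CleanWindowSketch {
    // Hypothetical, simplified stand-in for a timeline instant (not Hudi's HoodieInstant).
    static final class Instant {
        final String time;
        final String action;
        final boolean completed;

        Instant(String time, String action, boolean completed) {
            this.time = time;
            this.action = action;
            this.completed = completed;
        }
    }

    // Assumed sample timeline: compaction 004 is still pending.
    static List<Instant> sampleTimeline() {
        return Arrays.asList(
            new Instant("001", "commit", true),
            new Instant("003", "commit", true),
            new Instant("004", "compaction", false),
            new Instant("005", "commit", true),
            new Instant("007", "commit", true),
            new Instant("009", "commit", true),
            new Instant("011", "commit", true));
    }

    // Earliest instant time to retain, given which actions the cleaner can see.
    static Optional<String> earliestToRetain(List<Instant> timeline,
                                             Set<String> visibleActions,
                                             int commitsRetained) {
        List<Instant> visible = timeline.stream()
            .filter(i -> visibleActions.contains(i.action))
            .sorted((a, b) -> a.time.compareTo(b.time))
            .collect(Collectors.toList());
        List<Instant> completed = visible.stream()
            .filter(i -> i.completed)
            .collect(Collectors.toList());
        if (completed.size() <= commitsRetained) {
            return Optional.empty(); // nothing is old enough to clean
        }
        String candidate = completed.get(completed.size() - commitsRetained).time;
        Optional<String> earliestPending = visible.stream()
            .filter(i -> !i.completed)
            .map(i -> i.time)
            .findFirst();
        if (earliestPending.isPresent() && candidate.compareTo(earliestPending.get()) > 0) {
            // Pull the window back to the last completed instant before the pending one.
            String pending = earliestPending.get();
            return completed.stream()
                .map(i -> i.time)
                .filter(t -> t.compareTo(pending) < 0)
                .reduce((first, second) -> second);
        }
        return Optional.of(candidate);
    }

    public static void main(String[] args) {
        Set<String> commitsOnly =
            new HashSet<>(Arrays.asList("commit", "deltacommit", "replacecommit"));
        Set<String> withCompaction = new HashSet<>(commitsOnly);
        withCompaction.add("compaction");
        // Without compaction in the action set the pending 004 is invisible.
        System.out.println(earliestToRetain(sampleTimeline(), commitsOnly, 2));
        // With compaction included the window is held back behind 004.
        System.out.println(earliestToRetain(sampleTimeline(), withCompaction, 2));
    }
}
```

In this model the first call yields 009 and the second yields 003, i.e. the window stops short of the pending compaction only when compaction is part of the timeline being inspected.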

Impact

Describe any public API or user-facing feature change or any performance impact.

Risk level (write none, low, medium or high below)

none

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

  public Option<HoodieInstant> getEarliestCommitToRetain() {
    return CleanerUtils.getEarliestCommitToRetain(
-       hoodieTable.getMetaClient().getActiveTimeline().getCommitsTimeline(),
+       hoodieTable.getMetaClient().getActiveTimeline().getCommitsAndMergesTimeline(),
Contributor Author

Previously, inflight compaction instants were not considered. With time-based partitioning, if the inflight compaction instant belongs to an earlier partition and that partition is skipped by incremental cleaning, its log files are never cleaned up.
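
A rough sketch of how the skip can happen, under a deliberately simplified model: assume an incremental clean only scans partitions written by instants at or after the earliest commit to retain recorded by the previous clean. The class, the method names, the instant times, and the partition paths below are all hypothetical and are not taken from the Hudi code base.

```java
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

class IncrementalCleanSketch {
    // Hypothetical model: completed instant time -> partition that instant wrote to.
    static SortedMap<String, String> completedWrites() {
        SortedMap<String, String> writes = new TreeMap<>();
        writes.put("003", "2023/06/21");
        writes.put("004", "2023/06/21"); // compaction that was inflight during the previous clean
        writes.put("005", "2023/06/22");
        writes.put("007", "2023/06/22");
        return writes;
    }

    // Partitions scanned by an incremental clean in this model: only those written
    // at or after the previous clean's earliest commit to retain (inclusive).
    static Set<String> partitionsToScan(SortedMap<String, String> writes,
                                        String previousEarliestToRetain) {
        return new TreeSet<>(writes.tailMap(previousEarliestToRetain).values());
    }

    public static void main(String[] args) {
        // Window already moved past the inflight compaction: the old partition is never scanned.
        System.out.println(partitionsToScan(completedWrites(), "005"));
        // Window held back behind the compaction: the old partition is still scanned.
        System.out.println(partitionsToScan(completedWrites(), "003"));
    }
}
```

In the first call only 2023/06/22 is scanned, so the files that compaction 004 later produces in 2023/06/21 are missed; in the second call both partitions are scanned.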

Contributor

Nice catch ~

@danny0405 danny0405 added the area:table-service Table services label Jun 26, 2023
*/
public HoodieTimeline getCommitsAndMergesTimeline() {
  return getTimelineOfActions(CollectionUtils.createSet(COMMIT_ACTION, DELTA_COMMIT_ACTION, REPLACE_COMMIT_ACTION, COMPACTION_ACTION));
}
Contributor

@danny0405 danny0405 Jun 26, 2023

getCommitsAndMergesTimeline -> getCommitsAndCompactionTimeline

Can we also add a test case for this incremental cleaning scenario, where the partition path is switched and the files in the old partition cannot be cleaned?

Contributor Author

done

@zhuanshenbsj1 zhuanshenbsj1 force-pushed the HUDI-6423 branch 2 times, most recently from a65a29c to 34f8823 Compare June 29, 2023 06:19
@zhuanshenbsj1
Contributor Author

@hudi-bot run azure

@danny0405
Contributor

@hudi-bot run azure

@danny0405 danny0405 added release-0.14.0 priority:blocker Production down; release blocker labels Jul 4, 2023
Contributor

@danny0405 danny0405 left a comment

+1, cc @yihua for visibility ~

HoodieTableMetadataWriter metadataWriter = SparkHoodieBackedTableMetadataWriter.create(hadoopConf, config, context);

final String partition = "2016/03/15";
String timePrefix = "00000000000";
Contributor Author

@zhuanshenbsj1 zhuanshenbsj1 Jul 4, 2023

Before the adjustment, the clean instant lands right after instant 000 in the timeline. This is unreasonable: it should be at the end.

before adjustment:
localTimeline=[
  [000__commit__COMPLETED__20230704200732288],
  [000000001__clean__COMPLETED__20230704200742843],
  [001__commit__COMPLETED__20230704200733129],
  [003__commit__COMPLETED__20230704200734117],
  [==>004__compaction__REQUESTED__20230704200734125],
  [005__commit__COMPLETED__20230704200734948],
  [0055__commit__COMPLETED__20230704200735880],
  [==>006__compaction__REQUESTED__20230704200735885],
  [007__commit__COMPLETED__20230704200736807],
  [0075__commit__COMPLETED__20230704200737690],
  [==>008__compaction__REQUESTED__20230704200737694],
  [009__commit__COMPLETED__20230704200738629],
  [0095__commit__COMPLETED__20230704200739576],
  [==>010__compaction__REQUESTED__20230704200739580],
  [011__commit__COMPLETED__20230704200740426],
  [013__commit__COMPLETED__20230704200741363]

after adjustment:
localTimeline=[
  [00000000000000__commit__COMPLETED__20230704200400940],
  [00000000000001__commit__COMPLETED__20230704200401790],
  [00000000000003__commit__COMPLETED__20230704200402888],
  [==>00000000000004__compaction__REQUESTED__20230704200402896],
  [00000000000005__commit__COMPLETED__20230704200403841],
  [000000000000055__commit__COMPLETED__20230704200404879],
  [==>00000000000006__compaction__REQUESTED__20230704200404883],
  [00000000000007__commit__COMPLETED__20230704200405861],
  [000000000000075__commit__COMPLETED__20230704200406790],
  [==>00000000000008__compaction__REQUESTED__20230704200406797],
  [00000000000009__commit__COMPLETED__20230704200407808],
  [000000000000095__commit__COMPLETED__20230704200408834],
  [==>00000000000010__compaction__REQUESTED__20230704200408839],
  [00000000000011__commit__COMPLETED__20230704200410653],
  [00000000000013__commit__COMPLETED__20230704200411934],
  [00000000000014__clean__COMPLETED__20230704200413695]]
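
The two listings are consistent with instants being ordered by plain string comparison of their timestamps, which is an assumption here and not something stated in this thread. Under that assumption, a small sketch (class and method names are hypothetical) shows why a nine-character clean timestamp lands between the three-character ones, and why padding every timestamp with the same prefix restores the intended order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class InstantOrderingSketch {
    // Sorts instant times the way a plain String comparison would (lexicographically).
    static List<String> sortInstantTimes(List<String> times) {
        List<String> sorted = new ArrayList<>(times);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // Mixed lengths: "000000001" sorts after "000" but before "001".
        System.out.println(sortInstantTimes(
            Arrays.asList("000", "001", "003", "000000001")));
        // Same prefix on every timestamp: the clean instant stays last.
        System.out.println(sortInstantTimes(Arrays.asList(
            "00000000000000", "00000000000001", "00000000000003", "00000000000014")));
    }
}
```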

@zhuanshenbsj1
Contributor Author

@hudi-bot run azure

@zhuanshenbsj1 zhuanshenbsj1 force-pushed the HUDI-6423 branch 2 times, most recently from 5b354dd to 68d67e5 Compare July 4, 2023 15:12
@hudi-bot
Collaborator

hudi-bot commented Jul 4, 2023

CI report:


@voonhous
Member

Description and context please...

Illustration of issue before fix

Incremental cleaning has a sliding window that bounds the partitions it cleans, as shown below. If there is an async table service action (see the illustration below), a partition might be skipped if that action falls behind the cleaning window.

image

Note: The illustration might not be entirely correct in how it determines the earliest commit to retain, but it does give a general picture of the sliding window.

Fix

In #7568, clean will be blocked by any pending action. As such, by factoring COMPACTION actions into the active timeline, the sliding window of the incremental range is bounded correctly, so that it does not move ahead of any pending writes made via compaction.

image

@zhuanshenbsj1
Contributor Author

Thanks for describing the scenario~~ I have added it to this PR's change logs. @voonhous


Labels

area:table-service Table services priority:blocker Production down; release blocker release-0.14.0

Projects

Status: ✅ Done


5 participants