@@ -706,7 +706,20 @@ protected void compactIfNecessary(AbstractHoodieWriteClient writeClient, String
}
}

-  protected void doClean(AbstractHoodieWriteClient writeClient, String instantTime) {
+  protected void cleanIfNecessary(AbstractHoodieWriteClient writeClient, String instantTime) {
+    Option<HoodieInstant> lastCompletedCompactionInstant = metadataMetaClient.reloadActiveTimeline()
+        .getCommitTimeline().filterCompletedInstants().lastInstant();
+    if (lastCompletedCompactionInstant.isPresent()
+        && metadataMetaClient.getActiveTimeline().filterCompletedInstants()
+            .findInstantsAfter(lastCompletedCompactionInstant.get().getTimestamp()).countInstants() < 3) {
Member:
pull the 3 into a final variable?
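A minimal sketch of that suggestion, assuming a hypothetical constant and helper name (the merged code keeps the literal 3 inline):

```java
// Hypothetical constant name for the literal 3; not part of the actual PR.
private static final int MIN_DELTA_COMMITS_AFTER_COMPACTION = 3;

// Hypothetical helper to make the guard self-describing.
private static boolean shouldDelayCleaning(int completedInstantsAfterLastCompaction) {
  // Skip cleaning until enough delta commits have landed after the last compaction.
  return completedInstantsAfterLastCompaction < MIN_DELTA_COMMITS_AFTER_COMPACTION;
}
```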

+      // do not clean the log files immediately after compaction, to give the metadata table reader some buffer time,
Contributor:
This can happen only if the cleaner is very aggressive, right? i.e. it retains just 1 commit. For older file slices we should not encounter this; correct me if I am wrong.

Contributor Author:
Now the default compaction delta commits is 10, the min retain commits is 20, and the max retain commits is 30.

When the second compaction is scheduled and triggered, there are about 22 commits on the timeline, which would then trigger the cleaning immediately.

Contributor:
The min and max commits are for archival; the cleaner is controlled by hoodie.metadata.cleaner.commits.retained, which is 3. But I am good with the change.
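For reference, the four settings discussed above, written out as a runnable properties sketch (the key spellings and defaults are taken from this thread and should be treated as assumptions, not verified against HoodieMetadataConfig):

```java
import java.util.Properties;

public class MetadataTableDefaults {
  public static void main(String[] args) {
    Properties p = new Properties();
    p.setProperty("hoodie.metadata.compact.max.delta.commits", "10"); // compaction cadence
    p.setProperty("hoodie.metadata.keep.min.commits", "20");          // archival lower bound
    p.setProperty("hoodie.metadata.keep.max.commits", "30");          // archival upper bound
    p.setProperty("hoodie.metadata.cleaner.commits.retained", "3");   // cleaner retention
    p.forEach((k, v) -> System.out.println(k + "=" + v));
  }
}
```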

Member:
So this problem should also exist in the MOR table data path? Is there any solution there?

Contributor Author:
I guess so; we need some protection logic for the MOR table log files too, or even for COW table parquet files, because if we write a COW table very frequently (e.g. every 10 seconds), the reader would very probably encounter a FileNotFoundException.

+      // because there are cases where the reader has already prepared its log file readers before the compaction
+      // completes; if the cleaning triggers and deletes those files before or during the log read,
+      // a FileNotFoundException (for LogFormatReader) or an NPE (for HFileReader) would be thrown.

+      // 3 is a value that I think is enough for the metadata table reader.
Member:
What's this based on?

Contributor Author:
Just a subjective value; I think the metadata table reads fast enough within a 3-transaction interval. Do you have a better suggestion?

+      return;
+    }
// Trigger cleaning with suffixes based on the same instant time. This ensures that any future
// delta commits synced over will not have an instant time lesser than the last completed instant on the
// metadata table.
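Putting the interleaved hunk lines back together, the guard added by this PR reads roughly as follows (a reconstruction from the diff fragments above, with the pre-existing tail of the method elided):

```java
protected void cleanIfNecessary(AbstractHoodieWriteClient writeClient, String instantTime) {
  Option<HoodieInstant> lastCompletedCompactionInstant = metadataMetaClient.reloadActiveTimeline()
      .getCommitTimeline().filterCompletedInstants().lastInstant();
  if (lastCompletedCompactionInstant.isPresent()
      && metadataMetaClient.getActiveTimeline().filterCompletedInstants()
          .findInstantsAfter(lastCompletedCompactionInstant.get().getTimestamp()).countInstants() < 3) {
    // Give readers a buffer of 3 delta commits after each compaction before cleaning, so that
    // log files they have already resolved are not deleted out from under them
    // (FileNotFoundException for LogFormatReader, NPE for HFileReader).
    return;
  }
  // Trigger cleaning with suffixes based on the same instant time (pre-existing logic, unchanged).
}
```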
@@ -139,7 +139,7 @@ protected void commit(HoodieData<HoodieRecord> hoodieDataRecords, String partiti
metadataMetaClient.reloadActiveTimeline();
if (canTriggerTableService) {
compactIfNecessary(writeClient, instantTime);
-    doClean(writeClient, instantTime);
+    cleanIfNecessary(writeClient, instantTime);
writeClient.archive();
}
}
@@ -154,7 +154,7 @@ protected void commit(HoodieData<HoodieRecord> hoodieDataRecords, String partiti
metadataMetaClient.reloadActiveTimeline();
Member:
reloadActiveTimeline is called here, so it's not necessary in cleanIfNecessary?

Contributor Author:
compactIfNecessary may trigger a compaction, so the timeline changes in between.
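In other words, the sequence below can append a new compaction instant between the two calls, which is why cleanIfNecessary reloads the timeline itself; a sketch of the ordering, with annotations added here:

```java
metadataMetaClient.reloadActiveTimeline();      // timeline as of this delta commit
if (canTriggerTableService) {
  compactIfNecessary(writeClient, instantTime); // may add a new compaction instant
  cleanIfNecessary(writeClient, instantTime);   // reloads again, so it sees that instant
  writeClient.archive();
}
```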

if (canTriggerTableService) {
compactIfNecessary(writeClient, instantTime);
-    doClean(writeClient, instantTime);
+    cleanIfNecessary(writeClient, instantTime);
writeClient.archive();
}
}