@yihua
Contributor

@yihua yihua commented Mar 7, 2022

What is the purpose of the pull request

For a Hudi MOR table, compaction scheduling is triggered under conditions configured by hoodie.compact.inline.trigger.strategy. The default trigger is the number of delta commits, set by hoodie.compact.inline.max.delta.commits. If this value is larger than the archival config hoodie.keep.max.commits, there are never enough delta commits in the active timeline, so compaction will never happen.

To guard against such configs, for MOR tables with a trigger strategy of NUM_COMMITS (trigger compaction after N delta commits) or NUM_AND_TIME (trigger compaction when both the NUM_COMMITS and TIME_ELAPSED conditions are satisfied), archival now always makes sure that enough delta commits remain in the active timeline to trigger compaction scheduling, in addition to its other conditions.
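The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code in HoodieTimelineArchiver: the class ArchivalGuardSketch and its methods are hypothetical, the timeline is modeled as a plain list of action names (oldest first), and a "commit" entry is assumed to stand for a completed compaction.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names and the list-based timeline model are assumptions.
final class ArchivalGuardSketch {
    static final String DELTA_COMMIT = "deltacommit";
    static final String COMMIT = "commit"; // assumed to mark a completed compaction

    // Index of the oldest instant that must stay so compaction can still be
    // triggered, or actions.size() if there are no delta commits to protect.
    static int oldestIndexToRetain(List<String> actions, int maxDeltaCommits) {
        int lastCompaction = actions.lastIndexOf(COMMIT); // -1 if none
        int oldest = actions.size();
        int seen = 0;
        // Walk from the newest instant back to just after the last compaction,
        // protecting at most maxDeltaCommits delta commits.
        for (int i = actions.size() - 1; i > lastCompaction && seen < maxDeltaCommits; i--) {
            if (DELTA_COMMIT.equals(actions.get(i))) {
                oldest = i;
                seen++;
            }
        }
        return oldest;
    }

    // Number of oldest instants that may be archived.
    static int archivableCount(List<String> actions, int minToKeep, int maxToKeep,
                               int maxDeltaCommits) {
        if (actions.size() <= maxToKeep) {
            return 0; // archival is not triggered yet
        }
        int byRetention = actions.size() - minToKeep;
        int byCompaction = oldestIndexToRetain(actions, maxDeltaCommits);
        return Math.min(byRetention, byCompaction);
    }

    public static void main(String[] args) {
        List<String> timeline = Arrays.asList(
            COMMIT, DELTA_COMMIT, DELTA_COMMIT, DELTA_COMMIT,
            DELTA_COMMIT, DELTA_COMMIT, DELTA_COMMIT);
        // Compaction needs 10 delta commits, archival keeps 2 to 4 instants:
        // all six delta commits are protected.
        System.out.println(archivableCount(timeline, 2, 4, 10)); // 1
        // Compaction needs only 3 delta commits: the latest 3 are protected.
        System.out.println(archivableCount(timeline, 2, 4, 3)); // 4
    }
}
```

In the first call, the plain min/max retention would archive five instants and leave compaction permanently short of its threshold; with the guard, only the instant before the first protected delta commit is archived.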

Brief change log

  • Add new logic in HoodieTimelineArchiver to make sure that there are enough delta commits in the active timeline to trigger compaction scheduling, when the trigger strategy of compaction is NUM_COMMITS or NUM_AND_TIME.
  • Add util methods in CompactionUtils and refactor ScheduleCompactionActionExecutor to use the same method as HoodieTimelineArchiver for finding the latest completed compaction and the delta commits after it.
  • Fix an issue in checking delta commits in the active timeline when the timeline does not have any delta commits.
  • Add new tests for the new logic.
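The shared check mentioned in the change log could look something like the sketch below. It is an illustration only: CompactionTriggerSketch, its methods, and the list-based timeline are hypothetical and do not reproduce the CompactionUtils API. The enum lists only the strategies named in this PR, and the ">=" threshold comparison is an assumption. The empty Optional models the case of a timeline without any delta commits.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; not the actual CompactionUtils code.
final class CompactionTriggerSketch {
    enum Strategy { NUM_COMMITS, TIME_ELAPSED, NUM_AND_TIME }

    // Count of delta commits after the latest completed compaction
    // ("commit" is assumed to mark one). Empty if there are none.
    static Optional<Integer> deltaCommitsSinceLatestCompaction(List<String> actions) {
        int lastCompaction = actions.lastIndexOf("commit");
        int count = 0;
        for (int i = lastCompaction + 1; i < actions.size(); i++) {
            if ("deltacommit".equals(actions.get(i))) {
                count++;
            }
        }
        return count == 0 ? Optional.empty() : Optional.of(count);
    }

    // Whether archival has to keep delta commits around for this strategy.
    static boolean archivalMustRetainDeltaCommits(Strategy strategy) {
        return strategy == Strategy.NUM_COMMITS || strategy == Strategy.NUM_AND_TIME;
    }

    static boolean shouldSchedule(Strategy strategy, List<String> actions,
                                  int maxDeltaCommits, long elapsedSeconds,
                                  long maxElapsedSeconds) {
        boolean enoughCommits = deltaCommitsSinceLatestCompaction(actions)
            .map(n -> n >= maxDeltaCommits)
            .orElse(false); // no delta commits: nothing to compact
        boolean enoughTime = elapsedSeconds >= maxElapsedSeconds;
        switch (strategy) {
            case NUM_COMMITS:
                return enoughCommits;
            case TIME_ELAPSED:
                return enoughTime;
            case NUM_AND_TIME:
                return enoughCommits && enoughTime;
            default:
                throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
    }

    public static void main(String[] args) {
        List<String> timeline = Arrays.asList("commit", "deltacommit", "deltacommit");
        System.out.println(shouldSchedule(Strategy.NUM_COMMITS, timeline, 2, 0L, 3600L));
        System.out.println(shouldSchedule(Strategy.NUM_AND_TIME, timeline, 2, 0L, 3600L));
        System.out.println(deltaCommitsSinceLatestCompaction(Collections.emptyList()));
    }
}
```

Having the archiver and the scheduler read the delta commit count through one helper keeps the two decisions from drifting apart.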

Verify this pull request

This PR adds new tests in TestHoodieTimelineArchiver and TestCompactionUtils for the new logic.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@nsivabalan nsivabalan self-assigned this Mar 8, 2022
@nsivabalan nsivabalan added the priority:critical Production degraded; pipelines stalled label Mar 8, 2022
@yihua yihua changed the title from "[HUDI-3449] Consider triggering condition of MOR compaction during archival" to "[HUDI-3494] Consider triggering condition of MOR compaction during archival" Mar 8, 2022
@apache apache deleted a comment from hudi-bot Mar 8, 2022
@nsivabalan nsivabalan added priority:blocker Production down; release blocker and removed priority:critical Production degraded; pipelines stalled labels Mar 8, 2022
@yihua yihua force-pushed the HUDI-3449-mor-compaction-archival branch from 8f7ed79 to 681207d Compare March 17, 2022 00:29
@nsivabalan nsivabalan merged commit 5ba2d9a into apache:master Mar 17, 2022
vingov pushed a commit to vingov/hudi that referenced this pull request Apr 3, 2022
stayrascal pushed a commit to stayrascal/hudi that referenced this pull request Apr 12, 2022
@Zouxxyy
Contributor

Zouxxyy commented Aug 10, 2023

The default triggering condition is the number of delta commits, with the config of hoodie.compact.inline.max.delta.commits. If this setting is larger than the archival config of hoodie.keep.max.commits, there is not enough delta commits in the active timeline and the compaction will never happen.

why not just throw exception when hoodie.compact.inline.max.delta.commits > hoodie.keep.max.commits

@yihua
Contributor Author

yihua commented Sep 24, 2023

The default triggering condition is the number of delta commits, with the config of hoodie.compact.inline.max.delta.commits. If this setting is larger than the archival config of hoodie.keep.max.commits, there is not enough delta commits in the active timeline and the compaction will never happen.

why not just throw exception when hoodie.compact.inline.max.delta.commits > hoodie.keep.max.commits

I think auto-adjustment in the archival process is better than failing the job in this case. Also, in Hudi 1.x we plan to deprecate archival, as the new LSM timeline design can provide fast lookup on a much larger number of commits, so the problem addressed in this PR will not be relevant in Hudi 1.x.
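For comparison, the two options discussed in this thread can be sketched side by side. This is only an illustration under simplifying assumptions: ConfigCheckSketch and both methods are hypothetical, and the auto-adjusting variant is reduced to a lower bound on retained delta commits, whereas the merged change works on the delta commits since the latest compaction.

```java
// Hypothetical sketch contrasting fail-fast validation with auto-adjustment.
final class ConfigCheckSketch {
    // Fail-fast option: reject the config combination up front.
    static void validateOrThrow(int maxDeltaCommits, int keepMaxCommits) {
        if (maxDeltaCommits > keepMaxCommits) {
            throw new IllegalArgumentException(
                "hoodie.compact.inline.max.delta.commits (" + maxDeltaCommits
                    + ") exceeds hoodie.keep.max.commits (" + keepMaxCommits + ")");
        }
    }

    // Auto-adjusting option (simplified): archival retains at least as many
    // delta commits as compaction needs, regardless of the archival minimum.
    static int deltaCommitsToRetain(int maxDeltaCommits, int keepMinCommits) {
        return Math.max(maxDeltaCommits, keepMinCommits);
    }

    public static void main(String[] args) {
        System.out.println(deltaCommitsToRetain(30, 20));
        try {
            validateOrThrow(30, 20);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The fail-fast variant stops a running pipeline on a config mismatch, while the auto-adjusting variant keeps the job running and retains a few more instants in the active timeline.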
