[HUDI-3494] Consider triggering condition of MOR compaction during archival #4974
Why not just throw an exception when ...
I think auto adjustment in the archival process is better than failing the job in this case. Also, in Hudi 1.x we plan to deprecate archival, as the new LSM timeline design can provide fast lookup on a much larger number of commits, so the problem in this PR will not be relevant in Hudi 1.x.
What is the purpose of the pull request
For a Hudi MOR table, the scheduling of compaction is triggered under certain conditions, configured by hoodie.compact.inline.trigger.strategy. The default triggering condition is the number of delta commits, configured by hoodie.compact.inline.max.delta.commits. If this setting is larger than the archival config hoodie.keep.max.commits, there are never enough delta commits in the active timeline and compaction never happens.
To guard against such configs, for a MOR table with a triggering strategy of NUM_COMMITS (trigger compaction when N delta commits are reached) or NUM_AND_TIME (trigger compaction when both NUM_COMMITS and TIME_ELAPSED are satisfied), archival always makes sure, in addition to its other conditions, that there are enough delta commits in the active timeline to trigger compaction scheduling.
Brief change log
HoodieTimelineArchiver: make sure that there are enough delta commits in the active timeline to trigger compaction scheduling when the compaction trigger strategy is NUM_COMMITS or NUM_AND_TIME.
CompactionUtils and ScheduleCompactionActionExecutor: refactor ScheduleCompactionActionExecutor to use the same method for checking the latest completed compaction and delta commits as HoodieTimelineArchiver.
Verify this pull request
This PR adds new tests in TestHoodieTimelineArchiver and TestCompactionUtils for the new logic.
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
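The guard described in the purpose section can be illustrated with a minimal sketch. This is not the code in this PR: the class ArchivalGuardSketch, the method countArchivable, and its parameters are hypothetical, and the model assumes every instant in the active timeline is a delta commit with no completed compaction yet. It only shows the idea that, for the NUM_COMMITS and NUM_AND_TIME strategies, archival retains at least as many instants as the compaction trigger needs.

```java
// Hypothetical, simplified model of the archival guard; not Hudi's actual API.
class ArchivalGuardSketch {

    // Stand-in for the trigger strategies named in the PR description.
    enum TriggerStrategy { NUM_COMMITS, TIME_ELAPSED, NUM_AND_TIME }

    /**
     * Returns how many of the oldest instants may be archived.
     *
     * @param activeInstants      instants currently in the active timeline
     *                            (assumed here to all be delta commits)
     * @param archivalRetainCount instants archival would otherwise leave behind
     * @param maxDeltaCommits     delta commits required to trigger compaction
     * @param strategy            compaction trigger strategy
     */
    static int countArchivable(int activeInstants,
                               int archivalRetainCount,
                               int maxDeltaCommits,
                               TriggerStrategy strategy) {
        int retain = archivalRetainCount;
        // Only count-based strategies depend on delta commits staying active.
        if (strategy == TriggerStrategy.NUM_COMMITS
                || strategy == TriggerStrategy.NUM_AND_TIME) {
            retain = Math.max(retain, maxDeltaCommits);
        }
        // Never report a negative number of archivable instants.
        return Math.max(0, activeInstants - retain);
    }

    public static void main(String[] args) {
        // Trigger needs 50 delta commits but only 40 exist: archive nothing.
        System.out.println(countArchivable(40, 20, 50, TriggerStrategy.NUM_COMMITS));
        // Time-based strategy: the usual archival retention applies.
        System.out.println(countArchivable(40, 20, 50, TriggerStrategy.TIME_ELAPSED));
    }
}
```

In this model the misconfigured case (a trigger count above the archival retention) leads archival to hold back instants rather than fail the job, which matches the auto-adjustment approach preferred in the review discussion.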