Merged
Changes from 6 commits
@@ -48,10 +48,10 @@ public class ClusteringPlanActionExecutor<T extends HoodieRecordPayload, I, K, O
private final Option<Map<String, String>> extraMetadata;

public ClusteringPlanActionExecutor(HoodieEngineContext context,
-HoodieWriteConfig config,
-HoodieTable<T, I, K, O> table,
-String instantTime,
-Option<Map<String, String>> extraMetadata) {
+HoodieWriteConfig config,
+HoodieTable<T, I, K, O> table,
+String instantTime,
+Option<Map<String, String>> extraMetadata) {
super(context, config, table, instantTime);
this.extraMetadata = extraMetadata;
}
@@ -63,6 +63,7 @@ protected Option<HoodieClusteringPlan> createClusteringPlan() {
int commitsSinceLastClustering = table.getActiveTimeline().getCommitsTimeline().filterCompletedInstants()
.findInstantsAfter(lastClusteringInstant.map(HoodieInstant::getTimestamp).orElse("0"), Integer.MAX_VALUE)
.countInstants();

if (config.inlineClusteringEnabled() && config.getInlineClusterMaxCommits() > commitsSinceLastClustering) {
Contributor

This and the condition below guarantee that the clustering is only scheduled based on the max_commits config. @eric9204 could you double check the logic?

Contributor Author

@yihua Yes, this is indeed a redundant check; I'm testing whether this condition is needed.

With these two conditions in place, it is guaranteed that only one clustering runs at a time: while the previous clustering has not completed, no new clustering plan is generated.

With only these three parameters configured (async clustering disabled):

'clustering.schedule.enabled'='true',
'clustering.async.enabled'='false',
'clustering.delta_commits'='6',

    0 2022-09-02 10:38 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/.aux
    0 2022-09-02 10:38 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/.schema
    0 2022-09-02 10:41 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/.temp
2.6 K 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103807454.commit
    0 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103807454.commit.requested
    0 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103807454.inflight
2.6 K 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103813399.commit
    0 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103813399.commit.requested
    0 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103813399.inflight
2.6 K 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103823232.commit
    0 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103823232.commit.requested
    0 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103823232.inflight
2.6 K 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103833587.commit
    0 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103833587.commit.requested
    0 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103833587.inflight
2.6 K 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103842538.commit
    0 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103842538.commit.requested
    0 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103842538.inflight
2.6 K 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103856152.commit
    0 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103856152.commit.requested
    0 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103856152.inflight
3.8 K 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103902693.replacecommit.requested
2.6 K 2022-09-02 10:40 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103902852.commit
    0 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103902852.commit.requested
    0 2022-09-02 10:39 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103902852.inflight
2.6 K 2022-09-02 10:40 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103912533.commit
    0 2022-09-02 10:40 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103912533.commit.requested
    0 2022-09-02 10:40 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103912533.inflight
2.6 K 2022-09-02 10:40 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103922361.commit
    0 2022-09-02 10:40 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103922361.commit.requested
    0 2022-09-02 10:40 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103922361.inflight
2.6 K 2022-09-02 10:40 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103932640.commit
    0 2022-09-02 10:40 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103932640.commit.requested
    0 2022-09-02 10:40 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103932640.inflight
2.6 K 2022-09-02 10:40 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103943515.commit
    0 2022-09-02 10:40 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103943515.commit.requested
    0 2022-09-02 10:40 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103943515.inflight
2.6 K 2022-09-02 10:40 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103952954.commit
    0 2022-09-02 10:40 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103952954.commit.requested
    0 2022-09-02 10:40 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902103952954.inflight
2.6 K 2022-09-02 10:41 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902104004110.commit
    0 2022-09-02 10:40 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902104004110.commit.requested
    0 2022-09-02 10:40 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902104004110.inflight
2.6 K 2022-09-02 10:41 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902104013404.commit
    0 2022-09-02 10:41 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902104013404.commit.requested
    0 2022-09-02 10:41 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902104013404.inflight
2.6 K 2022-09-02 10:41 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902104022645.commit
    0 2022-09-02 10:41 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902104022645.commit.requested
    0 2022-09-02 10:41 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902104022645.inflight
2.6 K 2022-09-02 10:41 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902104032517.commit
    0 2022-09-02 10:41 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902104032517.commit.requested
    0 2022-09-02 10:41 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902104032517.inflight
2.6 K 2022-09-02 10:41 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902104042885.commit
    0 2022-09-02 10:41 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902104042885.commit.requested
    0 2022-09-02 10:41 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902104042885.inflight
2.6 K 2022-09-02 10:41 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902104052515.commit
    0 2022-09-02 10:41 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902104052515.commit.requested
    0 2022-09-02 10:41 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902104052515.inflight
    0 2022-09-02 10:41 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902104105238.commit.requested
    0 2022-09-02 10:41 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/20220902104105238.inflight
    0 2022-09-02 10:38 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/archived
 1005 2022-09-02 10:38 /tmp/hudi/insert_cow_clustering_12_state_1/.hoodie/hoodie.properties
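The listing above can be read mechanically: going by the file names shown, a clustering plan is still pending when its `<ts>.replacecommit.requested` (or `.inflight`) file exists without a completed `<ts>.replacecommit` file. The sketch below applies that rule to a subset of the state_1 files; `PendingClusteringCheck` and `pendingReplaceCommits` are hypothetical names for illustration, not Hudi APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not a Hudi API) that reads a .hoodie directory listing. */
public class PendingClusteringCheck {

  /**
   * Returns the instant times of clustering plans that have a
   * "<ts>.replacecommit.requested" or "<ts>.replacecommit.inflight" file
   * but no completed "<ts>.replacecommit" file, in time order.
   */
  static List<String> pendingReplaceCommits(List<String> fileNames) {
    Map<String, Boolean> completed = new TreeMap<>(); // instant time -> completed?
    for (String name : fileNames) {
      int dot = name.indexOf('.');
      if (dot <= 0) {
        continue; // ".aux", ".temp", "archived", ... carry no instant time
      }
      String instant = name.substring(0, dot);
      String suffix = name.substring(dot + 1);
      if (suffix.equals("replacecommit")) {
        completed.put(instant, true);
      } else if (suffix.startsWith("replacecommit.")) {
        completed.putIfAbsent(instant, false);
      }
    }
    List<String> pending = new ArrayList<>();
    for (Map.Entry<String, Boolean> e : completed.entrySet()) {
      if (!e.getValue()) {
        pending.add(e.getKey());
      }
    }
    return pending;
  }

  public static void main(String[] args) {
    // File names taken from the state_1 listing above (subset).
    List<String> state1 = List.of(
        "20220902103856152.commit",
        "20220902103902693.replacecommit.requested",
        "20220902103902852.commit",
        "20220902104105238.commit.requested",
        "20220902104105238.inflight",
        "hoodie.properties");
    System.out.println(pendingReplaceCommits(state1)); // [20220902103902693]
  }
}
```

With async clustering disabled, the single plan 20220902103902693 stays pending, and no further replacecommit.requested files appear in the listing after it.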

With only these three parameters configured (async clustering enabled):

'clustering.schedule.enabled'='true',
'clustering.async.enabled'='true',
'clustering.delta_commits'='6',

    0 2022-09-02 10:30 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/.aux
    0 2022-09-02 10:30 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/.schema
    0 2022-09-02 10:37 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/.temp
2.6 K 2022-09-02 10:31 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103029577.commit
    0 2022-09-02 10:31 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103029577.commit.requested
    0 2022-09-02 10:31 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103029577.inflight
2.6 K 2022-09-02 10:31 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103038501.commit
    0 2022-09-02 10:31 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103038501.commit.requested
    0 2022-09-02 10:31 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103038501.inflight
2.6 K 2022-09-02 10:31 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103046820.commit
    0 2022-09-02 10:31 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103046820.commit.requested
    0 2022-09-02 10:31 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103046820.inflight
2.6 K 2022-09-02 10:31 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103056890.commit
    0 2022-09-02 10:31 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103056890.commit.requested
    0 2022-09-02 10:31 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103056890.inflight
2.6 K 2022-09-02 10:32 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103106350.commit
    0 2022-09-02 10:32 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103106350.commit.requested
    0 2022-09-02 10:32 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103106350.inflight
2.6 K 2022-09-02 10:32 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103116318.commit
    0 2022-09-02 10:32 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103116318.commit.requested
    0 2022-09-02 10:32 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103116318.inflight
2.9 K 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103126473.replacecommit
    0 2022-09-02 10:32 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103126473.replacecommit.inflight
3.8 K 2022-09-02 10:32 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103126473.replacecommit.requested
2.6 K 2022-09-02 10:32 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103128014.commit
    0 2022-09-02 10:32 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103128014.commit.requested
    0 2022-09-02 10:32 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103128014.inflight
2.6 K 2022-09-02 10:32 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103136075.commit
    0 2022-09-02 10:32 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103136075.commit.requested
    0 2022-09-02 10:32 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103136075.inflight
2.6 K 2022-09-02 10:32 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103146039.commit
    0 2022-09-02 10:32 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103146039.commit.requested
    0 2022-09-02 10:32 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103146039.inflight
2.6 K 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103156252.commit
    0 2022-09-02 10:32 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103156252.commit.requested
    0 2022-09-02 10:32 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103156252.inflight
2.6 K 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103208166.commit
    0 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103208166.commit.requested
    0 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103208166.inflight
2.6 K 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103217149.commit
    0 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103217149.commit.requested
    0 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103217149.inflight
2.9 K 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103226275.replacecommit
    0 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103226275.replacecommit.inflight
4.0 K 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103226275.replacecommit.requested
2.6 K 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103226480.commit
    0 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103226480.commit.requested
    0 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103226480.inflight
2.6 K 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103236458.commit
    0 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103236458.commit.requested
    0 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103236458.inflight
2.6 K 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103246596.commit
    0 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103246596.commit.requested
    0 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103246596.inflight
2.6 K 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103256288.commit
    0 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103256288.commit.requested
    0 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103256288.inflight
2.2 K 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103257588.clean
2.4 K 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103257588.clean.inflight
2.4 K 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103257588.clean.requested
2.6 K 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103306174.commit
    0 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103306174.commit.requested
    0 2022-09-02 10:33 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103306174.inflight
2.6 K 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103316779.commit
    0 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103316779.commit.requested
    0 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103316779.inflight
2.6 K 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103327090.commit
    0 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103327090.commit.requested
    0 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103327090.inflight
2.6 K 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103336441.commit
    0 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103336441.commit.requested
    0 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103336441.inflight
3.0 K 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103346469.replacecommit
    0 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103346469.replacecommit.inflight
4.4 K 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103346469.replacecommit.requested
2.6 K 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103346978.commit
    0 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103346978.commit.requested
    0 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103346978.inflight
2.6 K 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103356123.commit
    0 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103356123.commit.requested
    0 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103356123.inflight
2.6 K 2022-09-02 10:35 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103406453.commit
    0 2022-09-02 10:35 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103406453.commit.requested
    0 2022-09-02 10:35 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103406453.inflight
2.6 K 2022-09-02 10:35 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103417218.commit
    0 2022-09-02 10:35 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103417218.commit.requested
    0 2022-09-02 10:35 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103417218.inflight
2.4 K 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103427314.clean
2.5 K 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103427314.clean.inflight
2.5 K 2022-09-02 10:34 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103427314.clean.requested
2.6 K 2022-09-02 10:35 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103427491.commit
    0 2022-09-02 10:35 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103427491.commit.requested
    0 2022-09-02 10:35 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103427491.inflight
2.6 K 2022-09-02 10:35 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103438027.commit
    0 2022-09-02 10:35 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103438027.commit.requested
    0 2022-09-02 10:35 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103438027.inflight
2.6 K 2022-09-02 10:35 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103446426.commit
    0 2022-09-02 10:35 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103446426.commit.requested
    0 2022-09-02 10:35 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103446426.inflight
2.6 K 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103500190.commit
    0 2022-09-02 10:35 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103500190.commit.requested
    0 2022-09-02 10:35 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103500190.inflight
2.6 K 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103507122.commit
    0 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103507122.commit.requested
    0 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103507122.inflight
2.6 K 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103516798.commit
    0 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103516798.commit.requested
    0 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103516798.inflight
2.6 K 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103526552.commit
    0 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103526552.commit.requested
    0 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103526552.inflight
2.6 K 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103536698.commit
    0 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103536698.commit.requested
    0 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103536698.inflight
    0 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103546517.replacecommit.inflight
5.3 K 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103546517.replacecommit.requested
2.6 K 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103546726.commit
    0 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103546726.commit.requested
    0 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103546726.inflight
2.6 K 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103556430.commit
    0 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103556430.commit.requested
    0 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103556430.inflight
2.6 K 2022-09-02 10:37 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103606316.commit
    0 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103606316.commit.requested
    0 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103606316.inflight
2.6 K 2022-09-02 10:37 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103618745.commit
    0 2022-09-02 10:37 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103618745.commit.requested
    0 2022-09-02 10:37 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103618745.inflight
2.6 K 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103627453.clean
2.9 K 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103627453.clean.inflight
2.9 K 2022-09-02 10:36 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103627453.clean.requested
    0 2022-09-02 10:37 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103627564.commit.requested
    0 2022-09-02 10:37 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/20220902103627564.inflight
    0 2022-09-02 10:30 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/archived
 1005 2022-09-02 10:30 /tmp/hudi/insert_cow_clustering_12_state_2/.hoodie/hoodie.properties
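As a cross-check of 'clustering.delta_commits'='6' against the state_2 listing, the sketch below counts the completed commits between successive clustering plans (the replacecommit.requested instants). The instant times are copied from the listing with the common "20220902" date prefix dropped; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper, not part of Hudi. */
public class CommitsBetweenPlans {

  /**
   * For each plan, counts completed commits whose instant time lies after the
   * previous plan (or the start of the timeline) and before this plan.
   * Instant times are fixed-width digits, so String order equals time order.
   */
  static List<Integer> commitsBeforeEachPlan(List<String> commits, List<String> plans) {
    List<Integer> counts = new ArrayList<>();
    String previous = "";
    for (String plan : plans) {
      int n = 0;
      for (String commit : commits) {
        if (commit.compareTo(previous) > 0 && commit.compareTo(plan) < 0) {
          n++;
        }
      }
      counts.add(n);
      previous = plan;
    }
    return counts;
  }

  public static void main(String[] args) {
    // Completed ".commit" instants from the state_2 listing ("20220902" prefix dropped).
    List<String> commits = List.of(
        "103029577", "103038501", "103046820", "103056890", "103106350", "103116318",
        "103128014", "103136075", "103146039", "103156252", "103208166", "103217149",
        "103226480", "103236458", "103246596", "103256288", "103306174", "103316779",
        "103327090", "103336441", "103346978", "103356123", "103406453", "103417218",
        "103427491", "103438027", "103446426", "103500190", "103507122", "103516798",
        "103526552", "103536698", "103546726", "103556430", "103606316", "103618745");
    // Clustering plans (".replacecommit.requested" instants) from the same listing.
    List<String> plans = List.of("103126473", "103226275", "103346469", "103546517");
    System.out.println(commitsBeforeEachPlan(commits, plans)); // [6, 6, 8, 12]
  }
}
```

The first two plans each follow exactly 6 completed commits, matching the configured value; the later gaps are larger (8 and 12).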

Contributor

OK, so the issue we are trying to solve is:

There is a regular writer that only schedules clustering, and an async clustering job that executes it.

If a clustering is pending (perhaps waiting to be executed by the async clustering job), every new successful commit from the regular writer keeps adding a new replacecommit.requested.

If yes, then the fix makes sense to me.
@yihua @danny0405 : wdyt.
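To make the problem concrete, here is a toy simulation in plain Java (not Hudi code) under stated assumptions: after every commit the writer schedules a plan once at least `deltaCommits` commits have passed since the last completed clustering, and the async job completes the oldest pending plan once it has been pending for `runCommits` commits. The class, method, parameter names, and the timing model are all hypothetical; `skipIfPending` stands in for a "no new plan while one is pending" check.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the scheduling problem; hypothetical, not Hudi code. */
public class ClusteringSchedulingSim {

  /**
   * Simulates {@code commits} regular commits and returns the largest number of
   * clustering plans that were pending at the same time.
   *
   * @param deltaCommits  commits required since the last completed clustering
   * @param runCommits    commits that pass before the async job finishes a plan
   * @param skipIfPending when true, no plan is scheduled while another is pending
   */
  static int maxPendingPlans(int commits, int deltaCommits, int runCommits, boolean skipIfPending) {
    Deque<Integer> pending = new ArrayDeque<>(); // commit index at which each plan was scheduled
    int lastCompleted = 0; // schedule point of the last completed plan (0 = none yet)
    int maxPending = 0;
    for (int c = 1; c <= commits; c++) {
      // The async job finishes the oldest plan once it has been pending long enough.
      if (!pending.isEmpty() && c - pending.peekFirst() >= runCommits) {
        lastCompleted = pending.pollFirst();
      }
      boolean thresholdReached = c - lastCompleted >= deltaCommits;
      boolean blocked = skipIfPending && !pending.isEmpty();
      if (thresholdReached && !blocked) {
        pending.addLast(c); // models writing a new "replacecommit.requested"
      }
      maxPending = Math.max(maxPending, pending.size());
    }
    return maxPending;
  }

  public static void main(String[] args) {
    System.out.println("without pending check: " + maxPendingPlans(30, 6, 10, false)); // 10
    System.out.println("with pending check:    " + maxPendingPlans(30, 6, 10, true));  // 1
  }
}
```

Without the check, the counter measured from the last completed clustering stays above the threshold while a plan runs, so every commit adds another plan (10 outstanding in this run); with the check, at most one plan is pending at a time.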

Contributor

But one thing I am finding hard to comprehend: with clustering, either both planning and execution are inline, or both are async, at least with the Spark datasource writer. So I'm not sure how the user ended up in a state where clustering was only scheduled and never got to completion.

Contributor

Yes, the fix makes sense to me too.

> but one thing which I am finding it hard to comprehend

The Flink writer schedules a clustering plan on each successful regular commit, and an async pipeline executes clustering continuously. This patch solves the problem of clustering plans being scheduled too frequently while a clustering is still pending.

So, +1 from my side.

LOG.info("Not scheduling inline clustering as only " + commitsSinceLastClustering
+ " commits was found since last clustering " + lastClusteringInstant + ". Waiting for "
@@ -77,11 +78,14 @@ protected Option<HoodieClusteringPlan> createClusteringPlan() {
return Option.empty();
}

-LOG.info("Generating clustering plan for table " + config.getBasePath());
-ClusteringPlanStrategy strategy = (ClusteringPlanStrategy)
-ReflectionUtils.loadClass(ClusteringPlanStrategy.checkAndGetClusteringPlanStrategy(config), table, context, config);
+ClusteringPlanStrategy strategy = null;
+if (config.getAsyncClusterMaxCommits() <= commitsSinceLastClustering) {
+  LOG.info("Generating clustering plan for table " + config.getBasePath());
+  strategy = (ClusteringPlanStrategy)
+      ReflectionUtils.loadClass(ClusteringPlanStrategy.checkAndGetClusteringPlanStrategy(config), table, context, config);
+}

-return strategy.generateClusteringPlan();
+return strategy == null ? Option.empty() : strategy.generateClusteringPlan();
Contributor

@eric9204 This does not seem to solve the problem you mentioned about frequent clustering scheduling. It only avoids an NPE.

Contributor Author

@yihua Yes, because my last commit deleted the following condition, which is what prevents the frequent clustering scheduling:

if (table.getActiveTimeline().filterPendingReplaceTimeline().countInstants() != 0) {
  LOG.info("The last clustering is still running, no need to generate a new clustering plan for " + config.getBasePath());
  return Option.empty();
}

}

@Override