[HUDI-3045] New clustering regex match config to choose partitions when building clustering plan #4346
@hudi-bot run azure
vinothchandar left a comment
Thanks for the good contribution. Could we just add this as a config instead of a new clustering strategy? I think it could be useful more broadly.
Ack! Will do it ASAP :)
Thanks a lot for your attention, @yihua and @vinothchandar. Just added a new config to do regex pattern matching.
yihua left a comment
LGTM overall. Left a couple of nits.
      .withDocumentation("Files smaller than the size specified here are candidates for clustering");

  public static final ConfigProperty<String> PARTITION_REGEX_PATTERN = ConfigProperty
      .key(CLUSTERING_STRATEGY_PARAM_PREFIX + "cluster.partition.regex.pattern")
nit: partition.regex.pattern?
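For context, here is a minimal sketch of how a config-driven partition regex filter could behave. It is not the PR's code. The full key string (an assumed `hoodie.clustering.plan.strategy.` prefix plus the shorter `partition.regex.pattern` suffix from the nit above), the use of a plain `java.util.Properties` in place of Hudi's write config, and whole-path matching via `Matcher.matches()` are all assumptions.

```java
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PartitionRegexFilterSketch {

  // Hypothetical key: assumes the "hoodie.clustering.plan.strategy." prefix and the
  // shorter "partition.regex.pattern" suffix suggested in review.
  static final String PARTITION_REGEX_KEY =
      "hoodie.clustering.plan.strategy.partition.regex.pattern";

  // Keeps only partition paths that fully match the configured regex.
  // An unset or empty pattern disables filtering and returns every partition.
  static List<String> filterPartitionPaths(List<String> partitionPaths, Properties props) {
    String regex = props.getProperty(PARTITION_REGEX_KEY, "");
    if (regex.isEmpty()) {
      return partitionPaths;
    }
    Pattern pattern = Pattern.compile(regex); // compile once, reuse for every path
    return partitionPaths.stream()
        .filter(path -> pattern.matcher(path).matches())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> partitions = List.of("2021/12/01", "2021/12/02", "2022/01/01");
    Properties props = new Properties();
    props.setProperty(PARTITION_REGEX_KEY, "2021/12/.*");
    System.out.println(filterPartitionPaths(partitions, props)); // [2021/12/01, 2021/12/02]
  }
}
```

Treating an empty pattern as "no filter" keeps the default behavior unchanged for tables that never set the option.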
  @Test
  public void testFilterPartitionPaths() {
    PartitionAwareClusteringPlanStrategy sg = new DummyPartitionAwareClusteringPlanStrategy(table, context, hoodieWriteConfig);
nit: better variable naming here?
Changed. Thanks a lot for your review :)
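A test like `testFilterPartitionPaths` probably needs to pin down which matching semantics the pattern uses. The sketch below is illustrative only (the helper names are hypothetical, not from the PR) and shows how `Matcher.matches()` and `Matcher.find()` differ on a date-style partition path.

```java
import java.util.regex.Pattern;

class RegexMatchSemanticsSketch {

  // Full-match semantics: the regex must describe the whole partition path.
  static boolean matchesWholePath(String regex, String partitionPath) {
    return Pattern.compile(regex).matcher(partitionPath).matches();
  }

  // Substring semantics: any occurrence of the regex inside the path counts.
  static boolean matchesAnywhere(String regex, String partitionPath) {
    return Pattern.compile(regex).matcher(partitionPath).find();
  }

  public static void main(String[] args) {
    String partitionPath = "2021/12/01";
    System.out.println(matchesWholePath("2021/12", partitionPath));    // false
    System.out.println(matchesAnywhere("2021/12", partitionPath));     // true
    System.out.println(matchesWholePath("2021/12/.*", partitionPath)); // true
  }
}
```

With full-match semantics, users would have to write `2021/12/.*` rather than `2021/12` to select a month, which may be worth stating in the config's documentation.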
@yihua I feel it would be better to add a new option in
…en building clustering plan (apache#4346) Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Yes, that could allow more flexible filtering. @zhangyue19921010 @YuweiXiao, does either of you want to take a stab at this before the 0.11.0 release?
Sure, I'll pick it up.
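The thread does not spell out what the more flexible filtering would look like. Purely as a hypothetical illustration, partition filters could be modeled as `java.util.function.Predicate<String>` values and combined with `and()`. The `regexFilter` and `rangeFilter` helpers below are invented for this sketch and are not Hudi APIs.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ComposablePartitionFilterSketch {

  // Hypothetical building block: keep partitions whose path fully matches a regex.
  static Predicate<String> regexFilter(String regex) {
    Pattern pattern = Pattern.compile(regex);
    return path -> pattern.matcher(path).matches();
  }

  // Hypothetical building block: keep partitions whose path sorts within [begin, end].
  // Assumes zero-padded date paths like "2021/12/01", where lexicographic order is date order.
  static Predicate<String> rangeFilter(String begin, String end) {
    return path -> path.compareTo(begin) >= 0 && path.compareTo(end) <= 0;
  }

  // Applies any combination of filters to the candidate partition paths.
  static List<String> apply(List<String> partitionPaths, Predicate<String> filter) {
    return partitionPaths.stream().filter(filter).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> partitions = List.of("2021/11/30", "2021/12/01", "2021/12/15", "2022/01/01");
    // December 2021 partitions, restricted to the first two weeks of the month.
    Predicate<String> combined =
        regexFilter("2021/12/.*").and(rangeFilter("2021/12/01", "2021/12/14"));
    System.out.println(apply(partitions, combined)); // [2021/12/01]
  }
}
```

Keeping each rule as a separate predicate lets new filter kinds be added without touching the existing regex rule.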
What is the purpose of the pull request
https://issues.apache.org/jira/browse/HUDI-3045
Add a new ClusteringPlanStrategy that uses a regex to choose partitions when building the clustering plan.
Brief change log
Verify this pull request
(Please pick one of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.