[HUDI-5278] Support more conf to cluster procedure#7304
Conversation
627aedf to
1b87e59
Compare
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java
Outdated
Show resolved
Hide resolved
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java
Show resolved
Hide resolved
...ark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/RunClusteringProcedure.scala
Outdated
Show resolved
Hide resolved
| assert(1 == metaClient.getActiveTimeline.getCompletedReplaceTimeline.getInstants.count()) | ||
| assert(0 == metaClient.getActiveTimeline.filterPendingReplaceTimeline().getInstants.count()) | ||
|
|
||
| spark.sql(s"call run_clustering(table => '$tableName')") |
There was a problem hiding this comment.
missing test case for scheduleandexecute and invalid op?
There was a problem hiding this comment.
scheduleandexecute is default, I will add invalid op case
There was a problem hiding this comment.
oh, it already has invalid case checkExceptionContain(s"call run_clustering(table => '$tableName', op => 'null')")("Invalid value")
| /** | ||
| * only execute then pending clustering plans | ||
| */ | ||
| EXECUTE("execute"), |
There was a problem hiding this comment.
can we execute specific pending clustering plan instead of all pending clustering plans?
|
|
||
| pendingClustering = instantsStr match { | ||
| case Some(inst) => | ||
| operator = ClusteringOperator.EXECUTE |
There was a problem hiding this comment.
here why we need set operator to EXECUTE but in line#144 we do not need?
There was a problem hiding this comment.
Because the user does not specify the instants
There was a problem hiding this comment.
I think we need check if users specify the instants with SCHEDULE and SCHEDULE_AND_EXECUTE, we should throw exception instead of set it to EXECUTE when specify instants.
There was a problem hiding this comment.
please put the line operator = ClusteringOperator.EXECUTE below the line logInfo("No op") and please change logInfo("No op") to logInfo("No op and set it to EXECUTE with instants specified.")
There was a problem hiding this comment.
the line can not put below logInfo("No op"), operator default is scheduleAndExecute, if user specific instants, it need be set to execute after check
|
@leesf can you help me resolve this issue, after lastest success ci code, the code has not change any all, and my local compile is success |
|
@KnightChess would you please rebase to latest master as I see the master has some code changed. |
e839fd6 to
728191c
Compare
|
@leesf done, and I update the test to resolve Java CI compile eroor, but I still don't know why |
728191c to
20a511d
Compare
|
I will reopen ci after #7319 be merged |
20a511d to
3f8f9a4
Compare
|
Don't merge, I meet some question in cluster |
look like is something bug in our Internal version, open version is ok, no blocked |
|
@hudi-bot run azure |
|
@leesf @stream2000 thanks for review |
| """.stripMargin) | ||
|
|
||
| val fileNum = 20 | ||
| val numRecords = 400000 |
There was a problem hiding this comment.
Hey @KnightChess, do we need so many files and records per file for this test? This test currently could cost much time.



Change Logs
spark sql cluster procedure support new params: op, order_strategy, options
Impact
none
Risk level (write none, low medium or high below)
low
Documentation Update
will open doc pr to update
Contributor's checklist