[HUDI-5278] Support more conf to cluster procedure #7304

Merged
leesf merged 1 commit into apache:master from KnightChess:more-conf-cluster-procedure on Nov 30, 2022

@KnightChess
Contributor

Change Logs

The Spark SQL cluster procedure supports three new params: op, order_strategy, options
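
For illustration, calls using the new params might look like the statements rendered below. This is a minimal sketch, not code from the PR: the `render` helper is hypothetical, the value 'schedule' for op, the value 'z-order' for order_strategy, and the comma-separated key=value format for options are assumptions, and the table name `h0` is a placeholder. Only `run_clustering`, `table`, `op`, `order_strategy`, `options`, and the op values 'execute' and 'scheduleandexecute' appear in this conversation.

```scala
// Hypothetical helper that renders a `call run_clustering(...)` statement
// from named arguments, so the statements can be inspected without Spark.
object ClusteringCall {
  def render(table: String, args: (String, String)*): String = {
    // Every argument is emitted as: name => 'value'
    val named = (("table" -> table) +: args).map { case (k, v) => s"$k => '$v'" }
    s"call run_clustering(${named.mkString(", ")})"
  }

  def main(a: Array[String]): Unit = {
    // Assumed op value: only schedule a clustering plan.
    println(render("h0", "op" -> "schedule"))
    // Assumed order_strategy value, combined with an `order` column list.
    println(render("h0", "order" -> "ts", "order_strategy" -> "z-order"))
    // Assumed options format: comma-separated key=value pairs.
    println(render("h0", "options" -> "hoodie.clustering.plan.strategy.max.num.groups=10"))
  }
}
```

In a Spark session each rendered string would be passed to `spark.sql(...)`, as the tests in this PR do.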

Impact

none

Risk level (write none, low medium or high below)

low

Documentation Update

Will open a doc PR to update

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@KnightChess KnightChess force-pushed the more-conf-cluster-procedure branch from 627aedf to 1b87e59 Compare November 25, 2022 11:52
assert(1 == metaClient.getActiveTimeline.getCompletedReplaceTimeline.getInstants.count())
assert(0 == metaClient.getActiveTimeline.filterPendingReplaceTimeline().getInstants.count())

spark.sql(s"call run_clustering(table => '$tableName')")
Contributor

Missing test cases for scheduleandexecute and an invalid op?

Contributor Author

scheduleandexecute is the default; I will add an invalid op case

Contributor Author

Oh, it already has an invalid case: checkExceptionContain(s"call run_clustering(table => '$tableName', op => 'null')")("Invalid value")
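
The invalid-op check above can be pictured with a small stand-in for the operator enum. This is a sketch under assumptions, not the Hudi source: the `Op` type, the string value 'schedule', the case-insensitive match, and the exact exception type and message are illustrative. Only EXECUTE("execute"), the name 'scheduleandexecute', and the "Invalid value" fragment come from this conversation.

```scala
// Illustrative stand-in for the ClusteringOperator enum discussed in this PR.
sealed abstract class Op(val value: String)

object Op {
  case object Schedule extends Op("schedule") // assumed string value
  case object Execute extends Op("execute")
  case object ScheduleAndExecute extends Op("scheduleandexecute")

  val all: Seq[Op] = Seq(Schedule, Execute, ScheduleAndExecute)

  // Maps the user-supplied `op` string to an operator, or fails with a
  // message containing "Invalid value" (assumed wording beyond that fragment).
  def fromValue(raw: String): Op =
    all.find(_.value == raw.toLowerCase(java.util.Locale.ROOT))
      .getOrElse(throw new IllegalArgumentException(s"Invalid value: $raw"))
}
```

With this sketch, `Op.fromValue("null")` fails in the way the quoted test expects, because the string 'null' matches none of the known values.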

/**
 * only execute the pending clustering plans
 */
EXECUTE("execute"),
Contributor

Can we execute a specific pending clustering plan instead of all pending clustering plans?

@leesf leesf self-assigned this Nov 26, 2022

pendingClustering = instantsStr match {
case Some(inst) =>
operator = ClusteringOperator.EXECUTE
Contributor

Why do we need to set the operator to EXECUTE here, but not at line #144?

Contributor Author

Because the user does not specify the instants

Contributor

I think we need to check whether users specify the instants together with SCHEDULE or SCHEDULE_AND_EXECUTE; in that case we should throw an exception instead of setting the operator to EXECUTE.

Contributor

Please put the line operator = ClusteringOperator.EXECUTE below the line logInfo("No op"), and please change logInfo("No op") to logInfo("No op and set it to EXECUTE with instants specified.")

Contributor Author

That line cannot be put below logInfo("No op"): the operator defaults to scheduleAndExecute, so if the user specifies instants, it needs to be set to execute after the check.
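
One way to read the rule discussed in this thread is sketched below. It is not the merged code: the `ResolveOp` object, its `resolve` function, and the exception type and message are hypothetical. The sketch assumes that the default operator is schedule-and-execute, that specified instants without an op imply EXECUTE, and that instants combined with any other explicit op are rejected, as the reviewer suggested.

```scala
object ResolveOp {
  sealed trait Op
  case object Schedule extends Op
  case object Execute extends Op
  case object ScheduleAndExecute extends Op

  // op:       the operator explicitly passed by the user, if any
  // instants: the user-specified pending instants, if any
  def resolve(op: Option[Op], instants: Option[String]): Op = (op, instants) match {
    // No op but instants given: run exactly those pending plans.
    case (None, Some(_)) => Execute
    // Explicit EXECUTE is valid with or without instants.
    case (Some(Execute), _) => Execute
    // Instants with SCHEDULE or SCHEDULE_AND_EXECUTE: reject (reviewer's suggestion).
    case (Some(other), Some(_)) =>
      throw new IllegalArgumentException(s"instants cannot be combined with op $other")
    // Explicit op without instants: use it as given.
    case (Some(other), None) => other
    // Nothing specified: fall back to the default.
    case (None, None) => ScheduleAndExecute
  }
}
```

Doing the check before the assignment matches the author's point: the operator already holds its default, so overwriting it with EXECUTE only makes sense once the user's explicit op has been validated.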

@KnightChess KnightChess reopened this Nov 28, 2022
@KnightChess
Contributor Author

@leesf can you help me resolve this issue? Since the latest successful CI run the code has not changed at all, and my local compile succeeds.

image

image

@leesf leesf closed this Nov 29, 2022
@leesf leesf reopened this Nov 29, 2022
Contributor

@leesf leesf left a comment

LGTM

@leesf leesf closed this Nov 29, 2022
@leesf leesf reopened this Nov 29, 2022
@leesf
Contributor

leesf commented Nov 29, 2022

@KnightChess would you please rebase onto the latest master? I see that master has some code changes.

@KnightChess KnightChess force-pushed the more-conf-cluster-procedure branch from e839fd6 to 728191c Compare November 29, 2022 04:45
@KnightChess
Contributor Author

@leesf done, and I updated the test to resolve the Java CI compile error, but I still don't know why it happened

@KnightChess
Contributor Author

Flink module error, and I found:
image

@KnightChess KnightChess force-pushed the more-conf-cluster-procedure branch from 728191c to 20a511d Compare November 29, 2022 05:13
@KnightChess
Contributor Author

I will reopen CI after #7319 is merged

@KnightChess KnightChess force-pushed the more-conf-cluster-procedure branch from 20a511d to 3f8f9a4 Compare November 29, 2022 06:09
@KnightChess
Contributor Author

KnightChess commented Nov 29, 2022

Don't merge, I have hit some problems in cluster

@KnightChess
Contributor Author

Don't merge, I have hit some problems in cluster

Looks like it is a bug in our internal version; the open-source version is OK, so this is not blocked

@codope codope added area:sql SQL interfaces priority:high Significant impact; potential bugs labels Nov 29, 2022
@codope codope changed the title [HUDI-5278]support more conf to cluster procedure [HUDI-5278] Support more conf to cluster procedure Nov 29, 2022
@leesf
Contributor

leesf commented Nov 29, 2022

@hudi-bot run azure

@hudi-bot
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@leesf leesf merged commit 418091b into apache:master Nov 30, 2022
@KnightChess
Contributor Author

@leesf @stream2000 thanks for the review

""".stripMargin)

val fileNum = 20
val numRecords = 400000
Contributor

Hey @KnightChess, do we need so many files and records per file for this test? As written, this test could take a long time.
