@boneanxs
Contributor

@boneanxs boneanxs commented Nov 30, 2022

Change Logs

Even when a clustering job has sufficient resources, some clustering groups can still wait to be triggered, because we use the ForkJoinPool to submit these jobs. Its parallelism is difficult for clients to adjust (--conf spark.driver.extraJavaOptions=-Djava.util.concurrent.ForkJoinPool.common.parallelism), and changing it could also affect other tasks that use the ForkJoinPool. Instead, this PR introduces a new thread pool to control the job-submission parallelism for clustering.
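
As a rough illustration of the idea (not the code in this PR), the sketch below submits one task per clustering group to a dedicated fixed-size pool rather than to the common ForkJoinPool. The class name, the runGroups helper, and the group and result types are hypothetical; only the standard java.util.concurrent calls are real.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: not Hudi's actual implementation.
public class ClusteringSubmitSketch {

    // Runs one task per clustering group on a dedicated, bounded pool,
    // so the parallelism is independent of the JVM-wide common ForkJoinPool.
    static <G, R> List<R> runGroups(List<G> groups, Function<G, R> runGroup, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            // Submit every group first so up to maxThreads of them run concurrently.
            List<CompletableFuture<R>> futures = groups.stream()
                .map(g -> CompletableFuture.supplyAsync(() -> runGroup.apply(g), pool))
                .collect(Collectors.toList());
            // Then wait for all of them; results keep the input order.
            return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> groups = Arrays.asList("group-0", "group-1", "group-2", "group-3");
        // The lambda stands in for the work of clustering a single group.
        List<String> results = runGroups(groups, g -> g + ":done", 2);
        System.out.println(results);
    }
}
```

Because the pool is created per call and passed explicitly to supplyAsync, other code that relies on the common ForkJoinPool is left untouched, which is the isolation the description argues for.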

Impact

Adds a new config, hoodie.clustering.max.threads, to control this behavior.
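
A minimal sketch of how a thread count could be read from this key, assuming the config arrives as plain java.util.Properties. The class, the maxThreads helper, and the fallback value are hypothetical; in particular, the fallback is a caller-supplied placeholder and not Hudi's actual default.

```java
import java.util.Properties;

// Hypothetical sketch: not Hudi's actual config handling.
public class ClusteringThreadsConfigSketch {

    static final String MAX_THREADS_KEY = "hoodie.clustering.max.threads";

    // Returns the configured thread count, or the caller-supplied fallback
    // when the key is absent. Rejects values below 1.
    static int maxThreads(Properties props, int fallback) {
        String raw = props.getProperty(MAX_THREADS_KEY);
        if (raw == null) {
            return fallback;
        }
        int n = Integer.parseInt(raw.trim());
        if (n < 1) {
            throw new IllegalArgumentException(MAX_THREADS_KEY + " must be >= 1, got " + n);
        }
        return n;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(MAX_THREADS_KEY, "8");
        System.out.println(maxThreads(props, 1));
    }
}
```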

Risk level (write none, low, medium or high below)

none

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@boneanxs
Contributor Author

boneanxs commented Dec 1, 2022

@hudi-bot run azure

@boneanxs boneanxs force-pushed the clustering_thread_pool branch 2 times, most recently from 15f8b28 to a1538ae Compare December 5, 2022 09:59
@nsivabalan nsivabalan added priority:blocker Production down; release blocker priority:critical Production degraded; pipelines stalled release-0.12.2 Patches targetted for 0.12.2 and removed priority:blocker Production down; release blocker labels Dec 5, 2022
@boneanxs
Contributor Author

boneanxs commented Dec 6, 2022

@hudi-bot run azure

@codope codope removed the release-0.12.2 Patches targetted for 0.12.2 label Dec 7, 2022
@stream2000
Contributor

Hi, any update on this pr? Would be great if we can land this feature! In my case, adding a thread pool could double the performance of clustering~

@boneanxs
Contributor Author

> Hi, any update on this pr? Would be great if we can land this feature! In my case, adding a thread pool could double the performance of clustering~

Hey @xushiyan @danny0405, could you please help to review this?

@boneanxs boneanxs force-pushed the clustering_thread_pool branch from a1538ae to 6052ffe Compare June 25, 2023 11:02
@hudi-bot
Collaborator

CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@danny0405 danny0405 merged commit 8eafe17 into apache:master Jun 27, 2023

Labels

priority:critical Production degraded; pipelines stalled release-0.14.0

Projects

Status: ✅ Done

8 participants