Make Iceberg split manager threads configurable#22754
Make Iceberg split manager threads configurable#22754hantangwangd merged 1 commit intoprestodb:masterfrom
Conversation
|
Thanks for the change. I notice that Iceberg use java system property |
03762df to
6bdf1c8
Compare
|
@hantangwangd Thank you for your review. I have added this configuration parameter to the document. |
6bdf1c8 to
5560bba
Compare
|
So that others can more easily know that they need to tune this value, can you please also export the metrics from this executor? See an example in HivePartitionManager#getExecutor. |
steveburnett
left a comment
There was a problem hiding this comment.
LGTM! (docs)
Pull branch, local docs build, looks good. Thanks!
|
Suggest revising the release note entry based on the Order of changes and Phrasing of the Release Notes Guidelines: |
5e58ce2 to
57f8110
Compare
|
Hi @tdcmeehan Thank you for your review. I followed your suggestion and added the Executor information to JMX metrics. |
|
|
||
| ``iceberg.split-manager-threads`` Number of threads to use for generating iceberg splits. ``Number of available processors`` |
There was a problem hiding this comment.
One more thing, should configuration properties add here? See Configuration Properties chapter.
There was a problem hiding this comment.
Great question, and good catch! I have a question for you:
Is the iceberg.split-manager-threads property related to Manifest File Caching?
- If yes, then I think it should stay where it is.
- If no, then I suggest moving the
iceberg.split-manager-threadsproperty to the table in Configuration Properties that you noticed.
Thanks!
There was a problem hiding this comment.
It's not related to Manifest File Caching, so I think it's better to be placed in Configuration Properties.
There was a problem hiding this comment.
@hantangwangd @steveburnett I read the documentation and found that I had indeed added the wrong place, and I have corrected it. Thanks.
8ff1bb2 to
2af039a
Compare
ChunxuTang
left a comment
There was a problem hiding this comment.
@wypb Thanks for your work! Just left a minor suggestion.
| private long maxManifestCacheSize = IO_MANIFEST_CACHE_MAX_TOTAL_BYTES_DEFAULT; | ||
| private long manifestCacheExpireDuration = IO_MANIFEST_CACHE_EXPIRATION_INTERVAL_MS_DEFAULT; | ||
| private long manifestCacheMaxContentLength = IO_MANIFEST_CACHE_MAX_CONTENT_LENGTH_DEFAULT; | ||
| private int splitManagerThreads = Runtime.getRuntime().availableProcessors(); |
There was a problem hiding this comment.
I noticed that they used Runtime.getRuntime().availableProcessors() * 2 as the default value, probably because each processor core commonly supports 2 threads (hyper-threading). Are there any specific reasons for your setting?
There was a problem hiding this comment.
This default value is to be consistent with the default value of Iceberg's iceberg.worker.num-threads attribute:https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SystemConfigs.java#L38-L43
There was a problem hiding this comment.
Gotcha. Thanks for the clarification.
steveburnett
left a comment
There was a problem hiding this comment.
Thanks for moving the property to the best location! One final nit just noticed today, about capitalizing Iceberg.
|
|
||
| Set to ``0`` to disable metadata optimization. | ||
|
|
||
| ``iceberg.split-manager-threads`` Number of threads to use for generating iceberg splits. ``Number of available processors`` |
There was a problem hiding this comment.
| ``iceberg.split-manager-threads`` Number of threads to use for generating iceberg splits. ``Number of available processors`` | |
| ``iceberg.split-manager-threads`` Number of threads to use for generating Iceberg splits. ``Number of available processors`` |
There was a problem hiding this comment.
Nice catch @steveburnett, I've already fixed it.
| private long maxManifestCacheSize = IO_MANIFEST_CACHE_MAX_TOTAL_BYTES_DEFAULT; | ||
| private long manifestCacheExpireDuration = IO_MANIFEST_CACHE_EXPIRATION_INTERVAL_MS_DEFAULT; | ||
| private long manifestCacheMaxContentLength = IO_MANIFEST_CACHE_MAX_CONTENT_LENGTH_DEFAULT; | ||
| private int splitManagerThreads = Runtime.getRuntime().availableProcessors(); |
There was a problem hiding this comment.
Gotcha. Thanks for the clarification.
ZacBlanco
left a comment
There was a problem hiding this comment.
LGTM, but since it is a cherry-pick from trino, I think you should credit the original author as the co-author in the commit.
cfc5671
d9591b8 to
070f748
Compare
|
|
|
Hi @ZacBlanco Thank you for your reply, I've added the co-author to the commit log. |
070f748 to
2d0fec1
Compare
Cherry-pick of trinodb/trino@57f0081 Co-authored-by: electrum
2d0fec1 to
51b619b
Compare
steveburnett
left a comment
There was a problem hiding this comment.
LGTM! (docs)
Pull updated branch, local build, everything looks good. Thanks!
Description
We have an Iceberg table with a lot of metadata. Recently, we found that every time we query this table, the Coordinator's CPU usage will be very high. We checked and found that it is because when calling Iceberg's
tableScan#planFiles()API, Iceberg's tableScan will use all cores available in the system (Runtime.getRuntime().availableProcessors()) for calculations. This PR allows us to control the number of threads used by Iceberg's tableScan on the Presto side.BTW, this PR Cherry-pick from trinodb/trino@57f0081
The JMX metrics for executor used for Iceberg tableScan can be tracked at
com.facebook.presto.iceberg:name=iceberg,type=icebergsplitmanagerMotivation and Context
Impact
Test Plan
Contributor checklist
Release Notes
CC: @tdcmeehan @yingsu00 @ChunxuTang @hantangwangd