feat: Make the max bucket count configurable #13392

JkSelf wants to merge 1 commit into facebookincubator:main
Conversation
@majetideepak Can you help review this PR? Thanks.
Does Spark have a limit on the maximum bucket number? Maybe we also need to restrict the upper bound of this config.
@jinchengchenghh Yes, Spark has the same configuration to limit the maximum bucket number: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L1846-L1851.
rui-mo left a comment:
Would you please document this config in https://github.com/facebookincubator/velox/blob/main/velox/docs/configs.rst? Thanks.
@rui-mo Yes, updated. Can you review again? Thanks.
```cpp
    "max_partitions_per_writers";
```

```cpp
/// Maximum number of buckets.
static constexpr const char* kMaxBucketCount = "max-bucket-count";
```
This should be hive.max-bucket-count. Same with max-partitions-per-writers. Can you fix both?
Also kInsertExistingPartitionsBehavior needs the hive. prefix.
```rst
     - 100
     - Maximum number of (bucketed) partitions per a single table writer instance.
   * - hive.max-bucket-count
     -
```
Add the session property name here and above for hive.max-partitions-per-writers
@majetideepak Yes, it seems #13376 resolves the same issue as this PR, so I will close it. Thanks for all your reviews.
We encountered an exception when enabling bucket writing for non-partitioned tables in Gluten (apache/incubator-gluten#9575). Upon investigation, we discovered that the `maxBucketCount` value is hard-coded and cannot be adjusted through configuration. This PR introduces the `kMaxBucketCount` parameter in `HiveConfig`, making it configurable.