feat: Make the max bucket count configurable #13392

Closed
JkSelf wants to merge 1 commit into facebookincubator:main from JkSelf:bucket-count

@JkSelf
Collaborator

@JkSelf JkSelf commented May 20, 2025

We encountered an exception when enabling bucketed writes for non-partitioned tables in Gluten (apache/incubator-gluten#9575). Upon investigation, we found that the maxBucketCount value is hard-coded and cannot be adjusted through configuration. This PR introduces the kMaxBucketCount parameter in HiveConfig to make the limit configurable. The exception we hit:

Caused by: org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Exception: VeloxUserError
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: (100001 vs. 100000) bucketCount exceeds the limit
Retriable: False
Expression: bucketCount_ < maxBucketCount()
Function: HiveDataSink
File: /work/ep/build-velox/build/velox_ep/velox/connectors/hive/HiveDataSink.cpp
Line: 426
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_14VeloxUserErrorERKSsEEvRKNS1_18VeloxCheckFailArgsET0_
# 3  _ZN8facebook5velox9connector4hive12HiveDataSinkC2ESt10shared_ptrIKNS0_7RowTypeEES4_IKNS2_21HiveInsertTableHandleEEPKNS1_17ConnectorQueryCtxENS1_14CommitStrategyERKS4_IKNS2_10HiveConfigEEjSt10unique_ptrINS0_4core17PartitionFunctionESt14default_deleteISM_EE
# 4  _ZN8facebook5velox9connector4hive12HiveDataSinkC2ESt10shared_ptrIKNS0_7RowTypeEES4_IKNS2_21HiveInsertTableHandleEEPKNS1_17ConnectorQueryCtxENS1_14CommitStrategyERKS4_IKNS2_10HiveConfigEE
# 5  _ZN8facebook5velox9connector4hive13HiveConnector14createDataSinkESt10shared_ptrIKNS0_7RowTypeEES4_INS1_26ConnectorInsertTableHandleEEPNS1_17ConnectorQueryCtxENS1_14CommitStrategyE
# 6  _ZN8facebook5velox4exec11TableWriter14createDataSinkEv
# 7  _ZN8facebook5velox4exec11TableWriter10initializeEv
# 8  _ZN8facebook5velox4exec6Driver19initializeOperatorsEv
# 9  _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
# 10 _ZN8facebook5velox4exec6Driver4nextEPN5folly10SemiFutureINS3_4UnitEEERPNS1_8OperatorERNS1_14BlockingReasonE
# 11 _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
# 12 _ZN6gluten24WholeStageResultIterator4nextEv
# 13 Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 14 0x00007f351965dadb
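
For readers unfamiliar with the failing check, here is a minimal, self-contained sketch of what the expression `bucketCount_ < maxBucketCount()` enforces. It is not the Velox implementation: `checkBucketCount` and `kDefaultMaxBucketCount` are hypothetical names, the default of 100000 is inferred from the error message above (read as "actual vs. limit"), and the standard exception stands in for `VeloxUserError`.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Assumption: the hard-coded limit implied by "(100001 vs. 100000)".
constexpr uint32_t kDefaultMaxBucketCount = 100000;

// Hypothetical helper mirroring the strict check `bucketCount_ < maxBucketCount()`.
// Passing the limit as a parameter is what makes it configurable.
void checkBucketCount(
    uint32_t bucketCount,
    uint32_t maxBucketCount = kDefaultMaxBucketCount) {
  if (!(bucketCount < maxBucketCount)) {
    throw std::invalid_argument(
        "(" + std::to_string(bucketCount) + " vs. " +
        std::to_string(maxBucketCount) + ") bucketCount exceeds the limit");
  }
}
```

With the default limit, `checkBucketCount(100001)` throws, which matches the failure reported above. Calling `checkBucketCount(100001, 200000)` passes, which is the behavior this PR aims for once the limit can be raised through configuration.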

@JkSelf JkSelf requested a review from majetideepak as a code owner May 20, 2025 06:47
@facebook-github-bot facebook-github-bot added the CLA Signed label May 20, 2025

JkSelf commented May 20, 2025

@majetideepak Could you help review this PR? Thanks.


JkSelf commented May 20, 2025

cc @jinchengchenghh @rui-mo

@jinchengchenghh
Collaborator

Does Spark limit the maximum number of buckets? Maybe we also need to restrict the upper bound of this config.


JkSelf commented May 21, 2025

> Does Spark limit the maximum number of buckets? Maybe we also need to restrict the upper bound of this config.

@jinchengchenghh Yes. Spark has a similar configuration that limits the maximum number of buckets: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L1846-L1851.

@rui-mo rui-mo left a comment


JkSelf commented May 22, 2025

> Would you please document this config in https://github.com/facebookincubator/velox/blob/main/velox/docs/configs.rst? Thanks.

@rui-mo Yes, updated. Could you review it again? Thanks.

"max_partitions_per_writers";

/// Maximum number of buckets.
static constexpr const char* kMaxBucketCount = "max-bucket-count";
Collaborator

This should be hive.max-bucket-count. Same with max-partitions-per-writers. Can you fix both?

@majetideepak majetideepak May 23, 2025

Also kInsertExistingPartitionsBehavior needs the hive. prefix.

     - 100
     - Maximum number of (bucketed) partitions per a single table writer instance.
   * - hive.max-bucket-count
     -
Collaborator

Add the session property name here, and also above for hive.max-partitions-per-writers.
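
To illustrate how a prefixed config key and a session property name could work together, here is a hedged sketch rather than the actual `HiveConfig` code. It assumes plain string maps in place of Velox's config classes. The key `hive.max-bucket-count` follows the reviewer's suggestion above. The session property name `max_bucket_count` is hypothetical, modeled on the underscore style of `max_partitions_per_writers` seen in the diff, and `getOr` and `maxBucketCount` are illustrative helpers.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

using PropertyMap = std::unordered_map<std::string, std::string>;

// Connector config key, with the "hive." prefix requested in review.
constexpr const char* kMaxBucketCount = "hive.max-bucket-count";
// Hypothetical session property name (underscore style, no prefix).
constexpr const char* kMaxBucketCountSession = "max_bucket_count";
// Assumption: default taken from the limit in the reported error message.
constexpr uint32_t kDefaultMaxBucketCount = 100000;

// Returns the value stored under `key`, or `fallback` if the key is absent.
uint32_t getOr(const PropertyMap& props, const std::string& key, uint32_t fallback) {
  auto it = props.find(key);
  if (it == props.end()) {
    return fallback;
  }
  return static_cast<uint32_t>(std::stoul(it->second));
}

// Session property wins over connector config, which wins over the default.
uint32_t maxBucketCount(const PropertyMap& session, const PropertyMap& config) {
  return getOr(
      session,
      kMaxBucketCountSession,
      getOr(config, kMaxBucketCount, kDefaultMaxBucketCount));
}
```

The lookup order (session, then connector config, then default) is a design choice in this sketch that lets a single query raise the limit without changing the connector-wide setting. An upper bound on the configured value, as raised earlier in the thread, is not enforced here.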

@majetideepak
Collaborator

@JkSelf I see another PR with the same scope as this one: #13376.


JkSelf commented May 24, 2025

> @JkSelf I see another PR with the same scope as this one: #13376.

@majetideepak Yes, it seems #13376 resolves the same issue as this PR, so I will close this one. Thanks for all your reviews.

@JkSelf JkSelf closed this May 24, 2025