Skip to content

feat: Support bucket write with non partitioned table#13283

Closed
JkSelf wants to merge 1 commit intofacebookincubator:mainfrom
JkSelf:bucket-non-partition-upstream
Closed

feat: Support bucket write with non partitioned table#13283
JkSelf wants to merge 1 commit intofacebookincubator:mainfrom
JkSelf:bucket-non-partition-upstream

Conversation

@JkSelf
Copy link
Collaborator

@JkSelf JkSelf commented May 9, 2025

The pull request at #9740 has already added support for bucket writing in non-partitioned tables. Consequently, this PR removes the check for buckets in non-partitioned tables within the getUpdateMode() function.

@JkSelf JkSelf requested a review from majetideepak as a code owner May 9, 2025 03:36
@netlify
Copy link

netlify bot commented May 9, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 13cca53
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/68300b160742c600087d1065

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 9, 2025
@PHILO-HE
Copy link
Contributor

Please revise pr title to indicate it's for bucket write.

@jinchengchenghh
Copy link
Collaborator

jinchengchenghh commented May 12, 2025

@aditi-pandit Can you help review this PR? Thanks!

@jinchengchenghh
Copy link
Collaborator

Do you receive the error message?

Copy link
Collaborator

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to add some unit test? Thanks.

@JkSelf JkSelf changed the title feat: Support bucket with non partition table feat: Support bucket write with non partition table May 13, 2025
@JkSelf
Copy link
Collaborator Author

JkSelf commented May 13, 2025

Do you receive the error message?

@jinchengchenghh Yes, removing the fallback logic in Gluen caused the following exception during unit testing.

org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Exception: VeloxUserError
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: Cannot insert into bucketed unpartitioned Hive table
Retriable: False
Function: getUpdateMode
File: /mnt/DP_disk3/jk/projects/gluten/ep/build-velox/build/velox_ep/velox/connectors/hive/HiveDataSink.cpp
Line: 1016
09:53:03.002 ERROR org.apache.spark.sql.execution.VeloxColumnarWriteFilesRDD: Job job_202505130253001383836084494412078_0000 aborted.

@JkSelf
Copy link
Collaborator Author

JkSelf commented May 13, 2025

Do you need to add some unit test? Thanks.

@rui-mo Yes. I will add the related unit tests later. Thanks.

@JkSelf JkSelf force-pushed the bucket-non-partition-upstream branch from 10063e5 to 584665b Compare May 13, 2025 07:39
bucketProperty_,
compressionKind_,
getNumWriters(),
connector::hive::LocationHandle::TableType::kExisting,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#9740 only test the connector::hive::LocationHandle::TableType::kNew use case.

@JkSelf
Copy link
Collaborator Author

JkSelf commented May 13, 2025

@aditi-pandit @majetideepak @xiaoxmeng @rui-mo @jinchengchenghh Can you help to review this PR? Thanks.

@JkSelf JkSelf force-pushed the bucket-non-partition-upstream branch from 584665b to 391dea7 Compare May 13, 2025 08:29
@aditi-pandit
Copy link
Collaborator

aditi-pandit commented May 13, 2025

@JkSelf : The origin of that check came from Presto behavior https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/HiveWriterFactory.java#L480.. though I don't think that is an inherent Hive limitation. Trino has fixed it trinodb/trino#1127. We should port that fix to Presto. Created an issue prestodb/presto#25104.

But in general, I'm supportive of this change.

@JkSelf
Copy link
Collaborator Author

JkSelf commented May 20, 2025

apache/incubator-gluten#9575 can pass all the unit tests with this PR.

@JkSelf
Copy link
Collaborator Author

JkSelf commented May 22, 2025

@majetideepak @aditi-pandit @rui-mo @PHILO-HE Do you have any further comments? Thanks.

Copy link
Collaborator

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JkSelf. Looks good.

@JkSelf JkSelf force-pushed the bucket-non-partition-upstream branch from 391dea7 to 13cca53 Compare May 23, 2025 05:43
@majetideepak majetideepak added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label May 23, 2025
@majetideepak majetideepak changed the title feat: Support bucket write with non partition table feat: Support bucket write with non partitioned table May 23, 2025
@facebook-github-bot
Copy link
Contributor

@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@JkSelf
Copy link
Collaborator Author

JkSelf commented May 26, 2025

@bikramSingh91 Can you help to merge this PR? Thanks.

@facebook-github-bot
Copy link
Contributor

@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@bikramSingh91 merged this pull request in f384796.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants