feat(native): Insert into bucketed but unpartitioned Hive table #25139
anandamideShakyan wants to merge 4 commits into prestodb:master
Conversation
@anandamideShakyan: Thanks for this PR. Have you tried this functionality with Prestissimo? You might need facebookincubator/velox#13283 as well for it.
@aditi-pandit Sure, I will add the support in Prestissimo after facebookincubator/velox#13283 is merged.
@anandamideShakyan: There are failures in the product tests. PTAL.
Force-pushed from 0e437dd to 38805a8.
I tried it on Prestissimo, with one coordinator and one worker. I created a table in the hive schema and tpcds catalog, inserted values, and was able to see the entries on running the select query (a sketch of such statements is below).
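The exact statements were not preserved in this thread. The following is a minimal sketch of that flow, assuming a Hive catalog named hive with a tpcds schema; the table and column names are hypothetical:

```sql
-- Hypothetical sketch; table and column names are invented.
-- Bucketed but unpartitioned: bucketed_by/bucket_count, no partitioned_by.
CREATE TABLE hive.tpcds.bucketed_items (
    item_id BIGINT,
    item_desc VARCHAR
)
WITH (bucketed_by = ARRAY['item_id'], bucket_count = 4);

INSERT INTO hive.tpcds.bucketed_items
VALUES (1, 'first'), (2, 'second'), (3, 'third');

SELECT * FROM hive.tpcds.bucketed_items;
```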
Force-pushed from 38805a8 to 817f7df.
The review comment below was left on the new test class:

```java
import static java.lang.Boolean.parseBoolean;
import static org.testng.Assert.assertEquals;

public class TestHivePartitionedInsertNative
```
Could we move these testcases to presto-tests or presto-product-tests? Ideally, we don't want to add new testcases to presto-native-tests; instead we should just extend the existing e2e tests (such as the ones added to presto-product-tests in this PR) to run with the native query runner.
Force-pushed from 817f7df to ed26ecc.
Consider adding to the documentation an example of how to use this new ability, or at least a mention that this is now possible for users and why it's useful (as you wrote in the Description).
Force-pushed from ed26ecc to e8591d3.
Force-pushed from e8591d3 to f68fe3d.
@anandamideShakyan: It will be good to complete this work, as it has been a long-pending item. Please take a look at the failures.
Inserts into bucketed Hive tables using the C++ (Velox) worker were failing during finishInsert with a verification error (Presto concluded that bucket files were missing).

Root cause: Presto's Hive metadata layer assumes exactly one file per bucket per partition. The Java worker never hits this path because it always creates one file per bucket, even when a bucket receives zero rows. The Velox (C++) HiveDataSink, however, only created writers for buckets that actually received rows. When a bucket was empty there was no writer, hence no file, so Presto thought the bucket was missing and failed verification. This is why inserts succeeded when the data happened to hit all buckets and failed otherwise; a repro sketch follows this comment.

Fix: The fix ensures that Velox creates one writer (and therefore one output file) per bucket, matching the Java worker's behavior and Presto's expectations. Specifically, during HiveDataSink::splitInputRowsAndEnsureWriters() we now pre-create writers for all buckets (for each partition, if partitioned). This guarantees that every bucket produces exactly one file, even if it contains zero rows. As a result, computeFileNamesForMissingBuckets() is never triggered and finishInsert succeeds.
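To make the failure mode concrete, here is a hypothetical repro (invented names): with more buckets than distinct bucket-key values, some buckets receive no rows, so the C++ worker previously produced no file for them.

```sql
-- Hypothetical repro; catalog, schema, and table names are invented.
CREATE TABLE hive.tpch.orders_bucketed (
    orderkey BIGINT,
    status VARCHAR
)
WITH (bucketed_by = ARRAY['orderkey'], bucket_count = 8);

-- Two rows cannot populate eight buckets: before the fix, the Velox
-- HiveDataSink wrote files only for the non-empty buckets, Presto's
-- one-file-per-bucket verification failed, and the INSERT errored out.
-- After the fix, all eight buckets get a file (most of them empty).
INSERT INTO hive.tpch.orders_bucketed VALUES (1, 'O'), (2, 'F');
```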
To Do: With this fix applied locally, I am able to insert into bucketed Hive tables with and without the sidecar. I am now looking at resolving the unit test failure that came after these changes: #25115
@anandamideShakyan: Presto has a property hive.create-empty-bucket-files to control whether to create empty bucket files. It seems like this should always be false for the native engine. But in any case, doesn't the Presto server create the missing buckets on the coordinator in the TableFinish logic, and not in the worker? I feel it should be on the coordinator in TableFinish, as it's only after seeing all the worker files that we know which buckets are empty. The individual worker cannot make this decision. This error seems like a local problem between Hive and Presto on the coordinator. Please recheck whether something else is missing.
Description
Addresses #25104
Currently, Presto does not support INSERT INTO operations on bucketed but unpartitioned Hive tables. This limitation originates from a hard check in HiveWriterFactory:
https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/HiveWriterFactory.java#L480
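For illustration, this is the kind of statement that check blocked (hypothetical names, using the Presto Hive connector's table properties):

```sql
-- Hypothetical example; names are invented.
CREATE TABLE hive.default.users_bucketed (
    user_id BIGINT,
    country VARCHAR
)
WITH (bucketed_by = ARRAY['user_id'], bucket_count = 16);

-- Bucketed but unpartitioned: before this change, the INSERT below was
-- rejected by the hard check in HiveWriterFactory linked above.
INSERT INTO hive.default.users_bucketed VALUES (101, 'US'), (102, 'IN');
```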
Motivation and Context
Supporting writes to bucketed unpartitioned Hive tables in Presto would improve compatibility and enhance Presto’s ability to handle modern Hive table layouts. It's a reasonable and useful feature for users who wish to leverage bucketing for performance optimizations even without partitioning.
Impact
This change would align Presto’s behavior with the broader SQL-on-Hadoop ecosystem and remove an artificial limitation that may block valid use cases — particularly in data warehousing environments where bucketing is used independently of partitioning.
Release Notes