feat(plugin-iceberg): Add support for MERGE INTO#25470
feat(plugin-iceberg): Add support for MERGE INTO#25470tdcmeehan merged 3 commits intoprestodb:masterfrom
Conversation
3e4437e to
5b65ae1
Compare
|
@acarpente-denodo : Thanks for this contribution. The code will need extensive changes for Presto C++ support. Have you investigated that ? Did you try this code with a C++ worker (or side-car enabled ) ? We can do the full C++ support as a follow up, but its important the queries fail with a clean error message on Presto C++ |
|
Hi @aditi-pandit, |
yingsu00
left a comment
There was a problem hiding this comment.
Hi Adrian, were you able to build this PR? I got the following errors:
[ERROR] Failed to execute goal org.basepom.maven:duplicate-finder-maven-plugin:1.5.1:check (default) on project presto-main: Found duplicate classes/resources! -> [Help 1]
Also, how did you get the "handle not found" error? Were you running a query using the query runner? Were you able to get the correct plan? Would you please share the plan and full error stack?
| sql/grant | ||
| sql/grant-roles | ||
| sql/insert | ||
| sql/merge |
There was a problem hiding this comment.
At this commit, the merge into only supports parsing and not the full functionality. Let's move the documentation to the last commit when it's fully supported.
There was a problem hiding this comment.
OK, it's done.
| } | ||
|
|
||
| @Test | ||
| public void testMergeInto() |
There was a problem hiding this comment.
Keep the original name testMerge()
There was a problem hiding this comment.
OK, it's done.
yingsu00
left a comment
There was a problem hiding this comment.
Hi Adrian, were you able to build this PR? The CI build failed with these:
In file included from /__w/presto/presto/presto-native-execution/./presto_cpp/main/PeriodicHeartbeatManager.h:17:
/__w/presto/presto/presto-native-execution/./presto_cpp/presto_protocol/core/presto_protocol_core.h:1684:3: error: unknown type name 'com'
com.facebook.presto.spi.MergeHandle handle = {};
^
/__w/presto/presto/presto-native-execution/./presto_cpp/presto_protocol/core/presto_protocol_core.h:1684:6: error: expected member name or ';' after declaration specifiers
com.facebook.presto.spi.MergeHandle handle = {};
~~~^
/__w/presto/presto/presto-native-execution/./presto_cpp/presto_protocol/core/presto_protocol_core.h:1686:3: error: unknown type name 'ConnectorMergeTableHandle'; did you mean 'ConnectorDeleteTableHandle'?
ConnectorMergeTableHandle connectorMergeTableHandle = {};
^~~~~~~~~~~~~~~~~~~~~~~~~
ConnectorDeleteTableHandle
/__w/presto/presto/presto-native-execution/./presto_cpp/presto_protocol/core/presto_protocol_core.h:[313](https://github.com/prestodb/presto/actions/runs/16068848370/job/45348688297?pr=25470#step:17:314):8: note: 'ConnectorDeleteTableHandle' declared here
struct ConnectorDeleteTableHandle : public JsonEncodedSubclass {};
^
/__w/presto/presto/presto-native-execution/./presto_cpp/presto_protocol/core/presto_protocol_core.h:1709:3: error: unknown type name 'RowChangeParadigm'
RowChangeParadigm paradigm = {};
^
4 errors generated.
Would you please fix the build errors first?
Also, how did you get the "handle not found" error? Were you running a query using the query runner? Since the build fails I wonder how you could do that. If you could run a query, were you able to get the correct plan? Would you please share the plan and full error stack? Thanks!
ace4af6 to
b02d82f
Compare
|
Hi @yingsu00, The PR is still in development. I'm still focused on getting the MERGE INTO command fully working on Java. The coordinator and workers are currently running on Java. I haven't investigated the MERGE INTO support for Presto C++ yet. Automated tests have not been created yet. To test this enhancement, you need to perform a manual test. It is described at the end of the PR description in the section called "Test Plan". I'm facing an error during the serialization of a new class called MergePartitioningHandle, which implements ConnectorPartitioningHandle interface I think there is an issue with The goal of this draft PR is to get assistance in fixing this problem. |
3465776 to
9658fd0
Compare
20d7c91 to
441ed5b
Compare
09c9c54 to
bca3929
Compare
steveburnett
left a comment
There was a problem hiding this comment.
Format nit. Thanks for the doc!
|
Hi @steveburnett! Thank you for reviewing the feature documentation. |
7bb8555 to
e799d9f
Compare
fd4bc78 to
8f49697
Compare
f9e99cb to
994f813
Compare
|
Can you please move the UI changes to a separate PR? |
hantangwangd
left a comment
There was a problem hiding this comment.
Thanks for this great feature. I've left a few comments, mostly small things and a point for discussion.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergPartitioningHandle.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergUpdateHandle.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergMetadataColumn.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergUpdateablePageSource.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergMetadataColumn.java
Outdated
Show resolved
Hide resolved
994f813 to
6510337
Compare
Hi @tdcmeehan! OK. I moved the presto-ui changes to a separate PR: #26825 |
07acbfe to
0b73712
Compare
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergMergeSink.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergMetadataColumn.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergMetadataColumn.java
Outdated
Show resolved
Hide resolved
0b73712 to
298b708
Compare
79a5b6e to
da4927d
Compare
PingLiuPing
left a comment
There was a problem hiding this comment.
Thanks for the feature.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergMergeSink.java
Show resolved
Hide resolved
|
Is documentation no longer needed for this PR? Has the doc been moved to a different PR? Thanks! |
Hi @steveburnett, the documentation for this feature was merged in this PR: Add SQL Support for MERGE INTO in Presto (engine) So, no documentation is required for this PR. |
PingLiuPing
left a comment
There was a problem hiding this comment.
Thanks, looks good overall. Could you consolidate the commit message little bit.
Leaving few comments on the tests.
presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedTestBase.java
Outdated
Show resolved
Hide resolved
| { | ||
| String targetTable = "merge_inserts_" + randomTableSuffix(); | ||
| assertUpdate(format("CREATE TABLE %s (customer VARCHAR, purchases INT, address VARCHAR)", targetTable)); | ||
| assertUpdate(format("INSERT INTO %s (customer, purchases, address) VALUES ('Aaron', 11, 'Antioch'), ('Bill', 7, 'Buena')", targetTable), 2); |
There was a problem hiding this comment.
These code repeats lot, maybe adding a helper function.
presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedTestBase.java
Show resolved
Hide resolved
| (long) computeActual("SELECT count(*) FROM tpch.sf1.orders").getOnlyValue()); | ||
|
|
||
| // verify copied rows: same total price | ||
| assertEquals( |
There was a problem hiding this comment.
For consistency, consider to use assertQuery . And all other places around.
|
Hi @tdcmeehan, @steveburnett, @ethanyzhang and @PingLiuPing! I've made the requested changes, and the PR has passed all automated tests. Could you please approve this PR? |
| assertUpdate("DROP TABLE " + targetTable); | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Could you please add some tests for:
- Empty source table
- Empty target table
- All rows matched
- No rows matched
There was a problem hiding this comment.
Hi @PingLiuPing! Sure, I created the new automated tests you requested.
| fragments.addAll(writePositionDeletes(sink, deletion.rowsToDelete())); | ||
| }); | ||
|
|
||
| return completedFuture(fragments); |
There was a problem hiding this comment.
nit: This method returns CompletableFuture but calls .join() synchronously, and this could blocking the thread and defeats the purpose of returning a CompletableFuture.
There was a problem hiding this comment.
Hi @PingLiuPing! You are correct. I fixed it.
hantangwangd
left a comment
There was a problem hiding this comment.
@acarpente-denodo thanks for this feature, lgtm!
tdcmeehan
left a comment
There was a problem hiding this comment.
Thank you @acarpente-denodo
Support SQL MERGE in the Iceberg connector Cherry-pick of trinodb/trino@6cb188b Co-authored-by: David Phillips <david@acz.org>
SQL MERGE automated tests for Iceberg connector Cherry-pick of trinodb/trino@6cb188b Co-authored-by: David Phillips <david@acz.org>
- Improved CompletableFuture management. - Added new automted tests
Description
Iceberg connector support for the SQL MERGE INTO command.
Syntax:
Example: MERGE INTO usage to update the sales information for existing products and insert the sales information for the new products in the market.
The Presto engine commit introduces an enum called RowChangeParadigm, which describes how a connector modifies rows. The iceberg connector will utilize the
DELETE_ROW_AND_INSERT_ROWparadigm, as it represents an updated row as a combination of a deleted row followed by an inserted row. TheCHANGE_ONLY_UPDATED_COLUMNSparadigm is meant for connectors that support updating individual columns of rows.Note: Changes were made after reviewing the following Trino PR: trinodb/trino#7126
So, this PR is deeply inspired by Trino's implementation.
Motivation and Context
The
MERGE INTOstatement is commonly used to integrate data from two tables with different contents but similar structures.For example, the source table could be part of a production transactional system, while the target table might be located in a data warehouse for analytics.
Regularly, MERGE operations are performed to update the analytics warehouse with the latest production data.
You can also use MERGE with tables that have different structures, as long as you can define a condition to match the rows between them.
Test Plan
Automated tests developed in IcebergDistributedTestBase.
Contributor checklist
Release Notes