Support SQL MERGE in the Trino engine and five connectors #7933

Merged
electrum merged 12 commits into trinodb:master from djsagain:david.stryker/support-sql-merge-final on Aug 5, 2022
Conversation

djsagain (Member) commented May 16, 2021

This PR is a second take on implementing SQL MERGE. It consists of commits that add support for SQL MERGE in the Trino engine and in the Hive, Kudu, Raptor, Iceberg, and Delta Lake connectors. The implementation is structured so that most of the work happens in the Trino engine, so adding support in a connector is fairly simple.

The SQL MERGE implementation allows update of all columns, including partition or bucket columns, and the Trino engine performs redistribution to ensure that the updated rows end up on the appropriate nodes.

The Trino engine commit introduces an enum RowChangeParadigm, which characterizes how a connector modifies rows. Hive uses, and Iceberg will use, the DELETE_ROW_AND_INSERT_ROW paradigm, since they represent an updated row as a deleted row and an inserted row. Kudu uses the CHANGE_ONLY_UPDATED_COLUMNS paradigm.

Each paradigm corresponds to an implementation of the RowChangeProcessor interface. After this PR is merged, the intent is to retrofit SQL UPDATE to use the same RowChangeParadigm/Processor mechanism.
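The difference between the two paradigms can be pictured with a small sketch. This is not Trino's Java implementation, just an illustrative Python model of what each paradigm asks a connector to persist for an updated row; the function `process_update` and the row/column shapes are hypothetical.

```python
from enum import Enum, auto

class RowChangeParadigm(Enum):
    # Connector rewrites a changed row as a delete plus an insert (Hive, Iceberg).
    DELETE_ROW_AND_INSERT_ROW = auto()
    # Connector writes only the columns that actually changed (Kudu).
    CHANGE_ONLY_UPDATED_COLUMNS = auto()

def process_update(paradigm, old_row, changed_columns):
    """Illustrate what each paradigm asks the connector to persist for an update."""
    new_row = {**old_row, **changed_columns}
    if paradigm is RowChangeParadigm.DELETE_ROW_AND_INSERT_ROW:
        # Emit a full-row delete followed by a full-row insert.
        return [("DELETE", old_row), ("INSERT", new_row)]
    # Emit only the updated column values, keyed by the row id.
    return [("UPDATE", old_row["row_id"], changed_columns)]

row = {"row_id": 7, "name": "a", "qty": 1}
print(process_update(RowChangeParadigm.DELETE_ROW_AND_INSERT_ROW, row, {"qty": 2}))
# A delete of the old row and an insert of the new row.
print(process_update(RowChangeParadigm.CHANGE_ONLY_UPDATED_COLUMNS, row, {"qty": 2}))
# An in-place update carrying only the changed column.
```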

Extensive documentation on the internal MERGE architecture can be found in the developer doc supporting-merge.rst.

Fixes #7708

electrum (Member) left a comment

The Kudu commit looks good

djsagain force-pushed the david.stryker/support-sql-merge-final branch from d65286e to f88718f on May 26, 2021
djsagain (Member, Author) commented May 26, 2021

Thanks for the great comments, @electrum. I did everything you suggested.

djsagain force-pushed the david.stryker/support-sql-merge-final branch 2 times, most recently from 3108a8d to db83bfe on May 27, 2021
kasiafi (Member) left a comment

A lot of questions and some comments. I've gone through the docs, and partially through the analysis.

kasiafi (Member) left a comment

Some more comments regarding the analyzer. Initial comments on the planner part.

djsagain force-pushed the david.stryker/support-sql-merge-final branch 2 times, most recently from 1b878ef to 238eb2d on June 16, 2021
djsagain (Member, Author) commented Jun 16, 2021

Thanks for the great first batch of comments, @kasiafi! I believe I've addressed the comments from yesterday except those listed below. It would be great if you could resolve the comments you think have been handled to your satisfaction.

I haven't addressed the more profound comments made 4 hours ago yet, and some of them will require coaching from you or @martint.

Here are the comments from yesterday that I haven't addressed:

  • Does DuplicateRowFinder need to compare the writeRedistribution columns?
  • Will matched target table rowIds really come out in order such that DuplicateRowFinder is guaranteed to identify them?
  • Implementing multiple assignment.
  • Addressing your comment: "Instead of assigning a scope to an Identifier, the aliased table should parse as AliasedRelation."
  • Addressing your comment: "What if the table was a materialized view?"

djsagain force-pushed the david.stryker/support-sql-merge-final branch 2 times, most recently from 6038c7f to b373e2b on June 16, 2021
findepi (Member) commented Jun 17, 2021

re #7933 (comment)

target table rowIds would be partitioned among nodes

@djsstarburst can you please point me to a document outlining how MERGE interacts with connectors?

i would like to learn about the following

  • What are the assumptions on rowIds? Can rowIds carry un-updated columns?
  • How should a connector construct rowIds if it needs to create deletion delta files for the sake of updates (e.g. a separate deletion file for an input file which would mark all the rows that got updated)?
  • What is the table handle lifecycle for MERGE? For example, how does MERGE interact with partition, file, and file chunk pruning?

kasiafi (Member) left a comment

Here are some comments regarding the previously reviewed part. Additionally, I answered some of your replies directly. I resolved all conversations except those that require a follow-up.

I plan to review the next portions of the code and put my comments in a new batch.

Comment on lines 225 to 228

Why is RowBlock special-cased here?
What if underlyingBlock is a DictionaryBlock over a RowBlock? Would it require special-casing as well?

djsagain (Member, Author) replied:

I had endless trouble with this, and it's one of the main things I hoped review would shed light on.

I had hoped that I could just call rowIdBlock.getPositions(...) and end up with a consistent view of the resulting block. However, when I tried that, way downstream in the Driver I would see out-of-range array references. My assumption is that I'm doing something wrong, but I wasn't successful debugging the problem.


I had endless trouble with this, and it's one of the main things I hoped review would shed light on.

Sorry that I cannot help. Add a TODO comment here, warning the reader that we don't know exactly why it's written the way it is.

Comment on lines 244 to 230

Shouldn't this actually depend on rowIdType?

Also, direct use of ArrayBlock is not correct. Typically you would use io.trino.spi.type.Type#createBlockBuilder(io.trino.spi.block.BlockBuilderStatus, int) to construct a block of values for a given type.

Here, however, you actually want to create a single-value NULL block (nativeValueToBlock may be helpful) and wrap it in a RunLengthEncodedBlock instead.

kasiafi (Member) left a comment

Some comments and questions regarding the planner part. I still have a few classes to review.

djsagain force-pushed the david.stryker/support-sql-merge-final branch 2 times, most recently from f4a18f7 to 083ab11 on June 17, 2021
findepi (Member) commented Jun 13, 2022

what I have understood, only a rebase to main is missing before pull request will be accepted.

@harlequin not exactly, but hopefully not very far from that. @electrum recently told me there is some correctness issue lingering somewhere, probably on the engine side of the changes proposed here. Per my understanding, David is going to take a stab at hunting it down.

nicor88 commented Jun 24, 2022

Any plan to re-work this feature soon? We want to use Trino as the main framework for our ETL, using Iceberg, and having MERGE operations would be awesome; otherwise, we might need to fall back to Spark.

djsagain (Member, Author) commented Jul 1, 2022

Today @electrum and I found and fixed the long-standing problem with this PR that caused SQL MERGE tests with lots of modified/deleted rows to fail. The root cause of the failures was the use of Block.getChildren; it was fixed by using ColumnarRow in two places in the DeleteAndInsertRowProcessor. In the process, @electrum added a Raptor connector implementation of SQL MERGE, which is much easier to debug than the product tests for Hive SQL MERGE.

Here are the tasks I know of to finish off the work on SQL MERGE:

  • Replicate the merge tests in TestHiveTransactionalTable for Raptor. This is mostly cut-and-paste/formatting (assuming the tests all pass).
  • Eliminate the SPI call ConnectorMetadata.getWriteRedistributionColumnHandles, replacing it with a call to the existing getInsertLayout method.
  • Add the new SPI call ConnectorMetadata.getUpdateInsertLayout, or whatever we decide to call it. This layout is needed for unbucketed Raptor and Iceberg SQL MERGE operations, according to @electrum.
  • Make sure that the internal developer documentation for SQL MERGE reflects reality.

sopel39 (Member) commented Jul 1, 2022

@djsstarburst Will MERGE work similarly to INSERT, e.g. allow complex plans that have UNIONs, joins, or aggregations?

djsagain (Member, Author) commented Jul 1, 2022

Will MERGE work similarly to INSERT, e.g. allow complex plans that have UNIONs, joins, or aggregations?

Hi Karol. Yes, I believe so. You can see the new tests in that PR in TestHiveTransactionalTable. The test names all start with testMerge.

If you see something untested, or have a test to propose, please send it my way.

djsagain (Member, Author) commented Jul 8, 2022

Eliminate the SPI call ConnectorMetadata.getWriteRedistributionColumnHandles, replacing it with a call to the existing getInsertLayout method.

This is done: getWriteRedistributionColumns is no longer mentioned in either the code or the developer documentation.

nicor88 commented Jul 13, 2022

@djsstarburst will this feature also cover the Delta Lake connector?


Don't you need to add exchanges below MergeProcessorNode?

electrum (Member) commented Jul 26, 2022

I don't understand the question, but the produced plan seems to be correct. We have these stages in the plan:

(scan[target], scan[source]) -> RightJoin -> MergeProcessor -> MergeWriter -> TableCommit

Take a look at the supporting-merge.rst which might help explain the structure.
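As a toy illustration of why the join stage must preserve the source side (this is not the engine's actual operator): a right join keeps every source row, attaching the matching target row id, or a null for unmatched rows, and the unmatched rows are exactly the ones that can drive WHEN NOT MATCHED inserts. The names right_join and target_row_id are illustrative, and the sketch assumes unique join keys in the target.

```python
def right_join(target_rows, source_rows, key):
    """Toy model of the RightJoin stage in the MERGE plan: every source row
    survives, paired with the matching target row id or None when unmatched.
    Assumes each join key appears at most once in the target."""
    target_by_key = {row[key]: row for row in target_rows}
    joined = []
    for src in source_rows:
        match = target_by_key.get(src[key])
        joined.append({
            "source": src,
            # None marks a source row with no target match.
            "target_row_id": match["row_id"] if match else None,
        })
    return joined

target = [{"row_id": 0, "id": 1, "qty": 5}]
source = [{"id": 1, "qty": 9}, {"id": 2, "qty": 3}]
print(right_join(target, source, "id"))
```

An inner or flipped join would drop the id=2 source row here, which is why the join's output semantics must be preserved regardless of how the optimizer arranges the plan.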

sopel39 (Member) commented Jul 27, 2022

How do you make sure RightJoin does not participate in CBO and doesn't get reordered or flipped?


Why would that matter? If the CBO is working properly, it should not change the output of the join operation, so if it has a better plan, that should be OK, right?

dain (Member) left a comment

Looks good to me


Maybe a better message here

djsagain (Member, Author) replied:

Changed to:

The target table in Hive MERGE must be a transactional table


Comment on lines 43 to 45

We should mention what happens if no clauses match the row and there is no default WHEN MATCHED clause.

djsagain (Member, Author) replied:

Added:

If a source row is not matched by any ``WHEN`` clause and there is no
``WHEN NOT MATCHED`` clause, the source row is ignored.
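The rule above can be demonstrated with a small simulation. This is a deliberately simplified Python model of MERGE clause dispatch, not Trino code: it supports at most one unconditional WHEN MATCHED handler and one WHEN NOT MATCHED handler (real MERGE clauses can carry conditions), and it shows that a source row matching neither clause is silently ignored.

```python
def merge(target, source, on_key, when_matched=None, when_not_matched=None):
    """Toy MERGE semantics: one pass over source rows against a keyed target.

    A source row that matches no clause (e.g. it is unmatched and there is
    no WHEN NOT MATCHED handler) is simply ignored.
    """
    result = dict(target)  # target rows keyed by the join key
    for row in source:
        key = row[on_key]
        if key in result:
            if when_matched is not None:
                result[key] = when_matched(result[key], row)
        elif when_not_matched is not None:
            result[key] = when_not_matched(row)
        # else: source row matches no clause and is ignored
    return result

target = {1: {"id": 1, "qty": 5}}
source = [{"id": 1, "qty": 9}, {"id": 2, "qty": 3}]
# With only a WHEN MATCHED update, the unmatched source row (id=2) is ignored.
print(merge(target, source, "id", when_matched=lambda t, s: {**t, "qty": s["qty"]}))
```

Adding a when_not_matched handler to the same call would instead insert the id=2 row, mirroring a ``WHEN NOT MATCHED ... THEN INSERT`` clause.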

electrum and others added 12 commits August 4, 2022 14:47
This version works under emulation on M1 Macs.

This allows the engine to make the decision about how many nodes to
use as appropriate, based on the number of workers or hash partition
count session property. This is also required for MERGE so that the
insert and update layouts can use the same mapping.

This commit adds support for SQL MERGE in the Trino engine.
It introduces an enum RowChangeParadigm, which characterizes
how a connector modifies rows.  Hive and Iceberg will use the
DELETE_ROW_AND_INSERT_ROW paradigm, since they represent an
updated row as a deleted row and an inserted row.  Kudu will
use the CHANGE_ONLY_UPDATED_COLUMNS paradigm.

Each paradigm corresponds to an implementation of the
RowChangeProcessor interface.  The intent is to retrofit SQL
UPDATE to use the same RowChangeParadigm/Processor mechanism.

The SQL MERGE implementation allows update of all columns,
including partition or bucket columns, and the Trino engine
performs redistribution to ensure that the updated rows
end up on the appropriate nodes.

MERGE processing is extensively documented in the new
file in the developer documentation, supporting-merge.rst.

This commit adds SQL MERGE support in the Hive connector and
a raft of MERGE tests to verify that it works.

Development

Successfully merging this pull request may close these issues.

MERGE statement

9 participants