Support SQL MERGE in the Trino engine and five connectors #7933

Merged
electrum merged 12 commits into trinodb:master from djsagain:david.stryker/support-sql-merge-final on Aug 5, 2022
Conversation

djsagain (Member) commented May 16, 2021

This PR is a second take on implementing SQL MERGE. It consists of commits that add support for SQL MERGE in the Trino engine and in the Hive, Kudu, Raptor, Iceberg, and Delta Lake connectors. The implementation is structured so that most of the work happens in the Trino engine, so adding support in a connector is fairly simple.

The SQL MERGE implementation allows update of all columns, including partition or bucket columns, and the Trino engine performs redistribution to ensure that the updated rows end up on the appropriate nodes.

The Trino engine commit introduces an enum RowChangeParadigm, which characterizes how a connector modifies rows. Hive uses, and Iceberg will use, the DELETE_ROW_AND_INSERT_ROW paradigm, since they represent an updated row as a deleted row and an inserted row. Kudu uses the CHANGE_ONLY_UPDATED_COLUMNS paradigm.

Each paradigm corresponds to an implementation of the RowChangeProcessor interface. After this PR is merged, the intent is to retrofit SQL UPDATE to use the same RowChangeParadigm/Processor mechanism.
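The difference between the two paradigms can be pictured with a small sketch. This is not Trino's Java implementation, just an illustrative Python model of what each paradigm asks a connector to persist for an updated row; the function `process_update` and the row/column shapes are hypothetical.

```python
from enum import Enum, auto

class RowChangeParadigm(Enum):
    # Connector rewrites a changed row as a delete plus an insert (Hive, Iceberg).
    DELETE_ROW_AND_INSERT_ROW = auto()
    # Connector writes only the columns that actually changed (Kudu).
    CHANGE_ONLY_UPDATED_COLUMNS = auto()

def process_update(paradigm, old_row, changed_columns):
    """Illustrate what each paradigm asks the connector to persist for an update."""
    new_row = {**old_row, **changed_columns}
    if paradigm is RowChangeParadigm.DELETE_ROW_AND_INSERT_ROW:
        # Emit a full-row delete followed by a full-row insert.
        return [("DELETE", old_row), ("INSERT", new_row)]
    # Emit only the updated column values, keyed by the row id.
    return [("UPDATE", old_row["row_id"], changed_columns)]

row = {"row_id": 7, "name": "a", "qty": 1}
print(process_update(RowChangeParadigm.DELETE_ROW_AND_INSERT_ROW, row, {"qty": 2}))
# A delete of the old row and an insert of the new row.
print(process_update(RowChangeParadigm.CHANGE_ONLY_UPDATED_COLUMNS, row, {"qty": 2}))
# An in-place update carrying only the changed column.
```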

Extensive documentation on the internal MERGE architecture can be found in the developer doc supporting-merge.rst.

Fixes #7708

electrum (Member) left a comment

The Kudu commit looks good

djsagain force-pushed the david.stryker/support-sql-merge-final branch from d65286e to f88718f on May 26, 2021
djsagain (Member, Author) commented May 26, 2021

Thanks for the great comments, @electrum. I did everything you suggested.

djsagain force-pushed the david.stryker/support-sql-merge-final branch 2 times, most recently from 3108a8d to db83bfe on May 27, 2021
kasiafi (Member) left a comment

A lot of questions and some comments. I've gone through the docs, and partially through the analysis.

kasiafi (Member) left a comment

Some more comments regarding the analyzer. Initial comments on the planner part.

djsagain force-pushed the david.stryker/support-sql-merge-final branch 2 times, most recently from 1b878ef to 238eb2d on June 16, 2021
djsagain (Member, Author) commented Jun 16, 2021

Thanks for the great first batch of comments, @kasiafi! I believe I've addressed the comments from yesterday except those listed below. It would be great if you could resolve the comments you think have been handled to your satisfaction.

I haven't addressed the more profound comments made 4 hours ago yet, and some of them will require coaching from you or @martint.

Here are the comments from yesterday that I haven't addressed:

  • Does DuplicateRowFinder need to compare the writeRedistribution columns?
  • Will matched target table rowIds really come out in order such that DuplicateRowFinder is guaranteed to identify them?
  • Implementing multiple assignment.
  • Addressing your comment: "Instead of assigning a scope to an Identifier, the aliased table should parse as AliasedRelation."
  • Addressing your comment: "What if the table was a materialized view?"

djsagain force-pushed the david.stryker/support-sql-merge-final branch 2 times, most recently from 6038c7f to b373e2b on June 16, 2021
findepi (Member) commented Jun 17, 2021

re #7933 (comment)

target table rowIds would be partitioned among nodes

@djsstarburst can you please point me to a document outlining how MERGE interacts with connectors?

i would like to learn about the following

  • What are the assumptions on rowIds? Can rowIds carry un-updated columns?
  • How should a connector construct rowIds if it needs to create deletion delta files for the sake of updates (e.g. a separate deletion file for an input file which would mark all the rows that got updated)?
  • What is the table handle lifecycle for MERGE? For example, how does MERGE interact with partition, file, and file chunk pruning?

kasiafi (Member) left a comment

Here are some comments regarding the previously reviewed part. Additionally, I answered some of your replies directly. I resolved all conversations except those that require a follow-up.

I plan to review the next portions of the code and put my comments in a new batch.

Comment on lines 225 to 228

Why is RowBlock special-cased here?
What if underlyingBlock is a DictionaryBlock over a RowBlock? Would it require special-casing as well?

djsagain (Member, Author) replied:

I had endless trouble with this, and it's one of the main things I hoped review would shed light on.

I had hoped that I could just call rowIdBlock.getPositions(...) and end up with a consistent view of the resulting block. However, when I tried that, way downstream in the Driver I would see out-of-range array references. My assumption is that I'm doing something wrong, but I wasn't successful debugging the problem.


I had endless trouble with this, and it's one of the main things I hoped review would shed light on.

Sorry that I cannot help. Add a TODO comment here, warning the reader that we don't know exactly why it's written the way it is.

Comment on lines 244 to 230

Shouldn't this actually depend on rowIdType?

Also, direct use of ArrayBlock is not correct. Typically you would use io.trino.spi.type.Type#createBlockBuilder(io.trino.spi.block.BlockBuilderStatus, int) to construct a block of values for a given type.

Here, however, you actually want to create a single-value NULL block (nativeValueToBlock may be helpful) and wrap it in a RunLengthEncodedBlock instead.

kasiafi (Member) left a comment

Some comments and questions regarding the planner part. I still have a few classes to review.

djsagain force-pushed the david.stryker/support-sql-merge-final branch 2 times, most recently from f4a18f7 to 083ab11 on June 17, 2021
findepi (Member) commented Jun 13, 2022

what I have understood, only a rebase to main is missing before pull request will be accepted.

@harlequin not exactly, but hopefully not very far from that. @electrum recently told me there is some correctness issue lingering somewhere, probably on the engine side of the changes proposed here. Per my understanding, David is going to take a stab at hunting it down.

nicor88 commented Jun 24, 2022

Any plan to re-work this feature soon? We want to use Trino as the main framework for our ETL, using Iceberg, and having MERGE operations would be awesome; otherwise, we might need to fall back to Spark.

djsagain (Member, Author) commented Jul 1, 2022

Today @electrum and I found and fixed the long-standing problem with this PR that caused SQL MERGE tests with lots of modified/deleted rows to fail. The root cause of the failures was the use of Block.getChildren; it was fixed by using ColumnarRow in two places in the DeleteAndInsertRowProcessor. In the process, @electrum added a Raptor connector implementation of SQL MERGE, which is much easier to debug than the product tests for Hive SQL MERGE.

Here are the tasks I know of to finish off the work on SQL MERGE:

  • Replicate the merge tests in TestHiveTransactionalTable for Raptor. This is mostly cut-and-paste/formatting (assuming the tests all pass).
  • Eliminate the SPI call ConnectorMetadata.getWriteRedistributionColumnHandles, replacing it with a call to the existing getInsertLayout method.
  • Add the new SPI call ConnectorMetadata.getUpdateInsertLayout, or whatever we decide to call it. This layout is needed for unbucketed Raptor and Iceberg SQL MERGE operations, according to @electrum.
  • Make sure that the internal developer documentation for SQL MERGE reflects reality.

sopel39 (Member) commented Jul 1, 2022

@djsstarburst Will MERGE work similarly to INSERT, e.g. allow complex plans that have UNIONs, joins, or aggregations?

djsagain (Member, Author) commented Jul 1, 2022

Will MERGE work similarly to INSERT, e.g. allow complex plans that have UNIONs, joins, or aggregations?

Hi Karol. Yes, I believe so. You can see the new tests in that PR in TestHiveTransactionalTable. The test names all start with testMerge.

If you see something untested, or have a test to propose, please send it my way.

djsagain (Member, Author) commented Jul 8, 2022

Eliminate the SPI call ConnectorMetadata.getWriteRedistributionColumnHandles, replacing it with a call to the existing getInsertLayout method.

This is done: getWriteRedistributionColumns is no longer mentioned in either the code or the developer documentation.

nicor88 commented Jul 13, 2022

@djsstarburst will this feature also cover the Delta Lake connector?


Don't you need to add exchanges below MergeProcessorNode?

electrum (Member) commented Jul 26, 2022

I don't understand the question, but the produced plan seems to be correct. We have these stages in the plan:

(scan[target], scan[source]) -> RightJoin -> MergeProcessor -> MergeWriter -> TableCommit

Take a look at the supporting-merge.rst which might help explain the structure.
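As a toy illustration of why the join stage must preserve the source side (this is not the engine's actual operator): a right join keeps every source row, attaching the matching target row id, or a null for unmatched rows, and the unmatched rows are exactly the ones that can drive WHEN NOT MATCHED inserts. The names right_join and target_row_id are illustrative, and the sketch assumes unique join keys in the target.

```python
def right_join(target_rows, source_rows, key):
    """Toy model of the RightJoin stage in the MERGE plan: every source row
    survives, paired with the matching target row id or None when unmatched.
    Assumes each join key appears at most once in the target."""
    target_by_key = {row[key]: row for row in target_rows}
    joined = []
    for src in source_rows:
        match = target_by_key.get(src[key])
        joined.append({
            "source": src,
            # None marks a source row with no target match.
            "target_row_id": match["row_id"] if match else None,
        })
    return joined

target = [{"row_id": 0, "id": 1, "qty": 5}]
source = [{"id": 1, "qty": 9}, {"id": 2, "qty": 3}]
print(right_join(target, source, "id"))
```

An inner or flipped join would drop the id=2 source row here, which is why the join's output semantics must be preserved regardless of how the optimizer arranges the plan.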

sopel39 (Member) commented Jul 27, 2022

How do you make sure RightJoin does not participate in CBO and doesn't get reordered or flipped?


Why would that matter? If the CBO is working properly, it should not change the output of the join operation, so if it has a better plan, that should be OK, right?

dain (Member) left a comment

Looks good to me


Maybe a better message here

djsagain (Member, Author) replied:

Changed to:

The target table in Hive MERGE must be a transactional table


Comment on lines 43 to 45

We should mention what happens if no clauses match the row and there is no default WHEN MATCHED clause.

djsagain (Member, Author) replied:

Added:

If a source row is not matched by any ``WHEN`` clause and there is no
``WHEN NOT MATCHED`` clause, the source row is ignored.
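The rule above can be demonstrated with a small simulation. This is a deliberately simplified Python model of MERGE clause dispatch, not Trino code: it supports at most one unconditional WHEN MATCHED handler and one WHEN NOT MATCHED handler (real MERGE clauses can carry conditions), and it shows that a source row matching neither clause is silently ignored.

```python
def merge(target, source, on_key, when_matched=None, when_not_matched=None):
    """Toy MERGE semantics: one pass over source rows against a keyed target.

    A source row that matches no clause (e.g. it is unmatched and there is
    no WHEN NOT MATCHED handler) is simply ignored.
    """
    result = dict(target)  # target rows keyed by the join key
    for row in source:
        key = row[on_key]
        if key in result:
            if when_matched is not None:
                result[key] = when_matched(result[key], row)
        elif when_not_matched is not None:
            result[key] = when_not_matched(row)
        # else: source row matches no clause and is ignored
    return result

target = {1: {"id": 1, "qty": 5}}
source = [{"id": 1, "qty": 9}, {"id": 2, "qty": 3}]
# With only a WHEN MATCHED update, the unmatched source row (id=2) is ignored.
print(merge(target, source, "id", when_matched=lambda t, s: {**t, "qty": s["qty"]}))
```

Adding a when_not_matched handler to the same call would instead insert the id=2 row, mirroring a ``WHEN NOT MATCHED ... THEN INSERT`` clause.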

electrum and others added 12 commits August 4, 2022 14:47
This version works under emulation on M1 Macs.

This allows the engine to make the decision about how many nodes to
use as appropriate, based on the number of workers or hash partition
count session property. This is also required for MERGE so that the
insert and update layouts can use the same mapping.

This commit adds support for SQL MERGE in the Trino engine.
It introduces an enum RowChangeParadigm, which characterizes
how a connector modifies rows.  Hive and Iceberg will use the
DELETE_ROW_AND_INSERT_ROW paradigm, since they represent an
updated row as a deleted row and an inserted row.  Kudu will
use the CHANGE_ONLY_UPDATED_COLUMNS paradigm.

Each paradigm corresponds to an implementation of the
RowChangeProcessor interface.  The intent is to retrofit SQL
UPDATE to use the same RowChangeParadigm/Processor mechanism.

The SQL MERGE implementation allows update of all columns,
including partition or bucket columns, and the Trino engine
performs redistribution to ensure that the updated rows
end up on the appropriate nodes.

MERGE processing is extensively documented in the new
file in the developer documentation, supporting-merge.rst.

This commit adds SQL MERGE support in the Hive connector and
a raft of MERGE tests to verify that it works.

Development

Successfully merging this pull request may close these issues.

MERGE statement

9 participants