
Conversation

@chenjian2664
Contributor

Description

#16661

Additional context and related issues

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Phoenix
* Support SQL MERGE. ({issue}`16661`)

@cla-bot cla-bot bot added the cla-signed label Mar 23, 2023
@chenjian2664 chenjian2664 requested review from djsagain and ebyhr March 23, 2023 12:15
@chenjian2664 chenjian2664 self-assigned this Mar 23, 2023
@chenjian2664 chenjian2664 added the enhancement New feature or request label Mar 23, 2023
@hashhar hashhar self-requested a review March 23, 2023 12:31
@ebyhr ebyhr removed their request for review March 23, 2023 22:37
Member

@djsagain djsagain left a comment

In general, the implementation of the merge-specific ConnectorMetadata methods looks good to me, and I really like all the consistency checking you do in this PR.

I'm not approving, because I'm not familiar with the Jdbc connectors. I think you need to get an expert on Jdbc connectors to also review the PR. When they say they like what they see, I'll approve as well.

@chenjian2664
Contributor Author

In general, the implementation of the merge-specific ConnectorMetadata methods looks good to me, and I really like all the consistency checking you do in this PR.

Thanks for your time! I have learnt a lot from your code.

I'm not approving, because I'm not familiar with the Jdbc connectors. I think you need to get an expert on Jdbc connectors to also review the PR. When they say they like what they see, I'll approve as well.

cc @hashhar @kokosing

@chenjian2664 chenjian2664 requested a review from kokosing March 28, 2023 13:25
@kokosing kokosing requested a review from vlad-lyutenko March 29, 2023 08:19
Member

@kokosing kokosing left a comment

Can we first implement this in the base-jdbc connector and then use it for Phoenix? Phoenix was based on that library and I would like to keep it that way; otherwise I am afraid that base-jdbc and Phoenix would diverge.

@kokosing kokosing requested a review from wendigo March 29, 2023 08:23
@chenjian2664
Contributor Author

Can we first implement this in the base-jdbc connector and then use it for Phoenix? Phoenix was based on that library and I would like to keep it that way; otherwise I am afraid that base-jdbc and Phoenix would diverge.

@kokosing Do you mean supporting MERGE in the base-jdbc module and then letting Phoenix reuse it?

@kokosing
Member

kokosing commented Mar 29, 2023

Do you mean supporting MERGE in the base-jdbc module and then letting Phoenix reuse it?

Yes. But maybe we can do it the other way round too. My second thought is that doing it first in base-jdbc could be premature.

@chenjian2664
Contributor Author

Do you mean supporting MERGE in the base-jdbc module and then letting Phoenix reuse it?

Yes. But maybe we can do it the other way round too. My second thought is that doing it first in base-jdbc could be premature.

Yes. My plan is to contribute to Phoenix and Ignite first, then abstract the implementation into base-jdbc. I was also thinking about contributing to base-jdbc at the beginning of the work.

@vlad-lyutenko
Contributor

vlad-lyutenko commented Mar 29, 2023

I am really new to this part of Trino.
But can you please clarify for me a few things:

As far as I understand, getMergeRowIdColumnHandle is used by the engine to determine which unique row identifier of the TARGET table it can use to perform later UPDATEs (for Phoenix, when we do an UPSERT, the rowkey will be used). So we call this method and effectively force-add this column handle to the table handle of the TARGET table in StatementAnalyzer:

if (addRowIdColumn) {
    // Add the row id field
    ColumnHandle rowIdColumnHandle = metadata.getMergeRowIdColumnHandle(session, tableHandle.get());
    Type type = metadata.getColumnMetadata(session, tableHandle.get(), rowIdColumnHandle).getType();
    Field field = Field.newUnqualified(Optional.empty(), type);
    fields.add(field);
    analysis.setColumn(field, rowIdColumnHandle);
}

In your code I see that you use $merge_row_id as the name, and then you map the internal Phoenix ROWKEY column to this synthetic $merge_row_id (in the PageSource, etc.).

So my question is: why can't we use ROWKEY directly instead of $merge_row_id? We could force-add this column to the table handle in beginMerge, then forget about it and treat it as a normal column, without doing this redirection mapping in the PageSource and later in the MergeSink.
Something similar to the Kudu connector.

Your current approach looks super helpful if, for example, we want to use some primary key column as $merge_row_id. But if we already have this nice unique identifier, ROWKEY, why not just use it as it is?
But maybe I am missing something. Thanks!
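
For illustration, here is a minimal, self-contained sketch of the two options being compared here: a synthetic $merge_row_id handle that the connector resolves later, versus exposing the physical ROWKEY directly. It uses plain Java stand-ins rather than the Trino SPI, and every class and method name in it is invented for this sketch rather than taken from the PR.

import java.util.List;

// Simplified, self-contained model of the choice discussed above -- NOT Trino SPI code.
// Every name here (ColumnRef, syntheticRowId, ...) is invented for this sketch.
public class MergeRowIdOptions
{
    // Models the one column the engine needs from the connector: a unique row identifier
    // of the target table that it can feed back to the merge sink.
    record ColumnRef(String name, boolean synthetic) {}

    // Approach described in this PR discussion: a synthetic "$merge_row_id" column that
    // the connector later resolves to ROWKEY (or the primary-key columns).
    static ColumnRef syntheticRowId()
    {
        return new ColumnRef("$merge_row_id", true);
    }

    // Alternative raised in this comment: expose the physical ROWKEY directly, Kudu-style,
    // and treat it as a normal column with no extra mapping in the page source or sink.
    static ColumnRef physicalRowkey()
    {
        return new ColumnRef("ROWKEY", false);
    }

    public static void main(String[] args)
    {
        // Either way, the engine appends the row-id column to the target-table scan,
        // so every scanned row carries the identifier used for later UPDATE/DELETE.
        List<ColumnRef> scanColumns = List.of(new ColumnRef("id", false), new ColumnRef("name", false), syntheticRowId());
        scanColumns.forEach(column -> System.out.println(column.name() + (column.synthetic() ? " (synthetic)" : "")));
    }
}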

@chenjian2664
Contributor Author

chenjian2664 commented Mar 30, 2023

@vlad-lyutenko Thank you for reviewing my code! This is a really good question! I initially considered doing it like Kudu; the short conclusion is that the adjustments would not be fewer than with the current approach, and it would be hard to extend to other JDBC connectors if we followed the Kudu way. The major factor is that when performing operations (SELECT, DELETE, UPDATE), JDBC needs the columns to be specified explicitly, while the KuduClient can use the API the client supports directly. For the ROWKEY column, Kudu sets the value (in Trino), but Phoenix assigns the value from a Phoenix sequence.
If the table already defines the ROWKEY, we can use the ROWKEY directly, but when it does not define a ROWKEY and instead specifies multiple primary keys, we still need a synthetic column to describe them.
There are two places to do the "mapping": one is mapping in the PageSource, the other is mapping in the RecordCursor and RecordSet. The latter is what Kudu does; let's see what, at a minimum, we would need to do it that way in Phoenix:

  1. We need a new column handle that extends JdbcColumnHandle and adds something like ordinalPosition, and all the getters would have to be updated. Also note that JdbcColumnHandle is a final class. And of course the RecordCursor and RecordSet would also need to be adjusted for it.
  2. Define something like PartialRow to store the primary keys while reading from the cursor, and an encoder class like KeyEncoderAccessor in Kudu to encode and decode the primary keys from the ResultSet. Note that the JDBC connectors do not force the primary key columns to always come before the non-primary-key columns, so more work is needed here than in Kudu. After that we still need a method that lets the client delete using the row, which Kudu has but Phoenix does not (or we could use the existing appendColumn method, but then we could not ignore the ROWKEY column).
  3. When doing DELETE and UPDATE through JDBC statements (the Kudu client has an API that works on the row directly), we need to describe the ROWKEY column with a proper JDBC type (maybe ROW), and customize JdbcPageSink when building the columnWriters; here we cannot ignore the ROWKEY column, and the logic is similar to what the PageSource does.
  4. When scanning (see the Phoenix client's buildSql), if we do not handle the extra ROWKEY in the PageSource, then we need to handle that logic here; unlike Kudu, which can open a "table" directly through the client session, we cannot forget about the ROWKEY here.

Other adjustments may also be needed, for example changing the ROWKEY type, since BIGINT may not be able to store the information about multiple primary keys.
I am not sure I answered your question clearly; any feedback would be appreciated.
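
To make the PageSource mapping option above concrete, here is a minimal hedged sketch. It uses plain JDBC types instead of the actual Trino page source classes, and the column names plus the assumption of a single physical ROWKEY column are simplifications for illustration, not the PR's real code.

import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;

// Illustrative sketch of "mapping in the PageSource" -- not the PR's actual implementation.
public class RowIdRedirectingReader
{
    private static final String MERGE_ROW_ID = "$merge_row_id";
    private static final String PHYSICAL_ROWKEY = "ROWKEY";

    // Reads one projected row from a JDBC result set.
    static Object[] readRow(ResultSet resultSet, List<String> projectedColumns)
            throws SQLException
    {
        Object[] row = new Object[projectedColumns.size()];
        for (int i = 0; i < projectedColumns.size(); i++) {
            String column = projectedColumns.get(i);
            // Ordinary columns are read by their own name; the synthetic $merge_row_id
            // is redirected to the physical ROWKEY column of the underlying result set.
            String physicalName = MERGE_ROW_ID.equals(column) ? PHYSICAL_ROWKEY : column;
            row[i] = resultSet.getObject(physicalName);
        }
        return row;
    }
}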

@vlad-lyutenko
Contributor

I am not sure I answered your question clearly; any feedback would be appreciated.

Big thanks! You clarified things for me.

Member

@hashhar hashhar left a comment

First pass, looks good. There are some places where I need to take a deeper look.

Member

@hashhar hashhar left a comment

Looks awesome. Can you verify whether MERGE works for non-lowercase tables as well?

Looks good to go other than that.
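
For illustration, such a check could look like the sketch below, issued through the Trino JDBC driver. The connection URL, user, catalog/schema, and the MixedCaseTarget table and its columns are placeholders invented for this example, and it assumes (as discussed earlier in this thread) that the Phoenix connector generates a ROWKEY when no primary key is declared.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hedged example of verifying MERGE against a non-lowercase table name.
public class MergeMixedCaseCheck
{
    public static void main(String[] args)
            throws Exception
    {
        try (Connection connection = DriverManager.getConnection("jdbc:trino://localhost:8080/phoenix/default", "admin", null);
                Statement statement = connection.createStatement()) {
            // Quoted identifiers preserve the mixed-case table name.
            statement.executeUpdate("CREATE TABLE \"MixedCaseTarget\" (id BIGINT, name VARCHAR)");
            statement.executeUpdate("INSERT INTO \"MixedCaseTarget\" VALUES (1, 'alice')");
            // One matched row (updated) and one unmatched row (inserted).
            statement.executeUpdate(
                    "MERGE INTO \"MixedCaseTarget\" t " +
                    "USING (VALUES (1, 'alicia'), (2, 'bob')) AS s(id, name) " +
                    "ON t.id = s.id " +
                    "WHEN MATCHED THEN UPDATE SET name = s.name " +
                    "WHEN NOT MATCHED THEN INSERT (id, name) VALUES (s.id, s.name)");
            try (ResultSet rows = statement.executeQuery("SELECT id, name FROM \"MixedCaseTarget\" ORDER BY id")) {
                while (rows.next()) {
                    System.out.println(rows.getLong("id") + " -> " + rows.getString("name"));
                }
            }
        }
    }
}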

@hashhar
Member

hashhar commented Mar 31, 2023

FYI @lhofhansl, you may be interested in this.

Thanks @chenjian2664 for working on this. I'll merge once CI is finished.

@chenjian2664
Contributor Author

chenjian2664 commented Apr 1, 2023

For https://github.com/trinodb/trino/actions/runs/4579617258/jobs/8090257129?pr=16693#step:6:5787, I think it is NOT caused by this PR.
I have run it in a loop and cannot reproduce the problem locally.

@hashhar
Member

hashhar commented Apr 3, 2023

Much nicer, thanks @raunaqmorarka for the ColumnAdaptation idea.


@hashhar hashhar merged commit 6df669b into trinodb:master Apr 6, 2023
@github-actions github-actions bot added this to the 413 milestone Apr 6, 2023
@chenjian2664 chenjian2664 deleted the merge_support branch May 22, 2023 22:16