
Conversation

@chenjian2664
Contributor

Description

#16661

Additional context and related issues

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Phoenix
* Support SQL MERGE. ({issue}`16661`)

@cla-bot cla-bot bot added the cla-signed label Mar 23, 2023
@chenjian2664 chenjian2664 requested review from djsagain and ebyhr March 23, 2023 12:15
@chenjian2664 chenjian2664 self-assigned this Mar 23, 2023
@chenjian2664 chenjian2664 added the enhancement New feature or request label Mar 23, 2023
@hashhar hashhar self-requested a review March 23, 2023 12:31
@ebyhr ebyhr removed their request for review March 23, 2023 22:37
Member

@djsagain djsagain left a comment

In general, the implementation of the merge-specific ConnectorMetadata methods looks good to me, and I really like all the consistency checking you do in this PR.

I'm not approving, because I'm not familiar with the Jdbc connectors. I think you need to get an expert on Jdbc connectors to also review the PR. When they say they like what they see, I'll approve as well.

@chenjian2664
Contributor Author

In general, the implementation of the merge-specific ConnectorMetadata methods looks good to me, and I really like all the consistency checking you do in this PR.

Thanks for your time! I have learnt a lot from your code.

I'm not approving, because I'm not familiar with the Jdbc connectors. I think you need to get an expert on Jdbc connectors to also review the PR. When they say they like what they see, I'll approve as well.

cc @hashhar @kokosing

@chenjian2664 chenjian2664 requested a review from kokosing March 28, 2023 13:25
@kokosing kokosing requested a review from vlad-lyutenko March 29, 2023 08:19
Member

@kokosing kokosing left a comment

Can we first implement this in the base-jdbc connector and then use it for Phoenix? Phoenix was based on that library and I would like to keep it that way; otherwise I am afraid that base-jdbc and Phoenix would diverge.

@kokosing kokosing requested a review from wendigo March 29, 2023 08:23
@chenjian2664
Contributor Author

Can we first implement this in the base-jdbc connector and then use it for Phoenix? Phoenix was based on that library and I would like to keep it that way; otherwise I am afraid that base-jdbc and Phoenix would diverge.

@kokosing Do you mean supporting MERGE in the base-jdbc module and then letting Phoenix reuse it?

@kokosing
Member

kokosing commented Mar 29, 2023

Do you mean supporting MERGE in the base-jdbc module and then letting Phoenix reuse it?

Yes. But maybe we can do it the other way round too. My second thought is that doing it first in base-jdbc could be premature.

@chenjian2664
Contributor Author

Do you mean supporting MERGE in the base-jdbc module and then letting Phoenix reuse it?

Yes. But maybe we can do it the other way round too. My second thought is that doing it first in base-jdbc could be premature.

Yes. My plan is to contribute to Phoenix and Ignite first, then abstract the implementation into base-jdbc. I was also thinking about contributing to base-jdbc at the beginning of the work.

@vlad-lyutenko
Contributor

vlad-lyutenko commented Mar 29, 2023

I am really new to this part of Trino.
But can you please clarify for me a few things:

As far as I understand, getMergeRowIdColumnHandle is used by the engine to determine which unique row identifier of the TARGET table it can use to perform later UPDATEs (for Phoenix, when we do an UPSERT, the rowkey will be used). So we call this method and effectively force-add this column handle to the table handle of the TARGET table in StatementAnalyzer:

if (addRowIdColumn) {
    // Add the row id field
    ColumnHandle rowIdColumnHandle = metadata.getMergeRowIdColumnHandle(session, tableHandle.get());
    Type type = metadata.getColumnMetadata(session, tableHandle.get(), rowIdColumnHandle).getType();
    Field field = Field.newUnqualified(Optional.empty(), type);
    fields.add(field);
    analysis.setColumn(field, rowIdColumnHandle);
}

In your code I see that you use $merge_row_id as the name, and then you map the internal Phoenix ROWKEY column to this synthetic $merge_row_id (in the PageSource, etc.).

So my question is: why can't we use ROWKEY directly instead of $merge_row_id? We could force-add this column to the table handle in beginMerge, then forget about it and treat it as a normal column, without doing this redirection mapping in the PageSource and later in the MergeSink.
Something similar to the Kudu connector.

Your current approach looks super helpful if, for example, we want to use some primary key column as $merge_row_id. But if we already have this nice unique identifier, ROWKEY, why not just use it as it is?
But maybe I am missing something. Thanks!
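
For illustration, here is a minimal, self-contained sketch of the two options being compared here: a synthetic $merge_row_id handle that the connector resolves later, versus exposing the physical ROWKEY directly. It uses plain Java stand-ins rather than the Trino SPI, and every class and method name in it is invented for this sketch rather than taken from the PR.

import java.util.List;

// Simplified, self-contained model of the choice discussed above -- NOT Trino SPI code.
// Every name here (ColumnRef, syntheticRowId, ...) is invented for this sketch.
public class MergeRowIdOptions
{
    // Models the one column the engine needs from the connector: a unique row identifier
    // of the target table that it can feed back to the merge sink.
    record ColumnRef(String name, boolean synthetic) {}

    // Approach described in this PR discussion: a synthetic "$merge_row_id" column that
    // the connector later resolves to ROWKEY (or the primary-key columns).
    static ColumnRef syntheticRowId()
    {
        return new ColumnRef("$merge_row_id", true);
    }

    // Alternative raised in this comment: expose the physical ROWKEY directly, Kudu-style,
    // and treat it as a normal column with no extra mapping in the page source or sink.
    static ColumnRef physicalRowkey()
    {
        return new ColumnRef("ROWKEY", false);
    }

    public static void main(String[] args)
    {
        // Either way, the engine appends the row-id column to the target-table scan,
        // so every scanned row carries the identifier used for later UPDATE/DELETE.
        List<ColumnRef> scanColumns = List.of(new ColumnRef("id", false), new ColumnRef("name", false), syntheticRowId());
        scanColumns.forEach(column -> System.out.println(column.name() + (column.synthetic() ? " (synthetic)" : "")));
    }
}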

@chenjian2664
Contributor Author

chenjian2664 commented Mar 30, 2023

@vlad-lyutenko Thank you for reviewing my code! This is a really good question! I initially considered doing it like Kudu; the short conclusion is that the adjustments would not be fewer than with the current approach, and it would be hard to extend to other JDBC connectors if we followed the Kudu way. The major factor is that when performing operations (SELECT, DELETE, UPDATE), JDBC needs the columns to be specified explicitly, while the KuduClient can use the API the client supports directly. For the ROWKEY column, Kudu sets the value (in Trino), but Phoenix assigns the value from a Phoenix sequence.
If the table already defines the ROWKEY, we can use the ROWKEY directly, but when it does not define a ROWKEY and instead specifies multiple primary keys, we still need a synthetic column to describe them.
There are two places to do the "mapping": one is mapping in the PageSource, the other is mapping in the RecordCursor and RecordSet. The latter is what Kudu does; let's see what, at a minimum, we would need to do it that way in Phoenix:

  1. We need a new column handle that extends JdbcColumnHandle and adds something like ordinalPosition, and all the getters would have to be updated. Also note that JdbcColumnHandle is a final class. And of course the RecordCursor and RecordSet would also need to be adjusted for it.
  2. Define something like PartialRow to store the primary keys while reading from the cursor, and an encoder class like KeyEncoderAccessor in Kudu to encode and decode the primary keys from the ResultSet. Note that the JDBC connectors do not force the primary key columns to always come before the non-primary-key columns, so more work is needed here than in Kudu. After that we still need a method that lets the client delete using the row, which Kudu has but Phoenix does not (or we could use the existing appendColumn method, but then we could not ignore the ROWKEY column).
  3. When doing DELETE and UPDATE through JDBC statements (the Kudu client has an API that works on the row directly), we need to describe the ROWKEY column with a proper JDBC type (maybe ROW), and customize JdbcPageSink when building the columnWriters; here we cannot ignore the ROWKEY column, and the logic is similar to what the PageSource does.
  4. When scanning (see the Phoenix client's buildSql), if we do not handle the extra ROWKEY in the PageSource, then we need to handle that logic here; unlike Kudu, which can open a "table" directly through the client session, we cannot forget about the ROWKEY here.

Other adjustments may also be needed, for example changing the ROWKEY type, since BIGINT may not be able to store the information about multiple primary keys.
I am not sure I answered your question clearly; any feedback would be appreciated.
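
To make the PageSource mapping option above concrete, here is a minimal hedged sketch. It uses plain JDBC types instead of the actual Trino page source classes, and the column names plus the assumption of a single physical ROWKEY column are simplifications for illustration, not the PR's real code.

import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;

// Illustrative sketch of "mapping in the PageSource" -- not the PR's actual implementation.
public class RowIdRedirectingReader
{
    private static final String MERGE_ROW_ID = "$merge_row_id";
    private static final String PHYSICAL_ROWKEY = "ROWKEY";

    // Reads one projected row from a JDBC result set.
    static Object[] readRow(ResultSet resultSet, List<String> projectedColumns)
            throws SQLException
    {
        Object[] row = new Object[projectedColumns.size()];
        for (int i = 0; i < projectedColumns.size(); i++) {
            String column = projectedColumns.get(i);
            // Ordinary columns are read by their own name; the synthetic $merge_row_id
            // is redirected to the physical ROWKEY column of the underlying result set.
            String physicalName = MERGE_ROW_ID.equals(column) ? PHYSICAL_ROWKEY : column;
            row[i] = resultSet.getObject(physicalName);
        }
        return row;
    }
}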

@vlad-lyutenko
Contributor

I am not sure I answered your question clearly; any feedback would be appreciated.

Big thanks! You clarified things for me.

Member

@hashhar hashhar left a comment

First pass, looks good. There are some places where I need to take a deeper look.

Member

@hashhar hashhar left a comment

Looks awesome. Can you verify whether MERGE works for non-lowercase tables as well?

Looks good to go other than that.
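
For illustration, such a check could look like the sketch below, issued through the Trino JDBC driver. The connection URL, user, catalog/schema, and the MixedCaseTarget table and its columns are placeholders invented for this example, and it assumes (as discussed earlier in this thread) that the Phoenix connector generates a ROWKEY when no primary key is declared.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hedged example of verifying MERGE against a non-lowercase table name.
public class MergeMixedCaseCheck
{
    public static void main(String[] args)
            throws Exception
    {
        try (Connection connection = DriverManager.getConnection("jdbc:trino://localhost:8080/phoenix/default", "admin", null);
                Statement statement = connection.createStatement()) {
            // Quoted identifiers preserve the mixed-case table name.
            statement.executeUpdate("CREATE TABLE \"MixedCaseTarget\" (id BIGINT, name VARCHAR)");
            statement.executeUpdate("INSERT INTO \"MixedCaseTarget\" VALUES (1, 'alice')");
            // One matched row (updated) and one unmatched row (inserted).
            statement.executeUpdate(
                    "MERGE INTO \"MixedCaseTarget\" t " +
                    "USING (VALUES (1, 'alicia'), (2, 'bob')) AS s(id, name) " +
                    "ON t.id = s.id " +
                    "WHEN MATCHED THEN UPDATE SET name = s.name " +
                    "WHEN NOT MATCHED THEN INSERT (id, name) VALUES (s.id, s.name)");
            try (ResultSet rows = statement.executeQuery("SELECT id, name FROM \"MixedCaseTarget\" ORDER BY id")) {
                while (rows.next()) {
                    System.out.println(rows.getLong("id") + " -> " + rows.getString("name"));
                }
            }
        }
    }
}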

@hashhar
Member

hashhar commented Mar 31, 2023

FYI @lhofhansl, you may be interested in this.

Thanks @chenjian2664 for working on this. I'll merge once CI is finished.

@chenjian2664
Contributor Author

chenjian2664 commented Apr 1, 2023

For https://github.com/trinodb/trino/actions/runs/4579617258/jobs/8090257129?pr=16693#step:6:5787, I think it is NOT caused by this PR.
I have run it in a loop and cannot reproduce the problem locally.

@hashhar
Member

hashhar commented Apr 3, 2023

Much nicer, thanks @raunaqmorarka for the ColumnAdaptation idea.


@hashhar hashhar merged commit 6df669b into trinodb:master Apr 6, 2023
@github-actions github-actions bot added this to the 413 milestone Apr 6, 2023
@chenjian2664 chenjian2664 deleted the merge_support branch May 22, 2023 22:16