Support metadata DELETE in JDBC connectors#6287
Conversation
Fixes #5275

How does this interact with predicate pushdown?

It is based on predicate pushdown. If the predicate is fully pushed down, then we can call …
Force-pushed 86c4765 to 44da5b6
Force-pushed 44da5b6 to e5fa7d1
What is additionalPredicate? Is it somehow connected with limit? If not, why don't you need it in delete?
It is not. When the connector creates splits, it can use additionalPredicate to run JDBC queries in parallel: one split where key_column % 2 = 0 and another where key_column % 2 = 1, so you have 2 splits.
In this case you cannot delete something which does not exist (or maybe you can, because it could somehow exist?), and it is safe not to provide additionalPredicate for delete. However, consider the situation where you cannot match indexes in the underlying system that were created with such additionalPredicate conditions.
This code is not executed for delete. buildSql is called only for SELECT; DELETE uses buildDeleteSql, where there is no additionalPredicate.
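To illustrate the distinction, here is a minimal sketch in plain Java. The helper names (buildSelectSql, buildDeleteSql) are hypothetical stand-ins for the BaseJdbcClient methods discussed above, not the actual Trino code: a per-split SELECT ANDs the split's additional predicate onto the pushed-down predicate, while the metadata DELETE is built from the pushed-down predicate alone.

```java
import java.util.Optional;

public class SqlBuilderSketch
{
    // Stand-in for buildSql: the split's additional predicate (if any)
    // is ANDed onto the predicate pushed down from the query.
    static String buildSelectSql(String table, String pushedDownPredicate, Optional<String> additionalPredicate)
    {
        String where = additionalPredicate
                .map(split -> "(" + split + ") AND (" + pushedDownPredicate + ")")
                .orElse(pushedDownPredicate);
        return "SELECT * FROM " + table + " WHERE " + where;
    }

    // Stand-in for buildDeleteSql: no splits are involved in a metadata
    // delete, so only the pushed-down predicate appears.
    static String buildDeleteSql(String table, String pushedDownPredicate)
    {
        return "DELETE FROM " + table + " WHERE " + pushedDownPredicate;
    }

    public static void main(String[] args)
    {
        System.out.println(buildSelectSql("a", "year = 2020", Optional.of("key_column % 2 = 0")));
        System.out.println(buildDeleteSql("a", "year = 2020"));
    }
}
```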
Yeah, I meant exactly that place (I started this discussion here because there is no additionalPredicate in the code you added for DELETE).
Let me clarify my comment.
Imagine a situation where you have some data (key_column, year, name) in table A.
And there are 2 splits: one where key_column % 2 = 0 and another where key_column % 2 = 1.
Usually the user executes selects on A: select name from A where year = <year>, which translates into two queries, one per split:
select name from A where key_column % 2 = 0 and year = <year>
select name from A where key_column % 2 = 1 and year = <year>
To optimize these queries the downstream system has CREATE INDEX a_idx ON a (key_column, year).
If you don't propagate additionalPredicate to DELETE, this could end up with a_idx not being used for DELETE.
Could you please check whether such a scenario is covered?
If you don't propagate additionalPredicate to DELETE, this could end up with a_idx not being used for DELETE.
I don't care about that. It does not have to be highly performant.
Notice that for this metadata delete no splits are generated, hence there is no additional predicate, nor even actual query execution in Presto. It all happens during "planning".
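A rough model of what "it all happens during planning" means: the metadata-delete path is only taken when the predicate is fully pushed down into the table handle; otherwise the engine would have to fall back to row-by-row delete. This is a toy sketch with invented types, not the Trino SPI (the real applyDelete operates on ConnectorTableHandle):

```java
import java.util.Optional;

public class MetadataDeleteSketch
{
    // Toy table handle: a pushed-down predicate plus a flag saying whether
    // any part of the original predicate could NOT be pushed down.
    record Handle(String table, String pushedDownPredicate, boolean hasRemainingFilter) {}

    // Analogue of the planning-time decision: succeed only when the whole
    // predicate lives in the handle, so a single DELETE statement suffices
    // and no splits (and no per-split additional predicates) are needed.
    static Optional<String> applyDelete(Handle handle)
    {
        if (handle.hasRemainingFilter()) {
            return Optional.empty(); // would require row-by-row delete
        }
        return Optional.of("DELETE FROM " + handle.table() + " WHERE " + handle.pushedDownPredicate());
    }

    public static void main(String[] args)
    {
        System.out.println(applyDelete(new Handle("a", "year = 2020", false)));
        System.out.println(applyDelete(new Handle("a", "year = 2020", true)));
    }
}
```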
additionalPredicate is not about performance, but about correctness.
Anyway, this is a split's attribute, and no splits are involved at this point yet, AFAICT.
presto-base-jdbc/src/main/java/io/prestosql/plugin/jdbc/JdbcMetadata.java
Can it be non-null when beginDelete throws an exception?
beginDelete is not called if the delete can be done as a metadata delete (applyDelete).
If supportsDelete=true, why does this test never call assertQuerySucceeds("DELETE FROM "…)?
Could you please add such a scenario? The same goes for the other connectors.
Added a comment below, which should answer your question
// PostgreSql connector currently does not support row-by-row delete
Thanks for the comment.
So if supportsDelete=true and the connector currently does not support row-by-row delete, I'd expect that this connector supports some other type of delete. I see you added testDeleteWithSimplePredicate and testDeleteEntireTable. Maybe it is worth renaming this method to something like testDeleteRowByRow?
Force-pushed e5fa7d1 to c63ddb6
// SqlServer connector currently does not support row-by-row delete
assertQueryFails("DELETE FROM " + tableName + " WHERE orderkey % 2 = 0", "Not supported delete");
presto-oracle/src/test/java/io/prestosql/plugin/oracle/BaseTestOracleDistributedQueries.java
Force-pushed c63ddb6 to 8542efc
findepi left a comment:
Looks nice, but there are some parts I do not understand.
presto-base-jdbc/src/main/java/io/prestosql/plugin/jdbc/BaseJdbcClient.java
This looks like common code which could be unified. Thoughts?
I thought I had already unified this by extracting the applyParameters method. Do you have any idea what else could be done here?
presto-hive/src/test/java/io/prestosql/plugin/hive/TestHiveDistributedQueries.java
presto-iceberg/src/test/java/io/prestosql/plugin/iceberg/TestIcebergDistributed.java
The only difference is that the commit is missing. Why?
presto-testing/src/main/java/io/prestosql/testing/AbstractTestDistributedQueries.java
presto-base-jdbc/src/main/java/io/prestosql/plugin/jdbc/BaseJdbcClient.java
Force-pushed 8542efc to 6a8e391
presto-base-jdbc/src/main/java/io/prestosql/plugin/jdbc/JdbcMetadata.java
I added io.trino.testing.AbstractTestDistributedQueries#testDeleteWithSimplePredicate, which is executed for MySQL and covers a very simple predicate pushdown.
Notice that I added a predicate with random here. I don't want to hardcode what is supported and to what extent. Delete is supported if predicate pushdown is supported. Predicate pushdown support changes over time, so such a test would need to be updated frequently.
It would have to be overridden for all JDBC connectors; the testDelete test requires row-by-row delete, which JDBC connectors won't have for a while.
Maybe I could add supportsRowByRowDelete()? (I don't like it.)
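A sketch of the behavior-flag idea being debated: one reusable test that branches on declared connector behaviors instead of being overridden per connector. The enum and method names here are hypothetical (SUPPORTS_ROW_LEVEL_DELETE stands in for the proposed supportsRowByRowDelete()); this is modeled loosely on TestingConnectorBehavior, not copied from it:

```java
import java.util.EnumSet;
import java.util.Set;

public class ConnectorBehaviorSketch
{
    // Hypothetical behavior flags a connector test class would declare.
    enum Behavior { SUPPORTS_DELETE, SUPPORTS_ROW_LEVEL_DELETE }

    private final Set<Behavior> behaviors;

    ConnectorBehaviorSketch(Set<Behavior> behaviors)
    {
        this.behaviors = behaviors;
    }

    boolean hasBehavior(Behavior behavior)
    {
        return behaviors.contains(behavior);
    }

    // One reusable test method: which cases it runs depends only on the
    // declared behaviors, so no per-connector override is needed.
    String testDelete()
    {
        if (!hasBehavior(Behavior.SUPPORTS_DELETE)) {
            return "expect failure: This connector does not support deletes";
        }
        if (!hasBehavior(Behavior.SUPPORTS_ROW_LEVEL_DELETE)) {
            return "run only metadata-delete cases";
        }
        return "run full row-by-row delete cases";
    }

    public static void main(String[] args)
    {
        System.out.println(new ConnectorBehaviorSketch(EnumSet.of(Behavior.SUPPORTS_DELETE)).testDelete());
    }
}
```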
It was failing because one cannot call commit when autocommit=true. Now I set autocommit explicitly, so I can remove the explicit commit.
The tuple domain is not used to delete row-by-row, but rather to remove by predicate. By row-by-row I mean a delete query execution in Trino where we collect the ids of the rows to be deleted and then remove them (the beginDelete and finishDelete methods).
I think we have to separate openConnection into two methods: openReadOnlyConnection (readonly=true, autocommit=false) and openConnectionForWrite (readonly=false, autocommit=true). But not in this PR.
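The proposed split could be modeled like this. To keep the sketch self-contained it uses a plain value object for the readOnly/autoCommit settings rather than a real java.sql.Connection; the method names follow the comment above, but this is only an illustration of the contract, not the BaseJdbcClient implementation:

```java
public class ConnectionModeSketch
{
    // Plain value object standing in for the flags that would be applied
    // to a real java.sql.Connection via setReadOnly/setAutoCommit.
    record ConnectionSettings(boolean readOnly, boolean autoCommit) {}

    // Reads keep a transaction open (autocommit=false).
    static ConnectionSettings openReadOnlyConnection()
    {
        return new ConnectionSettings(true, false);
    }

    // Writes commit each statement immediately (autocommit=true), which is
    // what lets the explicit commit() call be removed, as noted earlier.
    static ConnectionSettings openConnectionForWrite()
    {
        return new ConnectionSettings(false, true);
    }

    public static void main(String[] args)
    {
        System.out.println(openReadOnlyConnection());
        System.out.println(openConnectionForWrite());
    }
}
```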
Isn't it the default for JDBC?
It looks like it is not.
It actually is, but Phoenix and ClickHouse have it wrong. Addressed in #7875.
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHiveDistributedQueries.java
Can this test set up an ACID table?
I guess the file metastore is not supported for ACID, right? cc @djsstarburst
...no-postgresql/src/test/java/io/trino/plugin/postgresql/TestPostgreSqlDistributedQueries.java
...trino-sqlserver/src/test/java/io/trino/plugin/sqlserver/TestSqlServerDistributedQueries.java
plugin/trino-mysql/src/test/java/io/trino/plugin/mysql/TestMySqlDistributedQueries.java
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcMetadata.java
There are two APIs for deletion in …
Don't blame me for the goofiness of the two APIs -- both pairs existed before I touched the code :) AFAICT what you've implemented is row-by-row deletion. So perhaps you should use the …

In Hive the distinction is needed because dropping partitions is a very different operation from "deleting rows from files". The distinction is not as sound in the JDBC world. Conceptually, you could consider every row to be its own partition, since there is "an API call" with which you can remove the data without reading it.
@djsstarburst Thanks for taking a look!
I implemented a "metadata delete" here.
That is my case here. Notice that there is no concept of partitions in JDBC connectors. Here …
Automation hit #6723
@ksobolew can you please take a look?
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/CachingJdbcClient.java
plugin/trino-druid/src/test/java/io/trino/plugin/druid/BaseDruidConnectorTest.java
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/BaseJdbcClient.java
plugin/trino-druid/src/main/java/io/trino/plugin/druid/DruidJdbcClient.java
testing/trino-testing/src/main/java/io/trino/testing/TestingConnectorBehavior.java
testing/trino-testing/src/main/java/io/trino/testing/BaseConnectorTest.java
Please bring this back. This check was verifying whether !hasBehavior(SUPPORTS_DELETE) is declared correctly.
I'd suggest reverting this change.
hasBehavior should allow us to drive the reusable test without the need to split it up.
The need to split up the test indicates that the testing behaviors are not defined/plugged in ideally yet.
ksobolew left a comment:
I don't think I have much to add to what others said. Looks good in general.
Force-pushed 0ffb179 to e4d26b4
I cannot, as the other method uses SchemaTableName and this one uses JdbcTableHandle.
Always using the SchemaTableName version is suboptimal when a data change event does not affect the entire table.
I think onDataChanged is too ambitious. It does not fit well with a JdbcTableHandle parameter, as it does not invalidate data for the entire table, as one could expect when data is changed. Unless the term "data" also refers to "metadata"? I think the easiest fix is to rename it to invalidateTableStatistics and not pretend it is anything more than that.
WDYT?
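The narrower contract proposed above could be sketched like this. The class is a toy per-table statistics cache keyed by a plain string (standing in for SchemaTableName/JdbcTableHandle); everything except the proposed invalidateTableStatistics name is hypothetical, and this is not the CachingJdbcClient implementation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class StatisticsCacheSketch
{
    // Toy statistics cache: one row-count entry per schema-qualified table name.
    private final Map<String, Long> rowCountByTable = new HashMap<>();

    void cacheRowCount(String table, long rowCount)
    {
        rowCountByTable.put(table, rowCount);
    }

    Optional<Long> getCachedRowCount(String table)
    {
        return Optional.ofNullable(rowCountByTable.get(table));
    }

    // The narrower name proposed in the thread: it drops only the statistics
    // entry for one table and does not pretend to invalidate anything else.
    void invalidateTableStatistics(String table)
    {
        rowCountByTable.remove(table);
    }
}
```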
This was written the way you want, and it was failing for certain connectors. I don't remember the exact problem now. I will do it the suggested way and try to capture the differences between connectors.
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/BaseJdbcClient.java
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadata.java
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorSmokeTest.java
Force-pushed 2758d03 to 3409fa0
testing/trino-testing/src/main/java/io/trino/testing/BaseConnectorSmokeTest.java
core/trino-main/src/test/java/io/trino/sql/query/QueryAssertions.java
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/BaseJdbcClient.java
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Redundant, because we did not disable autocommit.
Since this is the only difference between this and super, remove the override.
I will add a comment; it is required. Otherwise PostgreSQL does not commit the delete.
Is this only a reorganization of test cases?
If so, move it to a separate commit.
...in/trino-iceberg/src/test/java/io/trino/plugin/iceberg/AbstractTestIcebergConnectorTest.java
Force-pushed 3409fa0 to 2cd8429
kokosing left a comment:
@ssheikin @ksobolew @Praveen2112 Can you please take a look again? A lot has changed since your last review.
Force-pushed a8e8533 to 3583b10
Outdated comments? Maybe just remove them?
No. It is copied from another test for unsupported deletes. Delete in Iceberg is special.
I see only testMetadataDelete (without a star) in the codebase and in this PR.
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHiveConnectorTest.java
I know we already discussed it before, but I'm still not convinced by a check where the result is compared with 0.
Up to you.
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Some connectors might have partial support for the DELETE statement.
Can we add support for a correlated subquery, so that we can ensure it works for correlated joins too?
It is out of the scope of this PR. Surely it won't be supported for JDBC connectors.
Force-pushed 3583b10 to 4f22747
Where does testMetadataDeleteWithVarcharEquality live?
It should be testDeleteWithVarcharEqualityPredicate
I see only testMetadataDelete (without a star) in the codebase and in this PR.
Yes, and there is …
Force-pushed 4f22747 to 2d9a113
Tried with Phoenix in a real setup. Works perfectly.
CI hit: #8457
public void testDeleteWithBigintEqualityPredicate()
{
    if (!hasBehavior(SUPPORTS_DELETE)) {
        assertQueryFails("DELETE FROM region WHERE regionkey = 1", "This connector does not support deletes");
@kokosing In the case of connectors where the test infra is shared, this can lead to test data being deleted.
Or is the assumption that if hasBehavior is implemented correctly it shouldn't matter, since the actual test performs deletions against copies of tables?
I'm submitting a PR to change this to follow the approach in testDeleteWithVarcharEqualityPredicate, where an explicit TestTable is created for the negative test case.
@hashhar If you're concerned about that, there are other cases like that in tests, where negative test cases are "dangerous".
(I was not as concerned as you are...)