Implement timestamp predicate pushdown in Druid connector#8474
jerryleooo wants to merge 6 commits into trinodb:master
Conversation
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/BaseJdbcClient.java
hashhar
left a comment
Please add tests. It's difficult to prevent regressions or verify if it's working as expected without it.
Take a look at TestPostgreSqlTypeMapping.
You'll need to modify the tests to perform insertions using the JDBC driver instead of using Trino (since Druid connector doesn't support writes yet).
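Since the connector has no write support, the fixtures have to go in through a raw JDBC connection rather than Trino. A minimal sketch of what that could look like; `JdbcInsertSketch` and all table/column names are hypothetical, not part of the PR, and the SQL-building part is separated out so it can be checked without a live server:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds a parameterized INSERT statement intended for a
// plain java.sql.Connection to the backing store (not through Trino, since the
// Druid connector does not support writes).
public final class JdbcInsertSketch
{
    private JdbcInsertSketch() {}

    public static String buildInsertSql(String table, List<String> columns)
    {
        // One "?" placeholder per column, e.g. INSERT INTO t (a, b) VALUES (?, ?)
        String placeholders = columns.stream()
                .map(c -> "?")
                .collect(Collectors.joining(", "));
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES (" + placeholders + ")";
    }
}
```

With a real connection one would then `prepareStatement(buildInsertSql(...))`, bind the test values, and `executeUpdate()`.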
plugin/trino-druid/src/main/java/io/trino/plugin/druid/DruidJdbcClient.java
plugin/trino-druid/src/main/java/io/trino/plugin/druid/DruidJdbcClient.java
plugin/trino-druid/src/main/java/io/trino/plugin/druid/DruidJdbcClient.java
plugin/trino-druid/src/main/java/io/trino/plugin/druid/DruidJdbcClient.java
@jerryleooo Edit the PR to

OK sure but I think it should be

JDBC driver may also not work -- in druid they ingest data with:
Hi @hashhar, some recent progress:

Seems like we cannot "write" data to Druid at all. Creating ingest jobs for each possible test table doesn't scale either. @wendigo @Parth-Brahmbhatt @dheerajkulakarni Any ideas how we can work around the Druid limitations to ingest some test data without having to create TSV files and batch ingestion jobs from those?
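For context on what "batch ingestion jobs" means here: Druid's native batch ingestion takes a JSON task spec posted to its Overlord. A hedged sketch of a builder for such a spec; the `index_parallel` task type and the `dataSchema`/`ioConfig`/`tuningConfig` layout follow Druid's native batch ingestion, but `IndexTaskSketch` and all concrete values are assumptions for a test setup, not code from this PR:

```java
// Hypothetical sketch of a minimal Druid native-batch index task spec,
// reading TSV files from a local directory into a datasource.
public final class IndexTaskSketch
{
    private IndexTaskSketch() {}

    public static String taskSpec(String dataSource, String baseDir, String fileFilter)
    {
        return "{\n"
                + "  \"type\": \"index_parallel\",\n"
                + "  \"spec\": {\n"
                + "    \"dataSchema\": {\"dataSource\": \"" + dataSource + "\"},\n"
                + "    \"ioConfig\": {\n"
                + "      \"type\": \"index_parallel\",\n"
                + "      \"inputSource\": {\"type\": \"local\", \"baseDir\": \"" + baseDir + "\", \"filter\": \"" + fileFilter + "\"},\n"
                + "      \"inputFormat\": {\"type\": \"tsv\", \"findColumnsFromHeader\": true}\n"
                + "    },\n"
                + "    \"tuningConfig\": {\"type\": \"index_parallel\"}\n"
                + "  }\n"
                + "}";
    }
}
```

The generated JSON would be POSTed to the Overlord's task endpoint; per-table specs are what "doesn't scale" when written by hand, which is why generating them dynamically comes up below.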
Hey @hashhar @jerryleooo, I think I have done something similar in this pull request. I created a method in which I was reading from a datasource and pushing it to another datasource with some other name.
Thanks @dheerajkulakarni for the idea; however, I think it has a prerequisite -- the target data source has the same schema as the source data source, so we can just change the … I guess the final solution would be generating the index task file and the IO file, copying the IO file into the container, and then running the ingestion.
Hi @hashhar @findepi @dheerajkulakarni
UPDATE: temporarily solved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/StandardColumnMappings.java
hashhar
left a comment
@jerryleooo Can we extract the testing related changes to a separate PR and get that merged first?
Thanks a lot for coming up with the approach of generating both data and the indexing tasks dynamically. Will make it possible to test and hence implement more features in Druid connector.
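Generating the data files dynamically could be as simple as writing tab-separated rows to a temp directory that the index task's input source points at. A minimal sketch, with `TsvDataFileSketch` and the file layout being assumptions rather than the PR's actual helper:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: write rows of string cells as a TSV file that a
// Druid batch index task (with a "local" input source) could then ingest.
public final class TsvDataFileSketch
{
    private TsvDataFileSketch() {}

    public static Path writeTsv(Path dir, String name, List<List<String>> rows)
            throws IOException
    {
        Path file = dir.resolve(name + ".tsv");
        StringBuilder content = new StringBuilder();
        for (List<String> row : rows) {
            content.append(String.join("\t", row)).append('\n');
        }
        Files.writeString(file, content.toString());
        return file;
    }
}
```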
Force-pushed ac96a1c to 83d7bed
Thanks @hashhar for the review.

@jerryleooo That's a fair point. I agree now. But once we think everything looks good we should at least move that into a separate commit. I'll review this in a while.
plugin/trino-druid/src/test/java/io/trino/plugin/druid/DruidSqlDataTypeTest.java
hashhar
left a comment
Some initial comments.
Let's break into commits as:
- The public to private change of DRUID_SCHEMA.
- Introduction of the Druid test setup (TestTable, CreateAndInsertDataSetup etc.)
- Adding explicit timestamp mapping and tests for timestamp predicate pushdown and type mapping tests for timestamp. (Let's handle other types as follow-ups to keep scope limited and easier to reason about and review).
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/StandardColumnMappings.java
plugin/trino-druid/src/main/java/io/trino/plugin/druid/DruidJdbcClient.java
plugin/trino-druid/src/main/java/io/trino/plugin/druid/DruidJdbcClient.java
```java
@Test
public void testPredicatePushdown()
{
    assertThat(query("SELECT * FROM orders where __time > DATE'1970-01-01'")).isFullyPushedDown();
}
```
Copy the tests from TestPostgreSqlConnectorTest#testPredicatePushdown and remove the ones not relevant for Druid. Keeping tests consistent is useful and allows extracting them to base test classes in the future.
For __time pushdown add a separate test method since that's something specific to Druid.
@@ -35,35 +35,49 @@ public class TestTable
It might make more sense to introduce a new TestTable within the trino-druid module. After all other modules shouldn't be affected by Druid's limitations.
Please extract to separate class.
The consideration was that there might be other databases that don't support creating tables and inserting data -- but you are right, this might be a premature abstraction. Will modify.
plugin/trino-druid/src/test/java/io/trino/plugin/druid/ingestion/TestIndexTaskBuilder.java
plugin/trino-druid/src/test/java/io/trino/plugin/druid/TestingDruidServer.java
plugin/trino-druid/src/test/java/io/trino/plugin/druid/DruidSqlDataTypeTest.java
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/StandardColumnMappings.java
plugin/trino-druid/src/main/java/io/trino/plugin/druid/DruidJdbcClient.java
plugin/trino-druid/src/main/java/io/trino/plugin/druid/DruidJdbcClient.java
```java
import static java.util.stream.Collectors.joining;
import static org.assertj.core.api.Assertions.assertThat;

public final class DruidSqlDataTypeTest
```
What would it take to avoid copying SqlDataTypeTest?
SqlDataTypeTest relies on TestTable, in whose constructor CREATE TABLE and INSERT DATA will be run -- this is not suitable for Druid.
#8404 (comment)
There are 3 things in the game: SqlDataTypeTest, DataSetup and TestTable.

- SqlDataTypeTest delegates all the setup to DataSetup, so only this needs to be specific for Druid
- TestTable is used in the DataSetup interface
- SqlDataTypeTest uses TestTable for two things:
  - knowing the table name
  - removing the table with TestTable#close (cleanup)

We should address this by introducing a new interface:

- call it TemporaryRelation
- with methods getRelationName and close() (extending from AutoCloseable)
- make TestTable implement TemporaryRelation
- use it in the DataSetup interface definition
- all current DataSetup implementations will continue to use TestTable
- you will write a new implementation for Druid

Would that work?
cc @hashhar
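The proposed interface could look like this sketch, using the names from the comment above (`TemporaryRelation`, `getRelationName`); the Druid-side implementation shown is purely hypothetical:

```java
// Sketch of the proposed abstraction: "a named relation that can be cleaned
// up", so SqlDataTypeTest no longer depends on TestTable directly.
interface TemporaryRelation
        extends AutoCloseable
{
    String getRelationName();

    @Override
    void close(); // no checked exception, so try-with-resources stays clean
}

// Hypothetical Druid implementation: the relation is a datasource populated
// by an ingestion task; close() would trigger datasource cleanup.
final class TemporaryDruidDatasource
        implements TemporaryRelation
{
    private final String datasourceName;
    private final Runnable cleanup;

    TemporaryDruidDatasource(String datasourceName, Runnable cleanup)
    {
        this.datasourceName = datasourceName;
        this.cleanup = cleanup;
    }

    @Override
    public String getRelationName()
    {
        return datasourceName;
    }

    @Override
    public void close()
    {
        cleanup.run();
    }
}
```

Making the existing TestTable implement TemporaryRelation would then keep all current DataSetup implementations working unchanged.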
https://github.com/trinodb/trino/pull/8474/files#r673634986 seems to suggest all Druid timestamps have millisecond precision (for storage). If Druid's storage is always millisecond precision, then our life is simpler: we can easily support millisecond precision, without any rounding, with full pushdown.
Seems like this has run aground; is there anything I can help with to get it moving again? The current state of the Druid connector makes it quite limited for real-life purposes, and this seems like the first thing to tackle even before aggregations, which sadly are also stranded.
@lkm I will make time to finish this.
@jerryleooo Any updates on this PR? If you are okay with it, can I re-work this PR and fix the issue?
Thanks @jerryleooo for working on this PR. I have applied your changes as part of this PR - #13335.
@Praveen2112 cool bro and sorry for the late reply |
Fixes #8404