Enable dynamic filtering in JDBC connector#8137
Enable dynamic filtering in JDBC connector#8137jerryleooo wants to merge 2 commits intotrinodb:masterfrom
Conversation
74c2852 to
fb02adb
Compare
2fa8276 to
f98a4d1
Compare
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DynamicFilteringJdbcConfig.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DynamicFilteringJdbcConfig.java
Outdated
Show resolved
Hide resolved
...rino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DynamicFilteringJdbcSessionProperties.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcRecordSetProvider.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DynamicFilteringJdbcConfig.java
Outdated
Show resolved
Hide resolved
...rino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DynamicFilteringJdbcSessionProperties.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DynamicFilteringJdbcConfig.java
Outdated
Show resolved
Hide resolved
|
@ebyhr updated as commented, can see if need further review, thanks for the hard work! |
...rino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DynamicFilteringJdbcSessionProperties.java
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DynamicFilteringJdbcConfig.java
Outdated
Show resolved
Hide resolved
...rino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DynamicFilteringJdbcSessionProperties.java
Outdated
Show resolved
Hide resolved
...rino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DynamicFilteringJdbcSessionProperties.java
Outdated
Show resolved
Hide resolved
|
@jerryleooo Thanks for this feature. We also want to use this feature for few of the massive join use case. Waiting for this ticket to be merged. |
|
Hi @haldes thanks for the interest! |
|
For the "massive join case" I created #8519. I kept this separate as this is a principle issue, not only related to joins. |
|
@jerryleooo Thanks for working on this feature. Would like to use this feature. Is there any chance that this feature would be merged to Trino master soon? Was able to see dynamic filtering kicking in after CP the commits. |
|
What needs to be done to merge this? Can we move this forward? |
|
Just started looking into this (see my slack comment on dev). Then found your PR (I started almost exactly the same way.) Let's finish one and get it in. |
|
I had a fix up a few things, but this works nicely. Should we finish this? |
1be1a71 to
cd1674c
Compare
|
Thanks, @lhofhansl, I just rebased the code to resolve the conflict, let's see how the organization thinks? |
| @Override | ||
| public RecordSet getRecordSet(ConnectorTransactionHandle transaction, ConnectorSession session, ConnectorSplit split, ConnectorTableHandle table, List<? extends ColumnHandle> columns, DynamicFilter dynamicFilter) | ||
| { | ||
| return getRecordSet(transaction, session, split, table, columns); |
There was a problem hiding this comment.
This should actually pass the DynamicFIlter along.
There was a problem hiding this comment.
Yes, should be something like
try (ThreadContextClassLoader ignored = new ThreadContextClassLoader(classLoader)) {
return new ClassLoaderSafeRecordSet(delegate.getRecordSet(transaction, session, split, table, columns, dynamicFilter), classLoader);
}
| configBinder(binder).bindConfig(JdbcMetadataConfig.class); | ||
| configBinder(binder).bindConfig(JdbcWriteConfig.class); | ||
|
|
||
| configBinder(binder).bindConfig(DynamicFilteringJdbcConfig.class); |
There was a problem hiding this comment.
I'm guessing this is needed because phoenix doesn't derive from JdbcPlugin.
Changes for Phoenix connector should go into separate commit and need to be explicitly covered by query runner tests.
|
|
||
| public class DynamicFilteringJdbcConfig | ||
| { | ||
| private boolean enableDynamicFiltering; |
There was a problem hiding this comment.
We can turn it on in a follow-up change after running benchmarks on it. We'll also want tests with actual connectors rather than just H2QueryRunner before we turn it on by default.
|
@raunaqmorarka FYI :) |
| public RecordSet getRecordSet(ConnectorTransactionHandle transaction, ConnectorSession session, ConnectorSplit split, ConnectorTableHandle table, List<? extends ColumnHandle> columns, DynamicFilter dynamicFilter) | ||
| { | ||
| if (isEnableDynamicFiltering(session)) { | ||
| long timeoutMillis = getDynamicFilteringWaitTimeout(session).toMillis(); |
There was a problem hiding this comment.
I find some weird behavior when trying this Postgres as a probe side. It almost seems random whether dynamic filtering is used or not.
I've set join_reordering_strategy='NONE' and also set dynamic_filtering_wait_timeout='10000ms'.
Executing the same join over and over I find some sometimes it uses dynamic filters and sometimes it does not.
In the debugger I see that a dynamic filter is passed here, but its tupledomain is "all".
I think this is outside of this code, but wondering if you have seen this too.
(The performance difference between the two cases is 250ms vs 38 seconds.)
All single node between two different JDBC datasources.
There was a problem hiding this comment.
Update: This happens only with PARTITIONED joins... As expected I guess since I am testing single node.
Is this waiting for something? Looks good to me.
|
Ping... Seems to be quite useful, especially when Trino is used as a general query platform between many big and small data sources. |
| @Override | ||
| public RecordSet getRecordSet(ConnectorTransactionHandle transaction, ConnectorSession session, ConnectorSplit split, ConnectorTableHandle table, List<? extends ColumnHandle> columns, DynamicFilter dynamicFilter) | ||
| { | ||
| return getRecordSet(transaction, session, split, table, columns); |
There was a problem hiding this comment.
Yes, should be something like
try (ThreadContextClassLoader ignored = new ThreadContextClassLoader(classLoader)) {
return new ClassLoaderSafeRecordSet(delegate.getRecordSet(transaction, session, split, table, columns, dynamicFilter), classLoader);
}
|
|
||
| public class DynamicFilteringJdbcConfig | ||
| { | ||
| private boolean enableDynamicFiltering; |
There was a problem hiding this comment.
We can turn it on in a follow-up change after running benchmarks on it. We'll also want tests with actual connectors rather than just H2QueryRunner before we turn it on by default.
| public class DynamicFilteringJdbcConfig | ||
| { | ||
| private boolean enableDynamicFiltering; | ||
| private Duration dynamicFilteringWaitTimeout = new Duration(0, TimeUnit.MINUTES); |
There was a problem hiding this comment.
Let's use a default of 20 seconds here, that will ensure that DF collection has sufficient time to complete without hanging too long. Also, if the build side gets too big, the DF collection will automatically bail out sooner.
| this.nextSyntheticColumnId = nextSyntheticColumnId; | ||
| } | ||
|
|
||
| public JdbcTableHandle withConstraint(TupleDomain<ColumnHandle> newConstraint) |
There was a problem hiding this comment.
withConstraint -> intersectedWithConstraint
| if (newDomain == constraint) { | ||
| return this; | ||
| } |
There was a problem hiding this comment.
This condition isn't really needed, we can keep it simple and remove it
| return getRecordSet(transaction, session, split, handle, columns); | ||
| } | ||
|
|
||
| try { |
There was a problem hiding this comment.
Please remove the try/catch we shouldn't be hiding any errors that result from DF pushdown
| } | ||
|
|
||
| @Test | ||
| public void testSemiJoinDynamicFilteringNone() |
There was a problem hiding this comment.
By default semi joins with the query shapes we're using here get rewritten to inner joins by the optimizer. We don't need explicit testing of semi join in these tests, so you can just remove semijoin tests.
| 1, 1, 1); | ||
| } | ||
|
|
||
| private void assertDynamicFiltering(@Language("SQL") String selectQuery, Session session, int expectedRowCount, int... expectedOperatorRowsRead) |
There was a problem hiding this comment.
We don't need to assert on specific values of expectedRowCount and expectedOperatorRowsRead here.
We should run the query with DF explicitly disabled, getPhysicalInputPositions from QueryStats and assert that the value is less than the same value from running the query with DF enabled.
We should also use assertEqualsIgnoreOrder to ensure that results are the same with DF enabled/disabled.
| } | ||
|
|
||
| try { | ||
| dynamicFilter.isBlocked().get(timeoutMillis, MILLISECONDS); |
There was a problem hiding this comment.
This will block an execution thread in the engine and prevent other splits from running, so we need to avoid this somehow.
You can apply dynamic filtering in JdbcSplitManager instead and use ConnectorSplitSource#getNextBatch to provide splits with DF using a future there. Changes to ConnectorRecordSetProvider SPI won't be necessary then.
| configBinder(binder).bindConfig(JdbcMetadataConfig.class); | ||
| configBinder(binder).bindConfig(JdbcWriteConfig.class); | ||
|
|
||
| configBinder(binder).bindConfig(DynamicFilteringJdbcConfig.class); |
There was a problem hiding this comment.
I'm guessing this is needed because phoenix doesn't derive from JdbcPlugin.
Changes for Phoenix connector should go into separate commit and need to be explicitly covered by query runner tests.
|
Thank you for your work on this. I'm closing this in favour of #13334 which will implement this functionality with a more holistic approach and more complete testing. |
#7968
depends on #8046