Support query pass-through for JDBC-based connectors#12325
Support query pass-through for JDBC-based connectors#12325findepi merged 6 commits intotrinodb:masterfrom
Conversation
| @Override | ||
| public JdbcTableHandle getTableHandle(ConnectorSession session, PreparedQuery preparedQuery) | ||
| { | ||
| return delegate.getTableHandle(session, preparedQuery); // TODO add caching? |
There was a problem hiding this comment.
I think we should. @kokosing thoughts?
(note: PreparedQuery defines value-based equals, so ok from this perspective)
There was a problem hiding this comment.
Added caching + test in TestCachingJdbcClient.
| } | ||
| } | ||
| catch (SQLException e) { | ||
| throw new TrinoException(JDBC_ERROR, e); |
There was a problem hiding this comment.
add some message about what operation failed
| Optional.empty(), | ||
| OptionalLong.empty(), | ||
| Optional.of(columns.build()), | ||
| ImmutableSet.of(), // TODO return other tables referenced by the query |
There was a problem hiding this comment.
We won't be able to do this, since query is opaque.
Instead of TODO add a // Note that we're not doing that.
AFAIR, this field is used for cache eviction. I think we should make io.trino.plugin.jdbc.JdbcTableHandle#otherReferencedTables and Optional field (so that we can mark "unknown" state) so cache eviction remains correct
There was a problem hiding this comment.
I filed an issue #12526, and added a link in the code along with the note.
There was a problem hiding this comment.
Thanks, fine to do it as a follow-up.
| columns -> groupKey -> verify(columns.contains(groupKey), | ||
| "applyAggregation called with a grouping column %s which was not included in the table columns: %s", | ||
| groupKey, | ||
| tableColumns)) |
There was a problem hiding this comment.
unrelated change, separate commit
There was a problem hiding this comment.
Reverted. The formatter changed its mind.
| @Override | ||
| public JdbcTableHandle getTableHandle(ConnectorSession session, PreparedQuery preparedQuery) | ||
| { | ||
| return delegate().getTableHandle(session, preparedQuery); // TODO add stats? |
| public class RemoteQuery | ||
| implements Provider<ConnectorTableFunction> | ||
| { | ||
| public static final String NAME = "remote_query"; |
There was a problem hiding this comment.
i think native_query would be a better name
There was a problem hiding this comment.
I think remote_query is better.
There was a problem hiding this comment.
i think i am not convinced. every query is "remote" in some case.
"native" is sometimes used to mean "uninterpreted" by current layer, like java.sql.Connection#nativeSQL or "native query" in Hibernate, so users can be familiar with the term
There was a problem hiding this comment.
i like this name. Let's roll with it.
There was a problem hiding this comment.
I renamed the function to "query", and the related classes to Query*. Left "NativeQuery" in test method names.
| // TODO wrap in ClassLoaderSafeConnectorTableFunction? (see also TestClassLoaderSafeWrappers) | ||
| return new RemoteQueryFunction(transactionManager); |
There was a problem hiding this comment.
apply the TODO; i think base-jdbc -based connectors use ClassLoaderSafe* wrappers
There was a problem hiding this comment.
Will do after ConnectorTableFunction is refactored into an interface (#12531).
| public class RemoteQuery | ||
| implements Provider<ConnectorTableFunction> | ||
| { | ||
| public static final String NAME = "remote_query"; |
There was a problem hiding this comment.
I think remote_query is better.
| @Override | ||
| public JdbcTableHandle getTableHandle(ConnectorSession session, PreparedQuery preparedQuery) | ||
| { | ||
| return delegate.getTableHandle(session, preparedQuery); // TODO add caching? |
| { | ||
| binder.bind(JdbcClient.class).annotatedWith(ForBaseJdbc.class).to(MariaDbClient.class).in(Scopes.SINGLETON); | ||
| binder.install(new DecimalModule()); | ||
| newSetBinder(binder, ConnectorTableFunction.class).addBinding().toProvider(RemoteQuery.class).in(Scopes.SINGLETON); |
There was a problem hiding this comment.
Why not to do it in trino-base-jdbc?
There was a problem hiding this comment.
it's not applicable to some connectors (clickhouse)
wendigo
left a comment
There was a problem hiding this comment.
Feel free to ignore my comments if they don't make sense :)
| @Override | ||
| public JdbcTableHandle getTableHandle(ConnectorSession session, PreparedQuery preparedQuery) | ||
| { | ||
| ImmutableList.Builder<JdbcColumnHandle> columns = ImmutableList.builder(); |
There was a problem hiding this comment.
can we move columns builder inside of the try and use builderWithExpectedSize(metaData.getColumnCount()) ?
There was a problem hiding this comment.
That would require moving the return statement inside the try block.
| ImmutableList.Builder<JdbcColumnHandle> columns = ImmutableList.builder(); | ||
| try (Connection connection = connectionFactory.openConnection(session); | ||
| PreparedStatement preparedStatement = queryBuilder.prepareStatement(this, session, connection, preparedQuery)) { | ||
| ResultSetMetaData metaData = preparedStatement.getMetaData(); |
| try (Connection connection = connectionFactory.openConnection(session); | ||
| PreparedStatement preparedStatement = queryBuilder.prepareStatement(this, session, connection, preparedQuery)) { | ||
| ResultSetMetaData metaData = preparedStatement.getMetaData(); | ||
| if (metaData == null) { |
There was a problem hiding this comment.
requireNonNull(preparedStatement.getMetaData(), "ResultSetMetaData not provided for query")
There was a problem hiding this comment.
I prefer UnsupportedOperationException and a more verbose message.
| Optional.empty(), | ||
| OptionalLong.empty(), | ||
| Optional.of(columns.build()), | ||
| ImmutableSet.of(), // TODO return other tables referenced by the query |
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcClient.java
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcMetadata.java
Show resolved
Hide resolved
| public void configure(Binder binder) | ||
| { | ||
| binder.bind(JdbcClient.class).annotatedWith(ForBaseJdbc.class).to(DruidJdbcClient.class).in(Scopes.SINGLETON); | ||
| newSetBinder(binder, ConnectorTableFunction.class).addBinding().toProvider(RemoteQuery.class).in(Scopes.SINGLETON); |
There was a problem hiding this comment.
I guess that we should expose this PTF only on some config toggle (disabled by default)?
There was a problem hiding this comment.
The function is always registered, but it is disabled by default by access control.
There was a problem hiding this comment.
Makes sense. Thanks for clarification
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/ptf/RemoteQuery.java
Show resolved
Hide resolved
|
What's the specific aim for this? I see multiple variation:
For (2) and (3) I'd suggest we generally improve what/how Trino can push down - as has happened in the past few months. Not trying to question this (wouldn't be my place anyway), just curious. |
|
This can be viewed as an escape hatch for features we don't have pushdown support yet, including features and syntaxes that exists in a remote database but do not exist in Trino.
Agreed. This is not the goal for this PR, but it remains a goal for the project. |
Big fan. We have a very fast internal analytical database that we sometimes access via a custom connector based on base-jdbc. Pushing down vs. not is often a 100x performance difference, so have having full control over this is great! |
|
@lhofhansl same here! |
d320be2 to
94ab03f
Compare
There was a problem hiding this comment.
Declare the param in uppercase, so that it can be specified without quotes (...( query => '....')) upon the invocation.
11d9f57 to
af5417e
Compare
|
Tested this with our internal database connector, and it works great. The use of Is there a way to do this after the statement is executed so that we already have a ResultSet? |
af5417e to
fabdb91
Compare
The metadata needs to be available on the coordinator during query planning, while the final execution should happen on the worker. We could have a mode where we simply execute the query and leverage
@lhofhansl Do you know for which databases it's fast and for which it isn't? |
9e73886 to
4745c00
Compare
aad6690 to
3b9edf0
Compare
I do not, sorry. :( As an example for the internal DB I'm "playing" with, I'm wrapping the query with Not a big deal. I think JDBC is "funny" here. |
6c1a665 to
b110e47
Compare
There was a problem hiding this comment.
i don't think preparedQuery.getQuery() should go into exception message.
it can be super-long.
you can append firstNonNull(e.getMessage(), e) to the message
There was a problem hiding this comment.
OK. In this case, I'll limit the expected message in tests to the common part: Failed to get table handle for prepared query to avoid overriding.
bf7a6ef to
533dc83
Compare
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/BaseJdbcClient.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Yes. Majority of overrides include the schema name. If I didn't add it here, I'd have to override the method that uses this test table.
There was a problem hiding this comment.
shouldn't this be getsession().getschema().orthrow()?
There was a problem hiding this comment.
Other implementations also put tpch explicitly.
plugin/trino-mysql/src/test/java/io/trino/plugin/mysql/BaseMySqlConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-mysql/src/test/java/io/trino/plugin/mysql/BaseMySqlConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-mariadb/src/test/java/io/trino/plugin/mariadb/BaseMariaDbConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-mariadb/src/test/java/io/trino/plugin/mariadb/BaseMariaDbConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/CachingJdbcClient.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/BaseJdbcClient.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/BaseJdbcClient.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/CachingJdbcClient.java
Outdated
Show resolved
Hide resolved
533dc83 to
5d8e141
Compare
|
Is this merge-ready? |
Make the names more specific to make space for another kind of TableHandle caching.
in StatisticsAwareJdbcClient
5d8e141 to
5cf2960
Compare
|
Do we need docs for this? edit: Manfred informed me we're already on this |
| // TODO https://github.com/trinodb/trino/issues/12526: invalidate tableHandlesByNameCache for handles derived from opaque queries | ||
| invalidateColumnsCache(schemaTableName); | ||
| invalidateCache(tableHandlesByNameCache, key -> key.tableName.equals(schemaTableName)); | ||
| tableHandlesByQueryCache.invalidateAll(); |
There was a problem hiding this comment.
This looks too coarse. For every table invalidation (with any cause) all caches for handles by query are invalidated.
Apart from that I've noticed that the only usage of
BaseJdbcClient.getTableHandle(ConnectorSession, PreparedQuery) (io.trino.plugin.jdbc)
is for
visitTableFunctionInvocation(TableFunctionInvocation, Optional<Scope>) in Visitor in StatementAnalyzer (io.trino.sql.analyzer)
So this test (evolved since initial PR) is not really precise as it tests real tables (but should not), but not the virtual ones.
dropTable(phantomTable); // not via CachingJdbcClient
assertThatThrownBy(() -> jdbcClient.getTableHandle(SESSION, query))
.hasMessageContaining("Failed to get table handle for prepared query");
assertCacheStats(cachingJdbcClient, TABLE_HANDLES_BY_QUERY_CACHE)
.hits(1)
.afterRunning(() -> {
assertThat(cachingJdbcClient.getTableHandle(SESSION, query))
.isEqualTo(cachedTable);
assertThat(cachingJdbcClient.getColumns(SESSION, cachedTable))
.hasSize(0); // phantom_table has no columns
});
// ...
cachingJdbcClient.createTable(SESSION, new ConnectorTableMetadata(phantomTable, emptyList()));
assertCacheStats(cachingJdbcClient, TABLE_HANDLES_BY_QUERY_CACHE)
.misses(1)
.loads(1) // <<<------------------ this has to be 0, and not tested on real table
.afterRunning(() -> {
assertThat(cachingJdbcClient.getTableHandle(SESSION, query))
.isEqualTo(cachedTable);
assertThat(cachingJdbcClient.getColumns(SESSION, cachedTable))
.hasSize(0); // phantom_table has no columns
});
WDYT?
There was a problem hiding this comment.
This looks too coarse. For every table invalidation (with any cause) all caches for handles by query are invalidated.
Can we do better?
So this test (evolved since initial PR) is not really precise as it tests real tables (but should not), but not the virtual ones.
i am not sure i follow. Do you want to make a PR with a fix?
There was a problem hiding this comment.
I'm trying figuring out what's going on and this is my findings. Actually my thoughts.
Looking for any comments which would allow me to shape that knowledge.
Updated Trino version to 405. Added support for query pass-through from Trino to the connected DB2 instance. See trinodb/trino#12325 for more details about the implementation of the ``query`` Polymorphic Table Function for JDBC-based connectors in Trino. The functionality is useful when dealing with very large tables on DB2 as Trino does not need to pull entire tables across. It also allows the use of DB2-native functions that are not supported in Trino.
…ation Problem: - Table functions fail to register when S3 config is present - Root cause: non-standard connector initialization (direct instantiation vs Guice) - No configuration validation framework Solution - Phase 1: Guice Integration: - Add McapModule with Guice bindings for table functions - Update McapConnectorFactory to use Bootstrap/Injector pattern - Update McapConnector to receive table functions via constructor injection - Add @Inject annotation to McapMetadataFunction constructor - Create McapConfig class (placeholder for future MCAP-specific config) - Bind FileAccessor instance to injector for dependency injection Changes: - McapConnectorFactory: Uses Bootstrap with JsonModule and McapModule - McapConnector: Now accepts Set<ConnectorTableFunction> in constructor - McapModule: Binds table functions using Multibinder pattern - McapMetadataFunction: Uses @Inject for Guice instantiation - Tests: Updated to pass table functions to connector constructor Benefits: - Configuration validation happens during injector initialization - Table functions properly bound via Guice Multibinder - Follows standard Trino connector pattern (Hive, Iceberg, Delta Lake) - Clear error messages for configuration issues - Foundation for FileSystemModule integration (phase 2) Test Results: All unit tests pass (16/16) - TestMcapPlugin: ✅ Connector creation with Guice - TestMcapConnector: ✅ Table function registration - TestMcapMetadataFunction: ✅ Function analysis and schema Next Phase: FileSystemModule integration and TrinoFileSystem migration Refs: openspec/changes/fix-s3-table-function-registration See: trinodb/trino#12325 (JDBC table function example)

This PR introduces
queryPolymorphic Table Function for JDBC-based connectors:not yet implemented: