@hashhar hashhar commented Jun 3, 2024

Description

Before this change, when listing table columns, JDBC connectors would first list tables and then list the columns of each table. Thus, when serving Trino's information_schema.columns or system.jdbc.columns, we would make O(#tables) calls to the remote database.

With this change, we use the remote database's bulk column listing facilities to satisfy Trino's bulk column listing requests. This can be viewed as "information_schema.columns pass-through", although it works for both Trino's information_schema.columns and Trino's system.jdbc.columns (io.trino.jdbc.TrinoDatabaseMetaData.getColumns), and does not use the remote database's information_schema.columns directly. Instead, the commit leverages the fact that DatabaseMetaData.getColumns, typically used to get the columns of a single table, can be called without a table filter, in which case it returns all columns from all tables.

Bulk retrieval is supported only for selected JDBC connectors. It is not supported by default, because enabling it requires JdbcClient changes.
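
To illustrate the idea, here is a minimal sketch of listing all columns of a schema with a single `DatabaseMetaData.getColumns` call. It is not the code from this PR: the class name `BulkColumnListing` and the methods `listAllColumns` and `addColumn` are hypothetical, and the sketch assumes the driver treats the `%` pattern as "match every table", as the JDBC pattern syntax describes. Some drivers expose databases as catalogs rather than schemas, so the argument position for the database name may differ.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Trino.
final class BulkColumnListing {
    private BulkColumnListing() {}

    // Appends one (table, column) pair to the per-table result map.
    static void addColumn(Map<String, List<String>> columnsByTable, String table, String column) {
        columnsByTable.computeIfAbsent(table, ignored -> new ArrayList<>()).add(column);
    }

    // Issues a single metadata call for the whole schema instead of one call per table.
    // Assumption: "%" as the table name pattern matches all tables in the schema.
    static Map<String, List<String>> listAllColumns(DatabaseMetaData metadata, String schema)
            throws SQLException {
        Map<String, List<String>> columnsByTable = new LinkedHashMap<>();
        try (ResultSet rs = metadata.getColumns(null, schema, "%", "%")) {
            while (rs.next()) {
                // TABLE_NAME and COLUMN_NAME are standard columns of the getColumns result
                addColumn(columnsByTable, rs.getString("TABLE_NAME"), rs.getString("COLUMN_NAME"));
            }
        }
        return columnsByTable;
    }
}
```

The per-table approach described above would instead call `getColumns` once for every table name returned by `getTables`, which is where the O(#tables) round trips come from.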

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# MariaDB, MySQL, SingleStore, Redshift
* Improve performance of listing table columns. ({issue}`issuenumber`)

Before this change, when listing table columns, JDBC connectors would
first list tables and then list the columns of each table. Thus, when
serving Trino's `information_schema.columns` or `system.jdbc.columns`,
we would make O(#tables) calls to the remote database.

With this change, we use the remote database's bulk column listing
facilities to satisfy Trino's bulk column listing requests. This can be
viewed as "`information_schema.columns` pass-through", although it
works for both Trino's `information_schema.columns` and Trino's
`system.jdbc.columns`
(`io.trino.jdbc.TrinoDatabaseMetaData.getColumns`), and does not use
the remote database's `information_schema.columns` directly. Instead,
the commit leverages the fact that `DatabaseMetaData.getColumns`,
typically used to get the columns of a single table, can be called
without a table filter, in which case it returns all columns from all
tables.

Bulk retrieval is supported only for selected JDBC connectors. It is
not supported by default, because enabling it requires `JdbcClient`
changes.

Co-authored-by: Ashhar Hasan <[email protected]>
@hashhar hashhar requested review from ebyhr and findepi June 3, 2024 09:33
@cla-bot cla-bot bot added the cla-signed label Jun 3, 2024
@hashhar hashhar merged commit 1ac1ee1 into trinodb:master Jun 4, 2024
@hashhar hashhar deleted the hashhar/bulk-fetch-all-columns branch June 4, 2024 08:59
@github-actions github-actions bot added this to the 450 milestone Jun 4, 2024