Skip to content

Pass full session to avoid Unknown connector errors#22936

Merged
tdcmeehan merged 1 commit intoprestodb:masterfrom
denodo-research-labs:unknown_connector
Jul 2, 2024
Merged

Pass full session to avoid Unknown connector errors#22936
tdcmeehan merged 1 commit intoprestodb:masterfrom
denodo-research-labs:unknown_connector

Conversation

@denodo-research-labs
Copy link
Contributor

Description

The session passed to the system connector did not retain any connector properties, so cache.enabled property was not found and tasks were retried, resulting in slowness.

Motivation and Context

Presto issue #19809 .
Same issue with the Delta connector. For future reference here is the trace using the Delta connector:

    com.facebook.presto.spi.PrestoException: Unknown connector delta
    at com.facebook.presto.metadata.SessionPropertyManager.getConnectorSessionPropertyMetadata(SessionPropertyManager.java:121)
    at com.facebook.presto.metadata.SessionPropertyManager.decodeCatalogPropertyValue(SessionPropertyManager.java:182)
    at com.facebook.presto.FullConnectorSession.getProperty(FullConnectorSession.java:160)
    at com.facebook.presto.hive.HiveSessionProperties.isCacheEnabled(HiveSessionProperties.java:1024)
    at java.util.Optional.map(Optional.java:215)
    at com.facebook.presto.hive.cache.HiveCachingHdfsConfiguration.lambda$getConfiguration$0(HiveCachingHdfsConfiguration.java:80)
    at com.facebook.presto.hive.cache.HiveCachingHdfsConfiguration$CachingJobConf.createFileSystem(HiveCachingHdfsConfiguration.java:105)
    at org.apache.hadoop.fs.PrestoFileSystemCache.get(PrestoFileSystemCache.java:59)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at com.facebook.presto.hive.HdfsEnvironment.lambda$getFileSystem$0(HdfsEnvironment.java:71)
    at com.facebook.presto.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:23)
    at com.facebook.presto.hive.HdfsEnvironment.getFileSystem(HdfsEnvironment.java:70)
    at com.facebook.presto.hive.HdfsEnvironment.getFileSystem(HdfsEnvironment.java:64)
    at com.facebook.presto.delta.DeltaClient.loadDeltaTableLog(DeltaClient.java:147)
    at com.facebook.presto.delta.DeltaClient.getTable(DeltaClient.java:79)
    at com.facebook.presto.delta.DeltaMetadata.getTableHandle(DeltaMetadata.java:221)
    at com.facebook.presto.delta.DeltaMetadata.getTableMetadata(DeltaMetadata.java:328)
    at com.facebook.presto.delta.DeltaMetadata.listTableColumns(DeltaMetadata.java:317)
    at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.listTableColumns(ClassLoaderSafeConnectorMetadata.java:330)
    at com.facebook.presto.metadata.MetadataManager.listTableColumns(MetadataManager.java:562)
    at com.facebook.presto.metadata.MetadataListing.listTableColumns(MetadataListing.java:95)
    at com.facebook.presto.connector.system.jdbc.ColumnJdbcTable.cursor(ColumnJdbcTable.java:126)

Impact

JDBC driver queries against Iceberg and Delta Lake tables are very slow the first time.

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Pass full session to avoid Unknown connector errors :pr:`22936`

@tdcmeehan
Copy link
Contributor

I was thinking about why this is, and my guess is it's because session properties from the original session could in theory alter the metadata listing itself. You wouldn't want any session properties to be applied, because what is returned by the system connector should be consistent across sessions.

I think this problem of needing to check a session property value while inside the system connector is general, but also we should consider the above issue as well. So I'm thinking we should relax the validation.

Basically, wherever we use getConnectorSessionPropertyMetadata, we already validate that the catalog already exists. I'm wondering if we should just return Optional.empty() in this method if the catalog doesn't exist?

@denodo-research-labs
Copy link
Contributor Author

When returning Optional.empty in getConnectorSessionPropertyMetadata, decodeCatalogPropertyValue will fail.
Even if we change the code to avoid throwing an exception, HiveSessionProperties.isCacheEnabled will not reflect the Iceberg connector configuration if we don't pass the FullConnectorSession to the system connector, and the same happens for other session properties required by the Iceberg connector, resulting in inconsistent behavior.

@denodo-research-labs
Copy link
Contributor Author

Any update on this?

@tdcmeehan tdcmeehan self-assigned this Jun 13, 2024
@tdcmeehan
Copy link
Contributor

I think this should be fine.

@tdcmeehan
Copy link
Contributor

Is there any way to provide a test which would fail without this fix?

@denodo-research-labs
Copy link
Contributor Author

denodo-research-labs commented Jun 14, 2024

The query causing the slowness is:

SELECT TABLE_CAT, ... FROM system.jdbc.columns
WHERE TABLE_CAT = 'iceberg' AND TABLE_SCHEM LIKE 'x' ESCAPE '\' AND TABLE_NAME LIKE 'y' ESCAPE '\' AND COLUMN_NAME LIKE '%' ESCAPE '\'
ORDER BY TABLE_CAT, TABLE_SCHEM, TABLE_NAME, ORDINAL_POSITION

The problem to test it is that it does not fail, it is just slow.

Do you have any suggestion on how we can implement it?

@denodo-research-labs
Copy link
Contributor Author

Any update on this?

@tdcmeehan
Copy link
Contributor

Unfortunately I don't have any quick suggestions. This should be fine to merge.

@tdcmeehan tdcmeehan merged commit 27ece7e into prestodb:master Jul 2, 2024
@tdcmeehan tdcmeehan mentioned this pull request Aug 23, 2024
34 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Presto with Iceberg connector and JDBC client is very slowly at first time query

2 participants