Limit session state during metadata queries in Iceberg #19757
Let it differentiate between "a"."b.c" and "a.b"."c" tables.
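The commit message above does not say what "it" is. Assuming it refers to a lookup structure keyed by table name, the collision can be sketched as follows. `TableKey` and `TableKeyDemo` are hypothetical names used only for illustration, not the types touched by this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key type for illustration; not the class used in the PR.
record TableKey(String schema, String table)
{
    // Naive flattening: the boundary between schema and table is lost.
    String dotted()
    {
        return schema + "." + table;
    }
}

final class TableKeyDemo
{
    private TableKeyDemo() {}

    public static void main(String[] args)
    {
        TableKey first = new TableKey("a", "b.c");
        TableKey second = new TableKey("a.b", "c");

        // Both keys flatten to "a.b.c", so the second put overwrites the first.
        Map<String, String> byString = new HashMap<>();
        byString.put(first.dotted(), "metadata for \"a\".\"b.c\"");
        byString.put(second.dotted(), "metadata for \"a.b\".\"c\"");
        System.out.println(byString.size()); // prints 1

        // A structured key compares schema and table separately.
        Map<TableKey, String> byKey = new HashMap<>();
        byKey.put(first, "metadata for \"a\".\"b.c\"");
        byKey.put(second, "metadata for \"a.b\".\"c\"");
        System.out.println(byKey.size()); // prints 2
    }
}
```

Keying on the (schema, table) pair rather than on a dotted string keeps the two tables apart without any escaping scheme.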
a1ab9d5 to 2c3e111
Metadata queries such as `information_schema.columns`, `system.jdbc.columns` or `system.metadata.table_comments` may end up loading an arbitrary number of relations within a single query (transaction). It is important to bound memory usage for such queries. In the case of the Iceberg Hive metastore based catalog, this is already done in `TrinoHiveCatalogFactory` by means of configuring a per-query `CachingHiveMetastore`. However, catalogs with explicit caching need something similar.
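As a rough illustration of what "bounding" a per-query cache can mean, here is a minimal sketch built on the JDK's `LinkedHashMap`. It is a stand-in and not the implementation in this PR: the class name `BoundedTableCache`, the `maxEntries` parameter and the least-recently-used eviction policy are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-query cache; a JDK-only stand-in, not the PR's code.
final class BoundedTableCache<K, V>
{
    private final Map<K, V> entries;

    BoundedTableCache(int maxEntries)
    {
        // accessOrder = true keeps entries ordered least-recently-used first,
        // so removeEldestEntry evicts the entry that was touched longest ago.
        this.entries = new LinkedHashMap<>(16, 0.75f, true)
        {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest)
            {
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached value, loading and storing it on a miss.
    synchronized V get(K key, Function<K, V> loader)
    {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            entries.put(key, value);
        }
        return value;
    }

    // Drops a single entry, for example after a table is dropped or unregistered.
    synchronized void invalidate(K key)
    {
        entries.remove(key);
    }

    synchronized int size()
    {
        return entries.size();
    }
}
```

With a bound in place, a query that lists thousands of tables holds at most `maxEntries` metadata objects at a time. The cost is that an evicted table is loaded again if the same query touches it later.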
2c3e111 to ad51682
    }
    String tableLocation = metadataLocation.replaceFirst("/metadata/[^/]*$", "");
    deleteTableDirectory(fileSystemFactory.create(session), schemaTableName, tableLocation);
    invalidateTableCache(schemaTableName);
Can this be moved to `dropTableFromMetastore`, or even to `deleteTable`, to simplify the code and prevent it from being omitted if new functions are added in the future?
I thought about it. Technically it would work, but I considered `dropTableFromMetastore` to be just a technical operation, which may or may not be invoked, and may or may not be the last operation in the drop flow.
    public void unregisterTable(ConnectorSession session, SchemaTableName schemaTableName)
    {
        dropTableFromMetastore(schemaTableName);
        invalidateTableCache(schemaTableName);
Can this and the following be moved to `dropTableFromMetastore`?
No release note entry, @findepi?
My understanding is that the caching here is important for ensuring that queries use the same snapshot in different phases of query execution. It's unlikely that a regular select would fill this cache but it'd be nice if we could have them be unbounded when necessary.
I see your point and agree. We probably should also update the JDBC connector.