Query-level caching for hive tables and iceberg table statistics #8659
clemensvonschwerin wants to merge 3 commits into trinodb:master
Conversation
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.
…to improve planning time
clemensvonschwerin force-pushed from f3fc182 to 9759b2d
findepi left a comment
Last time we spoke, @electrum was very much against query-level caches, based on the hope that they are not needed.
See the concern about the second call to getTable here: #8151 (comment),
and see the test results for a test @homar just added here.
The next step should be to understand why we're doing those additional accesses to the metastore.
Maybe we do not need them at all? And then maybe we do not need the cache either?
cc @joshthoward
I am specifically against a generic metastore cache, like we have in the Hive connector. We can cache table statistics if we need to, though I'd like to understand why it is called multiple times, and whether we can fix it in the engine. Requiring every connector to cache a likely expensive operation seems like a suboptimal design.
public ConcurrentHashMap<SchemaTableName, Table> load(final String unused)
        throws Exception
{
    return new ConcurrentHashMap<>();
}
I don't think I understand what this is doing.
Anyway, I guess this will go away once we move away from the unnecessary Cache use.
The cache builder needs a function as an argument that creates the cache entry when there is a miss for a given key. Here we simply create a new ConcurrentHashMap if there is no cache for a given query id yet.
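For illustration, here is a minimal stdlib-only sketch of the populate-on-miss pattern described above, using `ConcurrentHashMap.computeIfAbsent` in place of Guava's `CacheBuilder`/`CacheLoader`. The class and key names are hypothetical and simplified (plain strings instead of `SchemaTableName`/`Table`):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class QueryScopedCacheSketch
{
    // outer map: query id -> per-query table cache, created lazily on first miss
    private final Map<String, ConcurrentHashMap<String, String>> tableCache = new ConcurrentHashMap<>();

    public Map<String, String> cacheForQuery(String queryId)
    {
        // equivalent of the CacheLoader above: build an empty per-query map on a miss
        return tableCache.computeIfAbsent(queryId, unused -> new ConcurrentHashMap<>());
    }

    public static void main(String[] args)
    {
        QueryScopedCacheSketch sketch = new QueryScopedCacheSketch();
        Map<String, String> q1 = sketch.cacheForQuery("query_1");
        q1.put("schema.table", "cached-table-handle");

        // the same query id must return the same map instance
        if (sketch.cacheForQuery("query_1") != q1) {
            throw new AssertionError("expected the same per-query map");
        }
        // a different query id gets a fresh, empty map
        if (!sketch.cacheForQuery("query_2").isEmpty()) {
            throw new AssertionError("expected an empty map for a new query");
        }
        System.out.println("ok");
    }
}
```

Unlike Guava's cache, this sketch has no size bound or eviction; it only illustrates the miss-populates-entry behavior.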
this.trinoVersion = requireNonNull(trinoVersion, "trinoVersion is null");
this.tableCache = CacheBuilder.newBuilder()
        .maximumSize(100)
        .expireAfterWrite(5, TimeUnit.MINUTES)
The IcebergMetadata object is per-query scoped, so you don't need time-based eviction here.
That's why tableMetadataCache uses an ordinary map.
Ah, I did not know that. Then we do not even need the two levels (query -> table); one level is enough.
        .maximumSize(100)
        .expireAfterWrite(5, TimeUnit.MINUTES)
        .build(new TableCacheLoader());
this.tableStatisticsCache = CacheBuilder.newBuilder()
If adding a stats cache is an improvement, I would expect this to be reflected by a test change.
Currently I see TestIcebergMetadataFileOperations passes without modification.
Let's move statistics to a separate PR (or at least a separate commit), so that we can focus on metastore interactions here.
Optional<Table> getHiveTable(ConnectorSession session, SchemaTableName schemaTableName)
{
    var queryTableCache = tableCache.getUnchecked(session.getQueryId());
private final Map<String, Optional<Long>> snapshotIds = new ConcurrentHashMap<>();
private final Map<SchemaTableName, TableMetadata> tableMetadataCache = new ConcurrentHashMap<>();

private final LoadingCache<String, ConcurrentHashMap<SchemaTableName, Table>> tableCache;
Nested keying (query id -> SchemaTableName -> Table) is redundant.
Make SchemaTableName the key here, as in tableMetadataCache (and effectively in snapshotIds, although the code doesn't make that obvious).
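To illustrate the suggestion: since IcebergMetadata lives for a single query, the query-id level can be dropped and SchemaTableName used directly as the key. A hypothetical flattened sketch, with a stand-in record instead of Trino's real `SchemaTableName` and a placeholder for the metastore call:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FlattenedTableCacheSketch
{
    // stand-in for io.trino.spi.connector.SchemaTableName
    record SchemaTableName(String schema, String table) {}

    // one level of keying is enough: this object is scoped to a single query anyway
    private final Map<SchemaTableName, String> tableCache = new ConcurrentHashMap<>();

    public String getTable(SchemaTableName name)
    {
        return tableCache.computeIfAbsent(name, this::loadFromMetastore);
    }

    private String loadFromMetastore(SchemaTableName name)
    {
        // placeholder for the expensive metastore round trip
        return "table:" + name.schema() + "." + name.table();
    }

    public static void main(String[] args)
    {
        FlattenedTableCacheSketch cache = new FlattenedTableCacheSketch();
        String first = cache.getTable(new SchemaTableName("sales", "orders"));
        String second = cache.getTable(new SchemaTableName("sales", "orders"));
        if (!first.equals(second)) {
            throw new AssertionError("expected the cached value on the second lookup");
        }
        System.out.println(first); // prints "table:sales.orders"
    }
}
```

Because records provide value-based equals/hashCode, two SchemaTableName instances naming the same table hit the same cache entry, which is what makes the single-level keying work.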
I like the engine-side approach: #12196. However, the engine side has some limitations. For example, when dereference pushdown happens, a new
👋 @clemensvonschwerin - this PR is inactive and doesn't seem to be under development. If you'd like to continue work on this at any point in the future, feel free to re-open.
We experienced long planning times due to expensive calls to the Hive metastore and to S3 (during BaseTableScans). Caching statistics and Hive tables at the query level reduced planning times by around 50% for us.
fixes #8675
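The mechanism behind the reported planning-time reduction is that repeated statistics lookups for the same table within one query collapse into a single metastore access. A minimal sketch, with hypothetical names and a counter standing in for the expensive call:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class StatisticsMemoizationSketch
{
    private final Map<String, Double> statisticsCache = new ConcurrentHashMap<>();
    final AtomicInteger metastoreCalls = new AtomicInteger();

    public double getTableStatistics(String tableName)
    {
        // repeated lookups during planning hit the cache, not the metastore
        return statisticsCache.computeIfAbsent(tableName, name -> {
            metastoreCalls.incrementAndGet(); // the expensive metastore/S3 access happens here
            return 42.0; // placeholder row-count estimate
        });
    }

    public static void main(String[] args)
    {
        StatisticsMemoizationSketch sketch = new StatisticsMemoizationSketch();
        // the planner may ask for the same table's statistics many times
        for (int i = 0; i < 5; i++) {
            sketch.getTableStatistics("sales.orders");
        }
        if (sketch.metastoreCalls.get() != 1) {
            throw new AssertionError("expected a single metastore call, got " + sketch.metastoreCalls.get());
        }
        System.out.println("metastore calls: " + sketch.metastoreCalls.get());
    }
}
```

Five planner lookups result in one simulated metastore call; the other four are served from the per-query cache.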