
query-level caching for hive tables and iceberg table statistics #8659

Closed
clemensvonschwerin wants to merge 3 commits into trinodb:master from clemensvonschwerin:feature/iceberg-caching

Conversation

@clemensvonschwerin (Contributor) commented on Jul 26, 2021

We experienced long planning times due to expensive calls to the Hive metastore and S3 (during BaseTableScans). Caching statistics and Hive tables at the query level reduced planning times by around 50% for us.

fixes #8675

@cla-bot commented on Jul 26, 2021

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.

1 similar comment from @cla-bot, Jul 26, 2021

@clemensvonschwerin force-pushed the feature/iceberg-caching branch from f3fc182 to 9759b2d on July 26, 2021 12:48
@cla-bot added the cla-signed label on Jul 26, 2021
@findepi (Member) left a comment

Last time we spoke, @electrum was strongly against query-level caches, based on the hope that they are not needed.

See the concern about the second call to getTable in #8151 (comment), and see the test results for a test @homar just added:

assertMetastoreInvocations("SELECT * FROM test_select_from",
        ImmutableMultiset.builder()
                .addCopies(GET_TABLE, 12)

The next step should be to understand why we're doing those additional accesses to the metastore. Maybe we do not need them at all? And then maybe we do not need the cache either?

cc @joshthoward
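
For context on how such assertions work: the test records every metastore call in a multiset and compares the tallies against the expected counts. Below is a minimal sketch of that counting pattern; InvocationCounter, record, and assertInvocations are illustrative names, not Trino's actual test helper.

import com.google.common.collect.ConcurrentHashMultiset;
import com.google.common.collect.ImmutableMultiset;
import com.google.common.collect.Multiset;

public class InvocationCounter
{
    // thread-safe tally of metastore method names observed during a query
    private final Multiset<String> invocations = ConcurrentHashMultiset.create();

    public void record(String method)
    {
        invocations.add(method);
    }

    public void assertInvocations(Multiset<String> expected)
    {
        // Multiset equality compares elements together with their counts
        if (!expected.equals(ImmutableMultiset.copyOf(invocations))) {
            throw new AssertionError("expected " + expected + " but saw " + invocations);
        }
    }

    public static void main(String[] args)
    {
        InvocationCounter counter = new InvocationCounter();
        for (int i = 0; i < 12; i++) {
            counter.record("GET_TABLE"); // simulate twelve getTable round trips
        }
        counter.assertInvocations(ImmutableMultiset.<String>builder()
                .addCopies("GET_TABLE", 12)
                .build());
    }
}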

@electrum (Member) commented

I am specifically against a generic metastore cache like we have in the Hive connector. We can cache table statistics if we need to, though I'd like to understand why it is called multiple times, and whether we can fix it in the engine. Requiring every connector to cache a likely expensive operation seems like a suboptimal design.

public ConcurrentHashMap<SchemaTableName, Table> load(final String unused)
        throws Exception
{
    return new ConcurrentHashMap<>();
}
Member:

I think I don't understand what this is doing.

Anyway, I guess this will go away once we move away from unnecessary Cache use.

Contributor Author:

The cache builder needs a function as an argument that creates the cache entry on a miss for a given key. Here we simply create a new ConcurrentHashMap if there is no cache for a given query ID yet.
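
For readers unfamiliar with the pattern, here is a minimal self-contained sketch of that loader mechanism, assuming Guava's CacheBuilder/CacheLoader; String stands in for the query ID and table types, and the class and method names are illustrative, not the PR's actual code.

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

public class PerQueryCacheSketch
{
    // query id -> per-query table map; load() runs only on a cache miss
    private final LoadingCache<String, ConcurrentHashMap<String, String>> tableCache =
            CacheBuilder.newBuilder()
                    .maximumSize(100)
                    .expireAfterWrite(5, TimeUnit.MINUTES)
                    .build(new CacheLoader<String, ConcurrentHashMap<String, String>>()
                    {
                        @Override
                        public ConcurrentHashMap<String, String> load(String queryId)
                        {
                            // miss for this query id: start with an empty per-query map
                            return new ConcurrentHashMap<>();
                        }
                    });

    public ConcurrentHashMap<String, String> forQuery(String queryId)
    {
        // the first call for a query id triggers load(); later calls reuse the map
        return tableCache.getUnchecked(queryId);
    }
}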

this.trinoVersion = requireNonNull(trinoVersion, "trinoVersion is null");
this.tableCache = CacheBuilder.newBuilder()
        .maximumSize(100)
        .expireAfterWrite(5, TimeUnit.MINUTES)
Member:

The IcebergMetadata object is per-query scoped, so you don't need time-based eviction here. That's why tableMetadataCache uses an ordinary map.
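
As a hedged illustration of that point, a per-query-scoped object can hold a plain map, as tableMetadataCache does; the names below are stand-ins (String for SchemaTableName/TableMetadata, loadMetadata for the real lookup), not the actual Trino types.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class QueryScopedMetadataSketch
{
    // the whole object is created per query and discarded with it, so entries
    // never outlive the query and no size cap or TTL is required
    private final Map<String, String> tableMetadataCache = new ConcurrentHashMap<>();

    public String getTableMetadata(String schemaTableName)
    {
        // loaded at most once per query, then reused for the query's lifetime
        return tableMetadataCache.computeIfAbsent(schemaTableName, this::loadMetadata);
    }

    private String loadMetadata(String schemaTableName)
    {
        // placeholder for the expensive metastore/filesystem call
        return "metadata:" + schemaTableName;
    }
}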

Contributor Author:

Ah, I did not know that. Then we do not even need the two levels (query -> table); one level is enough.

        .maximumSize(100)
        .expireAfterWrite(5, TimeUnit.MINUTES)
        .build(new TableCacheLoader());
this.tableStatisticsCache = CacheBuilder.newBuilder()
Member:

If adding a stats cache is an improvement, I would expect this to be reflected by a test change. Currently I see TestIcebergMetadataFileOperations passes without modification.

Let's move statistics to a separate PR (or at least a separate commit), so that we can focus on metastore interactions here.


Optional<Table> getHiveTable(ConnectorSession session, SchemaTableName schemaTableName)
{
    var queryTableCache = tableCache.getUnchecked(session.getQueryId());
Member:

private final Map<String, Optional<Long>> snapshotIds = new ConcurrentHashMap<>();
private final Map<SchemaTableName, TableMetadata> tableMetadataCache = new ConcurrentHashMap<>();

private final LoadingCache<String, ConcurrentHashMap<SchemaTableName, Table>> tableCache;
Member:

Nested keying (query id -> SchemaTableName -> Table) is redundant. Make SchemaTableName the key here, as in tableMetadataCache (and effectively in snapshotIds, although the code doesn't make it obvious).
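
A sketch of the flattened shape this suggests, again with String standing in for SchemaTableName and Table, and a hypothetical fetchFromMetastore as the loader:

import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class FlattenedTableCacheSketch
{
    // single-level keying: the per-query scope of the enclosing object already
    // plays the role of the old outer query-id level
    private final Map<String, Optional<String>> tableCache = new ConcurrentHashMap<>();

    public Optional<String> getHiveTable(String schemaTableName)
    {
        return tableCache.computeIfAbsent(schemaTableName, this::fetchFromMetastore);
    }

    private Optional<String> fetchFromMetastore(String schemaTableName)
    {
        // placeholder for the real metastore lookup; Optional.empty() when absent
        return Optional.empty();
    }
}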

@findepi (Member) commented on May 23, 2022

I like the engine-side approach: #12196. The engine shouldn't ask a connector for the same information multiple times.

However, the engine-side approach has some limitations. For example, when dereference pushdown happens, a new ConnectorTableHandle is injected into the plan. The engine doesn't know it's "almost the same" as before, so it will ask the connector again. Thus, the change here may still be useful (or not).

cc @lxynov @alexjo2144

@colebow (Member) commented on Oct 27, 2022

👋 @clemensvonschwerin - this PR is inactive and doesn't seem to be under development. If you'd like to continue work on this at any point in the future, feel free to re-open.

@colebow closed this on Oct 27, 2022
@bitsondatadev (Member) commented

> I like the engine-side approach: #12196. The engine shouldn't ask a connector for the same information multiple times.

This was completed here: #13047


Development

Successfully merging this pull request may close these issues:

Excessive metastore invocations when querying Iceberg table (#8675)