Query-level caching for hive tables and iceberg table statistics #8659
clemensvonschwerin wants to merge 3 commits into trinodb:master
Conversation
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.
…to improve planning time
clemensvonschwerin force-pushed from f3fc182 to 9759b2d
findepi left a comment
Last time we spoke, @electrum was very much against query-level caches, based on the hope that they are not needed.
See the concern about the second call to getTable here: #8151 (comment),
and see the test results for a test @homar just added here.
The next step should be to understand why we're doing those additional accesses to the metastore.
Maybe we do not need them at all? And then maybe we do not need the cache either?
cc @joshthoward
I am specifically against a generic metastore cache, like we have in the Hive connector. We can cache table statistics if we need to, though I'd like to understand why it is called multiple times, and whether we can fix it in the engine. Requiring every connector to cache a likely expensive operation seems like a suboptimal design.
public ConcurrentHashMap<SchemaTableName, Table> load(final String unused)
        throws Exception
{
    return new ConcurrentHashMap<>();
}
I don't think I understand what this is doing.
Anyway, I guess this will go away once we move away from the unnecessary Cache use.
The cache builder needs a function as an argument that creates the cache entry when there is a miss for a given key. Here we simply create a new ConcurrentHashMap if there is no cache for a given query id yet.
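For illustration, here is a minimal stdlib-only sketch of the populate-on-miss pattern described above, using `ConcurrentHashMap.computeIfAbsent` in place of Guava's `CacheBuilder`/`CacheLoader`. The class and key names are hypothetical and simplified (plain strings instead of `SchemaTableName`/`Table`):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class QueryScopedCacheSketch
{
    // outer map: query id -> per-query table cache, created lazily on first miss
    private final Map<String, ConcurrentHashMap<String, String>> tableCache = new ConcurrentHashMap<>();

    public Map<String, String> cacheForQuery(String queryId)
    {
        // equivalent of the CacheLoader above: build an empty per-query map on a miss
        return tableCache.computeIfAbsent(queryId, unused -> new ConcurrentHashMap<>());
    }

    public static void main(String[] args)
    {
        QueryScopedCacheSketch sketch = new QueryScopedCacheSketch();
        Map<String, String> q1 = sketch.cacheForQuery("query_1");
        q1.put("schema.table", "cached-table-handle");

        // the same query id must return the same map instance
        if (sketch.cacheForQuery("query_1") != q1) {
            throw new AssertionError("expected the same per-query map");
        }
        // a different query id gets a fresh, empty map
        if (!sketch.cacheForQuery("query_2").isEmpty()) {
            throw new AssertionError("expected an empty map for a new query");
        }
        System.out.println("ok");
    }
}
```

Unlike Guava's cache, this sketch has no size bound or eviction; it only illustrates the miss-populates-entry behavior.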
this.trinoVersion = requireNonNull(trinoVersion, "trinoVersion is null");
this.tableCache = CacheBuilder.newBuilder()
        .maximumSize(100)
        .expireAfterWrite(5, TimeUnit.MINUTES)
The IcebergMetadata object is per-query scoped, so you don't need time-based eviction here.
That's why tableMetadataCache uses an ordinary map.
Ah, I did not know that. Then we do not even need the two levels (query -> table); one level is enough.
        .maximumSize(100)
        .expireAfterWrite(5, TimeUnit.MINUTES)
        .build(new TableCacheLoader());
this.tableStatisticsCache = CacheBuilder.newBuilder()
If adding a stats cache is an improvement, I would expect this to be reflected by a test change.
Currently I see TestIcebergMetadataFileOperations passes without modification.
Let's move statistics to a separate PR (or at least a separate commit), so that we can focus on metastore interactions here.
Optional<Table> getHiveTable(ConnectorSession session, SchemaTableName schemaTableName)
{
    var queryTableCache = tableCache.getUnchecked(session.getQueryId());
private final Map<String, Optional<Long>> snapshotIds = new ConcurrentHashMap<>();
private final Map<SchemaTableName, TableMetadata> tableMetadataCache = new ConcurrentHashMap<>();

private final LoadingCache<String, ConcurrentHashMap<SchemaTableName, Table>> tableCache;
Nested keying (query id -> SchemaTableName -> Table) is redundant.
Make SchemaTableName the key here, as in tableMetadataCache (and effectively in snapshotIds, although the code doesn't make that obvious).
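To illustrate the suggestion: since IcebergMetadata lives for a single query, the query-id level can be dropped and SchemaTableName used directly as the key. A hypothetical flattened sketch, with a stand-in record instead of Trino's real `SchemaTableName` and a placeholder for the metastore call:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FlattenedTableCacheSketch
{
    // stand-in for io.trino.spi.connector.SchemaTableName
    record SchemaTableName(String schema, String table) {}

    // one level of keying is enough: this object is scoped to a single query anyway
    private final Map<SchemaTableName, String> tableCache = new ConcurrentHashMap<>();

    public String getTable(SchemaTableName name)
    {
        return tableCache.computeIfAbsent(name, this::loadFromMetastore);
    }

    private String loadFromMetastore(SchemaTableName name)
    {
        // placeholder for the expensive metastore round trip
        return "table:" + name.schema() + "." + name.table();
    }

    public static void main(String[] args)
    {
        FlattenedTableCacheSketch cache = new FlattenedTableCacheSketch();
        String first = cache.getTable(new SchemaTableName("sales", "orders"));
        String second = cache.getTable(new SchemaTableName("sales", "orders"));
        if (!first.equals(second)) {
            throw new AssertionError("expected the cached value on the second lookup");
        }
        System.out.println(first); // prints "table:sales.orders"
    }
}
```

Because records provide value-based equals/hashCode, two SchemaTableName instances naming the same table hit the same cache entry, which is what makes the single-level keying work.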
I like the engine-side approach: #12196. However, the engine side has some limitations. For example, when dereference pushdown happens, a new
👋 @clemensvonschwerin - this PR is inactive and doesn't seem to be under development. If you'd like to continue work on this at any point in the future, feel free to re-open.
We experienced long planning times due to expensive calls to the Hive metastore and to S3 (during BaseTableScans). Caching statistics and Hive tables at the query level reduced planning times by around 50% for us.
fixes #8675
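The mechanism behind the reported planning-time reduction is that repeated statistics lookups for the same table within one query collapse into a single metastore access. A minimal sketch, with hypothetical names and a counter standing in for the expensive call:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class StatisticsMemoizationSketch
{
    private final Map<String, Double> statisticsCache = new ConcurrentHashMap<>();
    final AtomicInteger metastoreCalls = new AtomicInteger();

    public double getTableStatistics(String tableName)
    {
        // repeated lookups during planning hit the cache, not the metastore
        return statisticsCache.computeIfAbsent(tableName, name -> {
            metastoreCalls.incrementAndGet(); // the expensive metastore/S3 access happens here
            return 42.0; // placeholder row-count estimate
        });
    }

    public static void main(String[] args)
    {
        StatisticsMemoizationSketch sketch = new StatisticsMemoizationSketch();
        // the planner may ask for the same table's statistics many times
        for (int i = 0; i < 5; i++) {
            sketch.getTableStatistics("sales.orders");
        }
        if (sketch.metastoreCalls.get() != 1) {
            throw new AssertionError("expected a single metastore call, got " + sketch.metastoreCalls.get());
        }
        System.out.println("metastore calls: " + sketch.metastoreCalls.get());
    }
}
```

Five planner lookups result in one simulated metastore call; the other four are served from the per-query cache.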