Skip to content

Implement caching of iceberg metadata on coordinator#22739

Merged
raunaqmorarka merged 3 commits intotrinodb:masterfrom
raunaqmorarka:small-files-cache
Aug 1, 2024
Merged

Implement caching of iceberg metadata on coordinator#22739
raunaqmorarka merged 3 commits intotrinodb:masterfrom
raunaqmorarka:small-files-cache

Conversation

@raunaqmorarka
Copy link
Copy Markdown
Member

@raunaqmorarka raunaqmorarka commented Jul 19, 2024

Description

This is intended to reduce filesystem accesses for metadata operations in iceberg without having to explicitly configure filesystem caching on local disks

Additional context and related issues

Pre-requisite to landing #22584 as that potentially increases filesystem accesses for planning queries with different columns projected from same table through multiple scans

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Iceberg
* Improve query planning performance. ({issue}`22739`)
  Iceberg metadata files are now cached on the coordinator, this can be disabled by setting the catalog configuration property `iceberg.metadata-cache.enabled` to `false`

@cla-bot cla-bot bot added the cla-signed label Jul 19, 2024
@raunaqmorarka raunaqmorarka requested review from ebyhr, electrum, findepi, findinpath, jkylling and martint and removed request for martint July 19, 2024 06:12
@github-actions github-actions bot added iceberg Iceberg connector delta-lake Delta Lake connector labels Jul 19, 2024
@github-actions github-actions bot added the hive Hive connector label Jul 19, 2024
@raunaqmorarka raunaqmorarka force-pushed the small-files-cache branch 2 times, most recently from f01974c to 3e7ebd4 Compare July 19, 2024 09:00
Copy link
Copy Markdown
Contributor

@jkylling jkylling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also cache data files if the coordinator does execution?

@github-actions github-actions bot added the hudi Hudi connector label Jul 22, 2024
@raunaqmorarka raunaqmorarka changed the title Implement caching of small files on coordinator Implement caching of iceberg metadata on coordinator Jul 22, 2024
Comment on lines 29 to 30
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Default configuration for cache is not stable if Runtime.getRutime().maxMemory() < 2000 MB?

@raunaqmorarka
Copy link
Copy Markdown
Member Author

iceberg parquet partitioned.pdf
Screenshot 2024-07-23 at 1 36 52 PM

iceberg parquet unpartitioned.pdf
Screenshot 2024-07-23 at 1 37 25 PM

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How frequently cacheLength is called?

Seems like length could be important to cache. @jkylling could alluxio provide an implementation of this method?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the context of iceberg metadata files, we need it only because SNAPSHOT file lengths are not known in advance and iceberg library uses file length to open it's avro reader.
Having it in the cache helps for subsequent queries.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell the CacheManager interface we use from Alluxio does not have a way to get the file length.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the javadoc should be changed noting that location could be a directory now. What's the semantics then?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no "semantics", it's a hint to the cache, it can do whatever with it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the expectation is that it should evict cached data for all files in location, right?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not a strict requirement, but ideally files under directory should be deleted.
this was needed only because certain tests manipulate files and directories directly, it's not a problem for real workloads where direct manipulation is not allowed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: prob should be default

@raunaqmorarka raunaqmorarka force-pushed the small-files-cache branch 2 times, most recently from 964eefc to 3712402 Compare July 30, 2024 14:53
@sopel39
Copy link
Copy Markdown
Member

sopel39 commented Jul 30, 2024

#22739 (comment)

analysis time is not reported as planning time, so the gains are probably even better

Every implementation of TrinoInputFile caches lastModified.
Therefore, calls to lastModified() should not be traced repeatedly.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems better to put these in trino-filesystem-manager module as that is where they are used.

Copy link
Copy Markdown
Member Author

@raunaqmorarka raunaqmorarka Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would require making MemoryInputStream and MemoryInput public and bring some dependencies into trino-filesystem-manager pom.xml
Doable but the current way seems simpler.
Let me know if you would still like to move these classes to trino-filesystem-manager module.

This is intended to reduce filesystem accesses for metadata
operations in iceberg without having to explicitly
configure filesystem caching on local disks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed delta-lake Delta Lake connector docs hive Hive connector hudi Hudi connector iceberg Iceberg connector performance

Development

Successfully merging this pull request may close these issues.

7 participants