
Conversation

@okumin (Contributor) commented Sep 21, 2025

Allows TableMetadataParser to cache the content of metadata.json.

The Iceberg REST Catalog is expected to serve the metadata of the same table to many clients concurrently, so caching metadata.json content would improve average latency and reduce S3 cost. Here I assume each metadata.json file is completely immutable once written; I may be overlooking something, or this may not be worth including in iceberg-core.
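
For illustration, a minimal sketch of the idea (not necessarily the exact change in this PR); the class name, the use of Caffeine, and the default size are assumptions:

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import org.apache.iceberg.TableMetadata;
import org.apache.iceberg.TableMetadataParser;
import org.apache.iceberg.io.FileIO;

// Illustrative sketch: a process-wide cache of parsed TableMetadata keyed by
// metadata location. Keying by location alone relies on the assumption above
// that a committed metadata.json is never rewritten in place.
public class CachingMetadataReader {
  private static final Cache<String, TableMetadata> METADATA_CACHE =
      Caffeine.newBuilder()
          .maximumSize(1_000) // hypothetical default
          .build();

  public static TableMetadata read(FileIO io, String metadataLocation) {
    // On a hit this skips the S3 GET and the JSON parse entirely.
    return METADATA_CACHE.get(
        metadataLocation, location -> TableMetadataParser.read(io, location));
  }
}
```

Under that immutability assumption no explicit invalidation is needed; size- or TTL-based eviction is enough.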

Related PR

apache/hive#6022

@github-actions github-actions bot added the core label Sep 21, 2025
@pvary (Contributor) commented Sep 22, 2025

CC: @gaborkaszab

@gaborkaszab (Collaborator) commented:

Thanks for the PR @okumin! Also, the sequence diagram in the linked Hive PR was very useful for understanding the use case. If I'm not mistaken, the motivation is to introduce a server-side Table/TableMetadata cache within the HMS implementation of the REST catalog, and the original approach didn't work out because there is no catalog API that exposes the metadata location without loading the whole table. Is my assumption correct?

As an initial step I'd recommend checking whether there is community support for such a broad change by asking on dev@. I think this is needed because the PR affects essentially all the catalogs that load table metadata from storage using a metadata location. Also, some of my previous experience showed that metadata.json files can grow pretty big, and I'm wondering if you have studied what sensible configurations would be for the cache size and for the maximum metadata.json size; I'd be worried that in a real-world scenario only the small tables would fit into the cache anyway. You cache the content of the metadata.json, not the compressed gzip version that is stored on storage, right?
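
To make the sizing question concrete, here is a hedged sketch (all limits and names are made up) of bounding such a cache by total decompressed bytes and skipping oversized files, instead of bounding it by entry count:

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

// Hypothetical sizing knobs for a metadata.json content cache.
public class MetadataJsonCache {
  private static final long MAX_TOTAL_BYTES = 128L * 1024 * 1024;   // overall budget
  private static final long MAX_CONTENT_LENGTH = 8L * 1024 * 1024;  // per-file cutoff

  // Weigh entries by the decompressed JSON size so a handful of huge
  // metadata.json files cannot quietly dominate the cache.
  private final Cache<String, String> jsonByLocation =
      Caffeine.newBuilder()
          .maximumWeight(MAX_TOTAL_BYTES)
          .weigher((String location, String json) -> json.length())
          .build();

  public void maybeCache(String location, String decompressedJson) {
    // Tables with oversized metadata simply bypass the cache.
    if (decompressedJson.length() <= MAX_CONTENT_LENGTH) {
      jsonByLocation.put(location, decompressedJson);
    }
  }
}
```

Whether a cutoff like 8 MB is reasonable for real-world metadata.json sizes is exactly the kind of data the question above is asking for.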

=======
Some thinking and a side conversation

I was thinking about the architecture of the HMS-based REST catalog and what might be the root cause of the struggle to implement a server-side cache there. Let me know if I'm missing something. As I see it, the architecture for a table load is the following (a rough code sketch follows the list):

  1. Call loadTable API on HMS-based REST catalog
  2. Internally call HiveCatalog's loadTable
  3. This calls HMS's loadTable, which returns the metadata location
  4. The internal HiveCatalog uses the metadata location from HMS to load the TableMetadata from storage
  5. The internal HiveCatalog returns the Table object
  6. LoadTableResponse is constructed and returned from HMS-based REST catalog
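
Roughly, in code (the handler class and wiring below are illustrative, not the actual HMS-based REST catalog implementation):

```java
import org.apache.iceberg.BaseTable;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hive.HiveCatalog;
import org.apache.iceberg.rest.responses.LoadTableResponse;

// Steps 2-6 above: the REST handler delegates to an embedded HiveCatalog,
// which asks HMS for the metadata location and then reads metadata.json
// from storage on every single request.
public class LoadTableHandler {
  private final HiveCatalog hiveCatalog;

  public LoadTableHandler(HiveCatalog hiveCatalog) {
    this.hiveCatalog = hiveCatalog;
  }

  public LoadTableResponse loadTable(TableIdentifier identifier) {
    Table table = hiveCatalog.loadTable(identifier); // HMS call + storage read
    return LoadTableResponse.builder()
        .withTableMetadata(((BaseTable) table).operations().current())
        .build();
  }
}
```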

What I don't quite see is why the internal HiveCatalog is needed, other than as a convenient way to connect to HMS (but we are already in HMS, right? Maybe in a different process, though). Alternatively, the sequence could be the following, eliminating the need for an API to get the metadata location and also for the TableMetadataParser cache (a sketch of this appears after the list):

  1. Call loadTable API on HMS-based REST catalog
  2. Call HMS's loadTable directly to get the metadata location
  3. Have a cache in the HMS-based REST catalog (storing metadata locations, ETags, Table objects, etc.) and use it to check whether the table has changed
  4. a) If the table has changed, do a full table load through HiveCatalog, or alternatively through TableMetadataParser using the metadata location
     b) If the table hasn't changed, answer the request from the cache, or send a 304 Not Modified depending on the use case
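
A hedged sketch of that alternative; the class, the cache layout, and passing the HMS-provided metadata location in as a parameter are illustrative choices, not a concrete proposal:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.iceberg.TableMetadata;
import org.apache.iceberg.TableMetadataParser;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.io.FileIO;
import org.apache.iceberg.rest.responses.LoadTableResponse;

// Illustrative server-side cache for the HMS-based REST catalog: HMS is only
// asked for the current metadata location (step 2); the expensive storage
// read happens only when that location differs from the cached one.
public class CachedTableLoader {
  private record Entry(String metadataLocation, LoadTableResponse response) {}

  private final Map<TableIdentifier, Entry> cache = new ConcurrentHashMap<>();
  private final FileIO io;

  public CachedTableLoader(FileIO io) {
    this.io = io;
  }

  public LoadTableResponse loadTable(TableIdentifier identifier, String currentLocation) {
    Entry cached = cache.get(identifier);
    if (cached != null && cached.metadataLocation().equals(currentLocation)) {
      // Step 4b: unchanged, answer from the cache (or translate into a 304).
      return cached.response();
    }

    // Step 4a: changed or not cached yet, do the full load from storage.
    TableMetadata metadata = TableMetadataParser.read(io, currentLocation);
    LoadTableResponse response =
        LoadTableResponse.builder().withTableMetadata(metadata).build();
    cache.put(identifier, new Entry(currentLocation, response));
    return response;
  }
}
```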

Would this make sense?

@okumin (Contributor, Author) commented Sep 23, 2025

@gaborkaszab
As you say, the original motivation is exactly REST on HMS, but I'd slightly revise the statement that "the original approach didn't work": all options are still open, both where we add the caching layer (I'd say the HMS-specific approach is still valid) and the detailed configuration, such as the recommended cache size.

I'd say we potentially have the following three possibilities:

(A) Caching table metadata is legitimate and practical for REST or other use cases. In this case, reusing the same logic as manifest caching might make sense; users would get a consistent experience across table metadata and manifests (and potentially manifest list files in the future), and it could be reused by other systems, e.g., a REST Catalog over the JDBC Catalog. I created this PR to demonstrate this direction.
(B) Caching table metadata is legitimate and practical, but the caching layer should not be part of iceberg-core. In this case, I would close this PR and we would implement caching only in REST over HMS.
(C) Caching table metadata is not legitimate (e.g., it violates ACID semantics) or not practical (e.g., table metadata is typically too large to cache). In this case, we should reconsider whether we really need HIVE-29035.

I currently assume that caching table metadata is practical and that (A) is preferable to (B), because the logic is not specific to Hive Metastore. If we can agree on this point, I will send an email to discuss whether iceberg-core should be able to cache table metadata and manifest lists.
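
To make the "consistent experience" in (A) concrete: the first two properties below are the existing manifest-cache settings documented for iceberg-core (as far as I know), while the io.metadata.* names are purely hypothetical analogues a metadata.json cache could expose.

```java
import java.util.Map;

public class CachePropertiesExample {
  public static void main(String[] args) {
    Map<String, String> catalogProperties = Map.of(
        // Existing manifest caching properties.
        "io.manifest.cache-enabled", "true",
        "io.manifest.cache.max-total-bytes", "104857600",
        // Hypothetical metadata.json caching properties for option (A).
        "io.metadata.cache-enabled", "true",
        "io.metadata.cache.max-total-bytes", "104857600");

    catalogProperties.forEach((key, value) -> System.out.println(key + "=" + value));
  }
}
```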

@github-actions

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Oct 24, 2025
@github-actions

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Oct 31, 2025