-
Notifications
You must be signed in to change notification settings - Fork 3k
Core: Expose the stats of the manifest file content cache #13560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Expose the stats of the manifest file content cache #13560
Conversation
|
As described in the commit message, the purpose here is to expose the stats of the manifest file content cache. This would be useful to see the cache hot/miss ratio and to have some observability on the performance of this cache so that admins can tweak the configs of the cache. |
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
|
I plan to work on this. Commenting to remove the stale label |
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
|
not stale |
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
|
still on my TODO list, just couldn't get to it |
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
|
reopen |
3176c00 to
31f39aa
Compare
31f39aa to
68d04c6
Compare
| import org.immutables.value.Value; | ||
|
|
||
| @Value.Immutable | ||
| public abstract class CacheMetrics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think later on we might want to expose catalog-level table stats to the clients, like the stats of CachingCatalog or the stats of the table cache used in RESTSessionCatalog for freshness-aware loading. To do that this class could be reused as a container of the stats, but then we might want to put this into api/ instead of core/
|
|
||
| public abstract long evictionCount(); | ||
|
|
||
| public abstract long evictionWeight(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this mean?
Is this general enough to have a meaning if we happen to change the cache implementation from caffeine to something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, for different cache implementations probably not all these fields are available. I checked EhCache stats and apparently maybe 3 of the above metrics are included there: hit count, miss count, eviction count.
I see two way from here:
- Make this CacheMetrics class as narrow as it could serve the common fields from different cache implementations. I think this way the offered list of stats would be very narrow, maybe hit and miss count only.
- We could make the metrics objects instead of primitives (e.g. Long) and then if the cache implementation doesn't support some of the metrics we could leave it null. This way we have the flexibility to support more cache implementations and not to be too narrow. WDYT @pvary ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would build on the exact use-case here. What is needed, and why?
Based on that we could decide, but I would prefer a limited API. We can always add more, if needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially we are fine with the following list of metrics: hitCount, missCount, totalLoadTime and evictionCount. These are the ones that would add actual observability value to fine tune the manifest content cache IMO.
Note, by narrowing the list we still don't solve the issue of different cache implementations have different stats, so maybe other implementations won't have totalLoadTime for instance. Would it still make sense to make the member non-primitive to have the ability of leaving them null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Gabor and Peter,
Thank you for looking into this. I believe minimal cache metrics exposure is still better than nothing. In Apache Impala usecase, we usually display metrics like this in Coordinator / Catalog daemon WebUI and let metrics monitoring service scraping from it. Having ability to obeserve the cache metrics will help user tune the best config for their Iceberg usecase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I look again at #4518 and saw many reference into the initial implementation of manifest caching in Iceberg java library. Having a standard API in Iceberg lib related to caching will be great. I'd be happy to contribute / review further improvement of manifest caching to help benefit the broader community Iceberg community.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reduced the list of exposed metrics to be as general as possible based on the comments from @pvary .
About introducing a new API to provide the manifest cache stats: I'm not entirely convinced (I saw the interest in Presto to use stats) but not entirely against either. @pvary Would you mind sharing your view?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to hear input from other engines as well. Maybe get someone from Presto involved.
For observability purposes clients could use the stats of the manifest file content cache to see for instance the cache hit/miss ratio so that users can fine tune the configuration of the cache.
68d04c6 to
78ac11a
Compare
|
Merged to main. |
For observability purposes clients could use the stats of the manifest file content cache to see for instance the cache hit/miss ratio so that users can fine tune the configuration of the cache.