@sopel39
Member

@sopel39 sopel39 commented Jan 23, 2023

Split caching into a metadata cache and a stats cache.
Pulling stats puts significant pressure on the metastore.
However, stats don't always have to be up to date
to produce good query plans. Therefore, stats can
be cached by default.

RELEASE NOTES:

Hive/Delta
* Reduce query latency by caching partition and table statistics by default. {}

@sopel39
Member Author

sopel39 commented Jan 23, 2023

cc @electrum

@sopel39 sopel39 force-pushed the ks/stats_cache branch 2 times, most recently from b0a2b41 to da55b27 Compare January 24, 2023 10:49
Member

@findepi findepi left a comment

Nice!

Please make sure not to rebase onto current master when applying changes. No need for fixup commits.

Member

@findepi findepi left a comment

Nice!

@sopel39
Member Author

sopel39 commented Jan 26, 2023

On top of #15864

@findepi
Member

findepi commented Jan 27, 2023

> On top of #15864

Please ping me when the other one is merged. Thanks!

@sopel39
Member Author

sopel39 commented Jan 30, 2023

@findepi rebased

.hdfsEnvironment(hdfsEnvironment)
.build()))
.executor(executor)
.metadataCacheEnabled(true)
Member

Why does this use a different setup than the production default?
Please add a code comment.

Member Author

I'm not sure. The cache has been enabled here "forever".

@sopel39 sopel39 merged commit a73ca06 into trinodb:master Jan 31, 2023
@sopel39 sopel39 deleted the ks/stats_cache branch January 31, 2023 12:12
@sopel39 sopel39 mentioned this pull request Jan 31, 2023
@github-actions github-actions bot added this to the 407 milestone Jan 31, 2023
sopel39 added a commit to starburstdata/trino that referenced this pull request Mar 19, 2023
Some deployments might already have hive.metastore-cache-ttl set.
trinodb#15811 introduced a new config property,
hive.metastore-stats-cache-ttl, which could be lower (by default)
in such deployments. However, hive.metastore-cache-ttl
should take precedence over hive.metastore-stats-cache-ttl because
it's more generic (it affects the whole metastore cache).
sopel39 added a commit that referenced this pull request Mar 19, 2023
Some deployments might already have hive.metastore-cache-ttl set.
#15811 introduced a new config property,
hive.metastore-stats-cache-ttl, which could be lower (by default)
in such deployments. However, hive.metastore-cache-ttl
should take precedence over hive.metastore-stats-cache-ttl because
it's more generic (it affects the whole metastore cache).
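
The precedence rule described in the commit message above can be sketched roughly as follows. This is only an illustration of one possible reading ("an explicitly set hive.metastore-cache-ttl wins for stats as well"), not the code from the referenced commit. The class `CacheTtlResolver`, the method `effectiveStatsTtl`, and the placeholder default of 5 minutes are all hypothetical names and values introduced for this example.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical helper, NOT the actual Trino implementation.
// It illustrates one reading of "hive.metastore-cache-ttl takes precedence
// over hive.metastore-stats-cache-ttl".
final class CacheTtlResolver {
    // Placeholder default used only for this sketch (an assumption).
    static final Duration ASSUMED_DEFAULT_STATS_TTL = Duration.ofMinutes(5);

    private CacheTtlResolver() {}

    // metastoreCacheTtl: value of hive.metastore-cache-ttl, if explicitly set.
    // statsCacheTtl: value of hive.metastore-stats-cache-ttl, if explicitly set.
    static Duration effectiveStatsTtl(Optional<Duration> metastoreCacheTtl, Optional<Duration> statsCacheTtl) {
        // The generic setting covers the whole metastore cache, so when it is
        // present it is used for stats too, even if a stats TTL is also set.
        if (metastoreCacheTtl.isPresent()) {
            return metastoreCacheTtl.get();
        }
        // Otherwise fall back to the stats-specific TTL, or the assumed default.
        return statsCacheTtl.orElse(ASSUMED_DEFAULT_STATS_TTL);
    }

    public static void main(String[] args) {
        // Existing deployment that only sets the generic TTL: stats follow it.
        System.out.println(effectiveStatsTtl(Optional.of(Duration.ofHours(1)), Optional.empty()));
        // Fresh deployment with nothing set: stats use the assumed default.
        System.out.println(effectiveStatsTtl(Optional.empty(), Optional.empty()));
    }
}
```

Under this reading, a deployment that already configured the generic TTL keeps its existing caching behavior for stats, and the stats-specific default applies only when the generic TTL is absent.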