Extend metadata cache flush procedure to flush specific caches #10385
losipiuk merged 3 commits into trinodb:master
@aczajkowski Could you provide a rationale for this feature? Caches are usually not something one should micromanage. Why is flushing the entire cache not sufficient?
@sopel39 This is mostly for large data sets where deltas from a recent period are updated (overwritten) as the next period starts. E.g., load events from the current day (0:00 - 6:00], then after six hours export and load events from (0:00 - 12:00]. Currently, each time a delta is written we:
I would focus on that particular partition cache use case then and avoid other complexity for now. We could have
I agree we should expose as little functionality as is sufficient. Every piece of functionality adds to the maintenance cost.
There is no "partition cache". There is "metadata cache containing partition information", and there also is "file listing cache containing partitions' data files".
I like the idea of "overloading"
Since we may want to be future-proof here and keep the ability to add more options to the procedure (hopefully this never happens, but better be prepared), we could allow syntaxes while disallowing syntax
To do this, we could add a fake parameter as the first one:
    return new Procedure(
            "system",
            "flush_metadata_cache",
            ImmutableList.of(
                    new Procedure.Argument(
                            "$fake_first_parameter",
                            VARCHAR,
                            false,
                            "procedure should only be invoked with named parameters"),
                    new Procedure.Argument(
                            "schema_name",
                            VARCHAR,
                            false,
                            ...
            ),
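The effect of the placeholder first parameter can be illustrated outside Trino. The sketch below is plain Java, not the Trino SPI; the class and method names (NamedOnlyProcedureSketch, flushMetadataCache) and the returned scope strings are hypothetical. It assumes the engine binds positional arguments left to right and leaves omitted optional arguments null, so any positional call puts its first value into the placeholder slot, where it can be rejected.

```java
// Hypothetical stand-in for a procedure with a placeholder first parameter.
// This is not Trino code; it only models the argument-binding idea.
final class NamedOnlyProcedureSketch {
    private NamedOnlyProcedureSketch() {}

    // A positional call would bind its first value to fakeFirstParameter.
    // A named call never mentions it, so it stays null (assumed default).
    static String flushMetadataCache(String fakeFirstParameter, String schemaName, String tableName) {
        if (fakeFirstParameter != null) {
            throw new IllegalArgumentException("Procedure should only be invoked with named parameters");
        }
        if (schemaName == null && tableName == null) {
            return "all caches";
        }
        if (schemaName == null) {
            throw new IllegalArgumentException("table_name requires schema_name");
        }
        if (tableName == null) {
            return "schema " + schemaName;
        }
        return "table " + schemaName + "." + tableName;
    }

    public static void main(String[] args) {
        // Models a named call: only schema_name and table_name are supplied.
        System.out.println(flushMetadataCache(null, "web", "events"));
        try {
            // Models a positional call: the first value lands in the placeholder.
            flushMetadataCache("web", null, null);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this shape, further optional parameters could be added later without changing the meaning of existing calls, since no caller relies on argument position.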
Do the details matter to the end user? I don't think the user cares about the internal cache layout. All they want is to evict partition information from the metadata cache. Thus
As for the user interface, I am on @findepi's side. Having a single
.../test/java/io/trino/plugin/hive/metastore/cache/TestCachingHiveMetastoreWithQueryRunner.java
TestCachingHiveMetastoreWithQueryRunner -- this is an odd name for a test class.
It's named like this to differentiate from a unit test TestCachingHiveMetastore, and so it should be called TestCachingHiveMetastoreQueries (alas, this already exists! let's merge the classes -- as a follow-up)
Yes, we could merge them. There are some differences in the QueryRunner setup, but I think we can manage to adjust.
@findepi Thx for the review and approval. Tests are green. Do you want additional reviewers to approve, or can we merge?
losipiuk left a comment
LGTM. Some comments to test code.
@losipiuk Applied your comments. Please let me know if the updated integration test is OK now.
This PR extends the existing Hive connector procedure system.flush_metadata_cache() with optional parameters that narrow the invocation to a specific schema, table, or even partition. E.g.