Add ability to disable partitions caching in hive metastore#12343
findepi merged 1 commit into trinodb:master
add a method to enum instead
scope.matches(otherScope);
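A minimal sketch of what the suggested enum method could look like. The enum name `CachingScope`, its constants, and the matching rule are assumptions made for illustration; the PR's actual enum is not shown in this conversation.

```java
// Hypothetical enum: the name, the constants, and the matching rule are
// illustrative assumptions, not code taken from the PR.
enum CachingScope
{
    ALL,
    TABLE,
    PARTITION;

    // ALL matches every scope; any other scope matches only itself.
    boolean matches(CachingScope other)
    {
        return this == ALL || this == other;
    }
}
```

With a method like this, call sites read as `scope.matches(otherScope)` instead of repeating the comparison logic inline.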
I would enable this only when ALL
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/cache/CachingHiveMetastore.java
What is the use-case scenario when a user would want/need to use this?
It allows the user to partially bypass the cache: for example, we would like to cache the table details but not the partition information (if the user adds or drops partitions frequently).
2aeaa77 to 3040352
I know what the change does on a technical level. It does not answer my question about a use-case scenario when a user would want/need to use this.
When new partitions are added or dropped at irregular intervals, the cache can make the system unusable at times, because queries fail (if partitions are dropped). The alternative is to disable the cache fully, but then we might hit the underlying metastore too many times. This would be a sweet spot between the two.
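The trade-off described above can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: `SketchCachingMetastore`, its loaders, and the `cachePartitions` flag are all hypothetical names, and plain maps stand in for whatever cache the real class uses.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical, heavily simplified stand-in for a caching metastore wrapper.
// None of these names come from the PR; they exist only for illustration.
class SketchCachingMetastore
{
    private final Function<String, String> tableLoader;
    private final Function<String, List<String>> partitionLoader;
    private final boolean cachePartitions;
    private final Map<String, String> tableCache = new ConcurrentHashMap<>();
    private final Map<String, List<String>> partitionCache = new ConcurrentHashMap<>();

    SketchCachingMetastore(
            Function<String, String> tableLoader,
            Function<String, List<String>> partitionLoader,
            boolean cachePartitions)
    {
        this.tableLoader = tableLoader;
        this.partitionLoader = partitionLoader;
        this.cachePartitions = cachePartitions;
    }

    // Table details are always served from the cache after the first load.
    String getTable(String tableName)
    {
        return tableCache.computeIfAbsent(tableName, tableLoader);
    }

    // Partition names bypass the cache entirely when partition caching is disabled,
    // so externally added or dropped partitions are visible on the next call.
    List<String> getPartitionNames(String tableName)
    {
        if (!cachePartitions) {
            return partitionLoader.apply(tableName);
        }
        return partitionCache.computeIfAbsent(tableName, partitionLoader);
    }
}
```

With `cachePartitions` set to `false`, repeated table lookups still hit the underlying metastore only once, while every partition lookup goes to the source.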
can we move this to the next line?
Is this change required as part of this commit?
Is there any expansion for sutAction?
sut is short for system under test. It's a way to mark this as a holder for all the actions that are actually being tested. I could rename it to cachingHiveMetastoreInteractions or metastoreInteractions/Actions. For me sut is OK, but maybe the fact that you need to know what it means makes it worse than one of the above.
"sut is a shortcut for system under test."
would never guess
i suggest hive.metastore-cache.cache-partitions and rename existing configs in a follow-up.
Ok we can do it this way.
use the actual default: new ..Config().is...CachePartitions()
OK, so when I wanted to change this, it struck me as a very bad practice. I understand the reasoning: deriving the value automatically in the test makes it easier to maintain changes to the default value of this field. Still, I feel that doing so defeats the purpose of having these tests in the first place (and makes them harder to read). I would argue that the test maintains an automatically checked contract for Trino's configuration, so that whenever we change a default like this, we at least have to do it intentionally, and the test guards us against making such changes by mistake.
Otherwise we could derive all our defaults from an inlined instance of the class, which would mean that we are writing tests for defaults only because we have a checkstyle rule for that...
@s2lomon
if this test's purpose is to test things related to partitions being cached or not, then i agree, the config should be set explicitly.
however, that's not the goal here. you supplied the config only because the compiler ordered you to (missing parameter), not because you really wanted to enable or disable partition caching.
new ..Config().is...CachePartitions() conveys this intent; neither true nor false does.
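The point above can be illustrated with a small sketch. Every name here is hypothetical: `ExampleCacheConfig` stands in for the elided `..Config` class, `isCachePartitions()` for the elided getter, and the default value of `true` is assumed purely for illustration.

```java
// Hypothetical config class standing in for the elided "..Config" in the comment above.
class ExampleCacheConfig
{
    // Assumed default, chosen only for this illustration.
    private boolean cachePartitions = true;

    boolean isCachePartitions()
    {
        return cachePartitions;
    }

    ExampleCacheConfig setCachePartitions(boolean cachePartitions)
    {
        this.cachePartitions = cachePartitions;
        return this;
    }
}

// Hypothetical component whose constructor requires the flag.
class ExampleMetastore
{
    final boolean cachePartitions;

    ExampleMetastore(boolean cachePartitions)
    {
        this.cachePartitions = cachePartitions;
    }
}

class ExampleTestSetup
{
    // This setup does not care about partition caching, so it passes the
    // config's own default instead of a literal true or false.
    static ExampleMetastore createMetastore()
    {
        return new ExampleMetastore(new ExampleCacheConfig().isCachePartitions());
    }
}
```

A reader of `createMetastore()` can tell that the value was supplied only to satisfy the constructor, which a hard-coded `true` or `false` would not convey.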
switch to new ..Config().is...CachePartitions() please
Omg you are right, for some reason I thought that this comment was about the default test. Good point and thanks for being patient :)
You chose to add the field as the last one in the config class, so add it as the last one in the test class too.
(or move the declaration in the config class)
(same below)
Commit message:
"Thanks to this, one can disable caching of partitions, that should fix issues with missing partitions during query execution."
Mention the fact that this is for the case where table partitions are modified externally.
I guess it should fix issues like this one: #6286. Or at least it would be a good try for such scenarios. Alternatively we could add fine-tuning parameters, but I guess it's easier to have it either always cached or not cached at all.
3040352 to 7cf7803
the executor (refresh executor) is useless for a "never cache", drop the param
So Optional.empty()? ok.
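A rough sketch of the idea being agreed on here: a "never cache" configuration has nothing to refresh, so it carries no refresh executor. `SketchCacheSettings` and its factory methods are hypothetical names invented for this illustration and do not reflect the actual signatures in the PR.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.Executor;

// Hypothetical settings holder; names and shape are assumptions for illustration.
class SketchCacheSettings
{
    final Duration expiresAfterWrite;
    final Optional<Executor> refreshExecutor;

    private SketchCacheSettings(Duration expiresAfterWrite, Optional<Executor> refreshExecutor)
    {
        this.expiresAfterWrite = expiresAfterWrite;
        this.refreshExecutor = refreshExecutor;
    }

    // Nothing is ever cached, so there is nothing to refresh: no executor parameter.
    static SketchCacheSettings neverCache()
    {
        return new SketchCacheSettings(Duration.ZERO, Optional.empty());
    }

    // A refreshing cache needs an executor to reload entries in the background.
    static SketchCacheSettings refreshing(Duration expiresAfterWrite, Executor refreshExecutor)
    {
        return new SketchCacheSettings(expiresAfterWrite, Optional.of(refreshExecutor));
    }
}
```

Dropping the parameter from the "never cache" path keeps callers from having to supply an executor that would never be used.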
i suggest hive.metastore-cache.cache-partitions and rename existing configs in a follow-up.
switch to new ..Config().is...CachePartitions() please
make it last (as everywhere else)
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/cache/CachingHiveMetastore.java
7cf7803 to 4e4945e
...rino-hive/src/main/java/io/trino/plugin/hive/metastore/cache/CachingHiveMetastoreConfig.java
Thanks to this, one can disable caching of partitions, which should fix issues with missing partitions during query execution. This can happen when partitions are often replaced or changed by automated external processes.
4e4945e to f89babb
CI #11275 (reopened)
@colebow can you verify whether we need docs and, if yes, raise a PR asap?
Description
It's a new feature that helps fix some issues with inconsistencies in hive metastore caching
It's a change to a connector
It allows you to pick whether you want to cache ALL, DOWN TO TABLE, or PARTITION metadata for hive metastore.
Related issues, pull requests, and links
Documentation
( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
( ) Release notes entries required with the following suggested text: