Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions docs/src/main/sphinx/connector/hive.rst
Original file line number Diff line number Diff line change
Expand Up @@ -388,6 +388,8 @@ Property Name Description
with a leading 0. If set to 'skip', permissions of newly
created directories will not be set by Trino.

``hive.fs.cache.max-size`` Maximum number of cached file system objects. 1000
Copy link
Copy Markdown
Member

@mosabua mosabua Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technical question .. and potentially necessary link.. is this about Hive object storage caching or something else? We should link to the relevant docs..

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. It is unrelated to hive object storage caching. It is the max size of the TrinoFileSystemCache which is an implementation of the Hadoop interface FileSystemCache.

This cache is used to speed up requests to get FileSystem objects.

I don't know any docs we could link to for explaining in more detail I'm afraid.

The reason I wanted to document this is because we hit this error sometimes on long running Trino clusters:

io.trino.spi.TrinoException: FileSystem max cache size has been reached: 1000

To increase this cache size limit and avoid hitting this error, you need to change the hive.fs.cache.max-size property.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. So it seems like there is another caching going on .. on the workers only I assume. Maybe @hashhar @findepi or others can chime in with any other relevant info we should add. From my perspective this is already a worth improvement and could be merged. but more info might be better..

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not caching of data. @electrum will know more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @findepi .. we definitely need to figure out more .. and if people hit this limit often .. maybe we need to raise the default value as well ..

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only hit this limit once. It's not common in my experience to hit this limit.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since it's uncommon to hit this why document at all? People change values and then end up with broken setups. 🤔

Copy link
Copy Markdown
Member

@mosabua mosabua Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you want to at least enable the uncommon usage.. by your logic we should not have this as a modifiable setting at all @hashhar .. all our settings should have reasonable defaults and not need changing .. but we still need to enable the users that require changing a setting and document them. The alternative is that users are required to look things up in the log or the source code or just cant fix a problem.. thats much worse than having some documentation

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default is "reasonable" since according to Padraig he's only ever needed to do it once.

I'm fine with documenting this but with the number of advanced toggles we have (either for testing or people who know what they are doing) documenting them all would just increase burden on both readers of the docs and the people keeping them up to date.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.. but thats what it takes .. documentation should be complete .. all alternatives are worse


``hive.query-partition-filter-required`` Set to ``true`` to force a query to use a partition filter. ``false``
You can use the ``query_partition_filter_required`` catalog
session property for temporary, catalog specific use.
Expand Down