Update TrinoFileSystemCache to represent latest hadoop implementation #13243
phd3 merged 2 commits into trinodb:master
    int maxSize = conf.getInt("fs.cache.max-size", 1000);
    FileSystemHolder fileSystemHolder;
    try {
        fileSystemHolder = cache.compute(key, (k, currFileSystemHolder) -> {
nit: currFileSystemHolder -> currentFileSystemHolder
    private static class FileSystemHolder
    {
        private final FileSystem fileSystem;
        private final URI uri;
Do we need to store uri/conf here? I think we should be able to pass them to createFileSystemOnce, right?
The thought process was to keep the uri and conf provided by the thread that created the FileSystemHolder for the key, which is what happens in the existing implementation. createFileSystemOnce() could be called by a different thread with a different uri and/or conf object if the original thread is scheduled out by the operating system just before it invokes createFileSystemOnce().
That shouldn't be a concern, right?
Yes, that should be fine; will update. With this change we can avoid storing uri/conf in FileSystemHolder.
        }
    });

    fileSystemHolder.createFileSystemOnce();
Can we add a comment here why this is outside of the cache compute? Seems like that's an important piece of why this works
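For readers following the thread, here is a minimal sketch of the idea being discussed: the map operation only registers a cheap holder, and the expensive creation happens afterwards, outside the map's lock, at most once per key. This is not the code from this PR. HolderCacheSketch, Holder, createOnce and creationCount are hypothetical names, the sketch uses computeIfAbsent rather than compute, and it omits the size limit and the cleanup on failed creation that a real cache would need.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch; not the TrinoFileSystemCache implementation.
final class HolderCacheSketch<K, V>
{
    private final Map<K, Holder<V>> cache = new ConcurrentHashMap<>();
    // Stands in for expensive FileSystem creation; must not return null.
    private final Function<K, V> factory;
    private final AtomicInteger creations = new AtomicInteger();

    HolderCacheSketch(Function<K, V> factory)
    {
        this.factory = factory;
    }

    V get(K key)
    {
        // Critical section of the map: only an empty holder is registered here,
        // so the map's internal lock is held very briefly.
        Holder<V> holder = cache.computeIfAbsent(key, ignored -> new Holder<>());

        // The expensive creation runs outside the map operation. It is guarded
        // only by the holder's own monitor, so lookups for other keys are not
        // blocked, and locks taken inside the factory never nest in the map lock.
        return holder.createOnce(() -> {
            creations.incrementAndGet();
            return factory.apply(key);
        });
    }

    int creationCount()
    {
        return creations.get();
    }

    private static final class Holder<V>
    {
        private volatile V value;

        V createOnce(Supplier<V> supplier)
        {
            V result = value;
            if (result == null) {
                synchronized (this) {
                    result = value;
                    if (result == null) {
                        // Only the first thread to get here creates the value.
                        result = supplier.get();
                        value = result;
                    }
                }
            }
            return result;
        }
    }
}
```

Threads asking for the same key share one holder, so only one of them runs the factory and the others wait on that holder alone.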
phd3 left a comment
The implementation looks good to me, but IMO it would be useful to get some more 👀 on this as well, since the change is a bit involved.
Let's add a comment saying why we need cacheSize separately.
i++ is generally used in the codebase
If we use a lambda, we get the error "Hadoop FileSystem instances are shared and should not be closed". I had to add FileSystemCloser and annotate its consume() method with @SuppressModernizer to fix this.
nit: maybe just use the variable names for the comment
nit: null check since this is also exposed outside of the class
cc @electrum
please rebase
phd3 left a comment
Final set of comments - looks good to me
It would be simpler to write the following:
cacheSize.getAndUpdate(currentSize -> Math.min(currentSize + 1, maxSize)) == maxSize
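As a rough illustration of this suggestion, the sketch below shows how getAndUpdate can reserve a slot in a bounded counter in one atomic step. It relies on getAndUpdate returning the value seen before the update, so a previous value equal to maxSize means the counter was already at its limit. BoundedCounterSketch, tryReserve and release are hypothetical names, and using AtomicLong for cacheSize is an assumption rather than something confirmed by this PR.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a capped counter; not the code from this PR.
final class BoundedCounterSketch
{
    private final AtomicLong cacheSize = new AtomicLong();
    private final long maxSize;

    BoundedCounterSketch(long maxSize)
    {
        this.maxSize = maxSize;
    }

    // Returns true if a slot was reserved, false if the counter was already full.
    boolean tryReserve()
    {
        // The update never pushes the counter past maxSize. getAndUpdate returns
        // the previous value, so previous == maxSize means nothing was reserved.
        long previous = cacheSize.getAndUpdate(currentSize -> Math.min(currentSize + 1, maxSize));
        return previous < maxSize;
    }

    // Gives a slot back, never dropping below zero.
    void release()
    {
        cacheSize.getAndUpdate(currentSize -> Math.max(currentSize - 1, 0));
    }

    long size()
    {
        return cacheSize.get();
    }
}
```

Because the check and the increment happen in a single atomic update, two threads cannot both claim the last free slot.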
nit: userCount, threadCount, getCallsPerInvocation to avoid abbreviations
shutdownNow(), as we do not need to wait here
This can be simplified (similar to the other comment)
Why is this a related change? Can we keep this in a separate commit?
This came in as part of bringing the trino-testing-services dependency into trino-hdfs. The resource deallocation check added via ManageTestResources (in PR #15165) got activated in lib/trino-hdfs, resulting in this test failure. Adding fs = null; fixes this, as was the case with similar changes in the referenced PR. I can move this to a different commit.
executor.invokeAll(callableTasks).forEach(MoreFutures::getFutureValue);
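A small JDK-only sketch of this pattern, for context: submit all tasks with invokeAll, unwrap each future, and then call shutdownNow() since nothing is left to wait for. InvokeAllSketch, runAll and the local getFutureValue helper are hypothetical stand-ins; the helper only approximates what MoreFutures::getFutureValue does and is not that library's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not the test code from this PR.
final class InvokeAllSketch
{
    private InvokeAllSketch() {}

    static <T> List<T> runAll(int threadCount, List<Callable<T>> callableTasks)
    {
        ExecutorService executor = Executors.newFixedThreadPool(threadCount);
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task has completed, successfully or not,
            // and returns the futures in the same order as the tasks.
            for (Future<T> future : executor.invokeAll(callableTasks)) {
                results.add(getFutureValue(future));
            }
            return results;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        finally {
            // All tasks are already done, so there is nothing to wait for.
            executor.shutdownNow();
        }
    }

    // Simplified stand-in that rethrows task failures as unchecked exceptions.
    private static <T> T getFutureValue(Future<T> future)
            throws InterruptedException
    {
        try {
            return future.get();
        }
        catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        }
    }
}
```

Unwrapping every future matters in a concurrency test: a task that failed would otherwise be silently ignored.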
- Use ConcurrentHashMap to cache filesystem objects - improves concurrency by removing synchronized blocks
- Filesystem object is created outside cache's lock - similar to latest hadoop fs cache impl, further reducing code in critical section. Helps with systems where filesystem creation is expensive.
- Only one thread exclusively creates the filesystem object for a given key. Avoids speculative creation and then later discarding of filesystem objects compared to hadoop fs cache impl.
Motivation
Hadoop's implementation of the filesystem cache (hadoop 3.2 filesystem cache) creates the filesystem object outside of the synchronized block. A side effect of this (in addition to reduced locking duration for slow-to-create filesystem implementations) is that there is no interaction between the lock in the caching infrastructure and locks internal to a filesystem implementation (during filesystem object creation). We would like to bring this approach to TrinoFileSystemCache.
A secondary motivation is to improve the concurrency of TrinoFileSystemCache operations by avoiding synchronized blocks.
Description
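- Use ConcurrentHashMap to cache filesystem objects - improves concurrency by removing synchronized blocks
- Filesystem object is created outside cache's lock - similar to latest hadoop fs cache impl, further reducing code in critical section. Helps with systems where filesystem creation is expensive.
- Only one thread exclusively creates the filesystem object for a given key. Avoids speculative creation and then later discarding of filesystem objects compared to hadoop fs cache impl.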
There is a more recent update in the hadoop 3.3.x branch that limits the number of parallel filesystem object creations using a semaphore. Judging by the description of the issue (HADOOP-17313), it appears to have been introduced as a workaround for the speculative create-and-discard approach used in the hadoop implementation, which this code avoids.
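For comparison, the general technique of capping parallel creations with a semaphore can be sketched as below. This is only an illustration of the idea and is not Hadoop's code. LimitedCreationSketch, create and availablePermits are hypothetical names, and the permit count is an arbitrary constructor argument.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch of semaphore-limited creation; not Hadoop's implementation.
final class LimitedCreationSketch<V>
{
    private final Semaphore permits;

    LimitedCreationSketch(int maxParallelCreations)
    {
        this.permits = new Semaphore(maxParallelCreations);
    }

    V create(Supplier<V> factory)
    {
        // At most maxParallelCreations threads can be inside the factory at once;
        // additional callers block here until a permit is released.
        permits.acquireUninterruptibly();
        try {
            return factory.get();
        }
        finally {
            permits.release();
        }
    }

    int availablePermits()
    {
        return permits.availablePermits();
    }
}
```

A limit like this bounds the cost of speculative creation, whereas creating exactly once per key removes the speculative work altogether.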
Benchmark output -
Before:
After:
(above results are from an 8 core intel macbook pro)
Improvement
Update the implementation of TrinoFileSystemCache class in Hive connector.
Bring the TrinoFileSystemCache implementation in line with the latest hadoop implementation and improve cache performance in the process.
Related issues, pull requests, and links
Documentation
( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
( ) Release notes entries required with the following suggested text: