Do not close Hadoop FileSystem instances #13810
Do we want to change the SyncingFileSystem usage? This actually seemed fine before. For example, if you remove the close() call, you break the deleteOnExit behavior, if that is being used.
It wasn't necessary to close before, as we didn't use that behavior. It seems better to simply remove the close than to suppress the warning.
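For context on the deleteOnExit concern: in Hadoop, `FileSystem.close()` is where paths registered through `deleteOnExit(Path)` are deleted, so dropping the close also drops that cleanup. The sketch below models the interaction in memory under that assumption. `DeleteOnExitDemo` and `ModelFileSystem` are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical in-memory model of the close()/deleteOnExit interaction.
// This is NOT Hadoop's FileSystem; it only mimics the relevant behavior.
class DeleteOnExitDemo {
    static class ModelFileSystem implements AutoCloseable {
        final Set<String> files = new LinkedHashSet<>();
        private final Set<String> deleteOnExit = new LinkedHashSet<>();

        void create(String path) {
            files.add(path);
        }

        // Marks an existing path for deletion when this instance is closed.
        boolean deleteOnExit(String path) {
            if (!files.contains(path)) {
                return false;
            }
            return deleteOnExit.add(path);
        }

        // Marked paths are only removed here, so never calling close()
        // means the marked paths are never cleaned up by this instance.
        @Override
        public void close() {
            files.removeAll(deleteOnExit);
            deleteOnExit.clear();
        }
    }

    public static void main(String[] args) {
        ModelFileSystem fs = new ModelFileSystem();
        fs.create("/tmp/staging");
        fs.deleteOnExit("/tmp/staging");
        System.out.println(fs.files.contains("/tmp/staging")); // true: not closed yet
        fs.close();
        System.out.println(fs.files.contains("/tmp/staging")); // false: removed on close
    }
}
```

This is why removing the close is only safe when, as stated above, the deleteOnExit behavior is not used.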
.mvn/modernizer/violations.xml
Please add this to the commit message so that the motivation for this change is clear.
Not sure I understand the question. They are singletons and thus are shared.
Would it be more clear if I wrote "shared singletons"?
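To illustrate why closing a shared instance is harmful: when a cache hands the same object to several callers, one caller's close() breaks every other caller that still holds the reference. The sketch below is a hypothetical in-memory model; `SharedInstanceDemo`, `ModelFileSystem`, and `Cache` are illustrative names, not Trino or Hadoop APIs, and the "Filesystem closed" message is just the model's own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: one cached instance per scheme, shared by all callers.
class SharedInstanceDemo {
    static class ModelFileSystem {
        private boolean closed;

        String read(String path) {
            if (closed) {
                throw new IllegalStateException("Filesystem closed");
            }
            return "contents of " + path;
        }

        void close() {
            closed = true;
        }
    }

    static class Cache {
        private final Map<String, ModelFileSystem> instances = new HashMap<>();

        // Every caller asking for the same scheme receives the same object.
        ModelFileSystem get(String scheme) {
            return instances.computeIfAbsent(scheme, s -> new ModelFileSystem());
        }
    }

    public static void main(String[] args) {
        Cache cache = new Cache();
        ModelFileSystem a = cache.get("hdfs");
        ModelFileSystem b = cache.get("hdfs");
        System.out.println(a == b); // true: the instance is shared
        a.close();                  // caller A "cleans up" its handle...
        try {
            b.read("/data");        // ...and caller B now fails
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Filesystem closed
        }
    }
}
```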
I don't like the plural "singletons". To me, a singleton is one object.
A pool of shared objects is just "shared objects" to me.
Also, FS instances have some lifecycle, right? They are created, cached and closed behind the scenes, am I correct?
So the application can eventually create an unlimited number of these objects over time.
There's no contradiction in the plural form "singletons". Each type of FileSystem can be a singleton (one for S3, one for HDFS, one for Minio, etc.), and you can still refer to them collectively as a group of singletons.
So we create as many FileSystem objects as there are distinct URI schemes in use?
Does the answer depend on extra credentials, as with GCS?
I think it's best to say that "it's complicated"; the exact logic is in TrinoFileSystemCache. I changed the wording to "shared", since that seems a better way to communicate why we shouldn't close them.
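To make "it's complicated" a little more concrete: the number of cached instances depends on the cache key, not on the scheme alone. Hadoop's own FileSystem cache keys on scheme, authority and user; the sketch below assumes exactly those three fields. TrinoFileSystemCache may add more (the thread mentions extra credentials such as for GCS), so `CacheKeyDemo`, `Key`, and `FileSystemCache` are hypothetical and do not reproduce Trino's actual logic.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

// Hypothetical cache-key model; real caches may key on additional fields.
class CacheKeyDemo {
    // Assumed key fields: scheme and authority (case-insensitive) plus user.
    record Key(String scheme, String authority, String user) {
        static Key of(URI uri, String user) {
            return new Key(
                    Objects.toString(uri.getScheme(), "").toLowerCase(Locale.ROOT),
                    Objects.toString(uri.getAuthority(), "").toLowerCase(Locale.ROOT),
                    user);
        }
    }

    static class FileSystemCache {
        private final Map<Key, Object> instances = new HashMap<>();

        // Returns the shared instance for the key, creating it on first use.
        Object get(URI uri, String user) {
            return instances.computeIfAbsent(Key.of(uri, user), k -> new Object());
        }

        int size() {
            return instances.size();
        }
    }

    public static void main(String[] args) {
        FileSystemCache cache = new FileSystemCache();
        cache.get(URI.create("hdfs://nn1:8020/a"), "alice");
        cache.get(URI.create("hdfs://nn1:8020/b"), "alice"); // same key: reused
        cache.get(URI.create("hdfs://nn1:8020/a"), "bob");   // new user: new instance
        cache.get(URI.create("s3://bucket/key"), "alice");   // new scheme: new instance
        System.out.println(cache.size()); // 3
    }
}
```

Under this model the instance count grows with distinct (scheme, authority, user) combinations, which is why the answer to "how many" is not simply "one per scheme".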
.mvn/modernizer/violations.xml
Will this be caught when the code uses an explicit type, e.g. SyncingFileSystem?
I'd guess the call might be compiled against the implementation class's method rather than FileSystem's.
Maybe the shared FileSystem instances should be wrapped with something that prevents inadvertent close?
Yes, as long as it's not cast to Closeable. We're getting rid of FileSystem usage so I'd rather not make an invasive change like that.
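For reference, the wrapper idea raised in this thread (which the author chose not to pursue) could look roughly like the sketch below: callers receive a view whose close() is a no-op, so the shared delegate stays open even when closed through `Closeable`. In Hadoop terms this would presumably be a `FilterFileSystem` subclass overriding close(), but that is an assumption; `NonClosingDemo`, `SharedFs`, and `NonClosingFs` are hypothetical names, not part of Trino or Hadoop.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical sketch of a non-closing view over a shared resource.
class NonClosingDemo {
    static class SharedFs implements Closeable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    static final class NonClosingFs implements Closeable {
        private final SharedFs delegate;

        NonClosingFs(SharedFs delegate) {
            this.delegate = delegate;
        }

        boolean isDelegateOpen() {
            return !delegate.closed;
        }

        // Intentionally ignores close(): the cache owns the delegate's lifecycle.
        @Override
        public void close() {
        }
    }

    public static void main(String[] args) throws IOException {
        SharedFs shared = new SharedFs();
        try (Closeable fs = new NonClosingFs(shared)) {
            // Closing through Closeable is the case a check on
            // FileSystem.close may not see; here it is harmless anyway.
        }
        System.out.println(shared.closed); // false
    }
}
```

The trade-off matches the reply above: the wrapper closes the Closeable loophole, but every place that hands out a shared instance would need to return the wrapped view.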
erichwang left a comment
Looks good to me. This needs to go in.
Hadoop FileSystem instances are shared and should not be closed.
Description
fix
Documentation
(x) No documentation is needed.
Release notes
(x) Release notes entries required with the following suggested text: