Skip to content

Do not close Hadoop FileSystem instances#13810

Merged
electrum merged 1 commit intotrinodb:masterfrom
electrum:fsclose
Aug 29, 2022
Merged

Do not close Hadoop FileSystem instances#13810
electrum merged 1 commit intotrinodb:masterfrom
electrum:fsclose

Conversation

@electrum
Copy link
Copy Markdown
Member

@electrum electrum commented Aug 24, 2022

Description

Is this change a fix, improvement, new feature, refactoring, or other?

fix

Documentation

(x) No documentation is needed.

Release notes

(x) Release notes entries required with the following suggested text:

# Delta Lake connector
* Prevent potential failure of unrelated queries after dropping a schema. ({issue}`13810`)

# Hive connector
* Prevent potential failure of unrelated queries after dropping a schema. ({issue}`13810`)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to change the SyncingFileSystem usage? This actually seemed fine before. For example, if you disable this, then you break the deleteOnExit behavior if that is being used.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't necessary to close before, as we didn't use that behavior. It seems better to simply remove the close than to suppress the warning.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"singletons" or "shared"?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add this to the commit message so that the motivation for this change is clear

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the question. They are singletons and thus are shared.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be more clear if I wrote "shared singletons"?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like plural "singletons". To me a singletons is a one object.
Pool of shared objects is "shared objects" to me.

Also, FS instances have some lifecycle, right? They are created, cached and closed behind the scenes, am i correct?
So the application can eventually create unlimited number of these objects over time.

Copy link
Copy Markdown
Member

@martint martint Aug 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no contradiction in the plural form of "singletons". Each type of FileSystem can be a singleton (one for S3, one for HDFS, one for Minio, etc), and you can still refer to them collectively as a group of singletons.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we create as many FileSystem objects as many different URI schemes are in use?
Does the answer depend on extra credentials like for GCS?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's best to say that "it's complicated" and you can find the exact logic is in TrinoFileSystemCache. I changed the wording to "shared" since it seems to be a better way to communicate why we shouldn't close them.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this be caught when the code uses explicit type, eg. SyncingFileSystem?
i'd guess the code might be compiled with actual implementation method name.

Maybe the shared FileSystem instances should be wrapped with something that prevents inadvertent close?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as long as it's not cast to Closeable. We're getting rid of FileSystem usage so I'd rather not make an invasive change like that.

Copy link
Copy Markdown
Contributor

@erichwang erichwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me. this needs to go in

@phd3 phd3 requested a review from jitheshtr August 24, 2022 19:01
Hadoop FileSystem instances are shared and should not be closed.
@electrum electrum merged commit 25bd017 into trinodb:master Aug 29, 2022
@electrum electrum deleted the fsclose branch August 29, 2022 22:45
@github-actions github-actions bot added this to the 394 milestone Aug 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

6 participants