Skip to content

Conversation

@steveloughran
Copy link
Contributor

@steveloughran steveloughran commented Oct 19, 2020

This adds a semaphore to throttle the number of FileSystem instances which
can be created simultaneously, set in "fs.creation.parallel.count".

This is designed to reduce the impact of many threads in an application calling
FileSystem.get() on a filesystem which takes time to instantiate -for example
to an object where HTTPS connections are set up during initialization.
Many threads trying to do this may create spurious delays by conflicting
for access to synchronized blocks, when simply limiting the parallelism
diminishes the conflict, so speeds up all threads trying to access
the store.

The default value, 64, is larger than is likely to deliver any speedup -but
it does mean that there should be no adverse effects from the change.

If a service appears to be blocking on all threads initializing connections to
abfs, s3a or store, try a smaller (possibly significantly smaller) value.

Contributed by Steve Loughran.

@mukund-thakur
Copy link
Contributor

Looks good. Just the pending test.

@steveloughran
Copy link
Contributor Author

yeah, let me get that out the way before I forget about that cache. I'll use a counter of discarded instances and test off that. I'll plan some tricks to avoid the test being fussy about timing...have a semaphore inside the fake FS I'll be instantiating to block its construction, as sleep() is brittle as well as slowing down test runs

@steveloughran
Copy link
Contributor Author

Also, I plan to move the close in a discard out of the locked area. It's not needed and an FS is doing anything there, we don't want to block the other threads

@apache apache deleted a comment from hadoop-yetus Oct 26, 2020
Copy link
Contributor

@mehakmeet mehakmeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just some small nits and checkstyles.

Have one doubt also: Is there particular reasoning for "64" as the default value?

@steveloughran
Copy link
Contributor Author

./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java:3512:      checkArgument(permits > 0 , "Invalid value of %s: %s",:33: ',' is preceded with whitespace. [NoWhitespaceBefore]
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java:3553:              new DurationInfo(LOGGER, false, "Acquiring creator semaphore for %s",: Line is longer than 80 characters (found 83). [LineLength]
./hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestFileSystemCaching.java:465:    private static final Semaphore sem = new Semaphore(1);:36: Name 'sem' must match pattern '^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$'. [ConstantName]

@steveloughran
Copy link
Contributor Author

Is there particular reasoning for "64" as the default value?

No. I wondered whether to make it smaller or just leave large.

Large: no visible impact of this change anywhere, so lowest risk
Small: should always speed up applications which always use it.

I think I should add a mention of this in the s3a performance doc, so it's not forgotten about

@hadoop-yetus

This comment has been minimized.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if close() throw IOException ?
Even it has cached object, it will not return that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is an interesting thought. I just left that code as is, but yes, it could fail

Looking at FileSystem.close, the removal of the entry from the cache should be in a finally clause too, shouldn't it. ouch. Ignoring that for now.

How about I go to IOUtils.close() & catch and log on failures.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at FileSystem.close, the removal of the entry from the cache should be in a finally clause too, shouldn't it. ouch. Ignoring that for now.

I feel not required, processDeleteOnExit() already catching the IOException.

How about I go to IOUtils.close() & catch and log on failures.

I didn't get this, are you planning to use IOUtils.close() to close filesystem instance ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel not required, processDeleteOnExit() already catching the IOException.

yes, but look at the subclasses

@apache apache deleted a comment from hadoop-yetus Nov 9, 2020
@steveloughran steveloughran force-pushed the s3/HADOOP-17313-fs-get-semaphore branch from fa4b65d to 40f9e98 Compare November 17, 2020 15:20
@hadoop-yetus

This comment has been minimized.

@steveloughran
Copy link
Contributor Author

Don't get the test failure. Will rebase

[ERROR] testServerKeyPasswordDefaultsToPassword(org.apache.hadoop.security.ssl.TestSSLFactory)  Time elapsed: 0.103 s  <<< ERROR!
java.security.GeneralSecurityException: The property 'ssl.server.keystore.location' has not been set in the ssl configuration file.
	at org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(FileBasedKeyStoresFactory.java:153)
	at org.apache.hadoop.security.ssl.SSLFactory.init(SSLFactory.java:187)
	at org.apache.hadoop.security.ssl.TestSSLFactory.checkSSLFactoryInitWithPasswords(TestSSLFactory.java:520)
	at org.apache.hadoop.security.ssl.TestSSLFactory.checkSSLFactoryInitWithPasswords(TestSSLFactory.java:440)
	at org.apache.hadoop.security.ssl.TestSSLFactory.testServerKeyPasswordDefaultsToPassword(TestSSLFactory.java:397)

@steveloughran steveloughran force-pushed the s3/HADOOP-17313-fs-get-semaphore branch from 40f9e98 to 5c41240 Compare November 19, 2020 14:59
@hadoop-yetus

This comment has been minimized.

Adds a semaphore to throttle the #of parallel clients which can be
created simultaneously, set in "fs.creation.parallel.count".

Change-Id: I794cecac4a23ae7e1aa376e16b4085ea5ae20086
* Move FileSystem.close() outside the semaphored block, so
  it does not hold up other threads.
* Tests which use the count of discarded instances for their
  assertions.

Change-Id: I4dfaf6f2a327142048438b1bfa4c4517e595df50
* Address Mehakmeet's comments
* Checkstyles
* And document in s3 performance file

Change-Id: I9568f980f4e9917448a1d4d1c7c2d070a6f28bad
if, in FileSystem.get(), a superfluous instance is closed(), exceptions
there are swallowed.

Also: put the deregistration of the class from the cache into a finally
block, so even if a subclass's processDeleteOnExit code raises any exception,
the filesystem is removed from the cache

Change-Id: I513215fe266bbad7a864481d0505cc9073a7c35a
@steveloughran steveloughran force-pushed the s3/HADOOP-17313-fs-get-semaphore branch from 5c41240 to 788a73b Compare November 23, 2020 15:52
@hadoop-yetus

This comment has been minimized.

@steveloughran
Copy link
Contributor Author

@surendralilhore -could you look @ this again? Think I've tried to address your comments

@surendralilhore
Copy link
Contributor

@surendralilhore -could you look @ this again? Think I've tried to address your

+1

@steveloughran
Copy link
Contributor Author

thanks -merging to 3.3+!

@steveloughran steveloughran merged commit ac7045b into apache:trunk Nov 25, 2020
asfgit pushed a commit that referenced this pull request Nov 25, 2020
…s. (#2396)

This adds a semaphore to throttle the number of FileSystem instances which
can be created simultaneously, set in "fs.creation.parallel.count".

This is designed to reduce the impact of many threads in an application calling
FileSystem.get() on a filesystem which takes time to instantiate -for example
to an object where HTTPS connections are set up during initialization.
Many threads trying to do this may create spurious delays by conflicting
for access to synchronized blocks, when simply limiting the parallelism
diminishes the conflict, so speeds up all threads trying to access
the store.

The default value, 64, is larger than is likely to deliver any speedup -but
it does mean that there should be no adverse effects from the change.

If a service appears to be blocking on all threads initializing connections to
abfs, s3a or store, try a smaller (possibly significantly smaller) value.

Contributed by Steve Loughran.

Change-Id: I57161b026f28349e339dc8b9d74f6567a62ce196
@steveloughran steveloughran deleted the s3/HADOOP-17313-fs-get-semaphore branch October 15, 2021 19:50
jojochuang pushed a commit to jojochuang/hadoop that referenced this pull request May 23, 2023
…te FS clients. (apache#2396)

This adds a semaphore to throttle the number of FileSystem instances which
can be created simultaneously, set in "fs.creation.parallel.count".

This is designed to reduce the impact of many threads in an application calling
FileSystem.get() on a filesystem which takes time to instantiate -for example
to an object where HTTPS connections are set up during initialization.
Many threads trying to do this may create spurious delays by conflicting
for access to synchronized blocks, when simply limiting the parallelism
diminishes the conflict, so speeds up all threads trying to access
the store.

The default value, 64, is larger than is likely to deliver any speedup -but
it does mean that there should be no adverse effects from the change.

If a service appears to be blocking on all threads initializing connections to
abfs, s3a or store, try a smaller (possibly significantly smaller) value.

Contributed by Steve Loughran.

Change-Id: I57161b026f28349e339dc8b9d74f6567a62ce196
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants