-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDFS-17067 Use BlockingThreadPoolExecutorService for nnProbingThreadPool in ObserverReadProxy #5803
Conversation
…ool in ObserverReadProxy
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Hi @goiri, In this PR, we basically changed ThreadPoolExecutor to BlockingThreadPoolExecutorService, which comes with some default settings. I am not sure what unit test we should add here. What do you think? Can we merge in this change without adding new unit tests? |
...lient/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/ObserverReadProxyProvider.java
Show resolved
Hide resolved
Hi @goiri, Could you review this PR as well? thanks, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
Per @xinglin - This is deployed at LinkedIn and been running for approx a day without thread issue. Previous thread issue presented within 4 hours
Thanks @mccormickt12 for reviewing and approving the PR! @goiri, could you take a look? thanks, |
thanks @goiri for committing this PR to trunk. |
…ool in ObserverReadProxy (apache#5803)
…ool in ObserverReadProxy (apache#5803)
Description of PR
In HDFS-17030, we introduced an ExecutorService, to submit getHAServiceState() requests. We constructed the ExecutorService directly from a basic ThreadPoolExecutor, without setting allowCoreThreadTimeOut to true. Then, the core thread will be kept up and running even when the main thread exits. To fix it, one could set allowCoreThreadTimeOut to true. However, in this PR, we decide to directly use an existing executorService implementation (BlockingThreadPoolExecutorService) in hadoop instead. It takes care of setting allowCoreThreadTimeOut and also allows setting the prefix for thread names.
A second minor issue is we did not shutdown the executorService in close(). It is a minor issue as close() will only be called when the garbage collector starts to reclaim an ObserverReadProxyProvider object, not when there is no reference to the ObserverReadProxyProvider object. The time between when an ObserverReadProxyProvider becomes dereferenced and when the garage collector actually starts to reclaim that object is out of control/under-defined (unless the program is shutdown with an explicit System.exit(1)).
I also tested with a standalone Java program.
pool.allowCoreThreadTimeOut(true);
is commented out, the JVM process won't exit (noProcess finished with exit code 0
). The threaddump shows myThread-1 is still waiting for new tasks.pool.allowCoreThreadTimeOut(true);
, the JVM process exits after 10 seconds.How was this patch tested?