HDFS-17067 Use BlockingThreadPoolExecutorService for nnProbingThreadPool in ObserverReadProxy #5803

xinglin · 2023-07-03T22:39:23Z

Description of PR

In HDFS-17030, we introduced an ExecutorService to submit getHAServiceState() requests. We constructed it directly from a basic ThreadPoolExecutor without setting allowCoreThreadTimeOut to true. As a result, the core thread stays alive and waits for new tasks even after the main thread exits. One fix would be to set allowCoreThreadTimeOut to true. In this PR, however, we instead use an existing ExecutorService implementation in Hadoop, BlockingThreadPoolExecutorService. It takes care of setting allowCoreThreadTimeOut and also allows setting a prefix for thread names.

  private final ExecutorService nnProbingThreadPool =
      new ThreadPoolExecutor(1, 4, 1L, TimeUnit.MINUTES,
          new ArrayBlockingQueue<Runnable>(1024));
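For comparison, the alternative fix mentioned above (enabling allowCoreThreadTimeOut on a plain ThreadPoolExecutor) could look roughly like the sketch below. It uses only JDK classes and is not the code in this PR. The class name, the helper methods, and the short keep-alive value are assumptions made for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreThreadTimeoutSketch {

  // Hypothetical helper: a pool shaped like nnProbingThreadPool (1 core, 4 max,
  // queue of 1024), but with a configurable keep-alive and core-thread timeout on.
  static ThreadPoolExecutor newProbingPool(long keepAliveMillis) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 4,
        keepAliveMillis, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<Runnable>(1024));
    // Without this call the single core thread never times out and, being a
    // non-daemon thread, keeps the JVM alive. Requires keepAliveMillis > 0.
    pool.allowCoreThreadTimeOut(true);
    return pool;
  }

  // Runs one no-op task, then polls until the idle worker has retired or
  // maxWaitMillis has elapsed. Returns the number of threads left in the pool.
  static int poolSizeAfterIdle(long keepAliveMillis, long maxWaitMillis) {
    ThreadPoolExecutor pool = newProbingPool(keepAliveMillis);
    try {
      pool.submit(() -> { }).get();
      long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
      while (pool.getPoolSize() > 0 && System.nanoTime() < deadline) {
        Thread.sleep(20);
      }
      return pool.getPoolSize();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("pool size after idle: " + poolSizeAfterIdle(100, 5000));
  }
}
```

With the timeout enabled, the pool is expected to drain to zero threads once the keep-alive elapses, so main can return without an explicit shutdown().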

A second, minor issue is that we did not shut down the ExecutorService in close(). It is minor because close() will only be called when the garbage collector starts to reclaim an ObserverReadProxyProvider object, not at the moment the last reference to that object goes away. The time between when an ObserverReadProxyProvider becomes unreferenced and when the garbage collector actually starts to reclaim it is outside our control and not well defined (unless the program is shut down with an explicit System.exit(1)).
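As a rough illustration of shutting the pool down in close(), here is a minimal JDK-only sketch. It is not the ObserverReadProxyProvider code: the class name, the probe() method, and its placeholder return value are hypothetical, and it assumes the owner calls close() explicitly (here through try-with-resources).

```java
import java.io.Closeable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ClosableProberSketch implements Closeable {

  // Stand-in for nnProbingThreadPool; a single-thread pool is enough here.
  private final ExecutorService probingPool = Executors.newFixedThreadPool(1);

  // Hypothetical probe; the returned string is a placeholder, not a real
  // getHAServiceState() call.
  Future<String> probe() {
    return probingPool.submit(() -> "ACTIVE");
  }

  // Stops accepting new tasks and lets the worker thread exit once idle.
  @Override
  public void close() {
    probingPool.shutdown();
  }

  boolean isPoolShutdown() {
    return probingPool.isShutdown();
  }

  // Uses the prober inside try-with-resources and reports whether the pool
  // was shut down when the block exited.
  static boolean shutsDownOnClose() {
    ClosableProberSketch prober = new ClosableProberSketch();
    try (ClosableProberSketch p = prober) {
      p.probe().get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    }
    return prober.isPoolShutdown();
  }

  public static void main(String[] args) {
    System.out.println("pool shut down on close: " + shutsDownOnClose());
  }
}
```

Calling shutdown() rather than shutdownNow() lets an in-flight probe finish, which seems a reasonable default for short requests.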

I also tested with a standalone Java program.

When pool.allowCoreThreadTimeOut(true); is commented out, the JVM process does not exit (there is no "Process finished with exit code 0" message). The thread dump shows that myThread-1 is still waiting for new tasks.

Mon Jul 03 15:42:50 PDT 2023: Main thread started
Mon Jul 03 15:42:50 PDT 2023: task is running
Mon Jul 03 15:42:51 PDT 2023: Main thread exited

When we uncomment pool.allowCoreThreadTimeOut(true);, the JVM process exits after 10 seconds.

Mon Jul 03 15:43:43 PDT 2023: Main thread started
Mon Jul 03 15:43:43 PDT 2023: task is running
Mon Jul 03 15:43:44 PDT 2023: Main thread exited

Process finished with exit code 0

import java.io.Closeable;
import java.io.IOException;
import java.util.Date;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;


public class ExecutorServiceCoreThreadIdleTimeoutTest implements Closeable {
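  ExecutorServiceCoreThreadIdleTimeoutTest() {
    pool = new ThreadPoolExecutor(1, 4, 10, TimeUnit.SECONDS,
        new ArrayBlockingQueue<Runnable>(1024), namedThreadFactory);

    // Uncomment to let the idle core thread time out after 10 seconds,
    // which allows the JVM to exit.
    //pool.allowCoreThreadTimeOut(true);
  }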

  ThreadFactory namedThreadFactory = new ThreadFactory() {
    private final AtomicInteger threadNumber = new AtomicInteger(1);

    @Override
    public Thread newThread(Runnable r) {
      String name = "myThread-" + threadNumber.getAndIncrement();
      return new Thread(r, name);
    }
  };

  private final ThreadPoolExecutor pool;

  public void submitTask() {

    pool.submit(() -> {
      System.out.printf("%tc: task is running\n", new Date());
    });
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.printf("%tc: Main thread started\n", new Date());

    ExecutorServiceCoreThreadIdleTimeoutTest test = new ExecutorServiceCoreThreadIdleTimeoutTest();
    test.submitTask();
    Thread.sleep(1000);
    System.out.printf("%tc: Main thread exited\n", new Date());
  }

  @Override
  public void close() throws IOException {
    pool.shutdown();
    System.out.printf("%tc: shutdown thread pool\n", new Date());
  }
}

How was this patch tested?

~/p/h/t/h/hadoop-hdfs (HDFS-17067)> mvn test -Dtest="TestObserverReadProxyProvider"
[INFO] Running org.apache.hadoop.hdfs.server.namenode.ha.TestObserverReadProxyProvider
[INFO] Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.965 s - in org.apache.hadoop.hdfs.server.namenode.ha.TestObserverReadProxyProvider
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 16, Failures: 0, Errors: 0, Skipped: 0

hadoop-yetus · 2023-07-04T00:59:58Z

💔 -1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 39s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 1s		codespell was not available.
+0 🆗	detsecrets	0m 1s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
-1 ❌	test4tests	0m 0s		The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
			_ trunk Compile Tests _
+1 💚	mvninstall	47m 11s		trunk passed
+1 💚	compile	1m 3s		trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚	compile	0m 59s		trunk passed with JDK Private Build-1.8.0_362-8u372-ga~~us1-0ubuntu1~~20.04-b09
+1 💚	checkstyle	0m 35s		trunk passed
+1 💚	mvnsite	1m 2s		trunk passed
+1 💚	javadoc	0m 54s		trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚	javadoc	0m 46s		trunk passed with JDK Private Build-1.8.0_362-8u372-ga~~us1-0ubuntu1~~20.04-b09
+1 💚	spotbugs	2m 40s		trunk passed
+1 💚	shadedclient	35m 13s		branch has no errors when building and testing our client artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	0m 53s		the patch passed
+1 💚	compile	0m 53s		the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚	javac	0m 53s		the patch passed
+1 💚	compile	0m 46s		the patch passed with JDK Private Build-1.8.0_362-8u372-ga~~us1-0ubuntu1~~20.04-b09
+1 💚	javac	0m 46s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
+1 💚	checkstyle	0m 24s		the patch passed
+1 💚	mvnsite	0m 49s		the patch passed
+1 💚	javadoc	0m 36s		the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚	javadoc	0m 37s		the patch passed with JDK Private Build-1.8.0_362-8u372-ga~~us1-0ubuntu1~~20.04-b09
+1 💚	spotbugs	2m 33s		the patch passed
+1 💚	shadedclient	35m 19s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
+1 💚	unit	2m 26s		hadoop-hdfs-client in the patch passed.
+1 💚	asflicense	0m 41s		The patch does not generate ASF License warnings.
		138m 59s

Subsystem	Report/Notes
Docker	ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5803/1/artifact/out/Dockerfile
GITHUB PR	#5803
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname	Linux 2a0a727e34de 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `6b71a65`
Default Java	Private Build-1.8.0_362-8u372-ga~~us1-0ubuntu1~~20.04-b09
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~~us1-0ubuntu1~~20.04-b09
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5803/1/testReport/
Max. process+thread count	596 (vs. ulimit of 5500)
modules	C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5803/1/console
versions	git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by	Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus · 2023-07-04T01:04:50Z

💔 -1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 37s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 1s		codespell was not available.
+0 🆗	detsecrets	0m 1s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
-1 ❌	test4tests	0m 0s		The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
			_ trunk Compile Tests _
+1 💚	mvninstall	44m 9s		trunk passed
+1 💚	compile	1m 2s		trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚	compile	0m 59s		trunk passed with JDK Private Build-1.8.0_362-8u372-ga~~us1-0ubuntu1~~20.04-b09
+1 💚	checkstyle	0m 36s		trunk passed
+1 💚	mvnsite	1m 3s		trunk passed
+1 💚	javadoc	0m 53s		trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚	javadoc	0m 46s		trunk passed with JDK Private Build-1.8.0_362-8u372-ga~~us1-0ubuntu1~~20.04-b09
+1 💚	spotbugs	2m 39s		trunk passed
+1 💚	shadedclient	34m 58s		branch has no errors when building and testing our client artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	0m 52s		the patch passed
+1 💚	compile	0m 53s		the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚	javac	0m 53s		the patch passed
+1 💚	compile	0m 47s		the patch passed with JDK Private Build-1.8.0_362-8u372-ga~~us1-0ubuntu1~~20.04-b09
+1 💚	javac	0m 47s		the patch passed
+1 💚	blanks	0m 1s		The patch has no blanks issues.
+1 💚	checkstyle	0m 23s		the patch passed
+1 💚	mvnsite	0m 51s		the patch passed
+1 💚	javadoc	0m 36s		the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚	javadoc	0m 35s		the patch passed with JDK Private Build-1.8.0_362-8u372-ga~~us1-0ubuntu1~~20.04-b09
+1 💚	spotbugs	2m 37s		the patch passed
+1 💚	shadedclient	35m 22s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
+1 💚	unit	2m 26s		hadoop-hdfs-client in the patch passed.
+1 💚	asflicense	0m 41s		The patch does not generate ASF License warnings.
		135m 13s

Subsystem	Report/Notes
Docker	ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5803/2/artifact/out/Dockerfile
GITHUB PR	#5803
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname	Linux 5221563ce6eb 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `d5cee02`
Default Java	Private Build-1.8.0_362-8u372-ga~~us1-0ubuntu1~~20.04-b09
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~~us1-0ubuntu1~~20.04-b09
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5803/2/testReport/
Max. process+thread count	652 (vs. ulimit of 5500)
modules	C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5803/2/console
versions	git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by	Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus · 2023-07-04T01:05:37Z

💔 -1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 38s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 1s		codespell was not available.
+0 🆗	detsecrets	0m 1s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
-1 ❌	test4tests	0m 0s		The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
			_ trunk Compile Tests _
+1 💚	mvninstall	45m 20s		trunk passed
+1 💚	compile	1m 1s		trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚	compile	0m 57s		trunk passed with JDK Private Build-1.8.0_362-8u372-ga~~us1-0ubuntu1~~20.04-b09
+1 💚	checkstyle	0m 37s		trunk passed
+1 💚	mvnsite	1m 1s		trunk passed
+1 💚	javadoc	0m 52s		trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚	javadoc	0m 46s		trunk passed with JDK Private Build-1.8.0_362-8u372-ga~~us1-0ubuntu1~~20.04-b09
+1 💚	spotbugs	2m 39s		trunk passed
+1 💚	shadedclient	35m 9s		branch has no errors when building and testing our client artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	0m 53s		the patch passed
+1 💚	compile	0m 53s		the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚	javac	0m 53s		the patch passed
+1 💚	compile	0m 47s		the patch passed with JDK Private Build-1.8.0_362-8u372-ga~~us1-0ubuntu1~~20.04-b09
+1 💚	javac	0m 47s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
+1 💚	checkstyle	0m 24s		the patch passed
+1 💚	mvnsite	0m 51s		the patch passed
+1 💚	javadoc	0m 35s		the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚	javadoc	0m 35s		the patch passed with JDK Private Build-1.8.0_362-8u372-ga~~us1-0ubuntu1~~20.04-b09
+1 💚	spotbugs	2m 38s		the patch passed
+1 💚	shadedclient	34m 41s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
+1 💚	unit	2m 28s		hadoop-hdfs-client in the patch passed.
+1 💚	asflicense	0m 41s		The patch does not generate ASF License warnings.
		135m 51s

Subsystem	Report/Notes
Docker	ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5803/3/artifact/out/Dockerfile
GITHUB PR	#5803
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname	Linux cc8c458cde8c 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `d5cee02`
Default Java	Private Build-1.8.0_362-8u372-ga~~us1-0ubuntu1~~20.04-b09
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~~us1-0ubuntu1~~20.04-b09
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5803/3/testReport/
Max. process+thread count	738 (vs. ulimit of 5500)
modules	C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5803/3/console
versions	git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by	Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

xinglin · 2023-07-04T03:00:50Z

Hi @goiri,

In this PR, we basically changed ThreadPoolExecutor to BlockingThreadPoolExecutorService, which comes with some default settings. I am not sure what unit test we should add here. What do you think? Can we merge in this change without adding new unit tests?

...lient/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/ObserverReadProxyProvider.java

xinglin · 2023-07-18T04:39:16Z

Hi @goiri,

Could you review this PR as well? thanks,

mccormickt12

lgtm.
Per @xinglin: this is deployed at LinkedIn and has been running for approximately a day without the thread issue. Previously, the thread issue appeared within 4 hours.

xinglin · 2023-07-20T14:46:17Z

Thanks @mccormickt12 for reviewing and approving the PR!

@goiri, could you take a look? thanks,

xinglin · 2023-07-20T18:43:55Z

thanks @goiri for committing this PR to trunk.

…ool in ObserverReadProxy (apache#5803)

Xing Lin added 7 commits July 3, 2023 15:35

HDFS-17067 Use BlockingThreadPoolExecutorService for nnProbingThreadP…

6b71a65

…ool in ObserverReadProxy

An empty commit to trigger a build

8adebb9

An empty commit to trigger a build

7f5ed98

An empty commit to trigger a build

f056931

An empty commit to trigger a build

fbc930a

An empty commit to trigger a build

84f30f3

An empty commit to trigger a build

d5cee02

xinglin marked this pull request as ready for review July 4, 2023 02:57

mccormickt12 reviewed Jul 10, 2023

...lient/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/ObserverReadProxyProvider.java

mccormickt12 approved these changes Jul 18, 2023

goiri approved these changes Jul 20, 2023

goiri merged commit 80fefd0 into apache:trunk Jul 20, 2023

xinglin added a commit to xinglin/hadoop that referenced this pull request Jul 23, 2023

HDFS-17067 Use BlockingThreadPoolExecutorService for nnProbingThreadP…

7fe378b

…ool in ObserverReadProxy (apache#5803)

jiajunmao pushed a commit to jiajunmao/hadoop-MLEC that referenced this pull request Feb 6, 2024

HDFS-17067 Use BlockingThreadPoolExecutorService for nnProbingThreadP…

1751efb

…ool in ObserverReadProxy (apache#5803)
