Merged
Changes from 5 commits
@@ -195,10 +195,10 @@ public void start() throws IOException {
public void stop() {
if (isStarted) {
try {
server.shutdown();

@smengcl Dec 10, 2021

Contributor

Basically this means triggering the server shutdown and the readExecutors shutdown (roughly) simultaneously, rather than awaiting the readExecutors shutdown first and then initiating the server shutdown? If so, I'm +1.

Contributor Author

Thanks @smengcl for the review. Initially I also suspected readExecutors, but it's not really the problem.

server.shutdown() triggers shutdown in the server's underlying Netty transport. The server's terminated flag is set via callbacks, and server.awaitTermination() waits for this flag to be set. It seems that the flag is never set if eventLoopGroup has already been shut down.

This can be reproduced easily (on current master) by changing the server.awaitTermination() timeout to a few minutes instead of a few seconds. Then we can take a thread dump and see threads stuck at:

"ForkJoinPool.commonPool-worker-8" #358 daemon prio=5 os_prio=31 tid=0x00007fda0dc72800 nid=0x31d03 in Object.wait() [0x0000700026cc8000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:460)
	at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
	at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.awaitTermination(ServerImpl.java:319)
	- locked <0x00000007ac151340> (a java.lang.Object)
	at org.apache.hadoop.ozone.container.common.transport.server.XceiverServerGrpc.stop(XceiverServerGrpc.java:207)
	at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.stop(OzoneContainer.java:329)
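
The ordering problem described above can be illustrated with a toy model built only on java.util.concurrent. This is a sketch under stated assumptions, not the gRPC or Netty implementation: ToyServer, stopServerFirst and stopEventLoopFirst are hypothetical names, a single-thread executor stands in for eventLoopGroup, and a CountDownLatch stands in for the server's terminated flag. It only shows why a termination callback that must run on an already-stopped executor would leave awaitTermination() waiting until its timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

public class ShutdownOrderSketch {

  /** Hypothetical stand-in for a server whose "terminated" flag is set by a callback. */
  static final class ToyServer {
    private final ExecutorService eventLoop;
    private final CountDownLatch terminated = new CountDownLatch(1);

    ToyServer(ExecutorService eventLoop) {
      this.eventLoop = eventLoop;
    }

    /** Schedules the termination callback on the event loop and returns immediately. */
    void shutdown() {
      try {
        eventLoop.execute(terminated::countDown);
      } catch (RejectedExecutionException e) {
        // Assumption of this model: once the event loop is gone, the callback
        // is silently lost and the terminated flag is never set.
      }
    }

    /** Waits for the callback; returns false if the timeout elapses first. */
    boolean awaitTermination(long timeout, TimeUnit unit) throws InterruptedException {
      return terminated.await(timeout, unit);
    }
  }

  /** Server first, event loop last: the callback still has a thread to run on. */
  static boolean stopServerFirst() throws InterruptedException {
    ExecutorService eventLoop = Executors.newSingleThreadExecutor();
    ToyServer server = new ToyServer(eventLoop);
    server.shutdown();
    boolean done = server.awaitTermination(5, TimeUnit.SECONDS);
    eventLoop.shutdown();
    return done;
  }

  /** Event loop first, server last: the callback is rejected and the wait times out. */
  static boolean stopEventLoopFirst() throws InterruptedException {
    ExecutorService eventLoop = Executors.newSingleThreadExecutor();
    ToyServer server = new ToyServer(eventLoop);
    eventLoop.shutdown();
    eventLoop.awaitTermination(5, TimeUnit.SECONDS);
    server.shutdown();
    // Short timeout so the sketch finishes quickly.
    return server.awaitTermination(200, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("server first:     terminated=" + stopServerFirst());
    System.out.println("event loop first: terminated=" + stopEventLoopFirst());
  }
}
```

In this model the first ordering reports terminated=true and the second reports terminated=false after the timeout, which mirrors the TIMED_WAITING frame in the dump above. Whether the real transport drops its callbacks in exactly this way is not confirmed here.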

eventLoopGroup.shutdownGracefully().sync();
readExecutors.shutdown();
readExecutors.awaitTermination(5L, TimeUnit.SECONDS);
server.shutdown();
server.awaitTermination(5, TimeUnit.SECONDS);
} catch (InterruptedException e) {
LOG.error("failed to shutdown XceiverServerGrpc", e);