-
Notifications
You must be signed in to change notification settings - Fork 590
HDDS-5537. Limit grpc threads in ReplicationServer. #2498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jojochuang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for posting the PR. I did realize this is a potential problem but never got the chance to implement a fix.
...rvice/src/test/java/org/apache/hadoop/ozone/container/replication/TestReplicationServer.java
Outdated
Show resolved
Hide resolved
...rvice/src/test/java/org/apache/hadoop/ozone/container/replication/TestReplicationServer.java
Outdated
Show resolved
Hide resolved
jojochuang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't like how the test is written but as long as it does fail when there's an error, I am fine.
The test tries to verify that when epoll is not available, the eventLoopGroup still can be created. Previously this part of logic is handled by grpc itself. Just want to make sure it doesn't break anything. |
|
Hey @elek , would you help to take another look of this patch ? |
adoroszlai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ChenSammi for the patch. I only have a question and a minor code suggestion.
...r-service/src/main/java/org/apache/hadoop/ozone/container/replication/ReplicationServer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please explain why setting channel type and event loop group are needed? Based on NettyServerBuilder#channelType javadoc this is the same as the default behavior:
Specifies the channel type to use, by default we use {@code EpollServerSocketChannel} if
available, otherwise using {@link NioServerSocketChannel}.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adoroszlai , thanks for the code review. It's because EpollServerSocketChannel is not supported on Mac. I believe most of us use Mac for code develoepment. I'd like to make sure replication related unit tests will not fail on Mac book. You can find that there is an channelType identify process ahead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can consider adding KQueueSocketChannel support for Mac later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this different from EpollServerSocketChannel?
If we use domain socket do we bind to a domain socket instead like the example in here? https://stackoverflow.com/questions/48354562/binding-a-java-grpc-server-to-a-unix-domain-socket/48358536#48358536
|
@ChenSammi it seems unit test failures are related to this change, the same failures repeated twice: https://github.com/apache/ozone/runs/3933857760?check_suite_focus=true#step:4:1471 |
|
This one turned out to be unrelated, fixed by #2806:
But these two are still there:
https://github.com/apache/ozone/runs/4139083271#step:4:1409 |
Thanks @adoroszlai , will look at it. |
…ozone/container/replication/ReplicationServer.java Co-authored-by: Doroszlai, Attila <[email protected]>
| Class<? extends ServerChannel> channelType; | ||
|
|
||
| if (Epoll.isAvailable()) { | ||
| eventLoopGroup = new EpollEventLoopGroup(poolSize / 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please explain how 4 is derived? (Magic number?)
| eventLoopGroup.shutdownGracefully().sync(); | ||
| executor.shutdown(); | ||
| executor.awaitTermination(5L, TimeUnit.SECONDS); | ||
| server.shutdown().awaitTermination(5L, TimeUnit.SECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Event loop group should be shutdown last (#2900 fixed similar problem in XceiverServerGrpc, introduced in #2824):
| eventLoopGroup.shutdownGracefully().sync(); | |
| executor.shutdown(); | |
| executor.awaitTermination(5L, TimeUnit.SECONDS); | |
| server.shutdown().awaitTermination(5L, TimeUnit.SECONDS); | |
| executor.shutdown(); | |
| executor.awaitTermination(5L, TimeUnit.SECONDS); | |
| server.shutdown().awaitTermination(5L, TimeUnit.SECONDS); | |
| eventLoopGroup.shutdownGracefully().sync(); |
...c/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java
Outdated
Show resolved
Hide resolved
|
/pending |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marking this issue as un-mergeable as requested.
Please use /ready comment when it's resolved.
Please note that the PR will be closed after 21 days of inactivity from now. (But can be re-opened anytime later...)
/pending
|
Thank you very much for the patch. I am closing this PR temporarily as there was no activity recently and it is waiting for response from its author. It doesn't mean that this PR is not important or ignored: feel free to reopen the PR at any time. It only means that attention of committers is not required. We prefer to keep the review queue clean. This ensures PRs in need of review are more visible, which results in faster feedback for all PRs. If you need ANY help to finish this PR, please contact the community on the mailing list or the slack channel." |
https://issues.apache.org/jira/browse/HDDS-5537