Skip to content

Conversation

@ChenSammi
Copy link
Contributor

Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for posting the PR. I did realize this is a potential problem but never got the chance to implement a fix.

@ChenSammi ChenSammi requested a review from jojochuang August 9, 2021 02:21
Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't like how the test is written but as long as it does fail when there's an error, I am fine.

@ChenSammi
Copy link
Contributor Author

i don't like how the test is written but as long as it does fail when there's an error, I am fine.

The test tries to verify that when epoll is not available, the eventLoopGroup still can be created. Previously this part of logic is handled by grpc itself. Just want to make sure it doesn't break anything.

@ChenSammi
Copy link
Contributor Author

Hey @elek , would you help to take another look of this patch ?

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ChenSammi for the patch. I only have a question and a minor code suggestion.

Comment on lines +112 to +115
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain why setting channel type and event loop group are needed? Based on NettyServerBuilder#channelType javadoc this is the same as the default behavior:

Specifies the channel type to use, by default we use {@code EpollServerSocketChannel} if
available, otherwise using {@link NioServerSocketChannel}.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adoroszlai , thanks for the code review. It's because EpollServerSocketChannel is not supported on Mac. I believe most of us use Mac for code develoepment. I'd like to make sure replication related unit tests will not fail on Mac book. You can find that there is an channelType identify process ahead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can consider adding KQueueSocketChannel support for Mac later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different from EpollServerSocketChannel?
If we use domain socket do we bind to a domain socket instead like the example in here? https://stackoverflow.com/questions/48354562/binding-a-java-grpc-server-to-a-unix-domain-socket/48358536#48358536

@adoroszlai
Copy link
Contributor

@ChenSammi it seems unit test failures are related to this change, the same failures repeated twice:

Error:    TestDatanodeStateMachine.testDatanodeStateContext:262 » Timeout Timed out wait...
Error:    TestStorageVolumeChecker.testVolumeDeletion:231 » OutOfMemory unable to create...
Error:    TestDatanodeUpgradeToScmHA.testReadsDuringFinalization:138->addContainer:612->dispatchRequest:672->dispatchRequest:679 » NullPointer

https://github.com/apache/ozone/runs/3933857760?check_suite_focus=true#step:4:1471
https://github.com/apache/ozone/runs/3967634877?check_suite_focus=true#step:4:1461

@adoroszlai
Copy link
Contributor

This one turned out to be unrelated, fixed by #2806:

Error:    TestStorageVolumeChecker.testVolumeDeletion:231 » OutOfMemory unable to create...

But these two are still there:

Error:    TestDatanodeStateMachine.testDatanodeStateContext:262 » Timeout Timed out wait...
Error:    TestDatanodeUpgradeToScmHA.testReadsDuringFinalization:138->addContainer:612->dispatchRequest:672->dispatchRequest:679 » NullPointer

https://github.com/apache/ozone/runs/4139083271#step:4:1409

@ChenSammi
Copy link
Contributor Author

This one turned out to be unrelated, fixed by #2806:

Error:    TestStorageVolumeChecker.testVolumeDeletion:231 » OutOfMemory unable to create...

But these two are still there:

Error:    TestDatanodeStateMachine.testDatanodeStateContext:262 » Timeout Timed out wait...
Error:    TestDatanodeUpgradeToScmHA.testReadsDuringFinalization:138->addContainer:612->dispatchRequest:672->dispatchRequest:679 » NullPointer

https://github.com/apache/ozone/runs/4139083271#step:4:1409

Thanks @adoroszlai , will look at it.

Class<? extends ServerChannel> channelType;

if (Epoll.isAvailable()) {
eventLoopGroup = new EpollEventLoopGroup(poolSize / 4);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain how 4 is derived? (Magic number?)

Comment on lines +157 to +160
eventLoopGroup.shutdownGracefully().sync();
executor.shutdown();
executor.awaitTermination(5L, TimeUnit.SECONDS);
server.shutdown().awaitTermination(5L, TimeUnit.SECONDS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Event loop group should be shutdown last (#2900 fixed similar problem in XceiverServerGrpc, introduced in #2824):

Suggested change
eventLoopGroup.shutdownGracefully().sync();
executor.shutdown();
executor.awaitTermination(5L, TimeUnit.SECONDS);
server.shutdown().awaitTermination(5L, TimeUnit.SECONDS);
executor.shutdown();
executor.awaitTermination(5L, TimeUnit.SECONDS);
server.shutdown().awaitTermination(5L, TimeUnit.SECONDS);
eventLoopGroup.shutdownGracefully().sync();

@adoroszlai
Copy link
Contributor

/pending

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking this issue as un-mergeable as requested.

Please use /ready comment when it's resolved.

Please note that the PR will be closed after 21 days of inactivity from now. (But can be re-opened anytime later...)

/pending

@github-actions
Copy link

Thank you very much for the patch. I am closing this PR temporarily as there was no activity recently and it is waiting for response from its author.

It doesn't mean that this PR is not important or ignored: feel free to reopen the PR at any time.

It only means that attention of committers is not required. We prefer to keep the review queue clean. This ensures PRs in need of review are more visible, which results in faster feedback for all PRs.

If you need ANY help to finish this PR, please contact the community on the mailing list or the slack channel."

@github-actions github-actions bot closed this Mar 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants