Skip to content

listener: fix sharedSocket#17090

Merged
mattklein123 merged 9 commits intoenvoyproxy:mainfrom
lambdai:tcpbindtoport
Jul 7, 2021
Merged

listener: fix sharedSocket#17090
mattklein123 merged 9 commits intoenvoyproxy:mainfrom
lambdai:tcpbindtoport

Conversation

@lambdai
Copy link
Copy Markdown
Contributor

@lambdai lambdai commented Jun 22, 2021

Commit Message:
The background is that a tcp listener which doesn't bind to port does not own io_handle. There is a code path that using ListenSocketFactory::sharedSocket() to obtain the socket and call io handle without checking if io handle is needed.

SharedSocket now returns nullopt when the tcp listener does not bind_to_port even if reuse_port set to true.

Additional Description:
Risk Level: LOW
Testing: added test that crashes on existing code. Also fixed a ListenerManagerImplTest
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]
[Optional API Considerations:]

lambdai added 4 commits June 21, 2021 15:45
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Network::SocketOptRef sharedSocket() const override {
// If a tcp listener socket doesn't bind to port, there is no listen socket so there is no
// shared socket.
if (socketType() == Network::Socket::Type::Stream && !bind_to_port_) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this apply only to Stream sockets or should it also apply to other socket types?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question!
I think the sharedSocket() matters only if my imagined bind_to_port is on.
It seems udp listener's bind_to_port has a different semantic
@danzh2010 Could you add some comment on the bind_to_port usage in udp/quic listener?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless the semantic of bind_to_port in UDP world, the implementation of UDP on bind_to_port = false is that a socket is created so sharedSocket() must returns it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the answer to the original question:
As per the current impl of udp listener, sharedSocket() should return it as in this PR.

However, if udp listener turns to not create socket on bind_to_port==false, this sharedSocket() need to align with that change.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have the behaviors in sync. bind_to_port == false doesn't make much sense to me for UDP, I wonder if it should be disallowed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That being said, I'm wondering if returning nullopt in sharedSocket() based on bind_to_port is the right approach? According to the contract sharedSocket() returns nullopt when the sockets among worker threads are not shared, so it is called in ListenerManagerImpl::drainListener() to decide whether to close shared socket or not. This PR seems to change that contract. Should we also change the corresponding logic of closing socket?

I agree this PR is somewhat changing the contract of sharedSocket(). Essentially we have two approaches, one is to provide a dedicate socket impl for listener doesn't bind to port. That specialized socket can be returned by sharedSocket(). The other approach in this PR is not to allow sharedSocket() from the beginning. Close is not an issue as per implementation: no underlying BSD socket is created so we are not leaking fd.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned elsewhere, I'm currently working on simplifying all of this code, so I would prefer to not make major changes here which I'm going to undo anyway. Per my above comment, why can't we just check if io_handle_ is null in various places?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to not make major changes here which I'm going to undo anyway.

I feel this PR has two part: sharedSocket() and the the ListenerSocketImpl.

Your changes will delete sharedSocket(). However, the latter ListenerSocketImpl is needed. The reason is that your commits seems create number of concurrency listen socket regardless it's bind_to_port, for simplicity. I don't know how is that listenersocket closed correctly... I could be wrong though.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your changes will delete sharedSocket(). However, the latter ListenerSocketImpl is needed. The reason is that your commits seems create number of concurrency listen socket regardless it's bind_to_port, for simplicity. I don't know how is that listenersocket closed correctly... I could be wrong though.

Sorry I can't parse this. We can review my other change when I post it.

Copy link
Copy Markdown
Contributor Author

@lambdai lambdai Jul 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, i had the wrong impression of udp listen socket always need a not-nullptr io_handle even when bind_to_port is false. I discovered this "fact" and I now incline to believe the fact is b/c a test case is improperly using listener socket.

@lambdai
Copy link
Copy Markdown
Contributor Author

lambdai commented Jun 24, 2021

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17090 (comment) was created by @lambdai.

see: more, trace.

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@lambdai
Copy link
Copy Markdown
Contributor Author

lambdai commented Jun 25, 2021

Do you have any idea why the CI is not completed in time? @ggreenway
Also, what are the changes needed other than CI?

@mattklein123
Copy link
Copy Markdown
Member

I haven't reviewed this PR yet, but if it helps you in any way, I'm in the process of getting rid of the sharedSocket() function entirely. You can see my WIP branch here (this is about changing the default of reuse_port as well as making almost all listener errors happen on the main thread vs. workers which vastly simplifies error handling): 22361bc

@mattklein123 mattklein123 self-assigned this Jun 28, 2021
@lambdai
Copy link
Copy Markdown
Contributor Author

lambdai commented Jun 30, 2021

I haven't reviewed this PR yet, but if it helps you in any way, I'm in the process of getting rid of the sharedSocket() function entirely. You can see my WIP branch here (this is about changing the default of reuse_port as well as making almost all listener errors happen on the main thread vs. workers which vastly simplifies error handling): 22361bc

+1 for killing sharedSocket() or mutate reuse_port. I recall the changes (was a PR IIRC).
The blocker to me is not the sharedSocket() nor the default value. It's the crash at the rare combination listed in the added test case in this PR.
Can we merge this to fix the crash?

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait

Network::SocketOptRef sharedSocket() const override {
// If a tcp listener socket doesn't bind to port, there is no listen socket so there is no
// shared socket.
if (socketType() == Network::Socket::Type::Stream && !bind_to_port_) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned elsewhere, I'm currently working on simplifying all of this code, so I would prefer to not make major changes here which I'm going to undo anyway. Per my above comment, why can't we just check if io_handle_ is null in various places?

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>

void sendCHLO(quic::QuicConnectionId connection_id) {
client_sockets_.push_back(std::make_unique<Socket>(local_address_, nullptr, /*bind*/ false));
client_sockets_.push_back(std::make_unique<Network::SocketImpl>(Network::Socket::Type::Datagram,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattklein123 With this change, the io_handle == nullptr iff bind_to_port for listensocket, both tcp and udp.

Now UDP and TCP listener behavior is now unified including what you pointed out

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this makes sense to me with small comment.

/wait

Comment on lines +83 to +84
// These four overrides are introduced to perform check. A null io handle is possible only if the
// io handle is listener side.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "is listener side" mean? Clarify?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to

A null io handle is possible only if the the owner socket is a listen socket that does not bind to port.

Is it better?

return *io_handle_;
}
void close() override {
if (io_handle_ != nullptr) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In working through my other change, I'm hitting a similar issue on setSocketOption (now that reuse port is the default). I think there is a similar bug there also. It makes me think we are going about the no bind sockets incorrectly in general if we need all these guards.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(i.e. why do no bind listeners even have sockets. I realize this is a much larger change and I'm trying to get through my current mess first, but this seems like something we might want to fix in the future.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetSocketOption should check bind_to_port w/ or w/o PR. At least ListenSocketImpl::setupSocket has a bind_to_port arg.

why do no bind listeners even have sockets

Good point! I was thinking about create a derived not-bind-listen-socket and you think deeper

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this makes sense. I will merge this change into my other branch which will take a lot longer to review.

@mattklein123 mattklein123 merged commit b2cd50b into envoyproxy:main Jul 7, 2021
leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants