udp: no SO_REUSEPORT if not explicitly configured#9495
mattklein123 merged 6 commits into envoyproxy:master
Signed-off-by: Dan Zhang <danzh@google.com>
/assign @htuch
mattklein123 left a comment
Thanks for looking into this. Just to make sure I understand: when specifying port 0 with SO_REUSEPORT you can wind up w/ the same port? Is that right?
/wait
source/server/listener_impl.cc
Outdated
if ((socket_type == Network::Address::SocketType::Datagram && concurrency > 1) ||
    config.reuse_port()) {
Won't this break restart cases in which the user still expects to be able to rebind? I think we probably need to make the option actually work for UDP as well, so that it can be disabled in config for the tests?
How does the restart case work if the port is 0 in config? I suspect that in the old way we can't guarantee rebinding to the same port either. Does restarting actually depend on SO_REUSEPORT?
If that's the case, for UDP we can make it work by prioritizing reuse_port in config. If that field is set to false, RELEASE_ASSERT fails if concurrency > 1, and SO_REUSEPORT is not set if concurrency == 1. This way, a config with concurrency > 1 and reuse_port either unspecified or set to false will crash when the config is loaded.
I'm not worried about the port 0 case, just the case in which someone wants to start a new proxy with SO_REUSEPORT and then shut down the old proxy. I'm pretty sure there are people doing this, and I don't think we should break that case for UDP and concurrency 1?
How did the user achieve that? SO_REUSEPORT only allows rebinding a port within the same process.
I could be wrong, but I'm pretty sure that's not true. I think it has to be part of the same process group, same owner, or something like that.
I changed the logic to only set SO_REUSEPORT if reuse_port == true in config.
Yes.
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
/assign @htuch for doc change
neither of for, doc, change can be assigned to this issue.
/assign @htuch
/lgtm api
mattklein123 left a comment
Thanks, LGTM with small comment. This is also going to need a master merge and format fix for the shadow API stuff.
/wait
source/server/listener_impl.cc
Outdated
ENVOY_LOG(warn, "Listening on UDP without SO_REUSEPORT socket option may result to unstable "
                "packet proxying.");
WDYT about the following text:
Listening on UDP without SO_REUSEPORT socket option may result in unstable packet proxying. Consider configuring the reuse_port listener option.
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
/retest go_control_plane_mirror
🔨 rebuilding
/retest coverage
🔨 rebuilding
Signed-off-by: Dan Zhang <danzh@google.com> Signed-off-by: Prakhar <prakhar_au@yahoo.com>
The SO_REUSEPORT socket option can lead bazel test to use the same port across parallel jobs. This causes test flakiness in http_quic_integration_test with --runs_per_test > 1. To mitigate this issue, the integration test should run with enable_reuse_port explicitly set to false unless it needs concurrency_ > 1. For tests with concurrency_ > 1, running with --jobs=1 would eliminate the flakiness.
Additional Description: We used to disable SO_REUSEPORT for tests whose concurrency_ = 1 in #9495. However, this was undone in #17259.
Signed-off-by: Dan Zhang <danzh@google.com>
Co-authored-by: Dan Zhang <danzh@google.com>
Only set the SO_REUSEPORT socket option if the config explicitly enables it. If there are multiple UDP listeners and the config doesn't explicitly set reuse_port to true, log a warning.
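The rule above can be sketched as a small pure function. This is only an illustration under stated assumptions, not the code in source/server/listener_impl.cc: `SocketType`, `ListenerDecision`, and `decideReusePort` are hypothetical names, and "multiple UDP listeners" is read here as a UDP listener running with concurrency > 1.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; these are not Envoy's real types.
enum class SocketType { Stream, Datagram };

struct ListenerDecision {
  bool set_reuse_port; // whether SO_REUSEPORT would be set on the listen socket
  bool warn;           // whether the "unstable packet proxying" warning would be logged
};

// Sketch of the behavior described in this PR: the socket option follows the
// config value only, and a warning is produced for UDP with more than one
// worker when reuse_port is not enabled (assumed reading of "multiple UDP listeners").
ListenerDecision decideReusePort(SocketType type, unsigned concurrency, bool config_reuse_port) {
  assert(concurrency >= 1);
  ListenerDecision decision;
  decision.set_reuse_port = config_reuse_port;
  decision.warn =
      (type == SocketType::Datagram && concurrency > 1 && !config_reuse_port);
  return decision;
}
```

Under this sketch, a UDP listener with concurrency 1 and no reuse_port setting gets neither the option nor the warning, which is the case the integration tests rely on.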
This change fixes a bazel test issue with SO_REUSEPORT. With that socket option enabled, bazel test with --runs_per_test > 1 can be flaky because all the tests are created in different threads and they may pick up the same port, so packets are sent across individual runs of a test. This makes some QUIC integration tests flaky. In the logs, I saw different runs of a test create listeners listening on the same port.
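The underlying socket property can be checked with plain POSIX calls. The sketch below assumes Linux and loopback UDP; `bindUdp` and `boundPort` are hypothetical helpers written for this illustration. It does not reproduce the port 0 collision between test runs; it only shows that two sockets with SO_REUSEPORT may share a port, while a second bind without the option is rejected.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>

// Opens a UDP socket bound to 127.0.0.1:port, optionally with SO_REUSEPORT.
// Port 0 lets the kernel choose a port. Returns the fd, or -1 on failure.
int bindUdp(uint16_t port, bool reuse_port) {
  int fd = ::socket(AF_INET, SOCK_DGRAM, 0);
  if (fd < 0) {
    return -1;
  }
  if (reuse_port) {
    int one = 1;
    if (::setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) != 0) {
      ::close(fd);
      return -1;
    }
  }
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(port);
  if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
    ::close(fd);
    return -1;
  }
  return fd;
}

// Returns the local port the socket is bound to.
uint16_t boundPort(int fd) {
  sockaddr_in addr{};
  socklen_t len = sizeof(addr);
  int rc = ::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len);
  assert(rc == 0);
  (void)rc;
  return ntohs(addr.sin_port);
}
```

Calling `bindUdp(p, true)` twice for the same port `p` is expected to succeed on Linux when both sockets belong to the same effective user, which is why concurrent test runs with the option enabled are not protected from landing on each other's port.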
Risk Level: low, not in use
Testing: fixing existing flaky tests
Part of #8794