Skip to content

udp: no SO_REUSEPORT if not explicitly configured#9495

Merged
mattklein123 merged 6 commits intoenvoyproxy:masterfrom
danzh2010:fix_reuseport
Dec 31, 2019
Merged

udp: no SO_REUSEPORT if not explicitly configured#9495
mattklein123 merged 6 commits intoenvoyproxy:masterfrom
danzh2010:fix_reuseport

Conversation

@danzh2010
Copy link
Copy Markdown
Contributor

@danzh2010 danzh2010 commented Dec 26, 2019

Only set SO_REUSEPORT socket option if config says so. If there are multiple UDP listeners and config doesn't explicitly set reuse_port to true, log a warning.

This change fixes a bazel test issue with SO_REUSEPORT. With that socket option enabled, bazel test with --run_per_test > 1 can be flaky because all the tests are created in different threads and they may pick up the same port. So the packets will be sent cross individual runs of tests. This makes some QUIC integration test flaky. I saw different runs of a test created listeners listening on the same port in the log.

Risk Level: low, not in use
Testing: fixing existing flaky tests
Part of #8794

Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Copy link
Copy Markdown
Contributor Author

/assign @htuch

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this. Just to make sure I understand: when specifying port 0 with SO_REUSEPORT you can wind up w/ the same port? Is that right?

/wait

Comment on lines +158 to +159
if ((socket_type == Network::Address::SocketType::Datagram && concurrency > 1) ||
config.reuse_port()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this break restart cases in which the user still expects to be able to rebind? I think we probably need to make the option actually work for UDP also such that it can be disabled in config for the the tests?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does restart case work if port is 0 in config? I would suspect in the old way, we can't guarantee rebinding to the same port either. Does restarting actually depends on SO_REUSEPORT?
If that's the case, for UDP we can make it work by prioritizing reuse_port in config. If that field is set to false, RELEASE_ASSERT failure if concurrency > 1, if concurrency == 1 not set SO_REUSEPORT. In this way, a config with concurrency > 1 and reuse_port not specified or set to false will crash at loading config.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not worried about the port 0 case, just the case in which someone wants to start a new proxy with SO_REUSEPORT and then shut down the old proxy. I'm pretty sure there are people doing this, and I don't think we should break that case for UDP and concurrency 1?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did user achieve that? SO_REUSEPORT only allows port re-bind within same process.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be wrong, but I'm pretty sure that's not true. I think it has to be part of the same process group, same owner, or something like that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the logic to only set SO_REUSEPORT if reuse_port == true in config.

@danzh2010
Copy link
Copy Markdown
Contributor Author

Thanks for looking into this. Just to make sure I understand: when specifying port 0 with SO_REUSEPORT you can wind up w/ the same port? Is that right?

Yes.

@htuch htuch removed their assignment Dec 26, 2019
Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010 danzh2010 changed the title udp: no SO_REUSEPORT if concurrency == 1 udp: no SO_REUSEPORT if not explicitly configured Dec 27, 2019
Signed-off-by: Dan Zhang <danzh@google.com>
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #9495 was synchronize by danzh2010.

see: more, trace.

@danzh2010
Copy link
Copy Markdown
Contributor Author

/assign @htuch for doc change

@repokitteh-read-only
Copy link
Copy Markdown

neither of for, doc, change can be assigned to this issue.

🐱

Caused by: a #9495 (comment) was created by @danzh2010.

see: more, trace.

@danzh2010
Copy link
Copy Markdown
Contributor Author

/assign @htuch
for doc change

@htuch
Copy link
Copy Markdown
Member

htuch commented Dec 29, 2019

/lgtm api

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM with small comment. This is also going to need a master merge and format fix for the shadow API stuff.

/wait

Comment on lines +161 to +162
ENVOY_LOG(warn, "Listening on UDP without SO_REUSEPORT socket option may result to unstable "
"packet proxying.");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about the following text:

Listening on UDP without SO_REUSEPORT socket option may result in unstable packet proxying. Consider configuring the reuse_port listener option.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Copy link
Copy Markdown
Contributor Author

/retest go_control_plane_mirror

@repokitteh-read-only
Copy link
Copy Markdown

🔨 rebuilding ci/circleci: go_control_plane_mirror (failed build)

🐱

Caused by: a #9495 (comment) was created by @danzh2010.

see: more, trace.

@danzh2010
Copy link
Copy Markdown
Contributor Author

/retest coverage

@repokitteh-read-only
Copy link
Copy Markdown

🔨 rebuilding ci/circleci: coverage (failed build)

🐱

Caused by: a #9495 (comment) was created by @danzh2010.

see: more, trace.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks.

@mattklein123 mattklein123 merged commit 1e2a529 into envoyproxy:master Dec 31, 2019
prakhag1 pushed a commit to prakhag1/envoy that referenced this pull request Jan 3, 2020
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Prakhar <prakhar_au@yahoo.com>
wrowe pushed a commit that referenced this pull request Dec 14, 2021
SO_REUSEPORT socket option can lead bazel test to use the same port across parallel jobs. 

This causes test flakiness in http_quic_integration_test with --runs_per_test > 1. 

To mitigate this issue, the integration test should run with enable_reuse_port explicitly false unless it needs concurrency_ > 1. 

For those tests with concurrency_ > 1, running with --jobs=1 would eliminate the flakiness.

Additional Description: We used to disable SO_REUSEPORT for tests whose concurrency_ = 1 in #9495. However this was undo in #17259.

Signed-off-by: Dan Zhang <danzh@google.com>

Co-authored-by: Dan Zhang <danzh@google.com>
joshperry pushed a commit to joshperry/envoy that referenced this pull request Feb 13, 2022
SO_REUSEPORT socket option can lead bazel test to use the same port across parallel jobs. 

This causes test flakiness in http_quic_integration_test with --runs_per_test > 1. 

To mitigate this issue, the integration test should run with enable_reuse_port explicitly false unless it needs concurrency_ > 1. 

For those tests with concurrency_ > 1, running with --jobs=1 would eliminate the flakiness.

Additional Description: We used to disable SO_REUSEPORT for tests whose concurrency_ = 1 in envoyproxy#9495. However this was undo in envoyproxy#17259.

Signed-off-by: Dan Zhang <danzh@google.com>

Co-authored-by: Dan Zhang <danzh@google.com>
Signed-off-by: Josh Perry <josh.perry@mx.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants