listener: make reuse port the default#17259
Conversation
1) Deprecate existing reuse_port field.
2) Add new enable_reuse_port field which uses a WKT.
3) Make the new default hot restart aware so the default is not changed during hot restart.
4) Allow the default to be reverted using the "envoy.reloadable_features.listener_reuse_port_default_enabled" feature flag.
5) Change listener init so that almost all error handling occurs on the main thread. This a) vastly simplifies error handling and b) makes it so that we pre-create all sockets on the main thread and can use them all during hot restart.
6) Change hot restart to pass reuse port sockets by socket/worker index. This works around a race condition in which a draining listener has a new connection on its accept queue, but it's never accepted by the old process worker. It will be dropped. By passing all sockets (even reuse port sockets) we make sure the accept queue is fully processed.

Fixes #15794

Risk Level: High, scary stuff involving hot restart and listener init
Testing: New and existing tests. It was very hard to get the tests to pass, which gives me more confidence.
Docs Changes: N/A
Release Notes: Added
Platform Specific Features: N/A

Signed-off-by: Matt Klein <mklein@lyft.com>
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to

@ggreenway @lambdai scary change, but ultimately I think it cleans up a lot of stuff. I could potentially split out the listener init changes from the reuse_port part of the change, but I'm going to be honest: it took me so long to get the tests passing that I would prefer not to, as I would have to redo listener_manager_impl_test, which would take a long time. Let's see what we think during review.
cc @danzh2010 for QUIC/UDP changes. |
ggreenway
left a comment
Out of time for today; I'll continue tomorrow, but posting this here for now. (Note to self: got through listener_manager_impl.cc)
This is a huge improvement! I really like moving all this error handling to the main thread.
I'm curious how this will affect QUIC during hot-restart. Hot-restart doesn't work with QUIC right now AFAIK, but I'm curious if this will make it better, worse, or just different. I think in the old behavior, the child would create all new listeners, and that would probably result in the BPF routing all packets to the parent listeners (because in the kernel, they were created earlier, and we mod by the number of envoy workers). If we duplicate the FDs and send them to the child, probably half the packets will go to the parent and half to the child, but not based on the connection/tuple, just randomly. Not sure if this matters or not.
Have you tested hot-restart between the previous code and this? I think it'll work, but a manual test of that would be good.
/wait
Signed-off-by: Matt Klein <mklein@lyft.com>
Bringing some offline conversations to the forefront: it appears that on OSX, REUSE_PORT does not do what it does on Linux. It is a "last socket wins" option, such that the last socket that binds gets all the connections. This means that for Envoy purposes, enable_reuse_port does not work on OSX and should probably be blocked. The issue with blocking is that it is going to cause the tests (especially listener_manager_impl_test) to get really messy. I will see how messy it will be to just block this on OSX and change the default to false.
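Independent of any platform-level blocking, the new field also lets operators opt out per listener. A hypothetical bootstrap fragment (listener name and port are illustrative) showing `enable_reuse_port` explicitly disabled, e.g. on platforms where SO_REUSEPORT behaves as "last bind wins":

```yaml
static_resources:
  listeners:
  - name: ingress  # illustrative name
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    # New BoolValue (WKT) field from this PR; overrides the default
    # so all workers share a single listen socket.
    enable_reuse_port: false
    filter_chains: []
```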
Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
danzh2010
left a comment
Sorry for the late review. This is a huge PR and a big behavior change. Is it worth documenting how the listen socket is created, and sometimes duplicated and closed? And applying socket options and calling listen() before starting workers is a great idea!
Sure, I can do that. Do you have a location in mind, or should I write a markdown file in https://github.com/envoyproxy/envoy/tree/main/source/docs?

I tried the following hot restarts on our latest fleet: old -> new. Nothing obvious breaks 🚀.

That sounds like a reasonable location.

@danzh2010 updated and I also added the MD file. LMK what you think.
rgs1
left a comment
[ not reviewing the code although I did gloss over it to make sure switching concurrency level over hot restarts wouldn't be a problem – it isn't ]
We've been running this in prod and have done lots of hot restarts, all good thus far.
ggreenway
left a comment
This is great! Just a few doc/comment issues. I think the code is good to go.
Might be slightly clearer to have "reuse_port" instead of "reuse port" in comments/docs, but up to you if you want to make that change.
Are there any docs about what happens if concurrency is changed during hot-restart? My guess is that if concurrency is increased, everything works fine (more queues are created, all previously existing queues are reused). If concurrency is reduced, connections may be dropped.
Signed-off-by: Matt Klein <mklein@lyft.com>
@ggreenway updated, thanks for the reviews.
cc: @fishcakez who found an apparent case where the default isn't kicking in after hot restart. |
The SO_REUSEPORT socket option can lead bazel tests to use the same port across parallel jobs. This causes test flakiness in http_quic_integration_test with --runs_per_test > 1. To mitigate this issue, the integration test should run with enable_reuse_port explicitly false unless it needs concurrency_ > 1. For those tests with concurrency_ > 1, running with --jobs=1 would eliminate the flakiness.

Additional Description: We used to disable SO_REUSEPORT for tests whose concurrency_ = 1 in #9495. However, this was undone in #17259.

Signed-off-by: Dan Zhang <danzh@google.com>
Co-authored-by: Dan Zhang <danzh@google.com>
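For the tests that still need concurrency_ > 1, the workaround above amounts to serializing bazel so parallel runs cannot collide on a kernel-balanced port. A hedged sketch of the invocation (target label is illustrative):

```shell
# Repeated runs of a reuse_port test without port collisions:
# --jobs=1 prevents two test processes from sharing a reuseport group.
bazel test //test/integration:http_quic_integration_test \
  --runs_per_test=5 --jobs=1
```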