listen_socket: Support multiple options and allow socket options to be set before bind().#2734
Conversation
ggreenway
left a comment
There was a problem hiding this comment.
Overall I think the code looks good.
There was a problem hiding this comment.
What happens if the listener configuration has changed, but in such a way that it wants to modify a listener option that must be set before binding? Should an error be raised in that case? Close/unbind the listener temporarily so the option can be changed?
There was a problem hiding this comment.
I personally think that pre-bind options shouldn't be allowed to change on a listener, because that would require the socket to be closed and reopened. But I don't see a way to guarantee options aren't changed with this design.
There was a problem hiding this comment.
Closing risks dropping incoming connections during a listener reconfigure. Maybe we could use SO_REUSEPORT option to allow multiple listening sockets on the same address, and create a new socket, apply pre-bind options, bind, listen, and then close the old socket?
There was a problem hiding this comment.
@jrajahalme I'm not clear whether there would be problems when those multiple sockets have different options, e.g. one socket has SOL_IP,IP_TRANSPARENT and another not?
Anyway, my vote would be to go with your design as a purely internal implementation, but not expose the bound flag directly into the LDS API. And any feature that sets pre-bind options should expose specific fields in LDS, and check that those fields are not modified when updating a listener. We already do that for bind addresses, and I proposed to do that for the transparent field in #2719.
What do you think?
There was a problem hiding this comment.
I would prefer to avoid the complexity around SO_REUSEPORT if possible. @rlenglet what are you proposing? That we drop this change and require any pre-bind options to have explicit settings?
There was a problem hiding this comment.
I don't have a strong opinion about the internal implementation. It could be explicit settings like I did in #2719, or it could be this implementation.
However, in any case, I think we shouldn't expose the bound flag as-is in LDS, since it would make it harder to control whether an option is modified when a listener is modified.
We need to be able to compare the old and new configs of listeners to prevent changing listen socket options, like in https://github.com/envoyproxy/envoy/pull/2719/files#diff-cb0af0ff1b04bcf9f82748317d3b98e3R345.
There was a problem hiding this comment.
@rlenglet what do you mean by:
However, in any case, I think we shouldn't expose the bound flag as-is in LDS, since it would make it harder to control whether an option is modified when a listener is modified.
Do you mean in the API for filters setOptions call don't pass a bound flag? If we don't do that I'm not sure how this change could work as-is? Sorry I'm confused.
So far the listen socket options have been set in the internal `setSocket()` function after the socket has already been bound. This has the benefit of (re)setting the options whenever a listener configuration is changed, which may involve a change in listener filter configurarion. Some socket options, however, need to be set before the bind() call for them to have the desired effect. This patch adds a boolean parameter to the setOptions() callback, telling the options implementation whether the socket has already been bound or not. The setOptions() callback is called on the listen socket once with `bound=false` when the socket is first created right before the bind call. Then, when whenever the listen socket is "set" the setOptions() callback is called as before, but now typicallyl with `bound=true`. This latter call will be done again whenever the same socket is re-used for a new or modified listener. For non-bound listen sockets, the setOptions() is always called with `bound=false`. Client connections will have the setOptions() called before the bind (like before) and with `bound=false`. Client sockets are not recycled like the listen sockets, so they will not have the setOptions() called again after the bind() call. It should be noted that setting any options via the setOptions() callback functionality discussed here still requires the options object be set by a listener filter. While it would be possible to create a listener filter for the sole purpose of setting socket options, no such filter exist yet. Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Keep a list of options so that multiple options can be set on the same sockets, maybe by different listener filters. TODO: Test with multiple options & verify hashKey() is properly handled as well. Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
eb71c3b to
08705d1
Compare
|
Sorry for the rebase, had to change the approach a bit. |
|
@ggreenway Added support for multiple options. |
|
Updated the title and description to match the new approach. |
include/envoy/server/filter_config.h
Outdated
|
@jrajahalme I don't think this design requires adding a listener filter to set an option. |
ggreenway
left a comment
There was a problem hiding this comment.
I like the direction this is going; I think this is a good step towards more flexibility in setting whatever options we end up needing to set.
There was a problem hiding this comment.
I think there are 3 points at which we may/will want to set socket options: before bind, after bind but before listen, after listen.
I think this should be an enum instead of bool to handle this better. For this PR, you can just add the 2 enum values that are implemented so far, but I think we'll want to be able to add more as we go.
There was a problem hiding this comment.
Do you have an example of options that only make sense after listen? Also, do you think some options only make sense after bind() is done?
The current sets options only once per configuration event, and let the options know if a bind() is going to happen later or not. The options visitors will not be called again right after bind(), but will be called again if the socket is reused on a new or reconfigured listener. In that case bind() never follows. Currently options will never be set between bind() and listen().
There was a problem hiding this comment.
Also, some listeners do not bind. For those before/after bind() makes no sense, but pre_bind=false simply means that there is not going to be a bind call later (so no point setting options that only make a difference for bind()).
There was a problem hiding this comment.
Given this I don't find it easy to come up with the definition of an enum that would be equivalent to the bool I have now. Maybe:
enum class OptionContext { PreBind, None };
There was a problem hiding this comment.
Are you sure that TCP_FASTOPEN can not be set before bind() or before listen(), i.e., at any time?
There was a problem hiding this comment.
I'm pretty sure it's required on MacOS to be set after listen(); I think it can be set anytime on linux. @bmetzdorf has the details on this option.
There was a problem hiding this comment.
There was a problem hiding this comment.
I'll implement PreBind, PostBind for now, let you worry about PostListen...
There was a problem hiding this comment.
std::vector is better than std::list for this, I think.
There was a problem hiding this comment.
whitespace nit: for (const auto& option : *options) {
There was a problem hiding this comment.
What is the significance of >2? Please add comments
There was a problem hiding this comment.
I'll have to fix this, thanks for not understanding it :-)
There was a problem hiding this comment.
Why switch from unordered to ordered? Is it just because you don't have a hash function?
There was a problem hiding this comment.
Right. std::map supports a vector as a key out of the box, for unordered map I'd need to provide the hash function, I guess.
There was a problem hiding this comment.
Looked into possible hash support, boost::hash would be nice, as there is no std hash combiner. Or then just keep this as unordered map for now.
There was a problem hiding this comment.
Yeah, I've run into this before too. I'm fine leaving it as std::map, although other reviewers may feel more strongly about it.
There was a problem hiding this comment.
No strong opinion that this has to be unordered, unless it affect performance. std::hash does support string as key so if we need a hash function we can just reuse some from that?
There was a problem hiding this comment.
Yeah, no strong preference. IMHO, the STL has a lot of inconsistency around how hashing and comparison works with containers and inbuilt types, let's fix it only when we need to for perf reasons and prefer simplicity otherwise.
Insert a Listener Socket Option if "transparent" option is requested. Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
08705d1 to
1b58b6f
Compare
Requested-by: Greg Greenway <ggreenway@apple.com> Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Reported-by: Greg Greenway <ggreenway@apple.com> Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Use an enum instead of a bool for socket state, and call the visitors in both states initially. Upon closer examination, changing the IP_TRANSPARENT option can have effects also after bind(), so we need to change it also after bind, and potentially set it to zero when set as well. Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
|
@ggreenway Addressed all your comments :-) |
ggreenway
left a comment
There was a problem hiding this comment.
A few more small nits, but I think the design of this is looking really good.
| // If we are unable to set the option we still return success for the default value (0), | ||
| // as we never changed the option to begin with. This also prevents failures in all current | ||
| // test cases that do not set the transparent option. | ||
| if (rc == 0 || transparent == 0) { |
There was a problem hiding this comment.
Is there a case where rc != 0, transparent == 0, and transparent_ != 0? In other words, will setsockopt ever modify transparent? If not, I think it's clearer if you check against transparent_ here, so it's clear that the result doesn't depend on setsockopt possibly modifying it.
There was a problem hiding this comment.
You are right, will compare against transparent_
| return true; | ||
| #endif | ||
| throw EnvoyException("ListenSocketOption: Error setting IP_TRANSPARENT socket option"); | ||
| return false; |
There was a problem hiding this comment.
This line is unreachable. But shouldn't this just return false instead of throwing? I thought that was the convention for this function.
There was a problem hiding this comment.
The caller of this callback will also throw() in case of false return. I added a more specific throw here to be able to test that this implementation is called in the test case.
I was concerned that compiler would complain about not returning, but I was wrong, leaving the return statement out works as well, so I removed it.
There was a problem hiding this comment.
But isn't the API contract for this interface that it will return false on failure? If it is allowed to throw, I think the interface should return void. Or at a minimum, the documentation should specify in what cases it will throw vs when it could return false.
| int transparent = transparent_; | ||
|
|
||
| #ifdef SOL_IP | ||
| int rc = setsockopt(socket.fd(), SOL_IP, IP_TRANSPARENT, &transparent, sizeof(transparent)); |
There was a problem hiding this comment.
You need to set IPV6_TRANSPARENT if the socket is IPv6, right?
Beware that IPV6_TRANSPARENT is not defined everywhere, so that will need to be guarded by an #ifdef.
There was a problem hiding this comment.
Was not aware of this. Will do.
There was a problem hiding this comment.
Also, it may make testing easier if you add setsockopt() to Envoy::Api::OsSysCalls so you can mock setsockopt().
Search around for TestThreadsafeSingletonInjector<Api::OsSysCallsImpl> for an example.
There was a problem hiding this comment.
Will address this too, went unnoticed so far, sorry.
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
|
Nits & other comments addressed :-) |
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
"add" fits better now that we support multiple options. Suggested-by: Lizan Zhou <zlizan@google.com> Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
|
I'll push a few commits addressing @lizan's review comments soon. |
Shared vector of shared pointers kept the ownership of socket option visitors a bit unclear. Transfer the ownership of option visitors to Socket, but allow the vector of all sockets be shared. Requested-by: Lizan Zhou <zlizan@google.com> Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
| Configuration::ListenerFactoryContext& context) override { | ||
| EXPECT_CALL(*option_, setOption(_, Network::Socket::SocketState::PreBind)) | ||
| auto option = std::make_unique<Network::MockSocketOption>(); | ||
| EXPECT_CALL(*(option.get()), setOption(_, Network::Socket::SocketState::PreBind)) |
There was a problem hiding this comment.
EXPECT_CALL(*option, setOption should work, no?
There was a problem hiding this comment.
You are right, an explicit get() is not needed here. Will simplify this in a commit ASAP.
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Add setsockopt() to Envoy::Api::OsSysCalls and provide mock implementation supporting boolean options defaulting to 'false'. Change the listener manager implementation test 'TransparentListener' to use the new mocks. Suggested-by: Greg Greenway <ggreenway@apple.com> Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
…ot available. Fix unused variable when IP_TRANSPARENT option is not available. Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
|
Problems with the CI: |
e55ed9e to
da28038
Compare
|
@ggreenway Have no idea how I "dismissed" your review, not sure if your approval still counts? |
|
@jrajahalme approval is dismissed anytime a new commit is pushed. I think the theory behind this is so that you can't try to push a commit right before someone pushes the merge button (trying to sneak in an unapproved commit). Don't worry about it too much. We're still waiting on review from a senior maintainer (@alyssawilk in this case). |
htuch
left a comment
There was a problem hiding this comment.
Thanks @jrajahalme. Mostly looks great, just a few small changes requested. In particular, would like to see the IP transparent stuff in a separate PR, it's easier to review and lowers risk of any given PR to separate concerns.
| * to 32 bits to allow these bits to be efficiently combined into a larger hash key | ||
| * used in connection pool lookups. | ||
| * @param vector of bits that can be used to separate connections based on the options. Should | ||
| * return zero if connections with different options can be pooled together. This is |
There was a problem hiding this comment.
Comment updated, thanks for noticing this!
| * zero if connections with different options can be pooled together. This is limited | ||
| * to 32 bits to allow these bits to be efficiently combined into a larger hash key | ||
| * used in connection pool lookups. | ||
| * @param vector of bits that can be used to separate connections based on the options. Should |
There was a problem hiding this comment.
Nit: this is now a singular option.
There was a problem hiding this comment.
Yeah, no strong preference. IMHO, the STL has a lot of inconsistency around how hashing and comparison works with containers and inbuilt types, let's fix it only when we need to for perf reasons and prefer simplicity otherwise.
|
|
||
| // Network::Socket::Option | ||
| bool setOption(Network::Socket& socket, Network::Socket::SocketState state) const override { | ||
| // IP_TRANSPARENT has an effect on bind() (allowing bind() to non-local addresses), but |
There was a problem hiding this comment.
Can you split this into a separate PR? I think it's orthogonal.
There was a problem hiding this comment.
Removed from this PR.
Requested-by: Harvey Tuch <htuch@google.com> Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
This follows up from envoyproxy#558 which made IP_FREEBIND a BoolValue for LDS but not for upstream. I think it makes sense to have it in both places given the new socket options setup introduce in envoyproxy/envoy#2734. Some bonus docs fixups thrown in. Signed-off-by: Harvey Tuch <htuch@google.com>
This follows up from #558 which made IP_FREEBIND a BoolValue for LDS but not for upstream. I think it makes sense to have it in both places given the new socket options setup introduce in envoyproxy/envoy#2734. Some bonus docs fixups thrown in. Signed-off-by: Harvey Tuch <htuch@google.com>
This patch introduces support for setting IP_FREEBIND on both listener sockets and upstream connection sockets prior to binding. This enables the use of IP addresses that are not currently bound to the NIC for listening and initiating connections from. This is useful in environments with virtualized networking. There's also some related work on SocketOption that continues from envoyproxy#2734, which was needed to enable this to work cleanly. Risk Level: Low (no change unless enabled). Testing: Unit tests for ListenerManager, ClusterManager and SocketOptionImpl. Manual end-to-end validation with steps described in configs/freebind/README.md. API Changes: envoyproxy/data-plane-api#536 Fixes envoyproxy#528. Signed-off-by: Harvey Tuch <htuch@google.com>
…2922) This patch introduces support for setting IP_FREEBIND on both listener sockets and upstream connection sockets prior to binding. This enables the use of IP addresses that are not currently bound to the NIC for listening and initiating connections from. This is useful in environments with virtualized networking. There's also some related work on SocketOption that continues from #2734, which was needed to enable this to work cleanly. Risk Level: Low (no change unless enabled). Testing: Unit tests for ListenerManager, ClusterManager and SocketOptionImpl. Manual end-to-end validation with steps described in configs/freebind/README.md. API Changes: envoyproxy/data-plane-api#536 Fixes #528. Signed-off-by: Harvey Tuch <htuch@google.com>
Signed-off-by: Kuat Yessenov <kuat@google.com>
This follows up from #558 which made IP_FREEBIND a BoolValue for LDS but not for upstream. I think it makes sense to have it in both places given the new socket options setup introduce in envoyproxy/envoy#2734. Some bonus docs fixups thrown in. Signed-off-by: Harvey Tuch <htuch@google.com>
Support multiple options by collecting options to a list and then
applying all of them in order on the socket.
So far the listen socket options have been set in the internal
setSocket()function after the socket has already been bound. Thishas the benefit of (re)setting the options whenever a listener
configuration is changed, which may involve a change in listener
filter configurarion. Some socket options, however, need to be set
before the bind() call for them to have the desired effect.
This commit adds an enum
SocketStateparameter to the setOption()callback, telling the options implementation whether the options are being
set before or after the bind() call. The setOption() callback is called
on the listen socket once with
state=PreBindwhen the socket is firstcreated right before the bind() call. Then, after the
bind()call andwhen whenever the listener socket is reused for a new or modified listener,
the setOption() callback is called as before, but now with
state=PostBind.Client connections will have the setOptions() called before the bind
(like before) and with
state=PreBind. Client sockets are not recycledlike the listen sockets, so they will not have the setOptions() called
again later.
Finally, this PR adds support for setting the IP_TRANSPARENT socket
option defined in the Envoy LDS API. The same
ListenerSocketOptionclass can easily be extended to support other socket options as well.
Signed-off-by: Jarno Rajahalme jarno@covalent.io
Risk Level: Low