Remove direct access to SO_REUSEPORT#5965
Remove direct access to SO_REUSEPORT#5965alexpyattaev wants to merge 11 commits intoanza-xyz:masterfrom
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #5965 +/- ##
=======================================
Coverage 82.8% 82.8%
=======================================
Files 849 849
Lines 379373 379434 +61
=======================================
+ Hits 314300 314374 +74
+ Misses 65073 65060 -13 🚀 New features to boost your workflow:
|
bw-solana
left a comment
There was a problem hiding this comment.
Left a few comments. I'm still not a fan of the silent rugging of config. I can get onboard if we at least thrown a warn to say "hey, you called me with an invalid config and I fixed it for you"
| let (_port, read_sockets) = solana_net_utils::multi_bind_in_range_with_config( | ||
| ip_addr, | ||
| (port, port + num_sockets as u16), | ||
| SocketConfig::default().reuseport(true), |
There was a problem hiding this comment.
why were we explicitly setting reuse port to true? Was this working around some problem that no longer exists? Were we just foolish?
There was a problem hiding this comment.
I see. We used to return error if reuse port was not set on the config. Now we just force reuse port to true within the function
| }; | ||
| #[cfg(not(any(windows, target_os = "ios")))] | ||
| set_reuse_port(&socket)?; | ||
| config.reuseport = true; |
There was a problem hiding this comment.
no, since legitimate use assumes reuseport is not set externally, so no point warning here.
There was a problem hiding this comment.
why do we let someone pass in a config if we don't respect it?
There was a problem hiding this comment.
this. never change a config behind the callers back. if it's wrong, return error
There was a problem hiding this comment.
The whole idea is that reuseport will not be controllable by the caller anymore. So we can change it without caller knowing if we did or not. Or we can break public API and just delete it from config completely. Why would we want to make it possible to use this incorrectly and get runtime panics when it is trivial to make it impossible?
There was a problem hiding this comment.
I'm leaning towards either:
- Deprecate all of these APIs and create a new
SocketConfigtype that doesn't allow settingreuseport+ APIs that use the new config type. - Setup the configurations properly everywhere and throw errors when they're not.
While I agree overwriting the config is probably a better user experience in 99% of cases, it is a behavioral breaking change, and I would rather not set the precedent that it's something we want to do
There was a problem hiding this comment.
I think the point that is still missed is that all the functions that are "rugging" the config are impossible to use correctly with the old API and old behavior. For example, when we bind TPU ports, we need to call pub fn bind_two_in_range_with_offset_and_config which will yield 2 ports with correct QUIC offset. For that function to make sense, it needs to have reuseport=false (else it will bind on top of existing ports). After that, we want to bind additional sockets on top of the stuff bound by bind_two_in_range_with_offset_and_config, which means the original socket needs to have reuseport=true (else binding on top of it will fail). We could change the logic in bind_two_in_range_with_offset_and_config such that it first binds without SO_REUSEPORT, then sets the flag on both sockets AFTER they are bound to reflect what is in the config (but that is changing what the function does, which is something you want to avoid). bind_in_range_with_config is more tricky as it can concievably be used with reuseport=true (assuming the user WANTS it to bind on top of existing stuff) even though it makes no sense. So, we end up either changing the API towards a sane option (which is what I'm proposing) or changing the implementation (which would be necesary for option 2).
As for option 1, I do not think we should maintain code that is objectively hard to use correctly, we are not making C++ standard library here =)
There was a problem hiding this comment.
For that function to make sense, it needs to have reuseport=false (else it will bind on top of existing ports)
Can you break this down for me? We call it with reuseport set to true today, right? Which existing port is it binding on top of? Wouldn't this imply that the port range being passed in is the problem?
There was a problem hiding this comment.
Can you break this down for me?
Gladly :)
We call it with reuseport set to true today, right?
Yes, noone calls this with false, but that is technically allowed (just does not do anything).
Which existing port is it binding on top of?
The first one in range, + a random offset some of the functions pick. Of course, they can only bind on top of ports with reuseport=true (but this does not save us)
Wouldn't this imply that the port range being passed in is the problem?
Well yes, functions that operate on specific port number are not problematic. if we were to "consume" ports from the port range as we bind sockets we would not have as much of an issue here. But that would call for agave to keep stateful mutable port ranges rather than a tuple of ints, not sure it is a better way.
There was a problem hiding this comment.
Not rugging config anymore. If user sets reuseport, we will respect his choice. We will only crash in totally pathological case where user asks for extra sockets on top of existing one, and explicitly sets SO_REUSEPORT = false (which would get IO error from the OS anyway).
| #[deprecated(since = "2.3.0", note = "SO_REUSEPORT is now managed automatically")] | ||
| pub fn reuseport(mut self, reuseport: bool) -> Self { |
There was a problem hiding this comment.
This is the key change - we are making it such that users know that manually controlling reuseport flag is not supported anymore. This allows us to manipulate field in the SocketConfig as we deem appropriate.
There was a problem hiding this comment.
We would typically deprecate and then wait a version before removing.
I view overwriting the field as almost the same thing as just removing the API because the API becomes completely meaningless
There was a problem hiding this comment.
If someone keeps using this API, there are principally 2 outcomes: they are setting this field correctly for each of the functions they are calling, and our "rugging" is a noop, or they are setting it incorrectly, and "rugging" fixes a bug in their code. We can of course preserve the exact semantics here at the cost of complexity inside agave.
There was a problem hiding this comment.
Agree. "Fixing" their code is a behavioral breaking API change that probably makes life better in 99% of cases
There was a problem hiding this comment.
Now we are tracking whether user manually set the flag, and if they did we fall back to blindly obeying what they have specified.
0a3a3e8 to
76bd9cd
Compare
|
Rebased to pull in fix for |
Neither am I, but since external callers are not supposed to be setting up that part of config by hand they should never notice the rugging. We have discussed a full rework of API with @KirillLykov today and deemed it to be not warranted here since correct usage will never trigger any issues with rugging, and incorrect use would be a bug anyway.
Added warn! calls in 76bd9cd |
76bd9cd to
41e501e
Compare
|
I think I've found a way to have the cake and eat it too in 6e16683. We can check if the user has set the reuseport manually, and if they did fall back to old behavior. Old behavior has obviously problematic usage patterns patched, i.e. when calling for mulitple ports to be scanned/bound to, we will bind to free ports first, then set whatever reuseport flag the user actually asked for. TL;DR: if a user manually sets reuseport through (now deprecated) setter method, |
9ea8a8c to
1d67467
Compare
| let orig_reuseport2 = sock2_config.reuseport; | ||
| // clear flags to be able to find actually free ports | ||
| sock1_config.reuseport = false; | ||
| sock2_config.reuseport = false; |
There was a problem hiding this comment.
I still don't like these config shenanigans. We should respect what gets passed in even if it seems undesirable. If we want to take away some optionality and force the "right" behavior, we should make new APIs and deprecate the broken ones (or fix them and roll the major version)
There was a problem hiding this comment.
ok I will bump major version then :)
There was a problem hiding this comment.
Decided against version bumps just yet, got rid of the shenanigans in 21c8316 (now our behavior is same as original if reuseport is set manually, including pathological cases).
There was a problem hiding this comment.
the monorepo crates are versioned synchronously. we have a release schedule, versioning cadence and backward compatibility policy. there is zero chance we are bumping major for this
There was a problem hiding this comment.
Yeah I've figured as much by looking at other PRs. Makes sense to keep everything coherent.
can you explain what the proper usage of My minimal understanding is if we let a user explicitly set For this PR you're trying to prevent a user for ever explicitly setting |
Gladly!
Yes, pretty much. Since they had to bind with SO_REUSEPORT set, they set this option in SocketConfig, which allowed them to bind on top of already bound sockets, causing tests to sometimes fail. We had a bunch of hacks to work around this flakiness such as doing
Yeah, exactly. With user calling |
gregcusack
left a comment
There was a problem hiding this comment.
I'm having a hard time understanding the logic flow of this code with reuseport_set_by_user. It's hard to tell when a port is actually getting set with SO_REUSEPORT or not. I kinda agree with Brennan on this one. I think we should almost just start over. Deprecate all the methods that are currently allowing users to misuse SO_REUSEPORT and then create new APIs that force the proper use. It sucks because it's going to mean more code in this library until we can completely remove remove the old API misusing SO_REUSEPORT. But i think it is a lot more clear and less convoluted/hacky than using reuseport_set_by_user.
| let port_range = localhost_port_range_for_tests(); | ||
| let udp_config = SocketConfig::default(); | ||
| let quic_config = SocketConfig::default().reuseport(true); | ||
| let quic_config = SocketConfig::default(); |
There was a problem hiding this comment.
so we find to tpu_quic socket without SO_REUSEPORT and then rebind using SO_REUSEPORT below here:
let tpu_quic = bind_more_with_config(tpu_quic, num_quic_endpoints, quic_config).unwrap();
| if orig_reuseport & !config.reuseport { | ||
| set_reuse_port(&udp_socket)?; | ||
| } |
There was a problem hiding this comment.
how can this ever be true? orig_reuseport cannot be true without config.reuseport_set_by_user also being true, right?
There was a problem hiding this comment.
Not quite, functions can call each other, so it could be set by us within this module.
thank you for the info on this! |
Yeah it was much simpler when we were just rugged the config.
The existing API is mostly fine. There are a few pathological functions allowing SO_REUSEPORT to be specified. We will gradually eradicate them. No need to uproot half the codebase for this. |
|
you've been misled by a symptom. the true bug is here Line 39 in d45d10c every "default" that's not a param to a clap arg is a bug. the port range is particularly insidious |
Yes, looks like best thing we can do to this crate is deprecate the whole thing and just make better ones instead. |
dc049c8 to
2ff4456
Compare
|
Since we are at 3.0 I have made the breaking changes to implement the originally intended changes. |
are the symbols deprecated in 2.3? we typically commit to deprecations being published for at least one minor release cycle |
nope, this PR was supposed to be a preparation to completely remove the broken API, but it never got merged because reasons. |
ya we probably should have deprecated these in v2.3 but it slipped through. Chatted with a couple of the devrel folks at foundation and they basically said as long as we are following semver, we should be ok to break the public API. They also mentioned they would help with comms when we roll out the breaking changes in v3.0. So, I'd suggest we break it when we have the chance |
|
guys you can deprecate the symbols in a separate pr and bp that. we're still early |
2a3988f to
1ee0377
Compare
1ee0377 to
7b2ae82
Compare
|
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis. |
| // binds many sockets to the same port in a range | ||
| // Note: The `mut` modifier for `num` is unused but kept for compatibility with the public API. | ||
| #[deprecated( | ||
| since = "2.2.0", | ||
| note = "use `multi_bind_in_range_with_config` instead" | ||
| )] | ||
| #[allow(unused_mut)] | ||
| pub fn multi_bind_in_range( | ||
| ip_addr: IpAddr, | ||
| range: PortRange, | ||
| mut num: usize, | ||
| ) -> io::Result<(u16, Vec<UdpSocket>)> { | ||
| let config = SocketConfig::default(); | ||
| multi_bind_in_range_with_config(ip_addr, range, config, num) | ||
| } | ||
|
|
There was a problem hiding this comment.
a little confused here. are you creating a new function and deprecating it at the same time?
|
Basically the situation is that we can not backport just the deprecation warnings without providing a new implementation to actually use. So this PR is supposed to provide a stopgap implementation that people can actually use, and in 3.0 we remove the backwards compatibility. |
Problem
This is a split-off from #5832
Summary of Changes