Multihoming: Hotswap gossip socketaddr #6474
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #6474 +/- ##
=========================================
- Coverage 83.4% 83.4% -0.1%
=========================================
Files 850 851 +1
Lines 377710 377863 +153
=========================================
+ Hits 315086 315166 +80
- Misses 62624 62697 +73 🚀 New features to boost your workflow:
|
|
Overall, I think this is a good approach for now, we should follow through. I'll clean up the socket binding logic to make it easier to integrate this for other services with more esoteric binding patterns. |
|
in my last commit I took out |
3d3eb14 to
2e4203a
Compare
| should_check_duplicate_instance: bool, | ||
| stats_reporter_sender: Option<Sender<Box<dyn FnOnce() + Send>>>, | ||
| exit: Arc<AtomicBool>, | ||
| gossip_rebind_rx: Option<Receiver<SocketAddr>>, |
There was a problem hiding this comment.
I wonder, since we are sending a socket and a rebinder to the service, would it not be better if we sent both in the same argument? They are logically tied to each other fairly hard... Would it make sense to send AtomicUdpSocket rather than a UdpSocket and a notification channel separately?
There was a problem hiding this comment.
so we create GossipService in bootstrap as well, but we don't necessarily want the GossipService in bootstrap to be rebindable: https://github.com/gregcusack/solana/blob/2e4203abd39239a8fce1ff7894bdb28f354f9c1b/validator/src/bootstrap.rs#L161. We could but it will never be used.
we could create some GossipSocket enum like:
pub enum GossipSocket {
Static(UdpSocket)
Rebindable(RebindableSocket) // maybe not the best name lol
}
where
pub struct RebindableSocket {
pub socket: Arc<AtomicUdpSocket>,
pub rebind_rx: Receiver<SocketAddr>,
}
There was a problem hiding this comment.
Maybe use an equivalent of RebindableSocket always and just fill rebind_rx with https://docs.rs/crossbeam/latest/crossbeam/channel/fn.never.html ? Then you can trivially create one from UdpSocket for cases when rebinding is not necessary.
There was a problem hiding this comment.
ended up just removing the crossbeam channel altogether
| socket_addr_space, | ||
| stats_reporter_sender, | ||
| ); | ||
| let t_rebind = gossip_rebind_rx.map(|rebind_rx| { |
There was a problem hiding this comment.
Given that binding a socket takes literally microseconds, do we need a thread waiting on a channel to do it? Can we maybe rebind immediately and just swap the thing in place immediately in the admin RPC handling code?
|
|
||
| meta.with_post_init(|post_init| { | ||
| if let Some(gossip_rebinder) = &post_init.gossip_rebinder { | ||
| gossip_rebinder.rebind(new_addr).map_err(|e| { |
There was a problem hiding this comment.
Do you think it would be possible to perform the actual rebinding here and avoid having the channel and Rebinder abstraction on top of it? We are not calling this every second, so even if this RPC blocks for a few milliseconds it is not a big deal.
There was a problem hiding this comment.
much better idea. yes i can do that
alexpyattaev
left a comment
There was a problem hiding this comment.
LGTM, this does not need to be perfect at the moment.
I'd like to wait on merging this until I can test on a box w/ two routable IPs just to make sure we're good to go here |
baef28d to
f1ee213
Compare
…inrpc command to gossipservice
f1ee213 to
1809a7a
Compare
|
pushed another commit: 1809a7a removes a double |
|
@alexpyattaev UPDATE: this has been tested on a multihomed machine and is working! Now that we have support to pass in Probably doesn't really matter. But currently you can pass in any IP/port into AdminRPC. Of course, if you pass in an IP that doesn't exist as an interface on your machine the swap will fail. Should we instead check if the IP exists in |
alexpyattaev
left a comment
There was a problem hiding this comment.
Let us not add this coupling just yet. Keeping the changeset small has benefits too. For now we can just assume the adminRPC does not get garbage in (which should be a reasonable assumption for what is an experimental feature at the moment). Once we have the logic mostly sorted we can add idiotproofing where appropriate.
Problem
No ability to swap IP address for multihoming support
Summary of Changes
HOW TO:
example: