eth: stabilize tx relay peer selection#31714
Conversation
@fjl I don't think we use the atomicity of …
What is the intention behind stabilizing peer selection for direct transaction propagation? If the peer set remains stable, does that mean transactions will be propagated to the same peers?
If the peer set is stable, it is consistent already. The problem is when you jump from 49 to 48 peers and then back to 49: the selection changes back and forth. Since our default is 50, this can easily happen. We are using a stable selection to avoid nonce gaps as much as possible.
Why? Would it be simpler to just make our peer default 48, or say 60?
The original issue fixed by this PR is that the modulo operation causes the algorithm to choose different peers for broadcasting whenever the total peer count changes. The existing algorithm works like this: …

Can't we do it like this, to avoid the modulo and the square?
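To see why a modulo-based rule is unstable, here is a minimal, hypothetical sketch (not geth's actual code): a peer is chosen when its hash modulo the peer count falls below sqrt(count). Dropping a single peer changes the modulus, which changes every peer's residue and can reshuffle the whole selection.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

// score gives each peer a stable 64-bit value derived from its ID.
func score(id string) uint64 {
	h := fnv.New64()
	h.Write([]byte(id))
	return h.Sum64()
}

// selectPeers picks a peer when its score modulo the peer count is below
// sqrt(count). The modulus depends on the total count, so adding or
// removing one peer perturbs the residue of every other peer.
func selectPeers(ids []string) []string {
	count := uint64(len(ids))
	limit := uint64(math.Sqrt(float64(count)))
	var out []string
	for _, id := range ids {
		if score(id)%count < limit {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	ids := make([]string, 50)
	for i := range ids {
		ids[i] = fmt.Sprintf("peer-%02d", i)
	}
	fmt.Println(selectPeers(ids))      // selection with 50 peers
	fmt.Println(selectPeers(ids[:49])) // one peer drops: all residues change
}
```

Running this with 50 and then 49 peers typically yields very different selections, even though 49 of the peers are the same.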
When maxPeers was just above some perfect square, and a few peers dropped for some reason, we changed the peer selection function. When new peers were acquired, we changed again. This patch stabilizes the selection function under normal operating conditions close to saturation. Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
When maxPeers was just above some perfect square, and a few peers dropped for some reason, we changed the peer selection function. When new peers were acquired, we changed again. This patch stabilizes the selection function under normal operating conditions by adding some hysteresis. Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Compared to the previous approach, which was probabilistic, the new peer choice will always select exactly sqrt(peers) for any transaction. After some careful consideration, it has been determined that use of a cryptographic hash function is not required for this algorithm.
I have now added another version that implements the 'ideal' algorithm: for each transaction, the peer list is sorted in a pseudo-random way that depends on the peer IDs and the transaction sender.
Not yet sure if we should go with the 'ideal' version or the probabilistic one from before. While it's nice to always select the same number of peers, the sorting is a bit expensive. A benchmark will have to be added.
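A self-contained sketch of the 'ideal' sorting variant, assuming illustrative string IDs in place of geth's `enode.ID` and `common.Address` types: every peer is scored by hashing our own ID, the peer ID, and the tx sender, then the top ceil(sqrt(n)) peers by score are chosen, so the count is always the same.

```go
package main

import (
	"cmp"
	"fmt"
	"hash/fnv"
	"math"
	"slices"
)

// chooseSorted scores each peer with a hash mixing our own ID, the peer ID
// and the tx sender, sorts by score, and keeps the top ceil(sqrt(n)).
// Types and names here are illustrative, not geth's.
func chooseSorted(self string, peers []string, sender string) []string {
	type scored struct {
		id    string
		score uint64
	}
	tmp := make([]scored, len(peers))
	for i, p := range peers {
		h := fnv.New64()
		h.Write([]byte(self))
		h.Write([]byte(p))
		h.Write([]byte(sender))
		tmp[i] = scored{p, h.Sum64()}
	}
	slices.SortFunc(tmp, func(a, b scored) int {
		return cmp.Compare(a.score, b.score)
	})
	n := int(math.Ceil(math.Sqrt(float64(len(peers)))))
	out := make([]string, 0, n)
	for _, s := range tmp[:n] {
		out = append(out, s.id)
	}
	return out
}

func main() {
	peers := make([]string, 50)
	for i := range peers {
		peers[i] = fmt.Sprintf("peer-%02d", i)
	}
	sel := chooseSorted("self-node", peers, "0xsender")
	fmt.Println(len(sel), sel) // always exactly ceil(sqrt(50)) = 8 peers
}
```

The per-transaction sort over the peer list is the cost being debated in the thread.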
Added a benchmark to determine the cost of choosing the broadcast peers for one transaction. Looks like it costs 4ms to choose for 1k transactions / 50 peers, and 21ms for 1k txs / 200 peers. I think it's too slow.
Here are the results for a version of the probabilistic algorithm based on the FNV hash. With this one, we can skip the sorting, at the cost of choosing a different number of peers for each transaction (including zero, sometimes!). The overhead of sorting the peer list is ~1.5µs per transaction (at 50 peers).

@@ -708,27 +706,26 @@ func newBroadcastChoice(self enode.ID) *broadcastChoice {
// choosePeers selects the peers that will receive a direct transaction broadcast message.
// Note the return value will only stay valid until the next call to choosePeers.
func (bc *broadcastChoice) choosePeers(peers []*ethPeer, txSender common.Address) map[*ethPeer]struct{} {
+ if len(peers) == 0 {
+ return nil
+ }
+
+ // Compute threshold.
+ unit := ^uint64(0) / uint64(len(peers))
+ n := uint64(math.Ceil(math.Sqrt(float64(len(peers)))))
+ threshold := unit * n
+
// Compute scores.
- bc.tmp = slices.Grow(bc.tmp[:0], len(peers))[:len(peers)]
+ clear(bc.buffer)
hash := fnv.New64()
- for i, peer := range peers {
+ for _, peer := range peers {
hash.Reset()
hash.Write(bc.self[:])
hash.Write(peer.Peer.Peer.ID().Bytes())
hash.Write(txSender[:])
- bc.tmp[i] = broadcastPeer{peer, hash.Sum64()}
- }
-
- // Sort by score.
- slices.SortFunc(bc.tmp, func(a, b broadcastPeer) int {
- return cmp.Compare(a.score, b.score)
- })
-
- // Take top n.
- clear(bc.buffer)
- n := int(math.Ceil(math.Sqrt(float64(len(bc.tmp)))))
- for i := range n {
- bc.buffer[bc.tmp[i].p] = struct{}{}
+ if hash.Sum64() < threshold {
+ bc.buffer[peer] = struct{}{}
+ }
}
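A standalone version of the threshold idea from the diff above, with string IDs standing in for geth's types: each peer whose 64-bit score falls below `(MaxUint64/len(peers)) * ceil(sqrt(len(peers)))` is selected, so the expected count is about sqrt(peers), but the actual count varies per transaction.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

// chooseThreshold sketches the probabilistic variant: no sorting, just a
// per-peer comparison against a precomputed threshold. The selected count
// is random per transaction (it can even be zero).
func chooseThreshold(self string, peers []string, sender string) []string {
	if len(peers) == 0 {
		return nil
	}
	unit := ^uint64(0) / uint64(len(peers))
	n := uint64(math.Ceil(math.Sqrt(float64(len(peers)))))
	threshold := unit * n
	var out []string
	for _, p := range peers {
		h := fnv.New64()
		h.Write([]byte(self))
		h.Write([]byte(p))
		h.Write([]byte(sender))
		if h.Sum64() < threshold {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	peers := make([]string, 50)
	for i := range peers {
		peers[i] = fmt.Sprintf("peer-%02d", i)
	}
	// Tally how many peers get picked for 1000 different senders.
	counts := map[int]int{}
	for tx := 0; tx < 1000; tx++ {
		sel := chooseThreshold("self-node", peers, fmt.Sprintf("sender-%d", tx))
		counts[len(sel)]++
	}
	// If the hash is uniform, counts cluster around ceil(sqrt(50)) = 8.
	fmt.Println(counts)
}
```

This is what trades the sort cost for a variable fan-out, which is the trade-off discussed next.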
Using the "sorting" version seems like a 50% overhead. I would say let's go with the probabilistic threshold-based one for now, which is closer to the current behaviour, and discuss how we could evaluate the network-wide effect of the sorted version.
It's important to put the number into perspective. Even with the sorting, it is still way cheaper than the previous code, which used sha3 and big-integer math to select the peers.
Pushed a change to use SipHash with a static but secret key. It's even a tiny bit faster than using FNV, but the reason for using SipHash is that it hides the salt:
In discussions, we arrived at the conclusion that a salted hash might be needed: a potential adversary could use the additional information provided by the simpler FNV-based construction to locate the initial node relaying transactions from a certain sender. It's a bit of a contrived attack, but the salt should make it harder to mount, since there is no way for the attacker to infer which nodes would get the broadcast from which other nodes, even if all node IDs and tx sender addresses are known.
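Go's standard library has no SipHash, so this sketch of the keyed-hash idea uses `hash/maphash` as a stand-in: its per-process random seed plays the role of geth's static secret key. The structure (threshold selection over keyed scores) follows the diff; the types and names are illustrative.

```go
package main

import (
	"fmt"
	"hash/maphash"
	"math"
)

// seed acts as the secret key: generated once per process and never
// exposed, so an observer who knows all node IDs and the tx sender still
// cannot reproduce the scores. Geth uses SipHash with a secret key; this
// uses maphash only because it is the stdlib's keyed hash.
var seed = maphash.MakeSeed()

func chooseSalted(self string, peers []string, sender string) []string {
	if len(peers) == 0 {
		return nil
	}
	unit := ^uint64(0) / uint64(len(peers))
	threshold := unit * uint64(math.Ceil(math.Sqrt(float64(len(peers)))))
	var out []string
	for _, p := range peers {
		var h maphash.Hash
		h.SetSeed(seed)
		h.WriteString(self)
		h.WriteString(p)
		h.WriteString(sender)
		if h.Sum64() < threshold {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	peers := []string{"peer-a", "peer-b", "peer-c", "peer-d"}
	fmt.Println(chooseSalted("self", peers, "0xsender"))
}
```

Within one process the selection is stable (same seed), but another process with a different seed selects differently, which is exactly the property that frustrates the correlation attack described above.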
Note that to round the sqrt we used floor (…).

LGTM.
When maxPeers was just above some perfect square, and a few peers dropped for some reason, we changed the peer selection function. When new peers were acquired, we changed again. This PR improves the selection function, in two ways. First, it will always select sqrt(peers) to broadcast to. Second, the selection now uses siphash with a secret key, to guard against information leaks about tx source. --------- Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com> Co-authored-by: Felix Lange <fjl@twurst.com>
When maxPeers was just above some perfect square, and a few peers
dropped for some reason, we changed the peer selection function.
When new peers were acquired, we changed again.
This PR will stabilize the selection function under normal operating
conditions by adding some hysteresis.
The first patch was an initial implementation that worked only close to maxPeers. The second version (second patch) works across the whole range.
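The merged PR ultimately solved the instability with stable threshold selection rather than hysteresis, but the hysteresis idea from these early patches could look roughly like this purely hypothetical sketch: the broadcast fan-out n = ceil(sqrt(count)) is only recomputed when the peer count drifts outside a small band around the count at which n was last computed, so oscillating between 49 and 50 peers does not keep flipping the selection function.

```go
package main

import (
	"fmt"
	"math"
)

// stableFanout is a hypothetical hysteresis sketch, not geth code: it
// pins the fan-out until the peer count moves far enough from the count
// at which the fan-out was last computed.
type stableFanout struct {
	lastCount int // peer count when n was last recomputed
	n         int // current fan-out, ceil(sqrt(lastCount))
}

func (s *stableFanout) fanout(count int) int {
	drift := count - s.lastCount
	if drift < 0 {
		drift = -drift
	}
	if s.n == 0 || drift > 2 { // +/-2 band, purely illustrative
		s.lastCount = count
		s.n = int(math.Ceil(math.Sqrt(float64(count))))
	}
	return s.n
}

func main() {
	var s stableFanout
	fmt.Println(s.fanout(50)) // 8
	fmt.Println(s.fanout(49)) // still 8: drop of 1 stays inside the band
	fmt.Println(s.fanout(45)) // 7: recomputed after a larger drop
}
```

The band width is the tuning knob: too narrow and small oscillations still flip the selection, too wide and the fan-out lags the real peer count.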