
eth: stabilize tx relay peer selection #31714

Merged
fjl merged 16 commits into ethereum:master from cskiraly:stabilize-tx-peer-selection on Aug 28, 2025

Conversation

@cskiraly
Contributor

When maxPeers was just above a perfect square and a few peers dropped for some reason, the peer selection function changed. When new peers were acquired, it changed again.

This PR stabilizes the selection function under normal operating conditions by adding some hysteresis.

The first patch was an initial implementation that worked only close to maxPeers. The second patch works across the whole spectrum.

@cskiraly cskiraly requested a review from rjl493456442 as a code owner April 25, 2025 10:41
@cskiraly cskiraly requested a review from fjl April 25, 2025 10:44
@fjl fjl changed the title Stabilize tx peer selection eth: stabilize tx relay peer selection Apr 29, 2025
@fjl fjl self-assigned this Apr 29, 2025
@cskiraly
Contributor Author

cskiraly commented May 6, 2025

@fjl I don't think we rely on the atomicity of lastDirect, simply because of the way we use BroadcastTransactions: it has its own loop in a dedicated goroutine.
Still, it seemed better to me to make it atomic, but we can also remove it.

@rjl493456442
Member

What is the intention behind stabilizing peer selection for direct transaction propagation?

If the peer set remains stable, does that mean transactions will be propagated to the same
peers consistently?

@cskiraly
Contributor Author

cskiraly commented Aug 19, 2025

If the peer set is stable, the selection is already consistent. The problem is when you jump from 49 peers down to 48 and then back to 49: the selection changes back and forth. Since our default is 50, this can easily happen.

We use a stable selection to avoid nonce gaps as much as possible.
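A minimal sketch of that instability, using a toy 64-bit hash value (the function names `direct` and `selected` are illustrative, not the actual code):

```go
package main

import (
	"fmt"
	"math"
)

// direct returns how many peers the old scheme broadcasts to for a given
// peer count: floor(sqrt(peers)).
func direct(peers int) int {
	return int(math.Sqrt(float64(peers)))
}

// selected reports whether a transaction with the given hash value is sent
// directly under the old modulo scheme: hash % (direct*direct) < direct.
func selected(hashVal uint64, peers int) bool {
	d := uint64(direct(peers))
	return hashVal%(d*d) < d
}

func main() {
	// The same hash flips between selected and not selected as the peer
	// count oscillates around a perfect square (49 = 7*7).
	for _, peers := range []int{48, 49, 50} {
		fmt.Printf("peers=%d direct=%d selected=%v\n",
			peers, direct(peers), selected(100, peers))
	}
}
```

At 48 peers the modulus is 36 and hash 100 misses the window; at 49 the modulus is 49 and the same hash lands inside it.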

@lightclient
Member

Why hysteresis = 0.5? Would math.Ceil do something similar? It seems like you'll face the issue again going between peer counts of 42 and 43 -- is this okay because it's uncommon to oscillate there and much more common between 48, 49, and 50?

Would it be simpler to just make our default peer count 48, or say 60?

@fjl
Contributor

fjl commented Aug 25, 2025

The original issue fixed by this PR is that the modulo operation causes the algorithm to choose different peers for broadcasting whenever the total peer count changes.

The existing algorithm works like this:

direct = sqrt(peers)
total = direct*direct
if hash%total < direct {
    send
}

Can't we do it like this, to avoid the modulo and square?

unit = (2^256-1) / peers
threshold = unit * sqrt(peers)
if hash < threshold {
    send
}
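A hedged Go sketch of that threshold idea, scaled down to a 64-bit hash space instead of 2^256 (function names are illustrative):

```go
package main

import (
	"fmt"
	"math"
)

// sendDirect applies the threshold test sketched above: divide the hash
// space into `peers` equal units and accept hashes falling within the
// first sqrt(peers) units.
func sendDirect(hashVal uint64, peers int) bool {
	unit := ^uint64(0) / uint64(peers)
	threshold := unit * uint64(math.Sqrt(float64(peers)))
	return hashVal < threshold
}

// selectedFraction estimates how much of the hash space passes the test,
// using a Weyl sequence as a cheap stand-in for uniform hash values.
func selectedFraction(peers, samples int) float64 {
	hits := 0
	for i := 0; i < samples; i++ {
		if sendDirect(uint64(i)*0x9E3779B97F4A7C15, peers) {
			hits++
		}
	}
	return float64(hits) / float64(samples)
}

func main() {
	// Expect roughly floor(sqrt(50))/50 = 7/50 = 0.14 of transactions
	// to be broadcast directly at 50 peers.
	fmt.Printf("fraction ~ %.3f\n", selectedFraction(50, 1_000_000))
}
```

There is no modulo and no per-transaction square root of a big integer; the threshold depends only on the peer count, so it stays stable while the count is stable.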

cskiraly and others added 7 commits August 26, 2025 10:49
When maxPeers was just above some perfect square, and a few peers
dropped for some reason, we changed the peer selection function.
When new peers were acquired, we changed again.
This patch stabilizes the selection function under normal operating
conditions close to saturation.

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
…cast

When maxPeers was just above some perfect square, and a few peers
dropped for some reason, we changed the peer selection function.
When new peers were acquired, we changed again.

This patch stabilizes the selection function under normal operating
conditions by adding some hysteresis.

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
@fjl fjl force-pushed the stabilize-tx-peer-selection branch from 1bcc733 to 583fca2 Compare August 26, 2025 08:51
fjl added 3 commits August 27, 2025 00:24
Compared to the previous approach, which was probabilistic, the new peer choice will
always select exactly sqrt(peers) peers for any transaction. After some careful consideration,
it has been determined that a cryptographic hash function is not required for this
algorithm.
@fjl
Contributor

fjl commented Aug 26, 2025

I have now added another version that implements the 'ideal' algorithm:

For each transaction, the peer list is sorted in a pseudo-random way that depends on the peer IDs and the transaction sender, using the FNV hash function. We then select sqrt(peers) entries from this list.

The main advantage over the probabilistic algorithm from before is that the count of selected peers is the same for all transactions. It also retains the other property we want: the selection is stable, in that fluctuations in peer count will not cause totally different peers to be chosen for the same transaction sender.

@fjl
Contributor

fjl commented Aug 27, 2025

Not yet sure if we should go with the 'ideal' version or the probabilistic one from before. While it's nice to always select the same number of peers, the sorting is a bit expensive. A benchmark will have to be added.

@fjl
Contributor

fjl commented Aug 27, 2025

Added a benchmark to determine the cost of choosing the broadcast peers for one transaction. Looks like it costs 4ms to choose for 1k transactions / 50 peers, 21ms for 1k txs / 200 peers. I think it's too slow.

goos: darwin
goarch: arm64
pkg: github.com/ethereum/go-ethereum/eth
cpu: Apple M1
BenchmarkBroadcastChoice/50-8         	  241519	      4958 ns/op	       0 B/op	       0 allocs/op
BenchmarkBroadcastChoice/200-8        	   54642	     21957 ns/op	       0 B/op	       0 allocs/op
BenchmarkBroadcastChoice/500-8        	   20396	     58897 ns/op	       0 B/op	       0 allocs/op
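For reference, a harness of this shape can be run outside `go test` via testing.Benchmark; the choice function below is a toy stand-in, not the PR's actual code:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"testing"
)

// chooseOne stands in for the per-transaction peer choice being measured:
// hash each peer ID together with a sender and count toy threshold hits.
func chooseOne(peerIDs [][]byte, sender []byte) int {
	count := 0
	h := fnv.New64()
	for _, id := range peerIDs {
		h.Reset()
		h.Write(id)
		h.Write(sender)
		if h.Sum64()&1 == 0 { // toy threshold
			count++
		}
	}
	return count
}

func main() {
	peerIDs := make([][]byte, 50)
	for i := range peerIDs {
		peerIDs[i] = []byte{byte(i), 0xAA, 0xBB}
	}
	sender := []byte("0xsender")
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			chooseOne(peerIDs, sender)
		}
	})
	fmt.Println(res) // ns/op on the local machine
}
```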

@fjl
Contributor

fjl commented Aug 27, 2025

Here are the results for a version of the probabilistic algorithm based on the FNV hash. With this one, we can skip the sorting, at the cost of choosing a different number of peers for each transaction (including zero sometimes!). The overhead of sorting the peer list is ~1.5µs per transaction (at 50 peers).

goos: darwin
goarch: arm64
pkg: github.com/ethereum/go-ethereum/eth
cpu: Apple M1
BenchmarkBroadcastChoice/50-8         	  318982	      3754 ns/op	       0 B/op	       0 allocs/op
BenchmarkBroadcastChoice/200-8        	   85832	     14032 ns/op	       0 B/op	       0 allocs/op
BenchmarkBroadcastChoice/500-8        	   35127	     34415 ns/op	       0 B/op	       0 allocs/op
@@ -708,27 +706,26 @@ func newBroadcastChoice(self enode.ID) *broadcastChoice {
 // choosePeers selects the peers that will receive a direct transaction broadcast message.
 // Note the return value will only stay valid until the next call to choosePeers.
 func (bc *broadcastChoice) choosePeers(peers []*ethPeer, txSender common.Address) map[*ethPeer]struct{} {
+	if len(peers) == 0 {
+		return nil
+	}
+
+	// Compute threshold.
+	unit := ^uint64(0) / uint64(len(peers))
+	n := uint64(math.Ceil(math.Sqrt(float64(len(peers)))))
+	threshold := unit * n
+
 	// Compute scores.
-	bc.tmp = slices.Grow(bc.tmp[:0], len(peers))[:len(peers)]
+	clear(bc.buffer)
 	hash := fnv.New64()
-	for i, peer := range peers {
+	for _, peer := range peers {
 		hash.Reset()
 		hash.Write(bc.self[:])
 		hash.Write(peer.Peer.Peer.ID().Bytes())
 		hash.Write(txSender[:])
-		bc.tmp[i] = broadcastPeer{peer, hash.Sum64()}
-	}
-
-	// Sort by score.
-	slices.SortFunc(bc.tmp, func(a, b broadcastPeer) int {
-		return cmp.Compare(a.score, b.score)
-	})
-
-	// Take top n.
-	clear(bc.buffer)
-	n := int(math.Ceil(math.Sqrt(float64(len(bc.tmp)))))
-	for i := range n {
-		bc.buffer[bc.tmp[i].p] = struct{}{}
+		if hash.Sum64() < threshold {
+			bc.buffer[peer] = struct{}{}
+		}
 	}

@cskiraly
Contributor Author

The "sorting" version seems like a ~50% overhead. I would say let's go with the probabilistic threshold-based one for now, which is closer to the current behaviour, and discuss how we could evaluate the network-wide effect of the sorted version.

@fjl
Contributor

fjl commented Aug 27, 2025

It's important to put the number into perspective. Even with the sorting, it is still way cheaper than the previous code which uses sha3 and big integer math to select the peers.

@fjl
Contributor

fjl commented Aug 28, 2025

Pushed a change to use SipHash with a static but secret key. It's even a tiny bit faster than using FNV, but the reason for using SipHash is that it hides the salt:

SipHash instead guarantees that, having seen Xi and SipHash(Xi, k), an attacker who does not know the key k cannot find (any information about) k or SipHash(Y, k) for any message Y ∉ {Xi} which they have not seen before.

In discussions, we arrived at the conclusion that a salted hash might be needed, because a potential adversary could use the additional information provided by the simpler FNV-based construction to locate the initial node relaying transactions from a certain sender. It's a bit of a contrived attack, but the salt should make it harder to do, since there is no way for the attacker to infer which nodes would get the broadcast from which other nodes, even if all node IDs and tx sender addresses are known.
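The PR uses a SipHash implementation with a secret key; as a rough stdlib stand-in, Go's hash/maphash has the same shape: a seeded hash whose seed cannot be recovered from its outputs. The function name here is illustrative.

```go
package main

import (
	"fmt"
	"hash/maphash"
)

// A process-wide random seed plays the role of the secret key: an observer
// who sees scores cannot reconstruct the seed or predict scores for unseen
// (peer, sender) pairs. hash/maphash stands in for the SipHash used in the PR.
var seed = maphash.MakeSeed()

// peerScore hashes (peerID, sender) under the secret seed.
func peerScore(peerID, sender string) uint64 {
	var h maphash.Hash
	h.SetSeed(seed)
	h.WriteString(peerID)
	h.WriteString(sender)
	return h.Sum64()
}

func main() {
	// Deterministic within a process (same seed), but different across
	// processes, so outsiders cannot precompute the broadcast mapping
	// even if all node IDs and tx sender addresses are known.
	fmt.Println(peerScore("peer-1", "0xabc") == peerScore("peer-1", "0xabc"))
}
```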

@cskiraly
Contributor Author

Note that to round the sqrt we previously used floor (int64()); now we use int(math.Ceil()). This is a small bump in push count, but I actually like it better for small networks.

LGTM.

@fjl fjl added this to the 1.16.3 milestone Aug 28, 2025
@fjl fjl merged commit 9af1f71 into ethereum:master Aug 28, 2025
4 of 6 checks passed
gballet pushed a commit to gballet/go-ethereum that referenced this pull request Sep 11, 2025
When maxPeers was just above some perfect square, and a few peers
dropped for some reason, we changed the peer selection function.
When new peers were acquired, we changed again.

This PR improves the selection function, in two ways. First, it will always select
sqrt(peers) to broadcast to. Second, the selection now uses siphash with a secret
key, to guard against information leaks about tx source.

---------

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
Sahil-4555 pushed a commit to Sahil-4555/go-ethereum that referenced this pull request Oct 12, 2025