
eth: stabilize tx relay peer selection #31714

Merged
fjl merged 16 commits into ethereum:master from cskiraly:stabilize-tx-peer-selection on Aug 28, 2025

Conversation

@cskiraly
Contributor

When maxPeers was just above a perfect square and a few peers dropped for some reason, the peer selection function changed. When new peers were acquired, it changed again.

This PR stabilizes the selection function under normal operating conditions by adding some hysteresis.

The first patch was an initial implementation that worked only close to maxPeers. The second patch works across the whole spectrum.

@cskiraly cskiraly requested a review from rjl493456442 as a code owner April 25, 2025 10:41
@cskiraly cskiraly requested a review from fjl April 25, 2025 10:44
@fjl fjl changed the title Stabilize tx peer selection eth: stabilize tx relay peer selection Apr 29, 2025
@fjl fjl self-assigned this Apr 29, 2025
@cskiraly
Contributor Author

cskiraly commented May 6, 2025

@fjl I don't think we rely on the atomicity of lastDirect, simply because of the way we use BroadcastTransactions: it has its own loop in a dedicated goroutine.
Still, it seemed better to me to make it atomic, but we can also remove it.

@rjl493456442
Member

What is the intention behind stabilizing peer selection for direct transaction propagation?

If the peer set remains stable, does that mean transactions will be propagated to the same
peers consistently?

@cskiraly
Contributor Author

cskiraly commented Aug 19, 2025

If the peer set is stable, the selection is already consistent. The problem is when you jump from 49 peers down to 48 and then back to 49: the selection changes back and forth. Since our default is 50, this can easily happen.

We use a stable selection to avoid nonce gaps as much as possible.
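A minimal sketch of that instability, using a toy 64-bit hash value (the function names `direct` and `selected` are illustrative, not the actual code):

```go
package main

import (
	"fmt"
	"math"
)

// direct returns how many peers the old scheme broadcasts to for a given
// peer count: floor(sqrt(peers)).
func direct(peers int) int {
	return int(math.Sqrt(float64(peers)))
}

// selected reports whether a transaction with the given hash value is sent
// directly under the old modulo scheme: hash % (direct*direct) < direct.
func selected(hashVal uint64, peers int) bool {
	d := uint64(direct(peers))
	return hashVal%(d*d) < d
}

func main() {
	// The same hash flips between selected and not selected as the peer
	// count oscillates around a perfect square (49 = 7*7).
	for _, peers := range []int{48, 49, 50} {
		fmt.Printf("peers=%d direct=%d selected=%v\n",
			peers, direct(peers), selected(100, peers))
	}
}
```

At 48 peers the modulus is 36 and hash 100 misses the window; at 49 the modulus is 49 and the same hash lands inside it.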

@lightclient
Member

Why hysteresis = 0.5? Would math.Ceil do something similar? It seems like you'll face the issue again going between peer counts of 42 and 43 -- is this okay because it's uncommon to oscillate there and much more common between 48, 49, and 50?

Would it be simpler to just make our default peer count 48, or say 60?

@fjl
Contributor

fjl commented Aug 25, 2025

The original issue fixed by this PR is that the modulo operation causes the algorithm to choose different peers for broadcasting whenever the total peer count changes.

The existing algorithm works like this:

direct = sqrt(peers)
total = direct*direct
if hash%total < direct {
    send
}

Can't we do it like this, to avoid the modulo and square?

unit = (2^256-1) / peers
threshold = unit * sqrt(peers)
if hash < threshold {
    send
}
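A hedged Go sketch of that threshold idea, scaled down to a 64-bit hash space instead of 2^256 (function names are illustrative):

```go
package main

import (
	"fmt"
	"math"
)

// sendDirect applies the threshold test sketched above: divide the hash
// space into `peers` equal units and accept hashes falling within the
// first sqrt(peers) units.
func sendDirect(hashVal uint64, peers int) bool {
	unit := ^uint64(0) / uint64(peers)
	threshold := unit * uint64(math.Sqrt(float64(peers)))
	return hashVal < threshold
}

// selectedFraction estimates how much of the hash space passes the test,
// using a Weyl sequence as a cheap stand-in for uniform hash values.
func selectedFraction(peers, samples int) float64 {
	hits := 0
	for i := 0; i < samples; i++ {
		if sendDirect(uint64(i)*0x9E3779B97F4A7C15, peers) {
			hits++
		}
	}
	return float64(hits) / float64(samples)
}

func main() {
	// Expect roughly floor(sqrt(50))/50 = 7/50 = 0.14 of transactions
	// to be broadcast directly at 50 peers.
	fmt.Printf("fraction ~ %.3f\n", selectedFraction(50, 1_000_000))
}
```

There is no modulo and no per-transaction square root of a big integer; the threshold depends only on the peer count, so it stays stable while the count is stable.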

cskiraly and others added 7 commits August 26, 2025 10:49
When maxPeers was just above some perfect square, and a few peers
dropped for some reason, we changed the peer selection function.
When new peers were acquired, we changed again.
This patch stabilizes the selection function under normal operating
conditions close to saturation.

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
…cast

When maxPeers was just above some perfect square, and a few peers
dropped for some reason, we changed the peer selection function.
When new peers were acquired, we changed again.

This patch stabilizes the selection function under normal operating
conditions by adding some hysteresis.

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
@fjl fjl force-pushed the stabilize-tx-peer-selection branch from 1bcc733 to 583fca2 Compare August 26, 2025 08:51
fjl added 3 commits August 27, 2025 00:24
Compared to the previous approach, which was probabilistic, the new peer choice will
always select exactly sqrt(peers) peers for any transaction. After some careful consideration,
it has been determined that a cryptographic hash function is not required for this
algorithm.
@fjl
Contributor

fjl commented Aug 26, 2025

I have now added another version that implements the 'ideal' algorithm:

For each transaction, the peer list is sorted in a pseudo-random way that depends on the peer IDs and the transaction sender, using the FNV hash function. We then select sqrt(peers) entries from this list.

The main advantage over the probabilistic algorithm from before is that the count of selected peers is the same for all transactions. It also retains the other property we want: the selection is stable, in that fluctuations in peer count will not cause totally different peers to be chosen for the same transaction sender.

@fjl
Contributor

fjl commented Aug 27, 2025

Not yet sure if we should go with the 'ideal' version or the probabilistic one from before. While it's nice to always select the same number of peers, the sorting is a bit expensive. A benchmark will have to be added.

@fjl
Contributor

fjl commented Aug 27, 2025

Added a benchmark to determine the cost of choosing the broadcast peers for one transaction. Looks like it costs 4ms to choose for 1k transactions / 50 peers, 21ms for 1k txs / 200 peers. I think it's too slow.

goos: darwin
goarch: arm64
pkg: github.com/ethereum/go-ethereum/eth
cpu: Apple M1
BenchmarkBroadcastChoice/50-8         	  241519	      4958 ns/op	       0 B/op	       0 allocs/op
BenchmarkBroadcastChoice/200-8        	   54642	     21957 ns/op	       0 B/op	       0 allocs/op
BenchmarkBroadcastChoice/500-8        	   20396	     58897 ns/op	       0 B/op	       0 allocs/op
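For reference, a harness of this shape can be run outside `go test` via testing.Benchmark; the choice function below is a toy stand-in, not the PR's actual code:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"testing"
)

// chooseOne stands in for the per-transaction peer choice being measured:
// hash each peer ID together with a sender and count toy threshold hits.
func chooseOne(peerIDs [][]byte, sender []byte) int {
	count := 0
	h := fnv.New64()
	for _, id := range peerIDs {
		h.Reset()
		h.Write(id)
		h.Write(sender)
		if h.Sum64()&1 == 0 { // toy threshold
			count++
		}
	}
	return count
}

func main() {
	peerIDs := make([][]byte, 50)
	for i := range peerIDs {
		peerIDs[i] = []byte{byte(i), 0xAA, 0xBB}
	}
	sender := []byte("0xsender")
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			chooseOne(peerIDs, sender)
		}
	})
	fmt.Println(res) // ns/op on the local machine
}
```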

@fjl
Contributor

fjl commented Aug 27, 2025

Here are the results for a version of the probabilistic algorithm based on the FNV hash. With this one, we can skip the sorting, at the cost of choosing a different number of peers for each transaction (including zero sometimes!). The overhead of sorting the peer list is ~1.5µs per transaction (at 50 peers).

goos: darwin
goarch: arm64
pkg: github.com/ethereum/go-ethereum/eth
cpu: Apple M1
BenchmarkBroadcastChoice/50-8         	  318982	      3754 ns/op	       0 B/op	       0 allocs/op
BenchmarkBroadcastChoice/200-8        	   85832	     14032 ns/op	       0 B/op	       0 allocs/op
BenchmarkBroadcastChoice/500-8        	   35127	     34415 ns/op	       0 B/op	       0 allocs/op
@@ -708,27 +706,26 @@ func newBroadcastChoice(self enode.ID) *broadcastChoice {
 // choosePeers selects the peers that will receive a direct transaction broadcast message.
 // Note the return value will only stay valid until the next call to choosePeers.
 func (bc *broadcastChoice) choosePeers(peers []*ethPeer, txSender common.Address) map[*ethPeer]struct{} {
+	if len(peers) == 0 {
+		return nil
+	}
+
+	// Compute threshold.
+	unit := ^uint64(0) / uint64(len(peers))
+	n := uint64(math.Ceil(math.Sqrt(float64(len(peers)))))
+	threshold := unit * n
+
 	// Compute scores.
-	bc.tmp = slices.Grow(bc.tmp[:0], len(peers))[:len(peers)]
+	clear(bc.buffer)
 	hash := fnv.New64()
-	for i, peer := range peers {
+	for _, peer := range peers {
 		hash.Reset()
 		hash.Write(bc.self[:])
 		hash.Write(peer.Peer.Peer.ID().Bytes())
 		hash.Write(txSender[:])
-		bc.tmp[i] = broadcastPeer{peer, hash.Sum64()}
-	}
-
-	// Sort by score.
-	slices.SortFunc(bc.tmp, func(a, b broadcastPeer) int {
-		return cmp.Compare(a.score, b.score)
-	})
-
-	// Take top n.
-	clear(bc.buffer)
-	n := int(math.Ceil(math.Sqrt(float64(len(bc.tmp)))))
-	for i := range n {
-		bc.buffer[bc.tmp[i].p] = struct{}{}
+		if hash.Sum64() < threshold {
+			bc.buffer[peer] = struct{}{}
+		}
 	}

@cskiraly
Contributor Author

The "sorting" version seems like a ~50% overhead. I would say let's go with the probabilistic threshold-based one for now, which is closer to the current behaviour, and discuss how we could evaluate the network-wide effect of the sorted version.

@fjl
Contributor

fjl commented Aug 27, 2025

It's important to put the number into perspective. Even with the sorting, it is still way cheaper than the previous code which uses sha3 and big integer math to select the peers.

@fjl
Contributor

fjl commented Aug 28, 2025

Pushed a change to use SipHash with a static but secret key. It's even a tiny bit faster than using FNV, but the reason for using SipHash is that it hides the salt:

SipHash instead guarantees that, having seen Xi and SipHash(Xi, k), an attacker who does not know the key k cannot find (any information about) k or SipHash(Y, k) for any message Y ∉ {Xi} which they have not seen before.

In discussions, we arrived at the conclusion that a salted hash might be needed, because a potential adversary could use the additional information provided by the simpler FNV-based construction to locate the initial node relaying transactions from a certain sender. It's a bit of a contrived attack, but the salt should make it harder to do, since there is no way for the attacker to infer which nodes would get the broadcast from which other nodes, even if all node IDs and tx sender addresses are known.
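The PR uses a SipHash implementation with a secret key; as a rough stdlib stand-in, Go's hash/maphash has the same shape: a seeded hash whose seed cannot be recovered from its outputs. The function name here is illustrative.

```go
package main

import (
	"fmt"
	"hash/maphash"
)

// A process-wide random seed plays the role of the secret key: an observer
// who sees scores cannot reconstruct the seed or predict scores for unseen
// (peer, sender) pairs. hash/maphash stands in for the SipHash used in the PR.
var seed = maphash.MakeSeed()

// peerScore hashes (peerID, sender) under the secret seed.
func peerScore(peerID, sender string) uint64 {
	var h maphash.Hash
	h.SetSeed(seed)
	h.WriteString(peerID)
	h.WriteString(sender)
	return h.Sum64()
}

func main() {
	// Deterministic within a process (same seed), but different across
	// processes, so outsiders cannot precompute the broadcast mapping
	// even if all node IDs and tx sender addresses are known.
	fmt.Println(peerScore("peer-1", "0xabc") == peerScore("peer-1", "0xabc"))
}
```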

@cskiraly
Contributor Author

Note that to round the sqrt we previously used floor (int64()); now we use int(math.Ceil()). This is a small bump in push count, but I actually like it better for small networks.

LGTM.

@fjl fjl added this to the 1.16.3 milestone Aug 28, 2025
@fjl fjl merged commit 9af1f71 into ethereum:master Aug 28, 2025
4 of 6 checks passed
gballet pushed a commit to gballet/go-ethereum that referenced this pull request Sep 11, 2025
When maxPeers was just above some perfect square, and a few peers
dropped for some reason, we changed the peer selection function.
When new peers were acquired, we changed again.

This PR improves the selection function, in two ways. First, it will always select
sqrt(peers) to broadcast to. Second, the selection now uses siphash with a secret
key, to guard against information leaks about tx source.

---------

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
Sahil-4555 pushed a commit to Sahil-4555/go-ethereum that referenced this pull request Oct 12, 2025