p2p/discover: fix race involving the seed node iterator #1859
obscuren merged 3 commits into ethereum:develop
67305d6 to 8820a0c
Do we want to delete the self query test? At one point we had issues with geth trying to connect to itself because its own ID ended up in the seed database. Wouldn't it be worth keeping this test (and the self check in the lookup mechanism)?
In hindsight, I think that removing the self entries is misguided. We should instead make sure that self never ends up in the database.
I added a check that keeps self out of the seed result slice.
78ff0d1 to 7ce4cda
👏🏻 I have no meaningful comment, but I'd like to say that I enjoyed reading the code, and thanks for finding the bug.
7ce4cda to ecac7e3
LGTM 👍
nodeDB.querySeeds was not safe for concurrent use but could be called concurrently on multiple goroutines in the following case:
- the table was empty
- a timed refresh started
- a lookup was started and initiated a refresh
These conditions are unlikely to coincide during normal use, but are much more likely to occur all at once when the user's machine has just woken from sleep. The root cause of the issue is that querySeeds reused the same leveldb iterator until it was exhausted.
This commit moves the refresh scheduling logic into its own goroutine (so only one refresh is ever active) and changes querySeeds to not use a persistent iterator. The seed node selection is now more random and ignores nodes that have not been contacted in the last 5 days.
The strict matching can get in the way of protocol upgrades.
ecac7e3 to 32dda97
Just out of curiosity, what is the advantage of this label compared to labelling the innermost for loop?
The loop is structured like this:
for {
    node = load random node
    check
    check
    check
    add node as result
}
If any of the checks fails, the node is not added to the result. The labeled "continue seek" is meant to make that clear.
👍 squash commits please
Fixes #1660.
@karalabe wrote the previous implementation of the refresh logic and might be the best person to judge these changes.
Note: The last commit removes the strict version matching on discovery packets because it makes protocol upgrades much harder. I'm trying to sneak this in before 1.2.0.