miner: exit loop when downloader Done or Failed#21653
miner: exit loop when downloader Done or Failed#21653meowsbits wants to merge 17 commits intoethereum:masterfrom
Conversation
Following the logic of the comment at the method, this fixes a regression introduced at 7cf56d6 , which would allow external parties to DoS with blocks, preventing mining progress. Signed-off-by: meows <b5c6@protonmail.com>
Signed-off-by: meows <b5c6@protonmail.com>
Signed-off-by: meows <b5c6@protonmail.com>
This reverts commit eaefcd3.
This reverts commit 23abd34.
Signed-off-by: meows <b5c6@protonmail.com>
This helper function would return an affirmation on the first positive match on a desired bool. This was imprecise; it return false positives by not waiting initially for an 'updated' value. This fix causes TestMiner_2 to fail, which is expected. Signed-off-by: meows <b5c6@protonmail.com>
This test demonstrated the imprecision of the test helper function waitForMiningState. This function has been fixed with 6d365c2, and this test test may now be removed. Signed-off-by: meows <b5c6@protonmail.com>
See comment for logic. Signed-off-by: meows <b5c6@protonmail.com>
We expect that once the downloader emits a DoneEvent, signaling a successful sync, that subsequent StartEvents are not longer permitted to stop the miner. This prevents a security vulnerability where forced syncs via fake high blocks would stall mining operation. Signed-off-by: meows <b5c6@protonmail.com>
- Break downloader event handling into event separating Done and Failed events. We need to treat these cases differently since a DoneEvent should prevent the miner from being stopped on subsequent downloader Start events. - Use canStop state to handle the one-off case when a downloader first succeeds. Signed-off-by: meows <b5c6@protonmail.com>
Signed-off-by: meows <b5c6@protonmail.com>
|
@holiman PTAL. Please note that the tests were broken (
|
Signed-off-by: meows <b5c6@protonmail.com>
558efc3 to
bfe2728
Compare
This makes mining pause/start logic regarding downloader events more explicit. Instead of eternally handling downloader events after the first done event, the subscription is closed when downloader events are no longer actionable. Signed-off-by: meows <b5c6@protonmail.com>
Signed-off-by: meows <b5c6@protonmail.com>
Signed-off-by: meows <b5c6@protonmail.com>
|
Hey @meowsbits Thank you for fixing my shit code and sorry for all the troubles I caused. |
The go routine handling the downloader events handling vars in parallel with the parent routine, causing a race condition. This change, though ugly, remove the condition while still allowing the downloader event subscription to be closed when the miner has no further use for it (ie DoneEvent).
|
I created a fix that passes your test cases here: https://github.com/MariusVanDerWijden/go-ethereum/tree/miner_updateloop_fix |
|
Ah just saw that you pushed something. I think our solutions are mostly identical now (you do have better naming though :D) |
|
Nice catch @MariusVanDerWijden! 68946a9 fixes that, although it's certainly not the prettiest code, having two near-duplicate for/select loops. I also considered using atomics/mutexes to hold the contended variables, but given the apparent reasoning behind the initial PR (replacing atomics with channels) that seemed backward-working ;) If you have an alternative or alternative preference, let me know! |
|
Ah! Same place, same time! 😃 |
|
Just to note as well that although I had indeed introduced another race, there is a pre-existing race condition in the ubp52 ~/go/src/github.com/ethereum/go-ethereum foundation-master=
> git rev-parse HEAD
9d1e2027a05ed099b9ef43021a6cb09839e94a3a
ubp52 ~/go/src/github.com/ethereum/go-ethereum foundation-master=
> go test -race ./miner/...
==================
WARNING: DATA RACE
Read at 0x000001389720 by goroutine 31:
github.com/ethereum/go-ethereum/miner.(*worker).newWorkLoop()
/home/ia/go/src/github.com/ethereum/go-ethereum/miner/worker.go:380 +0xcbd
Previous write at 0x000001389720 by goroutine 80:
[failed to restore the stack]
Goroutine 31 (running) created at:
github.com/ethereum/go-ethereum/miner.newWorker()
/home/ia/go/src/github.com/ethereum/go-ethereum/miner/worker.go:228 +0xa90
github.com/ethereum/go-ethereum/miner.New()
/home/ia/go/src/github.com/ethereum/go-ethereum/miner/miner.go:75 +0x128
github.com/ethereum/go-ethereum/miner.createMiner()
/home/ia/go/src/github.com/ethereum/go-ethereum/miner/miner_test.go:182 +0x926
github.com/ethereum/go-ethereum/miner.TestMiner()
/home/ia/go/src/github.com/ethereum/go-ethereum/miner/miner_test.go:83 +0x44
testing.tRunner()
/home/ia/go1.15.linux-amd64/src/testing/testing.go:1108 +0x202
Goroutine 80 (finished) created at:
testing.(*T).Run()
/home/ia/go1.15.linux-amd64/src/testing/testing.go:1159 +0x796
testing.runTests.func1()
/home/ia/go1.15.linux-amd64/src/testing/testing.go:1430 +0xa6
testing.tRunner()
/home/ia/go1.15.linux-amd64/src/testing/testing.go:1108 +0x202
testing.runTests()
/home/ia/go1.15.linux-amd64/src/testing/testing.go:1428 +0x5aa
testing.(*M).Run()
/home/ia/go1.15.linux-amd64/src/testing/testing.go:1338 +0x4eb
main.main()
_testmain.go:71 +0x236
==================
--- FAIL: TestGenerateBlockAndImportEthash (2.08s)
testing.go:1023: race detected during execution of test
FAIL
FAIL github.com/ethereum/go-ethereum/miner 10.502s
FAIL |
|
Yeah, the existing race seems to be on |
|
Instead of having two near-identical loops, something like this can be used: diff --git a/miner/miner.go b/miner/miner.go
index c8bb7fbca9..98506a4f17 100644
--- a/miner/miner.go
+++ b/miner/miner.go
@@ -127,12 +127,16 @@ func (miner *Miner) update() {
// withDownloaderState loop handles channeled events while the miner cares about the downloader events.
// This loop will be broken once the downloader emits a DoneEvent, signaling a successful sync.
// The following unnamed loop should be equivalent to this loop, but without the downloader event handling.
-withDownloaderState:
+ var ignoreDownloader = false
for {
select {
case ev := <-downloaderEvents.Chan():
+ if ignoreDownloader {
+ continue
+ }
if ev == nil {
- break withDownloaderState
+ ignoreDownloader = true
+ continue
}
handleDownloaderEvents(ev)
case addr := <-miner.startCh:
@@ -149,24 +153,6 @@ withDownloaderState:
return
}
}
-
- // Same loop as above, minus downloader event handling.
- for {
- select {
- case addr := <-miner.startCh:
- if !isDownloading {
- miner.SetEtherbase(addr)
- miner.worker.start()
- }
- miningRequested = true
- case <-miner.stopCh:
- miningRequested = false
- miner.worker.stop()
- case <-miner.exitCh:
- miner.worker.close()
- return
- }
- }
}
func (miner *Miner) Start(coinbase common.Address) {
|
|
@holiman Reading from a closed channel will return the zero value, so this will create a busy loop. |
|
What about this: for {
if canSync {
select {
case ev := <-events.Chan():
if ev == nil {
// Subscription closed, don't poll
canSync = false
}
handleDownloaderEvent(ev)
case addr := <-miner.startCh:
if canStart {
miner.SetEtherbase(addr)
miner.worker.start()
}
shouldStart = true
case <-miner.stopCh:
shouldStart = false
miner.worker.stop()
case <-miner.exitCh:
miner.worker.close()
return
}
} else {
select {
case addr := <-miner.startCh:
if canStart {
miner.SetEtherbase(addr)
miner.worker.start()
}
shouldStart = true
case <-miner.stopCh:
shouldStart = false
miner.worker.stop()
case <-miner.exitCh:
miner.worker.close()
return
}
}
} |
|
BTW: I fixed the third race (the one thats on master) here: #21664 |
|
I can't say I think this is really any better than as-is. It merges two for loops into one, which indeed removes a couple lines of code, but doesn't really change the essential duplicity of the original, and comes at the expense of quite non-idiomatic code structure, and another function-scoped (and mysteriously named 😉 ) bool. We would also want to What was the reasoning behind moving from atomics to channels in the first place? If the urge to refactor the duplicity from the code as-is is strong enough, and function-scoped atomics are OK, then I would recommend moving the With that said, I'm OK with the code as-is since I do think it's readable, tested, and documented, and I'm not a fanatic about DRYness in the first place. I also have the relative urgency of the need for a fix in mind. If either of you do feel strongly one way or another, I'll be happy to follow your leads. |
MariusVanDerWijden
left a comment
There was a problem hiding this comment.
I'm fine with it too
|
Merged in #21701 |
Following the logic of the comment at the method,
this fixes a regression introduced at 7cf56d6, which would allow external parties to DoS with blocks, preventing mining progress.
Rel