Add SealingState; don't prepare block when not ready. #103
|
In my local test, it worked as expected, and it seemed to prevent most of the cases where we prepared unsealable blocks, but not all. This looks like a safe change to me, but I wonder why |
|
Can we propose this upstream? |
|
I suspect the problem is missing synchronization. Can we place a global lock around the engine and miner? |
|
Yes, I suspect that, too: Ideally, we do want to start preparing a block, even if our turn has not quite started yet, so that when it starts, we're ready to seal it. So basically we want to prepare one in our predecessor's turn as soon as we've received our predecessor's block, i.e. it shouldn't depend on time at all, but only on the latest imported block. So maybe this PR is the wrong approach after all?
Somehow it seems in Parity every struct has all of its fields locked internally. It may be worth a try, but I suspect that global locks around engine and miner would cause them to deadlock, since there are places where (directly or indirectly) they call each other in both directions. 😬 |
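The idea of tying block preparation to the latest imported block rather than to the clock can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Aura's usual round-robin rule (step = time / step duration, proposer = validators[step mod n]), and `Address`, `step_at`, `expected_proposer`, `is_our_turn_now` and `should_prepare_after` are hypothetical names, not functions from the engine.

```rust
// Hypothetical stand-in for a validator address (the real type is a 20-byte hash).
type Address = u64;

/// Step number for a UNIX timestamp, assuming fixed-length steps.
fn step_at(unix_secs: u64, step_duration: u64) -> u64 {
    unix_secs / step_duration
}

/// Validator expected to seal in `step` under plain round-robin.
/// Assumes `validators` is non-empty; an empty slice would panic here.
fn expected_proposer(validators: &[Address], step: u64) -> Address {
    validators[(step % (validators.len() as u64)) as usize]
}

/// Clock-based check: is it our turn at this moment?
fn is_our_turn_now(validators: &[Address], us: Address, unix_secs: u64, step_duration: u64) -> bool {
    expected_proposer(validators, step_at(unix_secs, step_duration)) == us
}

/// Import-based check: having just imported a parent sealed in `parent_step`,
/// are we the expected proposer of the step right after it?
fn should_prepare_after(validators: &[Address], us: Address, parent_step: u64) -> bool {
    expected_proposer(validators, parent_step + 1) == us
}
```

Note that in this sketch the import-based check only ever looks at `parent_step + 1`. If the validator for that step is offline, no block for it is imported, so a rule based purely on the parent's step would need some additional, time-based fallback.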
|
I think so. I suspect that the real answer is that anything we need to do in a block we are sealing needs to be done in the actual function that seals the block. |
So you're essentially saying the issue doesn't have a solution, and we always need to prepare the block and try to seal it, like we currently do? |
36d9a4c to 296a483
|
I force-pushed a new approach that doesn't use |
|
I also addressed the TODO: In |
|
@afck In retrospect, what I am saying is: |
DemiMarie
left a comment
What will happen if we are selected to validate for the first block after genesis? This should be checked.
Otherwise, this code looks good to me.
        };

        let parent_step = header_step(&parent, self.empty_steps_transition)
            .expect("Header has been verified; qed");
Could this fail due to an I/O error involving the database?
|
If I'm not mistaken, @DemiMarie's remark can be extended to the case when at least one of the validators is down and skips blocks - then the network gets stuck because |
You're right! 😩 I don't have an idea yet how to solve this. But somehow |
|
@afck is it in |
|
Fixed. We were using the wrong |
|
@phahulin can you test this? |
|
@DemiMarie I couldn't get past the first few (1-2) blocks with all nodes up: I've seen |
|
I can confirm that it fails for me, too, both with and without the last commit, if I disable nodes 2 and 5. |
|
…but disabling the sealing queue (i.e. explicitly never taking that branch) fixes it for me! |
false from seals_internally if not our turn.
|
With this change it works for me. Maybe |
|
IMO we shouldn't merge it yet, because the issue itself is not a blocker, and it's not yet clear (at least to me) what we could break by disabling the sealing queue. |
|
I agree; let's not merge something where we don't know what we're doing: |
|
OK, I think
But I'm not sure yet what the proper solution is. |
false from seals_internally if not our turn.
|
@afck |
|
Good point… so maybe we don't need |
|
I think that we should keep the
enum SealingState {
    Ready,
    NotReady,
    External,
}
would be a good fit? Also, if we never use the sealing queue, we can delete all of its supporting infrastructure. |
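To make the proposal concrete, here is a minimal sketch of how a miner could branch on such an enum before preparing a block. Everything except the `SealingState` variants is an assumption made for illustration: the `Engine` trait, `RoundRobinEngine`, `MinerAction` and `on_update_sealing` are hypothetical names, and the reading of the variants (`Ready` = seal internally now, `NotReady` = do nothing, `External` = hand out work to an external sealer) is an interpretation, not the actual miner code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SealingState {
    Ready,
    NotReady,
    External,
}

/// Hypothetical, heavily reduced engine interface.
trait Engine {
    fn sealing_state(&self) -> SealingState;
}

/// Toy round-robin engine: ready only if it has a signer and the step is ours.
struct RoundRobinEngine {
    our_index: u64,
    validator_count: u64, // assumed to be non-zero
    current_step: u64,
    has_signer: bool,
}

impl Engine for RoundRobinEngine {
    fn sealing_state(&self) -> SealingState {
        if self.has_signer && self.current_step % self.validator_count == self.our_index {
            SealingState::Ready
        } else {
            SealingState::NotReady
        }
    }
}

/// What the (hypothetical) miner decides to do on a sealing update.
#[derive(Debug, PartialEq, Eq)]
enum MinerAction {
    Skip,
    PrepareAndSeal,
    PrepareWork,
}

fn on_update_sealing(engine: &dyn Engine) -> MinerAction {
    match engine.sealing_state() {
        // Our turn: prepare a block and seal it internally.
        SealingState::Ready => MinerAction::PrepareAndSeal,
        // Not our turn: don't waste CPU time preparing an unsealable block.
        SealingState::NotReady => MinerAction::Skip,
        // Seal comes from outside: prepare a work package.
        SealingState::External => MinerAction::PrepareWork,
    }
}
```

Because the `match` is exhaustive, every variant has to be handled explicitly, so a variant that is declared but never constructed would stand out quickly.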
DemiMarie
left a comment
We should not include SealingState::Prepare but never construct it. Either there should be a use for it, or it should be removed.
Also, we should panic!() if we hit a branch that is unreachable.
|
Thanks, I renamed the variants as you suggested, and removed I had originally meant |
DemiMarie
left a comment
Panic message needs to be updated. Otherwise, looks good.
38fdfc9 to ae424a0
ae424a0 to c6b0193
vkomenda
left a comment
Looks good. Some integration testing might be required, perhaps as a separate task.
c6b0193 to 9549426
false from seals_internally if not our turn.
|
I squashed and rebased. |
9549426 to 389c520
|
@afck Please clarify how this can be tested? If I add a node which is not a validator to In other words, how can I understand that these changes work fine? |
|
I think it rather affects nodes that are validators: Without this change, they print messages that they are not ready to seal, because they prepare blocks all the time, even if it's not their turn: But for some reason they're not even warnings, just traces, even though it seems like a big waste of CPU time. The hard part is: Could this PR cause any other problems in practice? That's why I'd really like someone to review it who is very familiar with the inner workings of Parity and Aura, and it's why I also made an upstream PR: openethereum#10529 |
|
OK, I left a ping in that PR; I hope the guys will help check the code. |
|
Since Parity approved the same changes for upstream, I'll try to launch the build based on the |
|
I tried to launch it and don't see the trace message |
This prevents the miner from creating new blocks for sealing when it's not our turn to seal them.
However, it likely does not detect all cases yet where it's not our turn.
(Closes #70.)