op-node: Prefer following seq drift instead of conf depth#3861
Conversation
Hey @trianglesphere! This PR has merge conflicts. Please fix them before continuing review.
We found a log on a testnet where a batch was rejected because its timestamp was greater than the origin timestamp + sequencer drift. I looked into origin selection and determined that we produce empty blocks if the block time is ahead of the origin + seq drift; however, if there is a next origin, it is not enough to produce an empty block — we must change the origin. I hypothesize that this occurred because the L1 head that the node was looking at was lagging. That would cause the node to keep using an old origin even if the timestamp was past the max drift. I added a unit test which shows this case and would fail without this fix applied.
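The selection rule described above can be sketched roughly as follows. This is a hedged illustration, not the op-node implementation: `BlockRef`, `selectOrigin`, and all field names are hypothetical stand-ins, and the real code handles more cases.

```go
package main

import "fmt"

// BlockRef is a minimal stand-in for the node's L1 block reference
// (hypothetical; the real op-node types differ).
type BlockRef struct {
	Number uint64
	Time   uint64
}

// selectOrigin sketches the rule described above: once the next L2 block
// time is past currentOrigin.Time + maxSeqDrift, producing an empty block
// is not enough; if a usable next origin exists, the origin must advance.
func selectOrigin(currentOrigin BlockRef, nextOrigin *BlockRef, nextL2Time, maxSeqDrift uint64) BlockRef {
	pastSeqDrift := nextL2Time > currentOrigin.Time+maxSeqDrift
	if pastSeqDrift && nextOrigin != nil && nextL2Time >= nextOrigin.Time {
		return *nextOrigin // advance the origin, reducing the drift
	}
	return currentOrigin // stay; blocks past the drift must be empty
}

func main() {
	cur := BlockRef{Number: 10, Time: 1000}
	next := BlockRef{Number: 11, Time: 1012}
	// With a 600s drift, an L2 time of 1700 is past 1000+600, so the
	// origin must advance to block 11.
	fmt.Println(selectOrigin(cur, &next, 1700, 600).Number)
}
```

Note that if `nextOrigin` is unknown (e.g. the node's L1 view is lagging, as hypothesized above), this sketch silently stays on the old origin — which is exactly the failure mode being discussed.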
@Mergifyio refresh
✅ Pull request refreshed
protolambda left a comment:
What happens if the next L1 origin time is more than the sequencer time drift away from the current L1 origin? It should keep using the same L1 origin and force the blocks to be empty iirc.
But with this check, we can't produce an L2 block with an old origin anymore, and I think we get stuck sequencing until block derivation produces the block.
And when the next L1 origin time is within the time drift span from the current origin, we can stay on the old origin safely I believe?
Then it should compare the next L2 time vs the child block of the head L2 block's L1 origin. Sequencer drift is only about the L2 block time vs the L1 origin time of that block. Note: if we lose connection to L1 for too long (more than the sequencer drift), the sequencer may produce blocks when it should not have. (It must produce empty blocks for all L2 times from l1Origin.Time + seq drift to l1OriginNext.Time.) However, if the sequencer produced empty blocks past l1OriginNext.Time because it did not have l1OriginNext, ignore those batches.
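The drift rule being discussed can be sketched as a small predicate. This is an illustrative simplification with hypothetical names (`batchAcceptable` etc.), not the actual derivation-pipeline check:

```go
package main

import "fmt"

// batchAcceptable sketches the drift rule discussed above: a batch whose
// timestamp is past its L1 origin's time plus the sequencer drift is only
// acceptable if it is empty; within the drift window, both empty and
// non-empty batches are fine. (Names and shape are hypothetical.)
func batchAcceptable(batchTime, originTime, maxSeqDrift uint64, hasTxs bool) bool {
	if batchTime > originTime+maxSeqDrift {
		return !hasTxs // past the drift: empty batches only
	}
	return true
}

func main() {
	// A non-empty batch at 1700 against an origin at 1000 with a 600s
	// drift is the kind of batch the testnet log showed being rejected.
	fmt.Println(batchAcceptable(1700, 1000, 600, true))
	fmt.Println(batchAcceptable(1700, 1000, 600, false))
}
```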
Say if we have: Then with this PR: But the next origin time is way ahead:
If the current origin is past the sequencer drift, it needs to ignore the conf depth, but it does still need to check the next L2 block time against the next origin time. I've also added a test for this case.
@protolambda I see what you mean. My mistake was that I was primarily considering an online sequencer where, even if the next origin is past the seq drift, we would not receive it until the L2 block time was reached — a bad assumption about timings. I've fixed this so it ignores the conf depth if it is past the seq drift, so it then uses the time check. I've also added a test case based on the times you gave.
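The conf-depth bypass described in this fix can be sketched as follows. This is a hedged illustration under assumed names (`effectiveL1Head`, `confDepth`), not the PR's actual diff:

```go
package main

import "fmt"

// effectiveL1Head sketches the fix described above: normally the sequencer
// only looks confDepth blocks behind the L1 head, but once it is past the
// sequencer drift it must ignore the conf depth — otherwise a lagging
// confirmed view can pin it to a stale origin indefinitely.
// (Hypothetical shape; the real op-node code differs.)
func effectiveL1Head(l1Head, confDepth uint64, pastSeqDrift bool) uint64 {
	if pastSeqDrift {
		return l1Head // must be able to see the candidate next origin
	}
	if l1Head < confDepth {
		return 0 // not enough blocks to apply the full conf depth
	}
	return l1Head - confDepth
}

func main() {
	fmt.Println(effectiveL1Head(100, 4, false)) // confirmed view: 96
	fmt.Println(effectiveL1Head(100, 4, true))  // past the drift: 100
}
```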
Comments & clarifications. I'd need to do a deeper dive on the code to see what happens there, but I already have a lot to say based on the discussion alone:
Confirming this, but reframing: if we're ahead of origin + seq drift, the only allowed move is to create a new block with a new origin (since that moves the L2 time by 2s but moves the L1 time by at least 12s, hence reducing the drift). Though not strictly necessary (in terms of properties), it's probably a good move to indeed make these blocks empty; the spec mandates it, as we'll see.
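The arithmetic in that parenthetical can be made concrete. The 2s L2 block time and 12s L1 slot time are the values assumed in the comment above; the function name is hypothetical:

```go
package main

import "fmt"

// drift illustrates the arithmetic above: with a 2s L2 block time and a
// 12s L1 slot time, adopting the next origin advances L2 time by 2s while
// the origin time jumps by at least 12s, so the drift shrinks by >= 10s.
func drift(l2Time, originTime uint64) int64 {
	return int64(l2Time) - int64(originTime)
}

func main() {
	fmt.Println(drift(1700, 1000))   // before: 700s past an origin at 1000
	fmt.Println(drift(1700+2, 1012)) // after adopting the next origin: 690s
}
```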
Can this happen — that so many slots fail that two successive L1 blocks end up very far apart? But anyway, if I understand correctly, the issue here would be that the new origin's timestamp is actually now ahead of the L2 head, meaning we do need to output more empty blocks to catch up. This is definitely an edge case to consider. In general, the algorithm should then be:
We can, but I think the right move is to adopt new origins as fast as possible to reduce the drift. Btw, these conversations should be related back to the spec. There are two relevant snippets:
The first snippet seems correct: it allows us to go past the drift if we have to, though that should only be allowed if we reduce the drift; I guess that's something it doesn't say. The second snippet seems correct as well! If we're past the drift, generate an empty block, and since we have a drift, we need to adopt the next origin.
```go
// the current origin block.
nextOrigin, err := los.l1.L1BlockRefByNumber(ctx, currentOrigin.Number+1)
if err != nil {
	// TODO: this could result in a bad origin being selected if we are past the seq
```
What is the deal with this todo?
This would occur if pastSeqDrift is true and we fail to get the potential next L1 origin. In the case that the next L1 origin's block time would have been valid (i.e. the next L2 time >= nextOrigin.Time), we would incorrectly return currentOrigin instead of the next origin. This is a bit tricky because we don't know whether nextOrigin is valid until after we fetch it, so sometimes currentOrigin is the wrong answer and sometimes it is the right one.
protolambda left a comment:
Change LGTM, and it fixes part of the problem, but I think the discussion here plus the newly added TODO highlight something that can still go wrong on a testnet, and we should fix that too. Let's discuss it on Monday.
Description
We found a log on a testnet where a batch was rejected because its timestamp was greater than the origin timestamp + sequencer drift. I looked into origin selection and determined that we produce empty blocks if the block time is ahead of the origin + seq drift; however, if there is a next origin, it is not enough to produce an empty block — we must change the origin.
I hypothesize that this occurred because the L1 head that the node was looking at was lagging. That would cause the node to keep using an old origin even if the timestamp was past the max drift. I added a unit test which shows this case & would fail without this fix being applied.
Tests
Unit tested with a regression test.
Metadata