op-challenger: PoC for super node trace provider #18617
Conversation
Codecov Report
✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##           develop   #18617      +/-   ##
===========================================
- Coverage    72.22%   70.72%    -1.50%
===========================================
  Files          189      134       -55
  Lines        11163     7132     -4031
===========================================
- Hits          8062     5044     -3018
+ Misses        2955     2088      -867
+ Partials       146        0      -146
```
Switched this to max instead of min because we want to know if all data required to fully verify the L2 blocks at the requested timestamp is available from the game's L1 head.
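As a rough illustration of that min-to-max change (the helper and parameter names here are hypothetical, not the PR's actual code): the L1 block needed to verify every chain at the timestamp is the highest of the per-chain requirements, and that is the value to compare against the game's L1 head.

```go
// requiredL1Block sketches the reasoning above: to fully verify all chains at
// the requested timestamp, the highest (max) per-chain L1 requirement must be
// available, so that is the block number to check against the game's L1 head.
func requiredL1Block(perChainRequiredL1 []uint64) uint64 {
	var required uint64
	for _, n := range perChainRequiredL1 {
		if n > required {
			required = n
		}
	}
	return required
}
```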
I suspect that to get the expected handling of numbers we should be using `hexutil.Uint64` for the timestamp here rather than plain `uint64`. JSON-RPC (sadly) uses hex numbers and it would be very confusing if we wound up with some APIs needing hex and some decimal.
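A small sketch of the encoding difference behind that suggestion: go-ethereum's hexutil types marshal to quoted hex strings in JSON, while a plain uint64 marshals to a decimal number. The request struct below is made up purely for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/ethereum/go-ethereum/common/hexutil"
)

// atTimestampArgs is a hypothetical request struct showing both encodings side by side.
type atTimestampArgs struct {
	PlainTimestamp uint64         `json:"plain_timestamp"`
	HexTimestamp   hexutil.Uint64 `json:"hex_timestamp"`
}

func main() {
	out, _ := json.Marshal(atTimestampArgs{PlainTimestamp: 1000, HexTimestamp: hexutil.Uint64(1000)})
	// Prints: {"plain_timestamp":1000,"hex_timestamp":"0x3e8"}
	fmt.Println(string(out))
}
```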
852caa7 to 05899cf
05899cf to 355af0e
…n block data not available instead of not found error.
…ound, not on all errors.
```go
if !ok {
	return nil, fmt.Errorf("unsupported super root type %T", nextRoot.Super)
}
for i := uint64(0); i < min(step, uint64(len(nextSuperV1.Chains))); i++ {
```
A note for the supernode RPC: we expect the chains in the response to be sorted by their chain IDs.
Definitely. `eth.NewSuperV1` will do that automatically - otherwise you'd get a different (incorrect) super root.
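For illustration only (simplified, hypothetical types rather than the real eth package), the guarantee being discussed looks roughly like this: the per-chain outputs are ordered by chain ID before the super root is computed, so the ordering of the supernode's response cannot change the result.

```go
import "sort"

// chainOutput is a stand-in for the per-chain entry that goes into a super root.
type chainOutput struct {
	ChainID uint64
	Output  [32]byte
}

// sortChains orders the outputs by chain ID so the same set of chains always
// hashes to the same super root, regardless of the order they arrived in.
func sortChains(chains []chainOutput) []chainOutput {
	sort.Slice(chains, func(i, j int) bool {
		return chains[i].ChainID < chains[j].ChainID
	})
	return chains
}
```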
Ensures that challenger gets all data required to calculate the claim in a single request.
05353aa to ea0caca
```diff
 g.logger.Trace("Checking if actions are required")
-if err := g.actor.Act(ctx); err != nil {
+if err := g.actor.Act(ctx); errors.Is(err, client.ErrNotInSync) {
 	g.logger.Warn("Local node not sufficiently up to date to act on game", "err", err)
```
```diff
-g.logger.Warn("Local node not sufficiently up to date to act on game", "err", err)
+g.logger.Error("Local node not sufficiently up to date to act on game", "err", err)
```
This shouldn't happen given the ValidateNodeSynced precheck. Suggests either a bug or a bad supernode RPC.
Yeah I'm tossing up whether we should just skip the pre-validate step for the Supernode since we can handle it here. Otherwise the sync validator would just make this same superroot_atTimestamp request and check only the CurrentL1 value, throwing away the rest of the data.
I'm also thinking a bit more about how to better handle load-balanced nodes, where we might get an in-sync response from one node and then a not-in-sync response from another. It would be nice to handle that gracefully, at least when both nodes are healthy but one has a small lag on the latest update. That way you could run a proxy with pretty simple health checks, without needing the full consensus-aware handling that we do for ELs.
I'm not sure whether load balancing is needed (it's not expensive to load a super root, is it?), but improving robustness against out-of-sync nodes would be nice indeed. The CurrentL1 is the only constraint needed for that, since you can check each node until you find one that's synced past the CurrentL1.
Load balancing isn't required for performance, but it is very useful for reliability. A load balancer can easily handle some nodes being down, and with even very basic health checks it could remove a node that's too far out of sync from active service. Right now the challenger requires consistency across multiple calls, which doesn't work with simple load balancers. Fixing that single point of failure is a very common request, and we've had quite a few cases where the challenger did the wrong thing because it was pointed at a load balancer and got inconsistent responses.
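To make the idea concrete, here is a hedged sketch (all names and interfaces hypothetical, not the challenger's actual code) of tolerating a lagging node behind a simple load balancer: ask each configured endpoint in turn and accept the first response whose CurrentL1 has reached the game's L1 head.

```go
import (
	"context"
	"errors"
	"fmt"
)

// superRootResult is a hypothetical, trimmed-down view of the superRootAtTimestamp response.
type superRootResult struct {
	CurrentL1 uint64
	SuperRoot [32]byte
}

type superRootClient interface {
	SuperRootAtTimestamp(ctx context.Context, timestamp uint64) (superRootResult, error)
}

// firstSyncedResponse queries each node in turn and returns the first response whose
// CurrentL1 has reached the game's L1 head, so a slightly lagging node doesn't make
// the challenger bail out while another healthy node could answer.
func firstSyncedResponse(ctx context.Context, nodes []superRootClient, timestamp, gameL1Head uint64) (superRootResult, error) {
	err := errors.New("no nodes configured")
	for _, node := range nodes {
		resp, respErr := node.SuperRootAtTimestamp(ctx, timestamp)
		if respErr != nil {
			err = respErr
			continue
		}
		if resp.CurrentL1 >= gameL1Head {
			return resp, nil
		}
		err = fmt.Errorf("node current L1 %d is behind game L1 head %d", resp.CurrentL1, gameL1Head)
	}
	return superRootResult{}, err
}
```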
```go
// verifiedAt returns the L2 block which is fully verified at the given timestamp, and the minimum L1 block at which verification is possible
verifiedL2, verifiedL1, err := chain.VerifiedAt(ctx, timestamp)
if err != nil {
	if errors.Is(err, engine_controller.ErrNotFound) {
```
This is one of the few things/idioms I'm not thrilled about in Go. I'm concerned that as the supernode evolves, this specific error value gets lost. The unit test we have asserting AtTimestamp behavior isn't robust against the engine controller changing the returned error type. Since the op-challenger is highly sensitive to the error here, it would be good to have extra guards.
Consider making the following changes:
- document clearly here and in `engine_controller` that `engine_controller.ErrNotFound` can be returned (I can see someone easily confusing that error value with `ethereum.NotFound`).
- or, change the interface to also return a bool that clearly indicates no data was found.
Yeah, I was thinking that we should switch it to return `ethereum.NotFound` as the error type and document that as part of the API. A bool for found/not found could be a good idea too.
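A sketch of the first option, under the assumption that the trace provider wraps the engine controller's sentinel (`ethereum.NotFound` is the real go-ethereum sentinel; the interface, block types, and sentinel stand-in below are simplified for illustration):

```go
import (
	"context"
	"errors"
	"fmt"

	"github.com/ethereum/go-ethereum"
)

// errEngineNotFound stands in for engine_controller.ErrNotFound in this sketch.
var errEngineNotFound = errors.New("not found")

// verifiedSource is a simplified view of the per-chain source used by AtTimestamp.
type verifiedSource interface {
	VerifiedAt(ctx context.Context, timestamp uint64) (verifiedL2, verifiedL1 uint64, err error)
}

// verifiedAt translates the internal sentinel into the documented ethereum.NotFound
// error, so callers only ever need to check errors.Is(err, ethereum.NotFound).
func verifiedAt(ctx context.Context, chain verifiedSource, timestamp uint64) (uint64, uint64, error) {
	verifiedL2, verifiedL1, err := chain.VerifiedAt(ctx, timestamp)
	if errors.Is(err, errEngineNotFound) {
		return 0, 0, fmt.Errorf("%w: no verified block at timestamp %d", ethereum.NotFound, timestamp)
	}
	return verifiedL2, verifiedL1, err
}
```

The alternative bool return from the same comment works the same way, but makes the missing-data case explicit in the signature rather than something callers must remember to check with `errors.Is`.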
> Since the op-challenger is highly sensitive to the error here, it would be good to have extra guards.
If we need to preserve error semantics across services, it reads more like a business-logic signal than a genuine error.
I could imagine a more robust version of Super-Root where it almost never returns an actual error, but leaves sections of data blank where none is available, likely attaching reasons (like the bool Adrian is suggesting). This would let the challenger handle all the error inference.
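Purely as an illustration of that shape (entirely hypothetical, not a proposed API), each section of the response could carry its data together with a status explaining why it's absent, leaving all of the inference to the challenger:

```go
// sectionStatus explains why a piece of the response is or isn't populated.
type sectionStatus string

const (
	sectionAvailable      sectionStatus = "available"
	sectionNotYetVerified sectionStatus = "not_yet_verified"
	sectionDataMissing    sectionStatus = "data_unavailable"
)

// superRootSection pairs optional data with the reason it may be blank, so the
// endpoint rarely needs to return a hard error.
type superRootSection struct {
	Status sectionStatus `json:"status"`
	Data   *[32]byte     `json:"data,omitempty"` // only set when Status == sectionAvailable
}
```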
…quest." Doesn't seem worth it. This reverts commit ea0caca.
```go
type SuperRootResponseData struct {
	// UnverifiedAtTimestamp is the L2 block that would be applied if verification were assumed to be successful,
	// and the minimum L1 block required to derive them.
	UnverifiedAtTimestamp map[eth.ChainID]OutputWithRequiredL1 `json:"unverified_at_timestamp"`

	// VerifiedRequiredL1 is the minimum L1 block including the required data to fully verify all blocks at this timestamp
	VerifiedRequiredL1 eth.BlockID `json:"verified_required_l1"`

	// Super is the unhashed data for the superroot at the given timestamp after all verification is applied.
	Super eth.Super `json:"super"`

	// SuperRoot is the superroot at the given timestamp after all verification is applied.
	SuperRoot eth.Bytes32 `json:"super_root"`
}
```
Just noting for myself
| old | new | difference |
|---|---|---|
| CurrentL1Derived | removed? | |
| CurrentL1Verified | removed? | |
| VerifiedAtTimestamp | VerifiedRequiredL1 | just the L1 part of the verification data |
| OptimisticAtTimestamp | UnverifiedAtTimestamp | just a naming change I think |
| Min* | Min Values Removed | |
| Super | Added, but I'm not sure what it newly satisfies | |
Mostly this seems fine; I just need to understand why CurrentL1Verified isn't needed anymore, and what Super provides. I think CurrentL1Verified is gone because now the logic will just return the unverified data if verification isn't available.
I think you found this over in #18652, but CurrentL1 is effectively CurrentL1Verified now. This is only the internal Data object, which returns SuperRoot data if available. The outer wrapper has info about the processed L1 and is returned even if a super root isn't found.
Description
Quick PoC to check compatibility of the super node superRootAtTimestamp response with op-challenger. Needed to tweak a couple of things but it hangs together ok.
Not trying to be clever with code reuse and haven't even integrated it with challenger yet. Just de-risking to see if the required data is actually available.
Metadata
Part of #18524