
op-challenger: PoC for super node trace provider #18617

Closed
ajsutton wants to merge 11 commits into develop from aj/supernode-challenger

Conversation

@ajsutton
Contributor

Description

Quick PoC to check compatibility of the super node superRootAtTimestamp response with op-challenger. Needed to tweak a couple of things but it hangs together ok.

Not trying to be clever with code reuse and haven't even integrated it with challenger yet. Just de-risking to see if the required data is actually available.

Metadata

Part of #18524

@codecov

codecov bot commented Dec 16, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.72%. Comparing base (5401e3a) to head (d0aa6f6).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop   #18617      +/-   ##
===========================================
- Coverage    72.22%   70.72%   -1.50%     
===========================================
  Files          189      134      -55     
  Lines        11163     7132    -4031     
===========================================
- Hits          8062     5044    -3018     
+ Misses        2955     2088     -867     
+ Partials       146        0     -146     
| Flag | Coverage Δ |
| --- | --- |
| cannon-go-tests-64 | ? |
| contracts-bedrock-tests | 70.72% <ø> (-4.23%) ⬇️ |

Flags with carried forward coverage won't be shown.
See 64 files with indirect coverage changes.


Contributor Author

Switched this to max instead of min because we want to know if all data required to fully verify the L2 blocks at the requested timestamp is available from the game's L1 head.
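A minimal sketch of the aggregation being described (the helper below is hypothetical, not code from this PR): each chain has a minimum L1 block it needs for verification, and the super root as a whole needs the largest of them, which is why max is the right choice.

```go
// requiredL1 returns the L1 block number the game's L1 head must reach so that
// every chain's verification data is available: the largest per-chain requirement.
func requiredL1(perChain []uint64) uint64 {
	var required uint64
	for _, blockNum := range perChain {
		if blockNum > required {
			required = blockNum
		}
	}
	return required
}
```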

Contributor Author

I suspect that to get the expected handling of numbers we should be using timestamp hexutil.Uint64 here rather than just uint64. JSON-RPC (sadly) uses hex numbers and it would be very confusing if we wound up with some APIs needing hex and some decimal.
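For reference, a small standalone sketch (not code from this PR) of the difference: hexutil.Uint64 marshals to the 0x-prefixed hex string JSON-RPC expects, while a plain uint64 marshals to a decimal JSON number.

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/ethereum/go-ethereum/common/hexutil"
)

type hexReq struct {
	Timestamp hexutil.Uint64 `json:"timestamp"`
}

type decReq struct {
	Timestamp uint64 `json:"timestamp"`
}

func main() {
	a, _ := json.Marshal(hexReq{Timestamp: 1024})
	b, _ := json.Marshal(decReq{Timestamp: 1024})
	fmt.Println(string(a)) // {"timestamp":"0x400"}
	fmt.Println(string(b)) // {"timestamp":1024}
}
```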

@ajsutton force-pushed the aj/supernode-challenger branch from 852caa7 to 05899cf on December 17, 2025 02:47
@ajsutton force-pushed the aj/supernode-challenger branch from 05899cf to 355af0e on December 18, 2025 00:38
	if !ok {
		return nil, fmt.Errorf("unsupported super root type %T", nextRoot.Super)
	}
	for i := uint64(0); i < min(step, uint64(len(nextSuperV1.Chains))); i++ {
Contributor

A note for the supernode RPC: we expect the chains in the response to be sorted by their chain IDs.

Contributor Author

Definitely. eth.NewSuperV1 will do that automatically - otherwise you'd get a different (incorrect) super root.
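A minimal sketch of the invariant (placeholder types, not the real eth package definitions): entries must be in canonical chain ID order before hashing, otherwise the same set of outputs would produce a different super root.

```go
package superroot

import "sort"

// chainOutput is a placeholder for the per-chain entry in a super root.
type chainOutput struct {
	ChainID uint64
	Output  [32]byte
}

// sortChains puts entries in canonical order by chain ID; hashing an unsorted
// list would produce a different, and therefore incorrect, super root.
func sortChains(chains []chainOutput) {
	sort.Slice(chains, func(i, j int) bool {
		return chains[i].ChainID < chains[j].ChainID
	})
}
```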

Ensures that challenger gets all data required to calculate the claim in a single request.
@ajsutton force-pushed the aj/supernode-challenger branch from 05353aa to ea0caca on December 18, 2025 20:54
 	g.logger.Trace("Checking if actions are required")
-	if err := g.actor.Act(ctx); err != nil {
+	if err := g.actor.Act(ctx); errors.Is(err, client.ErrNotInSync) {
+		g.logger.Warn("Local node not sufficiently up to date to act on game", "err", err)
Contributor

Suggested change
-	g.logger.Warn("Local node not sufficiently up to date to act on game", "err", err)
+	g.logger.Error("Local node not sufficiently up to date to act on game", "err", err)

This shouldn't happen given the ValidateNodeSynced precheck. Suggests either a bug or a bad supernode RPC.

Contributor Author

Yeah I'm tossing up whether we should just skip the pre-validate step for the Supernode since we can handle it here. Otherwise the sync validator would just make this same superroot_atTimestamp request and check only the CurrentL1 value, throwing away the rest of the data.

I'm also thinking a bit more about how to better handle load balanced nodes where we might get one in sync response and then a not in sync response from another node. It would be nice to handle that nicely at least if the nodes are both healthy but may have a small lag in the latest update. That way you can have a proxy with pretty simple health checks that works, without needing to have the full consensus aware handling that we do for ELs.

Contributor

I'm not sure whether load balancing is needed (it's not expensive to load a super root, is it?). But improving robustness against out-of-sync nodes would be nice indeed. CurrentL1 is the only constraint needed for this, since you can check every node until there's one that's synced past the CurrentL1.
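A rough sketch of that approach (the client interface and response fields here are placeholders, not existing APIs): query each node in turn and use the first response whose current L1 is at or past the L1 block the game requires, rather than failing on the first out-of-sync node.

```go
package supernode

import (
	"context"
	"fmt"
)

// superRootResult is a placeholder for the superRootAtTimestamp response.
type superRootResult struct {
	CurrentL1Number uint64
	// ... remaining response fields elided
}

type superRootClient interface {
	SuperRootAtTimestamp(ctx context.Context, timestamp uint64) (superRootResult, error)
}

// firstSyncedResponse queries nodes in turn and returns the first response whose
// current L1 has reached the L1 block the game requires.
func firstSyncedResponse(ctx context.Context, nodes []superRootClient, timestamp, requiredL1 uint64) (superRootResult, error) {
	var lastErr error
	for _, node := range nodes {
		resp, err := node.SuperRootAtTimestamp(ctx, timestamp)
		if err != nil {
			lastErr = err
			continue
		}
		if resp.CurrentL1Number >= requiredL1 {
			return resp, nil
		}
		lastErr = fmt.Errorf("node synced to L1 %v, need %v", resp.CurrentL1Number, requiredL1)
	}
	return superRootResult{}, fmt.Errorf("no node sufficiently synced: %w", lastErr)
}
```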

Contributor Author

Load balancing isn't required for performance, but it is very useful for reliability. A load balancer could easily handle some nodes being down, and with even very basic health checks it could remove a node that's too far out of sync from active service. The catch is that right now the challenger requires consistency across multiple calls, which doesn't work with simple load balancers. Fixing that single point of failure is a very common request, and we've had quite a few cases where the challenger did the wrong thing because it was pointed at a load balancer and got inconsistent responses.

// verifiedAt returns the L2 block which is fully verified at the given timestamp, and the minimum L1 block at which verification is possible
verifiedL2, verifiedL1, err := chain.VerifiedAt(ctx, timestamp)
if err != nil {
	if errors.Is(err, engine_controller.ErrNotFound) {
Contributor

This is one of the few things/idioms I'm not thrilled about in Go. I'm concerned that as the supernode evolves, this specific error value will get lost. The unit test we have asserting AtTimestamp behavior isn't robust against the engine controller changing the returned error type. Since the op-challenger is highly sensitive to the error here, it would be good to have extra guards.
Consider making the following changes:

  • document clearly here and in engine_controller that engine_controller.ErrNotFound can be returned (I can see someone easily confusing that error value with ethereum.NotFound).
  • or, change the interface to also return a bool that clearly indicates no data found.

Contributor Author

Yeah I was thinking that we should switch it to return ethereum.NotFound as the error type and document that as part of the API. A bool for found/not found could be a good idea too.
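A minimal sketch of the two options (the interface and method names are placeholders, not the real engine controller API):

```go
package supernode

import (
	"context"
	"errors"

	"github.com/ethereum/go-ethereum"
)

// Option 1: keep the error-based contract, but standardise on the well-known
// go-ethereum sentinel and document it as part of the API.
type verifiedChain interface {
	// VerifiedAt returns ethereum.NotFound when nothing is verified at the timestamp yet.
	VerifiedAt(ctx context.Context, timestamp uint64) (blockNum uint64, err error)
}

func lookup(ctx context.Context, c verifiedChain, ts uint64) (uint64, bool, error) {
	n, err := c.VerifiedAt(ctx, ts)
	if errors.Is(err, ethereum.NotFound) {
		// "No data yet" is an expected condition for the challenger, not a failure.
		return 0, false, nil
	}
	if err != nil {
		return 0, false, err
	}
	return n, true, nil
}

// Option 2: make "not found" part of the signature so it can't be silently lost
// as the implementation evolves.
type verifiedChainExplicit interface {
	VerifiedAt(ctx context.Context, timestamp uint64) (blockNum uint64, found bool, err error)
}
```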

Contributor

Since the op-challenger is highly sensitive to the error here, it would be good to have extra guards.

If we need to preserve error semantics across services, it reads more like a business-logic signal than a genuine error.

I could imagine a more robust version of Super-Root where it almost never returns an actual error, but leaves sections of data blank where none is available, likely attaching reasons (like the bool Adrian is suggesting). This would let the challenger handle all the error inference.
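A hypothetical shape for such a response (placeholder names, not an existing type): each section is either populated or carries a reason explaining why it isn't, so the challenger can do the inference itself.

```go
// superRootSection sketches an "almost never errors" response section: either
// Data is set, or Missing says why it isn't (e.g. "not yet derived",
// "L1 data unavailable"), leaving the reaction up to the caller.
type superRootSection struct {
	Data    *[32]byte `json:"data,omitempty"`
	Missing string    `json:"missing,omitempty"`
}
```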

…quest."

Doesn't seem worth it.

This reverts commit ea0caca.
Comment on lines +49 to +62
type SuperRootResponseData struct {
	// UnverifiedAtTimestamp is, per chain, the L2 block that would be applied if verification were assumed to be
	// successful, and the minimum L1 block required to derive it.
	UnverifiedAtTimestamp map[eth.ChainID]OutputWithRequiredL1 `json:"unverified_at_timestamp"`

	// VerifiedRequiredL1 is the minimum L1 block including the data required to fully verify all blocks at this timestamp.
	VerifiedRequiredL1 eth.BlockID `json:"verified_required_l1"`

	// Super is the unhashed data for the super root at the given timestamp after all verification is applied.
	Super eth.Super `json:"super"`

	// SuperRoot is the super root at the given timestamp after all verification is applied.
	SuperRoot eth.Bytes32 `json:"super_root"`
}
Contributor

Just noting for myself:

| old | new | difference |
| --- | --- | --- |
| CurrentL1Derived | | removed? |
| CurrentL1Verified | | removed? |
| VerifiedAtTimestamp | VerifiedRequiredL1 | just the L1 part of the verification data |
| OptimisticAtTimestamp | UnverifiedAtTimestamp | just a naming change I think |
| Min* | | min values removed |
| | Super | added, but I'm not sure what it newly satisfies |

Mostly this seems fine, I just need to understand why CurrentL1Verified isn't needed anymore and what Super provides. I think CurrentL1Verified is gone because the logic will now just return the unverified data if verification isn't available.

Contributor Author

I think you found this over in #18652, but CurrentL1 is effectively CurrentL1Verified now. This is only the internal Data object, which returns SuperRoot data if available. The outer wrapper has info about the processed L1 and is returned even if a super root isn't found.

@ajsutton closed this Dec 21, 2025
