feat: forced inclusion #2797 (base: main)
Conversation
The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).
Claude finished @julienrbrt's task — View job

Comprehensive Code Review: Forced Inclusion Mechanism

I've completed a thorough review of this forced inclusion implementation. This is a significant feature adding censorship resistance and based sequencer capabilities to ev-node. Overall, the implementation is well-architected and follows Go best practices, but there are several areas that need attention before merging.

Summary Assessment

Strengths:

Areas of Concern:
Critical Issues (Must Fix)

1. Concurrency Safety in Based Sequencer

Location:

Issue: The `createBatchFromQueue` method mutates shared state without synchronization:

```go
func (s *BasedSequencer) createBatchFromQueue(maxBytes uint64) *coresequencer.Batch {
    // No mutex held here!
    if len(s.txQueue) == 0 {
        return &coresequencer.Batch{Transactions: nil}
    }
    // ... modifies s.txQueue ...
}
```

Risk: Race conditions could lead to data corruption or panics from concurrent slice access.

Recommendation:

```go
type BasedSequencer struct {
    // ... existing fields ...
    mu sync.RWMutex // Add mutex for txQueue protection
}

func (s *BasedSequencer) GetNextBatch(...) {
    s.mu.Lock()
    defer s.mu.Unlock()
    // ... existing code ...
}

func (s *BasedSequencer) createBatchFromQueue(...) {
    // Caller must hold s.mu
}
```

2. Memory Leak in Pending Forced Inclusion Txs

Location:

Issue: The pending slice can grow without bound:

```go
type Sequencer struct {
    // ...
    pendingForcedInclusionTxs []pendingForcedInclusionTx // Can grow unbounded
}
```

Scenario: If a malicious actor posts many large (but < 2MB) transactions to DA, they could exhaust sequencer memory.

Recommendation:
3. Epoch Boundary Edge Case

Location:

Issue: When `daHeight < daStartHeight`:

```go
if daHeight < daStartHeight {
    return daStartHeight, daStartHeight + daEpochSize - 1
}
```

This returns a future epoch when the height is in the past. This could cause issues if a node syncs from before `daStartHeight`.

Recommendation:

```go
if daHeight < daStartHeight {
    // Return an empty/invalid epoch indicator
    return 0, 0 // Or return error
}
```

4. Missing Error Context in Syncer Verification

Location:

Issue: When forced inclusion verification fails, the error handling doesn't properly halt the node or transition to based sequencer mode as mentioned in the PR description.

```go
if err := s.verifyForcedInclusionTxs(currentState, data); err != nil {
    s.logger.Error().Err(err).Uint64("height", nextHeight).Msg("forced inclusion verification failed")
    if errors.Is(err, errMaliciousProposer) {
        s.cache.RemoveHeaderDAIncluded(headerHash)
        return err // But what happens next?
    }
}
```

Missing:

Recommendation: Add a critical error handler that:
High Priority Issues (Should Fix)

5. Hash Collision Risk in Syncer Verification

Location:

Issue: Using

Recommendation:
6. No Handling of Gas Limits

Location:

Issue: TODO comment indicates gas validation is missing:

```go
// TODO(@julienrbrt): technically we may need to check for block gas as well
```

Risk: A transaction could be small in bytes but consume excessive gas, causing block production failures after forced inclusion.

Recommendation:
7. DA Height Synchronization

Location:

Issue: DA height updates happen inside this branch logic:

```go
if forcedEvent.EndDaHeight > currentDAHeight {
    c.SetDAHeight(forcedEvent.EndDaHeight)
} else if forcedEvent.StartDaHeight > currentDAHeight {
    c.SetDAHeight(forcedEvent.StartDaHeight)
}
```

Concern: Why check both `EndDaHeight` and `StartDaHeight`?

Recommendation:
8. Epoch Fetching Granularity

Location:

Issue: The code fetches each intermediate DA height sequentially:

```go
for epochHeight := epochStart + 1; epochHeight < epochEnd; epochHeight++ {
    result := r.client.RetrieveForcedInclusion(ctx, epochHeight)
    // ...
}
```

Performance: For large epochs (e.g., 100 blocks), this makes 100 sequential DA calls.

Recommendation:
Medium Priority Issues (Nice to Have)

9. Configuration Validation

Location:

Issue: Validation only checks if based sequencer requires aggregator mode, but doesn't validate:

Recommendation:

```go
func (c *Config) Validate() error {
    // ... existing checks ...
    if c.Node.BasedSequencer && c.DA.ForcedInclusionNamespace == "" {
        return fmt.Errorf("based sequencer requires forced_inclusion_namespace to be configured")
    }
    // Validate namespace format if using Celestia
    if c.DA.ForcedInclusionNamespace != "" {
        if err := validateNamespaceFormat(c.DA.ForcedInclusionNamespace); err != nil {
            return fmt.Errorf("invalid forced_inclusion_namespace: %w", err)
        }
    }
    return nil
}
```

10. Logging Consistency

Issue: Some critical operations use

Examples:

Recommendation: Review and adjust log levels:

11. Test Coverage Gaps

Missing Test Scenarios:
Recommendation: Add integration tests covering these scenarios before merge.

12. Documentation Completeness

Issues:

Recommendation:

Architecture & Design Feedback

Strengths

Design Questions

Security Analysis

Positive Security Properties

Security Concerns

Performance Considerations

Current Performance Characteristics

Performance Optimization Opportunities

Testing Assessment

Test Coverage

Well-tested components:

Under-tested components:

Test Quality

Strengths:

Improvements needed:

Code Quality

Go Best Practices

Followed:

Could improve:

Metrics & Observability

Current state:

Recommendations:

Migration & Deployment

Breaking Changes

Migration path:

Deployment Recommendations

Recommendations Summary

Must Fix Before Merge (P0)

Should Fix Before Merge (P1)

Can Be Follow-up Work (P2)

Final Assessment

This is a well-architected and important feature that significantly improves ev-node's censorship resistance and decentralization properties. The ADR is excellent, the code is generally high quality, and the test coverage is good. However, there are critical concurrency safety and resource management issues that must be addressed before merge. The unbounded pending queue and unprotected shared state in the based sequencer are production risks.

Recommendation: Request changes to address P0 issues. Once those are resolved and tested, this PR will be ready to merge. The team has done great work here, and with the critical fixes, this will be a solid foundation for forced inclusion and based sequencer capabilities.

Reviewed Files

Core implementation:

Test files reviewed:

Total files changed: 47 files (+4351/-574 lines)
alpe left a comment
Very nice start!
Can you elaborate why you decided on a separate namespace for the forced inclusion TXs? The consumers have to read both namespaces anyway to stay up to date.
> ```go
> event.StartDaHeight = epochHeight
> event.Txs = append(event.Txs, result.Data...)
> }
> ```
We need to prepare for malicious content. Let's exit the loop early when a tx size threshold is reached. This can be a multiple of common.DefaultMaxBlobSize used by the executor.
Makes sense for the height check, yes! However, I was thinking of doing no other checks and letting the execution client deal with gibberish data (this is why I added that as a requirement in the execution interface description).
If we want to keep raw TX data in the namespace, there is not much we can do here to validate, indeed. A size check is an easy win but more would require extending the executor interface for a checkTX.
I agree, and this actually may be required to avoid congestion issues and losing txs.
This was a suggestion. Personally I think it makes sense, as we are filtering what comes up in that namespace at the fetching level directly in ev-node. What is posted in the forced inclusion namespace is handled directly by the execution client; ev-node only passes down bytes.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@           Coverage Diff           @@
##             main    #2797   +/-  ##
=======================================
  Coverage        ?   65.44%
=======================================
  Files           ?       85
  Lines           ?     7777
  Branches        ?        0
=======================================
  Hits            ?     5090
  Misses          ?     2121
  Partials        ?      566
```

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
List of improvements to do in follow-ups:
We discussed the above in the standup (#2797 (comment)), and a few ideas came up. 1-2. When making the call async, we need to make sure the executor and full node stay in sync within an epoch. This can be done easily by making an epoch a few blocks behind the actual DA height.
alpe left a comment
Thanks for answering all my questions and comments.
There is still the TODO in the code to store unprocessed direct TXs when the max block size is reached.
We decided to remove the sequencer go.mod, as ev-node can provide the sequencer implementation directly (sequencers/single was already depending on ev-node anyway). This means no go.mod needs to be added for the new based sequencer in #2797.
Once this PR is merged, we should directly after:
In the meantime, I have disabled the feature so it can be merged (0d790ef).
FYI the upgrade test will fail until tastora is updated.
> ### Systems Affected
>
> Users can submit transactions in two ways:
> 1. **Normal Path**: Submit to sequencer's mempool/RPC (fast, low cost)
Is the mempool not used app-side for abci? Does ev-node have a mempool? Or does "sequencer's mempool/RPC" here refer to the sequencer node as a single entity, even if it's running the app out-of-process as with evm?
From what I understand, the reth/evm mempool is used for evm, and the sequencer queries the pending tx pool/queue in GetTxs.
It is the execution layer's mempool. You are correct.
> ### Full Node Verification Flow
>
> ```
> 1. Receive block from DA or P2P
> 2. Before applying block:
>    a. Fetch forced inclusion txs from DA at block's DA height
>    b. Build map of transactions in block
>    c. Verify all forced txs are in block
>    d. If missing: reject block, flag malicious proposer
> 3. Apply block if verification passes
> ```
This makes sense! I think my mental model was assuming that ev-node did not need to be run with ev-reth for full nodes. But on reflection I think I was incorrect or misunderstood.
I assume ev-node must always be run, even for evm stack full nodes, but with --evnode.node.aggregator=false.
Yes, the full node runs the whole stack. Light nodes, on the other hand, just fetch headers.
>    - Only at epoch boundaries
>    - Scan epoch range for forced transactions
> 3. Get batch from mempool queue
> 4. Prepend forced txs to batch
So if we wanted to zk prove forced inclusion txs we could query the forced inclusion namespace at each epoch and prepend them to the txs list that we compare with the execution client's state transition function 🤔
I don't believe you would need to check the forced inclusion namespace, since the txs will be included in a block at some point. If you want to verify that the txs on the namespace were included, then you would need to follow it.
I see, thanks! Does that mean that the ev-node sequencer will fetch txs from the FI namespace and then repost them in a SignedData payload to the data namespace?
Correct, that is how I understand it. @julienrbrt, correct?
Yes, this is correct. This is how we do the verification as well on the sync node side.
Rename `evm-single` to `evm` and `grpc-single` to `evgrpc` for clarity. ref: #2797 (comment)
Extract some logic from #2797. Those refactors were done to ease forced inclusion integration, but they can be extracted to be merged sooner.
Force-pushed af054de to a18e75f (Compare)
> - **Censorship**: Mitigated by forced inclusion verification
> - **DA Spam**: Limited by DA layer's native spam protection and two-tier blob size limits
> - **Block Withholding**: Full nodes can fetch and verify from DA independently
> - **Oversized Batches**: Prevented by strict size validation at multiple levels
If a batch within an epoch is too big, do we spread it out over many blocks?
> 2. Before applying block:
>    a. Fetch forced inclusion txs from DA at block's DA height
>    b. Build map of transactions in block
>    c. Verify all forced txs are in block
This can be done after the fact, right? Like if the block with forced inclusion gets created but the full node doesn't have the data, then it won't block waiting for the data?
> 1. Timer triggers GetNextBatch
> 2. Fetch forced inclusion txs from DA (via DA Retriever)
>    - Only at epoch boundaries
>    - Scan epoch range for forced transactions
What sort of latency does this introduce?
Negligible, if not none, as after #2842, it will be fetched async. Currently (in this PR), it is blocking.
Co-authored-by: Marko <[email protected]>
ref: #1914
A choice has been made to put this logic in the executor and avoid extending the reaper and the sequencer.
This is because updating the reaper means passing down the last fetched DA height across all components.
It adds a lot of complexity otherwise. Adding it in the sequencer may be preferable, but this makes the inclusion in a sync node less straightforward. This is what is being investigated.
Compared to the previous implementation, a forced transaction does not have any structure. It should be the raw structure from the execution client. This keeps ev-node from knowing anything about the transaction: no signature checks, no validation of correctness. The execution client must make sure to reject gibberish transactions.
---- for later, won't be included in this PR (ref #2797 (comment))