
Conversation

@julienrbrt
Member

@julienrbrt julienrbrt commented Dec 9, 2025

Overview

ref: #2906

Add a grace period for missed forced-inclusion txs. The grace period starts at 1 epoch and can grow depending on chain congestion (block size). It becomes elastic when the chain is congested, giving the sequencer time to process all txs before being considered malicious.

@github-actions
Contributor

github-actions bot commented Dec 9, 2025

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build | Format | Lint | Breaking | Updated (UTC)
✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Dec 11, 2025, 1:09 PM

@claude
Contributor

claude bot commented Dec 9, 2025

Claude finished @julienrbrt's task in 3m 57s


Comprehensive Code Review: Grace Period for Forced Inclusion

I've performed a thorough review of this PR implementing a configurable grace period for forced inclusion transactions. Here's my detailed feedback:


🎯 Summary

This PR adds a sophisticated grace period mechanism with dynamic adjustment based on chain congestion. The implementation is well-designed and addresses a real operational need. However, there are several important issues to address, particularly around test coverage and the dynamic adjustment implementation.


🔴 Critical Issues

1. Insufficient Test Coverage for Grace Period Logic ⚠️

Location: block/internal/syncing/syncer_forced_inclusion_test.go

Gemini's review correctly identified a critical gap: None of the existing tests actually validate the grace period functionality with non-zero values. The test TestVerifyForcedInclusionTxs_SmoothingExceedsEpoch was added but it doesn't test the grace period behavior—it only validates smoothing within an epoch.

Missing test scenarios:

// MISSING: Test with grace period = 1
// - Forced tx from epoch ending at 100
// - Grace boundary = 100 + (1 * epochSize)
// - Verify tx can be included at DAHeight = 101..graceBoundary (should pass)
// - Verify tx NOT included at DAHeight > graceBoundary (should fail as malicious)

// MISSING: Test dynamic grace period adjustment
// - Verify high fullness increases grace period
// - Verify low fullness decreases grace period
// - Verify grace period stays within min/max bounds

Recommendation: Add comprehensive tests specifically for grace period enforcement:

  • Test transactions within grace period (should succeed)
  • Test transactions past grace boundary (should fail as malicious)
  • Test edge case: currentDAHeight == graceBoundary (should succeed per line 888)
  • Test edge case: currentDAHeight == graceBoundary + 1 (should fail)
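As a concrete starting point, the boundary logic these tests should cover can be exercised in isolation. This is a minimal, self-contained sketch with illustrative names — `graceBoundary` and `isMalicious` are stand-ins, not the PR's actual helpers:

```go
package main

// graceBoundary mirrors the calculation described in the PR:
// boundary = epoch end + (grace period in epochs * epoch size).
func graceBoundary(epochEnd, gracePeriod, epochSize uint64) uint64 {
	return epochEnd + gracePeriod*epochSize
}

// isMalicious mirrors the strict check: a pending forced tx is flagged only
// when the current DA height moves strictly past the grace boundary.
func isMalicious(currentDAHeight, epochEnd, gracePeriod, epochSize uint64) bool {
	return currentDAHeight > graceBoundary(epochEnd, gracePeriod, epochSize)
}

func main() {
	// Epoch ends at 100, grace period = 1 epoch of size 10, so boundary = 110.
	const epochEnd, grace, epochSize = 100, 1, 10

	if isMalicious(105, epochEnd, grace, epochSize) {
		panic("height 105 should be inside the grace period")
	}
	if isMalicious(110, epochEnd, grace, epochSize) {
		panic("height == boundary should still pass (strict >)")
	}
	if !isMalicious(111, epochEnd, grace, epochSize) {
		panic("height 111 should be flagged as malicious")
	}
}
```

Table-driven tests over these four cases (inside, at boundary, one past, far past) would close the gap Gemini flagged.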

2. Potential Integer Overflow in Grace Boundary Calculation ⚠️

Location: block/internal/syncing/syncer.go:886

graceBoundary := pending.EpochEnd + (effectiveGracePeriod * s.genesis.DAEpochForcedInclusion)

If effectiveGracePeriod or DAEpochForcedInclusion are very large, this could overflow uint64. While unlikely in practice, the calculation should use checked arithmetic or document the assumption.

Recommendation:

// Add overflow checks or document maximum values. Guard the multiplication
// first; the subsequent addition can also overflow, so saturate in both cases.
var graceBoundary uint64
if effectiveGracePeriod > 0 && s.genesis.DAEpochForcedInclusion > math.MaxUint64/effectiveGracePeriod {
    s.logger.Error().Msg("grace period calculation would overflow")
    graceBoundary = math.MaxUint64
} else if offset := effectiveGracePeriod * s.genesis.DAEpochForcedInclusion; pending.EpochEnd > math.MaxUint64-offset {
    graceBoundary = math.MaxUint64
} else {
    graceBoundary = pending.EpochEnd + offset
}

🟡 Architecture & Design Concerns

3. Dynamic Grace Period May Be Too Complex 🤔

Location: block/internal/syncing/syncer.go:760-809

The dynamic grace period adjustment adds significant complexity:

  • EMA tracking of block fullness
  • Multiplier adjustment based on congestion
  • Rounding and clamping logic

Concerns:

  1. Attack Surface: A malicious sequencer could artificially inflate block fullness to extend grace periods and delay detection
  2. Unpredictability: Dynamic adjustment makes it harder to reason about when censorship will be detected
  3. Configuration Burden: Multiple tuning parameters (min/max multipliers, threshold, adjustment rate) that operators must understand

From ADR-019:

"Grace period is dynamically adjusted based on block fullness to balance censorship detection with operational reliability"

But the ADR doesn't discuss the security implications of this dynamic behavior.

Recommendations:

  1. Document Security Trade-offs: Add section to ADR discussing attack vectors
  2. Consider Simpler Alternative: Use a fixed grace period initially, add dynamic adjustment in a follow-up PR if needed
  3. Add Metrics: Expose grace_period_multiplier and block_fullness_ema as metrics for observability
  4. Rate Limiting: Consider capping how quickly the multiplier can change to prevent sudden spikes

4. Block Fullness Calculation is Heuristic and May Be Inaccurate

Location: block/internal/syncing/syncer.go:741-758

const maxDataSize = seqcommon.AbsoluteMaxBlobSize // 2MB

func (s *Syncer) calculateBlockFullness(data *types.Data) float64 {
    dataSize := uint64(0)
    for _, tx := range data.Txs {
        dataSize += uint64(len(tx))
    }
    return float64(dataSize) / float64(maxDataSize)
}

Issues:

  1. Uses constant AbsoluteMaxBlobSize (2MB) which may not represent actual block capacity
  2. Only considers data size, not transaction count or gas limits
  3. Comment says "This is a heuristic - actual limits may vary by execution layer"

This means the grace period adjustment may not accurately reflect chain congestion.

Recommendations:

  1. Use actual block gas limit or execution layer capacity if available
  2. Document the heuristic nature prominently
  3. Consider making this pluggable so different chains can provide accurate fullness metrics

🟢 Code Quality & Go Best Practices

5. Excellent Use of Atomic Pointers for Concurrent Access

Location: block/internal/syncing/syncer.go:112-114

gracePeriodMultiplier *atomic.Pointer[float64]
blockFullnessEMA      *atomic.Pointer[float64]

Using atomic.Pointer[float64] is the correct approach for lock-free reads/writes. Well done!


6. Configuration Design is Clean

Location: block/internal/syncing/syncer.go:33-72

The internal configuration struct forcedInclusionGracePeriodConfig properly encapsulates tuning parameters. Good separation of concerns.

Minor suggestion: Consider exposing some parameters via config file for operator tuning:

// In genesis.Genesis or config.Config
type ForcedInclusionConfig struct {
    BasePeriod uint64 `json:"base_period"` // Expose for operator tuning
}

🔵 Documentation & Testing

7. ADR Updates are Comprehensive

Location: docs/adr/adr-019-forced-inclusion-mechanism.md

The ADR updates are thorough and well-structured:

  • Clear explanation of grace period concept
  • Multiple worked examples
  • Configuration recommendations

Minor improvements:

  1. Add sequence diagram showing grace period verification flow
  2. Document performance implications of pending tx tracking
  3. Add migration guide for existing networks

8. Good Test Coverage for Dynamic Adjustment

Location: block/internal/syncing/syncer_forced_inclusion_test.go:27-342

Excellent unit tests for the dynamic adjustment logic:

  • TestCalculateBlockFullness_* - Tests fullness calculation
  • TestUpdateDynamicGracePeriod_* - Tests adjustment algorithm
  • TestGetEffectiveGracePeriod_* - Tests period calculation
  • Integration tests for high/low congestion scenarios

9. Missing Edge Case: Zero Grace Period

The code doesn't explicitly test or document behavior when basePeriod = 0. From line 886:

graceBoundary := pending.EpochEnd + (effectiveGracePeriod * s.genesis.DAEpochForcedInclusion)

With basePeriod = 0, this becomes graceBoundary = EpochEnd, meaning strict enforcement at epoch boundary. This should be:

  1. Explicitly tested
  2. Documented as valid configuration for high-security deployments
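The zero case reduces to plain arithmetic and is cheap to pin down in a test. A tiny sketch, again with illustrative names rather than the PR's actual API:

```go
package main

import "fmt"

// graceBoundaryZero mirrors the boundary formula: with a zero grace period
// the boundary collapses to the epoch end itself, i.e. strict enforcement
// at the epoch boundary. Names are illustrative, not the PR's helpers.
func graceBoundaryZero(epochEnd, gracePeriod, epochSize uint64) uint64 {
	return epochEnd + gracePeriod*epochSize
}

func main() {
	fmt.Println(graceBoundaryZero(100, 0, 10)) // prints 100: identical to EpochEnd
}
```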

🟣 Performance & Efficiency

10. Efficient Metrics Updates

Location: block/internal/syncing/syncer.go:907, 920

s.metrics.ForcedInclusionTxsInGracePeriod.Set(float64(txsInGracePeriod))
s.metrics.ForcedInclusionTxsMalicious.Add(float64(len(maliciousTxs)))

Good use of gauge for current state and counter for cumulative malicious txs.


11. Potential Performance Issue: Pending Tx Iteration

Location: block/internal/syncing/syncer.go:881-905, 910-916

The code iterates through stillPending twice:

  1. First to check grace boundaries (lines 881-905)
  2. Then to repopulate the sync.Map (lines 910-916)

For large pending sets, this could be inefficient. Consider single-pass:

// Single-pass version
s.pendingForcedInclusionTxs.Range(func(key, value any) bool {
    pending := value.(pendingForcedInclusionTx)
    graceBoundary := pending.EpochEnd + (effectiveGracePeriod * s.genesis.DAEpochForcedInclusion)
    
    if currentState.DAHeight > graceBoundary {
        maliciousTxs = append(maliciousTxs, pending)
        s.pendingForcedInclusionTxs.Delete(key)
    } else if currentState.DAHeight > pending.EpochEnd {
        txsInGracePeriod++
    }
    return true
})

🔒 Security Considerations

12. Grace Period Attack Vector: Malicious Sequencer Could Exploit Dynamic Adjustment

Scenario:

  1. Malicious sequencer fills blocks artificially to increase fullness
  2. Block fullness EMA rises above threshold
  3. Grace period multiplier increases
  4. Sequencer can now delay forced tx inclusion longer without detection

Mitigation:

  • Document this risk in ADR
  • Consider maximum absolute grace period (e.g., never > 10 epochs)
  • Alert when grace period exceeds expected values
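The absolute-cap mitigation could look like this minimal sketch; `maxGraceEpochs` is an assumed operator-set parameter, not something the PR defines:

```go
package main

import "fmt"

// capGracePeriod saturates the dynamically adjusted grace period at a hard
// ceiling, so a congestion-inflated multiplier can never extend detection
// delay unboundedly. maxGraceEpochs is a hypothetical config value.
func capGracePeriod(effective, maxGraceEpochs uint64) uint64 {
	if effective > maxGraceEpochs {
		return maxGraceEpochs
	}
	return effective
}

func main() {
	fmt.Println(capGracePeriod(25, 10)) // inflated value is capped: prints 10
	fmt.Println(capGracePeriod(3, 10))  // normal values pass through: prints 3
}
```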

13. Good: Malicious Detection is Strict

Location: block/internal/syncing/syncer.go:888

if currentState.DAHeight > graceBoundary {  // Strict >

Correctly uses strict inequality—transactions must be included by the grace boundary, not after.


📊 Metrics Implementation

14. Well-Designed Metrics

Location: block/internal/common/metrics.go:70-72, 189-202

ForcedInclusionTxsInGracePeriod metrics.Gauge   // Current state
ForcedInclusionTxsMalicious     metrics.Counter // Cumulative total

Good choice of metric types:

  • Gauge for current pending count - allows monitoring grace period health
  • Counter for malicious txs - tracks cumulative censorship attempts

Suggestion: Add histogram for grace period multiplier distribution:

ForcedInclusionGracePeriodMultiplier metrics.Histogram

🧪 Test Review

15. E2E Test Change is Minimal ⚠️

Location: test/docker-e2e/resiliency_test.go:421

-		}, 10*time.Second, time.Second, "post-recovery transaction %d should be processed", i+1)
+		}, 15*time.Second, time.Second, "post-recovery transaction %d should be processed", i+1)

Timeout increased from 10s to 15s. This might indicate:

  1. Grace period causing slower recovery
  2. Test flakiness

Recommendation: Document why timeout was increased in commit message or comment.


📋 Action Items

Must Address Before Merge:

  1. ⚠️ Add comprehensive grace period tests (address Gemini's concern)
  2. ⚠️ Document security implications of dynamic adjustment in ADR
  3. ⚠️ Add overflow protection to grace boundary calculation
  4. ⚠️ Test zero grace period edge case

Should Address:

  1. 🔧 Expose grace period multiplier and EMA as Prometheus metrics
  2. 🔧 Optimize pending tx iteration (single-pass instead of double)
  3. 🔧 Document block fullness heuristic limitations

Nice to Have:

  1. 📚 Add sequence diagram to ADR
  2. 📚 Document migration path for existing networks
  3. 🧪 Add benchmark tests for pending tx processing

Strengths

  1. Well-thought-out design addressing real operational needs
  2. Comprehensive ADR with clear examples
  3. Good use of Go concurrency patterns (atomic pointers, sync.Map)
  4. Thorough unit tests for dynamic adjustment logic
  5. Clean separation of concerns with internal configuration struct
  6. Proper metrics instrumentation

🎓 Learning & Best Practices

This PR demonstrates several Go best practices worth highlighting:

Atomic Operations:

gracePeriodMultiplier := &atomic.Pointer[float64]{}
gracePeriodMultiplier.Store(&initialMultiplier)

Using atomic.Pointer[T] (Go 1.19+) is cleaner than atomic.Value.

EMA Calculation:

newEMA := alpha*blockFullness + (1-alpha)*currentEMA

Proper exponential moving average with smoothing factor.
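Sketched as a standalone function (with an assumed `alpha = 0.2`, not necessarily the PR's default), the update shows the expected convergence behavior under sustained congestion:

```go
package main

import "fmt"

// updateEMA applies one step of the exponential moving average quoted above:
// a higher alpha weights recent observations more heavily.
func updateEMA(current, observation, alpha float64) float64 {
	return alpha*observation + (1-alpha)*current
}

func main() {
	ema := 0.0
	for i := 0; i < 20; i++ {
		ema = updateEMA(ema, 0.9, 0.2) // 20 consecutive ~90%-full blocks
	}
	fmt.Printf("%.3f\n", ema) // converges toward 0.9
}
```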

Sync.Map for Concurrent Access:

pendingForcedInclusionTxs sync.Map // map[string]pendingForcedInclusionTx

Correct choice for concurrent read/write with unknown keys.


📝 Verdict

Overall Assessment: This is a well-designed feature that addresses an important operational concern. The code quality is high, and the ADR documentation is comprehensive. However, the lack of test coverage for the actual grace period enforcement is a critical gap that must be addressed before merge.

Recommended Actions:

  1. ✅ Add the missing grace period enforcement tests
  2. ✅ Address security documentation in ADR
  3. ✅ Fix potential overflow issue
  4. 🤔 Consider simplifying by removing dynamic adjustment (or moving to follow-up PR)

Once these issues are addressed, this will be a solid addition to the codebase.


🔗 Related Files Reviewed

  • block/internal/syncing/syncer.go (+182/-21) - Core implementation
  • ⚠️ block/internal/syncing/syncer_forced_inclusion_test.go (+435/-55) - Tests (needs grace period tests)
  • block/internal/common/metrics.go (+23/-0) - Metrics
  • docs/adr/adr-019-forced-inclusion-mechanism.md (+162/-109) - Documentation
  • test/docker-e2e/resiliency_test.go (+2/-1) - E2E test

@gemini-code-assist
Contributor

Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant improvement to the forced inclusion mechanism by implementing a configurable grace period. This grace period allows for a delay in marking a sequencer as malicious if forced inclusion transactions are not immediately included, providing crucial resilience against temporary Data Availability (DA) layer outages or network congestion. The change ensures that the system can tolerate transient issues without compromising censorship resistance, striking a better balance between security and operational reliability.

Highlights

  • Grace Period Implementation: Introduced a configurable "ForcedInclusionGracePeriod" to allow forced inclusion transactions to be included across multiple epochs before being marked as malicious.
  • Enhanced Malicious Detection: The system now flags a sequencer as malicious only if forced inclusion transactions are not included past their grace boundary, providing tolerance for temporary DA unavailability.
  • New Metrics: Added "ForcedInclusionTxsInGracePeriod" and "ForcedInclusionTxsMalicious" metrics to monitor the state of forced inclusion transactions.
  • Updated Documentation: The ADR-019 (Forced Inclusion Mechanism) has been comprehensively updated to detail the grace period, its calculation, configuration, benefits, and security implications.
  • Genesis Configuration: The "Genesis" configuration now includes "ForcedInclusionGracePeriod" with a default value of 1, allowing networks to customize this critical parameter.
  • Improved Test Coverage: A new test case ("TestVerifyForcedInclusionTxs_SmoothingExceedsEpoch") was added to specifically validate the grace period logic, alongside updates to existing tests.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a grace period for forced transaction inclusion, which is a great enhancement for network resilience against temporary DA layer issues. The implementation logic in the syncer, along with the new metrics and the extensive documentation updates in the ADR, are well-executed. My primary concern is the lack of test coverage for the new grace period functionality. The existing and newly added tests only cover the scenario where the grace period is zero, failing to validate the core behavior of the feature. I've added a comment with a suggestion to improve test coverage.

@codecov

codecov bot commented Dec 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.53%. Comparing base (09b7efd) to head (02f8038).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2915      +/-   ##
==========================================
- Coverage   65.94%   63.53%   -2.41%     
==========================================
  Files          87       90       +3     
  Lines        7990     8584     +594     
==========================================
+ Hits         5269     5454     +185     
- Misses       2147     2547     +400     
- Partials      574      583       +9     
Flag | Coverage Δ
combined | 63.53% <100.00%> (-2.41%) ⬇️

Flags with carried forward coverage won't be shown.


@julienrbrt julienrbrt marked this pull request as draft December 10, 2025 09:34
@julienrbrt julienrbrt marked this pull request as ready for review December 11, 2025 13:08
@julienrbrt julienrbrt merged commit e867b3f into main Dec 11, 2025
28 of 31 checks passed
@julienrbrt julienrbrt deleted the julien/validate-loosen branch December 11, 2025 20:24
@github-actions
Contributor

PR Preview Action v1.6.3
Preview removed because the pull request was closed.
2025-12-11 20:25 UTC

alpe added a commit that referenced this pull request Dec 15, 2025
* main:
  chore: execute goimports to format the code (#2924)
  refactor(block)!: remove GetLastState from components (#2923)
  feat(syncing): add grace period for missing force txs inclusion (#2915)
  chore: minor improvement for docs (#2918)
  feat: DA Client remove interface part 2,  add client for celestia blob api   (#2909)
  chore: update rust deps (#2917)
  feat(sequencers/based): add based batch time (#2911)
  build(deps): Bump golangci/golangci-lint-action from 9.1.0 to 9.2.0 (#2914)
  refactor(sequencers): implement batch position persistance (#2908)