fix(statesync): statesync is unstable and doesn't time out #1059

lklimek · 2025-02-19T08:30:01Z

Issue being fixed or feature implemented

We need state sync to work properly. During testing, several issues arose, including:

deadlock
on genesis, state sync never times out
light client does not verify data correctly due to a bug in QuorumVerify

What was done?

fix: deadlock in state sync
fix: QuorumVerify call has invalid order of operations
e2e: enable p2p state sync in rotate test
on statesync timeout, fall back to block sync
config: removed unused trust-height, trust-hash, trust-period from [statesync] section
config: added retries setting to [statesync] section
added safe integer type casting code
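
The retry and fallback items above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `syncWithFallback`, `attemptFunc`, and `demoFallback` are hypothetical names, and the per-attempt timeout and the retry count are assumed parameters standing in for the new `retries` setting.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// attemptFunc is a hypothetical stand-in for one sync attempt.
type attemptFunc func(ctx context.Context) error

// syncWithFallback tries stateSync up to retries+1 times, each attempt bounded
// by timeout, and falls back to blockSync if every attempt fails.
// It reports whether the fallback was used.
func syncWithFallback(ctx context.Context, retries int, timeout time.Duration,
	stateSync, blockSync attemptFunc) (usedFallback bool, err error) {
	for attempt := 0; attempt <= retries; attempt++ {
		attemptCtx, cancel := context.WithTimeout(ctx, timeout)
		err = stateSync(attemptCtx)
		cancel()
		if err == nil {
			return false, nil
		}
		if ctx.Err() != nil {
			// The parent context was cancelled: neither retry nor fall back.
			return false, ctx.Err()
		}
	}
	return true, blockSync(ctx)
}

// demoFallback runs the sketch with a state sync that always times out.
func demoFallback() (attempts int, usedFallback bool, err error) {
	stalled := func(ctx context.Context) error {
		attempts++
		<-ctx.Done() // simulate a state sync that never completes
		return ctx.Err()
	}
	blockSync := func(ctx context.Context) error { return nil }
	usedFallback, err = syncWithFallback(context.Background(), 2,
		10*time.Millisecond, stalled, blockSync)
	return attempts, usedFallback, err
}

func main() {
	attempts, usedFallback, err := demoFallback()
	fmt.Println("attempts:", attempts, "fallback:", usedFallback, "err:", err)
}
```

Bounding each attempt with its own context keeps a stalled attempt from blocking the node forever, which is the failure mode described for genesis.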

How Has This Been Tested?

Added state sync to the p2p tests; also tested on Dash Platform mainnet as part of the Platform state sync effort.

Breaking Changes

Non-breaking changes in config file

Checklist:

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have added or updated relevant unit/integration/functional/e2e tests
I have made corresponding changes to the documentation

For repository code-owners and collaborators only

I have assigned this pull request to a milestone

Summary by CodeRabbit

New Features
- Enhanced state synchronization with a built-in retry mechanism and improved snapshot handling.
- Added a query to reveal the current consensus height for better visibility of network status.
- Introduced a mock implementation for the ConsensusStateProvider interface to facilitate testing.
- Updated state synchronization configurations to utilize a "p2p" method for specific nodes.
Refactor
- Streamlined configuration settings by simplifying parameters and updating default connection options.
- Adjusted logging verbosity across components for more granular and informative system monitoring.
- Updated the handling of chunk management and error reporting within the synchronization process.
- Transitioned to a more unified approach in initializing light clients across state providers.
- Improved integer overflow handling and type-safe conversions in math utilities.
- Enhanced type safety and validation in quorum-related data handling.
Chores
- Upgraded several external dependencies to enhance overall performance and stability.
Tests
- Expanded test coverage for synchronization, concurrency controls, and validator functionality.
- Introduced new tests for the LockGuard and RLockGuard functions to ensure robust mutex handling.
- Added a new test to validate hash generation from a given validator threshold public key and quorum hash.
- Enhanced synchronization tests to support new retry logic and consensus state integration.

coderabbitai · 2025-02-19T08:30:08Z

Walkthrough

This pull request implements a series of updates across multiple modules. Trust parameters in state synchronization have been removed and replaced with a retry mechanism. A new ConsensusStateProvider interface (with a corresponding mock) is introduced and integrated into the reactor, while various logging levels have been adjusted for finer-grained tracing. Additional changes include enhanced chunk management, mutex guard utilities, safe math conversion functions with overflow checks, and several test updates. Minor documentation and configuration corrections are also applied throughout the codebase.

Changes

| File(s) | Change Summary |
| --- | --- |
| `config/config.go`, `config/toml.go` | Removed trust parameters; added `Retries` field/parameter; updated default configuration and validation for state sync. |
| `internal/statesync/reactor.go`, `internal/statesync/mocks/consensusstateprovider.go` | Introduced new `ConsensusStateProvider` interface, updated reactor signature, and provided a mock implementation. |
| `internal/blocksync/reactor.go`, `internal/evidence/reactor.go`, `internal/mempool/reactor.go`, `internal/statesync/peer.go` | Changed logging levels (Debug/Info → Trace) to provide more granular output during peer and event processing. |
| `internal/consensus/state.go`, `internal/consensus/state_data.go` | Added `GetCurrentHeight` method to State; corrected a comment typo. |
| `internal/libs/sync/mutexguard.go`, `internal/libs/sync/mutexguard_test.go` | Added `LockGuard` and `RLockGuard` functions with associated interfaces and tests for safe mutex handling. |
| `internal/statesync/chunks.go`, `internal/statesync/chunks_test.go` | Enhanced chunk management with new filename handling, pending checks, helper methods, and improved error messages in tests. |
| `internal/p2p/transport_mconn.go`, `internal/statesync/stateprovider.go` | Updated default port value (26657 → 26656) and modified RPC/P2P state provider initialization (removing trust parameters). |
| `internal/statesync/syncer.go`, `internal/statesync/syncer_test.go` | Modified `SyncAny` and `requestChunk` signatures to include retry logic and an `initialHeight` parameter; adjusted error handling and logging. |
| `libs/math/safemath.go`, `libs/math/safemath_test.go` | Added safe conversion functions, new overflow error variable, and corresponding tests for robust integer conversion. |
| `test/e2e/networks/rotate.toml`, `test/e2e/pkg/mockcoreserver/core_server.go`, `test/e2e/pkg/mockcoreserver/methods.go`, `test/e2e/runner/setup.go`, `test/e2e/runner/start.go`, `types/validator_test.go` | Updated state sync settings from commented-out to active "p2p"; improved height conversion and unmarshalling in mocks; removed dynamic config updates; cleaned up variable scope in network start; and added a validator set hash test. |

Possibly related PRs

fix(consensus): we should panic if finalize block on apply commit fails #966: The changes in the main PR, which introduce a new interface ConsensusStateProvider, are related to the retrieved PR as both involve modifications to the consensus mechanism, specifically in how state synchronization and consensus state interactions are handled.
fix(rpc): validators endpoint fail during quorum rotation #959: The changes in the main PR, which introduce a new interface ConsensusStateProvider, are related to the modifications in the retrieved PR as both involve modifications to the handling of validators and state synchronization.
fix(consensus): node stalled after client has stopped #1001: The changes in the main PR, which introduce a new interface ConsensusStateProvider, are related to the modifications in the retrieved PR that also involve the ConsensusStateProvider interface, specifically in the context of error handling and state management within the consensus logic.

Suggested reviewers

QuantumExplorer
shumkov

Poem

Hopping through the lines of code, I cheer,
Trust is replaced with retries, now crystal clear.
New interfaces and tests make my heart race,
Logs now trace each step with elegant grace.
In the coding burrow, improvements gleam bright—
A bunny’s delight in every change tonight!
🐇✨

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (1.64.8)

Error: you are using a configuration file for golangci-lint v2 with golangci-lint v1: please use golangci-lint v2
Failed executing command with error: you are using a configuration file for golangci-lint v2 with golangci-lint v1: please use golangci-lint v2

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 723556e and d5d2adc.

📒 Files selected for processing (1)

internal/statesync/reactor.go (22 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

internal/statesync/reactor.go

⏰ Context from checks skipped due to timeout of 90000ms (9)

GitHub Check: tests (04)
GitHub Check: tests (01)
GitHub Check: tests (02)
GitHub Check: tests (00)
GitHub Check: tests (03)
GitHub Check: Super linter
GitHub Check: e2e-test (dashcore)
GitHub Check: e2e-test (rotate)
GitHub Check: Build (amd64, linux)

…rectly

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (12)

libs/math/safemath.go (2)

164-199: Consider using type constraints for better readability.

The implementation is good, but readability could be improved by using a more descriptive type constraint name.
-type Integer interface {
+type Number interface {
   ~int | ~int8 | ~int16 | ~int32 | ~int64 | ~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64
 }

-func MustConvert[FROM Integer, TO Integer](a FROM) TO {
+func MustConvert[FROM Number, TO Number](a FROM) TO {
263-312: Improve panic messages in Max/Min functions.

The implementation is good, but the panic messages could be more descriptive.
-       panic("unsupported type")
+       panic(fmt.Sprintf("unsupported type %T", max))

libs/math/safemath_test.go (1)

89-207: LGTM! Comprehensive test coverage.

The test cases are thorough and well-structured, covering various integer types and edge cases. Consider adding a helper function to reduce the nested switch statements and improve maintainability.

+func convertValue[F Integer, T Integer](from F, want T) (interface{}, error) {
+    return SafeConvert[F, T](from)
+}

 func TestSafeConvert(t *testing.T) {
     // ... test cases ...
     for i, tc := range testCases {
         testName := fmt.Sprintf("%d:%T(%d)-%T(%d)", i, tc.from, tc.from, tc.want, tc.want)
         t.Run(testName, func(t *testing.T) {
-            var result interface{}
-            var err error
-            switch from := tc.from.(type) {
-            case int:
-                switch tc.want.(type) {
-                case int64:
-                    result, err = SafeConvert[int, int64](from)
+            result, err := convertValue(tc.from, tc.want)

types/validator_test.go (5)

119-126: Consider parameterizing test vectors.

The test uses hardcoded test vectors. Consider parameterizing them into a table-driven test to make it easier to add more test cases in the future.

-func TestValidatorSetHashVectors(t *testing.T) {
-	thresholdPublicKey, err := base64.RawStdEncoding.DecodeString("gw5F5F5kFNnWFUc8woFOaxccUI+cd+ixaSS3RZT2HJlWpvoWM16YRn6sjYvbdtGH")
-	require.NoError(t, err)
-
-	quorumHash, err := hex.DecodeString("703ee5bfc78765cc9e151d8dd84e30e196ababa83ac6cbdee31a88a46bba81b9")
-	require.NoError(t, err)
-
-	expected := "81742F95E99EAE96ABC727FE792CECB4996205DE6BFC88AFEE1F60B96BC648B2"
+func TestValidatorSetHashVectors(t *testing.T) {
+	testCases := []struct {
+		name               string
+		thresholdPublicKey string
+		quorumHash        string
+		expectedHash      string
+	}{
+		{
+			name:               "valid hash generation",
+			thresholdPublicKey: "gw5F5F5kFNnWFUc8woFOaxccUI+cd+ixaSS3RZT2HJlWpvoWM16YRn6sjYvbdtGH",
+			quorumHash:        "703ee5bfc78765cc9e151d8dd84e30e196ababa83ac6cbdee31a88a46bba81b9",
+			expectedHash:      "81742F95E99EAE96ABC727FE792CECB4996205DE6BFC88AFEE1F60B96BC648B2",
+		},
+	}
+
+	for _, tc := range testCases {
+		t.Run(tc.name, func(t *testing.T) {
+			thresholdPublicKey, err := base64.RawStdEncoding.DecodeString(tc.thresholdPublicKey)
+			require.NoError(t, err)
+
+			quorumHash, err := hex.DecodeString(tc.quorumHash)
+			require.NoError(t, err)

128-134: Add test documentation and error cases.

The test function lacks documentation and error test cases. Consider:

Adding a function comment explaining the test's purpose
Including error cases for invalid inputs

+// TestValidatorSetHashVectors verifies that the ValidatorSet hash is correctly computed
+// from a given threshold public key and quorum hash.
 func TestValidatorSetHashVectors(t *testing.T) {
     // ... test cases setup ...
+    // Add error cases
+    t.Run("invalid public key", func(t *testing.T) {
+        valset := ValidatorSet{
+            ThresholdPublicKey: bls12381.PubKey([]byte("invalid")),
+            QuorumHash:         quorumHash,
+        }
+        assert.Panics(t, func() { valset.Hash() })
+    })

292-306: Improve error handling in Sync method.

The error handling for errNoSnapshots is good, but consider adding metrics or telemetry to track fallback occurrences.

 if errors.Is(err, errNoSnapshots) && r.postSyncHook != nil {
     r.logger.Warn("no snapshots available; falling back to block sync", "err", err)
+    if r.metrics != nil {
+        r.metrics.StateSyncFallbacks.Add(1)
+    }

1068-1104: LGTM! Enhanced snapshot safety checks.

Good addition of safety checks to ensure snapshots are only accepted when the next block is finalized.

This change improves system reliability by preventing potential state inconsistencies during snapshot synchronization.

877-877: Use safe integer conversion.

Good use of tmmath.MustConvertInt64 for safe integer conversion, but consider handling the potential panic.

-cp, err := r.stateStore.LoadConsensusParams(tmmath.MustConvertInt64(msg.Height))
+height, err := tmmath.SafeConvertInt64(msg.Height)
+if err != nil {
+    r.logger.Error("invalid height in params request",
+        "height", msg.Height,
+        "error", err)
+    return nil
+}
+cp, err := r.stateStore.LoadConsensusParams(height)

internal/libs/sync/mutexguard_test.go (4)

14-16: Consider increasing the timeout duration for CI environments.

While 1 second is reasonable for local testing, it might be too short for CI environments or slower machines. Consider making it configurable via an environment variable.

 const (
-	timeout = 1 * time.Second
+	timeout = getTestTimeout()
 )
+
+func getTestTimeout() time.Duration {
+	if t := os.Getenv("TEST_TIMEOUT"); t != "" {
+		if d, err := time.ParseDuration(t); err == nil {
+			return d
+		}
+	}
+	return 1 * time.Second
+}

18-46: Consider adding edge cases to strengthen the test.

The test covers basic unlock scenarios well. Consider adding these edge cases:

Panic recovery when unlocking an already unlocked mutex
Concurrent unlock attempts from multiple goroutines

 func TestLockGuardMultipleUnlocks(t *testing.T) {
+	t.Run("panic recovery", func(t *testing.T) {
+		var mtx deadlock.Mutex
+		mtx.Lock()
+		mtx.Unlock()
+		assert.NotPanics(t, func() {
+			unlock := tmsync.LockGuard(&mtx)
+			unlock()
+		})
+	})
+
+	t.Run("concurrent unlocks", func(t *testing.T) {
+		var mtx deadlock.Mutex
+		unlock := tmsync.LockGuard(&mtx)
+		var wg sync.WaitGroup
+		for i := 0; i < 10; i++ {
+			wg.Add(1)
+			go func() {
+				defer wg.Done()
+				assert.False(t, unlock())
+			}()
+		}
+		wg.Wait()
+	})
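
For context, a guard of the kind these tests exercise could look roughly like the sketch below. It is written against the standard library `sync` package rather than the `deadlock` package used in the tests, and the return-value semantics (the first call to the returned function reports `true`, later calls report `false`) are an assumption inferred from the test names, not the confirmed behaviour of `tmsync.LockGuard`. `UnlockFn`, `guard`, and `demoLockGuard` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// UnlockFn releases a lock taken by a guard. It is safe to call repeatedly;
// only the first call actually releases the lock and returns true.
type UnlockFn func() bool

// guard wraps an unlock operation so that it runs at most once.
func guard(unlock func()) UnlockFn {
	var once sync.Once
	return func() bool {
		released := false
		once.Do(func() {
			unlock()
			released = true
		})
		return released
	}
}

// LockGuard locks mtx and returns a function that unlocks it exactly once.
func LockGuard(mtx sync.Locker) UnlockFn {
	mtx.Lock()
	return guard(mtx.Unlock)
}

// RLockGuard takes a read lock and returns a function that releases it once.
func RLockGuard(mtx *sync.RWMutex) UnlockFn {
	mtx.RLock()
	return guard(mtx.RUnlock)
}

// demoLockGuard unlocks twice and checks the mutex can be taken again.
func demoLockGuard() (first, second, relockable bool) {
	var mtx sync.Mutex
	unlock := LockGuard(&mtx)
	first = unlock()
	second = unlock() // no-op: a second Unlock on sync.Mutex would crash
	relockable = mtx.TryLock()
	if relockable {
		mtx.Unlock()
	}
	return first, second, relockable
}

func main() {
	fmt.Println(demoLockGuard())
}
```

The usual call pattern is `unlock := LockGuard(&mtx); defer unlock()`, with an optional early `unlock()` on a fast path; the `sync.Once` makes the deferred call harmless in that case.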

70-105: Verify concurrent read access.

The test should verify that multiple reads can occur simultaneously. Consider adding a channel-based synchronization to confirm this behavior.

 func TestRLockGuard(t *testing.T) {
 	var mtx deadlock.RWMutex
 	var counter int
 	var wg sync.WaitGroup
+	reading := make(chan struct{})
+	done := make(chan struct{})
 
 	read := func() {
 		defer wg.Done()
 		unlock := tmsync.RLockGuard(&mtx)
 		defer unlock()
+		reading <- struct{}{} // Signal that we're reading
+		<-done               // Wait for verification
 		_ = counter
 	}
+
+	// Verify concurrent reads
+	go func() {
+		// Wait for at least 2 readers
+		<-reading
+		<-reading
+		close(done)
+	}()

107-120: Make the helper more test-friendly.

Instead of panicking, consider returning an error to allow for better test control and error handling.

-func waitFor(f func()) {
+func waitFor(f func()) error {
 	done := make(chan struct{})
 	go func() {
 		f()
 		close(done)
 	}()
 
 	select {
 	case <-time.After(timeout):
-		panic("Test timed out")
+		return fmt.Errorf("test timed out after %v", timeout)
 	case <-done:
+		return nil
 	}
 }

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6340e50 and fab980a.

📒 Files selected for processing (5)

internal/libs/sync/mutexguard_test.go (1 hunks)
internal/statesync/reactor.go (22 hunks)
libs/math/safemath.go (5 hunks)
libs/math/safemath_test.go (2 hunks)
types/validator_test.go (2 hunks)

🧰 Additional context used

🧠 Learnings (1)

libs/math/safemath_test.go (1)

Learnt from: lklimek
PR: dashpay/tenderdash#1026
File: libs/math/safemath.go:123-161
Timestamp: 2025-01-20T13:38:10.707Z
Learning: The `SafeConvert` function in `libs/math/safemath.go` correctly handles negative values during integer type conversion through two mechanisms: explicit checks for int64 to uint64 conversion, and the Min[U]() check that catches negative values when converting to unsigned types.

⏰ Context from checks skipped due to timeout of 90000ms (5)

GitHub Check: test_apps
GitHub Check: test_abci_cli
GitHub Check: Super linter
GitHub Check: e2e-test (rotate)
GitHub Check: e2e-test (dashcore)

🔇 Additional comments (12)

libs/math/safemath.go (2)

9-9: LGTM! Good error consolidation.

Consolidating multiple specific overflow errors into a single ErrOverflow simplifies error handling while maintaining descriptive error messages.

117-158: LGTM! Robust generic implementation.

The SafeConvert function is well-implemented with:

Efficient platform-specific size checks using constants

Proper handling of edge cases for int64/uint64

Comprehensive overflow checks using Max/Min
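
To illustrate what an overflow-checked generic conversion involves, here is a minimal sketch. It deliberately uses a different technique from the one described above: instead of `Max`/`Min` bounds and platform-size constants, it converts, converts back, and compares value and sign. The names mirror the PR (`SafeConvert`, `MustConvert`, `ErrOverflow`, `Integer`), but the bodies are an assumption for illustration and not the code in `libs/math/safemath.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrOverflow is returned when a value does not fit in the target type.
var ErrOverflow = errors.New("integer overflow")

// Integer matches all built-in integer kinds and types derived from them.
type Integer interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64 | ~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64
}

// SafeConvert converts from F to T using a round-trip check:
// the value must survive T -> F unchanged and keep its sign.
func SafeConvert[F Integer, T Integer](from F) (T, error) {
	to := T(from)
	if F(to) != from || (from < 0) != (to < 0) {
		return 0, ErrOverflow
	}
	return to, nil
}

// MustConvert is like SafeConvert but panics on overflow.
func MustConvert[F Integer, T Integer](from F) T {
	to, err := SafeConvert[F, T](from)
	if err != nil {
		panic(fmt.Errorf("cannot convert %d to %T: %w", from, to, err))
	}
	return to
}

func main() {
	v, err := SafeConvert[int32, int16](32767)
	fmt.Println(v, err)

	_, err = SafeConvert[int64, uint64](-1)
	fmt.Println(errors.Is(err, ErrOverflow))
}
```

The sign comparison is what catches cases such as `int64(-1)` to `uint64`, where the round trip alone would reproduce the original bit pattern.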

types/validator_test.go (3)

72-73: LGTM! Good practice to extract magic number into a constant.

The introduction of minPeers constant improves code maintainability by centralizing the minimum peer requirement.

141-148: LGTM! Well-documented interface definition.

The ConsensusStateProvider interface is well-documented and follows the interface segregation principle with minimal required methods.

1000-1000: LGTM! Improved logging granularity.

Good change from Info to Trace level for peer updates, which reduces log noise while maintaining observability.

Also applies to: 1009-1009
internal/libs/sync/mutexguard_test.go (2)

1-12: LGTM! Well-organized imports and package declaration.

The package is correctly named sync_test for external tests, and imports are properly organized with standard library packages first, followed by third-party and internal packages.

48-68:
❓ Verification inconclusive

Add race condition verification.

While the test correctly verifies concurrent access, it would be beneficial to explicitly run it with the race detector and verify the absence of data races.

🏁 Script executed:
#!/bin/bash
# Description: Check if tests pass with race detection enabled
go test -race ./internal/libs/sync/...
Length of output: 139
Update test instructions to enable race detection by setting CGO_ENABLED=1

While the test correctly verifies concurrent access, it's recommended to run it with the race detector to confirm the absence of data races. However, note that the race detector requires cgo to be enabled. Please update your test command accordingly:

Run the tests using:
CGO_ENABLED=1 go test -race ./internal/libs/sync/...
internal/statesync/reactor.go (5)

24-24: LGTM! Good improvements to code organization.

The changes improve code maintainability by:

Adding safe integer conversion utilities via tmmath

Centralizing the minimum peer requirement into a constant instead of using magic numbers

Also applies to: 72-73

138-138: Great use of interface-based design!

The introduction of ConsensusStateProvider interface improves:

Testability by enabling mock implementations

Modularity by decoupling the reactor from concrete consensus state

Code clarity through focused, well-documented interface

Also applies to: 141-148
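
As a rough illustration of the testability point, the sketch below shows a narrow interface with a hand-written test double. `GetCurrentHeight` is the method named in the change summary, but its signature here is assumed, and `fixedHeightProvider`, `reactor`, and `consensusHeight` are hypothetical names that do not come from the PR (which ships a generated mock instead).

```go
package main

import "fmt"

// ConsensusStateProvider exposes only what the state sync reactor needs
// from consensus. The method set and signature are assumptions.
type ConsensusStateProvider interface {
	GetCurrentHeight() int64
}

// fixedHeightProvider is a hypothetical hand-written test double.
type fixedHeightProvider struct {
	height int64
}

func (p fixedHeightProvider) GetCurrentHeight() int64 { return p.height }

// reactor is a stripped-down stand-in for the state sync reactor; it depends
// on the interface rather than on the concrete consensus state.
type reactor struct {
	csState ConsensusStateProvider
}

// consensusHeight reports the height seen through the provider.
func (r *reactor) consensusHeight() int64 {
	return r.csState.GetCurrentHeight()
}

func main() {
	r := &reactor{csState: fixedHeightProvider{height: 42}}
	fmt.Println("consensus height:", r.consensusHeight())
}
```

Because the reactor only sees the interface, a test can supply any height without starting a consensus state machine.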

292-306: Excellent error handling and fallback mechanism!

The changes improve system resilience by:

Gracefully handling the case when no snapshots are available

Providing a fallback to block sync

Using clear, informative error messages

1068-1104: Robust snapshot safety improvements!

The changes enhance data consistency by:

Adding consensus state validation

Ensuring snapshots are only accepted when the next block is finalized

Including helpful debug logging

1000-1000: Appropriate adjustment of logging levels!

Changed peer update logging from Info to Trace level, which:

Reduces log noise for frequent events

Maintains consistency with other reactor implementations

Still provides detailed debugging capability when needed

Also applies to: 1009-1009

internal/libs/sync/mutexguard_test.go

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

libs/math/safemath.go (3)

93-111: Add parameter documentation to conversion functions.

The functions are well-implemented, but the parameter documentation is missing. Add descriptions for the generic type parameter T.

-// SafeConvertUint64 takes a int and checks if it overflows.
+// SafeConvertUint64 takes a value of type T and converts it to uint64, checking for overflow.
+// Type parameter T must satisfy the Integer constraint.
 func SafeConvertUint64[T Integer](a T) (uint64, error) {

119-158: Simplify the type switch in SafeConvert.

The type switch could be simplified by removing redundant comments and empty returns.

 func SafeConvert[F Integer, T Integer](from F) (T, error) {
     const uintIsSmall = math.MaxUint < math.MaxUint64
     const intIsSmall = math.MaxInt < math.MaxInt64 && math.MinInt > math.MinInt64

     switch any(from).(type) {
     case int64:
         if _, ok := any(T(0)).(uint64); ok && from < 0 {
             return 0, ErrOverflow
         }
-        // return T(from), nil
     case uint64:
         if _, ok := any(T(0)).(int64); ok && uint64(from) > math.MaxInt64 {
             return 0, ErrOverflow
         }
-        // return T(from), nil
     case int:
         if !intIsSmall {
             return SafeConvert[int64, T](int64(from))
         }
-        // no return here - it's safe to use normal logic
     case uint:
         if !uintIsSmall {
             return SafeConvert[uint64, T](uint64(from))
         }
-        // no return here - it's safe to use normal logic
     }

263-312: Make panic messages consistent between Max and Min.

The error messages for unsupported types should be consistent:

Max uses: unsupported type %T
Min uses: unsupported type

-               panic("unsupported type")
+               panic(fmt.Sprintf("unsupported type %T", T(0)))

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fab980a and 2e8eee5.

📒 Files selected for processing (1)

libs/math/safemath.go (5 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (6)

GitHub Check: test_apps
GitHub Check: test_abci_cli
GitHub Check: e2e-test (rotate)
GitHub Check: tests (01)
GitHub Check: e2e-test (dashcore)
GitHub Check: Super linter

🔇 Additional comments (2)

libs/math/safemath.go (2)

9-9: LGTM! Good error consolidation.

Consolidating multiple specific overflow errors into a single ErrOverflow reduces code duplication and improves maintainability.

26-28: LGTM! Consistent error handling.

The error handling is consistently updated across all functions to use the consolidated ErrOverflow.

Also applies to: 36-38, 66-68, 76-78, 86-88, 224-226, 234-236

libs/math/safemath.go

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

libs/math/safemath_test.go (2)
89-207: LGTM! Comprehensive test coverage for safe integer conversions.

The test suite thoroughly validates integer type conversions, including edge cases and boundary conditions. The table-driven approach with descriptive test names enhances maintainability.

Consider these improvements:

Add test cases for:

Conversions between uint16/uint8

Conversions to/from custom integer types

Extract type conversion logic into helper functions to reduce switch statement complexity:
func convertValue[From, To any](t *testing.T, i int, from From) (To, error) {
    switch any(from).(type) {
    case int:
        return SafeConvert[From, To](from)
    // ... other cases
    default:
        t.Fatalf("unsupported source type %T", from)
        return *new(To), nil
    }
}
209-213: Add more test cases for panic behavior.

While the current test cases cover basic scenarios, consider adding more cases to thoroughly test panic behavior:
func TestMustConvertPanics(t *testing.T) {
    assert.NotPanics(t, func() { MustConvert[int32, int32](0) })
    assert.Panics(t, func() { MustConvert[int32, int16](math.MaxInt16 + 1) })
    assert.NotPanics(t, func() { MustConvert[int32, int16](math.MaxInt16) })
    
    // Additional test cases
    assert.Panics(t, func() { MustConvert[int64, uint64](math.MinInt64) })
    assert.Panics(t, func() { MustConvert[uint64, int64](math.MaxUint64) })
    assert.NotPanics(t, func() { MustConvert[int16, uint32](math.MaxInt16) })
}

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e8eee5 and 55d9001.

📒 Files selected for processing (1)

libs/math/safemath_test.go (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (6)

GitHub Check: tests (01)
GitHub Check: e2e-test (rotate)
GitHub Check: golangci-lint
GitHub Check: Build (amd64, linux)
GitHub Check: e2e-test (dashcore)
GitHub Check: Super linter

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

internal/statesync/reactor.go (1)

1068-1104: Document the block height requirement.

While the code comments explain the need for finalized blocks, consider adding more detailed documentation about:

Why exactly 2 blocks are needed

The relationship between snapshot height and block finalization

The potential edge cases this prevents
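
One way to make the requirement concrete is a small predicate like the sketch below. The rule encoded here (the consensus height must be at least two blocks past the snapshot height) is an assumption based only on the wording of this comment; the actual condition in `internal/statesync/reactor.go` may differ. `snapshotIsSafe` and `minBlocksAhead` are hypothetical names.

```go
package main

import "fmt"

// minBlocksAhead is an assumed value, taken from the "2 blocks" mentioned
// in the review comment.
const minBlocksAhead = 2

// snapshotIsSafe reports whether a snapshot at snapshotHeight may be served,
// given the current consensus height. Negative heights are rejected so the
// subtraction below cannot overflow.
func snapshotIsSafe(snapshotHeight, consensusHeight int64) bool {
	if snapshotHeight < 0 || consensusHeight < 0 {
		return false
	}
	return consensusHeight-snapshotHeight >= minBlocksAhead
}

func main() {
	for _, current := range []int64{10, 11, 12} {
		fmt.Printf("snapshot=10 consensus=%d safe=%v\n", current, snapshotIsSafe(10, current))
	}
}
```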

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 55d9001 and 8eeccd4.

📒 Files selected for processing (1)

internal/statesync/reactor.go (22 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (7)

GitHub Check: test_abci_cli
GitHub Check: test_apps
GitHub Check: e2e-test (rotate)
GitHub Check: tests (01)
GitHub Check: Super linter
GitHub Check: e2e-test (dashcore)
GitHub Check: golangci-lint

🔇 Additional comments (5)

internal/statesync/reactor.go (5)

24-24: LGTM! Good addition of safe math utilities.

The addition of tmmath import enables safe integer conversions, helping prevent potential overflow issues.

72-73: LGTM! Good constant extraction.

Extracting minPeers into a constant improves maintainability by centralizing the minimum peer requirement.

141-148: LGTM! Well-designed interface for improved testability.

The ConsensusStateProvider interface is well-documented and follows interface segregation principle by including only the required methods.

138-138: LGTM! Good dependency inversion.

Replacing the concrete type with ConsensusStateProvider interface follows the Dependency Inversion Principle, improving testability and flexibility.

292-306:
✅ Verification successful

Verify the error variable definition.

The error handling for no snapshots case looks good, but please ensure that errNoSnapshots is properly defined.

🏁 Script executed:
#!/bin/bash
# Search for errNoSnapshots definition
rg "errNoSnapshots.*=.*" internal/statesync/
Length of output: 240
Verified: errNoSnapshots is properly defined.

The error variable is defined in internal/statesync/syncer.go as errNoSnapshots = errors.New("no suitable snapshots found"), so the handling in internal/statesync/reactor.go is correct.

This reverts commit 8eeccd4.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

internal/statesync/reactor.go (1)
877-877: Consider using non-panicking integer conversion.

While tmmath.MustConvertInt64 prevents overflow issues, it can panic on invalid input. Consider using the non-panicking version that returns an error which can be handled gracefully.

Example approach:
-height := tmmath.MustConvertInt64(msg.Height)
+height, err := tmmath.SafeConvertInt64(msg.Height)
+if err != nil {
+    return fmt.Errorf("invalid height: %w", err)
+}
Also applies to: 1101-1101, 1120-1120

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8eeccd4 and a61dfd6.

📒 Files selected for processing (1)

internal/statesync/reactor.go (22 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (11)

GitHub Check: test_abci_cli
GitHub Check: test_apps
GitHub Check: build
GitHub Check: tests (05)
GitHub Check: tests (03)
GitHub Check: tests (02)
GitHub Check: tests (01)
GitHub Check: e2e-test (rotate)
GitHub Check: golangci-lint
GitHub Check: e2e-test (dashcore)
GitHub Check: Super linter

🔇 Additional comments (5)

internal/statesync/reactor.go (5)

71-74: LGTM! Good refactoring of the minimum peers constant.

Extracting the hardcoded value into a named constant improves maintainability and makes the code more self-documenting.

141-148: LGTM! Well-designed interface abstraction.

The ConsensusStateProvider interface improves testability and follows the Interface Segregation Principle by defining only the required methods.

292-306: LGTM! Improved error handling with graceful fallback.

The enhanced error handling now:

- Distinguishes between no snapshots and other errors
- Provides a graceful fallback to block sync when no snapshots are available
- Uses clear and informative error messages

1068-1104: LGTM! Enhanced snapshot safety with finalization check.

The addition of the finalization check is a critical safety improvement that ensures:

- Only snapshots with finalized next blocks are accepted
- The node won't sync to unfinalized state
- Appropriate debug logging for troubleshooting
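As a rough illustration of the idea, the sketch below filters snapshots by whether the block after the snapshot height is already finalized. The `snapshot` type, both function names, and the notion of a single "last finalized height" are assumptions made for this example; the actual check in the reactor may obtain and compare heights differently.

```go
package main

import "fmt"

// snapshot is a hypothetical, stripped-down snapshot descriptor.
type snapshot struct {
	Height uint64
}

// nextBlockFinalized reports whether the block at s.Height+1 is at or
// below the last finalized height. Comparing with "<" instead of
// computing s.Height+1 avoids unsigned wrap-around.
func nextBlockFinalized(s snapshot, lastFinalizedHeight uint64) bool {
	return s.Height < lastFinalizedHeight
}

// filterFinalized keeps only snapshots whose next block is finalized.
func filterFinalized(all []snapshot, lastFinalizedHeight uint64) []snapshot {
	accepted := make([]snapshot, 0, len(all))
	for _, s := range all {
		if nextBlockFinalized(s, lastFinalizedHeight) {
			accepted = append(accepted, s)
		}
	}
	return accepted
}

func main() {
	candidates := []snapshot{{Height: 98}, {Height: 99}, {Height: 100}}
	fmt.Println(filterFinalized(candidates, 100))
}
```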

1000-1000: LGTM! Appropriate adjustment of logging levels.

Changed peer update logging from Info to Trace level, which is more appropriate for these frequent events and aligns with similar changes in other reactor files.

Also applies to: 1009-1009

github-actions · 2025-04-20T00:28:14Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

lklimek added 14 commits February 19, 2025 09:22

feat(light): verify trusted light block hash

a34ac3a

fix(light): light block fails with invalid sig format

03f4ace

chore: fix mock core server for new quorum verify

7bfe65a

fix(statesync): allow empty chunks

297fc45

refactor: remove trust-height and

5f08d4b

chore(config)!: remove unused trust-height,trust-hash,trust-period

0086910

test(statesync): fix state sync tests

d0c3200

chore: apply code rabbit comments

7485305

chore: self review

5c37adf

deps: update dashd-go

41ccf16

deps: update dashd-go

e431ff2

test: fix failing test

a80cb0b

chore: improve safemath lib

09339af

chore: fix linter

e228775

lklimek added 15 commits February 19, 2025 09:30

chore: code rabbit fixes

6412874

build(deps): update dashd-go to 0.26.1

e7b0836

fix(statesync): repeat image discovery when needed

e866f54

chore(statesync): retry snapshot when light block is not returned correctly

9c26e0b

fix(statesync): don't offer snapshots when light block is not available

920d6db

chore: fix invalid condition

f081224

feat(statesync): add state sync retries limit

1396cd9

chore(statesync): handle no snapshots error

921aad5

chore: run blocksync on statesync retry failure

d6ca726

chore: fix minor condition

9486e87

chore: fix logging

ee564c1

fix: invalid condition

08acaef

chore(statesync): fix type conversion

2a3e1a0

test(types): validator set hash test vectors

54fd7ec

chore(statesync): 3 retries by default

6b6c7ba

lklimek added 2 commits February 19, 2025 13:27

refactor(math): unify ErrOverflow

0b60b9d

fix(math): invalid handling of some edge cases

0390209

lklimek marked this pull request as ready for review February 19, 2025 13:22

lklimek requested review from QuantumExplorer and shumkov as code owners February 19, 2025 13:22

chore: apply rabbit feedback

fab980a

lklimek changed the title ~~Feat/statesync improvements~~ fix(statesync): statesync is unstable and doesn't time out Feb 19, 2025

coderabbitai bot reviewed Feb 19, 2025

internal/libs/sync/mutexguard_test.go (resolved)

chore: fix comment

2e8eee5

coderabbitai bot reviewed Feb 19, 2025

libs/math/safemath.go (resolved)

test(math): add MustConvert panic test

55d9001

coderabbitai bot reviewed Feb 19, 2025

chore(statesync): remove test code that decreased backfill time

8eeccd4

coderabbitai bot reviewed Feb 20, 2025

Revert "chore(statesync): remove test code that decreased backfill time"

a61dfd6

This reverts commit 8eeccd4.

coderabbitai bot reviewed Feb 24, 2025

lklimek added 2 commits March 3, 2025 09:59

Merge branch 'v1.5-dev' into feat/statesync-improvements

0ed310d

Merge branch 'v1.5-dev' into feat/statesync-improvements

645a378

lklimek mentioned this pull request Mar 4, 2025

feat!: state sync #1026

Closed

5 tasks

Merge branch 'v1.5-dev' into feat/statesync-improvements

c47e185

shumkov approved these changes Mar 19, 2025

github-actions bot added the Stale label Apr 20, 2025

Merge branch 'v1.5-dev' into feat/statesync-improvements

723556e

lklimek removed the Stale label Apr 24, 2025

chore(statesync): remove test code that decreased backfill time

d5d2adc

lklimek merged commit 33cda2d into v1.5-dev Apr 25, 2025
19 checks passed

lklimek deleted the feat/statesync-improvements branch April 25, 2025 07:28

coderabbitai bot mentioned this pull request Apr 25, 2025

build(deps): major update of mockery to 3.2.4 #1107

Merged

5 tasks

coderabbitai bot mentioned this pull request Sep 23, 2025

build(deps): update go to 1.25.1 and other dependencies #1167

Merged

5 tasks
