Skip to content

Conversation

@aaronc
Copy link
Member

@aaronc aaronc commented May 30, 2025

Description

Closes: #22731

This PR is essentially a rewrite of the core watcher/process management of Cosmovisor and will result in a Cosmovisor v2 release.

Summary by CodeRabbit

  • New Features

    • Added support for manual blockchain upgrades, allowing upgrades to be scheduled and triggered via environment variables or CLI commands.
    • Introduced enhanced Cosmovisor features, including new commands for managing manual and batch upgrades, improved configuration options, and support for monitoring upgrades via HTTP RPC endpoints.
    • System tests now cover Cosmovisor's manual and governance upgrade flows, verifying upgrade behavior and binary switching.
    • Provided a mock node and comprehensive integration tests for Cosmovisor upgrade scenarios.
  • Bug Fixes

    • Improved upgrade plan JSON handling to ensure compatibility with protobuf-generated types and correct serialization of fields.
  • Documentation

    • Updated and expanded Cosmovisor documentation, including new CLI commands, environment variables, manual upgrade scheduling, and improved usage instructions.
    • Added a state machine diagram to illustrate the Cosmovisor upgrade and run process.
  • Refactor

    • Major internal restructuring of Cosmovisor's upgrade logic, process management, and watcher mechanisms for improved reliability and maintainability.
    • Replaced legacy launcher and file watcher code with new modular components and generic watchers.
  • Chores

    • Updated changelogs to reflect new features and breaking changes.
    • Improved test organization and coverage for manual and batch upgrade scenarios.
  • Tests

    • Added new and refactored existing system and unit tests to validate manual upgrade handling, process restarts, and watcher behaviors.

@ironbird-prod
Copy link

ironbird-prod bot commented May 30, 2025

Ironbird - launch a network To use Ironbird, you can use the following commands:
  • /ironbird start OR /ironbird start --load-test-config= - Launch a testnet with the specified chain and load test configuration.
  • /ironbird chains - List of chain images that ironbird can use to spin-up testnet
  • /ironbird loadtests - List of load test modes that ironbird can run against testnet
Custom Load Test Configuration You can provide a custom load test configuration using the `--load-test-config=` flag:
/ironbird start cosmos --load-test-config={
  "block_gas_limit_target": 0.75,
  "num_of_blocks": 50,
  "msgs": [
    {"weight": 0.3, "type": "MsgSend"},
    {"weight": 0.3, "type": "MsgMultiSend"},
	{"weight": 0.4, "type": "MsgArr", "ContainedType": "MsgSend", "NumMsgs": 3300}
  ]
}

Use /ironbird loadtests to see more examples.

@github-actions github-actions bot added C:x/upgrade C:Cosmovisor Issues and PR related to Cosmovisor labels May 30, 2025
@aaronc aaronc marked this pull request as ready for review June 30, 2025 11:34
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jun 30, 2025

📝 Walkthrough

Walkthrough

This set of changes introduces a major refactor and enhancement of Cosmovisor's upgrade management, particularly for manual upgrades. The core process management, upgrade detection, and watcher logic have been rewritten with new internal modules, improved configuration handling, and new CLI commands. The upgrade flow now robustly supports both governance-triggered and manually scheduled upgrades, with extensive new tests and documentation updates.

Changes

File(s) Change Summary
tools/cosmovisor/process.go, tools/cosmovisor/scanner.go Entire legacy process management and file watcher logic removed.
tools/cosmovisor/internal/runner.go, tools/cosmovisor/internal/upgrader.go, tools/cosmovisor/internal/process.go, tools/cosmovisor/internal/backoff.go, tools/cosmovisor/internal/errors.go, tools/cosmovisor/internal/skip.go, tools/cosmovisor/internal/testing.go, tools/cosmovisor/internal/watchers/* New modular process runner, upgrade handling, backoff, error types, test helpers, and generic file/height watcher infrastructure introduced.
tools/cosmovisor/args.go, tools/cosmovisor/manual.go Configuration and manual upgrade batch handling refactored; protobuf JSON used for upgrade info, new methods for reading/writing/validating manual upgrades, and new config fields added.
tools/cosmovisor/cmd/cosmovisor/*.go CLI refactored to use new process runner and config methods; new commands for manual upgrades, batch upgrades, and improved help/version/config handling.
tools/cosmovisor/cmd/mock_node/main.go, tools/cosmovisor/cmd/cosmovisor/mockchain_test.go New mock node and integration test for simulating and validating upgrade flows.
systemtests/system.go, tests/systemtests/cosmovisor_test.go System test harness and new system test for Cosmovisor manual and governance upgrades added.
x/upgrade/keeper/keeper.go, x/upgrade/abci.go, simapp/upgrades.go Upgrade module extended to support manual upgrades via new keeper methods and ABCI logic; app supports manual upgrade trigger via env var.
tools/cosmovisor/README.md, tools/cosmovisor/CHANGELOG.md, tools/cosmovisor/state_machine.puml Documentation and diagrams updated to reflect new features, commands, and process flow.
tools/cosmovisor/args_test.go, tools/cosmovisor/internal/process_test.go, tools/cosmovisor/internal/watchers/data_watcher_test.go, tools/cosmovisor/internal/watchers/file_poll_watcher_test.go, tools/cosmovisor/internal/upgrade_test.go, tools/cosmovisor/scanner_test.go Tests updated or added for new runner, watcher, and upgrade logic.
Makefile Build and test targets updated for new system test binaries and Cosmovisor.
tests/systemtests/upgrade_test.go Constants for upgrade test localized to function scope.
x/upgrade/CHANGELOG.md Changelog updated for manual upgrade support and breaking changes.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Cosmovisor
    participant Node
    participant FileSystem
    participant RPC

    User->>Cosmovisor: add-upgrade (manual or batch)
    Cosmovisor->>FileSystem: Write manual upgrade batch (JSON)
    User->>Cosmovisor: run (start node)
    Cosmovisor->>Node: Launch node process
    loop Watch for upgrade
        Cosmovisor->>FileSystem: Watch/Read upgrade-info.json & manual batch
        Cosmovisor->>RPC: Poll /block endpoint for height
        alt Height matches manual upgrade
            Cosmovisor->>Node: Shutdown node
            Cosmovisor->>FileSystem: Set current binary to upgrade binary
            Cosmovisor->>Node: Restart node with upgraded binary
        else Height matches governance upgrade
            Cosmovisor->>Node: Shutdown node
            Cosmovisor->>FileSystem: Set current binary to upgrade binary
            Cosmovisor->>Node: Restart node with upgraded binary
        end
    end
Loading

Assessment against linked issues

Objective Addressed Explanation
Cosmovisor must replace the binary and switch to the new version at the configured manual upgrade height (#22731)
Manual upgrades must be detected and executed even for distant/far-future block heights (#22731)
Upgrade info must be read and handled robustly, including correct JSON/protobuf handling (#22731)
Directory structure and binary permissions must be handled correctly for upgrades (#22731)

Assessment against linked issues: Out-of-scope changes

Code Change Explanation
Addition of extensive new watcher and process management modules (tools/cosmovisor/internal/watchers/*, tools/cosmovisor/internal/runner.go, etc.) While these are architectural improvements, they go beyond the immediate scope of fixing the manual upgrade bug, but are justified as part of a major refactor to address reliability and maintainability.
New mock node and integration/system tests (tools/cosmovisor/cmd/mock_node/main.go, tools/cosmovisor/cmd/cosmovisor/mockchain_test.go, tests/systemtests/cosmovisor_test.go) These are test harness improvements and not strictly required for the bug fix, but they directly support validation of the new upgrade logic.
Documentation and diagram updates (tools/cosmovisor/README.md, tools/cosmovisor/state_machine.puml) These are supportive but not strictly required for the bug fix.
Refactoring of unrelated test helpers and code style (e.g., test fixture refactors) These are not directly related to the bug fix but improve code quality.
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate Unit Tests
  • Create PR with Unit Tests
  • Post Copyable Unit Tests in Comment
  • Commit Unit Tests in branch aaronc/22731-cosmovisor-fixes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai auto-generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 18

♻️ Duplicate comments (28)
simapp/upgrades.go (1)

40-54: Address integer conversion security issue.

The conversion from uint64 to int64 on line 48 lacks bounds checking, which could cause integer overflow if the parsed height exceeds math.MaxInt64.

Apply this fix to add bounds checking:

+import (
+	"math"
+	// ... other imports
+)

 if manualUpgrade, ok := os.LookupEnv("SIMAPP_MANUAL_UPGRADE_HEIGHT"); ok {
 	height, err := strconv.ParseUint(manualUpgrade, 10, 64)
 	if err != nil {
 		panic("invalid SIMAPP_MANUAL_UPGRADE_HEIGHT height: " + err.Error())
 	}
+	if height > math.MaxInt64 {
+		panic("SIMAPP_MANUAL_UPGRADE_HEIGHT exceeds maximum allowed value")
+	}
 	upgradeInfo = upgradetypes.Plan{
 		Name:   UpgradeName,
 		Height: int64(height),
 	}
tools/cosmovisor/internal/watchers/file_poll_watcher.go (1)

16-35: Validate file path to address security concern.

The file reading operation could be vulnerable to path traversal attacks if the filename parameter comes from untrusted input.

Consider adding path validation:

+import (
+	"path/filepath"
+	// ... other imports
+)

 func NewFilePollWatcher(ctx context.Context, errorHandler ErrorHandler, filename string, pollInterval time.Duration) Watcher[[]byte] {
+	// Validate and clean the file path to prevent path traversal
+	filename = filepath.Clean(filename)
+	if filepath.IsAbs(filename) {
+		// Additional validation for absolute paths if needed
+	}
+	
 	stat, err := os.Stat(filename)

Please verify that the filename parameter is properly validated at the caller level and cannot be manipulated by untrusted input.

tools/cosmovisor/cmd/cosmovisor/run.go (1)

47-53: Document the rationale for the shutdown goroutine.

The goroutine ensures process termination even if context cancellation doesn't exit cleanly. While this may trigger non-determinism warnings, it appears intentional for reliability. Consider adding a comment explaining why this fallback mechanism is necessary.

tools/cosmovisor/internal/watchers/data_watcher.go (1)

11-36: Well-designed generic watcher with proper resource management.

The implementation effectively combines generics with the watcher pattern. The goroutine management is appropriate despite the security scanner flag - it's properly bounded by context cancellation and channel closure.

The goroutine spawning is flagged by the security scanner but is necessary for the asynchronous watcher pattern and is properly managed with context cancellation.

tools/cosmovisor/internal/watchers/fsnotify_watcher.go (2)

38-74: Goroutine management is appropriate despite security scanner flag.

The goroutine spawning is flagged by the security scanner but is necessary for the asynchronous file watching pattern. The implementation properly handles context cancellation and resource cleanup.

The goroutine is properly bounded by context cancellation and channel closure, making it safe despite the non-determinism warning.


60-60: File inclusion is safe in this context.

While the security scanner flags potential file inclusion via variable, this is safe because:

  1. The filename comes from validated fsnotify events
  2. It's filtered against a predefined set of allowed filenames
  3. The file is only read, not executed

The file reading is safe as it's bounded by the filename whitelist and only performs read operations.

tools/cosmovisor/internal/process.go (1)

9-27: Well-designed process management abstraction.

The ProcessRunner implementation provides a clean abstraction for managing external processes with proper synchronization via channels and graceful lifecycle management.

tools/cosmovisor/internal/watchers/poll_watcher.go (1)

27-53: Proper async polling implementation with context handling.

The goroutine implementation correctly:

  • Handles context cancellation for graceful shutdown
  • Uses ticker for consistent polling intervals
  • Closes output channel on exit
  • Filters file not found errors appropriately
tools/cosmovisor/internal/watchers/hybrid_watcher.go (1)

20-41: Goroutine management is handled correctly.

The goroutine properly:

  1. Defers channel cleanup
  2. Handles context cancellation
  3. Exits on channel closure
  4. Filters directory updates by filename

This addresses potential non-determinism concerns through proper synchronization and cleanup.

x/upgrade/keeper/keeper.go (3)

44-44: Verify backward compatibility of codec interface change.

Changing from codec.BinaryCodec to codec.Codec could potentially be a breaking change for consumers who depend on the specific interface type.

Verify that all existing usage of the keeper's codec field is compatible with the more general Codec interface:

#!/bin/bash
# Search for direct access to keeper.cdc field
rg -A 3 -B 3 "\.cdc\." --type go

# Search for keeper construction to see what codec types are passed
rg -A 5 -B 5 "NewKeeper.*codec" --type go

540-540: LGTM: Improved JSON marshaling consistency.

Using the keeper's codec for JSON marshaling instead of the standard library ensures consistency with the rest of the codebase and proper protobuf handling.


610-622: Address static analysis warnings about unused assignments.

The static analysis correctly identifies that manualUpgradeInfo assignments are not read elsewhere, which could indicate missing functionality or dead code.

Verify that the manual upgrade functionality is complete:

#!/bin/bash
# Search for code that reads the manual upgrade info
rg -A 5 -B 5 "GetManualUpgrade\|manualUpgradeInfo" --type go

# Search for where manual upgrades are processed/applied
rg -A 5 -B 5 "manual.*upgrade.*apply\|apply.*manual.*upgrade" --type go
tools/cosmovisor/cmd/mock_node/main.go (5)

123-130: Validate file path to prevent directory traversal attacks.

The actualHeightFile path is constructed from user input without validation, which could allow directory traversal attacks.

Consider adding path validation:

 actualHeightFile := path.Join(n.homePath, "data", "actual-height")
+// Ensure the path is within the expected directory
+actualHeightFile = path.Clean(actualHeightFile)
+if !strings.HasPrefix(actualHeightFile, path.Clean(n.homePath)) {
+    return fmt.Errorf("invalid height file path")
+}

152-152: Use more restrictive file permissions.

-err := os.WriteFile(actualHeightFile, []byte(fmt.Sprintf("%d", n.height)), 0o644)
+err := os.WriteFile(actualHeightFile, []byte(fmt.Sprintf("%d", n.height)), 0o600)

178-178: Use more restrictive directory permissions.

-err = os.MkdirAll(path.Dir(upgradeInfoPath), 0o755)
+err = os.MkdirAll(path.Dir(upgradeInfoPath), 0o750)

182-182: Use more restrictive file permissions.

-err = os.WriteFile(upgradeInfoPath, []byte(out), 0o644)
+err = os.WriteFile(upgradeInfoPath, []byte(out), 0o600)

214-216: Configure ReadHeaderTimeout to prevent Slowloris attacks.

 srv := &http.Server{
-    Addr: n.httpAddr,
+    Addr:              n.httpAddr,
+    ReadHeaderTimeout: 10 * time.Second,
 }
tools/cosmovisor/manual.go (2)

67-71: Ensure deterministic ordering when iterating over map.

Map iteration in Go is non-deterministic, which could cause issues in a blockchain context where deterministic behavior is required.

The upgrades are sorted after appending, but consider using a slice-based approach from the start:

-var newUpgrades ManualUpgradeBatch
-for _, plan := range planMap {
-    newUpgrades = append(newUpgrades, plan)
-}
+// Extract keys and sort them first for deterministic iteration
+var keys []string
+for name := range planMap {
+    keys = append(keys, name)
+}
+sort.Strings(keys)
+
+var newUpgrades ManualUpgradeBatch
+for _, name := range keys {
+    newUpgrades = append(newUpgrades, planMap[name])
+}

103-103: Use more restrictive file permissions for upgrade configuration.

-return os.WriteFile(cfg.UpgradeInfoBatchFilePath(), manualUpgradesData, 0644)
+return os.WriteFile(cfg.UpgradeInfoBatchFilePath(), manualUpgradesData, 0600)
tools/cosmovisor/internal/upgrader.go (4)

154-154: Subprocess launched with variable.


223-224: Calling the system time.


207-207: Subprocess launched with variable.


235-235: Calling the system time.

systemtests/system.go (1)

225-228: Subprocess launched with variable.

tools/cosmovisor/internal/runner.go (1)

115-115: Subprocess launched with variable.

tools/cosmovisor/args.go (3)

459-482: Current binary upgrade info methods are well-implemented.

The methods properly handle missing upgrade info files by returning nil, and the convenience method for extracting the upgrade name is useful.

Regarding the security scan alert about file inclusion at line 462: The path is safely constructed from trusted configuration values (cfg.Root() and the currentLink constant), so this is not a security vulnerability.


501-514: Consider logging errors in ReadLastKnownHeight.

The method silently returns 0 on any error (file not found, parse error, etc.). While this provides robustness, it might hide configuration issues. Consider at least logging errors for debugging purposes.

Regarding the security scan alert: The file path is safely constructed from trusted configuration values.


516-519: Use more restrictive file permissions.

The file contains operational state data and doesn't need to be world-readable. Use 0600 permissions as suggested by the security scan.

Apply this fix:

-	return os.WriteFile(filename, []byte(strconv.FormatUint(height, 10)), 0644)
+	return os.WriteFile(filename, []byte(strconv.FormatUint(height, 10)), 0600)
🧹 Nitpick comments (21)
tools/cosmovisor/internal/errors.go (1)

7-11: Consider simplifying the error message implementation.

The error type is well-structured and follows Go conventions. However, since there's no formatting involved, you can simplify the implementation:

 func (e ErrRestartNeeded) Error() string {
-	return fmt.Sprintf("upgrade needed")
+	return "upgrade needed"
 }

Alternatively, consider a more descriptive message like "daemon restart required due to upgrade".

tools/cosmovisor/internal/watchers/data_watcher_test.go (1)

38-55: Consider using testify's Eventually for more reliable timing.

The test uses fixed sleep durations which could lead to flaky tests on slower systems. Consider using require.Eventually to wait for conditions instead of fixed sleeps.

Example refactor for waiting on file updates:

-		// wait a bit to ensure the watcher has time to pick up the change
-		// then cancel the context
-		time.Sleep(time.Second)
+		// wait for the watcher to pick up the change
+		require.Eventually(t, func() bool {
+			select {
+			case <-dataWatcher.Updated():
+				return true
+			default:
+				return false
+			}
+		}, 5*time.Second, 100*time.Millisecond)
		cancel()
tools/cosmovisor/internal/skip.go (1)

12-12: Address the TODO comment.

The TODO comment suggests uncertainty about whether this functionality should be retained. This needs to be resolved before merging.

Do you want me to help analyze the usage of this functionality across the codebase to determine if it's still needed, or should this be tracked in a separate issue?

tools/cosmovisor/cmd/cosmovisor/batch_upgrade.go (1)

105-107: Simplify the defer statement for file closing.

The anonymous function wrapper is unnecessary for a simple method call.

-	defer func(file *os.File) {
-		_ = file.Close()
-	}(file)
+	defer file.Close()
tools/cosmovisor/internal/watchers/http_block_checker.go (1)

82-91: Improve error handling for height parsing.

The strconv.ParseUint error is not wrapped, making debugging harder.

func getHeightFromRPCBlockResponse(bz []byte) (uint64, error) {
	var response Response
	err := json.Unmarshal(bz, &response)
	if err != nil {
		return 0, fmt.Errorf("failed to unmarshal block response: %w", err)
	}

	height := response.Result.Block.Header.Height
-	return strconv.ParseUint(height, 10, 64)
+	parsed, err := strconv.ParseUint(height, 10, 64)
+	if err != nil {
+		return 0, fmt.Errorf("failed to parse height '%s': %w", height, err)
+	}
+	return parsed, nil
}
tools/cosmovisor/state_machine.puml (1)

1-68: Consider completing or removing commented states.

The diagram provides good high-level flow but has extensive commented-out sections (lines 28-66) that detail important upgrade steps like backup, pre-upgrade customization, and signal watching.

Consider either:

  1. Uncommenting and completing these detailed states if they represent the actual implementation
  2. Removing them if they're outdated or not implemented
  3. Adding a note explaining why they're commented

This would improve the diagram's utility as documentation for the actual cosmovisor behavior.

x/upgrade/keeper/keeper.go (1)

624-626: Consider adding validation for manual upgrade retrieval.

The getter method returns the raw pointer without any validation or defensive copying.

Consider adding documentation about the lifetime and mutability expectations:

+// GetManualUpgrade returns the current manual upgrade plan, if any.
+// Returns nil if no manual upgrade is set.
+// The returned plan should not be modified by the caller.
func (k *Keeper) GetManualUpgrade() *types.Plan {
	return k.manualUpgradeInfo
}
tools/cosmovisor/internal/watchers/height_watcher.go (1)

23-26: Remove the extra blank line.

 watcher.PollWatcher = NewPollWatcher[uint64](errorHandler, func() (uint64, error) {
   return watcher.ReadNow()
-
 }, pollInterval)
tools/cosmovisor/README.md (6)

55-56: Add missing comma for clarity.

-Release branches have the following format `release/cosmovisor/vA.B.x`, where A and B are a number (e.g.
+Release branches have the following format `release/cosmovisor/vA.B.x`, where A and B are a number (e.g.,

121-122: Add missing comma after "Note".

-  is only after the upgrade and does not auto-restart the subprocess after an error occurs. Note restart
+  is only after the upgrade and does not auto-restart the subprocess after an error occurs. Note, restart

244-245: Add comma after introductory phrase.

-  upgrade name.
-* If neither `cosmovisor/current/upgrade-info.json` nor `data/upgrade-info.json` exist, then `cosmovisor` will wait for
+  upgrade name.
+* If neither `cosmovisor/current/upgrade-info.json` nor `data/upgrade-info.json` exist, then `cosmovisor` will wait for
   `data/upgrade-info.json` file to trigger an upgrade.
 * If `cosmovisor/current/upgrade-info.json` doesn't exist but `data/upgrade-info.json` exists, then `cosmovisor` assumes
-  that whatever is in `data/upgrade-info.json` is a valid upgrade request. In this case `cosmovisor` tries immediately
+  that whatever is in `data/upgrade-info.json` is a valid upgrade request. In this case, `cosmovisor` tries immediately

298-299: Fix grammar in infinitive construction.

-Using the `--upgrade-height` flag allows to specify at which height the binary should be switched, without going via a
+Using the `--upgrade-height` flag allows you to specify at which height the binary should be switched, without going via a

349-350: Add comma after conditional clause.

-   When submitting this as a proposal ensure there are no spaces. An example command using `gaiad` could look like:
+   When submitting this as a proposal, ensure there are no spaces. An example command using `gaiad` could look like:

380-381: Add comma after introductory phrase.

-ensures that no false binary is run, even if someone hacks the server or hijacks the DNS. `go-getter` will always ensure
-the downloaded file matches the checksum if it is provided. `go-getter` will also handle unpacking archives into
-directories (in this case the download link should point to a `zip` file of all data in the `bin` directory).
+ensures that no false binary is run, even if someone hacks the server or hijacks the DNS. `go-getter` will always ensure
+the downloaded file matches the checksum if it is provided. `go-getter` will also handle unpacking archives into
+directories (in this case, the download link should point to a `zip` file of all data in the `bin` directory).
tools/cosmovisor/manual.go (1)

83-88: Simplify conditional logic.

The else block after continue is unnecessary.

 for _, existing := range manualUpgrades {
     if existing.Height == height {
         continue
-    } else {
-        newUpgrades = append(newUpgrades, existing)
     }
+    newUpgrades = append(newUpgrades, existing)
 }
tests/systemtests/cosmovisor_test.go (2)

38-38: Consider making the legacy binary path configurable.

The hardcoded path to the legacy binary could break if the directory structure changes. Consider making this configurable through an environment variable or test parameter.

-legacyBinary := systest.WorkDir + "/binaries/v0.53/simd"
+legacyBinary := os.Getenv("LEGACY_BINARY_PATH")
+if legacyBinary == "" {
+    legacyBinary = systest.WorkDir + "/binaries/v0.53/simd"
+}

42-43: Voting period might be too short for CI environments.

A 5-second voting period could be fragile in slow or loaded CI environments, potentially causing test flakiness.

-votingPeriod := 5 * time.Second // enough time to vote
+votingPeriod := 10 * time.Second // enough time to vote in CI environments
tools/cosmovisor/internal/upgrader.go (1)

145-145: Use named constant for file permissions.

The magic number 0o100 should be replaced with a named constant for better readability.

+const executePermission = 0o100 // Execute permission for owner
+
 // Set the execute bit for only the current user
 // Given:  Current user - Group - Everyone
 //       0o     RWX     - RWX   - RWX
 oldMode := info.Mode().Perm()
-newMode := oldMode | 0o100
+newMode := oldMode | executePermission
tools/cosmovisor/cmd/cosmovisor/mockchain_test.go (1)

200-200: Consider making the expected callback count more flexible.

The hardcoded expected callback count could make the test fragile if timing changes or if the test logic is modified.

-require.Equal(t, 8, callbackCount)
+require.GreaterOrEqual(t, callbackCount, 8, "should have at least 8 callbacks for all upgrade stages")
+require.LessOrEqual(t, callbackCount, 10, "should not have excessive callbacks indicating a problem")
systemtests/system.go (1)

219-220: Consider making the Cosmovisor path configurable.

The hardcoded path to the Cosmovisor binary could break if the directory structure changes or in different environments.

 func (s *SystemUnderTest) cosmovisorPath() string {
-    return filepath.Join(WorkDir, "binaries", "cosmovisor")
+    path := os.Getenv("COSMOVISOR_PATH")
+    if path == "" {
+        path = filepath.Join(WorkDir, "binaries", "cosmovisor")
+    }
+    return path
 }
tools/cosmovisor/internal/runner.go (1)

175-176: Consider logging errors from cleanup operations.

Errors from GetLatestBlockHeight and Shutdown are ignored in the defer block, which could hide important issues during cleanup.

 defer func() {
     // always check for the latest block height before shutting down so that we have it in the last known height file
-    _, _ = heightChecker.GetLatestBlockHeight()
-    _ = processRunner.Shutdown(r.cfg.ShutdownGrace)
+    if _, err := heightChecker.GetLatestBlockHeight(); err != nil {
+        r.logger.Warn("Failed to get latest block height during shutdown", "error", err)
+    }
+    if err := processRunner.Shutdown(r.cfg.ShutdownGrace); err != nil {
+        r.logger.Warn("Failed to shutdown process gracefully", "error", err)
+    }
 }()
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between da39d1b and 3122f9e.

⛔ Files ignored due to path filters (2)
  • tools/cosmovisor/go.mod is excluded by !**/*.mod
  • tools/cosmovisor/go.sum is excluded by !**/*.sum, !**/*.sum
📒 Files selected for processing (53)
  • Makefile (1 hunks)
  • simapp/upgrades.go (2 hunks)
  • systemtests/system.go (3 hunks)
  • tests/systemtests/cosmovisor_test.go (1 hunks)
  • tests/systemtests/upgrade_test.go (1 hunks)
  • tools/cosmovisor/CHANGELOG.md (3 hunks)
  • tools/cosmovisor/README.md (14 hunks)
  • tools/cosmovisor/args.go (10 hunks)
  • tools/cosmovisor/args_test.go (1 hunks)
  • tools/cosmovisor/cmd/cosmovisor/add_upgrade.go (4 hunks)
  • tools/cosmovisor/cmd/cosmovisor/batch_upgrade.go (5 hunks)
  • tools/cosmovisor/cmd/cosmovisor/config.go (3 hunks)
  • tools/cosmovisor/cmd/cosmovisor/help.go (1 hunks)
  • tools/cosmovisor/cmd/cosmovisor/help_test.go (1 hunks)
  • tools/cosmovisor/cmd/cosmovisor/init.go (2 hunks)
  • tools/cosmovisor/cmd/cosmovisor/init_test.go (1 hunks)
  • tools/cosmovisor/cmd/cosmovisor/main.go (1 hunks)
  • tools/cosmovisor/cmd/cosmovisor/mockchain_test.go (1 hunks)
  • tools/cosmovisor/cmd/cosmovisor/prepare_upgrade.go (3 hunks)
  • tools/cosmovisor/cmd/cosmovisor/root.go (2 hunks)
  • tools/cosmovisor/cmd/cosmovisor/run.go (3 hunks)
  • tools/cosmovisor/cmd/cosmovisor/run_config.go (2 hunks)
  • tools/cosmovisor/cmd/cosmovisor/show_upgrade.go (1 hunks)
  • tools/cosmovisor/cmd/cosmovisor/version.go (3 hunks)
  • tools/cosmovisor/cmd/mock_node/main.go (1 hunks)
  • tools/cosmovisor/internal/backoff.go (1 hunks)
  • tools/cosmovisor/internal/errors.go (1 hunks)
  • tools/cosmovisor/internal/process.go (1 hunks)
  • tools/cosmovisor/internal/process_test.go (7 hunks)
  • tools/cosmovisor/internal/runner.go (1 hunks)
  • tools/cosmovisor/internal/skip.go (1 hunks)
  • tools/cosmovisor/internal/testing.go (1 hunks)
  • tools/cosmovisor/internal/upgrade.go (2 hunks)
  • tools/cosmovisor/internal/upgrade_test.go (6 hunks)
  • tools/cosmovisor/internal/upgrader.go (1 hunks)
  • tools/cosmovisor/internal/watchers/data_watcher.go (1 hunks)
  • tools/cosmovisor/internal/watchers/data_watcher_test.go (1 hunks)
  • tools/cosmovisor/internal/watchers/file_poll_watcher.go (1 hunks)
  • tools/cosmovisor/internal/watchers/file_poll_watcher_test.go (1 hunks)
  • tools/cosmovisor/internal/watchers/fsnotify_watcher.go (1 hunks)
  • tools/cosmovisor/internal/watchers/height_watcher.go (1 hunks)
  • tools/cosmovisor/internal/watchers/http_block_checker.go (1 hunks)
  • tools/cosmovisor/internal/watchers/hybrid_watcher.go (1 hunks)
  • tools/cosmovisor/internal/watchers/poll_watcher.go (1 hunks)
  • tools/cosmovisor/internal/watchers/watcher.go (1 hunks)
  • tools/cosmovisor/manual.go (1 hunks)
  • tools/cosmovisor/process.go (0 hunks)
  • tools/cosmovisor/scanner.go (0 hunks)
  • tools/cosmovisor/scanner_test.go (3 hunks)
  • tools/cosmovisor/state_machine.puml (1 hunks)
  • x/upgrade/CHANGELOG.md (1 hunks)
  • x/upgrade/abci.go (1 hunks)
  • x/upgrade/keeper/keeper.go (5 hunks)
💤 Files with no reviewable changes (2)
  • tools/cosmovisor/process.go
  • tools/cosmovisor/scanner.go
🧰 Additional context used
🧬 Code Graph Analysis (18)
tools/cosmovisor/cmd/cosmovisor/root.go (1)
tools/cosmovisor/cmd/cosmovisor/show_upgrade.go (1)
  • NewShowManualUpgradesCmd (10-36)
tools/cosmovisor/cmd/cosmovisor/prepare_upgrade.go (1)
tools/cosmovisor/internal/upgrade.go (1)
  • GetBinaryURL (71-81)
tools/cosmovisor/cmd/cosmovisor/run_config.go (1)
tools/cosmovisor/internal/runner.go (1)
  • RunConfig (260-264)
tools/cosmovisor/cmd/cosmovisor/config.go (2)
tools/cosmovisor/args.go (2)
  • Config (58-82)
  • GetConfigFromFile (182-223)
tools/cosmovisor/flags.go (1)
  • FlagCosmovisorConfig (9-9)
tools/cosmovisor/internal/watchers/file_poll_watcher.go (2)
tools/cosmovisor/internal/watchers/watcher.go (2)
  • ErrorHandler (17-22)
  • Watcher (11-14)
tools/cosmovisor/internal/watchers/poll_watcher.go (1)
  • NewPollWatcher (17-25)
simapp/upgrades.go (3)
version/version.go (1)
  • Name (31-31)
log/logger.go (1)
  • Logger (39-63)
math/uint.go (1)
  • ParseUint (240-246)
tools/cosmovisor/internal/watchers/fsnotify_watcher.go (1)
tools/cosmovisor/internal/watchers/watcher.go (1)
  • Watcher (11-14)
tools/cosmovisor/cmd/cosmovisor/run.go (3)
tools/cosmovisor/cmd/cosmovisor/run_config.go (1)
  • RunOption (19-19)
tools/cosmovisor/args.go (1)
  • GetConfigFromFile (182-223)
tools/cosmovisor/internal/runner.go (1)
  • NewRunner (26-32)
tools/cosmovisor/internal/watchers/data_watcher_test.go (4)
tools/cosmovisor/internal/watchers/watcher.go (1)
  • DebugLoggerErrorHandler (38-40)
log/testing.go (1)
  • NewTestLogger (23-25)
tools/cosmovisor/internal/watchers/file_poll_watcher.go (1)
  • NewFilePollWatcher (10-39)
tools/cosmovisor/internal/watchers/data_watcher.go (1)
  • NewDataWatcher (11-36)
tools/cosmovisor/internal/skip.go (1)
tools/cosmovisor/flags.go (1)
  • FlagSkipUpgradeHeight (5-5)
tools/cosmovisor/cmd/cosmovisor/add_upgrade.go (3)
tools/cosmovisor/flags.go (2)
  • FlagForce (7-7)
  • FlagUpgradeHeight (8-8)
x/upgrade/client/cli/tx.go (1)
  • FlagUpgradeHeight (23-23)
tools/cosmovisor/args.go (1)
  • Config (58-82)
tools/cosmovisor/internal/watchers/poll_watcher.go (1)
tools/cosmovisor/internal/watchers/watcher.go (1)
  • ErrorHandler (17-22)
tools/cosmovisor/internal/watchers/data_watcher.go (1)
tools/cosmovisor/internal/watchers/watcher.go (2)
  • ErrorHandler (17-22)
  • Watcher (11-14)
tools/cosmovisor/internal/watchers/hybrid_watcher.go (3)
tools/cosmovisor/internal/watchers/watcher.go (2)
  • Watcher (11-14)
  • ErrorHandler (17-22)
tools/cosmovisor/internal/watchers/fsnotify_watcher.go (1)
  • FSNotifyWatcher (12-15)
tools/cosmovisor/internal/watchers/file_poll_watcher.go (1)
  • NewFilePollWatcher (10-39)
tools/cosmovisor/internal/watchers/http_block_checker.go (1)
tools/cosmovisor/internal/watchers/height_watcher.go (1)
  • HeightChecker (8-10)
tools/cosmovisor/internal/watchers/watcher.go (4)
tools/cosmovisor/internal/watchers/fsnotify_watcher.go (1)
  • FSNotifyWatcher (12-15)
tools/cosmovisor/internal/watchers/hybrid_watcher.go (1)
  • NewHybridWatcher (15-47)
tools/cosmovisor/internal/watchers/data_watcher.go (1)
  • NewDataWatcher (11-36)
tools/cosmovisor/internal/watchers/file_poll_watcher.go (1)
  • NewFilePollWatcher (10-39)
tools/cosmovisor/internal/watchers/height_watcher.go (2)
tools/cosmovisor/internal/watchers/poll_watcher.go (2)
  • PollWatcher (10-15)
  • NewPollWatcher (17-25)
tools/cosmovisor/internal/watchers/watcher.go (1)
  • ErrorHandler (17-22)
tools/cosmovisor/args.go (1)
x/upgrade/types/plan.go (1)
  • UpgradeInfoFilename (12-12)
🪛 GitHub Actions: System Tests
tests/systemtests/upgrade_test.go

[error] 1-1: Makefile test target failed with exit code 1 due to system test failure.

Makefile

[error] Make target 'test-system' failed with exit code 2 due to system test failures.

systemtests/system.go

[error] 900-900: Panic error: executable file '/home/runner/work/cosmos-sdk/cosmos-sdk/tests/systemtests/binaries/simd' not found (stat no such file or directory).

tests/systemtests/cosmovisor_test.go

[error] 1-1: Makefile test target failed with exit code 1 due to system test failure.

🪛 LanguageTool
tools/cosmovisor/CHANGELOG.md

[uncategorized] ~49-~49: You might be missing the article “the” here.
Context: ...emove duplicate binary downloads during auto-download process ## v1.7.0 - 2024-11...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[uncategorized] ~80-~80: You might be missing the article “the” here.
Context: ...(other cmds with flags) .... * Add --cosmovisor-configflag to provideco...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[uncategorized] ~80-~80: You might be missing the article “the” here.
Context: ...tomlpath to the configuration file in root command used by add-upgrade` and...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[uncategorized] ~178-~178: You might be missing the article “a” here.
Context: ...a/upgrade-info.json` file updates using polling mechanism. * [#9999](https://github.co...

(AI_EN_LECTOR_MISSING_DETERMINER_A)


[uncategorized] ~196-~196: You might be missing the article “the” here.
Context: ...om//pull/10128) Change default value of DAEMON_RESTART_AFTER_UPGRADE...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[uncategorized] ~201-~201: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ... - 2021-08-06 This is the first release and we started this changelog on 2021-07-01...

(COMMA_COMPOUND_SENTENCE)

x/upgrade/CHANGELOG.md

[uncategorized] ~35-~35: Possible missing comma found.
Context: ...e app's JSON codec rather than encoding/json which is the correct behavior for emitt...

(AI_HYDRA_LEO_MISSING_COMMA)

tools/cosmovisor/README.md

[uncategorized] ~55-~55: Possible missing comma found.
Context: ...e. Release branches have the following format release/cosmovisor/vA.B.x, where A an...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~121-~121: Possible missing comma found.
Context: ...m administrator to manually restart it. Note restart is only after the upgrade and...

(AI_HYDRA_LEO_MISSING_COMMA)


[style] ~146-~146: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...osmovisor/$COSMOVISOR_CUSTOM_PREUPGRADE prior to upgrade with the arguments [ upgrade....

(EN_WORDINESS_PREMIUM_PRIOR_TO)


[style] ~147-~147: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ... Executes a custom script (separate and prior to the chain daemon pre-upgrade command)...

(EN_WORDINESS_PREMIUM_PRIOR_TO)


[grammar] ~149-~149: The modal verb ‘will’ requires the verb’s base form.
Context: ...set to true, the upgrade directory will expected to match the upgrade plan name withou...

(MD_BASEFORM)


[uncategorized] ~180-~180: “being dependent on so/sth” requires the preposition “(up)on”.
Context: ...th the --home flag. $DAEMON_HOME is dependent of the data directory and must be set to t...

(DEPENDENT_ADJ)


[style] ~204-~204: This phrase is redundant. Consider writing “points” or “times”.
Context: ...andle switching binaries at the correct points in time so that the system administrator can pr...

(MOMENT_IN_TIME)


[style] ~206-~206: Consider a more concise word here.
Context: ... in advance and relax at upgrade time. In order to support downloadable binaries, a tarbal...

(IN_ORDER_TO_PREMIUM)


[formatting] ~244-~244: Consider inserting a comma after an introductory phrase for better readability.
Context: ...-info.jsonis a valid upgrade request. In this casecosmovisor` tries immediately to mak...

(IN_THAT_CASE_COMMA)


[uncategorized] ~257-~257: Possible missing comma found.
Context: ...commands can be used to schedule manual upgrades which will set the node's `--halt-heigh...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~270-~270: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ... condition requiring manual intervention and it will shut down the node. #### Expec...

(COMMA_COMPOUND_SENTENCE)


[style] ~295-~295: Try moving the adverb to make the sentence clearer.
Context: ...isorhas anadd-upgrade` command that allows to easily link a binary to an upgrade. It creates a new folder ...

(SPLIT_INFINITIVE)


[grammar] ~298-~298: Did you mean “specifying”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun.
Context: ...sing the --upgrade-height flag allows to specify at which height the binary should be sw...

(ALLOW_TO)


[grammar] ~322-~322: You should probably use: “are”.
Context: ...data/upgrade-info.json` file. The files is constructed by the x/upgrade module and...

(AGREEMENT_SENT_START)


[grammar] ~349-~349: Possibly, a comma is missing after an introductory clause, the verb inflection is not correct, or a question mark is missing. Did you mean “proposal, ensure” or “proposal ensures”?
Context: ...} ``` When submitting this as a proposal ensure there are no spaces. An example command...

(IF_DT_NN_VBZ)


[style] ~366-~366: Consider using a synonym to be more concise.
Context: ...ove format (e.g. if you want to specify lots of binaries, changelog info, etc. witho...

(A_LOT_OF)


[misspelling] ~377-~377: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ...ity guarantees, all URLs should include a SHA 256/512 checksum. This ensures that...

(EN_A_VS_AN)


[uncategorized] ~380-~380: Possible missing comma found.
Context: ...king archives into directories (in this case the download link should point to a `zi...

(AI_HYDRA_LEO_MISSING_COMMA)

🪛 markdownlint-cli2 (0.17.2)
tools/cosmovisor/CHANGELOG.md

76-76: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


78-78: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


80-80: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


82-82: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


108-108: Multiple headings with the same content
null

(MD024, no-duplicate-heading)

tools/cosmovisor/README.md

135-135: Spaces inside link text
null

(MD039, no-space-in-links)

⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: tests (01)
  • GitHub Check: tests (03)
  • GitHub Check: tests (02)
  • GitHub Check: tests (00)
  • GitHub Check: test-cosmovisor
  • GitHub Check: test-sim-nondeterminism
  • GitHub Check: test-integration
  • GitHub Check: test-e2e
  • GitHub Check: Analyze
  • GitHub Check: golangci-lint
  • GitHub Check: Gosec
  • GitHub Check: Check docs build
  • GitHub Check: Summary
🔇 Additional comments (61)
tools/cosmovisor/cmd/cosmovisor/help.go (1)

6-6: LGTM: Import path updated for v2 refactor.

The import path change to cosmossdk.io/tools/cosmovisor/v2 is consistent with the major refactor to Cosmovisor v2.

tools/cosmovisor/cmd/cosmovisor/main.go (1)

8-8: Context handling preserved in Execute()

The default Execute() on a *cobra.Command internally calls ExecuteContext(context.Background()), so switching from ExecuteContext(context.Background()) to Execute() does not change signal handling or shutdown behavior. No further action required.

tools/cosmovisor/cmd/cosmovisor/help_test.go (1)

8-8: LGTM: Import path updated for v2 refactor.

The import path change to cosmossdk.io/tools/cosmovisor/v2 is consistent with the v2 refactor and matches the corresponding change in help.go.

tools/cosmovisor/cmd/cosmovisor/init.go (2)

14-14: LGTM: Import path updated for v2 refactor.

The import path change to cosmossdk.io/tools/cosmovisor/v2 is consistent with the v2 refactor across all files.


24-26: LGTM: Improved documentation clarity.

The addition of explicit environment variable requirements (DAEMON_HOME and DAEMON_NAME) in the long description improves user experience by clearly stating prerequisites for the init command.

tools/cosmovisor/cmd/cosmovisor/version.go (3)

11-11: LGTM: Import path updated for v2 refactor.

The import path change to cosmossdk.io/tools/cosmovisor/v2 is consistent with the v2 refactor.


65-65: LGTM: Context parameter added consistently.

The addition of cmd.Context() as the first parameter is consistent with the change on line 50, ensuring proper context propagation in the v2 refactor.


50-50: Confirmed: run Invocation Is Correct

The run function in tools/cosmovisor/cmd/cosmovisor/run.go is defined as:

  • func run(ctx context.Context, cfgPath string, args []string, options ...RunOption) error

The call in version.go at line 50 (run(cmd.Context(), "", append([]string{"version"}, args...))) correctly passes cmd.Context() as the first argument. No changes are needed.

tests/systemtests/upgrade_test.go (1)

25-28: LGTM! Good refactoring for better encapsulation.

Moving the constants from package-level to function-level scope reduces global namespace pollution and improves test isolation without changing functionality.

tools/cosmovisor/cmd/cosmovisor/root.go (2)

22-22: Command change from show-upgrade-info to show-manual-upgrades looks good.

The command change aligns with the shift to manual upgrade management. The new command implementation in tools/cosmovisor/cmd/cosmovisor/show_upgrade.go correctly reads manual upgrade data and formats it as JSON.


6-6: Import path v2 consistently applied – approved

  • All imports of cosmossdk.io/tools/cosmovisor have been updated to …/v2. No occurrences of the old path remain.
  • Please verify that the CLI command change from show-upgrade-info (NewShowUpgradeInfoCmd) to show-manual-upgrades (NewShowManualUpgradesCmd) is reflected in any documentation, CI scripts, integrations, or user guides.
tools/cosmovisor/internal/errors.go (1)

13-13: LGTM! Proper interface compliance check.

The compile-time interface compliance check ensures the error type correctly implements the error interface.

tools/cosmovisor/args_test.go (1)

475-475: LGTM! RPC address configuration supports new upgrade polling features.

The addition of RPCAddress with a sensible default value aligns with the enhanced manual upgrade polling capabilities. The default http://localhost:26657 is appropriate for testing environments.

tools/cosmovisor/cmd/cosmovisor/init_test.go (1)

19-19: LGTM! Consistent v2 import path update.

The import path update is consistent with the v2 migration seen across the codebase. This ensures the test suite uses the correct v2 module.

tools/cosmovisor/cmd/cosmovisor/run_config.go (1)

7-7: Excellent refactoring approach using type aliases.

The use of a type alias to internal.RunConfig is a clean way to centralize the struct definition while maintaining backward compatibility. This approach allows the internal package to own the core types while keeping the public API unchanged.

Also applies to: 17-17

tools/cosmovisor/CHANGELOG.md (1)

39-41: Changelog appropriately documents the breaking changes.

The consolidation under "Breaking Changes" accurately reflects the scope of this major refactoring. The reference to the core logic reimplementation provides clear context for users.

x/upgrade/CHANGELOG.md (1)

32-35: Changelog entries accurately document the manual upgrade support and breaking changes.

The addition of manual upgrade methods and the change to use the app's JSON codec for upgrade plans are well-documented. The breaking change note about height field serialization is particularly important for users to understand the impact.

tools/cosmovisor/cmd/cosmovisor/prepare_upgrade.go (2)

37-40: Clean refactoring that simplifies configuration handling.

The replacement of manual config retrieval with the getConfigFromCmd helper function eliminates boilerplate code and centralizes configuration handling. This improves maintainability and consistency across commands.


64-64: Correct usage of internal package function.

The update to use internal.GetBinaryURL aligns with the new package structure and maintains the same functionality as shown in the relevant code snippets.

tools/cosmovisor/internal/upgrade_test.go (2)

3-3: Correct package migration to internal structure.

The migration from cosmovisor_test to internal package aligns with the new architecture and follows Go best practices for testing internal packages. The import path update to v2 is also correct.

Also applies to: 19-19


145-145: Function calls correctly updated for same-package context.

The removal of package prefixes from UpgradeBinary and OSArch calls is correct since the tests now reside in the same internal package as these functions.

Also applies to: 160-160, 232-232, 235-235, 254-254

tools/cosmovisor/cmd/cosmovisor/config.go (3)

4-4: LGTM: Clean import updates for v2 refactor.

The addition of the fmt import and update to the v2 cosmovisor package align well with the broader refactoring effort mentioned in the PR objectives.

Also applies to: 8-8


18-18: LGTM: Good refactoring to extract common config loading logic.

Replacing the inline config loading with the helper function improves code reusability and maintainability.


28-40: LGTM: Well-implemented helper function with proper error handling.

The getConfigFromCmd function properly extracts the config flag, handles errors with descriptive messages, and delegates to the appropriate config loading function.

tools/cosmovisor/internal/watchers/file_poll_watcher_test.go (1)

14-64: LGTM: Comprehensive test coverage for FilePollWatcher.

The test properly covers the complete lifecycle of the file poll watcher:

  • Creates temporary test environment
  • Uses goroutines to simulate asynchronous file updates
  • Verifies content detection and channel closure
  • Includes proper cleanup with context cancellation

The test structure and use of channels/select statements effectively validates the watcher's behavior.

simapp/upgrades.go (1)

36-38: LGTM: Good logging of upgrade info.

Adding logging for upgrade info read from disk provides useful debugging information.

tools/cosmovisor/internal/testing.go (1)

1-19: LGTM: Clean testing utility implementation.

The context-based callback mechanism follows Go best practices:

  • Uses unexported key type to prevent context key collisions
  • Proper type assertion with safety check
  • Simple, focused API for test hooks
  • Safe fallback behavior when callback is not present

This provides a clean way to inject test callbacks into the execution flow.

tools/cosmovisor/internal/watchers/file_poll_watcher.go (3)

10-15: LGTM: Proper initialization of file modification tracking.

The initial stat check and mod time tracking setup is handled correctly, including proper error handling when the file doesn't exist initially.


22-33: LGTM: Sound file change detection logic.

The file change detection properly:

  • Checks file size to avoid reading empty files
  • Compares modification times for change detection
  • Updates tracking state after successful reads
  • Handles read errors appropriately

The logic correctly identifies when file content has actually changed.


36-38: LGTM: Proper watcher initialization and startup.

The generic poll watcher creation and startup sequence is handled correctly with proper context propagation.

x/upgrade/abci.go (1)

65-86: LGTM! Well-structured manual upgrade implementation.

The manual upgrade logic is correctly positioned to execute only when no scheduled upgrade exists. The implementation properly:

  • Executes the upgrade with infinite gas meter
  • Clears the manual upgrade plan after execution
  • Returns the appropriate consensus parameter change response
tools/cosmovisor/internal/upgrade.go (1)

1-25: LGTM! Clean refactoring to internal package.

The migration to the internal package and added logging statements improve the code structure and observability. The clarified comment better explains the function's purpose.

tools/cosmovisor/cmd/cosmovisor/run.go (1)

67-68: LGTM! Clean integration with the new Runner.

The replacement of Launcher with the internal Runner simplifies the code and aligns well with the overall refactoring.

tools/cosmovisor/internal/watchers/data_watcher_test.go (1)

20-81: Well-structured test with good coverage.

The test comprehensively covers the DataWatcher functionality including error cases (invalid JSON) and proper cleanup on context cancellation. The channel reading pattern in the anonymous function is a good practice.

tools/cosmovisor/cmd/cosmovisor/show_upgrade.go (1)

10-36: LGTM! Clean refactoring to support structured manual upgrades.

The command now properly integrates with the new manual upgrade management system. The use of getConfigFromCmd centralizes config loading, and the structured data handling with indented JSON output improves usability.

tools/cosmovisor/internal/watchers/data_watcher.go (1)

38-41: Clean interface implementation.

The Updated() method correctly implements the Watcher[T] interface with proper channel exposure.

tools/cosmovisor/scanner_test.go (2)

16-76: Test updates align well with the refactoring.

The changes from value types to pointer types for upgrade plans are consistent with the broader protobuf-based refactoring. The updated error expectations and messages appear more accurate.


95-104: Good addition of helper function.

The parseUpgradeInfoFile helper function improves test maintainability and provides a clean interface for testing upgrade info parsing with different configurations.

tools/cosmovisor/internal/watchers/fsnotify_watcher.go (1)

19-80: Well-structured file watcher implementation.

The FSNotifyWatcher provides a robust file monitoring solution with proper error handling, resource management, and integration with the generic watcher interface.

tools/cosmovisor/internal/process.go (1)

38-68: Robust graceful shutdown implementation.

The shutdown logic properly handles:

  • Early exit if process already finished
  • Graceful termination with SIGTERM
  • Fallback to force kill after timeout
  • Proper synchronization via the done channel
tools/cosmovisor/internal/backoff.go (1)

30-63: Comprehensive command change detection and backoff management.

The BeforeRun method implements thorough logic for:

  • Detecting changes in command or arguments to reset backoff
  • Enforcing maximum restart limits with permanent errors
  • Applying exponential backoff with proper logging

The slice comparison logic correctly handles length differences and element-wise comparison.

tools/cosmovisor/internal/watchers/poll_watcher.go (2)

10-25: Well-designed generic polling watcher.

The PollWatcher[T] provides a clean generic abstraction for periodic polling with configurable checker functions and error handling.


44-48: Appropriate use of reflection for generic zero value detection.

The use of reflect.DeepEqual to check against the zero value of type T is necessary for supporting generic types including []byte. This enables the watcher to filter out empty/zero responses while maintaining type safety.

tools/cosmovisor/cmd/cosmovisor/add_upgrade.go (2)

31-74: Improved separation of concerns in upgrade handling.

The refactored addUpgrade function now focuses on binary management and plan creation, delegating persistence to centralized configuration methods. The conditional plan creation based on upgradeHeight > 0 is appropriate.


95-103: Clean integration with centralized upgrade management.

The command handler now properly delegates upgrade plan persistence to cfg.AddManualUpgrades, improving modularity and consistency across the codebase.

tools/cosmovisor/cmd/cosmovisor/batch_upgrade.go (1)

73-97: Efficient batch upgrade processing with centralized management.

The refactored logic correctly:

  • Collects upgrade plans in memory before batch persistence
  • Uses the same addUpgrade logic for consistency
  • Delegates final persistence to cfg.AddManualUpgrades
tools/cosmovisor/internal/watchers/http_block_checker.go (1)

14-19: LGTM: Constructor follows good practices.

The constructor properly initializes the struct and returns the interface type, which is a good practice for dependency injection.

tools/cosmovisor/internal/watchers/watcher.go (3)

10-14: LGTM: Clean generic interface design.

The generic Watcher[T any] interface is well-designed with a clear contract and proper use of Go generics.


44-52: LGTM: Well-structured factory function.

The InitFileWatcher function properly handles both hybrid and polling-only scenarios with clean conditional logic and consistent error handling.


28-34: Behavior is intentional and appropriate

The doc comment on DebugLoggerErrorHandler in
tools/cosmovisor/internal/watchers/watcher.go explicitly states that errors are logged as warnings and warnings as debug. This handler is only used by file/HTTP watchers to avoid treating every read or parse error as fatal. No changes needed.

tools/cosmovisor/internal/process_test.go (1)

1-501: LGTM! Well-structured test refactoring.

The introduction of launchProcessFixture and setupTestLaunchProcessFixture effectively consolidates test setup logic and reduces code duplication across test cases. The transition from launcher.Run to runner.Start aligns with the new internal runner architecture.

tests/systemtests/cosmovisor_test.go (1)

172-180: Well-implemented symlink verification helper.

The helper function correctly verifies that Cosmovisor's current symlink points to the expected upgrade directory across all nodes.

tools/cosmovisor/internal/upgrader.go (1)

23-70: Well-structured upgrade detection logic.

The function correctly checks for both governance and manual upgrades, with proper error handling for edge cases like missed halt heights.

systemtests/system.go (1)

642-673: Well-implemented Cosmovisor integration in node startup.

The modified startNodesAsync method cleanly separates Cosmovisor and regular execution paths with proper environment setup.

tools/cosmovisor/internal/runner.go (1)

34-94: Well-structured main control loop with robust error handling.

The Start method implements a clean control loop with proper upgrade detection, retry management, and error handling for different restart scenarios.

tools/cosmovisor/args.go (6)

4-4: Import additions look good.

The addition of bytes and jsonpb imports properly supports the new protobuf-based upgrade info parsing functionality.

Also applies to: 15-15


39-39: RPC configuration and restart retry limit are well-implemented.

The new RPCAddress field with environment variable support enables the manual upgrade polling mechanism, and the MaxRestartRetries field provides important protection against infinite restart loops.

Also applies to: 70-70, 77-81, 310-314


115-123: Good refactoring of upgrade info path handling.

Centralizing the upgrade info directory path through UpgradeInfoDir() improves maintainability and ensures consistent path construction across the codebase.


437-442: Protobuf JSON marshaling is a good improvement.

Using jsonpb.Marshaler for upgrade plan serialization provides better schema compatibility and evolution compared to standard JSON marshaling.


445-457: Well-structured method for reading pending upgrade info.

The method properly uses the centralized file path and delegates parsing to the dedicated ParseUpgradeInfo method, following good separation of concerns.


641-641: Explicit error ignoring is appropriate here.

The use of _, _ = fmt.Fprintf and the deferred close with error ignoring are acceptable patterns for these non-critical operations.

Also applies to: 652-652, 680-682

Comment on lines +16 to +24
func IsSkipUpgradeHeight(args []string, upgradeInfo upgradetypes.Plan) bool {
skipUpgradeHeights := UpgradeSkipHeights(args)
for _, h := range skipUpgradeHeights {
if h == int(upgradeInfo.Height) {
return true
}
}
return false
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix potential integer overflow and improve type safety.

The conversion from int64 (upgrade plan height) to int (parsed heights) could cause issues on 32-bit systems or with very large block heights.

Apply this diff to improve type safety:

-func IsSkipUpgradeHeight(args []string, upgradeInfo upgradetypes.Plan) bool {
+func IsSkipUpgradeHeight(args []string, upgradeInfo upgradetypes.Plan) bool {
 	skipUpgradeHeights := UpgradeSkipHeights(args)
 	for _, h := range skipUpgradeHeights {
-		if h == int(upgradeInfo.Height) {
+		if int64(h) == upgradeInfo.Height {
 			return true
 		}
 	}
 	return false
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func IsSkipUpgradeHeight(args []string, upgradeInfo upgradetypes.Plan) bool {
skipUpgradeHeights := UpgradeSkipHeights(args)
for _, h := range skipUpgradeHeights {
if h == int(upgradeInfo.Height) {
return true
}
}
return false
}
func IsSkipUpgradeHeight(args []string, upgradeInfo upgradetypes.Plan) bool {
skipUpgradeHeights := UpgradeSkipHeights(args)
for _, h := range skipUpgradeHeights {
if int64(h) == upgradeInfo.Height {
return true
}
}
return false
}
🤖 Prompt for AI Agents
In tools/cosmovisor/internal/skip.go around lines 16 to 24, the function
converts upgradeInfo.Height from int64 to int, which risks integer overflow on
32-bit systems or with large heights. To fix this, ensure both
skipUpgradeHeights and upgradeInfo.Height use the same int64 type for
comparison, avoiding unsafe conversions and improving type safety.

Comment on lines +28 to +50
func UpgradeSkipHeights(args []string) []int {
var heights []int
for i, arg := range args {
if arg == fmt.Sprintf("--%s", cosmovisor.FlagSkipUpgradeHeight) {
j := i + 1

for j < len(args) {
tArg := args[j]
if strings.HasPrefix(tArg, "-") {
break
}
h, err := strconv.Atoi(tArg)
if err == nil {
heights = append(heights, h)
}
j++
}

break
}
}
return heights
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add validation for parsed heights.

The function should validate that parsed heights are positive integers to prevent invalid skip heights.

Apply this diff to add validation:

				h, err := strconv.Atoi(tArg)
-				if err == nil {
+				if err == nil && h > 0 {
 					heights = append(heights, h)
+				} else if err == nil && h <= 0 {
+					// Log warning about invalid height but continue parsing
 				}
🤖 Prompt for AI Agents
In tools/cosmovisor/internal/skip.go lines 28 to 50, the UpgradeSkipHeights
function parses heights but does not validate that they are positive integers.
Modify the code to check that each parsed height is greater than zero before
appending it to the heights slice, ensuring only valid positive skip heights are
included.

Comment on lines 508 to 522
# build-system-test-current builds the binaries necessary for running system tests, but only those on the current branch
# this is useful if you are iterating on tests which rely on changes to the current branch only (which is most common in development)
build-system-test-current: build cosmovisor
mkdir -p ./tests/systemtests/binaries/
cp $(BUILDDIR)/simd ./tests/systemtests/binaries/
cp tools/cosmovisor/cosmovisor ./tests/systemtests/binaries/

# build-system-test builds the binaries necessary for runnings system tests and places them in the correct locations
build-system-test: build-v53
mkdir -p ./tests/systemtests/binaries/v0.53
mv $(BUILDDIR)/simdv53 ./tests/systemtests/binaries/v0.53/simd

test-system: build-system-test
$(MAKE) -C tests/systemtests test
.PHONY: test-system
.PHONY: build-system-test-current build-system-test test-system
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Build targets look good, but investigate system test failures.

The new build targets are well-structured and properly handle binary preparation for system tests. However, there's a pipeline failure mentioned for system tests.

Run the following script to investigate the system test failures:


🏁 Script executed:

#!/bin/bash
# Description: Check system test setup and identify potential issues

# Check if the system test directory structure exists
fd -t d systemtests

# Check if the test files exist and are executable
fd -t f -e go . tests/systemtests/

# Verify the binaries directory structure after build
echo "Checking expected binary locations..."
ls -la tests/systemtests/binaries/ 2>/dev/null || echo "Binaries directory not found"

Length of output: 493


Fix build-system-test to include current branch binaries

The test-system target depends on build-system-test, which only invokes build-v53 and never creates the root tests/systemtests/binaries/ directory or copies the current branch simd and cosmovisor binaries. As your verification showed, the binaries/ folder is missing, causing system tests to fail.

· Makefile (lines 508–522): update build-system-test dependencies
· Ensure both current and v0.53 binaries are prepared before running tests

Proposed diff:

- build-system-test: build-v53
+ build-system-test: build-system-test-current build-v53

This change makes build-system-test first run build-system-test-current (creating binaries/ and copying simd & cosmovisor), then produce the v0.53 binary, so all tests have the expected binaries available.

🧰 Tools
🪛 GitHub Actions: System Tests

[error] Make target 'test-system' failed with exit code 2 due to system test failures.

🤖 Prompt for AI Agents
In the Makefile around lines 508 to 522, the build-system-test target currently
only runs build-v53 and does not prepare the binaries directory or copy the
current branch binaries, causing system tests to fail. To fix this, modify
build-system-test to first depend on or invoke build-system-test-current so that
the binaries directory is created and current branch binaries (simd and
cosmovisor) are copied, then proceed with building v0.53 binaries. This ensures
all necessary binaries are available before running tests.

Comment on lines +27 to +49
func (j *httpRPCBlockChecker) GetLatestBlockHeight() (uint64, error) {
if j.subUrl != "" {
return j.getLatestBlockHeight(j.subUrl)
}

height, err1 := j.getLatestBlockHeight("/v1/block")
if err1 == nil {
j.logger.Info("Successfully resolved latest block height from /v1/block", "url", j.baseUrl+"/v1/block")
// If we successfully got the height from /v1/block, we can cache the subUrl
j.subUrl = "/v1/block"
return height, nil
}

height, err2 := j.getLatestBlockHeight("/block")
if err2 == nil {
j.logger.Info("Successfully resolved latest block height from /block", "url", j.baseUrl+"/block")
// If we successfully got the height from /block, we can cache the subUrl
j.subUrl = "/block"
return height, nil
}

return 0, fmt.Errorf("failed to get latest block height from both /block and /v1/block RPC endpoints: %w", errors.Join(err1, err2))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Verify URL validation and consider adding timeout configuration.

The caching strategy is good for performance, but consider these improvements:

  1. The baseUrl is not validated - malformed URLs could cause runtime errors
  2. HTTP requests lack timeout configuration, which could cause hanging requests

Consider adding URL validation in the constructor:

func NewHTTPRPCBLockChecker(baseUrl string, logger log.Logger) HeightChecker {
+	if _, err := url.Parse(baseUrl); err != nil {
+		logger.Error("Invalid base URL provided", "url", baseUrl, "error", err)
+	}
	return &httpRPCBlockChecker{
		baseUrl: baseUrl,
		logger:  logger,
	}
}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In tools/cosmovisor/internal/watchers/http_block_checker.go lines 27 to 49, the
code lacks validation for the baseUrl and does not set timeouts for HTTP
requests, risking runtime errors and hanging calls. Add URL validation in the
constructor or initialization method to ensure baseUrl is well-formed before
use. Also, configure HTTP client timeouts for all requests made in
getLatestBlockHeight to prevent indefinite blocking.

Comment on lines +224 to +225
ymd := fmt.Sprintf("%d-%d-%d", st.Year(), st.Month(), st.Day())
dst := filepath.Join(u.cfg.DataBackupPath, fmt.Sprintf("data"+"-backup-%s", ymd))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Backup directory naming could cause collisions.

Using only the date (YYYY-MM-DD) for backup directories could cause collisions if multiple backups occur on the same day. Consider adding timestamp or sequence number.

 // a destination directory, Format YYYY-MM-DD
 st := time.Now()
-ymd := fmt.Sprintf("%d-%d-%d", st.Year(), st.Month(), st.Day())
-dst := filepath.Join(u.cfg.DataBackupPath, fmt.Sprintf("data"+"-backup-%s", ymd))
+timestamp := st.Format("2006-01-02-150405") // YYYY-MM-DD-HHMMSS
+dst := filepath.Join(u.cfg.DataBackupPath, fmt.Sprintf("data-backup-%s", timestamp))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ymd := fmt.Sprintf("%d-%d-%d", st.Year(), st.Month(), st.Day())
dst := filepath.Join(u.cfg.DataBackupPath, fmt.Sprintf("data"+"-backup-%s", ymd))
// a destination directory, Format YYYY-MM-DD
st := time.Now()
timestamp := st.Format("2006-01-02-150405") // YYYY-MM-DD-HHMMSS
dst := filepath.Join(u.cfg.DataBackupPath, fmt.Sprintf("data-backup-%s", timestamp))
🤖 Prompt for AI Agents
In tools/cosmovisor/internal/upgrader.go around lines 224 to 225, the backup
directory name uses only the date (YYYY-MM-DD), which can cause collisions if
multiple backups happen on the same day. Modify the naming scheme to include a
timestamp with hours, minutes, and seconds or a unique sequence number to ensure
each backup directory name is unique and avoid overwriting previous backups.

Comment on lines +131 to +151
info, err := os.Stat(preupgradeFile)
if err != nil {
u.logger.Error("COSMOVISOR_CUSTOM_PREUPGRADE file missing", "file", preupgradeFile)
return err
}
if !info.Mode().IsRegular() {
_, f := filepath.Split(preupgradeFile)
return fmt.Errorf("COSMOVISOR_CUSTOM_PREUPGRADE: %s is not a regular file", f)
}

// Set the execute bit for only the current user
// Given: Current user - Group - Everyone
// 0o RWX - RWX - RWX
oldMode := info.Mode().Perm()
newMode := oldMode | 0o100
if oldMode != newMode {
if err := os.Chmod(preupgradeFile, newMode); err != nil {
u.logger.Info("COSMOVISOR_CUSTOM_PREUPGRADE could not add execute permission")
return errors.New("COSMOVISOR_CUSTOM_PREUPGRADE could not add execute permission")
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

TOCTOU race condition in file permission check.

There's a time-of-check to time-of-use race condition where the file could be modified between the Stat call and execution. Consider using atomic operations or accepting the race condition with documentation.

 info, err := os.Stat(preupgradeFile)
 if err != nil {
     u.logger.Error("COSMOVISOR_CUSTOM_PREUPGRADE file missing", "file", preupgradeFile)
     return err
 }
 if !info.Mode().IsRegular() {
     _, f := filepath.Split(preupgradeFile)
     return fmt.Errorf("COSMOVISOR_CUSTOM_PREUPGRADE: %s is not a regular file", f)
 }
+
+// Note: There's an inherent TOCTOU race here, but it's acceptable for this use case
+// as the custom pre-upgrade script is expected to be under the operator's control
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
info, err := os.Stat(preupgradeFile)
if err != nil {
u.logger.Error("COSMOVISOR_CUSTOM_PREUPGRADE file missing", "file", preupgradeFile)
return err
}
if !info.Mode().IsRegular() {
_, f := filepath.Split(preupgradeFile)
return fmt.Errorf("COSMOVISOR_CUSTOM_PREUPGRADE: %s is not a regular file", f)
}
// Set the execute bit for only the current user
// Given: Current user - Group - Everyone
// 0o RWX - RWX - RWX
oldMode := info.Mode().Perm()
newMode := oldMode | 0o100
if oldMode != newMode {
if err := os.Chmod(preupgradeFile, newMode); err != nil {
u.logger.Info("COSMOVISOR_CUSTOM_PREUPGRADE could not add execute permission")
return errors.New("COSMOVISOR_CUSTOM_PREUPGRADE could not add execute permission")
}
}
info, err := os.Stat(preupgradeFile)
if err != nil {
u.logger.Error("COSMOVISOR_CUSTOM_PREUPGRADE file missing", "file", preupgradeFile)
return err
}
if !info.Mode().IsRegular() {
_, f := filepath.Split(preupgradeFile)
return fmt.Errorf("COSMOVISOR_CUSTOM_PREUPGRADE: %s is not a regular file", f)
}
// Note: There's an inherent TOCTOU race here, but it's acceptable for this use case
// as the custom pre-upgrade script is expected to be under the operator's control
// Set the execute bit for only the current user
// Given: Current user - Group - Everyone
// 0o RWX - RWX - RWX
oldMode := info.Mode().Perm()
newMode := oldMode | 0o100
if oldMode != newMode {
if err := os.Chmod(preupgradeFile, newMode); err != nil {
u.logger.Info("COSMOVISOR_CUSTOM_PREUPGRADE could not add execute permission")
return errors.New("COSMOVISOR_CUSTOM_PREUPGRADE could not add execute permission")
}
}
🤖 Prompt for AI Agents
In tools/cosmovisor/internal/upgrader.go around lines 131 to 151, there is a
TOCTOU race condition because the file's mode is checked with os.Stat and then
changed with os.Chmod, allowing the file to be modified between these calls. To
fix this, avoid separate stat and chmod calls by attempting to set the execute
bit directly with os.Chmod and handle errors accordingly, or document and accept
the race condition explicitly if atomic operations are not feasible.

Comment on lines +104 to +118
time.Sleep(pollInterval * 3) // wait a bit
rootCmd := NewRootCmd()
rootCmd.SetArgs([]string{
"add-upgrade",
"manual20",
filepath.Join(mockchainDir, "manual-upgrades", "manual20"),
"--upgrade-height",
"20",
"--cosmovisor-config",
cfgFile,
})
rootCmd.SetOut(os.Stdout)
rootCmd.SetErr(os.Stderr)
require.NoError(t, rootCmd.Execute())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add validation after executing add-upgrade command.

The test doesn't verify that the add-upgrade command succeeded, which could lead to silent test failures.

 rootCmd.SetOut(os.Stdout)
 rootCmd.SetErr(os.Stderr)
 require.NoError(t, rootCmd.Execute())
+
+// Verify the upgrade was added
+upgradePath := filepath.Join(mockchainDir, "cosmovisor", "upgrades", "manual20")
+_, err := os.Stat(upgradePath)
+require.NoError(t, err, "manual upgrade should exist after add-upgrade")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
time.Sleep(pollInterval * 3) // wait a bit
rootCmd := NewRootCmd()
rootCmd.SetArgs([]string{
"add-upgrade",
"manual20",
filepath.Join(mockchainDir, "manual-upgrades", "manual20"),
"--upgrade-height",
"20",
"--cosmovisor-config",
cfgFile,
})
rootCmd.SetOut(os.Stdout)
rootCmd.SetErr(os.Stderr)
require.NoError(t, rootCmd.Execute())
}
time.Sleep(pollInterval * 3) // wait a bit
rootCmd := NewRootCmd()
rootCmd.SetArgs([]string{
"add-upgrade",
"manual20",
filepath.Join(mockchainDir, "manual-upgrades", "manual20"),
"--upgrade-height",
"20",
"--cosmovisor-config",
cfgFile,
})
rootCmd.SetOut(os.Stdout)
rootCmd.SetErr(os.Stderr)
require.NoError(t, rootCmd.Execute())
// Verify the upgrade was added
upgradePath := filepath.Join(mockchainDir, "cosmovisor", "upgrades", "manual20")
_, err := os.Stat(upgradePath)
require.NoError(t, err, "manual upgrade should exist after add-upgrade")
}
🤖 Prompt for AI Agents
In tools/cosmovisor/cmd/cosmovisor/mockchain_test.go around lines 104 to 118,
after executing the add-upgrade command with rootCmd.Execute(), add assertions
to validate the command succeeded as expected. This can include checking the
command output, verifying the upgrade was added to the configuration, or any
relevant state changes to ensure the test does not silently pass without
confirming the command's effect.

Comment on lines 484 to 497
// ParseUpgradeInfo parses the upgrade info from the given byte slice.
func (cfg *Config) ParseUpgradeInfo(bz []byte) (*upgradetypes.Plan, error) {
var upgradePlan upgradetypes.Plan
if err := jsonpb.Unmarshal(bytes.NewReader(bz), &upgradePlan); err != nil {
return nil, fmt.Errorf("error unmarshalling upgrade info: %w", err)
}
if err := upgradePlan.ValidateBasic(); err != nil {
return nil, fmt.Errorf("upgrade info failed validation upgrade inof: %w", err)
}
if !cfg.DisableRecase {
upgradePlan.Name = strings.ToLower(upgradePlan.Name)
}
return &upgradePlan, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix typo in error message.

There's a typo in the error message: "upgrade inof" should be "upgrade info".

Apply this fix:

-		return nil, fmt.Errorf("upgrade info failed validation upgrade inof: %w", err)
+		return nil, fmt.Errorf("upgrade info failed validation: %w", err)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// ParseUpgradeInfo parses the upgrade info from the given byte slice.
func (cfg *Config) ParseUpgradeInfo(bz []byte) (*upgradetypes.Plan, error) {
var upgradePlan upgradetypes.Plan
if err := jsonpb.Unmarshal(bytes.NewReader(bz), &upgradePlan); err != nil {
return nil, fmt.Errorf("error unmarshalling upgrade info: %w", err)
}
if err := upgradePlan.ValidateBasic(); err != nil {
return nil, fmt.Errorf("upgrade info failed validation upgrade inof: %w", err)
}
if !cfg.DisableRecase {
upgradePlan.Name = strings.ToLower(upgradePlan.Name)
}
return &upgradePlan, nil
}
func (cfg *Config) ParseUpgradeInfo(bz []byte) (*upgradetypes.Plan, error) {
var upgradePlan upgradetypes.Plan
if err := jsonpb.Unmarshal(bytes.NewReader(bz), &upgradePlan); err != nil {
return nil, fmt.Errorf("error unmarshalling upgrade info: %w", err)
}
if err := upgradePlan.ValidateBasic(); err != nil {
return nil, fmt.Errorf("upgrade info failed validation: %w", err)
}
if !cfg.DisableRecase {
upgradePlan.Name = strings.ToLower(upgradePlan.Name)
}
return &upgradePlan, nil
}
🤖 Prompt for AI Agents
In tools/cosmovisor/args.go around lines 484 to 497, fix the typo in the error
message returned by ValidateBasic by changing "upgrade inof" to "upgrade info"
to ensure the error message is clear and correct.

return cfg.currentUpgrade, fmt.Errorf("failed to read %q: %w", filename, err)
func (cfg Config) WriteLastKnownHeight(height uint64) error {
filename := filepath.Join(cfg.UpgradeInfoDir(), LastKnownHeightFile)
return os.WriteFile(filename, []byte(strconv.FormatUint(height, 10)), 0o644)

Check failure

Code scanning / gosec

Expect WriteFile permissions to be 0600 or less Error

Expect WriteFile permissions to be 0600 or less
return err
}

return os.WriteFile(cfg.UpgradeInfoBatchFilePath(), manualUpgradesData, 0o644)

Check failure

Code scanning / gosec

Expect WriteFile permissions to be 0600 or less Error

Expect WriteFile permissions to be 0600 or less
Copy link
Contributor

@Eric-Warehime Eric-Warehime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some initial questions/comments. Still need to go through the main run loops/upgrade logic.

Height: height,
Info: p.Info,
}
info, err := json.Marshal(upgradeInfo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to cause a problem during the initial upgrade? Not sure of the specifics, but I'm concerned about the existing version dumping a something to disk using one codec, and then after the upgrade we attempt to unmarshal using a different codec and it fails to parse. Let me know if this isn't an issue though.

Comment on lines +475 to +480
if err != nil {
return ""
}
cfg.currentUpgrade = u
return cfg.currentUpgrade, nil
if upgradeInfo == nil {
return ""
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if err != nil || upgradeInfo == nil {
    return ""
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like anything in here really needs to be a member of Config since all of the calls to the object are just function calls and any state is being loaded and stored from disk and not really in memory on our struct as far as I can tell.

Comment on lines +16 to +40
// ErrorHandler is an interface for handling errors and warnings in watchers.
type ErrorHandler interface {
// Error handles an error as an error.
Error(msg string, err error)
// Warn handles an error as a warning.
Warn(msg string, err error)
}

type debugLoggerErrorHandler struct {
logger log.Logger
}

func (h *debugLoggerErrorHandler) Error(msg string, err error) {
h.logger.Warn(msg, "error", err)
}

func (h *debugLoggerErrorHandler) Warn(msg string, err error) {
h.logger.Debug(msg, "error", err)
}

// DebugLoggerErrorHandler returns an ErrorHandler that logs errors and warnings using the provided logger,
// but downgrades errors to warnings and warnings to debug logs.
func DebugLoggerErrorHandler(logger log.Logger) ErrorHandler {
return &debugLoggerErrorHandler{logger: logger}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there another implementation of this interface somewhere? I'm struggling to understand why this is useful instead of just using a logger directly.

Comment on lines +39 to +42
if err != nil {
if !os.IsNotExist(err) {
w.errorHandler.Error("failed to check for updates", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if err != nil {
if !os.IsNotExist(err) {
w.errorHandler.Error("failed to check for updates", err)
}
if err != nil && !os.IsNotExist(err) {
w.errorHandler.Error("failed to check for updates", err)

Comment on lines +44 to +48
// to make PollWatcher generic on any type T (including []byte), we use reflect.DeepEqual and the default zero value of T
var zero T
if !reflect.DeepEqual(x, zero) {
w.outChan <- x
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the conditions under which we would receive zero valued notifications?

Comment on lines +53 to +55
func (h HybridWatcher) Errors() <-chan error {
return h.errChan
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems correct--the error channel is not used in this implementation.

Comment on lines +32 to +46
height, err1 := j.getLatestBlockHeight("/v1/block")
if err1 == nil {
j.logger.Info("Successfully resolved latest block height from /v1/block", "url", j.baseUrl+"/v1/block")
// If we successfully got the height from /v1/block, we can cache the subUrl
j.subUrl = "/v1/block"
return height, nil
}

height, err2 := j.getLatestBlockHeight("/block")
if err2 == nil {
j.logger.Info("Successfully resolved latest block height from /block", "url", j.baseUrl+"/block")
// If we successfully got the height from /block, we can cache the subUrl
j.subUrl = "/block"
return height, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nicer to just specify a var which contains all subUrls to check and iterate through them since the logic for each is the same here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you run a hybrid watcher, do you have data on which watcher ends up propagating the updates? Is it always one or the other, or do both of them sometimes get the updates first?

My question is mainly--why is a single watcher implementation not good enough?

@github-actions
Copy link
Contributor

github-actions bot commented Nov 1, 2025

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days-before-close if no further activity occurs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Cosmovisor Upgrade Bug in v1.7.0

6 participants