Blocking tipset validation #6898

Merged: LesnyRumcajs merged 5 commits into main from blocking-tipset-validation on Apr 14, 2026

Conversation

@LesnyRumcajs LesnyRumcajs (Member) commented Apr 13, 2026

Summary of changes

Changes introduced in this pull request:

  • Fixed an occasional lock contention scenario during tipset validation. It shows up from time to time during RPC tests, and for some reason more often on mainnet. The main culprit is the outer rayon parallel loop we introduced, which creates a chance for wedges in the crates we depend on; for example, in filecoin-proofs we end up with a deadlock on an SRS file. There's also a rayon-heavy loop in bellperson. Giving up on the outer parallelization incurs a small performance penalty (validating 200 tipsets on mainnet was ~20% slower), but it gets rid of an entire class of issues. On machines with fewer cores (my test was done on 32 cores), the penalty should be smaller, or validation might even get faster.
  • Put the validation on a blocking task, given that it's CPU-bound. Not critical, but if the validation starves the tokio executor, everything else goes down with it, including the ctrl_c handler 🤡

Reference issue to close (if applicable)

Closes #6893

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Outside contributions

  • I have read and agree to the CONTRIBUTING document.
  • I have read and agree to the AI Policy document. I understand that failure to comply with the guidelines will lead to rejection of the pull request.

Summary by CodeRabbit

  • Bug Fixes
    • Fixed occasional lock contention during tipset/snapshot validation; validation now runs off the async path and processes ranges sequentially to reduce synchronization and resource contention, with safer bounds checks and improved error propagation.
  • Tests
    • Added unit tests covering edge and invalid head epochs and range handling.
  • Documentation
    • Added an unreleased changelog entry and removed an outdated known-issues note.

@coderabbitai coderabbitai Bot (Contributor) commented Apr 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 13295e40-5b85-41d3-8841-407140efe4b6

📥 Commits

Reviewing files that changed from the base of the PR and between a41ef10 and c45ef81.

📒 Files selected for processing (1)
  • src/daemon/mod.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/daemon/mod.rs

Walkthrough

Compute an explicit validation start epoch for snapshot import, offload range validation to a blocking worker via tokio::task::spawn_blocking, convert the outer tipset validation loop from parallel to sequential while keeping inner rayon parallelism, and add a CHANGELOG entry for issue #6893.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Changelog (CHANGELOG.md) | Added an unreleased "Fixed" entry for issue #6893 noting a fix for occasional lock contention during tipset validation. |
| Daemon snapshot import (src/daemon/mod.rs) | maybe_import_snapshot now computes a concrete inclusive start..=current_height via a validation_range helper and calls state_manager.validate_range(...) on a tokio::task::spawn_blocking worker, awaiting and propagating both join and validation errors; added unit tests for various from cases and invalid head epochs. |
| State manager tipset validation (src/state_manager/mod.rs) | Replaced parallel outer-loop (par_bridge/parallel iteration) with a sequential for (child, parent) loop in validate_tipsets; documentation updated to state outer-loop is sequential while inner compute remains rayon-parallelized; removed unused ParallelBridge import and deleted an obsolete "Known issues" section. |
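
The sequential outer loop described for validate_tipsets can be pictured roughly as follows. This is a minimal sketch, not Forest's implementation: `Tipset`, `validate_pair`, and the epoch-linkage check are hypothetical stand-ins, and the inner rayon-parallel computation is not shown.

```rust
/// Hypothetical stand-in for a tipset; the real type carries far more data.
#[derive(Debug, Clone, PartialEq)]
struct Tipset {
    epoch: i64,
    parent_epoch: i64,
}

/// Hypothetical per-pair check. In the real code this is where the expensive,
/// internally parallel state computation would happen.
fn validate_pair(child: &Tipset, parent: &Tipset) -> Result<(), String> {
    if child.parent_epoch == parent.epoch {
        Ok(())
    } else {
        Err(format!(
            "tipset at epoch {} does not link to parent at epoch {}",
            child.epoch, parent.epoch
        ))
    }
}

/// Walks adjacent (child, parent) pairs one at a time, newest first.
/// Only one pair is in flight at any moment, and the first error stops the walk.
/// Returns the number of pairs validated.
fn validate_tipsets(tipsets: &[Tipset]) -> Result<usize, String> {
    let mut validated = 0;
    for pair in tipsets.windows(2) {
        let (child, parent) = (&pair[0], &pair[1]);
        validate_pair(child, parent)?;
        validated += 1;
    }
    Ok(validated)
}
```

Because the outer loop is a plain `for`, two pairs never compete for the same lock or thread pool at the same time, which is the property the PR description relies on.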

Sequence Diagram(s)

sequenceDiagram
    participant Daemon
    participant Tokio as "Tokio runtime\n(spawn_blocking)"
    participant StateMgr as "StateManager\n(validate_range)"
    Daemon->>Tokio: spawn_blocking(|| state_mgr.validate_range(range))
    Tokio->>StateMgr: call validate_range(range)
    StateMgr-->>Tokio: return Result<()>
    Tokio-->>Daemon: join and return Result (propagate errors)
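
The hand-off in the diagram can be approximated with the standard library alone. In this sketch `std::thread::spawn` stands in for `tokio::task::spawn_blocking` so that the example has no dependencies; it is not the code from the PR, and `validate_range` and `run_validation_off_thread` here are hypothetical placeholders. What it illustrates is the two error layers the walkthrough mentions: the join error (the worker died) and the validation error (the worker finished but validation failed).

```rust
use std::ops::RangeInclusive;
use std::thread;

/// Hypothetical CPU-bound validation: counts the epochs it "validated".
/// An empty range is treated as an error instead of a silent no-op.
fn validate_range(range: RangeInclusive<i64>) -> Result<u64, String> {
    if range.is_empty() {
        return Err("empty validation range".to_string());
    }
    Ok(range.count() as u64)
}

/// Runs the validation on a separate thread and flattens both error layers:
/// the outer one (worker panicked) and the inner one (validation failed).
fn run_validation_off_thread(range: RangeInclusive<i64>) -> Result<u64, String> {
    let handle = thread::spawn(move || validate_range(range));
    handle
        .join()
        .map_err(|_| "validation worker panicked".to_string())?
}
```

With tokio, the same shape would apply: awaiting the handle returned by `spawn_blocking` yields an outer join result wrapping the closure's own result, and both need to be propagated.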

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • sudo-shashank
  • akaladarshi
  • hanabi1224
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Blocking tipset validation' accurately describes the main change: converting parallel tipset validation to sequential blocking validation. |
| Linked Issues check | ✅ Passed | The PR addresses #6893 by eliminating lock contention/deadlocks during tipset validation that caused intermittent RPC test failures. |
| Out of Scope Changes check | ✅ Passed | All changes are in-scope: validation logic refactoring, unit tests, documentation updates, and changelog entry directly support the objective of fixing RPC test failures. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@LesnyRumcajs LesnyRumcajs force-pushed the blocking-tipset-validation branch from 3033c2d to 2c9b847 Compare April 13, 2026 14:10
@LesnyRumcajs LesnyRumcajs marked this pull request as ready for review April 13, 2026 14:19
@LesnyRumcajs LesnyRumcajs requested a review from a team as a code owner April 13, 2026 14:19
@LesnyRumcajs LesnyRumcajs requested review from hanabi1224 and sudo-shashank and removed request for a team April 13, 2026 14:19
@coderabbitai coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Line 58: Update the CHANGELOG entry to reference the issue `#6893` instead of PR
`#6898`, correct the typo “occassional” to “occasional”, and replace “snapshot
validation” with “tipset validation” so the line reads using the issue link
format [`#6893`](https://github.com/ChainSafe/forest/issues/6893): Fixed
occasional lock contention during tipset validation.

In `@src/daemon/mod.rs`:
- Around line 179-184: The computed start value (assigned to start from
validate_from and current_height) must be clamped and validated: when
validate_from is negative compute start =
current_height.saturating_add(validate_from) but clamp to 0 (genesis) so it
cannot underflow, and when validate_from is positive ensure start <=
current_height and return or error if start > current_height so
validate_range(start..=current_height) is not a silent no-op; update the logic
around validate_from/current_height/start and add an explicit rejection path
(error/Exit) for start > current_height.
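
The clamping and rejection rules requested in the src/daemon/mod.rs comment could look roughly like this. It is a sketch under stated assumptions, not the helper that was merged: the function name is borrowed from the walkthrough, while the signature, the `ChainEpoch` alias, the `String` error type, and the reading of a non-negative `validate_from` as an absolute epoch are all assumptions.

```rust
use std::ops::RangeInclusive;

/// Assumption: chain epochs are signed 64-bit integers.
type ChainEpoch = i64;

/// Hypothetical helper turning a user-supplied `validate_from` into a concrete
/// inclusive range that ends at `current_height`.
///
/// * `validate_from < 0` is an offset back from the head, clamped to genesis (0).
/// * `validate_from >= 0` is taken as an absolute start epoch (assumption).
fn validation_range(
    validate_from: ChainEpoch,
    current_height: ChainEpoch,
) -> Result<RangeInclusive<ChainEpoch>, String> {
    if current_height < 0 {
        return Err(format!("invalid head epoch: {current_height}"));
    }
    let start = if validate_from < 0 {
        // saturating_add avoids overflow; max(0) keeps the start at or after genesis.
        current_height.saturating_add(validate_from).max(0)
    } else {
        validate_from
    };
    if start > current_height {
        // Reject explicitly so the caller never validates an empty range by accident.
        return Err(format!(
            "start epoch {start} is past the head epoch {current_height}"
        ));
    }
    Ok(start..=current_height)
}
```

For example, `validation_range(-10, 100)` yields `90..=100`, an offset larger than the chain height collapses to `0..=100`, and a start past the head is returned as an error.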

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d857ec5d-aeab-49ee-bca6-ba84a0ea6826

📥 Commits

Reviewing files that changed from the base of the PR and between be0d0db and dea3da4.

📒 Files selected for processing (3)
  • CHANGELOG.md
  • src/daemon/mod.rs
  • src/state_manager/mod.rs

Comment thread CHANGELOG.md Outdated
Comment thread src/daemon/mod.rs Outdated
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@codecov codecov Bot commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 34.21053% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.12%. Comparing base (be0d0db) to head (c45ef81).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/state_manager/mod.rs | 0.00% | 21 Missing ⚠️ |
| src/daemon/mod.rs | 76.47% | 4 Missing ⚠️ |

Additional details and impacted files

| Files with missing lines | Coverage Δ |
|---|---|
| src/daemon/mod.rs | 29.57% <76.47%> (+1.79%) ⬆️ |
| src/state_manager/mod.rs | 66.59% <0.00%> (+0.13%) ⬆️ |

... and 9 files with indirect coverage changes


Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update be0d0db...c45ef81. Read the comment docs.


@LesnyRumcajs LesnyRumcajs added this pull request to the merge queue Apr 14, 2026
Merged via the queue into main with commit e839f42 Apr 14, 2026
59 of 61 checks passed
@LesnyRumcajs LesnyRumcajs deleted the blocking-tipset-validation branch April 14, 2026 08:02

Development

Successfully merging this pull request may close these issues.

[automated] RPC parity test failure @ 12/4/26 01:09

2 participants