fix(rpc): backfill tipset messages from p2p network in a few RPC methods#6421
fix(rpc): backfill tipset messages from p2p network in a few RPC methods#6421LesnyRumcajs merged 6 commits intomainfrom
Conversation
WalkthroughRemoved Changes
Sequence Diagram(s)sequenceDiagram
participant Client as RPC Client
participant RPC as RPC handler (RPCState)
participant ChainSync as chain_sync (get_full_tipset / load_full_tipset)
participant Store as Blockstore / ChainStore
participant Network as P2P Network
Client->>RPC: ChainGetParentMessages / ChainGetMessagesInTipset request
RPC->>ChainSync: resolve full tipset (tipset_key, ctx)
alt SHOULD_BACKFILL (env enabled)
ChainSync->>Network: request full tipset / missing blocks
Network-->>ChainSync: full tipset (blocks + messages)
else local-only
ChainSync->>Store: load persisted full tipset
Store-->>ChainSync: full tipset (blocks + messages)
end
ChainSync-->>RPC: FullTipset
RPC->>RPC: extract messages, dedupe -> ApiMessage list
RPC-->>Client: respond with messages
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧹 Recent nitpick comments
📜 Recent review detailsConfiguration used: Repository UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🧰 Additional context used📓 Path-based instructions (1)**/*.rs📄 CodeRabbit inference engine (AGENTS.md)
Files:
🧠 Learnings (5)📓 Common learnings📚 Learning: 2025-08-28T12:52:46.927ZApplied to files:
📚 Learning: 2025-08-08T12:11:55.266ZApplied to files:
📚 Learning: 2026-01-05T12:54:40.850ZApplied to files:
📚 Learning: 2026-01-05T12:56:13.802ZApplied to files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
🔇 Additional comments (4)
✏️ Tip: You can disable this entire section by setting Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/rpc/methods/chain.rs (1)
1305-1344: Add error context and bound timeout for p2p fetch in RPC path
load_api_messages_from_tipsetcallsget_full_tipset, which triggers a p2p fetch and persist without proper error context or timeout handling. Two issues:
- Add
.context(...)aroundget_full_tipset(...).awaitto include theTipsetKeyin error messages, following coding guidelines.- Wrap the
get_full_tipsetcall withtokio::time::timeoutto prevent indefinite hangs on the RPC path. The underlyingchain_exchange_full_tipsethas no timeout, which is unsafe for RPC usage.Consider using
tokio::task::spawn_blockingif the blockstorepersistoperation blocks the async runtime.
🧹 Nitpick comments (2)
src/chain_sync/mod.rs (1)
14-21: Confirmget_full_tipsetis meant to be part of the public API surfaceRe-exporting
get_full_tipsetfromsrc/chain_sync/mod.rsmakes it publicly reachable ascrate::chain_sync::get_full_tipset. If this is only intended for internal RPC backfill wiring, consider reducing topub(crate) use ...to avoid locking in an API contract.Also ensure
get_full_tipsetis documented at its definition site (it’s now effectively part of the public API).src/rpc/methods/chain.rs (1)
284-298: Avoid loading the parentTipsetjust to get its keyIn
ChainGetParentMessages, you already have the parentTipsetKeyatblock_header.parents, soTipset::load_required(store, &block_header.parents)?is redundant and can fail even though backfill-by-key would work.Proposed change
if block_header.epoch == 0 { Ok(vec![]) } else { - let parent_tipset = Tipset::load_required(store, &block_header.parents)?; - load_api_messages_from_tipset(&ctx, parent_tipset.key()).await + load_api_messages_from_tipset(&ctx, &block_header.parents).await }
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/blocks/tipset.rssrc/chain_sync/mod.rssrc/rpc/methods/chain.rs
💤 Files with no reviewable changes (1)
- src/blocks/tipset.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Useanyhow::Result<T>for most operations and add context with.context()when operations fail
Usetokio::task::spawn_blockingfor CPU-intensive work such as VM execution and cryptography operations
Use.get()instead of indexing with[index]for safe element access
Do not useunwrap()in production code; use?orexpect()with descriptive messages instead
Do not usedbg!macro in non-test Rust code
Do not usetodo!macro in non-test Rust code
Usestrumcrate for enum string conversions
Usederive_morecrate for common trait implementations
Document all public functions and structs with doc comments
Use#[cfg(test)]or#[cfg(feature = "doctest-private")]for test-only code
Use channel-based communication (flume, tokio channels) between async tasks
Use consistent formatting followingcargo fmtstandards
Ensure code passescargo clippy --all-targets --no-deps -- --deny=warningslinter checks
Files:
src/chain_sync/mod.rssrc/rpc/methods/chain.rs
**/mod.rs
📄 CodeRabbit inference engine (AGENTS.md)
Each module should have a
mod.rsfile that exports the public API with private submodules for implementation details
Files:
src/chain_sync/mod.rs
🧠 Learnings (3)
📓 Common learnings
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5944
File: src/chain/store/index.rs:0-0
Timestamp: 2025-08-18T03:09:47.932Z
Learning: In Forest's tipset_by_height caching implementation, hanabi1224 prefers performance-conscious solutions that leverage finality guarantees rather than expensive chain walking for fork detection. The approach of constraining cache lookups to finalized epochs (using CHECKPOINT_INTERVAL >= CHAIN_FINALITY) provides fork safety without the performance cost of ancestry verification.
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6057
File: src/cli/subcommands/f3_cmd.rs:0-0
Timestamp: 2025-09-09T10:37:17.947Z
Learning: hanabi1224 prefers having default timeouts (like 10m for --no-progress-timeout) to prevent commands from hanging indefinitely, even when the timeout flag isn't explicitly provided by users. This fail-fast approach is preferred over requiring explicit flag usage.
📚 Learning: 2026-01-05T12:54:40.850Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6381
File: src/lotus_json/actors/states/cron_state.rs:8-8
Timestamp: 2026-01-05T12:54:40.850Z
Learning: In Rust code reviews, do not derive Eq for a struct if any field does not implement Eq (e.g., types from external dependencies). If a type like CronStateLotusJson includes fields wrapping external dependencies that lack Eq, derive PartialEq (or implement PartialEq manually) but avoid deriving Eq. This ensures comparisons compile and reflect actual equivalence semantics. When needed, consider implementing custom PartialEq (and possibly Eq) only after ensuring all fields (or wrappers) implement Eq, or keep PartialEq-only if full equality semantics cannot be expressed.
Applied to files:
src/chain_sync/mod.rssrc/rpc/methods/chain.rs
📚 Learning: 2026-01-05T12:56:13.802Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6381
File: src/lotus_json/actors/states/evm_state.rs:41-44
Timestamp: 2026-01-05T12:56:13.802Z
Learning: In Rust codebases (e.g., Forest), do not add #[cfg(test)] to functions already annotated with #[test]. The #[test] attribute ensures the function is compiled only for tests, so a separate #[cfg(test)] is redundant and can be removed if present. Apply this check to all Rust files that contain #[test] functions.
Applied to files:
src/chain_sync/mod.rssrc/rpc/methods/chain.rs
🧬 Code graph analysis (2)
src/chain_sync/mod.rs (1)
src/chain_sync/chain_follower.rs (3)
chain_follower(130-323)get_full_tipset(426-444)load_full_tipset(469-493)
src/rpc/methods/chain.rs (1)
src/chain_sync/chain_follower.rs (1)
get_full_tipset(426-444)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Coverage
- GitHub Check: tests-release
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: Build Ubuntu
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build MacOS
- GitHub Check: All lint checks
- GitHub Check: rubocop
🔇 Additional comments (1)
src/rpc/methods/chain.rs (1)
358-366: Double-check the intended “unknown tipset key” behavior after switching to backfill
ChainGetMessagesInTipsetstill callsload_required_tipset_or_heaviestbefore backfilling messages. That likely prevents the RPC from fetching/persisting tipsets that aren’t already known to the local chain (good for avoiding arbitrary network reads), but it also means “missing tipset headers” won’t be recovered here.If the goal is strictly “messages are missing but tipset exists locally”, this is perfect—otherwise consider whether you want to pass
&tipset_keydirectly whenSome(...)and only fall back to heaviest whenNone.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Codecov Report❌ Patch coverage is
Additional details and impacted files
... and 7 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
| ) -> Result<Self::Ok, ServerError> { | ||
| let tipset = ctx | ||
| .chain_store() | ||
| .load_required_tipset_or_heaviest(&tipset_key)?; |
There was a problem hiding this comment.
Small doubt: Is it possible here that tipset_key is available in the store but the messages (full-tipset) is not available?
There was a problem hiding this comment.
Yes. Messages are periodically garbage collected.
There was a problem hiding this comment.
So the tipset is not garbage collected 👍🏼.
There was a problem hiding this comment.
Block headers are the spine of the chain that are never garbage collected
1c748fb to
991411a
Compare
| | `FOREST_ZSTD_FRAME_CACHE_DEFAULT_MAX_SIZE` | positive integer | 268435456 | 536870912 | The default zstd frame cache max size in bytes | | ||
| | `FOREST_JWT_DISABLE_EXP_VALIDATION` | 1 or true | empty | 1 | Whether or not to disable JWT expiration validation | | ||
| | `FOREST_ETH_BLOCK_CACHE_SIZE` | positive integer | 500 | 1 | The size of Eth block cache | | ||
| | `FOREST_RPC_BACKFILL_FULL_TIPSET_FROM_NETWORK` | 1 or true | false | 1 | Whether or not to backfill full tipsets from the p2p network | |
There was a problem hiding this comment.
Actually, let's add some warning that enabling this might lead to excessive disk and bandwidth usage.
Also, let's add a CHANGELOG entry.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/rpc/methods/chain.rs (1)
1306-1354: Move the backfill warning toLazyLockinitialization to log once instead of per-request.The
tracing::warn!is currently logged on every RPC request whenFOREST_RPC_BACKFILL_FULL_TIPSET_FROM_NETWORKis enabled. Sinceload_api_messages_from_tipsetis called for each RPC request (fromChainGetParentMessages,ChainGetParentReceipts, etc.), the warning creates spam in logs. Moving the warning into theLazyLock::new()closure ensures it fires only once during initialization while preserving the important message for users:Proposed change
async fn load_api_messages_from_tipset<DB: Blockstore + Send + Sync + 'static>( ctx: &crate::rpc::RPCState<DB>, tipset_keys: &TipsetKey, ) -> Result<Vec<ApiMessage>, ServerError> { - static SHOULD_BACKFILL: LazyLock<bool> = - LazyLock::new(|| is_env_truthy("FOREST_RPC_BACKFILL_FULL_TIPSET_FROM_NETWORK")); + static SHOULD_BACKFILL: LazyLock<bool> = LazyLock::new(|| { + let enabled = is_env_truthy("FOREST_RPC_BACKFILL_FULL_TIPSET_FROM_NETWORK"); + if enabled { + tracing::warn!( + "Full tipset backfilling from network is enabled via FOREST_RPC_BACKFILL_FULL_TIPSET_FROM_NETWORK; excessive disk and bandwidth usage is expected." + ); + } + enabled + }); let full_tipset = if *SHOULD_BACKFILL { - tracing::warn!( - "Full tipset backfilling from network is enabled via FOREST_RPC_BACKFILL_FULL_TIPSET_FROM_NETWORK, excessive disk and bandwidth usage is expected." - ); get_full_tipset( &ctx.sync_network_context, ctx.chain_store(), None, tipset_keys, ) .await? } else { load_full_tipset(ctx.chain_store(), tipset_keys)? };The network layer uses adaptive timeouts (
CHAIN_EXCHANGE_TIMEOUT_MILLISwithrecv_timeout()) to prevent indefinite hangs, andload_full_tipsetis purely local. Both code paths are safe.
🤖 Fix all issues with AI agents
In `@CHANGELOG.md`:
- Around line 46-47: Fix the typo in the changelog entry text for PR `#6421`:
replace "metheods" with "methods" in the sentence "-
[`#6421`](https://github.com/ChainSafe/forest/pull/6421) Add an environment
variable `FOREST_RPC_BACKFILL_FULL_TIPSET_FROM_NETWORK` to enable backfilling
full tipsets from network in a few RPC metheods." so the word reads "methods".
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
CHANGELOG.mdsrc/rpc/methods/chain.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Useanyhow::Result<T>for most operations and add context with.context()when operations fail
Usetokio::task::spawn_blockingfor CPU-intensive work such as VM execution and cryptography operations
Use.get()instead of indexing with[index]for safe element access
Do not useunwrap()in production code; use?orexpect()with descriptive messages instead
Do not usedbg!macro in non-test Rust code
Do not usetodo!macro in non-test Rust code
Usestrumcrate for enum string conversions
Usederive_morecrate for common trait implementations
Document all public functions and structs with doc comments
Use#[cfg(test)]or#[cfg(feature = "doctest-private")]for test-only code
Use channel-based communication (flume, tokio channels) between async tasks
Use consistent formatting followingcargo fmtstandards
Ensure code passescargo clippy --all-targets --no-deps -- --deny=warningslinter checks
Files:
src/rpc/methods/chain.rs
🧠 Learnings (3)
📓 Common learnings
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6057
File: src/cli/subcommands/f3_cmd.rs:0-0
Timestamp: 2025-09-09T10:37:17.947Z
Learning: hanabi1224 prefers having default timeouts (like 10m for --no-progress-timeout) to prevent commands from hanging indefinitely, even when the timeout flag isn't explicitly provided by users. This fail-fast approach is preferred over requiring explicit flag usage.
📚 Learning: 2026-01-05T12:54:40.850Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6381
File: src/lotus_json/actors/states/cron_state.rs:8-8
Timestamp: 2026-01-05T12:54:40.850Z
Learning: In Rust code reviews, do not derive Eq for a struct if any field does not implement Eq (e.g., types from external dependencies). If a type like CronStateLotusJson includes fields wrapping external dependencies that lack Eq, derive PartialEq (or implement PartialEq manually) but avoid deriving Eq. This ensures comparisons compile and reflect actual equivalence semantics. When needed, consider implementing custom PartialEq (and possibly Eq) only after ensuring all fields (or wrappers) implement Eq, or keep PartialEq-only if full equality semantics cannot be expressed.
Applied to files:
src/rpc/methods/chain.rs
📚 Learning: 2026-01-05T12:56:13.802Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6381
File: src/lotus_json/actors/states/evm_state.rs:41-44
Timestamp: 2026-01-05T12:56:13.802Z
Learning: In Rust codebases (e.g., Forest), do not add #[cfg(test)] to functions already annotated with #[test]. The #[test] attribute ensures the function is compiled only for tests, so a separate #[cfg(test)] is redundant and can be removed if present. Apply this check to all Rust files that contain #[test] functions.
Applied to files:
src/rpc/methods/chain.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: tests-release
- GitHub Check: Build MacOS
- GitHub Check: Build Ubuntu
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: Coverage
- GitHub Check: rubocop
- GitHub Check: Deploy to Cloudflare Pages
- GitHub Check: All lint checks
🔇 Additional comments (2)
src/rpc/methods/chain.rs (2)
13-14: Imports look consistent with the new backfill path.Also applies to: 31-32
285-300: RPC handler bound changes (Send + Sync + 'static) are reasonable for async tipset loading.One thing to double-check: the backfill path can introduce long tail latency for these endpoints; make sure
get_full_tipsetenforces a timeout and fails fast rather than hanging.Also applies to: 359-368
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Summary of changes
Changes introduced in this pull request:
FOREST_RPC_BACKFILL_FULL_TIPSET_FROM_NETWORK=1Reference issue to close (if applicable)
Closes
Other information and links
Change checklist
Summary by CodeRabbit
Refactor
New Features
API
✏️ Tip: You can customize this high-level summary in your review settings.