fix: reduce allocations in chain traversal by hanabi1224 · Pull Request #6009 · ChainSafe/forest

hanabi1224 · 2025-08-28T12:18:24Z

Summary of changes

This PR tries to reduce heap allocations in chain traversal (fn stream_chain) by replacing Vec with SmallVec. This should benefit chain export, snapshot validation, and similar workloads.

BTW, I believe it could be further improved by refactoring fn extract_cids to return an iterator of CIDs.
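The iterator idea can be sketched with the standard library alone. This is only an illustration under assumptions: `Node`, `child_ids_collected`, and `child_ids` are hypothetical stand-ins (plain `u64` ids, with `0` treated as "not a link"), not Forest's `extract_cids` or its CBOR decoder. It shows how a caller could extend its own worklist without an intermediate collection.

```rust
/// Hypothetical stand-in for a decoded block that links to other blocks.
struct Node {
    links: Vec<u64>,
}

/// Collecting variant: builds a fresh Vec on every call.
fn child_ids_collected(node: &Node) -> Vec<u64> {
    node.links.iter().copied().filter(|id| *id != 0).collect()
}

/// Iterator variant: yields the same ids lazily, borrowing from the node.
fn child_ids(node: &Node) -> impl Iterator<Item = u64> + '_ {
    node.links.iter().copied().filter(|id| *id != 0)
}

fn main() {
    let node = Node { links: vec![3, 0, 7] };
    let mut worklist: Vec<u64> = Vec::with_capacity(8);
    // The caller extends its own buffer; no temporary collection is created.
    worklist.extend(child_ids(&node));
    assert_eq!(worklist, child_ids_collected(&node));
}
```

Whether this pays off in practice depends on how the real decoder produces CIDs; a serde-based visitor may not map onto a lazy iterator as directly as this toy example does.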

Perf gains on my laptop:
calibnet: ~4m40s -> ~3m50s
mainnet: ~47m -> ~37m

PR branch:
➜  snapshots forest-tool -V && forest-tool benchmark graph-traversal forest_snapshot_calibnet_2023-07-30_height_780000.forest.car.zst
forest-filecoin 0.29.0+git.0c00fb4c
  traversed 2.65 GiB at 65.20 MiB/s in 00:00:41
➜  snapshots forest-tool -V && forest-tool benchmark graph-traversal forest_snapshot_calibnet_2023-07-30_height_780000.forest.car.zst
forest-filecoin 0.29.0+git.0c00fb4c
  traversed 2.65 GiB at 66.89 MiB/s in 00:00:40
➜  snapshots forest-tool -V && forest-tool benchmark graph-traversal forest_snapshot_calibnet_2023-07-30_height_780000.forest.car.zst
forest-filecoin 0.29.0+git.0c00fb4c
  traversed 2.65 GiB at 66.89 MiB/s in 00:00:40
  
main branch:
➜  snapshots forest-tool -V && forest-tool benchmark graph-traversal forest_snapshot_calibnet_2023-07-30_height_780000.forest.car.zst
forest-filecoin 0.29.0+git.15cdbb31
  traversed 2.65 GiB at 64.23 MiB/s in 00:00:42
➜  snapshots forest-tool -V && forest-tool benchmark graph-traversal forest_snapshot_calibnet_2023-07-30_height_780000.forest.car.zst
forest-filecoin 0.29.0+git.15cdbb31
  traversed 2.65 GiB at 66.59 MiB/s in 00:00:40
➜  snapshots forest-tool -V && forest-tool benchmark graph-traversal forest_snapshot_calibnet_2023-07-30_height_780000.forest.car.zst
forest-filecoin 0.29.0+git.15cdbb31
  traversed 2.65 GiB at 62.37 MiB/s in 00:00:43

Changes introduced in this pull request:

Reference issue to close (if applicable)

Closes

Change checklist

I have performed a self-review of my own code,
I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
I have added tests that prove my fix is effective or that my feature works (if possible),
I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

Performance
- Reduced memory allocations in CID extraction for faster, more efficient CBOR decoding and benchmarking workflows.
API Changes
- Updated the CID extraction function to return a small-vector type instead of a standard vector; update integrations to use the new return type.
Refactor
- Minor cleanup to leverage type inference in benchmarking logic with no behavioral changes.

coderabbitai · 2025-08-28T12:18:30Z

Walkthrough

Refactors CID deserialization to use a small-vector optimization via a new SmallCidVec type and updates extract_cids to return it. Adjusts benchmark command to rely on type inference when collecting CIDs from CBOR blocks. No other control flow or exported declarations are modified beyond the return type change.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CID CBOR extraction internals `src/utils/encoding/cid_de_cbor.rs` | Added `pub type SmallCidVec = SmallVec<[Cid; 8]>`. Changed `extract_cids` return type to `anyhow::Result<SmallCidVec>`. Updated internal collectors (`CidVec`, `FilterCids`) and serde seeds to accumulate into `SmallCidVec` instead of `Vec<Cid>`. |
| Benchmark command adaptation `src/tool/subcommands/benchmark_cmd.rs` | Removed explicit `Cid` import. Replaced explicit `Vec<Cid>` annotation with type inference for `extract_cids(&block.data)?` result. Logic for codec checks, uniqueness, and sink writes unchanged. |
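To make the inline capacity of 8 concrete, here is a std-only sketch of the small-vector idea. It is a teaching stand-in, not the `smallvec` crate's implementation: `InlineVec`, `spilled`, and the use of `u64` in place of `Cid` are assumptions made for illustration.

```rust
/// Up to N items live inline; the buffer moves to a heap-allocated Vec
/// only once that inline capacity is exceeded.
enum InlineVec<T: Copy + Default, const N: usize> {
    Inline { buf: [T; N], len: usize },
    Heap(Vec<T>),
}

impl<T: Copy + Default, const N: usize> InlineVec<T, N> {
    fn new() -> Self {
        // No heap allocation happens here.
        InlineVec::Inline { buf: [T::default(); N], len: 0 }
    }

    fn push(&mut self, value: T) {
        match self {
            InlineVec::Heap(v) => v.push(value),
            InlineVec::Inline { buf, len } => {
                if *len < N {
                    buf[*len] = value;
                    *len += 1;
                } else {
                    // Inline capacity exhausted: copy everything to the heap once.
                    let mut v = Vec::with_capacity(N * 2);
                    v.extend_from_slice(&buf[..*len]);
                    v.push(value);
                    *self = InlineVec::Heap(v);
                }
            }
        }
    }

    fn as_slice(&self) -> &[T] {
        match self {
            InlineVec::Inline { buf, len } => &buf[..*len],
            InlineVec::Heap(v) => v.as_slice(),
        }
    }

    /// True once the contents have moved to the heap.
    fn spilled(&self) -> bool {
        matches!(self, InlineVec::Heap(_))
    }
}

fn main() {
    let mut v: InlineVec<u64, 8> = InlineVec::new();
    for i in 0..8 {
        v.push(i);
    }
    assert!(!v.spilled()); // eight items still fit inline
    v.push(8);
    assert!(v.spilled()); // the ninth item forces the heap allocation
    assert_eq!(v.as_slice().len(), 9);
}
```

The trade-off is a larger value on the stack in exchange for skipping the allocator whenever a block links to eight or fewer items.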

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant CLI as CLI
  participant Bench as benchmark_cmd
  participant CID as extract_cids (CBOR→SmallCidVec)

  CLI->>Bench: Run benchmark (CAR streaming inspect)
  Bench->>CID: extract_cids(block.data)
  CID-->>Bench: SmallCidVec of CIDs
  Bench->>Bench: Check codec, compute unique counts
  Bench->>Sink: Write block data
  Bench-->>CLI: Report results

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

LesnyRumcajs
elmattic

📜 Recent review details

📥 Commits

Reviewing files that changed from the base of the PR and between 3b0a663 and a41c6c2.

📒 Files selected for processing (1)

src/tool/subcommands/benchmark_cmd.rs (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/tool/subcommands/benchmark_cmd.rs


LesnyRumcajs · 2025-08-28T12:21:02Z

it's a draft but I already want it

hanabi1224 · 2025-08-28T13:12:26Z

@CodeRabbit review

coderabbitai · 2025-08-28T13:12:32Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/ipld/util.rs (1)

430-460: Reduce per-iteration allocations and unify type alias usage.

Good switch to SmallVec. Two follow-ups:

Reuse the buffer across iterations to avoid re-allocating cid_vec for every received CID.
Prefer the shared alias to keep the inline capacity consistent across the codebase.

Apply this diff within the worker loop:

-                    'main: while let Ok(cid) = extract_receiver.recv_async().await {
-                        let mut cid_vec: smallvec::SmallVec<[_; 8]> = smallvec::smallvec![cid];
+                    let mut cid_vec: crate::utils::encoding::cid_de_cbor::SmallCidVec = smallvec::SmallVec::new();
+                    'main: while let Ok(cid) = extract_receiver.recv_async().await {
+                        cid_vec.push(cid);
                         while let Some(cid) = cid_vec.pop() {
                             if should_save_block_to_snapshot(cid) && seen.lock().insert(cid) {
                                 if let Some(data) = db.get(&cid)? {
                                     if cid.codec() == fvm_ipld_encoding::DAG_CBOR {
                                         let mut new_values = extract_cids(&data)?;
                                         cid_vec.append(&mut new_values);
                                     }
                                     // Break out of the loop if the receiving end quit.
                                     if block_sender
                                         .send_async(Ok(CarBlock { cid, data }))
                                         .await
                                         .is_err()
                                     {
                                         break 'main;
                                     }
                                 } else if fail_on_dead_links {
                                     // If the receiving end has already quit - just ignore it and
                                     // break out of the loop.
                                     let _ = block_sender
                                         .send_async(Err(anyhow::anyhow!(
                                             "[Send] missing key: {cid}"
                                         )))
                                         .await;
                                     break 'main;
                                 }
                             }
                         }
+                        cid_vec.clear();
                     }

If you prefer a minimal change, at least switch the type on initialization:

-                        let mut cid_vec: smallvec::SmallVec<[_; 8]> = smallvec::smallvec![cid];
+                        let mut cid_vec: crate::utils::encoding::cid_de_cbor::SmallCidVec = smallvec::smallvec![cid];
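The buffer-reuse suggestion can be illustrated with a std-only sketch. Everything here is an assumption for illustration: `Db` is a hypothetical in-memory map standing in for the block store, `u32` ids stand in for CIDs, and a plain `Vec` stands in for `SmallCidVec`. The channel handling and error paths from the diff are left out.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory "database": block id -> ids it links to.
type Db = HashMap<u32, Vec<u32>>;

/// Walks everything reachable from each root, reusing one worklist buffer
/// across roots instead of creating a new one per root.
fn traverse(db: &Db, roots: &[u32]) -> Vec<u32> {
    let mut seen = HashSet::new();
    let mut order = Vec::new();
    // Created once, outside the per-root loop.
    let mut worklist: Vec<u32> = Vec::new();
    for &root in roots {
        worklist.push(root);
        while let Some(id) = worklist.pop() {
            // `insert` returns false for ids that were already visited.
            if seen.insert(id) {
                if let Some(links) = db.get(&id) {
                    worklist.extend_from_slice(links);
                    order.push(id);
                }
            }
        }
        // The inner loop pops until the buffer is empty, so it is empty
        // here while keeping its capacity for the next root.
    }
    order
}

fn main() {
    let mut db = Db::new();
    db.insert(1, vec![2, 3]);
    db.insert(2, vec![3]);
    db.insert(3, vec![]);
    println!("{:?}", traverse(&db, &[1, 2]));
}
```

Because the inner loop drains the worklist on normal completion, the buffer is already empty when the next root arrives; what the reuse saves is any capacity that grew beyond the initial storage.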

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 15cdbb3 and 0c00fb4.

📒 Files selected for processing (3)

src/ipld/util.rs (1 hunks)
src/tool/subcommands/benchmark_cmd.rs (1 hunks)
src/utils/encoding/cid_de_cbor.rs (4 hunks)

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.

📚 Learning: 2025-08-08T12:11:55.266Z

Learnt from: hanabi1224
PR: ChainSafe/forest#5867
File: src/ipld/util.rs:461-487
Timestamp: 2025-08-08T12:11:55.266Z
Learning: Forest (src/ipld/util.rs, Rust): In UnorderedChainStream::poll_next, dropping `extract_sender` (when no more tipsets and the extract queue is empty) is the intended shutdown signal for workers. Any subsequent attempt to enqueue work after this drop is a logic error and should be treated as an error; do not change `send()` to ignore a missing sender.

Applied to files:

src/ipld/util.rs

🔇 Additional comments (4)

src/utils/encoding/cid_de_cbor.rs (3)

177-180: LGTM: zero-allocation init path maintained.

CidVec(SmallCidVec::new()) aligns with the new storage and avoids upfront heap allocation.

9-13: Dependency smallvec already declared in Cargo.toml; no action required.

12-19: No explicit Vec expectations from extract_cids
All call sites infer or handle SmallCidVec without relying on Vec annotations—no further changes needed.

src/tool/subcommands/benchmark_cmd.rs (1)

150-152: LGTM: rely on type inference for extract_cids.

Clean and future-proof against container type changes.

hanabi1224 · 2025-09-02T08:20:11Z

I had to resolve a merge conflict. Please re-approve. @LesnyRumcajs @akaladarshi

fix: reduce allocations in chain traversal

0c00fb4

coderabbitai Bot reviewed Aug 28, 2025

Comment thread src/utils/encoding/cid_de_cbor.rs Outdated

apply AI suggestions

3b0a663

hanabi1224 marked this pull request as ready for review August 28, 2025 14:16

hanabi1224 requested a review from a team as a code owner August 28, 2025 14:16

hanabi1224 requested review from LesnyRumcajs and akaladarshi and removed request for a team August 28, 2025 14:16

LesnyRumcajs previously approved these changes Aug 28, 2025

akaladarshi previously approved these changes Sep 2, 2025

hanabi1224 added this pull request to the merge queue Sep 2, 2025

github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch Sep 2, 2025

Merge remote-tracking branch 'origin/main' into hm/reduce-alloc-in-ch…

a41c6c2

…ain-traversal

hanabi1224 dismissed stale reviews from akaladarshi and LesnyRumcajs via a41c6c2 September 2, 2025 08:17

hanabi1224 requested review from LesnyRumcajs and akaladarshi September 2, 2025 08:19

hanabi1224 enabled auto-merge September 2, 2025 08:19

LesnyRumcajs approved these changes Sep 2, 2025

akaladarshi approved these changes Sep 2, 2025

hanabi1224 added this pull request to the merge queue Sep 2, 2025

Merged via the queue into main with commit 8420991 Sep 2, 2025
44 checks passed

hanabi1224 deleted the hm/reduce-alloc-in-chain-traversal branch September 2, 2025 09:09

coderabbitai Bot mentioned this pull request Feb 16, 2026

chore: limit Vec preallocation to follow serde convention #6616

Merged

6 tasks

fix: reduce allocations in chain traversal #6009
hanabi1224 merged 3 commits into main from hm/reduce-alloc-in-chain-traversal
