fix: size calculation for zstd frame cache #5859
Conversation
Walkthrough
The changes refactor the zstd frame cache to use a size-limited, atomically tracked LRU cache, configurable via an environment variable. Mutex-based synchronization is removed in favor of atomic operations, and the cache API is updated accordingly. Documentation is updated to reflect the new environment variable and terminology.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Env as Environment
    participant App as Application Startup
    participant Cache as ZstdFrameCache
    participant LRU as SizeTrackingLruCache
    Env->>App: FOREST_ZSTD_FRAME_CACHE_DEFAULT_MAX_SIZE
    App->>Cache: Initialize with max size from env (default 256 MiB)
    App->>LRU: Create size-tracking LRU cache
    Cache->>LRU: get(offset, key, cid)
    Cache->>LRU: put(offset, key, index)
    LRU-->>Cache: Evict if over max size, update atomic size
```
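The flow in the diagram can be sketched as a minimal, std-only model. This is a hypothetical simplification (the real `SizeTrackingLruCache` wraps the `lru` crate and measures entries via `GetSize`); it only illustrates the put/evict/atomic-size interplay described above:

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical, simplified stand-in for the PR's SizeTrackingLruCache:
// a map plus a recency queue, with the total entry size tracked in an
// AtomicUsize so it can be observed without recomputing from scratch.
struct SizeTrackingLru {
    map: HashMap<u64, Vec<u8>>,
    order: VecDeque<u64>, // front = least recently used
    current_size: AtomicUsize,
    max_size: usize,
}

impl SizeTrackingLru {
    fn new(max_size: usize) -> Self {
        Self {
            map: HashMap::new(),
            order: VecDeque::new(),
            current_size: AtomicUsize::new(0),
            max_size,
        }
    }

    // Rough per-entry cost: key stack size plus value payload length.
    fn entry_size(key: &u64, value: &[u8]) -> usize {
        std::mem::size_of_val(key) + value.len()
    }

    fn put(&mut self, key: u64, value: Vec<u8>) {
        let size = Self::entry_size(&key, &value);
        // Skip entries that could never fit; inserting them would only thrash.
        if size > self.max_size {
            return;
        }
        if let Some(old) = self.map.insert(key, value) {
            // Replacement: subtract the old entry's size first.
            self.current_size
                .fetch_sub(Self::entry_size(&key, &old), Ordering::Relaxed);
            self.order.retain(|k| *k != key);
        }
        self.order.push_back(key);
        self.current_size.fetch_add(size, Ordering::Relaxed);
        // Evict from the LRU end until we are back under the limit.
        while self.current_size.load(Ordering::Relaxed) > self.max_size {
            let Some(lru_key) = self.order.pop_front() else { break };
            if let Some(evicted) = self.map.remove(&lru_key) {
                self.current_size
                    .fetch_sub(Self::entry_size(&lru_key, &evicted), Ordering::Relaxed);
            }
        }
    }

    fn get(&mut self, key: u64) -> Option<&Vec<u8>> {
        if self.map.contains_key(&key) {
            // Touch: move the key to the most-recently-used end.
            self.order.retain(|k| *k != key);
            self.order.push_back(key);
        }
        self.map.get(&key)
    }

    fn size(&self) -> usize {
        self.current_size.load(Ordering::Relaxed)
    }
}
```

The real cache keys entries by `(FrameOffset, CacheKey)` and stores decoded frame indices; the `u64` keys and byte payloads here just keep the sketch self-contained.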
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~45 minutes
Assessment against linked issues: no out-of-scope changes detected.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/db/car/forest.rs (1)
232-234: Consider adding a type annotation for clarity. The `.into()` conversions suggest the HashMap now uses a wrapped key type (likely `CidWrapper` for size tracking). Consider adding an explicit type annotation to the `block_map` declaration for better code clarity.

```diff
-let mut block_map = HashMap::new();
+let mut block_map: HashMap<CidWrapper, Vec<u8>> = HashMap::new();
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- docs/dictionary.txt (1 hunks)
- docs/docs/users/reference/env_variables.md (1 hunks)
- src/db/car/any.rs (1 hunks)
- src/db/car/forest.rs (6 hunks)
- src/db/car/many.rs (2 hunks)
- src/db/car/mod.rs (2 hunks)
- src/utils/cache/lru.rs (3 hunks)
🧠 Learnings (1)
src/db/car/mod.rs (1)
Learnt from: hanabi1224
PR: #5841
File: src/utils/get_size/mod.rs:10-10
Timestamp: 2025-07-17T15:21:40.753Z
Learning: The get_size2 crate's GetSize trait provides default implementations: get_stack_size() uses std::mem::size_of, get_heap_size() returns 0, and get_size() returns their sum. An empty impl like impl GetSize for MyType {} is valid and uses these defaults, making it suitable for simple wrapper types that don't allocate heap memory.
🧬 Code Graph Analysis (1)
src/db/car/mod.rs (2)
- src/db/car/forest.rs (3): std (107-107), new (104-122), get (207-245)
- src/utils/cache/lru.rs (2): cache (117-119), unbounded_with_default_metrics_registry (113-115)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: tests
- GitHub Check: tests-release
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build MacOS
- GitHub Check: Build Ubuntu
- GitHub Check: All lint checks
🔇 Additional comments (15)
docs/dictionary.txt (1)
120-120: LGTM! The addition of "zstd" to the dictionary is appropriate given the PR's focus on zstd frame cache functionality.
docs/docs/users/reference/env_variables.md (1)
52-52: Verify the default cache size choice. The documentation shows a default of 1073741824 bytes (1 GiB), but the PR objectives suggest considering 256 MiB or 512 MiB as the default. Please confirm if 1 GiB is the intended default value.
src/db/car/any.rs (1)
92-98: LGTM! The removal of the `Mutex` wrapper aligns with the refactoring to make `ZstdFrameCache` internally thread-safe.

src/db/car/many.rs (1)
26-26: LGTM! The removal of the `Mutex` wrapper from `shared_cache` is consistent with the refactoring across the codebase to use an internally thread-safe `ZstdFrameCache`. Also applies to: 67-67, 75-75
src/db/car/forest.rs (1)
65-65: LGTM! The removal of the `Mutex` wrapper and direct method calls on `ZstdFrameCache` are consistent with the refactoring to use an internally thread-safe cache implementation. Also applies to: 98-98, 118-118, 186-191, 215-215, 235-235
src/utils/cache/lru.rs (4)
62-74: LGTM! Clean refactoring with idiomatic Rust patterns. The `new_inner` helper method effectively consolidates the cache creation logic, using `Option<NonZeroUsize>` to elegantly handle both bounded and unbounded cases.
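The `Option<NonZeroUsize>` dispatch described here can be illustrated with a small std-only sketch; the enum below is a hypothetical stand-in for the `lru` crate's bounded/unbounded constructors:

```rust
use std::num::NonZeroUsize;

// Hypothetical stand-in for LruCache::new / LruCache::unbounded.
#[derive(Debug, PartialEq)]
enum CacheKind {
    Bounded(NonZeroUsize),
    Unbounded,
}

// Mirrors the shape of the `new_inner` helper: a single constructor
// that handles both the bounded and the unbounded case.
fn new_inner(capacity: Option<NonZeroUsize>) -> CacheKind {
    capacity.map(CacheKind::Bounded).unwrap_or(CacheKind::Unbounded)
}
```

Using `Option<NonZeroUsize>` rather than a plain `usize` makes "no limit" an explicit variant instead of a magic zero value.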
76-81: Good refactoring to use the new helper method. The method maintains its original signature while delegating to `new_inner`, preserving backward compatibility.
100-115: Well-structured API additions for unbounded cache support.The three new unbounded constructors follow the established naming pattern and delegation hierarchy, providing a consistent API for both bounded and unbounded cache creation.
141-143: Essential addition for LRU eviction support. The `pop_lru` method correctly exposes the underlying cache's eviction functionality with proper thread-safety through the write lock.

src/db/car/mod.rs (6)
10-10: Appropriate imports for the new functionality. The added imports support atomic size tracking, lazy initialization, and size calculations needed for the refactored cache implementation.
Also applies to: 17-25
55-56: Correct atomic types for thread-safe size tracking. The use of `AtomicUsize` for `current_size` and `SizeTrackingLruCache` enables lock-free size tracking, while `CidWrapper` provides the necessary `GetSize` implementation.
69-72: Proper initialization with metrics support. Creating an unbounded cache is appropriate since size limits are enforced manually through the eviction logic. The metrics registration provides valuable observability.
78-84: Thread-safe implementation with proper type conversion. The method correctly takes `&self` (enabling shared access) and acquires a write lock, since LRU caches update access order on reads. The `Cid` to `CidWrapper` conversion is handled properly.
87-115: Well-implemented size-aware insertion with atomic operations.The method correctly:
- Skips oversized entries to prevent thrashing
- Uses atomic operations for thread-safe size tracking
- Implements proper eviction to maintain the size limit
- Uses saturating arithmetic to prevent overflow
Note: There's a benign race condition where the cache might temporarily exceed `max_size` between threads, but this is acceptable for a cache implementation.
36-49: Environment variable properly documented and implementation approved. The environment variable `FOREST_ZSTD_FRAME_CACHE_DEFAULT_MAX_SIZE` is listed in docs/docs/users/reference/env_variables.md with a clear description and default value. The code uses `LazyLock`, parses into `NonZeroUsize`, logs successes and failures, and falls back to 1 GiB; no further changes needed.
- Documentation location: docs/docs/users/reference/env_variables.md, entry for `FOREST_ZSTD_FRAME_CACHE_DEFAULT_MAX_SIZE`
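A minimal sketch of the parse-with-fallback pattern described above. The function name and constant are hypothetical; the real code additionally logs successes and failures via `tracing` and caches the result in a `LazyLock`:

```rust
use std::num::NonZeroUsize;

const DEFAULT_MAX_SIZE: usize = 1024 * 1024 * 1024; // 1 GiB fallback

// Parse an optional environment-variable value into a NonZeroUsize,
// falling back to the default when the variable is unset or invalid.
// NonZeroUsize's FromStr rejects "0", so a zero-sized cache cannot be
// configured by accident.
fn parse_max_size(raw: Option<&str>) -> NonZeroUsize {
    raw.and_then(|s| s.parse::<NonZeroUsize>().ok())
        .unwrap_or_else(|| NonZeroUsize::new(DEFAULT_MAX_SIZE).expect("default is non-zero"))
}
```

In the real code the `raw` value would come from `std::env::var("FOREST_ZSTD_FRAME_CACHE_DEFAULT_MAX_SIZE")`.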
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/db/car/mod.rs (2)
78-84: Consider using a read lock for get operations. The method correctly changes to `&self` for concurrent access, but acquiring a write lock for read operations will prevent concurrent reads and impact performance. Consider using a read lock if `SizeTrackingLruCache` supports it:

```diff
-        self.lru
-            .cache()
-            .write()
-            .get(&(offset, key))
+        self.lru
+            .cache()
+            .read()
+            .get(&(offset, key))
```

However, if LRU updates require write access for "recently used" tracking, the current implementation may be necessary.
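The trade-off hinges on the fact that an LRU `get` is logically a write: it must move the touched key to the most-recently-used position. A std-only sketch with a hypothetical recency list shows why the write lock is needed:

```rust
use std::collections::VecDeque;
use std::sync::RwLock;

// Minimal recency list: front = least recently used.
struct RecencyLru {
    order: VecDeque<u32>,
}

impl RecencyLru {
    // `get` needs `&mut self`: touching an entry reorders the recency list.
    fn get(&mut self, key: u32) -> bool {
        if let Some(pos) = self.order.iter().position(|k| *k == key) {
            let k = self.order.remove(pos).unwrap();
            self.order.push_back(k);
            true
        } else {
            false
        }
    }

    fn lru(&self) -> Option<u32> {
        self.order.front().copied()
    }
}

fn touch(cache: &RwLock<RecencyLru>, key: u32) -> bool {
    // A read() guard only derefs to &RecencyLru, so calling the &mut self
    // `get` through it would not compile; the write lock is required even
    // for lookups.
    cache.write().unwrap().get(key)
}
```

This matches the `lru` crate's design, where `LruCache::get` takes `&mut self`; a read-lock path would only work with a lookup that skips the recency update.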
87-115: Well-implemented put method with minor concurrency considerations. The size-based insertion and eviction logic is well-structured and handles edge cases correctly.
Consider these improvements for robustness:
- Memory ordering: For critical size updates, consider using `Ordering::AcqRel` instead of `Relaxed` to ensure consistency:

```diff
-        self.current_size.fetch_add(entry_size, Ordering::Relaxed);
+        self.current_size.fetch_add(entry_size, Ordering::AcqRel);
```

- Eviction loop bounds: Add a safety limit to prevent excessive eviction iterations:

```diff
+        let mut eviction_count = 0;
+        const MAX_EVICTIONS: usize = 1000;
         while self.current_size.load(Ordering::Relaxed) > self.max_size {
+            if eviction_count >= MAX_EVICTIONS {
+                tracing::warn!("Eviction loop exceeded maximum iterations");
+                break;
+            }
             if let Some((prev_key, prev_entry)) = self.lru.pop_lru() {
                 self.current_size.fetch_sub(
                     prev_key.get_size().saturating_add(prev_entry.get_size()),
                     Ordering::Relaxed,
                 );
+                eviction_count += 1;
             } else {
                 break;
             }
         }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- docs/docs/users/reference/env_variables.md (1 hunks)
- src/db/car/mod.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/docs/users/reference/env_variables.md
🔇 Additional comments (5)
src/db/car/mod.rs (5)
10-10: LGTM: Import changes support the new cache implementation. The new imports are appropriate for the atomic-based cache implementation with size tracking.
Also applies to: 17-25
36-49: LGTM: Well-implemented configurable cache size. The lazy initialization with environment variable support is well-designed:
- Proper error handling with logging
- Uses `NonZeroUsize` to prevent invalid values
- Reasonable 256 MiB default as mentioned in PR objectives
55-56: LGTM: Struct fields updated for atomic operations. The change to `AtomicUsize` and `SizeTrackingLruCache` properly supports the new thread-safe, size-aware cache implementation.
61-61: LGTM: Default implementation uses the new configurable size. Correctly references the new static configuration value.
69-73: LGTM: Constructor properly initializes atomic fields. The initialization of `AtomicUsize` and `SizeTrackingLruCache` is correct for the new implementation.
Referenced code in src/db/car/mod.rs:

```rust
pub fn put(&mut self, offset: FrameOffset, key: CacheKey, index: HashMap<Cid, Vec<u8>>) {
    fn size_of_entry(entry: &HashMap<Cid, Vec<u8>>) -> usize {
        entry.values().map(Vec::len).sum::<usize>()
...
pub fn put(&self, offset: FrameOffset, key: CacheKey, index: HashMap<CidWrapper, Vec<u8>>) {
```
I'd love it if we could have some coverage here, both on the logic level and size calculation correctness.
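To make that request concrete: the quoted `size_of_entry` only summed value payloads (`entry.values().map(Vec::len).sum()`), so key sizes went unaccounted. A std-only sketch of the two accounting strategies, with `u64` standing in for `Cid` and `size_of_val` as a rough stand-in for `GetSize`:

```rust
use std::collections::HashMap;

// Old-style accounting: only the value payloads are counted.
fn payload_only_size(entry: &HashMap<u64, Vec<u8>>) -> usize {
    entry.values().map(Vec::len).sum()
}

// Size-aware accounting: also charge the stack size of each key
// (a simplified version of what get_size2's GetSize trait provides).
fn key_and_value_size(entry: &HashMap<u64, Vec<u8>>) -> usize {
    entry
        .iter()
        .map(|(k, v)| std::mem::size_of_val(k) + v.len())
        .sum()
}
```

Any test of the cache's size limit should assert against the second kind of measurement, since the payload-only sum systematically undercounts.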
Referenced code in src/utils/cache/lru.rs:

```rust
cache: Arc::new(RwLock::new(
    capacity
        .map(LruCache::new)
        .unwrap_or_else(LruCache::unbounded),
```
Can we ban `LruCache::unbounded` and mark this call site as an exception with a proper comment?
Co-authored-by: Hubert <hubert@chainsafe.io>
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/db/car/mod.rs (2)
36-51: Environment variable configuration looks good, but consider the past review suggestion. The implementation properly handles environment variable parsing with appropriate error handling and logging. However, there's a past review comment suggesting a specific format for the error message.
The error message format should match the suggestion from the previous review to maintain consistency.
89-117: Complex but well-implemented size-aware cache insertion with proper eviction logic. The implementation correctly handles:
- Size calculation for both keys and values
- Atomic size tracking for thread safety
- Differentiated logic for replacements vs new insertions
- Proper eviction when cache exceeds max size
However, this is complex logic that benefits from thorough testing as noted in the past review comments.
The size calculation correctness and logic-level coverage mentioned in the previous review would be valuable for this implementation.
🧹 Nitpick comments (1)
src/db/car/mod.rs (1)
80-85: Method signature change enables concurrent access, but consider read performance. The change to `&self` and internal locking enables concurrent cache usage. However, using a write lock for read operations (even for LRU updates) may impact read performance under high concurrency. Consider whether the LRU update on reads is worth the write lock overhead, or whether read-only cache lookups might be acceptable in some use cases.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .clippy.toml (1 hunks)
- src/db/car/mod.rs (2 hunks)
- src/utils/cache/lru.rs (4 hunks)
✅ Files skipped from review due to trivial changes (1)
- .clippy.toml
🚧 Files skipped from review as they are similar to previous changes (1)
- src/utils/cache/lru.rs
🔇 Additional comments (5)
src/db/car/mod.rs (5)
10-10: LGTM! Import changes support the new atomic cache implementation. The new imports are appropriate for the refactored cache implementation using atomic operations and size tracking.
Also applies to: 17-25
57-58: Struct refactoring supports thread-safe cache operations. The change from regular size tracking to `AtomicUsize` and the switch to `SizeTrackingLruCache` properly support the new concurrent access pattern without requiring mutex synchronization.
63-63: Default implementation correctly uses the configurable max size. The change to use the environment-configurable default max size is implemented correctly.
71-74: Constructor properly initializes the new cache implementation. The initialization of `AtomicUsize` and `SizeTrackingLruCache` with metrics is implemented correctly and provides good observability.
120-160: Excellent test coverage addressing previous review concerns. The test implementation effectively addresses the past review comment about needing coverage for size calculation correctness and logic-level testing. The randomized testing approach with both insertion and replacement scenarios provides good coverage of the complex cache logic.
The test correctly verifies that the atomic `current_size` stays consistent with the actual LRU cache size, which is the critical invariant for this implementation.
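That invariant can be expressed as a small property-style check: after a mix of inserts and replacements, the running atomic counter must equal the size recomputed from scratch. A simplified, std-only model of the bookkeeping (hypothetical names, value-length-only sizing for brevity):

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

// Insert-or-replace while maintaining a running size counter, the way the
// cache's put() does: subtract the replaced entry's size, add the new one.
fn tracked_insert(
    map: &mut HashMap<u64, Vec<u8>>,
    size: &AtomicUsize,
    key: u64,
    value: Vec<u8>,
) {
    let added = value.len();
    if let Some(old) = map.insert(key, value) {
        size.fetch_sub(old.len(), Ordering::Relaxed);
    }
    size.fetch_add(added, Ordering::Relaxed);
}

// Recompute the total from scratch; the invariant under test is that this
// always equals the running counter.
fn recomputed_size(map: &HashMap<u64, Vec<u8>>) -> usize {
    map.values().map(Vec::len).sum()
}
```

The PR's actual test additionally accounts for key sizes via `GetSize` and drives the real `SizeTrackingLruCache`; this sketch only captures the shape of the consistency check.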
@elmattic this needs your review
elmattic
left a comment
LGTM. If we introduce a new environment variable, it should be added to the CHANGELOG though.
Summary of changes
Changes introduced in this pull request:
- `SizeTrackingLruCache` cache size is ~120 MiB after running Forest on mainnet for a while. How about changing the default max size to 512 MiB or even 256 MiB @LesnyRumcajs
Reference issue to close (if applicable)
Closes #5858
Other information and links
Change checklist
Summary by CodeRabbit
New Features
Documentation
- Documented the `FOREST_ZSTD_FRAME_CACHE_DEFAULT_MAX_SIZE` environment variable, including its purpose, default, and example values.
Refactor
Chores