[TieredStorage] Enable file hash in TieredStorage by yhchiang-sol · Pull Request #170 · anza-xyz/agave

yhchiang-sol · 2024-03-11T06:12:28Z

Problem

File hash feature isn't implemented in tiered-storage.

Summary of Changes

This PR enables file hash in TieredStorage with the following changes.

Test Plan

Added new unit-tests for file hash match and mismatch cases.
Updated existing tests to also cover the file hash verification.

codecov-commenter · 2024-03-11T08:23:21Z

Codecov Report

❌ Patch coverage is 94.40559% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.9%. Comparing base (30eecd6) to head (caa8af6).
⚠️ Report is 7050 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff            @@
##           master     #170     +/-   ##
=========================================
- Coverage    81.9%    81.9%   -0.1%     
=========================================
  Files         840      841      +1     
  Lines      228107   228357    +250     
=========================================
+ Hits       186851   187047    +196     
- Misses      41256    41310     +54

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

brooksprumo

Can you split up this PR please, it appears to contain multiple orthogonal changes.

In particular these:

Deprecate tiered_storage/writer.rs as we currently don't have plans
to develop writers other than HotStorageWriter for TieredStorage.

Should be its own PR. Not fully on board with this change though.

Bump the footer format version as the older file will have hash mismatch.

Should be it's own PR

The others appear related.

yhchiang-sol · 2024-03-11T23:15:41Z

Can you split up this PR please, it appears to contain multiple orthogonal changes.

Sure. I feel the same way as well. Will split the PR!

yhchiang-sol · 2024-03-20T08:32:37Z

Rebased on top of #195 so that file-hash check are also done inside the constructor of TieredReadableFile.

yhchiang-sol · 2024-03-20T17:56:36Z

+        self.hash = file.hash();
+        file.write_pod(&self.hash)?;
+
+        file.write_pod(&self.format_version)?;
+        file.write_pod(&self.footer_size)?;


@brooksprumo: I think we can consider reordering the items in the footer tail so that the format version and footer-size are also hashed.

And, together with your previous comment #195 (comment). If we also exclude hash or even footer tail from the footer, then we can directly do file.write_type(footer) here.

What do you think?

Seems reasonable to me. So like a two-part footer. The top is hashed, and the bottom is not hashed. The bottom is where the hash itself lives, along with the magic number/etc. Is that right?

yhchiang-sol · 2024-03-20T17:57:13Z

Rebased to include the refactoring open-path PR.

brooksprumo · 2024-03-20T20:01:07Z

+    pub fn hash(&self) -> Hash {
+        Hash::new_from_array(self.hasher.finalize().into())
+    }


What happens if this function is called multiple times? Iow, is it safe to call finalize() or into() repeatedly? And if it's safe, is it expensive or free?

brooksprumo · 2024-03-20T20:02:21Z

    pub fn write_bytes(&mut self, bytes: &[u8]) -> IoResult<usize> {
-        self.0.write_all(bytes)?;
+        self.file.write_all(bytes)?;
+        self.hasher.update(bytes);


I think it'll be good to benchmark how much time this adds. A normal mnb slot probably has ~1000 accounts written, and each account is about 200 bytes of account data. I think a benchmark comparing with and without the hash would be good. If this hash is relatively expensive, we may need to consider an alternative impl.

brooksprumo · 2024-03-20T20:03:06Z

+    #[error("FileHashMismatch: {0} {1}")]
+    FileHashMismatch(Hash, Hash),


Can this error message be updated to indicate which one was read from the file and which one was computed?

brooksprumo · 2024-03-20T20:03:44Z

+        self.seek(0)?;
+
+        let len = self.0.metadata()?.len() as usize;
+        let hashed_len = len - std::mem::size_of::<Hash>() - FOOTER_TAIL_SIZE;


This needs to handle wrapping/overflow. Please use safe math here.

brooksprumo · 2024-03-20T20:04:44Z

+        (hash == hash_from_file)
+            .then(|| Ok(()))
+            .unwrap_or_else(|| Err(TieredStorageError::FileHashMismatch(hash, hash_from_file)))


Probably simpler to do:

Suggested change

(hash == hash_from_file)

.then(|| Ok(()))

.unwrap_or_else(|| Err(TieredStorageError::FileHashMismatch(hash, hash_from_file)))

if hash == hash_from_file {

Ok(())

} else {

Err(TieredStorageError::FileHashMismatch(hash, hash_from_file))

}

brooksprumo · 2024-03-20T20:07:46Z

-            hash: Hash::new_unique(),
            min_account_address: Pubkey::default(),
            max_account_address: Pubkey::default(),
+            hash: Hash::default(),


Up on line 111, where the struct is defined, I think we should use our own newtype for the hash. i.e.:

struct TieredStorageFooterHash(Hash)

(or some other name)

And likely derive all the same things that TieredStorageMagicNumber does (plus the static asserts).

brooksprumo · 2024-03-20T20:09:46Z

+        self.hash = file.hash();
+        file.write_pod(&self.hash)?;
+
+        file.write_pod(&self.format_version)?;
+        file.write_pod(&self.footer_size)?;


Seems reasonable to me. So like a two-part footer. The top is hashed, and the bottom is not hashed. The bottom is where the hash itself lives, along with the magic number/etc. Is that right?

yhchiang-sol · 2024-03-22T08:40:08Z

Rebase to address merge conflict.

yhchiang-sol · 2024-03-22T09:07:03Z

Converting this one to draft. Addressing comments and collecting perf numbers.

yhchiang-sol · 2024-03-26T16:14:45Z

Rebased on top of recent master to include bench result.

From my local bench runs, the cost seems high. Will try several ways and see which one leads to a better result.

yhchiang-sol · 2024-03-26T16:19:46Z

Here're the results from bigger samples. Looks like the hashing overhead is around +140% with the current implementation.

Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high severe
write_accounts_file/hot_storage/1000
                        time:   [473.22 µs 474.85 µs 477.72 µs]
                        thrpt:  [2.0933 Melem/s 2.1059 Melem/s 2.1132 Melem/s]
                 change:
                        time:   [+136.29% +138.10% +140.00%] (p = 0.00 < 0.05)
                        thrpt:  [-58.334% -58.001% -57.679%]
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
write_accounts_file/hot_storage/10000
                        time:   [4.7196 ms 4.7296 ms 4.7406 ms]
                        thrpt:  [2.1094 Melem/s 2.1143 Melem/s 2.1188 Melem/s]
                 change:
                        time:   [+139.84% +141.41% +142.82%] (p = 0.00 < 0.05)
                        thrpt:  [-58.817% -58.577% -58.305%]
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)

steviez · 2025-04-09T14:57:10Z

Closing this PR since it is more than 1 year old

yhchiang-sol force-pushed the ts-file-hash branch 2 times, most recently from e6300f7 to ea04c44 Compare March 11, 2024 06:50

yhchiang-sol requested a review from brooksprumo March 11, 2024 07:44

brooksprumo reviewed Mar 11, 2024

View reviewed changes

yhchiang-sol marked this pull request as draft March 11, 2024 23:15

yhchiang-sol force-pushed the ts-file-hash branch 5 times, most recently from 002b063 to a73c0ea Compare March 18, 2024 21:35

yhchiang-sol marked this pull request as ready for review March 19, 2024 02:10

yhchiang-sol force-pushed the ts-file-hash branch 2 times, most recently from a584276 to bc11e3a Compare March 20, 2024 02:18

yhchiang-sol marked this pull request as draft March 20, 2024 04:34

yhchiang-sol force-pushed the ts-file-hash branch 2 times, most recently from 3275073 to c2ab194 Compare March 20, 2024 05:39

yhchiang-sol force-pushed the ts-file-hash branch from c2ab194 to f295c19 Compare March 20, 2024 17:47

yhchiang-sol marked this pull request as ready for review March 20, 2024 17:47

yhchiang-sol commented Mar 20, 2024

View reviewed changes

yhchiang-sol requested a review from brooksprumo March 20, 2024 17:57

brooksprumo reviewed Mar 20, 2024

View reviewed changes

yhchiang-sol marked this pull request as draft March 22, 2024 09:06

[TieredStorage] Enable file hash in TieredStorage

caa8af6

yhchiang-sol force-pushed the ts-file-hash branch from f295c19 to caa8af6 Compare March 26, 2024 16:08

mergify Bot added community need:merge-assist labels Mar 31, 2024

steviez closed this Apr 9, 2025

		#[error("FileHashMismatch: {0} {1}")]
		FileHashMismatch(Hash, Hash),

Conversation

yhchiang-sol commented Mar 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Summary of Changes

Test Plan

Uh oh!

codecov-commenter commented Mar 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

brooksprumo left a comment

Choose a reason for hiding this comment

Uh oh!

yhchiang-sol commented Mar 11, 2024

Uh oh!

yhchiang-sol commented Mar 20, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yhchiang-sol commented Mar 20, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yhchiang-sol commented Mar 22, 2024

Uh oh!

yhchiang-sol commented Mar 22, 2024

Uh oh!

yhchiang-sol commented Mar 26, 2024

Uh oh!

yhchiang-sol commented Mar 26, 2024

Uh oh!

steviez commented Apr 9, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

yhchiang-sol commented Mar 11, 2024 •

edited

Loading

codecov-commenter commented Mar 11, 2024 •

edited

Loading