
feat: Switch from SHA3-256 to BLAKE3-256 #306

Merged · 17 commits · Jul 21, 2023
Conversation

@matheus23 (Member) commented Jul 10, 2023

Summary

  • Swapped all sha3 dependencies out for blake3 with its traits-preview feature. That feature made the switch pretty seamless.
  • Switched the prime_hash impl to little-endian big-integer encoding, otherwise some tests would run out of stack space. BigUint::from_bytes_be actually just copies the bytes into a new array, reverses it, and runs BigUint::from_bytes_le on that, which causes quite a lot of allocation.
  • Added a test to make sure future changes to prime_hash don't unintentionally change hash results.
  • Adjusted the way private file blocks are referenced to make it easier to construct the names.
  • Added base_name to ExternalFileContent. This allows users with snapshot access only to derive the necessary block labels (they wouldn't be able to access header.name).
  • Made some minor changes to keys.rs, slimming the API (the SnapshotKey and TemporalKey newtypes are now private) and removing From<> impls in favor of documented, (hopefully hard to misuse?) special-purpose functions.
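The endianness point generalizes beyond Rust; here's a minimal JS sketch (not the wnfs code, just an illustration) of why a big-endian decoder built on top of a little-endian one pays for an extra copy and reverse:

```js
// Decode an unsigned big integer from bytes, little-endian:
// byte i contributes bytes[i] * 256^i.
function fromBytesLE(bytes) {
  let acc = 0n;
  for (let i = bytes.length - 1; i >= 0; i--) {
    acc = (acc << 8n) | BigInt(bytes[i]);
  }
  return acc;
}

// Big-endian decoding is the same computation on the reversed input,
// so implementing it via the LE primitive forces a copy + reverse
// (mirroring what BigUint::from_bytes_be does internally).
function fromBytesBE(bytes) {
  return fromBytesLE([...bytes].reverse());
}

console.log(fromBytesLE([0x01, 0x00])); // 1n
console.log(fromBytesBE([0x01, 0x00])); // 256n
```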

Still having the "tests run out of stack" issue every now and then. After a couple of complicated debugging sessions with lldb, it turns out the issue is mostly functions with lots of await points (e.g. unit tests!). All of those futures end up allocated on the stack, blowing it. There's no good solution that I'm aware of, except for RUST_MIN_STACK=2500000 cargo test.

@matheus23 matheus23 self-assigned this Jul 10, 2023
@matheus23 matheus23 requested a review from a team as a code owner July 10, 2023 19:28
codecov bot commented Jul 10, 2023

Codecov Report

Merging #306 (05529f5) into main (c17f6bb) will decrease coverage by 0.04%.
The diff coverage is 86.07%.


@@            Coverage Diff             @@
##             main     #306      +/-   ##
==========================================
- Coverage   56.25%   56.22%   -0.04%     
==========================================
  Files          43       43              
  Lines        3180     3189       +9     
  Branches      769      770       +1     
==========================================
+ Hits         1789     1793       +4     
- Misses        900      904       +4     
- Partials      491      492       +1     
| Impacted Files | Coverage Δ |
|---|---|
| wnfs-hamt/src/diff.rs | 19.29% <ø> (ø) |
| wnfs-hamt/src/hamt.rs | 27.50% <ø> (ø) |
| wnfs-hamt/src/node.rs | 46.66% <ø> (ø) |
| wnfs-hamt/src/pointer.rs | 6.25% <ø> (ø) |
| wnfs/examples/mnemonic_based.rs | 0.00% <0.00%> (ø) |
| wnfs/src/private/forest/traits.rs | 100.00% <ø> (ø) |
| wnfs/src/private/keys/access.rs | 42.10% <0.00%> (-4.96%) ⬇️ |
| wnfs/src/private/node/keys.rs | 78.94% <85.71%> (-3.11%) ⬇️ |
| wnfs/src/private/file.rs | 78.37% <88.00%> (-0.32%) ⬇️ |
| wnfs-nameaccumulator/src/name.rs | 82.51% <91.66%> (-0.29%) ⬇️ |
| ... and 9 more | |

@expede (Member) commented Jul 10, 2023

I haven't actually plotted the benchmarks or anything, but just looking at the raw numbers it looks like this PR mostly runs as fast or faster than the previous PR 🙌

@expede (Member) commented Jul 10, 2023

Actually, do we have a tool that plots the benches set up? Save me from tabbing back and forth between the plaintext results 😛

@matheus23 (Member, Author)

We have this: https://wnfs-wg.github.io/rs-wnfs/dev/bench/

But I don't think we have anything like that for branches.

@expede (Member) commented Jul 10, 2023

Oh these benchmarks are for native code though, right? Do we have Wasm benches?

@matheus23 (Member, Author)

No Wasm benches currently :/

@expede (Member) commented Jul 10, 2023

(I really dislike the lack of threading in main thread PR comments)

We should at minimum run some manual tests for this PR I think, especially given that GH Issue on the BLAKE3 repo. Setting up Wasm benches would be nice in general (we have wasm-js tests, so maybe extend that for rough microbenches?)
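A rough shape such a wasm-js microbench helper could take (everything here is hypothetical, not existing wnfs tooling): run an async operation a fixed number of times after a warm-up and report the mean wall-clock time.

```js
// Hypothetical microbench helper: time an async op over `reps`
// iterations and return the mean milliseconds per operation.
async function microbench(name, reps, op) {
  // One warm-up run so one-time costs (JIT, Wasm instantiation)
  // aren't counted in the measurement.
  await op();
  const before = performance.now();
  for (let i = 0; i < reps; i++) {
    await op();
  }
  const perOp = (performance.now() - before) / reps;
  console.log(`${name}: ${perOp.toFixed(3)} ms/op`);
  return perOp;
}
```

The same helper would work in Node and in browsers, since `performance.now()` is available in both.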

@expede (Member) commented Jul 11, 2023

Was chatting with @zeeshanlakhani about this, and he says that the right way to bench Wasm is via JS wrappers. It would be nice to have them for all targets (so... two for now, but who knows down the road)

@matheus23 matheus23 mentioned this pull request Jul 11, 2023
@matheus23 (Member, Author) commented Jul 12, 2023

Remaining TODOs:

@matheus23 (Member, Author) commented Jul 19, 2023

I set up this very crude benchmark:

```js
import { PrivateForest, PrivateDirectory } from "./pkg/wnfs_wasm.js";
import { CID } from "multiformats/cid";
import { sha256 } from "multiformats/hashes/sha2";
import { webcrypto } from "one-webcrypto";

class MemoryBlockStore {
    /** Creates a new in-memory block store. */
    constructor() {
        this.store = new Map();
    }

    /** Retrieves an array of bytes from the block store with given CID. */
    async getBlock(cid) {
        const decodedCid = CID.decode(cid);
        return this.store.get(decodedCid.toString());
    }

    /** Stores an array of bytes in the block store. */
    async putBlock(bytes, codec) {
        const hash = await sha256.digest(bytes);
        const cid = CID.create(1, codec, hash);
        this.store.set(cid.toString(), bytes);
        return cid.bytes;
    }
}

const rng = {
    /** Returns random bytes of specified length. */
    randomBytes(count) {
        const array = new Uint8Array(count);
        webcrypto.getRandomValues(array);
        return array;
    }
}

async function test() {
    const something = new Uint8Array(1024 * 1024);

    const store = new MemoryBlockStore();
    const forest = new PrivateForest(rng);
    const dir = new PrivateDirectory(forest.emptyName(), new Date(), rng);

    const reps = 100;
    const before = performance.now();

    for (let i = 0; i < reps; i++) {
        await dir.write(["Hello", "World", "file"], false, something, new Date(), forest, store, rng);
        await dir.store(forest, store, rng);
    }

    const time = performance.now() - before;
    console.log("Time per write & store: ", time / reps);

    await forest.store(store);
}

test();
```

And the variance between runs is bigger than the variance between main and this PR.
So no performance regression @expede :)

EDIT: LOL should've added the benchmark results:

Two runs on c17f6bb (main):

```
$ node test.js
Time per write & store:  351.0440690800175

$ node test.js
Time per write & store:  338.9089437599853
```

Two runs on 05529f5 (this PR):

```
$ node test.js
Time per write & store:  338.4276692800038

$ node test.js
Time per write & store:  315.1771848800033
```
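A quick check in plain JS (data copied from the four runs above) backs up the variance claim:

```js
// Per-write times (ms) from the two runs on each branch above.
const main = [351.0440690800175, 338.9089437599853];
const pr = [338.4276692800038, 315.1771848800033];

const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
const spread = (xs) => Math.max(...xs) - Math.min(...xs);

// The between-branch difference of means is smaller than the
// run-to-run spread on the PR branch.
console.log(mean(main) - mean(pr)); // ~18.2 ms faster on the PR
console.log(spread(main), spread(pr)); // ~12.1 ms and ~23.3 ms run-to-run
```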

@matheus23 (Member, Author) commented Jul 19, 2023

@appcypher This is ready for review :)

@appcypher (Member) left a comment
LGTM! 🎉

3 participants