Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HAMT diff #92

Closed
walkah opened this issue Nov 28, 2022 · 5 comments · Fixed by #94
Closed

HAMT diff #92

walkah opened this issue Nov 28, 2022 · 5 comments · Fixed by #94
Assignees

Comments

@walkah
Copy link

walkah commented Nov 28, 2022

Allows us to see differences between two HAMT nodes.
This allows us to see what's changed between changes in the filesystem and enables checking writes.

@cdata
Copy link

cdata commented Dec 14, 2022

I want to share a use case for this feature:

For Noosphere, we keep a simple changelog that expresses the delta changes to a HAMT in terms of add and remove operations, where add encompasses both insert and update. We use this to infer a sparse CAR that represents the blocks needed to recreate a change to the data at any given revision assuming you already have all the blocks for the previous revision. This allows us to send fairly compact bundles of blocks when sync'ing changes to our data structure.

Short of CAR Mirror existing in a form we can use, we would need some way to do this same thing with WNFS in order for it to be a viable alternative to our current approach to storing content. The ability to diff two versions of the filesystem would probably satisfy this need for us.

@matheus23
Copy link
Member

matheus23 commented Dec 14, 2022

@cdata Thanks for the input!

There's a slight difference between that version of diffing and the diffing on HAMTs we're building.
We're specifically figuring out which key/value pairs were added in a HAMT.
It looks like the diffing you're interested in (which is... pretty similar? to #87 ), is finding the actual blocks that changed between two DAG roots.
I think such operations may live on another layer in the future.
E.g. this may be a feature of blockstore implementations such as Iroh, who can use graph queries to find these diffs fast and efficiently.
Thoughts?

@cdata
Copy link

cdata commented Dec 14, 2022

We're specifically figuring out which key/value pairs were added in a HAMT.

That's exactly what we do today with the changelog, so this is the feature I'm looking for (I think).

I vaguely assumed that CAR Mirror could spare us from this book keeping, but maybe it has other uses (or maybe my assumption was wrong).

@matheus23
Copy link
Member

I vaguely assumed that CAR Mirror could spare us from this book keeping, but maybe it has other uses (or maybe my assumption was wrong).

Which book keeping are you referring to?
CAR mirror is meant to solve DAG synchronization between two peers with deduplication, and acceptable trade-offs for latency and round-trips, so yes, I guess?

@cdata
Copy link

cdata commented Dec 14, 2022

Per my original comment, this is what we do today:

For Noosphere, we keep a simple changelog that expresses the delta changes to a HAMT in terms of add and remove operations, where add encompasses both insert and update. We use this to infer a sparse CAR that represents the blocks needed to recreate a change to the data at any given revision assuming you already have all the blocks for the previous revision. This allows us to send fairly compact bundles of blocks when sync'ing changes to our data structure.

In other words, we are using a changelog to track the delta e.g., which key/value pairs were added or removed from a HAMT between any two versions.

Today this is essential to our synchronization strategy because we don't have a sophisticated solution to synchronize blocks (like CAR Mirror), and instead rely on these HAMT deltas to decide which bits of user content need to be transferred given a range of versions of our data structure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants