feat: Streaming write for PrivateFile #163

Merged
merged 46 commits into from
Feb 22, 2023


matheus23
Member

@matheus23 matheus23 commented Feb 8, 2023

Closes #152

This is a minimal-effort PR for importing large data into WNFS without excessive memory requirements, by streaming it in during file creation.
The incoming stream is split into private file chunks, which are encrypted and serialized one by one. Each chunk is written before the next one is fetched, so memory use should stay roughly constant regardless of file size.
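
A minimal sketch of the chunk-at-a-time loop described above, using a synchronous `std::io::Read` for brevity. `stream_chunks` and the `store` callback are hypothetical names for illustration, not the API added in this PR; in real code, `store` is where a chunk would be encrypted, serialized and written to the block store.

```rust
use std::io::{self, Read};

/// Reads `reader` in chunks of at most `chunk_size` bytes and hands each chunk
/// to `store` before reading the next one, so only one chunk is buffered at a time.
/// Returns the number of chunks produced.
fn stream_chunks<R: Read>(
    mut reader: R,
    chunk_size: usize,
    mut store: impl FnMut(usize, &[u8]) -> io::Result<()>,
) -> io::Result<usize> {
    // The single reusable buffer is the only allocation that scales with chunk size.
    let mut buf = vec![0u8; chunk_size];
    let mut index = 0;
    loop {
        // Fill the buffer as far as possible: a short read does not imply end of input.
        let mut filled = 0;
        while filled < chunk_size {
            let n = reader.read(&mut buf[filled..])?;
            if n == 0 {
                break;
            }
            filled += n;
        }
        if filled == 0 {
            break; // nothing left to write
        }
        // Placeholder for "encrypt, serialize, write" (assumption, not the PR's code).
        store(index, &buf[..filled])?;
        index += 1;
        if filled < chunk_size {
            break; // the last chunk was a partial one
        }
    }
    Ok(index)
}
```

Because the buffer is reused, peak memory depends on `chunk_size` rather than on the total length of the input.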

In the future we should look at two more things:

  1. We want to provide better APIs in PrivateDirectory::write, which currently only takes a Vec<u8> of bytes. We probably need something akin to a PrivateDirectory::open that returns a PrivateFile, which can later be passed to PrivateDirectory::write (perhaps keeping the old version too, because it's a nice shortcut for small files).
  2. The most flexible API would be something like a PrivateFile that implements AsyncRead + AsyncWrite + AsyncSeek. However, to make that performant on big files we need content-defined chunking.
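
To illustrate what content-defined chunking means in the second point, here is a rough sketch that picks chunk boundaries from the data itself with a gear-style rolling hash. It is not part of this PR: `gear`, `cdc_boundaries`, the mixing constant and the `min`/`max`/`mask` parameters are all assumptions chosen for the example.

```rust
/// Hypothetical per-byte mixing function, standing in for the precomputed
/// random table that gear-style hashes normally use.
fn gear(byte: u8) -> u64 {
    (byte as u64 + 1).wrapping_mul(0x9E37_79B9_7F4A_7C15)
}

/// Returns the end offsets of each chunk in `data`.
/// A boundary is placed where the rolling hash matches `mask`, subject to
/// a minimum and maximum chunk length.
fn cdc_boundaries(data: &[u8], min: usize, max: usize, mask: u64) -> Vec<usize> {
    let mut cuts = Vec::new();
    let mut start = 0;
    let mut hash: u64 = 0;
    for (i, &b) in data.iter().enumerate() {
        // Shifting left ages out old bytes, so the hash depends on recent input only.
        hash = (hash << 1).wrapping_add(gear(b));
        let len = i + 1 - start;
        if len >= max || (len >= min && hash & mask == 0) {
            cuts.push(i + 1);
            start = i + 1;
            hash = 0;
        }
    }
    // Whatever remains after the last boundary forms the final chunk.
    if start < data.len() {
        cuts.push(data.len());
    }
    cuts
}
```

Since boundaries depend on nearby bytes rather than on absolute offsets, an edit in the middle of a large file tends to change only the chunks around it, which is what would make seek-and-write on big files cheaper than with fixed-size chunks.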

appcypher and others added 30 commits January 25, 2023 23:37
It can always be derived from `revision_key`.
Storing it will only make it possible for `revision_key` and
`content_key` to get out-of-sync.
instead of an encrypted set of Cids
So:
- PrivateNodeHeader gets its own block
- PrivateFile and PrivateDirectory refer back to the header via a CID
- PrivateRef gets its own "disambiguation pointer" content_cid
- PrivateForest now resolves PrivateRefs
- PrivateRefs always refer to pre-existing content, never to "open slots"
instead of `RevisionKey` and `ContentKey`, respectively.
Also, make use of `Rc::make_mut`, accordingly.
@matheus23 matheus23 marked this pull request as ready for review February 9, 2023 11:53
@matheus23 matheus23 requested a review from a team as a code owner February 9, 2023 11:53
@matheus23
Member Author

Flipping this to ready for review.
That said, the merge order should be:
#148
#159
Then this.

Base automatically changed from matheus23/nonnested to main February 20, 2023 16:26
@matheus23 matheus23 changed the title Streaming write for PrivateFile feat: Streaming write for PrivateFile Feb 20, 2023
wnfs/src/private/file.rs (outdated, resolved)
wnfs/src/private/file.rs (outdated, resolved)
@matheus23
Member Author

matheus23 commented Feb 22, 2023

For the fixture file used in the test, I went with a recording of Clara Schumann's Scherzo No. 2. The recording is by Luis Sarro and I downloaded it from musopen, so it should be in the public domain. It's a classical music piece and Clara Schumann rocks (even though we've forgotten about her nowadays and her husband took the spotlight in today's history books).

Honestly, I'm up for other ideas though! If anyone knows some cool fixture file to use that is >1MB and <10MB, let me know.

Member

@appcypher appcypher left a comment


LGTM! 🎉

@matheus23 matheus23 merged commit 1bfe89b into main Feb 22, 2023
@matheus23 matheus23 deleted the matheus23/streaming-write branch February 22, 2023 16:14
This was referenced Feb 22, 2023
Development

Successfully merging this pull request may close these issues.

Streaming Write for PrivateFile
2 participants