-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
integrate DAG store and CARv2 in deal-making #6671
Conversation
996feea
to
de13b6e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking really good so far 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
((looks like itests/fstmp889313799
got committed by accident))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be writing CAR v2 files to the hard drive on the client side. I thought we'd discussed avoiding storing a second copy of the clients data on the HDD. @magik6k are you ok with every import for regular files on the client having a CarV2 copy on the hard drive (which will of course be 32 additional GB in many cases).
The current code jumped through a lot of hoops to avoid ever storing a CAR file on the client's machine, and always using OS-pipes when we did file i/o needed for CommP calculation. So I was wondering if the intent is in fact to no longer do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aarshkshah1992 @dirkmc I think there's a misunderstanding in the changes to client import
; this part should be mostly unchanged, as @hannahhoward and @magik6k pointed out above.
IUC, the current state of the system is as follows:
- User provides a raw file.
- Lotus instantiates a new Filestore using the multistore. The multistore combines two things: an actual blockstore (Badger in this case) and a CID => file offset/length tracker. See https://github.com/filecoin-project/go-multistore/blob/master/store.go#L33. There's special-casing for a special type of block called
PosInfo
that encodes the offset of the block within the file, and its length.- What's not clear to me is how Lotus client import leverages that special casing, because we seem to use a normal DAG service that feeds normal IPLD blocks to the blockstore. @hannahhoward -- illuminate?
- When chunking, we do not import blocks into the Badger blockstore, but somehow the PosInfo route is used, so we record CID => (offset, length) in the datastore, thus avoiding creating a copy of the file.
Later, when performing the deal, we simulate forming the CARv1 and feed it to the commP writer as a stream, to calculate the commP. But the CARv1 is never actually written on disk.
The footprint of this operation is just the cost of the CID => (offset, length) positional tracking, which likely wouldn't exceed 3MiB for a 64GiB file chunked with the 1MiB UnixFS chunking policy.
The Graphsync transfer is conducted using the positional mapping (it does not use a Badger store), which is why the original file is needed.
Conversely, this PR introduces these changes:
- User provides a raw file.
- File is chunked into a CARv2 to calculate its root.
- Another CARv2 is generated with the right root.
For a 64GiB raw file, the footprint of the new code is in the order of 128GiB, which is a regression we can't afford.
I suggest to revert the changes in the client import
process.
f648a72
to
72391cc
Compare
8c90489
to
892e146
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, awesome! Yay for this huge change. Most of my comments are suggestions.
However, I have three blocking comments:
- The use of RetrievalProviderNode as a dependency to get the IsUnsealed/UnsealSector methods is super awkward and the dependencies need untangling. I've put suggestions on how to do it.
- The IPFS blockstore integration is super awkward for retrieval, and I'm pretty sure it's broken completely for storage. We either need to fix it or work closely with partners before we ship with this feature removed. I know PowerGate is the main user, I don't know how important that is.
- Unless I am reading the code wrong, it appears that we are still writing second full copies of files out to the hard drive when we import them -- see comments on
doImport
innode/impl/client/import.go
, despite the existence of ReadWriteFilestore which certainly appears like it could be used to avoid doing these writes. This isn't blocking if someone can explain to me why I'm reading the code wrong -- it's fine if I just am not seeing it.
I wonder: Blocking issues 2 & 3 have to do with the person on the client side of deals, not the miner. What is our testing plan for working with this code with people who are making deals? (cause I know we've been focused on beta testing with miners). I realize the dealbot is testing the dealmaking, but it also has space do deal with extra file copies AND it doesn't test the IPFS blockstore functionality.
@@ -169,7 +169,7 @@ func ConfigFullNode(c interface{}) Option { | |||
If(cfg.Client.UseIpfs, | |||
Override(new(dtypes.ClientBlockstore), modules.IpfsClientBlockstore(ipfsMaddr, cfg.Client.IpfsOnlineMode)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still here but it appears that ClientBlockstore is being ignored by the StorageClient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nod, we can't lose this, good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
node/impl/client/client.go
Outdated
|
||
carV2FilePath = resp.CarFilePath | ||
// remove the temp CARv2 file when retrieval is complete | ||
defer os.Remove(carV2FilePath) //nolint:errcheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put this in the markets PR but if this code is deleting CarV2 files, it should be managing creating them to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-emptive approval to unblock merge while I'm on vacation. Will review tomorrow to see if there are any changes before I leave.
@raulk need to audit if blockstore provider to loader and storer in node/modules/graphsync.go are actually being used |
if err := a.imgr().AddLabel(id, "source", "import-local"); err != nil { | ||
return cid.Cid{}, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aarshkshah1992 removing this label causes a functional regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
node/impl/client/client.go
Outdated
prefix, err := merkledag.PrefixForCidVersion(1) | ||
func (a *API) ClientImportLocal(ctx context.Context, r io.Reader) (cid.Cid, error) { | ||
// write payload to temp file | ||
tmpPath, err := a.imgr().NewTempFile(imports.ID(rand.Uint64())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be using an actual ID generated by the import manager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
node/impl/client/client.go
Outdated
nd, err := balanced.Layout(db) | ||
if err != nil { | ||
defer tmpF.Close() //nolint:errcheck | ||
if _, err := io.Copy(tmpF, r); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a performance regression. We just replaced direct UnixFS chunking into a "heavy" CAR file (containing leaf data) with two files (the temp file where we wrote the upload, and the position CAR file). I don't think that's right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should import straight into a heavy CAR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
node/impl/client/client.go
Outdated
id := imports.ID(rand.Uint64()) | ||
tmp, err := a.imgr().NewTempFile(id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this even on the ImportMgr, if it's not managed by it nor it uses an actual ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
e3f4136
to
e4c4699
Compare
This commit removes badger from the deal-making processes, and moves to a new architecture with the dagstore as the cental component on the miner-side, and CARv2s on the client-side. Every deal that has been handed off to the sealing subsystem becomes a shard in the dagstore. Shards are mounted via the LotusMount, which teaches the dagstore how to load the related piece when serving retrievals. When the miner starts the Lotus for the first time with this patch, we will perform a one-time migration of all active deals into the dagstore. This is a lightweight process, and it consists simply of registering the shards in the dagstore. Shards are backed by the unsealed copy of the piece. This is currently a CARv1. However, the dagstore keeps CARv2 indices for all pieces, so when it's time to acquire a shard to serve a retrieval, the unsealed CARv1 is joined with its index (safeguarded by the dagstore), to form a read-only blockstore, thus taking the place of the monolithic badger. Data transfers have been adjusted to interface directly with CARv2 files. On inbound transfers (client retrievals, miner storage deals), we stream the received data into a CARv2 ReadWrite blockstore. On outbound transfers (client storage deals, miner retrievals), we serve the data off a CARv2 ReadOnly blockstore. Client-side imports are managed by the refactored *imports.Manager component (when not using IPFS integration). Just like it before, we use the go-filestore library to avoid duplicating the data from the original file in the resulting UnixFS DAG (concretely the leaves). However, the target of those imports are what we call "ref-CARv2s": CARv2 files placed under the `$LOTUS_PATH/imports` directory, containing the intermediate nodes in full, and the leaves as positional references to the original file on disk. Client-side retrievals are placed into CARv2 files in the location: `$LOTUS_PATH/retrievals`. A new set of `Dagstore*` JSON-RPC operations and `lotus-miner dagstore` subcommands have been introduced on the miner-side to inspect and manage the dagstore. Despite moving to a CARv2-backed system, the IPFS integration has been respected, and it continues to be possible to make storage deals with data held in an IPFS node, and to perform retrievals directly into an IPFS node. NOTE: because the "staging" and "client" Badger blockstores are no longer used, existing imports on the client will be rendered useless. On startup, Lotus will enumerate all imports and print WARN statements on the log for each import that needs to be reimported. These log lines contain these messages: - import lacks carv2 path; import will not work; please reimport - import has missing/broken carv2; please reimport At the end, we will print a "sanity check completed" message indicating the count of imports found, and how many were deemed broken. Co-authored-by: Aarsh Shah <[email protected]> Co-authored-by: Dirk McCormick <[email protected]>
e4c4699
to
9d9fd9d
Compare
) | ||
|
||
replace github.com/multiformats/go-multihash => github.com/multiformats/go-multihash v0.0.14 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not do this without a very good reason. If you do, leave a comment explaining why. As far as I can tell, there was no reason to do this and it makes it impossible to update other dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Post merge review pass - looks really good, only 3 nitpicks
// It returns a stream of events to report progress. | ||
DagstoreInitializeAll(ctx context.Context, params DagstoreInitializeAllParams) (<-chan DagstoreInitializeAllEvent, error) //perm:write | ||
|
||
// DagstoreGC runs garbage collection on the DAG store. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some basic description (or a docs link) on what this operation does would be great.
colors := map[string]color.Attribute{ | ||
"ShardStateAvailable": color.FgGreen, | ||
"ShardStateServing": color.FgBlue, | ||
"ShardStateErrored": color.FgRed, | ||
"ShardStateNew": color.FgYellow, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would drop the ShardState
prefix, since it provides no information to the user, and makes it slightly harder to parse visually
|
||
var dagstoreGcCmd = &cli.Command{ | ||
Name: "gc", | ||
Usage: "Garbage collect the dagstore", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like with API, some more detailed description would be really useful
(Description added by @raulk)
Technical description
This commit removes badger from the deal-making processes, and moves to a new architecture with the dagstore as the cental component on the miner-side, and CARv2s on the client-side.
Every deal that has been handed off to the sealing subsystem becomes a shard in the dagstore. Shards are mounted via the
LotusMount
, which teaches the dagstore how to load the related piece when serving retrievals.When the miner starts the Lotus for the first time with this patch, we will perform a one-time migration of all active deals into the dagstore. This is a lightweight process, and it consists simply of registering the shards in the dagstore.
Shards are backed by the unsealed copy of the piece. This is currently a CARv1. However, the dagstore keeps CARv2 indices for all pieces, so when it's time to acquire a shard to serve a retrieval, the unsealed CARv1 is joined with its index (safeguarded by the dagstore), to form a read-only blockstore, thus taking the place of the monolithic badger.
Data transfers have been adjusted to interface directly with CARv2 files. On inbound transfers (client retrievals, miner storage deals), we stream the received data into a CARv2
ReadWrite
blockstore. On outbound transfers (client storage deals, miner retrievals), we serve the data off a CARv2ReadOnly
blockstore.Client-side imports are managed by the refactored
*imports.Manager
component (when not using IPFS integration). Just like it before, we use the go-filestore library to avoid duplicating the data from the original file in the resulting UnixFS DAG (concretely the leaves). However, the target of those imports are what we call "ref-CARv2s": CARv2 files placed under the$LOTUS_PATH/imports
directory, containing the intermediate nodes in full, and the leaves as positional references to the original file on disk.Client-side retrievals are placed into CARv2 files in the location:
$LOTUS_PATH/retrievals
.A new set of
Dagstore*
JSON-RPC operations andlotus-miner dagstore
subcommands have been introduced on the miner-side to inspect and manage the dagstore.Despite moving to a CARv2-backed system, the IPFS integration has been respected, and it continues to be possible to make storage deals with data held in an IPFS node, and to perform retrievals directly into an IPFS node.
Client imports
Because the "client" Badger blockstores are no longer used, existing imports on the client will be rendered useless.
On startup, Lotus will enumerate all imports and print WARN statements on the log for each import that needs to be reimported. These log lines contain these messages:
At the end, we will print a "sanity check completed" message indicating the count of imports found, and how many were deemed broken.
About this PR
We believe the test coverage to be pretty satisfactory, with a substantial portion of the LoC diff corresponding to tests.
This commit is a squashed version of 200+ commits developed by @aarshkshah1992, @dirkmc, and @raulk over the course of months.
This contribution was thoroughly tested in the M1 milestone with minerX.2: #6852.
Read more about the motivation here:
Co-authored-by: @aarshkshah1992 @dirkmc @raulk.
TODO
Add tests for Dagstore* JSON-RPC operations.Dagstore*
JSON-RPC operations. #7138Add tests for expired deals and slashed deals (coverage for OnDealExpiredOrSlashed).Try to get rid ofStagingBlockstore
,StagingGraphsync
.Try to get rid of/client
datastore and/staging
datastore.If the above is successful, remove the datastore muxing (try to remove the datastore muxing).