feat: dag import --stats (#8237) #8237

Merged
merged 10 commits into from
Sep 23, 2021
36 changes: 31 additions & 5 deletions core/commands/dag/dag.go
@@ -16,9 +16,10 @@ import (
 )

 const (
-	pinRootsOptionName = "pin-roots"
 	progressOptionName = "progress"
 	silentOptionName   = "silent"
+	pinRootsOptionName = "pin-roots"
+	statsOptionName    = "stats"
 )

 // DagCmd provides a subset of commands for interacting with ipld dag objects
@@ -53,9 +54,15 @@ type ResolveOutput struct {
 	RemPath string
 }

+type CarImportStats struct {
+	BlockCount        uint64
+	PayloadBytesCount uint64

@lidel (Member), Sep 9, 2021:

Payload can be confusing here: is this raw blocks (CAR-metadata) or actual data in each block (CAR-metadata-dagmetadata)?

Perhaps renaming it to BlockBytesCount and setting this to sum of nd.Stat().BlockSize() is a way to remove confusion while keeping this useful no matter what codecs are used inside of blocks?

@gammazero (Contributor), Sep 9, 2021:

Renamed PayloadBytesCount to BlockBytesCount

It happens that nd.Size() and nd.Stat().BlockSize return the same value, so I think it is better to use nd.Size(), given the comment here.

Member:

Hm.. if they are the same, it makes sense, but something feels off when I compare Size imported from CAR with value reported by ipfs dag stat:

$ ipfs dag stat bafybeihcyruaeza7uyjd6ugicbcrqumejf6uf353e5etdkhotqffwtguva                                        
Size: 27676801, NumBlocks: 383

$ ipfs dag export bafybeihcyruaeza7uyjd6ugicbcrqumejf6uf353e5etdkhotqffwtguva > test.car                           
 0s  26.41 MiB / ? [--------------------------------------------------------------------------------=-----------------------] 390.25 MiB/s 0s
 
$ ipfs dag import --stats test.car
Pinned root	bafybeihcyruaeza7uyjd6ugicbcrqumejf6uf353e5etdkhotqffwtguva	success
Imported 383 blocks (125832269 bytes)

125832269 bytes is ~125 MB, which is way more than 26 MB.
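
To make the comparison above concrete, here is a small arithmetic sketch in Go using only the two byte counts quoted in this comment. The helper name `toMiB` is made up for illustration; nothing here is part of the PR.

```go
package main

import "fmt"

// toMiB converts a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
func toMiB(n uint64) float64 {
	return float64(n) / (1024 * 1024)
}

func main() {
	const dagStatSize uint64 = 27676801    // "Size" printed by ipfs dag stat above
	const importedBytes uint64 = 125832269 // bytes printed by ipfs dag import --stats above

	fmt.Printf("dag stat: %.2f MiB\n", toMiB(dagStatSize))
	fmt.Printf("import:   %.2f MiB\n", toMiB(importedBytes))
	fmt.Printf("ratio:    %.2fx\n", float64(importedBytes)/float64(dagStatSize))
}
```

The dag stat figure comes out at about 26.39 MiB, close to the 26.41 MiB shown by the export progress bar, while the import figure is roughly 4.5 times larger.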

@gammazero (Contributor), Sep 10, 2021:

@lidel What I did previously (using nd.Size()) worked for the car files in sharness/t0054-dag-car-import-export-data/ but does not work for your example. Using nd.Stat().DataSize + nd.Stat().LinksSize works for your example, but not for the test cars/dags. The test cars/dags have stats with all zeros, for almost all blocks.

It appears that nd.Size() returns nd.Stat().CumulativeSize if a block has stat values. Otherwise, nd.Size() returns len(nd.RawData()). This makes both nd.Size() and nd.Stat() completely unreliable across different dags.

Apparently, the way to get a reliable size is to always use len(nd.RawData()). So, that is what the latest change does.
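
A toy model of the behavior described in this comment may help show why summing a cumulative size over every block overcounts. Everything below is hypothetical: `toyNode`, its fields, and `totals` are illustration-only names and are not the go-ipld-format API; the fallback rule simply mirrors what the comment reports.

```go
package main

import "fmt"

// toyNode is a hypothetical stand-in for a decoded DAG node.
// It is NOT the go-ipld-format Node interface.
type toyNode struct {
	raw            []byte // encoded bytes of this block only
	cumulativeSize uint64 // this block plus everything it links to; 0 means "no stat values"
}

// size mimics the behavior reported above: the cumulative size when stat
// values are present, otherwise the length of the raw block data.
func (n toyNode) size() uint64 {
	if n.cumulativeSize != 0 {
		return n.cumulativeSize
	}
	return uint64(len(n.raw))
}

// totals sums both measures over the same set of nodes.
func totals(nodes []toyNode) (bySize, byRaw uint64) {
	for _, n := range nodes {
		bySize += n.size()
		byRaw += uint64(len(n.raw))
	}
	return bySize, byRaw
}

func main() {
	leafA := toyNode{raw: make([]byte, 100)}
	leafB := toyNode{raw: make([]byte, 200)}
	// The root is 50 bytes on its own but reports 350 cumulatively.
	root := toyNode{raw: make([]byte, 50), cumulativeSize: 350}

	bySize, byRaw := totals([]toyNode{root, leafA, leafB})
	fmt.Println(bySize, byRaw) // 650 350
}
```

In this model the children are counted once on their own and again inside the root's cumulative size, which is the kind of inflation seen in the 125832269 byte figure, whereas the raw-length sum counts each block exactly once.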

Member Author:

DataSize and LinksSize sound like a dag-pb thing, which probably won't be represented in the fixture files.

What stat are we actually after here? @ribasushi, what's your expectation of what this sizing is going to report? I would think that it's the size of the output, which includes CAR header, CID lengths and even varint section size prefixes. But I could find that by measuring the size of the output myself, so the utility doesn't seem great. But what is the utility of reporting just the block sizes? What is that useful for?

Contributor:

I would expect the ipfs dag stat number. It is useful in terms of "this is the amount of IPLD-data these blocks hold"

Member Author:

Thanks @ribasushi. len(nd.RawData()) is probably the one then, but in this case maybe just use len(block.RawData()). Then you get to use the block as it comes out of the CAR rather than whatever happens to it through a Decode cycle (probably the same, but it seems safer to use the one closer to the original).

+}
+
 // CarImportOutput is the output type of the 'dag import' commands
 type CarImportOutput struct {
-	Root RootMeta
+	Root  *RootMeta       `json:",omitempty"`
+	Stats *CarImportStats `json:",omitempty"`
 }

 // RootMeta is the metadata for a root pinning response
@@ -160,8 +167,10 @@ var DagResolveCmd = &cmds.Command{
 }

 type importResult struct {
-	roots map[cid.Cid]struct{}
-	err   error
+	blockCount        uint64
+	payloadBytesCount uint64
+	roots             map[cid.Cid]struct{}
+	err               error
 }

 // DagImportCmd is a command for importing a car to ipfs
@@ -193,8 +202,9 @@ Maximum supported CAR version: 1
 		cmds.FileArg("path", true, true, "The path of a .car file.").EnableStdin(),
 	},
 	Options: []cmds.Option{
-		cmds.BoolOption(silentOptionName, "No output."),
 		cmds.BoolOption(pinRootsOptionName, "Pin optional roots listed in the .car headers after importing.").WithDefault(true),
+		cmds.BoolOption(silentOptionName, "No output."),
+		cmds.BoolOption(statsOptionName, "Output stats."),
 	},
 	Type: CarImportOutput{},
 	Run:  dagImport,
@@ -206,6 +216,22 @@ Maximum supported CAR version: 1
 	return nil
 }

+// event should have only one of `Root` or `Stats` set, not both
+if event.Root == nil {
+	if event.Stats == nil {
+		return fmt.Errorf("Unexpected message from DAG import")
+	}
+	stats, _ := req.Options[statsOptionName].(bool)
+	if stats {
+		fmt.Fprintf(w, "Imported %d blocks (%d bytes)\n", event.Stats.BlockCount, event.Stats.PayloadBytesCount)
+	}
+	return nil
+}
+
+if event.Stats != nil {
+	return fmt.Errorf("Unexpected message from DAG import")
+}
+
 enc, err := cmdenv.GetLowLevelCidEncoder(req)
 if err != nil {
 	return err
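
The combination of pointer fields with `json:",omitempty"` and the "exactly one of Root or Stats" check in the encoder can be sketched in isolation. This is a simplified, standalone model rather than the go-ipfs code: the type and function names (`importEvent`, `rootMeta`, `importStats`, `classify`, `encode`) are invented here, and `Cid` is reduced to a plain string as an assumption for brevity.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// rootMeta and importStats are simplified stand-ins for the PR's types.
type rootMeta struct {
	Cid         string // assumption: a plain string instead of a real CID type
	PinErrorMsg string
}

type importStats struct {
	BlockCount        uint64
	PayloadBytesCount uint64
}

// importEvent carries either a root or the stats. Nil pointers are dropped
// from the JSON output because of omitempty.
type importEvent struct {
	Root  *rootMeta    `json:",omitempty"`
	Stats *importStats `json:",omitempty"`
}

// classify accepts an event only when exactly one field is set.
func classify(ev importEvent) (string, error) {
	switch {
	case ev.Root != nil && ev.Stats == nil:
		return "root", nil
	case ev.Root == nil && ev.Stats != nil:
		return "stats", nil
	default:
		return "", errors.New("unexpected message from DAG import")
	}
}

// encode renders one event as a single JSON line.
func encode(ev importEvent) (string, error) {
	out, err := json.Marshal(ev)
	return string(out), err
}

func main() {
	ev := importEvent{Stats: &importStats{BlockCount: 1198, PayloadBytesCount: 468513}}
	line, err := encode(ev)
	if err != nil {
		panic(err)
	}
	fmt.Println(line) // {"Stats":{"BlockCount":1198,"PayloadBytesCount":468513}}

	kind, err := classify(ev)
	fmt.Println(kind, err) // stats <nil>
}
```

Using pointers is what lets a stats-only event omit the `Root` key entirely, which matches the shape of the JSON lines expected by the sharness tests in this PR.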
24 changes: 22 additions & 2 deletions core/commands/dag/import.go
@@ -101,7 +101,7 @@ func dagImport(req *cmds.Request, res cmds.ResponseEmitter, env cmds.Environment
 			failedPins++
 		}

-		if err := res.Emit(&CarImportOutput{Root: ret}); err != nil {
+		if err := res.Emit(&CarImportOutput{Root: &ret}); err != nil {
 			return err
 		}
 	}
@@ -115,6 +115,19 @@ func dagImport(req *cmds.Request, res cmds.ResponseEmitter, env cmds.Environment
 		}
 	}

+	stats, _ := req.Options[statsOptionName].(bool)
+	if stats {
+		err = res.Emit(&CarImportOutput{
+			Stats: &CarImportStats{
+				BlockCount:        done.blockCount,
+				PayloadBytesCount: done.payloadBytesCount,
+			},
+		})
+		if err != nil {
+			return err
+		}
+	}
+
 	return nil
 }

@@ -126,6 +139,7 @@ func importWorker(req *cmds.Request, re cmds.ResponseEmitter, api iface.CoreAPI,
 	batch := ipld.NewBatch(req.Context, api.Dag())

 	roots := make(map[cid.Cid]struct{})
+	var blockCount, payloadBytesCount uint64

 	it := req.Files.Entries()
 	for it.Next() {
@@ -176,6 +190,9 @@ func importWorker(req *cmds.Request, re cmds.ResponseEmitter, api iface.CoreAPI,
 			if err := batch.Add(req.Context, nd); err != nil {
 				return err
 			}
+			blockCount++
lidel marked this conversation as resolved.
+			ndSize, _ := nd.Size()
+			payloadBytesCount += ndSize
 		}

 		return nil
@@ -197,5 +214,8 @@ func importWorker(req *cmds.Request, re cmds.ResponseEmitter, api iface.CoreAPI,
 		return
 	}

-	ret <- importResult{roots: roots}
+	ret <- importResult{
+		blockCount:        blockCount,
+		payloadBytesCount: payloadBytesCount,
+		roots:             roots}
 }
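
The pattern in import.go, where a worker accumulates counters and hands them back in a single result value over a channel, can be sketched on its own. The sketch below counts bytes with len(RawData()) as suggested in the review thread rather than nd.Size() as in this revision of the diff. All names (`rawBlock`, `memBlock`, `result`, `countWorker`, `run`) are hypothetical and chosen for illustration; no IPFS packages are involved.

```go
package main

import "fmt"

// rawBlock is a minimal, hypothetical interface: anything that can return
// its encoded bytes.
type rawBlock interface {
	RawData() []byte
}

// memBlock is an in-memory block used only for this example.
type memBlock []byte

func (b memBlock) RawData() []byte { return b }

// result mirrors the shape of a worker result that carries counters.
type result struct {
	blockCount uint64
	bytesCount uint64
	err        error
}

// countWorker walks the blocks, counting each one and the length of its
// raw data, then sends a single result on the channel.
func countWorker(blocks []rawBlock, ret chan<- result) {
	var res result
	for _, b := range blocks {
		res.blockCount++
		res.bytesCount += uint64(len(b.RawData()))
	}
	ret <- res
}

// run starts the worker and waits for its result.
func run(blocks []rawBlock) result {
	ret := make(chan result, 1)
	go countWorker(blocks, ret)
	return <-ret
}

func main() {
	res := run([]rawBlock{memBlock("abc"), memBlock("hello")})
	fmt.Printf("Imported %d blocks (%d bytes)\n", res.blockCount, res.bytesCount)
	// prints: Imported 2 blocks (8 bytes)
}
```

Keeping the counters local to the worker and sending them once at the end avoids any shared state between the worker and the code that emits the final stats message.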
2 changes: 1 addition & 1 deletion test/sharness/lib/test-lib.sh
@@ -269,7 +269,7 @@ test_launch_ipfs_daemon() {

   # wait for api file to show up
   test_expect_success "api file shows up" '
-    test_wait_for_file 50 100ms "$IPFS_PATH/api"
+    test_wait_for_file 50 200ms "$IPFS_PATH/api"
   '

   test_set_address_vars actual_daemon
2 changes: 1 addition & 1 deletion test/sharness/t0041-ping.sh
@@ -43,7 +43,7 @@ test_expect_success "test ping 0" '
 '

 test_expect_success "test ping offline" '
-  iptb stop 1 &&
+  iptb stop 1 && sleep 2 &&
   ! ipfsi 0 ping -n2 -- "$PEERID_1"
 '

20 changes: 13 additions & 7 deletions test/sharness/t0054-dag-car-import-export.sh
@@ -41,7 +41,7 @@ do_import() {
   while [[ -e spin.gc ]]; do ipfsi "$node" repo gc &>/dev/null; done &
   while [[ -e spin.gc ]]; do ipfsi "$node" repo gc &>/dev/null; done &

-  ipfsi "$node" dag import "$@" 2>&1 && ipfsi "$node" repo verify &>/dev/null
+  ipfsi "$node" dag import --stats "$@" 2>&1 && ipfsi "$node" repo verify &>/dev/null
   result=$?

   rm -f spin.gc &>/dev/null
@@ -56,6 +56,7 @@ run_online_imp_exp_tests() {
   reset_blockstore 1

 cat > basic_import_expected <<EOE
+Imported 1198 blocks (468513 bytes)
 Pinned root${tab}bafkqaaa${tab}success
 Pinned root${tab}bafy2bzaceaxm23epjsmh75yvzcecsrbavlmkcxnva66bkdebdcnyw3bjrc74u${tab}success
 Pinned root${tab}bafy2bzaced4ueelaegfs5fqu4tzsh6ywbbpfk3cxppupmxfdhbpbhzawfw5oy${tab}success
@@ -64,6 +65,7 @@ EOE
 cat >naked_root_import_json_expected <<EOE
 {"Root":{"Cid":{"/":"bafy2bzaceaxm23epjsmh75yvzcecsrbavlmkcxnva66bkdebdcnyw3bjrc74u"},"PinErrorMsg":""}}
 {"Root":{"Cid":{"/":"bafy2bzaced4ueelaegfs5fqu4tzsh6ywbbpfk3cxppupmxfdhbpbhzawfw5oy"},"PinErrorMsg":""}}
+{"Stats":{"BlockCount":0,"PayloadBytesCount":0}}
 EOE


@@ -98,7 +100,7 @@ EOE
   '

   test_expect_success "import/pin naked roots only, relying on local blockstore having all the data" '
-    ipfsi 1 dag import --enc=json ../t0054-dag-car-import-export-data/combined_naked_roots_genesis_and_128.car \
+    ipfsi 1 dag import --stats --enc=json ../t0054-dag-car-import-export-data/combined_naked_roots_genesis_and_128.car \
       > naked_import_result_json_actual
   '

@@ -173,28 +175,32 @@ cat >multiroot_import_json_expected <<EOE
 {"Root":{"Cid":{"/":"bafy2bzaceb55n7uxyfaelplulk3ev2xz7gnq6crncf3ahnvu46hqqmpucizcw"},"PinErrorMsg":""}}
 {"Root":{"Cid":{"/":"bafy2bzacebedrc4n2ac6cqdkhs7lmj5e4xiif3gu7nmoborihajxn3fav3vdq"},"PinErrorMsg":""}}
 {"Root":{"Cid":{"/":"bafy2bzacede2hsme6hparlbr4g2x6pylj43olp4uihwjq3plqdjyrdhrv7cp4"},"PinErrorMsg":""}}
+{"Stats":{"BlockCount":2825,"PayloadBytesCount":1339709}}
 EOE
 test_expect_success "multiroot import works" '
-  ipfs dag import --enc=json ../t0054-dag-car-import-export-data/lotus_testnet_export_256_multiroot.car > multiroot_import_json_actual
+  ipfs dag import --stats --enc=json ../t0054-dag-car-import-export-data/lotus_testnet_export_256_multiroot.car > multiroot_import_json_actual
 '
 test_expect_success "multiroot import expected output" '
   test_cmp_sorted multiroot_import_json_expected multiroot_import_json_actual
 '


+cat >pin_import_expected << EOE
+{"Stats":{"BlockCount":1198,"PayloadBytesCount":468513}}
+EOE
 test_expect_success "pin-less import works" '
-  ipfs dag import --enc=json --pin-roots=false \
+  ipfs dag import --stats --enc=json --pin-roots=false \
   ../t0054-dag-car-import-export-data/lotus_devnet_genesis.car \
   ../t0054-dag-car-import-export-data/lotus_testnet_export_128.car \
     > no-pin_import_actual
 '
-test_expect_success "expected silence on --pin-roots=false" '
-  test_cmp /dev/null no-pin_import_actual
+test_expect_success "expected no pins on --pin-roots=false" '
+  test_cmp pin_import_expected no-pin_import_actual
 '


 test_expect_success "naked root import works" '
-  ipfs dag import --enc=json ../t0054-dag-car-import-export-data/combined_naked_roots_genesis_and_128.car \
+  ipfs dag import --stats --enc=json ../t0054-dag-car-import-export-data/combined_naked_roots_genesis_and_128.car \
     > naked_root_import_json_actual
 '
 test_expect_success "naked root import expected output" '