Add a debugging form for car files. #341

Merged
8 commits merged on Nov 18, 2022

willscott (Member)

This change adds two new sub-commands to the car CLI:

car debug file.car

creates a patch-file-compatible representation of the content of the car file. Blocks will be represented in dag-json pretty-printed form.

car compile file.patch

will do the inverse process of building a car file from a debug patch file. CIDs will be re-compiled based on the contents of blocks, with links in parent blocks updated to point to the compiled values.

An example debug patch of the CAR used in the testscript test fixture would be:

```
car compile --v2 small.car
root bafybeidx5vxxny6ca3mgs5d6wy5ubwcibpirctmktpkvfk4io34i2ww2hy
for raw: bytes are 68656c6c6f20776f726c640a
--- bafkreifjjcie6lypi6ny7amxnfftagclbuxndqonfipmb64f2km2devei4
+++ raw bafkreifjjcie6lypi6ny7amxnfftagclbuxndqonfipmb64f2km2devei4
@@ -0,1 +0,1 @@
hello world

--- bafybeidx5vxxny6ca3mgs5d6wy5ubwcibpirctmktpkvfk4io34i2ww2hy
+++ json (no-end-cr) bafybeidx5vxxny6ca3mgs5d6wy5ubwcibpirctmktpkvfk4io34i2ww2hy
@@ -0,15 +0,15 @@
{
  "Data": {
    "/": {
      "bytes": "CAE"
    }
  },
  "Links": [
    {
      "Hash": {
        "/": "bafkreifjjcie6lypi6ny7amxnfftagclbuxndqonfipmb64f2km2devei4"
      },
      "Name": "foo.txt",
      "Tsize": 12
    }
  ]
}
```
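To make the section structure of the example concrete, here is a minimal Go sketch that splits such a patch into its header lines and its `--- <cid>` / `+++ <codec> [(<opts>)] <cid>` / `@@ ... @@` block sections. It is a simplified reading of the example above, not go-car's actual compile parser: `patchBlock` and `parseSections` are hypothetical names, the single blank line between sections is assumed to be a separator, and CIDs are treated as opaque strings (the demo uses placeholders).

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// patchBlock is a hypothetical in-memory form of one block section.
type patchBlock struct {
	CID   string   // from the "--- <cid>" line
	Codec string   // first token after "+++", e.g. "raw" or "json"
	Opts  []string // parenthesised options such as "no-end-cr"
	Body  []string // lines after the "@@ ... @@" hunk header
}

// parseSections splits a debug patch into header lines (everything before the
// first "--- " line) and block sections. bufio.Scanner stops with an error on
// lines longer than its default 64 KiB limit.
func parseSections(patch string) ([]string, []patchBlock, error) {
	var header []string
	var blocks []patchBlock
	var cur *patchBlock
	inBody := false

	flush := func() {
		if cur == nil {
			return
		}
		// Assumption: one empty line separates consecutive sections.
		if n := len(cur.Body); n > 0 && cur.Body[n-1] == "" {
			cur.Body = cur.Body[:n-1]
		}
		blocks = append(blocks, *cur)
	}

	sc := bufio.NewScanner(strings.NewReader(patch))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.HasPrefix(line, "--- "): // start of a new block section
			flush()
			cur = &patchBlock{CID: strings.TrimPrefix(line, "--- ")}
			inBody = false
		case cur == nil: // still in the header
			header = append(header, line)
		case inBody:
			cur.Body = append(cur.Body, line)
		case strings.HasPrefix(line, "+++ "): // "+++ <codec> [(<opt>)...] <cid>"
			fields := strings.Fields(strings.TrimPrefix(line, "+++ "))
			if len(fields) < 2 || fields[len(fields)-1] != cur.CID {
				return nil, nil, fmt.Errorf("malformed +++ line: %q", line)
			}
			cur.Codec = fields[0]
			for _, f := range fields[1 : len(fields)-1] {
				cur.Opts = append(cur.Opts, strings.Trim(f, "()"))
			}
		case strings.HasPrefix(line, "@@"): // hunk header: the body follows
			inBody = true
		default:
			return nil, nil, fmt.Errorf("unexpected line before @@: %q", line)
		}
	}
	if err := sc.Err(); err != nil {
		return nil, nil, err
	}
	flush()
	return header, blocks, nil
}

func main() {
	// Placeholder strings stand in for real CIDs.
	patch := "car compile small.car\n" +
		"root bafyroot\n" +
		"--- bafyleaf\n" +
		"+++ raw bafyleaf\n" +
		"@@ -0,1 +0,1 @@\n" +
		"hello world\n" +
		"\n"
	header, blocks, err := parseSections(patch)
	if err != nil {
		panic(err)
	}
	fmt.Println(header)
	for _, b := range blocks {
		fmt.Printf("%s %s %v %q\n", b.CID, b.Codec, b.Opts, b.Body)
	}
}
```

Because the scan treats any line starting with `--- ` as a new section, a block body that itself contains such a line would be split incorrectly; that is the hazard raised later in the review.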

b5 commented Nov 7, 2022

Fully support having this!

rvagg (Member) commented Nov 14, 2022

Windows failures, probably CRLF-related?

Also, this is weird on macOS: https://github.com/ipld/go-car/actions/runs/3410523283/jobs/5673582480. Maybe just a flake, since it's passing on the other 3 macOS runners, but it seems cmd-related.

willscott (Member Author)

There's some sort of flakiness with testscript in general.

@rvagg, are you okay with the design / the proposed debug patch format, modulo the test?


```go
outStream.WriteString("car compile ")
if rd.Version == 2 {
	outStream.WriteString("--v2 ")
}
```

Member

It looks like this is the only hint that it once was a v2, and there's no space to say anything else about it? I guess mostly we care about stability of v1 forms and the v2 is just a convenience wrapper, but if we went ahead with additional features, such as the messaging capability in #322, where would we put these things? Would it be hard to extend this format to include those things, and could we do it in a non-breaking way?

Member Author

this mirrors `git patch`, where the first line mirrors the command used to generate the patch.

Here we use --v2 to indicate whether the original CAR was a v2, and can use that as the default to rebuild the same CAR format if it is not specified explicitly on the command line when re-compiling.
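A small Go sketch of that precedence rule, assuming the first line has the form `car compile [--v2] <file>` shown in the example: an explicit flag wins, otherwise the header hint decides. `headerSaysV2` and `outputVersion` are hypothetical helpers, and the `*bool` (nil meaning "flag not given") is just one way to model an unset flag.

```go
package main

import (
	"fmt"
	"strings"
)

// headerSaysV2 reports whether the first patch line, e.g.
// "car compile --v2 small.car", records that the source CAR was a v2.
func headerSaysV2(firstLine string) bool {
	fields := strings.Fields(firstLine)
	if len(fields) < 2 || fields[0] != "car" || fields[1] != "compile" {
		return false
	}
	for _, f := range fields[2:] {
		if f == "--v2" {
			return true
		}
	}
	return false
}

// outputVersion chooses the CAR version to write. cliV2 is nil when the user
// did not pass the flag explicitly; only then does the header hint apply.
func outputVersion(firstLine string, cliV2 *bool) int {
	wantV2 := headerSaysV2(firstLine)
	if cliV2 != nil {
		wantV2 = *cliV2
	}
	if wantV2 {
		return 2
	}
	return 1
}

func main() {
	no := false
	fmt.Println(outputVersion("car compile --v2 small.car", nil)) // 2
	fmt.Println(outputVersion("car compile --v2 small.car", &no)) // 1
	fmt.Println(outputVersion("car compile small.car", nil))      // 1
}
```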

cmd/car/compile.go: 3 outdated, resolved review threads
```go
if err != nil {
	return err
}
if strings.HasPrefix(string(rootLine), "root ") {
	// ...
}
```

Member

ok, so this is not exhaustive: if it doesn't match root, then it continues to loop and drops rootLine. So is this where we get potential backward compatibility for additional v2 features that we can insert with ---, and also the --foobar arguments in the header?

Member

(continuing my thought below about v2 features, where I started my comments on debug first before moving up here to compile)

Member Author

yes, other meta info would go as lines in this section.
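As a hedged sketch of how a reader can stay forward-compatible with extra meta lines: collect `root <cid>` lines, keep any other non-empty line in the header without interpreting it, and stop at the first `--- ` line. `patchHeader` and `splitHeader` are hypothetical names, and the `message hello` line in the demo is an invented placeholder, not an existing directive.

```go
package main

import (
	"fmt"
	"strings"
)

// patchHeader is a hypothetical parsed form of the lines before the first block.
type patchHeader struct {
	Command string   // first line, e.g. "car compile --v2 small.car"
	Roots   []string // values of "root <cid>" lines, in order
	Extra   []string // other non-empty meta lines, kept but not interpreted
}

// splitHeader consumes lines up to (not including) the first "--- " line and
// returns the parsed header plus the remaining lines. Unrecognised lines are
// collected instead of rejected, so newer writers can add meta lines without
// breaking older readers.
func splitHeader(lines []string) (patchHeader, []string) {
	var h patchHeader
	i := 0
	for ; i < len(lines); i++ {
		line := lines[i]
		if strings.HasPrefix(line, "--- ") {
			break
		}
		switch {
		case i == 0:
			h.Command = line
		case strings.HasPrefix(line, "root "):
			h.Roots = append(h.Roots, strings.TrimPrefix(line, "root "))
		case strings.TrimSpace(line) != "":
			h.Extra = append(h.Extra, line)
		}
	}
	return h, lines[i:]
}

func main() {
	// "message hello" is an invented meta line, used only to show it is tolerated.
	text := "car compile --v2 small.car\nroot bafyroot\nmessage hello\n--- bafyleaf\n+++ raw bafyleaf"
	h, rest := splitHeader(strings.Split(text, "\n"))
	fmt.Println(h.Command)
	fmt.Println(h.Roots, h.Extra, len(rest))
}
```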

cmd/car/compile.go: 3 outdated, resolved review threads

```go
//fmt.Printf("structuring as tree...\n")
// structure as a tree
childMap := make(map[cid.Cid][]cid.Cid)
```

Member

whoa, so I think this whole next section (x2) exists because we're not confident that the resulting CIDs will match the input CIDs, so we're doing a search, reconstruct, replace operation on them all, is that right?

  1. why do we not have confidence they're going to reconstruct byte-perfect? Shouldn't that be reasonable? There might be a CIDv0/CIDv1 difference, but we could easily check for that: if the expected CID is v0, downcast the actual CID and compare.
  2. why don't we just error if the resulting CID of the reconstructed block doesn't match the expected one?
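For point 1, instead of literally down-casting, one can compare the codec and the multihash, which ignores the CID version: a CIDv0 always implies dag-pb (multicodec 0x70), so a CIDv1 dag-pb CID with the same multihash addresses the same block. The sketch below uses a hypothetical pre-decoded `cidParts` struct to stay dependency-free; with go-cid the fields would come from `Cid.Version()`, `Cid.Type()` and `Cid.Hash()`. Returning an error on a `false` result would cover point 2.

```go
package main

import (
	"bytes"
	"fmt"
)

// dagPB is the multicodec code for dag-pb, the codec implied by every CIDv0.
const dagPB = 0x70

// cidParts is a hypothetical, already-decoded view of a CID.
type cidParts struct {
	Version   uint64
	Codec     uint64 // for a CIDv0 this is always dagPB
	Multihash []byte
}

// sameBlock reports whether two CIDs address the same bytes under the same
// codec, ignoring the CIDv0/CIDv1 difference.
func sameBlock(expected, actual cidParts) bool {
	return expected.Codec == actual.Codec && bytes.Equal(expected.Multihash, actual.Multihash)
}

func main() {
	digest := []byte{0x12, 0x20, 0xab} // placeholder bytes, not a real multihash
	v0 := cidParts{Version: 0, Codec: dagPB, Multihash: digest}
	v1 := cidParts{Version: 1, Codec: dagPB, Multihash: digest}
	if !sameBlock(v0, v1) {
		fmt.Println("mismatch: reconstructed block differs from the patch")
		return
	}
	fmt.Println("match")
}
```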

Member

Thought about this after writing, and I realise that it's probably because we anticipate some input to come in badly encoded forms, so round-tripping is going to result in mismatched CIDs. So I guess that's why this is here.

This does seem like a lot of effort to go to, though, and it's also a little error-prone to just replace the CIDs as strings. That won't necessarily find only the actual links: a CID could just be included as text, and the replacement assumes the user wants it changed. A find and replace should probably at least be looking for "/": "...."\n, but even that's not necessarily accurate. A more complete approach would be to walk the instantiated data model form and swap the links out, but I guess that's even more code complication! Then there's the CIDv0 vs CIDv1 thing: what if the original wants to be in CIDv0 but we up-convert it to CIDv1?

Lots of effort to do it the right way, but it makes me question whether doing it at all is worth it.

Your call I guess, but maybe add some comments in here about what it's doing so the first person to encounter a bug with this knows why and can choose whether to fix it. It wasn't clear to me what it was doing until I walked through the whole two blocks.

Member Author

this is for the 'i want to mutate the graph and re-build a valid tree' case.
if i change the json in the patch, then that block will have a new cid when it's hashed.
that hash change needs to propagate back up the DAG.

data model still won't get all edge cases - what if it's a different codec for the same MH? what if the link is encoded in a block that's raw or that we can't parse?

this is the first pass that works reasonably well for the json/cbor cases that i've attempted to manually edit.

i am unconvinced it's worth the time to do much more complex work and still not do something perfect vs where it is currently, which supports a pretty valuable use case already.

Member

oh right, tinker with the content and regenerate the graph; fair enough, I can see that being useful - it does seem like something that needs clear caveats in comments though!

rvagg (Member) left a comment

This is probably OK, I'd like to know what itch was being scratched here? it's certainly a nice utility to peek inside a CAR and perhaps even a nice way to peek inside IPLD blocks in general without having to code something up (you could even ipfs export .. | car debug - to get pretty-printed output that you can't get from ipfs dag get --output-codec=dagjson, but that's quite a hack!).

Aside from the inline comments, the main concern I have is the ability to corrupt the format if you include a raw block that is diff-like. You could even end up doing that with this tool: make a .patch from a CAR, bundle that up as unixfs, include it in a CAR, make a .patch from that, and 💥 it will break because of the --- lines. But I think you could easily address that by changing isPrintable() to also check for ^--- and return false if there's one in there.
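A hedged sketch of the suggested check: a raw block is rendered as text only if it is valid UTF-8, contains only printable or whitespace characters, and has no line beginning with `---`. The name `isPrintable` comes from the comment above, but this body is an illustration and may differ from the helper actually in the PR.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// isPrintable (illustrative sketch) reports whether a raw block can be shown
// as text in the patch without being mistaken for patch structure.
func isPrintable(b []byte) bool {
	if !utf8.Valid(b) {
		return false
	}
	s := string(b)
	for _, r := range s {
		// Reject control characters other than whitespace such as \n, \r, \t.
		if !unicode.IsPrint(r) && !unicode.IsSpace(r) {
			return false
		}
	}
	// A line starting with "---" would be read back as a new block header.
	for _, line := range strings.Split(s, "\n") {
		if strings.HasPrefix(line, "---") {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isPrintable([]byte("hello world\n")))          // true
	fmt.Println(isPrintable([]byte("car compile x\n--- a\n"))) // false
	fmt.Println(isPrintable([]byte{0xff, 0xfe}))               // false
}
```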

willscott (Member Author)

> This is probably OK, I'd like to know what itch was being scratched here?

for example:
https://twitter.com/pfrazee/status/1589747431071428609
https://gitlab.com/bnewbold/adenosine/-/blob/main/notes/ipld_car_explore.md

* add check for bytes not containing end-of-patch sequence
willscott (Member Author)

made the raw blocks a bit more cautious per the edge case you pointed out

```go
}, outStream); err != nil {
	return err
}
for c, blk := range outBlocks {
```

Member

the failing tests are because of this (and the same block below for v1): the blocks end up shuffled because iteration order over a Go map[] is randomized. We're going to need to keep a slice of CIDs to iterate over, and then rewrite those when you do the reconstruction.

willscott (Member Author)

good catch, @rvagg - made the output match the order of initial blocks.
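The fix described here (emit blocks in their original order instead of ranging over a map) can be sketched as follows. `outputOrder` is a hypothetical helper; `renamed` maps each original ID to its recompiled ID, and collapsing duplicates is an assumption about what should happen if two edited blocks end up with identical bytes.

```go
package main

import "fmt"

// outputOrder returns the ids to write, in the order blocks were read from
// the patch, after mapping each original id through renamed (original id ->
// recompiled id). Ranging over a Go map instead could give a different order
// on each run. Duplicate results are written once.
func outputOrder(order []string, renamed map[string]string) []string {
	seen := make(map[string]bool, len(order))
	res := make([]string, 0, len(order))
	for _, old := range order {
		id := old
		if n, ok := renamed[old]; ok {
			id = n
		}
		if seen[id] {
			continue
		}
		seen[id] = true
		res = append(res, id)
	}
	return res
}

func main() {
	order := []string{"leaf", "root"}
	renamed := map[string]string{"leaf": "leaf2", "root": "root2"}
	fmt.Println(outputOrder(order, renamed)) // [leaf2 root2]
}
```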

willscott requested a review from rvagg on November 17, 2022 09:44
cmd/car/compile.go: 2 outdated, resolved review threads
rvagg (Member) left a comment

nice green ticks

I still think it needs notes in the DAG rewrite bit, so I've added my suggestions for what that might look like.

willscott merged commit dab0fd5 into master on Nov 18, 2022
willscott deleted the feat/debug-compile branch on November 18, 2022 11:41
3 participants