Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not throw when Tsize does not equal real DAG size #335

Closed
alanshaw opened this issue May 22, 2023 · 6 comments
Closed

Do not throw when Tsize does not equal real DAG size #335

alanshaw opened this issue May 22, 2023 · 6 comments
Labels
need/triage Needs initial labeling and prioritization

Comments

@alanshaw
Copy link
Member

#300 started throwing when the exported DAG size did not match the value encoded in the root node.

I've been informed that "there are multiple implementations out there that produce "whatever I feel like" Tsize, if any" and the WIP spec states that this value is optional...as well as:

There is no failure mode known for this field, so your implementation should be able to decode nodes where this field is wrong (not the value you expect), partially or completely missing.
https://github.com/ipfs/specs/blob/6c250e676ac2e3d19693e03404c698d56345aad7/UNIXFS.md#tsize--dagsize

@alanshaw alanshaw added the need/triage Needs initial labeling and prioritization label May 22, 2023
@Jorropo
Copy link

Jorropo commented May 22, 2023

Note, the spec might not be correct when it says that this is optional, I've since learned that some implementations previous to the spec assume this is not optional.
However it is correct when it says that this field is useless and inconsistent.

@Jorropo
Copy link

Jorropo commented May 22, 2023

There is no failure mode known for this field, so your implementation should be able to decode nodes where this field is wrong (not the value you expect), partially or completely missing. This also allows smarter encoder to give a more accurate picture (for example don't count duplicate blocks, ...).

@rvagg
Copy link
Member

rvagg commented May 23, 2023

Nice timing, I was only outlining the potential problems with baking in assumptions about this in ipfs/specs#402 (review), in that case it's about "byte range" requests which are all going to make assumptions about Tsize being according to spec. The ability to produce dodgy values makes this slightly problematic for "trustlessness".

Not advocating either way for this particular issue; the strictness side of me appreciates being forceful with bad data and I've done this in plenty of places (like dag-pb as we were discussing a couple of weeks ago on Slack). But if we have evidence of a nontrivial amount of data that's failing this then we might be stuck with needing to loosen our expectations, or at least providing modes for lax parsing where the user understands the implications and can opt in to alternative behaviour. We have lots of precedents for that (e.g. base64 padded & unpadded handling in dag-json cause of an early mistake in one implementation that got widespread use).

@aschmahmann
Copy link

aschmahmann commented May 23, 2023

@rvagg as mentioned in the other issue (ipfs/specs#402 (comment)) TSize is not about byte ranges and no reasonable implementation would use dag-pb TSize for UnixFS file byte ranges when they have fields for this in the UnixFS data.

Some implementations have ended up with bugs here due to misunderstandings though (e.g. ipfs/go-unixfsnode#34). While it'd be nice for implementations to test against existing fixtures and implementations to build confidence in the implementations and their compatibility, having a clean UnixFS spec to go with fixtures would be much better. However, that's not quite what this issue is about so that'll have to be tackled elsewhere.

@achingbrain
Copy link
Member

Tsize isn't consulted in #300 - the expected size is inferred from the UnixFS file byte ranges stored in the UnixFS metadata.

If we don't throw with under/over reads, the user may get more or less data than they expect which is probably a bad thing. This normally only manifests itself when someone has tried to create their own DAG and got the layout wrong.

Tsize is written by these modules but not read at all really, because it's unreliable and a bit redundant.

@achingbrain
Copy link
Member

Closing because we do not throw when Tsize is invalid, only when the UnixFS layout data is invalid which leads to the file data appearing to be corrupted when read.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need/triage Needs initial labeling and prioritization
Projects
None yet
Development

No branches or pull requests

5 participants