-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIP-0071: Deterministic State Access (IPLD Reachability) #763
FIP-0071: Deterministic State Access (IPLD Reachability) #763
Conversation
FIPS/fip-XXXX.md
Outdated
4. If the major type is 0 (integer), 1 (negative integer), or 7 (special), we continue. | ||
5. If the major type is 2 (byte string) or 3 (string), we seek forward "immediate value" bytes and continue. We do not validate the string's encoding. | ||
6. If the major type is 4 (array) we add the immediate value to the expected number of fields, and continue. | ||
7. If the major type is 5 (map) we add two times the immediate value to the expected number of fields, and continue. We do not or restrict the allowed key/value types, nor do we validate the order of keys, nor do we check for duplicates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC IPLD CBOR has a hard constraint that all map keys are strings. Is the idea that any block already linked will be valid IPLD CBOR so there's no special care needed to ensure that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC IPLD CBOR has a hard constraint that all map keys are strings. Is the idea that any block already linked will be valid IPLD CBOR so there's no special care needed to ensure that?
No, the idea is that the FVM doesn't really care one way or the other. Applications may care, but then its their job to check. The FVM checks the property it cares about.
(small rant)
I'd like to eventually move towards a world were we split the IPLD model into:
- A "block" model. I.e., blocks, CIDs, and blocks that link to other blocks. This is the level of bitswap, blockstores, garbage collection, etc.
- A "path" model that simply defines how to path through blocks. You need this for basic pathing, filesystems, etc.
- A "data" model. This is the part that starts caring about map key types, ordering of map keys, etc. You need this if you want to write general-purpose algorithms over abstract data (e.g., graphsync).
One of the biggest challenges in IPLD has been agreeing on how formats map to the data model, especially as not all formats will nicely "fit" into the data model. My goal here is to say that fitting into the data model simply isn't necessary.
Of course, maybe we should use a codec that's not "dag cbor", but I don't think that's really necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add some of this to the design tradeoffs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain a bit more of the details and motivation behind the handle
concept? What does this level of indirection help us with as opposed to just using cids directly as handles and merging block_create
with block_link
and block_open
with block_read
?
Are handles uniquely determined by cids or the same cid could have multiple handles within the same actor? Are handles for the same cid different in different actors? Do handles contain information about which actor's reachable set they originate from? What do handles look like in memory/serializaed as method parameters in send:send?
FIPS/fip-XXXX.md
Outdated
1. Take advantage of our existing Wasm gas accounting rather than manually charging for gas based on the CBOR's structure. | ||
2. Ensure that all implementations used the exact same wasm parsing and validation logic. | ||
|
||
However, this would have increased implementation complexity and introduced additional runtime costs (switching in and out of Wasm isn't free). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be very nice to use the same gas model to account for this. Do you have a quantitative estimate of how much worse this makes performance? For block_create it seems like we'd need no context switching since we already have the data for processing in WASM memory. Extra context switching shows up during block_open because only the handle need be sent back to WASM from syscall? So we'd pay for two writes to wasm memory once in block_open once in block_read?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, we do need to copy and jump every time because we can't trust the user's wasm instance. Basically:
- Start a new wasm instance for processing the block. We could try to reuse instances here, but that may lead to "weird things". E.g., different costs depending on whether or not memory is already pre-allocated.
- Jump into the wasm instance to allocate some memory for the block.
- Jump back into the system to copy the block into the wasm instance's memory.
- Jump back into the wasm instance to parse the block.
- Finally, return to the system.
- And then parse the CIDs emitted by the wasm instance.
But the main part really is complexity:
- Doing this in wasm saves us from having to define 2 gas charges.
- Doing this natively saves us from having to manage an additional wasm module, spin up an instance, define the syscalls for crossing back and forth, etc.
Basically, the current approach is both simpler and faster.
I've added a link to https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0030.md#ipld-memory-model to the background section. Those syscalls aren't being introduced in this FIP (although I'm happy to discuss them here). Basically, it comes down to the fact that:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll be happy to give this an editorial 👍 when the abstract is filled in.
I note that test cases will also need to be specified before it can move to last call.
Ok, this is now ABG&T (all but gas and tests). |
Co-authored-by: Alex North <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh had some comments pending for 2 weeks without noticing, sharing now
FIPS/fip-XXXX.md
Outdated
1. `ipld::block_create(codec, data) -> handle` registers a new IPLD block with the FVM, returning a block "handle" (similar to a file descriptor). | ||
2. `ipld::block_open(cid) -> handle` "opens" the IPLD block referenced by the passed CID, returning a handle to the block. | ||
3. `ipld::block_read(handle, offset, length) -> data` reads data from an open block into the actor's memory. | ||
4. `ipld::block_link(handle, multihash_code, multihash_length)` creates a CID for the block referenced by the given block "handle", using the specified multihash code and length (currently limited to blake2b-256). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think those signatures are missing some params, for example shouldn't block_read
be:
ipld::block_read(handle, offset, obuf, max_length)
reads block identified from handle into obuff which is in actors' memory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, but I'm trying to convey the concept here more than the literal API (that's specified in the other FIP linked at the top of the spec). I'll document that.
FIPS/fip-XXXX.md
Outdated
1. If the CID uses the identity hash function (inlines a block), this function will not return the CID directly but will instead recursively parse the inlined block. | ||
2. Otherwise, this function will emit the CID. | ||
2. Any "ignored CIDs" will be skiped and ignroed. | ||
3. Any other CIDs will cause this function to signal an error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you consider ignoring them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I've added a section to the discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: Friðrik Ásmundsson <[email protected]>
FIPS/fip-XXXX.md
Outdated
## Security Considerations | ||
<!--All FIPs must contain a section that discusses the security implications/considerations relevant to the proposed change. Include information that might be important for security discussions, surfaces risks and can be used throughout the life cycle of the proposal. E.g. include security-relevant design decisions, concerns, important discussions, implementation-specific guidance and pitfalls, an outline of threats and risks and how they are being addressed. FIP submissions missing the "Security Considerations" section will be rejected. A FIP cannot proceed to status "Final" without a Security Considerations discussion deemed sufficient by the reviewers.--> | ||
|
||
This FIP primarily aims to increase the security of the FVM and guard against malicious and/or buggy wasm actors. However, as with any code: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, but I think the convention is capital "W". So, Wasm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved editorial review.
Also, great work! This is a really well written FIP. No real editorial changes, and very simply stated.
FIPS/fip-XXXX.md
Outdated
|
||
We will call these "allowed CIDs". | ||
|
||
Additionally, FVM block link analysis will ignore CIDs of sealed and unsealed commitments, as long as the multihash _digest_ is less than 64 bytes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FVM block link analysis will ignore CIDs of sealed and unsealed commitments
is the reason for specify these two cuz sealed and unsealed commitments are the only two fields in state that follows the same codecs ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because:
- We store such CIDs in the state already.
- Sector state isn't "reachable" from actor code regardless.
Basically, it lets actors refer to sectors and pieces directly (by CID) without messing with the FVM complaining about such CIDs not being reachable. I'll update the text.
FIPS/fip-XXXX.md
Outdated
@@ -0,0 +1,313 @@ | |||
--- | |||
fip: "<to be assigned>" <!--keep the qoutes around the fip number, i.e: `fip: "0001"`--> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FIP0071
(havent fully reviewed yet - tho @Stebalien @fridrik01 I have assigned FIP-0071 to this FIP and feel free to merge with updated FIP number given you have gotten 2 editorial ✅ already, further updates can be made in different PRs |
Review addressed and FIP# assigned. |
Let’s also add it to the readme table - please merge once that’s in |
This FIP brings us a step closer to user-defined WebAssembly actors by introducing deterministic rules defining what "state" (IPLD blocks) an actor is and is not allowed to read. Specifically, this FIP specifies that an actor may only read state "reachable" from:
Where "reachable" means that the state can be "reached" by traversing IPLD links (CIDs) from the "roots" listed above.
Discussions: #764
WIP Implementation: filecoin-project/ref-fvm#1824