Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIP-0071: Deterministic State Access (IPLD Reachability) #763

Merged
merged 13 commits into from
Aug 22, 2023

Conversation

Stebalien
Copy link
Member

@Stebalien Stebalien commented Jul 28, 2023

This FIP brings us a step closer to user-defined WebAssembly actors by introducing deterministic rules defining what "state" (IPLD blocks) an actor is and is not allowed to read. Specifically, this FIP specifies that an actor may only read state "reachable" from:

  1. The actor's state-tree.
  2. Parameters passed into the actor from other actors.
  3. Blocks returned to the actor from other actors.

Where "reachable" means that the state can be "reached" by traversing IPLD links (CIDs) from the "roots" listed above.

Discussions: #764
WIP Implementation: filecoin-project/ref-fvm#1824

FIPS/fip-XXXX.md Outdated
4. If the major type is 0 (integer), 1 (negative integer), or 7 (special), we continue.
5. If the major type is 2 (byte string) or 3 (string), we seek forward "immediate value" bytes and continue. We do not validate the string's encoding.
6. If the major type is 4 (array) we add the immediate value to the expected number of fields, and continue.
7. If the major type is 5 (map) we add two times the immediate value to the expected number of fields, and continue. We do not or restrict the allowed key/value types, nor do we validate the order of keys, nor do we check for duplicates.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC IPLD CBOR has a hard constraint that all map keys are strings. Is the idea that any block already linked will be valid IPLD CBOR so there's no special care needed to ensure that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC IPLD CBOR has a hard constraint that all map keys are strings. Is the idea that any block already linked will be valid IPLD CBOR so there's no special care needed to ensure that?

No, the idea is that the FVM doesn't really care one way or the other. Applications may care, but then its their job to check. The FVM checks the property it cares about.


(small rant)

I'd like to eventually move towards a world were we split the IPLD model into:

  • A "block" model. I.e., blocks, CIDs, and blocks that link to other blocks. This is the level of bitswap, blockstores, garbage collection, etc.
  • A "path" model that simply defines how to path through blocks. You need this for basic pathing, filesystems, etc.
  • A "data" model. This is the part that starts caring about map key types, ordering of map keys, etc. You need this if you want to write general-purpose algorithms over abstract data (e.g., graphsync).

One of the biggest challenges in IPLD has been agreeing on how formats map to the data model, especially as not all formats will nicely "fit" into the data model. My goal here is to say that fitting into the data model simply isn't necessary.

Of course, maybe we should use a codec that's not "dag cbor", but I don't think that's really necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add some of this to the design tradeoffs.

Copy link
Contributor

@ZenGround0 ZenGround0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain a bit more of the details and motivation behind the handle concept? What does this level of indirection help us with as opposed to just using cids directly as handles and merging block_create with block_link and block_open with block_read?

Are handles uniquely determined by cids or the same cid could have multiple handles within the same actor? Are handles for the same cid different in different actors? Do handles contain information about which actor's reachable set they originate from? What do handles look like in memory/serializaed as method parameters in send:send?

FIPS/fip-XXXX.md Outdated
1. Take advantage of our existing Wasm gas accounting rather than manually charging for gas based on the CBOR's structure.
2. Ensure that all implementations used the exact same wasm parsing and validation logic.

However, this would have increased implementation complexity and introduced additional runtime costs (switching in and out of Wasm isn't free).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be very nice to use the same gas model to account for this. Do you have a quantitative estimate of how much worse this makes performance? For block_create it seems like we'd need no context switching since we already have the data for processing in WASM memory. Extra context switching shows up during block_open because only the handle need be sent back to WASM from syscall? So we'd pay for two writes to wasm memory once in block_open once in block_read?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, we do need to copy and jump every time because we can't trust the user's wasm instance. Basically:

  1. Start a new wasm instance for processing the block. We could try to reuse instances here, but that may lead to "weird things". E.g., different costs depending on whether or not memory is already pre-allocated.
  2. Jump into the wasm instance to allocate some memory for the block.
  3. Jump back into the system to copy the block into the wasm instance's memory.
  4. Jump back into the wasm instance to parse the block.
  5. Finally, return to the system.
  6. And then parse the CIDs emitted by the wasm instance.

But the main part really is complexity:

  1. Doing this in wasm saves us from having to define 2 gas charges.
  2. Doing this natively saves us from having to manage an additional wasm module, spin up an instance, define the syscalls for crossing back and forth, etc.

Basically, the current approach is both simpler and faster.

@Stebalien
Copy link
Member Author

Could you explain a bit more of the details and motivation behind the handle concept? What does this level of indirection help us with as opposed to just using cids directly as handles and merging block_create with block_link and block_open with block_read?

I've added a link to https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0030.md#ipld-memory-model to the background section. Those syscalls aren't being introduced in this FIP (although I'm happy to discuss them here).

Basically, it comes down to the fact that:

  1. We might want to "create" blocks without hashing them (hence returning a handle from create instead of returning a CID). This lets us send parameters/return values as IPLD blocks without hashing.
  2. We won't know the size of an IPLD block when we first try to open it so we don't have a great way to pre-allocate memory for it. The current open/read API mimics the unix file API.

Copy link
Member

@anorth anorth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be happy to give this an editorial 👍 when the abstract is filled in.

I note that test cases will also need to be specified before it can move to last call.

@Stebalien Stebalien marked this pull request as ready for review August 2, 2023 01:50
@Stebalien
Copy link
Member Author

Ok, this is now ABG&T (all but gas and tests).

FIPS/fip-XXXX.md Outdated Show resolved Hide resolved
Co-authored-by: Alex North <[email protected]>
Copy link
Contributor

@fridrik01 fridrik01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh had some comments pending for 2 weeks without noticing, sharing now

FIPS/fip-XXXX.md Outdated
Comment on lines 50 to 53
1. `ipld::block_create(codec, data) -> handle` registers a new IPLD block with the FVM, returning a block "handle" (similar to a file descriptor).
2. `ipld::block_open(cid) -> handle` "opens" the IPLD block referenced by the passed CID, returning a handle to the block.
3. `ipld::block_read(handle, offset, length) -> data` reads data from an open block into the actor's memory.
4. `ipld::block_link(handle, multihash_code, multihash_length)` creates a CID for the block referenced by the given block "handle", using the specified multihash code and length (currently limited to blake2b-256).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think those signatures are missing some params, for example shouldn't block_read be:

ipld::block_read(handle, offset, obuf, max_length) reads block identified from handle into obuff which is in actors' memory

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, but I'm trying to convey the concept here more than the literal API (that's specified in the other FIP linked at the top of the spec). I'll document that.

FIPS/fip-XXXX.md Outdated
1. If the CID uses the identity hash function (inlines a block), this function will not return the CID directly but will instead recursively parse the inlined block.
2. Otherwise, this function will emit the CID.
2. Any "ignored CIDs" will be skiped and ignroed.
3. Any other CIDs will cause this function to signal an error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider ignoring them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I've added a section to the discussion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIPS/fip-XXXX.md Outdated Show resolved Hide resolved
FIPS/fip-XXXX.md Outdated Show resolved Hide resolved
FIPS/fip-XXXX.md Outdated
## Security Considerations
<!--All FIPs must contain a section that discusses the security implications/considerations relevant to the proposed change. Include information that might be important for security discussions, surfaces risks and can be used throughout the life cycle of the proposal. E.g. include security-relevant design decisions, concerns, important discussions, implementation-specific guidance and pitfalls, an outline of threats and risks and how they are being addressed. FIP submissions missing the "Security Considerations" section will be rejected. A FIP cannot proceed to status "Final" without a Security Considerations discussion deemed sufficient by the reviewers.-->

This FIP primarily aims to increase the security of the FVM and guard against malicious and/or buggy wasm actors. However, as with any code:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, but I think the convention is capital "W". So, Wasm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Collaborator

@kaitlin-beegle kaitlin-beegle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved editorial review.

Also, great work! This is a really well written FIP. No real editorial changes, and very simply stated.

FIPS/fip-XXXX.md Outdated

We will call these "allowed CIDs".

Additionally, FVM block link analysis will ignore CIDs of sealed and unsealed commitments, as long as the multihash _digest_ is less than 64 bytes:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FVM block link analysis will ignore CIDs of sealed and unsealed commitments

is the reason for specify these two cuz sealed and unsealed commitments are the only two fields in state that follows the same codecs ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because:

  1. We store such CIDs in the state already.
  2. Sector state isn't "reachable" from actor code regardless.

Basically, it lets actors refer to sectors and pieces directly (by CID) without messing with the FVM complaining about such CIDs not being reachable. I'll update the text.

FIPS/fip-XXXX.md Outdated
@@ -0,0 +1,313 @@
---
fip: "<to be assigned>" <!--keep the qoutes around the fip number, i.e: `fip: "0001"`-->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIP0071

@jennijuju jennijuju changed the title Deterministic State Access (IPLD Reachability) FIP-0071: Deterministic State Access (IPLD Reachability) Aug 22, 2023
@jennijuju
Copy link
Member

(havent fully reviewed yet - tho @Stebalien @fridrik01 I have assigned FIP-0071 to this FIP and feel free to merge with updated FIP number given you have gotten 2 editorial ✅ already, further updates can be made in different PRs

@Stebalien
Copy link
Member Author

Review addressed and FIP# assigned.

@Stebalien Stebalien merged commit 87754d8 into filecoin-project:master Aug 22, 2023
1 check passed
@Stebalien Stebalien deleted the steb/ipld-analysis branch August 22, 2023 05:26
@jennijuju
Copy link
Member

Let’s also add it to the readme table - please merge once that’s in

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants