Add support for actor upgrades #1866

fridrik01 · 2023-08-30T17:46:04Z

This PR adds support for actor upgrades through a new sdk::actor::upgrade_actor syscall. This allows actors to upgrade to a new version while still keeping the same address, balance, etc.

The implementation follows the proposal discussed in filecoin-project/FIPs#396 almost exactly, with two minor changes:

- The new syscall was referred to as sself::become_actor, but we renamed it to sdk::actor::upgrade_actor to better reflect its meaning.
- We decided not to create a separate syscall to get the code CID of the actor initiating the upgrade. Instead, the new syscall takes a second UpgradeInfo parameter that passes this metadata in as an IPLD block (see FVM: Actor Upgrades FIPs#396 (comment)).

codecov-commenter · 2023-08-30T17:53:31Z

Codecov Report

Merging #1866 (84a47d1) into master (e9044cb) will decrease coverage by 19.68%.
Report is 2 commits behind head on master.
The diff coverage is 0.00%.

Additional details and impacted files

@@             Coverage Diff             @@
##           master    #1866       +/-   ##
===========================================
- Coverage   75.73%   56.05%   -19.68%     
===========================================
  Files         152      153        +1     
  Lines       15073    15259      +186     
===========================================
- Hits        11415     8553     -2862     
- Misses       3658     6706     +3048

Files	Coverage Δ
fvm/src/kernel/error.rs	`60.00% <ø> (-12.10%)`	⬇️
fvm/src/syscalls/send.rs	`0.00% <ø> (-100.00%)`	⬇️
fvm/src/trace/mod.rs	`0.00% <ø> (-100.00%)`	⬇️
shared/src/error/mod.rs	`0.00% <ø> (-48.58%)`	⬇️
shared/src/lib.rs	`20.00% <ø> (ø)`
fvm/src/call_manager/backtrace.rs	`0.00% <0.00%> (-84.06%)`	⬇️
fvm/src/syscalls/error.rs	`0.00% <0.00%> (-48.49%)`	⬇️
shared/src/upgrade/mod.rs	`0.00% <0.00%> (ø)`
sdk/src/send.rs	`0.00% <0.00%> (ø)`
fvm/src/syscalls/mod.rs	`0.00% <0.00%> (-97.82%)`	⬇️
... and 9 more

... and 45 files with indirect coverage changes

Stebalien

initial feedback

Stebalien · 2023-09-05T23:20:38Z

fvm/src/call_manager/default.rs

+            // Make a store.
+            let mut store = engine.new_store(kernel);
+
+            // TODO: a hack until I find a better/simpler way to do this


So, honestly, we can probably treat these errors as "fatal" and check earlier. I.e., the final version of upgrade (not for this PR, but later) would:

- Look up the target code CID in some "deployed actors" table in the init actor.
- Check some metadata generated on deploy to see if the actor supports the upgrade entrypoint.

By the time we get to this code, everything will have been checked and any errors would be considered "fatal".

For now, what we'd likely do is check if:

- The caller is one of builtin actor types X/Y/Z (in this case, it would be restricted to an eth account).
- The target code CID is an EVM actor (again, using the builtin actors manifest).

Because that's the immediate use-case (upgrading eth accounts into EVM actors).
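
The interim check described above could be sketched roughly as follows. This is only an illustration, not code from the PR: `BuiltinType` and `upgrade_allowed` are hypothetical names, and a real implementation would presumably resolve both types from the builtin actors manifest rather than take them as arguments.

```rust
/// Hypothetical classification of an actor's code, standing in for a lookup
/// in the builtin actors manifest.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum BuiltinType {
    EthAccount,
    Evm,
    Other,
}

/// Returns true only for the immediate use case: an eth account upgrading
/// into an EVM actor. Every other combination is rejected up front, so any
/// failure after this point could be treated as fatal.
fn upgrade_allowed(caller: BuiltinType, target: BuiltinType) -> bool {
    matches!((caller, target), (BuiltinType::EthAccount, BuiltinType::Evm))
}
```

Doing the check before any store or instance is created keeps the later code path free of recoverable errors.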

fvm/src/call_manager/default.rs

Stebalien

Some more thoughts. Let's talk tomorrow.

Stebalien · 2023-09-21T23:11:04Z

fvm/src/call_manager/default.rs

+
+    fn maybe_put_registry(&mut self, br: &mut BlockRegistry) -> Result<()> {
+        match self.entrypoint {
+            Entrypoint::Invoke(_) => Ok(()),


IMO, we should treat invoke params the same way (if possible).

fvm/src/call_manager/default.rs

fridrik01 · 2023-10-03T15:23:46Z

@Stebalien ready for another review. There are still some build errors in CI that don't occur on my machine; I am looking into them.

Stebalien

Only major thing is recursion handling. Otherwise, quibbles but LGTM. I'll take one more look tomorrow.

fvm/src/call_manager/default.rs

fvm/src/kernel/default.rs

Stebalien · 2023-10-16T05:14:01Z

fvm/src/kernel/default.rs

+                if code != code_after_upgrade {
+                    return Err(syscall_error!(Forbidden; "re-entrant upgrade detected").into());
+                }


I think this check should actually be in send, not upgrade. I.e.:

If an actor calls A (upgrade -> upgrade -> upgrade), that's fine. Because, on success, we'll "unwind" past all the upgrade calls all at once.

If an actor has a call stack like A -> B -> A (upgrade), that's not fine and we need to abort when we return to the top-level call into A. Otherwise, we'd end up running old code.

It's that last point that's problematic: We never want to re-enter some actor code after we've upgraded away from it.

And, in that second case, we should treat it as a "revert" of B (I think?).

Actually, after talking this through with @maciejwitowski, there's an alternative here that I kind of like. Instead of catching this when we return to the top-level invocation of A, we can catch this on the upgrade syscall itself. To do this, we'd need to:

Keep a map of ActorID -> reentrancy count, incrementing the count each time we re-enter an actor already on the call-stack.

In the upgrade syscall, if this count is non-zero (for the current actor), return a Forbidden syscall error (or something like that).

One reason I like this approach is that, independent of this particular change, I'd like to find a way to expose whether or not a call is reentrant to actors. I.e., some kind of flag in their environment where they can detect that they're within a reentrant call. I expect most actors will, at that point, simply abort with an error.

@Stebalien I played around with this approach in 109c5e9

Ah, I mean that we need to keep track of calls in progress, not upgrades. I.e., if some actor A is already on the call stack, no "deeper" instance of A should be able to call the upgrade syscall.
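
A minimal sketch of the bookkeeping discussed in this thread, assuming the call manager records every call in progress per actor. `ActorCallStack`, `enter`, `exit`, and `upgrade_allowed` are hypothetical names, not the API that landed in the PR, and how invocations of the upgrade entrypoint itself are counted is left out.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the FVM's actor ID type.
type ActorId = u64;

/// Tracks how many invocations of each actor are currently in progress.
#[derive(Default)]
struct ActorCallStack {
    in_progress: HashMap<ActorId, u32>,
}

impl ActorCallStack {
    /// Record that a call into `id` has started.
    fn enter(&mut self, id: ActorId) {
        *self.in_progress.entry(id).or_insert(0) += 1;
    }

    /// Record that a call into `id` has returned.
    fn exit(&mut self, id: ActorId) {
        if let Some(count) = self.in_progress.get_mut(&id) {
            *count -= 1;
            if *count == 0 {
                self.in_progress.remove(&id);
            }
        }
    }

    /// An upgrade is allowed only when the current invocation is the sole
    /// instance of `id` on the stack (a count of exactly one).
    fn upgrade_allowed(&self, id: ActorId) -> bool {
        self.in_progress.get(&id).copied() == Some(1)
    }
}
```

With a call stack like A -> B -> A, the count for A is two, so the inner A would get a Forbidden-style error from the upgrade syscall instead of the problem surfacing when control returns to the outer A.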

~~Oh, let's talk sync today~~ Oh, I see. Updated the PR with that in mind and added new tests.

We can remove this now, right?

fvm/src/kernel/error.rs

fvm/src/kernel/default.rs

fvm/src/call_manager/mod.rs

Stebalien

Testing: Let's make sure we cover cases like:

- Selfdestruct -> upgrade
- Re-entrant call -> upgrade
- upgrade -> upgrade
- failure in a recursive upgrade where the top-level upgrade sticks (e.g., the code CID should be changed).
- upgrade to some code CID that doesn't exist (maybe?)
- upgrade to some code CID that doesn't implement the upgrade entrypoint.

You're probably already covering some of these cases.

fvm/src/call_manager/default.rs

fvm/src/call_manager/mod.rs

shared/src/lib.rs

testing/integration/tests/readonly_test.rs

Stebalien · 2023-10-20T19:12:42Z

sdk/src/actor.rs

@@ -107,6 +107,21 @@ pub fn create_actor(
    }
 }

+/// Upgrades an actor using the given block which includes the old code cid and the upgrade params
+pub fn upgrade_actor(new_code_cid: Cid, params: Option<IpldBlock>) -> SyscallResult<Response> {


We can leave it like this for now, but we should have a better API.

We generally take Cid by reference (it's large so we avoid copying till we need to).

There is no "success" case here.

It should probably look something like:

Suggested change

-pub fn upgrade_actor(new_code_cid: Cid, params: Option<IpldBlock>) -> SyscallResult<Response> {
+pub enum UpgradeError {
+    CodeNotInstalled, // code not available, could be worded better.
+    InvalidUpgradeTarget, // code doesn't implement the upgrade entrypoint.
+    ReentrentUpgrade, // actor already on the call stack.
+    UpgradeFailed(Response),
+}
+pub fn upgrade_actor(new_code_cid: Cid, params: Option<IpldBlock>) -> Result<std::convert::Infallible, UpgradeError> {

I wish we could use ! (the correct never type) but that's not stable yet.

The reason to do this is that we want to treat every possible return here as an error. If the user blindly writes upgrade_actor(...)? or upgrade_actor(...).expect("upgrade failed"), that should work.
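
To illustrate why an `Infallible` success type makes `?` and `.expect(...)` work, here is a small self-contained sketch. It is an assumption-laden mock, not the SDK: this `upgrade_actor` takes a hypothetical `code_installed` flag instead of a CID and params, the error enum is trimmed down, and `UpgradeFailed` carries a plain `u32` in place of `Response`.

```rust
use std::convert::Infallible;

/// Hypothetical, trimmed-down error type; the variants in the PR may differ.
#[derive(Debug, PartialEq)]
enum UpgradeError {
    CodeNotInstalled,
    UpgradeFailed(u32), // stands in for `UpgradeFailed(Response)`
}

/// Mock of the SDK call. After a real successful upgrade, control never
/// returns to the caller, so this function can only ever produce `Err`.
fn upgrade_actor(code_installed: bool) -> Result<Infallible, UpgradeError> {
    if !code_installed {
        return Err(UpgradeError::CodeNotInstalled);
    }
    Err(UpgradeError::UpgradeFailed(1))
}

/// Callers can use `?` without handling a success value: `Infallible` has no
/// values, so the empty match shows the `Ok` arm is unreachable.
fn try_upgrade(code_installed: bool) -> Result<(), UpgradeError> {
    match upgrade_actor(code_installed)? {}
}
```

Because no `Ok` value can exist, every path through `try_upgrade` that returns at all returns an error, which is the property the suggested signature is after.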

Hm. ReentrentUpgrade may not quite work (see my other comments on the conflict with selfdestruct). Honestly, I'd just change the variant in this enum to IllegalOperation and document it as an "either or" case (where, 99% of the time, it'll be because the actor doesn't exist).

I changed it to take the CID by reference; I will take a look at Infallible/UpgradeError next.

Per our conversation below, InvalidUpgradeTarget likely isn't something we need to include in this enum (that's covered by an exit code, not an error number, and we should probably leave the exit codes alone).

testing/test_actors/actors/fil-upgrade-actor/src/actor.rs

fvm/src/kernel/default.rs

Stebalien · 2023-10-20T19:23:05Z

sdk/src/sys/actor.rs

+    /// | [`InvalidHandle`]     | parameters block not found.                          |
+    /// | [`LimitExceeded`]     | recursion limit reached.                             |
+    /// | [`IllegalArgument`]   | invalid code cid buffer.                             |
+    /// | [`Forbidden`]         | target actor doesn't have an upgrade endpoint.       |


Once we're done with everything else, let's make sure to go back and revisit this.

testing/integration/tests/main.rs

fridrik01 · 2023-10-24T18:31:09Z

> Testing: Let's make sure we cover cases like:
>
> - Selfdestruct -> upgrade
> - Re-entrant call -> upgrade
> - upgrade -> upgrade
> - failure in a recursive upgrade where the top-level upgrade sticks (e.g., the code CID should be changed).
> - upgrade to some code CID that doesn't exist (maybe?)
> - upgrade to some code CID that doesn't implement the upgrade entrypoint.
>
> You're probably already covering some of these cases.

From this list, we are at least covering:

- Selfdestruct -> upgrade: covered with method 5 in fil-upgrade-actor.
- Re-entrant call -> upgrade: covered with method 4 in fil-upgrade-actor.
- upgrade -> upgrade: covered with method 3 in fil-upgrade-actor.
- upgrade to some code CID that doesn't implement the upgrade entrypoint: covered in fil-syscall-actor.

From the remaining cases:

- upgrade to some code CID that doesn't exist (maybe?): Right now we blindly accept code_cids in the kernel (as long as they can be parsed) and add them to the actor state. What would be the way to check if that code actually exists (maybe store.get_cbor)?
- failure in a recursive upgrade where the top-level upgrade sticks (e.g., the code CID should be changed): Not sure I understand this. So it's a case of "upgrade -> upgrade"; how would the 2nd upgrade CID stick around as long as the syscall succeeded?

Stebalien · 2023-10-24T19:41:25Z

> upgrade to some code CID that doesn't exist (maybe?): Right now we blindly accept code_cids in the kernel (as long as they can be parsed) and add them to the actor state. What would be the way to check if that code actually exists (maybe store.get_cbor)?

It's probably fine to leave that out, in that case.

> failure in a recursive upgrade where the top-level upgrade sticks (e.g., the code CID should be changed): Not sure I understand this. So it's a case of "upgrade -> upgrade"; how would the 2nd upgrade CID stick around as long as the syscall succeeded?

I mean, I want to make sure that the following works:

Upgrade from code A to code B.
Inside code B's upgrade function, try to recursively upgrade to code C. Fail.
Return success from code B's upgrade function.

At this point, the actor's code CID should be B, not A or C.
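
The scenario can be modelled with a toy sketch. Everything here is hypothetical: code CIDs are plain strings, `upgrade` is a simplified stand-in for the syscall, and the rollback-on-failure behaviour is an assumption of the model rather than a description of the kernel.

```rust
/// Toy model of one actor's state; code CIDs are plain strings here.
struct Actor {
    code: &'static str,
}

/// Hypothetical model of the upgrade syscall: switch to `new_code`, run its
/// upgrade entrypoint, and roll back if that entrypoint fails.
fn upgrade(
    actor: &mut Actor,
    new_code: &'static str,
    entrypoint: impl FnOnce(&mut Actor) -> Result<(), ()>,
) -> Result<(), ()> {
    let old_code = actor.code;
    actor.code = new_code;
    let result = entrypoint(&mut *actor);
    if result.is_err() {
        actor.code = old_code; // a failed upgrade leaves the old code in place
    }
    result
}

/// A -> B succeeds even though B's entrypoint attempts, and fails, B -> C.
fn nested_upgrade_scenario() -> &'static str {
    let mut actor = Actor { code: "A" };
    let outer = upgrade(&mut actor, "B", |a| {
        let inner = upgrade(a, "C", |_| Err(()));
        assert!(inner.is_err());
        Ok(()) // B reports success despite the failed nested upgrade
    });
    assert!(outer.is_ok());
    actor.code
}
```

In this model `nested_upgrade_scenario()` ends with the code set to "B": the failed inner upgrade restores B, and the successful outer upgrade keeps it.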

Stebalien · 2023-10-24T19:43:13Z

> upgrade -> upgrade: covered with method 3 in fil-upgrade-actor

Are you sure it's testing the right thing? I.e., upgrade (to code A) -> upgrade (to code B) should result in code B.

See #1866 (comment)

- Now checking correctly that the code cid changed
- Refactored tests into test cases which makes them more readable
- Added new upgrade receiver actor so we can test upgrades with different actors
- Added test case for testing failure in a recursive upgrade

Stebalien

🎉

fridrik01 force-pushed the actor-upgrades branch 6 times, most recently from 6c352f5 to bb5107e Compare September 1, 2023 16:12

fridrik01 requested a review from Stebalien September 5, 2023 14:34

Stebalien reviewed Sep 5, 2023

fridrik01 force-pushed the actor-upgrades branch 3 times, most recently from dc4e730 to 22242b1 Compare September 8, 2023 15:21

Stebalien reviewed Sep 21, 2023

fridrik01 force-pushed the actor-upgrades branch 5 times, most recently from 2de14cd to 22318f9 Compare October 3, 2023 15:00

fridrik01 changed the title ~~WIP: Actor Upgrades~~ Add support for actor upgrades Oct 3, 2023

fridrik01 marked this pull request as ready for review October 3, 2023 15:23

fridrik01 added 9 commits October 3, 2023 16:06

wip

ccb9c62

Add UpgradeInfo as ipld block to upgrade endpoint

3d5a07d

Cleanup unused Abort enum

d26bafb

exit now returns to upgrade caller, still issue with block registry

078f902

fix issue with block registry + initial handle success/failure

4847d04

several fixes

ab25ab3

simplify match arm in CM

b26f589

Improve error checking + more tests

85a9129

Detect re-entrant upgrade

aead419

reduce disk space by having fewer integration test files which broke CI

c1ef6e3

Stebalien reviewed Oct 16, 2023

fridrik01 added 2 commits October 17, 2023 18:52

minor fixes

53aa0d5

Reject upgrades if actor already on call stack

1913868

fridrik01 force-pushed the actor-upgrades branch from 109c5e9 to 1913868 Compare October 18, 2023 11:47

Add test for calling upgrade on an actor already on call stack

c7e3498

Stebalien reviewed Oct 18, 2023

fridrik01 force-pushed the actor-upgrades branch from 16d03b2 to dfc4d61 Compare October 19, 2023 18:40

Add actor call stack to call manager + detect reentrant upgrades in s…

0f65803

…yscall

fridrik01 force-pushed the actor-upgrades branch from dfc4d61 to 0f65803 Compare October 19, 2023 18:48

Stebalien reviewed Oct 19, 2023

simplify call_manager interface for actor call stack

816ed06

Stebalien reviewed Oct 20, 2023

fridrik01 added 4 commits October 24, 2023 11:59

Address review comments

79653b6

- Rename send_* to call_actor_* in call manager
- Check if upgrade is allowed locally inside kernel upgrade_actor
- Other minor refactorings

Rename SendResult to CallResult

caf8200

fix: take cid as ref in upgrade_actor

f55904d

Refactor tests + add testcase for upgrading after self_destruct

dc66e33

fridrik01 and others added 3 commits October 26, 2023 12:59

fix: used wrong id when updating the code_cid in upgrade_actor

700837c

fixup the error conditions

0a98555

Stebalien approved these changes Oct 26, 2023

disable the actor-upgrade feature by default

84a47d1

This is a minimal version of #1906 that still compiles the relevant code but disables it at runtime.

Stebalien enabled auto-merge (squash) October 26, 2023 19:17

Stebalien merged commit c412b3a into master Oct 26, 2023
14 checks passed

Stebalien deleted the actor-upgrades branch October 26, 2023 19:31

arajasek mentioned this pull request Dec 5, 2023

WIP: Integrate latest FVM filecoin-project/filecoin-ffi#439

Closed

fridrik01 mentioned this pull request Dec 15, 2023

FIP-0088: Add support for upgradable actors filecoin-project/FIPs#873

Closed
