Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref-fvm version philosophy #358

Open
jennijuju opened this issue Mar 7, 2022 · 23 comments
Open

ref-fvm version philosophy #358

jennijuju opened this issue Mar 7, 2022 · 23 comments

Comments

@jennijuju
Copy link
Member

Starting a discussion for the ref-fvm version philosophy.

Things to consider:

  • do we associate FVM version w/ network upgrade versions?
  • what's the approach that may support multiple fvm version in the future.
  • what's the FVM version would be for network v16 M1 FVM?
@raulk
Copy link
Member

raulk commented Mar 9, 2022

There are roughly two approaches here:

  1. One FVM release per network version, with client importing all and selecting which version of the FVM to instantiate depending on the network version.
    • Pros: theoretically cleaner; every network version is self contained.
    • Cons: the integration of through FFI, etc. is going to be quite hard; higher complexity; binary bloat.
  2. Single FVM import, with support for network versions added through version guards (like today).
    • Pros: much simpler; easier to reason about; easier to develop / manage the codebase; smaller binary sizes; no FFI drama.
    • Cons: less clean; no isolation/self-containment.

I am more agreeable to (2), and it's going to take quite a bit of effort to convince me that (1) is the better approach.

@raulk
Copy link
Member

raulk commented Mar 9, 2022

@Stebalien Please add your input here, and let's aim to close this issue with a final decision.

@Stebalien
Copy link
Member

First, because I'm pedantic and want us to be on the same page:

Multiple FVMs:

  • Bloat: This isn't correct. It'll all be in a single crate so the only "bloat" will be from the FVM code itself, not the dependencies. The FVM logic itself is pretty tiny and many of the types actually live in "shared".
  • Integration in the FFI: this is pretty easy. We'll need a small abstraction layer, but that comes down to: a constructor, an "apply message" method, and a "flush" method. We'd also likely want to break out the "externs" interface into a new crate, but that's not difficult.
  • High complexity: I'm not sure how not having to support old network versions in the FVM will lead to higher complexity.

Single FVM:

  • Simpler: See my previous comment.
  • Easier to reason about: Multiple FVM versions means less complexity in each version, which should be easier to reason about. Unless you mean "easier to reason about what changed", in which case I agree.
  • Easier to develop/manage: I agree.

So really, the question is development. If we have multiple versions:

  1. Backporting fixes becomes annoying.
  2. The "new" version won't be stress-tested on the network until the network upgrade.
  3. It can get harder to see what was "introduced" in the new network version.

But I think I have a reasonably balanced solution:

  1. Develop the "next" version with version conditionals.
  2. After the network upgrade, release a new FVM version that doesn't support previous network versions.

This has some advantages:

  1. We don't need to support "old" network versions in master.
  2. We can punt the FFI abstraction until NV18 (because we'll build NV17 in the same FVM version as NV16).
  3. Network upgrades won't immediately switch to a "fresh" FVM.

Effectively this means:

FVM v1.0 will support NV16.
FVM v1.1 will support NV16 and NV17.
FVM v1.2 will support NV17.
FVM v1.3 will support NV17 and NV18.

@raulk
Copy link
Member

raulk commented Jun 9, 2022

Now that we're embarked on M2 development, I'd like to revisit this discussion and come to a consensus. I'd like to hear from client implementors about the "sliding versions" approach, as they are the main stakeholder that will need to reason about ref-fvm versions.

Tagging: @filecoin-project/lotus-maintainers @filecoin-project/forest @filecoin-project/venus

  1. FVM 1.0 has shipped (well, it's in RC) with nv15 and nv16 support. nv15 support only exists for testing purposes and we can assume it's a one-off breakage of the above model due to the big milestone, the legacy VM => FVM migration. To clean this up, we could:
    • a) Retain nv15 only in RCs, and drop nv15 before we ship the final v1.0.0; however, this requires a change in code between the last RC and final, which is risky and not something I'd advise.
    • b) Ship 1.0.0 final with nv15 and nv16, and immediately follow up with an 1.0.1 that removes nv15 support, to be adopted by clients in their next bugfix release.
  2. If a client wants to maintain historical sync ability (e.g. Lotus), it will need to import every even minor ref-fvm version (1.0, 1.2, 1.4) and the FVM selector abstraction described above. This is true, except for upgrade releases which would import even minor ref-fvm versions except for the latest, which would be an odd ref-fvm version (1.0, 1.2, 1.3).

@raulk
Copy link
Member

raulk commented Jun 9, 2022

@Stebalien Separately from the above, I wanted to talk specifically about version number usage. The model you propose implies staying on v1 forever, but I'm pretty sure we'll introduce breaking changes in M2 and beyond. So staying on v1 would be incorrect. There are three approaches possible IMO:

  1. Staying on v1 forever, disregarding breaking changes as events to bump major. Strongly oppose.
  2. Adopting semver, breaking changes bump major version. We'd end up with an uneven version history (some NVs will break APIs, others won't), and we'd need a tracking table to reason about version alignment.
    • This adds cognitive overhead.
    • We don't expecct public API breakages outside of network version upgrades. So what's primarrily semantically relevant to us is network version alignment.
  3. Every network version gets a major bump. Not every NV will break interfaces, but this installs a predictable and straightforward pattern.
    • The N.0.0 version could be the one that supports the before and after network versions.
    • The N.1.0 version would drop support for the before network version.

@anorth
Copy link
Member

anorth commented Jun 9, 2022

I am not a client implementer so my suggestions should carry relatively little weight. I think that Raul's original suggestion of single FVM import with support for multiple versions has the best potential to not be a pain. Manage the versioning of behaviour internally in the code for FVM, and maintain as much historical support as reasonable through methods within the code, rather than external to the code in repository branching and versioning. I wouldn't weigh bundle size very highly yet, but maybe a stripped-down build could drop historical support in the future.

I would not couple the versions of FVM to network upgrades very tightly.

  • There will be many reasons to release new (major) versions of the VM that don't change it's consensus-critical behaviour: new client APIs, better inspection facilities, optimisations, debugging tools, etc etc. You don't want to hold them back for network releases. Clients should be able to upgrade to the latest hotness at any point (and I would try to extend this to full archive clients).
  • Necessary changes to behaviour for network upgrade will require new FVM releases. But such release could come well before the network "turns on" the new behaviour via network version upgrade.

@raulk raulk added this to the M2 milestone Jun 9, 2022
@Stebalien
Copy link
Member

I would not couple the versions of FVM to network upgrades very tightly.

I really think it depends on the release, but I expect that M2 will have some significant changes (especially around how accounts are handled). I expect that most network upgrades won't involve versioning the FVM itself as backwards compatibility will become more and more important as we get user-defined actors.

In terms of abstracting over multiple versions, I was assuming we'd do this at the FFI layer, but we can always add a new "abstraction" crate that can pull in older versions of the FVM. That should be significantly cleaner than a bunch of ifdefs.

@raulk
Copy link
Member

raulk commented Jun 10, 2022

@anorth Now that we have a better line of sight into the kinds and magnitudes of code changes that will go into M2, as well as future features, I'm withdrawing my skepticism around versioning ref-fvm and having multiple version imports to support various NVs. It will lead to cleaner code, and "flip the page" mental model as we ship new ref-fvm releases supporting new NVs.

I fear that the build up from supporting every network version in the past will rapidly muddy the codebase. There are three kinds of changes that can go in network upgrades. We're OK with the first, kinda OK with the second one, and definitely not good with the third:

  1. Backwards compatible changes. For example, reachability analysis, refcounting/linkstore, etc. These don't require network guards because the new logic should be a noop in terms of consensus with the previous logic.
  2. Easily gateable/configurable changes, e.g. gas, the set of wired syscalls, etc. These would be simple ifs.
  3. More fundamental/invasive changes, e.g. account abstraction model, code upgrades, etc. If we have to selectively enable these features, the code would get very unreadable.

@Stebalien and I caught up today and we're aligned on this approach.

  1. Every network version is a new major version of ref-fvm, with a "flip the page" mental model.
    • nv16 => ref-fvm v1, nv17 => ref-fvm v2, nv18 => ref-fvm v3...
    • What do we do with NVs that don't require ref-fvm changes? Do we strive for uniformity (and re-release with no changes), or do we omit the bump? I prefer the latter; the relationship between ref-fvm versions and NVs would be one-to-many.
  2. Code changes can safely assume that they're targeting only that network version onwards. Therefore, they don't need to switch behaviours depending on network version.
  3. However, on every new version will strive to maintain backwards compatibilty with the previous version, as this facilitates canary testing on mainnet. This may not always be feasible/sustainable. If during development the effort/value tradeoff is no longer justified, we will drop compatibility for that release.
    • Backwards compatibility would only exist during alpha, beta, pre, and rc releases (as it's only useful for testing).
    • For the final release, all backwards compatibility will be dropped.
  4. When performing upstream dependency upgrades, we may backport them to previous ref-fvm versions as this will keep the dep graph small and the resulting filecoin-ffi library lighter.
  5. On Monday after debug execution: implement actor redirects in engine #605 is merged, we will create the release/v1 branch and master will switch to v2.
  6. We can have an selector crate that selects the right fvm dependency given a NV or an epoch (assuming mainnet, always).

@jennijuju
Copy link
Member Author

Code changes can safely assume that they're targeting only that network version onwards
What if its a bug fix thats not consensus breaking? are we going to use minor version for that?

question what will happen in the following situation:

  • master version v2-dev, network version 88
  • feature A introduced by FIP-1001 gets implemented and merged into v2-dev
  • feature B, independent of feature A, introduced by FIP-1002 gets implemented and merged into v2-dev
  • core devs decide to finalize only FIP-1001 in nv89
    how will release v2 be cut?

@Stebalien
Copy link
Member

Every network version is a new major version of ref-fvm, with a "flip the page" mental model.

For M2, yes. But not necessarily every network version. We're going to have a lot of network versions that don't require large changes to the FVM. Really, I expect most upgrades to require much smaller changes like adding syscalls.


@jennijuju

core devs decide to finalize only FIP-1001 in nv89. how will release v2 be cut?

It really depends on the change. IMO, we'd want to add some version guards on the FIP-1001 feature. I think our primary concern is accidentally introducing consensus-breaking changes as we refactor large portions of the FVM. If the code is already there (even if it's behind an if statement), there's a lot less to be concerned about.

If the change is really invasive, we might just have to either revert.

@mriise
Copy link
Contributor

mriise commented Jun 11, 2022

@jennijuju

core devs decide to finalize only FIP-1001 in nv89. how will release v2 be cut?

the decision of what new things get finalized should probably happen around beta and pre (maybe rc?) stages, managing these release stages (im guessing different branches, or CI something something) seems that it may become cumbersome, but it seems to work nicely for rust-lang

@raulk
Copy link
Member

raulk commented Jun 13, 2022

Every network version is a new major version of ref-fvm, with a "flip the page" mental model.

For M2, yes. But not necessarily every network version. We're going to have a lot of network versions that don't require large changes to the FVM. Really, I expect most upgrades to require much smaller changes like adding syscalls.

Agree, but the point I'm trying to drive home is that regardless of the scope/impact of the change, the mental model is: "this new version doesn't need to preserve backwards compatibility."

@jennijuju
Copy link
Member Author

@jennijuju

core devs decide to finalize only FIP-1001 in nv89. how will release v2 be cut?

the decision of what new things get finalized should probably happen around beta and pre (maybe rc?) stages, managing these release stages (im guessing different branches, or CI something something) seems that it may become cumbersome, but it seems to work nicely for rust-lang

So only cherry pick the needed PR when it’s ship time? If that’s the case sgtm!

@raulk
Copy link
Member

raulk commented Jun 25, 2022

This discussion resulted in a successful outcome above. Closing.

@raulk raulk closed this as completed Jun 25, 2022
@jennijuju
Copy link
Member Author

jennijuju commented Dec 8, 2022

@lemmih from the forest team has raised that: currently fvm publish different versions of fvm creates to support different filecoin network version. For example, fvm@2.0.0 for nv16/17 and fvm@3.0.0 for nv18.

"Such a scheme is problematic because version 2 of the fvm crate cannot be extended with breaking changes. Its dependencies will become dated and security patches will not be back-ported. As such, version 2 will for all practical purposes, be unmaintained. From Forest's perspective, this is a complete show-stopper for us."

Steb pointed out that currently, such described patches are being released with minor version. Other than thats bad semantic versioning, but more importantly it means minor versions may introducing breaking changes that will break down streaming users. If such behaviour is expected, there should be a huge warning to user in README.

But in general, thats not a good eng, practise. Both David and Steb suggested, we should consider to respect the semantic versioning rule more, and introducing new create name for each version of fvm. i.e: fvm_v1@ for nv16/17 and fvm_v2@ for nv18. Each of those crates could then be maintained.

@jennijuju jennijuju reopened this Dec 8, 2022
@shawnrader
Copy link

From the filecoin-ffi side I have some concerns around scope expansion of the filecoin-ffi project with this fvm release strategy. For fil-crypto team, we are using filecoin-ffi to export Go bindings for our API crate, filecoin-proofs-api. The filecoin-proofs-api rust crate handles which APIs are available for which network versions (groth16, halo2, testudo, vector commits, etc).

For FVM in filecoin-ffi, it is also exporting a Golang interface, but in addition to that it is including a translation layer to handle differences between the different fvm crates (fvm2, fvm3). I would suggest moving the translation layer into it’s own wrapper/API rust crate to make sure the various FVM crates play nice with each other and dependencies are resolved at that layer. This would make the filecoin-ffi build much simpler, and also help out the Rust based Filecoin clients which do not use filecoin-ffi but may need to build with multiple FVM versions and would otherwise have to deal with resolving those dependency conflicts in their own projects.

@q9f
Copy link
Member

q9f commented Dec 12, 2022

But in general, thats not a good eng, practise. Both David and Step suggested, we should consider to respect the semantic versioning rule more, and introducing new create name for each version of fvm. i.e: fvm_v1@ for nv16/17 and fvm_v2@ for nv18. Each of those crates could then be maintained.

What about supporting all previous network versions in one fvm release and handling the logic internally in the crate? That way you would be able to use whatever network version you desire for your project but still always have the latest code.

We had the same discussion around a year ago but I cannot recall where, maybe on slack...

@lemmih
Copy link
Contributor

lemmih commented Dec 12, 2022

But in general, thats not a good eng, practise. Both David and Step suggested, we should consider to respect the semantic versioning rule more, and introducing new create name for each version of fvm. i.e: fvm_v1@ for nv16/17 and fvm_v2@ for nv18. Each of those crates could then be maintained.

What about supporting all previous network versions in one fvm release and handling the logic internally in the crate? That way you would be able to use whatever network version you desire for your project but still always have the latest code.

We had the same discussion around a year ago but I cannot recall where, maybe on slack...

I'm strongly in favor of making the fvm backwards compatible with older network versions. If accidental changes to the logic of older network versions is a concern, we at ChainSafe would gladly set up the infrastructure for running exhaustive compliance tests: For each new release of the fvm, we could re-validate every block of the mainnet blockchain (starting at nv16 when the FVM became mandatory). This would allow the code to be actively maintained while guaranteeing that the logic of the past stay unchanged.

@Stebalien
Copy link
Member

It's not about accidentally breaking things, its about the complexity of supporting multiple versions. We've shipped the existing versions of the FVM knowing that we'll need to change things in the future. Future FVM versions will:

  • Add additional validation logic not present in previous versions.
  • Completely change many of the system calls.
  • Potentially change the entire syscall ABI.
  • Change how/when we charge gas.

Making these changes depend on a network version internally would be difficult, to put it mildly. We're not making minor tweaks and adjustments, we're making very large changes to how everything works under the covers.

Ideally, we'd have a shim inside the FVM itself to mux between different FVM versions.

@Stebalien
Copy link
Member

Also note: once we have multiple FVM crates, this should become much easier. We can then make backwards incompatible changes to prior FVM crates such that the APIs use common shared types. That will let us write a simple abstraction to multiplex between multiple VM versions.

@q9f
Copy link
Member

q9f commented Dec 13, 2022

I think it's a philosophical discussion. In general, blockchains by nature want to be backwards compatible as the history, to some degree, is always important in the present. I have noticed that Filecoin works differently here. But traditionally, you want to maintain that extra logic to ensure future compatibility with previous protocol versions. Outsourcing that logic into stand-alone crates basically results in letting this part of the protocol die as nobody wants to maintain marginally used crates. You could now argue the same is true for having everything in one crate, however, that's the better design decision architecture and maintenance wise (in my opinion).

@raulk
Copy link
Member

raulk commented Jan 4, 2023

@q9f I understand the point you're raising, but ref-fvm nor this versioning policy limits or conditions historical support in any manner, as far as I can tell. Clients are free to retain support for historical network versions in their codebases, and that support would just use the corresponding ref-fvm major version.

@raulk
Copy link
Member

raulk commented Jan 4, 2023

Definitely agree that the ref-fvm selector should live in a separate repo or ref-fvm (I prefer the latter), so that Forest can reuse it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants