PVF: Consider treating RuntimeConstruction as an internal execution error #661
Comments
We'll no-show when this happens, I guess. We should double-check what happens when we run out of tranches, though. We run out of tranches when all validators are checking anyway, so de facto it should collapse into 2/3 votes at some point before this. We need roughly one no-show per tranche though, but each tranche consumes 2.25 of the checkers, ignoring the zeroth tranche, so we've exhausted like 1 / 2.25 ≈ 0.4444... of the validator set as no-shows, so nothing could be approved anyways. |
Trying to make sense of this. Please correct me if I'm mistaken. :) @mrcnski We want to avoid raising polkadot-sdk/polkadot/node/core/pvf/common/src/executor_intf.rs Lines 138 to 144 in 0d3c67d
For the purpose, this one in particular shouldn't be a |
Wow, good catch. It's not even a problem with the PVF, but with the executor parameters, which come from on-chain. So if somehow there are invalid exec params on-chain, it would break here for everybody. We do have some checks for exec parameter correctness (do we check that they can be converted to wasmtime semantics?), and besides that there's not much we can do - in the words of @s0me0ne-unkn0wn, if the network wants to commit suicide, it will find a way. So we're kind of borked here anyway, and it doesn't make too much difference how we handle this, but we might as well raise a sensible error. Raising an internal error here doesn't make sense because it's not a local issue, but an issue with the chain. Maybe even panicking is appropriate, because it seems like we can only get to this state with a suicided chain (game over anyway), or some developer bug.
Yes, exactly! 🚀 |
I see. This is much more serious than I thought. I think it justifies panicking either way.
Nicely said. 🤣 |
Note that currently, |
Or, we panic in |
Yes, sounds good, and in that case, |
Per the original issue, we now check for runtime construction errors during prechecking, so it is sound to treat this as an internal error in execution. We just have to make sure to do the runtime construction outside of the execution job process. Since that process runs untrusted code, malicious code can return any error they want, so we can't treat any error returned by that process as an internal/local error. |
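The trust boundary described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Origin`, `ExecError`, `Outcome` and `classify` are hypothetical names invented for this sketch, not types from polkadot-sdk, and the real error handling has more variants.

```rust
/// Hypothetical: where an error was observed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Origin {
    /// Observed by trusted code, outside the execution job process.
    TrustedHost,
    /// Reported by the execution job process, which runs untrusted code.
    UntrustedJob,
}

/// Hypothetical, heavily simplified error type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ExecError {
    RuntimeConstruction(String),
    Other(String),
}

/// Hypothetical outcome of handling an error.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Outcome {
    /// Local problem: abstain, do not vote against the candidate.
    Internal(String),
    /// Attributed to the candidate: vote against, possibly raising a dispute.
    Invalid(String),
}

/// Only a runtime construction failure seen by trusted code is treated as
/// internal. Anything reported by the untrusted job process could be forged
/// by malicious code, so it is never treated as a local error. All remaining
/// cases are conservatively treated as invalid in this sketch.
pub fn classify(origin: Origin, err: ExecError) -> Outcome {
    match (origin, err) {
        (Origin::TrustedHost, ExecError::RuntimeConstruction(msg)) => Outcome::Internal(msg),
        (_, ExecError::RuntimeConstruction(msg)) | (_, ExecError::Other(msg)) => {
            Outcome::Invalid(msg)
        }
    }
}
```

The point of passing `Origin` explicitly is that the same error variant gets a different treatment depending on which side of the process boundary produced it.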
After #2406, what's left to be done? |
Hey @eagr! I think just implementing this, i.e. catching this case during execution. And keeping in mind this part:
|
@s0me0ne-unkn0wn Copying my concern from #2871: This seemed sound because we now do runtime construction in prechecking. However, I realized that the exec params might have changed between prechecking and execution, and if they changed in such a way that the runtime construction now fails, abstaining would stall finality. We don't currently have any exec params that are applicable to runtime construction, but we might in the future? |
@mrcnski Well, on the one hand, it's a valid concern; on the other hand, we've discussed many times that a new set of executor params should never be more restrictive than the old one was. Currently, we don't have a mechanism to unsee changes in executor params. If a new Wasm extension has been enabled, it's enabled forever; if a new stack size has been enforced, it will never go down again, it can only grow further (btw, it's a good idea to enforce such things in executor params consistency checks). And we discussed those measures precisely because we don't want some PVF already on-chain to cease compiling/instantiating/executing with a new set of params. In that sense, we're okay here. Unless that rule is broken, nothing bad can happen. If it's broken, well... If the network wants to commit suicide... You know. |
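A consistency check enforcing the "never more restrictive" rule might look something like this. It is a minimal sketch, not the real implementation: `Params`, its fields and `is_not_more_restrictive` are hypothetical and much simpler than the actual executor params.

```rust
/// Hypothetical, simplified parameter set (not the real executor params type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Params {
    /// Assumed to be a limit where a larger value is less restrictive.
    pub stack_limit: u32,
    /// Names of enabled Wasm extensions (illustrative strings only).
    pub wasm_extensions: Vec<String>,
}

/// Returns true if `new` is at least as permissive as `old`:
/// the limit did not shrink and no previously enabled extension was dropped.
pub fn is_not_more_restrictive(old: &Params, new: &Params) -> bool {
    let limit_ok = new.stack_limit >= old.stack_limit;
    let extensions_ok = old
        .wasm_extensions
        .iter()
        .all(|ext| new.wasm_extensions.contains(ext));
    limit_ok && extensions_ok
}
```

With a check of this shape applied whenever the params are updated, a PVF that passed prechecking under the old set should not start failing runtime construction under the new one, at least as far as these parameters are concerned.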
Thanks! I understand about |
I believe if we ever decide to change something internal, like the instantiation strategy, we should only do that after checking all the on-chain PVFs to survive that change (AFAIR @ordian was developing some tooling for that). |
ISSUE
Overview
sc_executor_common::error::Error::RuntimeConstruction should theoretically only happen due to some local issue, and not due to the PVF itself. Therefore when it arises during execution we may want to treat it as internal (i.e. not vote against -- possibly raising a dispute -- as we do right now). We just have to ensure that RuntimeConstruction is never constructed due to an issue with the PVF, and document it as such on the Substrate side.
We may be able to treat some of the other error variants the same way. This would help to prevent disputes due to local issues.
Important note 1
Credit to @eskimor for this section.
In general, for errors not raising a dispute we have to be very careful. This is only sound if we either:
Reasoning: Otherwise it would be possible to register a PVF where candidates cannot be checked, but we don't get a dispute - so nobody gets punished. Second, we end up with a finality stall that is not going to resolve!
Important note 2
See #661 (comment).
Related issue
Somewhat related issue about WasmError::Other variant (which gets turned into Error::RuntimeConstruction): paritytech/substrate#13853