PVF: Consider treating RuntimeConstruction as an internal execution error #661

mrcnski opened this issue Apr 10, 2023 · 14 comments

mrcnski (Contributor) commented Apr 10, 2023

ISSUE

Overview

sc_executor_common::error::Error::RuntimeConstruction should theoretically only happen due to some local issue, and not due to the PVF itself. Therefore, when it arises during execution we may want to treat it as internal (i.e. abstain, instead of voting against and possibly raising a dispute as we do right now). We just have to ensure that RuntimeConstruction is never constructed due to an issue with the PVF, and document it as such on the Substrate side.

We may be able to treat some of the other error variants the same way. This would help to prevent disputes due to local issues.
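
As a rough illustration of the proposed split, here is a minimal sketch with hypothetical type and function names (not the actual polkadot-node-core-pvf API):

```rust
// Hypothetical stand-ins for the relevant error and outcome types.
enum ExecutorError {
    /// Instantiating the compiled artifact failed.
    RuntimeConstruction(String),
    /// The PVF trapped or otherwise misbehaved during execution.
    Trap(String),
}

enum ValidationOutcome {
    /// Vote against the candidate (may raise a dispute).
    Invalid(String),
    /// Local/internal problem: abstain instead of disputing.
    Internal(String),
}

fn classify(err: ExecutorError) -> ValidationOutcome {
    match err {
        // Sound only if pre-checking already constructed the runtime for this PVF,
        // so that a construction failure here points at a local issue (corrupted
        // artifact, resource exhaustion, ...) rather than at the PVF itself.
        ExecutorError::RuntimeConstruction(e) =>
            ValidationOutcome::Internal(format!("runtime construction: {e}")),
        // Failures attributable to the candidate/PVF still produce a negative vote.
        ExecutorError::Trap(e) => ValidationOutcome::Invalid(e),
    }
}
```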

Important note 1

Credit to @eskimor for this section.

In general, we have to be very careful with errors that do not raise a dispute. This is only sound if we either:

  1. Have ruled out that error in pre-checking. If something is not checked in pre-checking, we must raise a dispute, even if it is candidate-independent.
  2. Are 100% confident that it is a hardware/local issue, like a corrupted file, etc.

Reasoning: otherwise it would be possible to register a PVF whose candidates cannot be checked, yet no dispute is raised, so nobody gets punished. Second, we would end up with a finality stall that is never going to resolve!

Important note 2

See #661 (comment).

Related issue

Somewhat related issue about WasmError::Other variant (which gets turned into Error::RuntimeConstruction):
paritytech/substrate#13853

burdges commented Apr 11, 2023

We'll no-show when this happens, I guess. We should double-check what happens when we run out of tranches, though.

We run out of tranches when all validators are checking anyway, so de facto it should collapse into 2/3 votes at some point before this. We need roughly one no-show per tranche though, and each tranche consumes about 2.25 of the checkers (ignoring the zeroth tranche), so we would have exhausted roughly 1/2.25 ≈ 0.444 of the validator set as no-shows, at which point nothing could be approved anyway.
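
Spelling out that arithmetic for a validator set of size n, under the assumptions above (roughly one no-show per tranche, about 2.25 checkers consumed per tranche):

```latex
\text{no-shows} \approx \#\text{tranches} \approx \frac{n}{2.25} \approx 0.444\,n
```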

Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023

eagr (Contributor) commented Nov 3, 2023

Trying to make sense of this; please correct me if I'm mistaken. :) @mrcnski

We want to avoid raising WasmError for PVF issues on the Polkadot side, right? But we would want WasmError for related internal issues, because that's what turns into a RuntimeConstruction error.

```rust
pub unsafe fn create_runtime_from_artifact_bytes(
    compiled_artifact_blob: &[u8],
    executor_params: &ExecutorParams,
) -> Result<WasmtimeRuntime, WasmError> {
    let mut config = DEFAULT_CONFIG.clone();
    config.semantics =
        params_to_wasmtime_semantics(executor_params).map_err(|err| WasmError::Other(err))?;
    // ...
}
```

For that purpose, this one in particular shouldn't be a WasmError, correct?

mrcnski (Contributor, Author) commented Nov 5, 2023

Wow, good catch. It's not even a problem with the PVF, but with the executor parameters, which come from on-chain. So if somehow there are invalid exec params on-chain, it would break here for everybody.

We do have some checks for exec parameter correctness (do we check that they can be converted to wasmtime semantics?), and besides that there's not much we can do - in the words of @s0me0ne-unkn0wn, if the network wants to commit suicide, it will find a way. So we're kind of borked here anyway, and it doesn't make too much difference how we handle this, but we might as well raise a sensible error. Raising an internal error here doesn't make sense because it's not a local issue, but an issue with the chain. Maybe even panicking is appropriate, because it seems like we can only get to this state with a suicided chain (game over anyway), or some developer bug.

We want to avoid raising WasmError for PVF issues on the Polkadot side, right? But we would want WasmError for related internal issues, because that's what turns into a RuntimeConstruction error.

Yes, exactly! 🚀

eagr (Contributor) commented Nov 5, 2023

Maybe even panicking is appropriate, because it seems like we can only get to this state with a suicided chain (game over anyway), or some developer bug.

I see. This is much more serious than I thought. I think it justifies panicking either way.

if the network wants to commit suicide, it will find a way.

Nicely said. 🤣

s0me0ne-unkn0wn (Contributor) commented:

Note that currently, params_to_wasmtime_semantics() may only return an error if there's a serious bug in the code (i.e. the default stack limit being None in DEFAULT_CONFIG, even though the comment at that constant explicitly prohibits changing anything). So I believe it's okay to panic!(), or unreachable!(), or whatever. But I'd go for changing .map_err() to .expect() tbh; it seems to me that's exactly how it should be.
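
A minimal sketch of that suggestion, continuing the excerpt quoted earlier (the expect message is illustrative):

```rust
let mut config = DEFAULT_CONFIG.clone();
// If this ever fails, DEFAULT_CONFIG itself is broken (a developer bug), not the
// PVF and not the local machine, so surfacing it as a WasmError is misleading.
config.semantics = params_to_wasmtime_semantics(executor_params)
    .expect("DEFAULT_CONFIG is never changed and executor params are validated; qed");
```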

eagr (Contributor) commented Nov 5, 2023

Or we panic in params_to_wasmtime_semantics()? I don't see a good reason not to exit early. Is there?

s0me0ne-unkn0wn (Contributor) commented:

Yes, sounds good, and in that case, params_to_wasmtime_semantics() should always return plain Semantics.
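
A sketch of that variant (the body is abbreviated and illustrative; the real function lives in the PVF executor code):

```rust
// Returning plain `Semantics`: any failure here is a developer bug, so we panic
// instead of threading an error type through every caller.
fn params_to_wasmtime_semantics(par: &ExecutorParams) -> Semantics {
    let mut sem = DEFAULT_CONFIG.semantics.clone();
    let stack_limit = sem
        .deterministic_stack_limit
        .as_mut()
        .expect("DEFAULT_CONFIG sets a deterministic stack limit and must not be changed; qed");
    // ... apply each `ExecutorParam` to `sem` / `stack_limit` ...
    sem
}
```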

mrcnski (Contributor, Author) commented Nov 19, 2023

Per the original issue, we now check for runtime construction errors during prechecking, so it is sound to treat this as an internal error in execution. We just have to make sure to do the runtime construction outside of the execution job process. Since that process runs untrusted code, malicious code could return any error it wants, so we can't treat any error returned by that process as an internal/local error.
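
A sketch of the intended flow, with hypothetical names standing in for the real execute-worker code:

```rust
// Illustrative sketch only; types and names are hypothetical stand-ins for the
// real execute-worker code in polkadot-node-core-pvf.
enum WorkerResponse {
    Ok(Vec<u8>),
    InvalidCandidate(String),
    InternalError(String),
}

struct Runtime; // stand-in for the instantiated wasm runtime

fn construct_runtime(_artifact: &[u8]) -> Result<Runtime, String> {
    // Stand-in for create_runtime_from_artifact_bytes(): runs in the trusted worker.
    Ok(Runtime)
}

fn run_job_process(_rt: Runtime, _params: &[u8]) -> WorkerResponse {
    // Stand-in for the sandboxed job process that executes the untrusted PVF.
    WorkerResponse::Ok(Vec::new())
}

fn handle_execute_request(artifact: &[u8], params: &[u8]) -> WorkerResponse {
    // Construct the runtime *before* handing anything to the untrusted job process.
    // A failure here happened entirely in trusted code, so it is sound to report
    // it as an internal error (no dispute).
    let rt = match construct_runtime(artifact) {
        Ok(rt) => rt,
        Err(e) => return WorkerResponse::InternalError(format!("runtime construction: {e}")),
    };
    // Anything reported back by the job process may be attacker-controlled and
    // must therefore never be mapped to an internal/local error.
    run_job_process(rt, params)
}
```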

eagr (Contributor) commented Nov 20, 2023

After #2406, what's left to be done?

mrcnski (Contributor, Author) commented Nov 20, 2023

Hey @eagr! I think just implementing this, i.e. catching this case during execution. And keeping in mind this part:

it is sound to treat this as an internal error in execution. We just have to make sure to do the runtime construction outside of the execution job process.

claravanstaden added a commit to Snowfork/polkadot-sdk that referenced this issue Dec 8, 2023

mrcnski (Contributor, Author) commented Jan 9, 2024

@s0me0ne-unkn0wn Copying my concern from #2871:

This seemed sound because we now do runtime construction in prechecking. However, I realized that the exec params might have changed between prechecking and execution, and if they changed in such a way that the runtime construction now fails, abstaining would stall finality.

We don't currently have any exec params that are applicable to runtime construction, but we might in the future?

s0me0ne-unkn0wn (Contributor) commented:

@mrcnski Well, on the one hand, it's a valid concern; on the other hand, we have said many times that a new set of executor params should never be more restrictive than the old one was. Currently, we don't have a mechanism to unsee changes in executor params: if a new Wasm extension has been enabled, it's enabled forever; if a new stack size has been enforced, it will never shrink again, it can only grow further (btw, it's a good idea to enforce such things in executor-params consistency checks). And we discussed those measures precisely because we don't want some PVF that is already on-chain to cease compiling/instantiating/executing with a new set of params.

In that sense, we're okay here. Unless that rule is broken, nothing bad can happen. If it's broken, well... If the network wants to commit suicide... You know.
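
A hedged sketch of the kind of consistency check mentioned above (field and function names are illustrative, not the actual ExecutorParams API):

```rust
// Hypothetical, simplified stand-in for ExecutorParams; not the actual type.
struct Params {
    stack_logical_max: u32,
    wasm_bulk_memory_enabled: bool,
}

/// Rejects a params update that is more restrictive than the previous set in any
/// dimension, since that could make an already-registered PVF stop instantiating
/// or executing between pre-checking and execution.
fn check_not_more_restrictive(old: &Params, new: &Params) -> Result<(), String> {
    if new.stack_logical_max < old.stack_logical_max {
        return Err("the stack limit may only grow".into());
    }
    if old.wasm_bulk_memory_enabled && !new.wasm_bulk_memory_enabled {
        return Err("a wasm extension, once enabled, stays enabled".into());
    }
    Ok(())
}
```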

mrcnski (Contributor, Author) commented Jan 10, 2024

Thanks! I understand the point about being more restrictive, but is e.g. changing the instantiation strategy disallowed? Perhaps we discover some awesome new instantiation strategy and want to enact it through exec params. But if that can't happen, then implementing this ticket should be sound.

s0me0ne-unkn0wn (Contributor) commented:

I believe that if we ever decide to change something internal, like the instantiation strategy, we should only do that after checking that all the on-chain PVFs survive the change (AFAIR @ordian was developing some tooling for that).
