-
Notifications
You must be signed in to change notification settings - Fork 782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PVF: Research the use of CPU virtualization for PVF execution #652
Comments
Sounds like this could be easier than using seccomp? |
No, it is not. Hardware virtualization provides far better security than any other software sandboxing technology. AWS , GCP and other cloud providers are using KVM to virtualize their compute and memory. |
Building a PoC would help to understand any shortcomings in the context our very specific usecase. We don't even need device emulation or a real VM, just a single virutalized CPU with a few memory regions. We could easily fork https://github.com/firecracker-microvm/firecracker and strip it down to our basic needs. |
Yeah for sure, I know :D What I meant is that doing this is much better then trying to predict what kind of syscalls we are doing and prohibiting the others. I'm still afraid that this will fail at some point when we oversee some syscall. |
Yes, I am also afraid of stalling the chain because we did not add a syscall to whitelist. |
That's definitely a concern. I've been putting in a lot of work to mitigate that. Using @koute's script at build-time should mostly prevent even building a binary that contains disallowed syscalls. (In practice though, the syscalls that are actually used by the worker threads are quite few (about a dozen).) But yeah, we want additional measures because just blocking syscalls is not enough -- if an attacker can break out of the WASM sandbox they probably can also get out-of-bounds memory, which essentially gives them a source of randomness. So say they make the worker job vote against with 50% chance, that would stall the chain. |
Note that AFAIK seccomp should essentially work out-of-box everywhere while KVM might require some extra setup from the users (e.g. some distros disallow access to KVM and require the user to be added to a special But you're right that hardware virtualization would be technically more secure. It should be worth it to make a proof of concept and test it out in practice.
Hmm... preventing the attacker from acquiring a source of randomness is going to be tricky. In presence of remote code execution it's not possible to disallow access to a source of randomness. Even if you disallow things like creating threads or measuring time through seccomp (although IIRC grabbing the time on amd64 goes through the vdso shim, so it might not be even possible to sandbox that with seccomp as no syscalls are involved) you still have e.g. the |
AFAIK it is widely supported, and if it is not the validator can run without it. Indeed it requires some extra setup, but this is a small price to pay for the increased security. It could easily be part of the validator setup guide.
Yeah, that is something we should pursue !
What we would need is to develop and maintain a CPU template that customizes what we expose in CPUID and allow. An example, which is for general purpose VMs can be seen here: https://github.com/firecracker-microvm/firecracker/tree/main/src/vmm/src/cpuid . AFAIK this was not really supported on aarch64, but things might have changed. In our case we would have something very restrictive, or maybe we want to enable some CPU instruction set extensions that WASM can use for increased performance. |
@mrcnski I would highly recommend that this is done before moving forward with the seccomp implementation. |
Yep. But I think we still can first do the work of splitting the worker into a separate binary (and stripping it as much as possible) without necessarily sandboxing it yet, as that will be necessary regardless of which approach we pick. |
Agreed. I already have much of the seccomp logging implemented, so IMO it makes sense to finish that before a big context switch. And then the logging can run on validators for a while while I work on virtualization. And yeah, I will first split out the worker binaries (without musl-builder for now).1 Footnotes
|
Okay ty! |
IMO virtualization PoC is a big rock to push uphill. We should shoud first enable the logging to at least collect some data while we work on the PoC. |
This issue has been mentioned on Polkadot Forum. There might be relevant details there: https://forum.polkadot.network/t/ux-of-distributing-multiple-binaries-take-2/2854/1 |
I like this idea and I think it'll provide far superior than any process-level sandboxing indeed 👍🏻 but as Andrei said, it'll be challenging.
Actually, even if a given CPUID template is set (which doesn't advertise a certain instruction), a guest program can still try to call that instruction (and it'll work). CPUID only helps with non-malicious guests, to provide a common view of the CPU features to all guests. |
* Unify rpc api naming Signed-off-by: koushiro <[email protected]> * Add some docs Signed-off-by: koushiro <[email protected]> * Make test more stable Signed-off-by: koushiro <[email protected]>
For an improved security posture, we should consider running the PVF on a KVM virtualized CPU. As I/O is not required and communication with the host can happen via a memory mapped region there shouldn't be any performance degradation even in the case of nested virtualization.
The text was updated successfully, but these errors were encountered: