Check artifact integrity before execution (#8833)
Conversation
AndreiEres
left a comment
If we come across a corrupted artifact, we should prepare it again. Can it be a possible vulnerability, @s0me0ne-unkn0wn?
```rust
    unistd::{ForkResult, Pid},
};
use polkadot_node_core_pvf_common::{
    executor_interface::{prepare, prevalidate},
```
It looks messy, but I just merged two imports of `polkadot_node_core_pvf_common`. In fact, only `compute_checksum` is newly imported.
This reverts commit e4afb12.
```rust
/// Compute the checksum of the given artifact.
pub fn compute_checksum(data: &[u8]) -> ArtifactChecksum {
    blake3::hash(data).into()
```
Should we switch to twox?
I personally don't have any preference here. The blake3's throughput is more than enough for us, so why wouldn't we use it (especially given that we're already using it).
I checked Twox; it seems much faster, so I eventually decided to switch.
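For illustration, here is the shape of the helper under discussion as a minimal, self-contained sketch. Since neither `blake3` nor `twox-hash` is in the standard library, std's `DefaultHasher` (SipHash, 8-byte output) is used as a stand-in and widened to 32 bytes; the real code uses the `blake3` crate, and everything except the `compute_checksum` name is an assumption of this sketch.

```rust
use std::hash::{Hash, Hasher};

// Stand-in for the 32-byte checksum type used in the PR.
pub type ArtifactChecksum = [u8; 32];

/// Compute the checksum of the given artifact.
///
/// NOTE: std's DefaultHasher is used here only so the sketch compiles without
/// external crates; its 8-byte output is widened to 32 bytes by repetition,
/// which is NOT what the real implementation does.
pub fn compute_checksum(data: &[u8]) -> ArtifactChecksum {
    let mut hasher = std::collections::hash_map::DefaultHasher::new();
    data.hash(&mut hasher);
    let h = hasher.finish().to_le_bytes(); // 8 bytes
    let mut out = [0u8; 32];
    for (i, b) in out.iter_mut().enumerate() {
        *b = h[i % 8];
    }
    out
}

fn main() {
    let a = compute_checksum(b"artifact bytes");
    let b = compute_checksum(b"artifact bytes");
    assert_eq!(a, b, "checksum must be deterministic");
    println!("checksum[0..4] = {:?}", &a[..4]);
}
```

The choice of hash matters little for this use case: the checksum only guards against accidental on-disk corruption, not an adversary, so any fast non-cryptographic hash with a stable output suffices.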
/cmd prdoc --audience node_dev --bump patch
alexggh
left a comment
Looks good to me.
I added some comments. I would also be interested to know whether this retry path is ever exercised by an integration test.
```rust
    )
})?;

if artifact_checksum != compute_checksum(&compiled_artifact_blob) {
```
How much does this take for 10MiB, 100MiB ?
Blake3's throughput is ~3 GB/s on hardware close to our reference hardware, AFAIR.
According to the crate's benchmark data, 10 MiB with Blake3 takes 1-2 ms. Twox should be at least 3x faster.
Ok, so we are not worried about this eating up too much time.
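The numbers quoted in this thread can be sanity-checked with a back-of-envelope calculation (time = size / throughput). Taking the ~3 GB/s figure at face value gives roughly 3.5 ms for 10 MiB and 35 ms for 100 MiB; the 1-2 ms crate-benchmark figure mentioned above implies an even higher effective throughput (SIMD and multi-threading in Blake3 can exceed 3 GB/s considerably). This arithmetic sketch is illustrative only:

```rust
/// Milliseconds needed to hash `bytes` at `gb_per_s` gigabytes per second.
fn hash_time_ms(bytes: f64, gb_per_s: f64) -> f64 {
    bytes / (gb_per_s * 1e9) * 1e3
}

fn main() {
    let throughput_gb_s = 3.0; // the ~3 GB/s Blake3 figure quoted in the thread
    for mib in [10.0_f64, 100.0] {
        let bytes = mib * 1024.0 * 1024.0;
        let ms = hash_time_ms(bytes, throughput_gb_s);
        println!("{mib} MiB at {throughput_gb_s} GB/s ~ {ms:.1} ms");
    }
}
```

Either way, the conclusion above holds: even at the conservative end, checksumming is a sub-percent cost next to artifact preparation and execution.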
s0me0ne-unkn0wn
left a comment
Looks good, left some comments but none of them are blockers!
```rust
    Ok(buf)
}

pub type ArtifactChecksum = [u8; 32];
```
nit: In other places of the code, we're very idiomatic and usually go with

```rust
#[repr(transparent)]
pub struct ArtifactChecksum(H256);
```

with the corresponding `AsRef` implementations, if needed, etc. I do not insist it should be implemented like that in this very case; it just seems to be one of our "best practices".
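A minimal sketch of the suggested newtype, using a raw `[u8; 32]` in place of `H256` (which lives in the `primitive-types` crate, not std); the derives and `From` impl are additions of this sketch, not part of the suggestion:

```rust
/// Newtype over the raw 32-byte hash. `#[repr(transparent)]` guarantees the
/// same in-memory layout as the inner array, so zero-cost casts stay sound.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct ArtifactChecksum([u8; 32]);

impl AsRef<[u8]> for ArtifactChecksum {
    fn as_ref(&self) -> &[u8] {
        &self.0
    }
}

impl From<[u8; 32]> for ArtifactChecksum {
    fn from(raw: [u8; 32]) -> Self {
        Self(raw)
    }
}

fn main() {
    let c: ArtifactChecksum = [0xab; 32].into();
    assert_eq!(c.as_ref().len(), 32);
    println!("first byte: {:#04x}", c.as_ref()[0]);
}
```

Compared to a bare type alias, the newtype prevents accidentally passing an arbitrary 32-byte array where a checksum is expected, at no runtime cost.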
```rust
    Ok((pvd, pov, execution_timeout))

let artifact_checksum = framed_recv_blocking(stream)?;
let artifact_checksum =
```
I'm NOT encouraging changing this right away, but...

- Why do we want to encode a raw 32-byte sequence? Why not transfer it as a raw 32-byte sequence?
- If we ought to encode, why don't we encode the entire tuple at once instead of one-by-one?

Maybe a good candidate for a refactoring issue? I bet a single `recv()` and a single `decode()` are somewhat more performant than one-by-ones.
Let's do it in another PR.
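To make the trade-off concrete, here is a length-prefixed framing sketch over any `Read`/`Write` pair. The `framed_recv_blocking` name mirrors the function quoted in the diff, but the u32-little-endian framing and the send counterpart are assumptions of this sketch, not the real wire format:

```rust
use std::io::{self, Cursor, Read, Write};

/// Read one length-prefixed frame: a u32 LE length, then that many payload bytes.
fn framed_recv_blocking(stream: &mut impl Read) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    stream.read_exact(&mut len_buf)?;
    let len = u32::from_le_bytes(len_buf) as usize;
    let mut buf = vec![0u8; len];
    stream.read_exact(&mut buf)?;
    Ok(buf)
}

/// Write one length-prefixed frame.
fn framed_send_blocking(stream: &mut impl Write, payload: &[u8]) -> io::Result<()> {
    stream.write_all(&(payload.len() as u32).to_le_bytes())?;
    stream.write_all(payload)
}

fn main() -> io::Result<()> {
    // Sending all fields in ONE frame (checksum + blob + timeout encoded
    // together) costs one length prefix and one recv, instead of one of each
    // per field -- the point raised in the review comment above.
    let mut wire = Vec::new();
    framed_send_blocking(&mut wire, b"checksum|blob|timeout")?;
    let mut stream = Cursor::new(wire);
    let frame = framed_recv_blocking(&mut stream)?;
    assert_eq!(frame, b"checksum|blob|timeout");
    println!("received {} bytes in one frame", frame.len());
    Ok(())
}
```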
All GitHub workflows were cancelled due to the failure of one of the required jobs.
@AndreiEres Can we backport this to `stable2506`? I see no reason to wait until 2509.
Successfully created backport PR for
Fixes #677
Fixes #2399

# Description

To detect potential corruption of PVF artifacts on disk, we store their checksums and verify that they match before execution. In case of a mismatch, we recreate the artifact.

## Integration

In Candidate Validation, we treat the error similarly to `PossiblyInvalidError::RuntimeConstruction` due to their close nature.

## Review Notes

The Blake3 hashing algorithm was already in use. I believe we can switch to Twox, as suggested in the issue, because the checksum does not need to be cryptographic, and we do not reveal the checksum in logs.

---------

Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>

(cherry picked from commit 310e81d)
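The store-checksum-then-verify-before-execution flow described above can be sketched as follows. Everything here (the `ArtifactStore` type, the FNV-1a toy checksum, the method names) is illustrative and not the real API; the PR uses Blake3 and the node's actual artifact cache:

```rust
use std::collections::HashMap;

type Checksum = u64;

/// Hypothetical stand-in for the artifact store: artifact id -> (blob, stored checksum).
struct ArtifactStore {
    artifacts: HashMap<&'static str, (Vec<u8>, Checksum)>,
}

/// Toy non-cryptographic checksum (FNV-1a); the PR uses Blake3.
fn checksum(data: &[u8]) -> Checksum {
    data.iter().fold(0xcbf29ce484222325u64, |h, b| {
        (h ^ *b as u64).wrapping_mul(0x100000001b3)
    })
}

impl ArtifactStore {
    /// Verify integrity before execution; on mismatch, re-prepare the artifact.
    fn load_verified(&mut self, id: &'static str, prepare: impl Fn() -> Vec<u8>) -> &[u8] {
        let entry = self.artifacts.get(id);
        let ok = matches!(entry, Some((blob, sum)) if checksum(blob) == *sum);
        if !ok {
            // Mismatch means on-disk corruption: recreate instead of executing.
            let blob = prepare();
            let sum = checksum(&blob);
            self.artifacts.insert(id, (blob, sum));
        }
        &self.artifacts[id].0
    }
}

fn main() {
    let mut store = ArtifactStore { artifacts: HashMap::new() };
    // Simulate corruption: the blob changed but the stored checksum did not.
    store.artifacts.insert("pvf", (b"corrupted".to_vec(), checksum(b"original")));
    let blob = store.load_verified("pvf", || b"original".to_vec());
    assert_eq!(blob, b"original");
    println!("artifact re-prepared after checksum mismatch");
}
```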
Backport #8833 into `stable2506` from AndreiEres.

This backport includes a major version bump due to internal API changes that only affect the polkadot binary. Since stable2506 hasn't been released yet and no other downstream users are impacted, the change is considered safe.

See the [documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md) on how to use this bot.

Co-authored-by: Andrei Eres <eresav@me.com>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>