
[Fix] Check certificate validity early. #3537

Merged
vicsn merged 5 commits into ProvableHQ:staging from acoglio:fix-cert-fetch
Mar 26, 2025

acoglio (Contributor) commented Mar 12, 2025

Checks certificate validity before recursively fetching previous certificates.

Closes #2935.
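The ordering this PR enforces can be sketched as follows. This is a minimal illustration, not snarkOS code: `Certificate`, `Committee`, `meets_quorum`, and `sync_certificate` are hypothetical names, the peer is modeled as an in-memory map rather than a network fetch, and the quorum rule (distinct signers holding strictly more than two-thirds of total stake) is an assumption chosen to match the "does not meet quorum requirements" error reported later in this thread.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified certificate: its ID, round, signer addresses,
/// and the IDs of the previous-round certificates it references.
#[derive(Clone, Debug)]
pub struct Certificate {
    pub id: u64,
    pub round: u64,
    pub signers: Vec<String>,
    pub previous: Vec<u64>,
}

/// Hypothetical committee: maps each validator address to its stake.
pub struct Committee {
    pub stake: HashMap<String, u64>,
}

impl Committee {
    /// Assumed quorum rule: distinct committee signers hold > 2/3 of total stake.
    pub fn meets_quorum(&self, signers: &[String]) -> bool {
        let total: u64 = self.stake.values().sum();
        // Deduplicate so one signer is not counted twice; unknown addresses add nothing.
        let unique: HashSet<&String> = signers.iter().collect();
        let signed: u64 = unique.iter().filter_map(|s| self.stake.get(*s)).sum();
        3 * signed > 2 * total
    }
}

/// Stores `cert` after first storing any previous certificates it is missing.
/// `peer` stands in for fetching certificates over the network.
pub fn sync_certificate(
    cert: &Certificate,
    committee: &Committee,
    peer: &HashMap<u64, Certificate>,
    stored: &mut HashMap<u64, Certificate>,
) -> Result<(), String> {
    if stored.contains_key(&cert.id) {
        return Ok(());
    }
    // Check validity early: reject before fetching or recursing into anything.
    if !committee.meets_quorum(&cert.signers) {
        return Err(format!(
            "Certificate {} for round {} does not meet quorum requirements",
            cert.id, cert.round
        ));
    }
    // Only a certificate that passed the check may trigger recursive fetches.
    for prev_id in &cert.previous {
        if !stored.contains_key(prev_id) {
            let prev = peer
                .get(prev_id)
                .ok_or_else(|| format!("Missing previous certificate {prev_id}"))?;
            sync_certificate(prev, committee, peer, stored)?;
        }
    }
    stored.insert(cert.id, cert.clone());
    Ok(())
}
```

Because the quorum check runs before the loop over `previous`, a fabricated certificate with a single signer is rejected before any parent is requested, so it cannot drive the recursion deeper. Valid but long chains still recurse, which is the concern kaimast raises further down in this thread.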

Since it looks like we had working code to run the exploit, it would be good to revive it and make sure it no longer crashes the validator.

vicsn requested review from kaimast and kpandl on March 12, 2025 20:39

kaimast (Contributor) left a comment

Looks good to me!

acoglio requested a review from raychu86 on March 15, 2025 18:36

vicsn (Collaborator) commented Mar 20, 2025

Asked Kai some questions in DM about testing this.

kaimast (Contributor) commented Mar 21, 2025

Quick update: I was able to reproduce the attack. I will soon verify that the attack no longer works with the proposed changes.

Here are some observations I made:

  1. There are already some checks in place that reject batches listing their own author as one of the endorsers. The original attack code does exactly that, because a certificate needs at least one signer. But this can be circumvented by having some other random address, or a second malicious validator, sign the batch. Unless I am missing something, there is no way to verify that the signer is valid before retrieving the full certificate chain.
  2. snarkVM also rejects batches whose timestamps are too close together. The original attack set the same timestamp for all fake certificates, so correct nodes reject them.
  3. What does seem to work is setting timestamps in the future. Nodes should probably reject batches that are more than a second in the future; on my branch, nodes seem to (at least initially) accept batches that are several minutes in the future.
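The timestamp observations in points 2 and 3 suggest a check along these lines. It is a sketch under stated assumptions: `check_batch_timestamp`, `TimestampError`, and both constants are hypothetical, the one-second future tolerance is the bound suggested in point 3, and the minimum spacing value is a placeholder rather than snarkVM's actual rule. Timestamps are Unix seconds, and `now` is passed in so the check is deterministic.

```rust
/// Hypothetical tolerance for clock skew; point 3 suggests about one second.
pub const MAX_FUTURE_DRIFT_SECS: i64 = 1;
/// Placeholder minimum gap between consecutive batches (not snarkVM's value).
pub const MIN_BATCH_SPACING_SECS: i64 = 1;

#[derive(Debug, PartialEq)]
pub enum TimestampError {
    /// The batch claims a time too far ahead of the local clock.
    TooFarInFuture,
    /// The batch is too close to the previous batch it is compared against.
    TooCloseToPrevious,
}

/// Validates a batch timestamp (Unix seconds) against the local clock `now`
/// and, if known, the timestamp of the previous batch it is compared against.
pub fn check_batch_timestamp(
    batch_ts: i64,
    previous_ts: Option<i64>,
    now: i64,
) -> Result<(), TimestampError> {
    if batch_ts > now + MAX_FUTURE_DRIFT_SECS {
        return Err(TimestampError::TooFarInFuture);
    }
    if let Some(prev) = previous_ts {
        if batch_ts < prev + MIN_BATCH_SPACING_SECS {
            return Err(TimestampError::TooCloseToPrevious);
        }
    }
    Ok(())
}
```

In a node, `now` would typically come from `std::time::SystemTime::now()` converted to Unix seconds. The future-drift check is the part point 3 is missing: timestamps set in the future currently appear to get past the existing spacing checks.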

kaimast (Contributor) left a comment

I was able to verify that the fix works (yay!). Now that I have a better understanding of the code, I have also left a few more notes on the PR.

To verify, this is the error message I get when honest nodes receive the attack:

Cannot store a certificate from '127.0.0.1:5004' - Certificate '2671926863586933361463317741031072687577528356414354042732451240158089381291field' for round 5 does not meet quorum requirements

Another thing we should consider, not in this PR but at some point in the future, is removing the recursion entirely, because there could still be valid cases where a node receives a fairly long certificate chain. Alternatively, we could ensure that the recursion depth always stays below some small bound.
For example, it might be more elegant to have a data structure that maps a certificate ID to the unprocessed certificates that depend on it. Whenever we accept a batch, we check that mapping for the certificates to process next.
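The dependency-map idea can be sketched without any recursion. In this minimal example, `PendingCertificates` and its `insert` method are hypothetical; certificates are reduced to a `u64` ID plus the IDs of their parents, and validity checks and fetch requests for missing parents are left out. Each pending certificate records how many parents it still waits on, and accepting a certificate releases its dependents through an explicit worklist, so the call depth stays constant however long the chain is.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical tracker for certificates whose parents are not yet accepted.
#[derive(Default)]
pub struct PendingCertificates {
    /// IDs of certificates that have been accepted.
    accepted: HashSet<u64>,
    /// Parent ID -> IDs of pending certificates waiting on that parent.
    waiting_on: HashMap<u64, Vec<u64>>,
    /// Pending certificate ID -> number of parents it still waits on.
    missing: HashMap<u64, usize>,
}

impl PendingCertificates {
    /// Offers a certificate with the given parent IDs. Returns every certificate
    /// accepted as a result, in acceptance order (empty if it must wait).
    pub fn insert(&mut self, id: u64, parents: &[u64]) -> Vec<u64> {
        // Ignore certificates that were already accepted or are already pending.
        if self.accepted.contains(&id) || self.missing.contains_key(&id) {
            return Vec::new();
        }
        // Distinct parents that have not been accepted yet.
        let unmet: HashSet<u64> = parents
            .iter()
            .copied()
            .filter(|p| !self.accepted.contains(p))
            .collect();
        if !unmet.is_empty() {
            for parent in &unmet {
                self.waiting_on.entry(*parent).or_default().push(id);
            }
            self.missing.insert(id, unmet.len());
            return Vec::new();
        }
        // All parents are present: accept `id`, then release its dependents
        // through an explicit worklist instead of recursive calls.
        let mut newly_accepted = Vec::new();
        let mut ready = vec![id];
        while let Some(next) = ready.pop() {
            self.accepted.insert(next);
            newly_accepted.push(next);
            for child in self.waiting_on.remove(&next).unwrap_or_default() {
                if let Some(count) = self.missing.get_mut(&child) {
                    *count -= 1;
                    if *count == 0 {
                        self.missing.remove(&child);
                        ready.push(child);
                    }
                }
            }
        }
        newly_accepted
    }
}
```

Inserting a chain out of order, say 3 (parent 2), then 2 (parent 1), then 1 (no parents), buffers the first two and returns `[1, 2, 3]` on the last call. A real node would presumably also need to cap how many pending certificates it buffers, since otherwise a long fabricated chain could exhaust memory instead of the stack.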

acoglio (Contributor, Author) commented Mar 21, 2025

> To verify, this is the error message I get when honest nodes receive the attack:
>
> Cannot store a certificate from '127.0.0.1:5004' - Certificate '2671926863586933361463317741031072687577528356414354042732451240158089381291field' for round 5 does not meet quorum requirements

Nice! That's indeed the error I expected: the certificate doesn't have enough signatures. Under the assumption that a super-majority of validators is correct, the malicious validator would be unable to obtain enough signatures.
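The super-majority argument can be made concrete with a short calculation. Here $S$ is the committee's total stake, $M$ the stake held by malicious validators, and $Q$ the stake that signed a given certificate; the strict two-thirds quorum threshold and the bound $M < S/3$ are assumptions stated for illustration, not snarkVM's exact constants.

```latex
\begin{aligned}
\text{Assume } & M < \tfrac{1}{3}S \quad \text{and the quorum rule } Q > \tfrac{2}{3}S. \\
\text{Honest signed stake } & \ge Q - M > \tfrac{2}{3}S - \tfrac{1}{3}S = \tfrac{1}{3}S > 0.
\end{aligned}
```

So every certificate that meets quorum carries signatures from honest validators holding more than a third of the stake. Since honest validators do not sign the attacker's fabricated batches, those certificates cannot reach quorum, which is the error kaimast observed. For instance, with four validators of equal stake 10 ($S = 40$), quorum needs $Q > 80/3 \approx 26.7$, i.e. three signatures, while a single malicious validator contributes only 10.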

kaimast (Contributor) commented Mar 25, 2025

I have a branch here that adds unit tests, but it requires a small change to snarkVM.

We could merge this PR now, and I will add the tests in another PR. I tested it extensively, and it should land in staging sooner rather than later, since it prevents a possible attack.

vicsn merged commit f50937e into ProvableHQ:staging on Mar 26, 2025
2 checks passed
acoglio deleted the fix-cert-fetch branch on March 26, 2025 14:59

Linked issue: [Bug] A malicious validator can broadcast invalid BatchCertificate or Propose that cause other validators stack overflow

4 participants