add votor-messages by bw-solana · Pull Request #7895 · anza-xyz/agave

bw-solana · 2025-09-04T20:43:02Z

Problem

votor-messages defines the Certificate and Vote message types, which are used both internally and, after serialization, on the wire. These primitives are used throughout the code for creating and parsing Alpenglow consensus messages. This is a necessary prerequisite for bringing the votor crate into Agave.

See #7802

Summary of Changes

Bring over votor-messages crate from Alpenglow repo

wen-coding · 2025-09-04T21:51:23Z

+    feature = "frozen-abi",
+    derive(AbiExample),
+    frozen_abi(digest = "G8Nrx3sMYdnLpHsCNark3BGA58BmW2sqNnqjkYhQHtN")
+)]


If we need frozen-abi, maybe we should do it for ConsensusMessage as well; otherwise let's remove all of them.

Can address separately, but it might be a good idea to version ConsensusMessage for future-proofing.

@alexpyattaev is submitting a TLV crate to do versioning: https://github.com/anza-xyz/agave/pull/7694/files/43d7d744ad59158b59d9efd209a501c2eff4bd14..479b657b9d5fea4ccc2196eebc719cf470a296f9

Might be a good idea to version Vote as well.

Yes please! Gossip is a mess because it is essentially impossible to deprecate stuff from the parser without breaking the ABI.

btw, I'm happy to use this PR as a landing spot for this versioning design discussion (i.e. can up-level from precise code semantics for a second). Don't need to approve the PR w/ haste.

Current version:

keeps ABI versioning

renames ConsensusMessage enum --> Message

Introduces ConsensusMessage struct that wraps Message and has major/minor version fields

Test to simulate some future unknown message that passes/fails deserialization
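A minimal sketch of the shape described in the list above: a ConsensusMessage struct carrying major/minor version fields around a Message enum, with a decoder that rejects a major version it does not know. The field widths, the byte layout, the `SUPPORTED_MAJOR` constant, and the `DecodeError` type are all assumptions made for illustration; the PR itself uses serde derives rather than a hand-written codec.

```rust
// Hypothetical sketch: field names, widths, and byte layout are assumptions,
// not the layout used in the PR.
#[derive(Debug, PartialEq)]
pub enum Message {
    Vote { slot: u64 },
    Certificate { slot: u64 },
}

#[derive(Debug, PartialEq)]
pub struct ConsensusMessage {
    pub major: u8,
    pub minor: u8,
    pub message: Message,
}

#[derive(Debug, PartialEq)]
pub enum DecodeError {
    Truncated,
    UnsupportedMajor(u8),
    UnknownVariant(u8),
    TrailingBytes,
}

/// The only major version this (hypothetical) parser understands.
pub const SUPPORTED_MAJOR: u8 = 1;

impl ConsensusMessage {
    /// Layout: [major, minor, variant tag, slot as 8 little-endian bytes].
    pub fn encode(&self) -> Vec<u8> {
        let (tag, slot) = match self.message {
            Message::Vote { slot } => (0u8, slot),
            Message::Certificate { slot } => (1u8, slot),
        };
        let mut out = vec![self.major, self.minor, tag];
        out.extend_from_slice(&slot.to_le_bytes());
        out
    }

    pub fn decode(bytes: &[u8]) -> Result<Self, DecodeError> {
        if bytes.len() < 3 {
            return Err(DecodeError::Truncated);
        }
        let (major, minor, tag) = (bytes[0], bytes[1], bytes[2]);
        // Check the version before touching the body: an unknown major
        // version means nothing after the header can be trusted.
        if major != SUPPORTED_MAJOR {
            return Err(DecodeError::UnsupportedMajor(major));
        }
        let body = &bytes[3..];
        if body.len() < 8 {
            return Err(DecodeError::Truncated);
        }
        if body.len() > 8 {
            return Err(DecodeError::TrailingBytes);
        }
        let mut buf = [0u8; 8];
        buf.copy_from_slice(body);
        let slot = u64::from_le_bytes(buf);
        let message = match tag {
            0 => Message::Vote { slot },
            1 => Message::Certificate { slot },
            other => return Err(DecodeError::UnknownVariant(other)),
        };
        Ok(ConsensusMessage { major, minor, message })
    }
}
```

Rejecting trailing bytes keeps the decoder strict: a message is either understood in full or refused.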

Some questions:

Do we need/want to freeze the ABI if we have versioning?

Are we aligned on just versioning the top level struct?

How big of sticklers do we want to be on rolling version for breaking changes in the near future (before we're running any of this on persistent clusters)?

I suppose if we have versioning from TLV we can do without the ABI digest, now that we don't write votes and certs into snapshots etc., as long as we have reviewers enforce that any data structure change bumps the version number and goes through a full rollout.

Yes, I vote for versioning the top level struct. I'd love to do "if you know this version x.y, then you know the whole data structure". I don't want a situation where you accept a message from the wire but then figure out you only understand part of it. If things work with only part of the data structure, then you are probably wrongly combining unrelated stuff in the same data structure. In the future we can pack multiple data structures in the same packet if needed.

I suppose before our real launch on testnet these can change as much as we want, because on test clusters it's not really public, and we don't save it into existing snapshots etc at all.

I'd prefer keeping the ABI just because it creates a test that will fail if the format is accidentally changed. AFAIK it doesn't add any overhead; when we add a new version we can update the digest.
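The "test that fails if the format is accidentally changed" idea can be illustrated with pinned bytes. This is a stand-in, not how the frozen-abi digest is computed: `SampleVote`, its fields, the hand-written `encode`, and the `GOLDEN` constant are hypothetical and exist only to show the principle.

```rust
// Hypothetical stand-in for a wire type; the real Vote has different fields.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SampleVote {
    pub slot: u64,
    pub rank: u16,
}

/// Assumed layout: slot (8 bytes LE) followed by rank (2 bytes LE).
pub fn encode(v: &SampleVote) -> Vec<u8> {
    let mut out = Vec::with_capacity(10);
    out.extend_from_slice(&v.slot.to_le_bytes());
    out.extend_from_slice(&v.rank.to_le_bytes());
    out
}

/// Pinned encoding of SampleVote { slot: 1, rank: 2 }.
/// Reordering or resizing a field changes the bytes, so the check below fails
/// and forces a deliberate update of this constant.
pub const GOLDEN: [u8; 10] = [1, 0, 0, 0, 0, 0, 0, 0, 2, 0];

pub fn layout_matches_golden() -> bool {
    encode(&SampleVote { slot: 1, rank: 2 }) == GOLDEN
}
```

Updating the pinned value is then the explicit acknowledgement that the format changed, which is the role the digest update plays in this discussion.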

And I'm inclined to just not version for now and once TLV lands update this rather than hand rolling our own versioning scheme. Since we're not on testnet yet we can play loose with format updates, but we'll ensure we have versioning before launch.

Versioning with the current impl is better than no versioning, but I believe your Message field should be a Vec and then deserialized only if a parser for the relevant version is available. See also my comment on the test you have added. To avoid double deserialization costs, the TLV crate can be used. One can trivially enforce that all fields in a TLV packet are parsed, if desired.
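A rough sketch of the type-length-value idea mentioned here, where the payload stays as raw bytes until a parser for its tag is available. This is not the API of the crate in #7694; the `Record` type, the 1-byte tag, the 2-byte length, and `TAG_VOTE` are assumptions made for illustration.

```rust
/// One TLV record: a type tag plus an undecoded payload.
/// Hypothetical layout: 1-byte tag, 2-byte little-endian length, then the value.
#[derive(Debug, PartialEq)]
pub struct Record {
    pub tag: u8,
    pub value: Vec<u8>,
}

/// Assumes every value is at most u16::MAX bytes long.
pub fn encode(records: &[Record]) -> Vec<u8> {
    let mut out = Vec::new();
    for r in records {
        out.push(r.tag);
        out.extend_from_slice(&(r.value.len() as u16).to_le_bytes());
        out.extend_from_slice(&r.value);
    }
    out
}

/// Splits the buffer into records without interpreting any payload.
/// Returns None if a header or value is cut short.
pub fn decode(mut bytes: &[u8]) -> Option<Vec<Record>> {
    let mut records = Vec::new();
    while !bytes.is_empty() {
        if bytes.len() < 3 {
            return None;
        }
        let tag = bytes[0];
        let len = u16::from_le_bytes([bytes[1], bytes[2]]) as usize;
        let rest = &bytes[3..];
        if rest.len() < len {
            return None;
        }
        records.push(Record { tag, value: rest[..len].to_vec() });
        bytes = &rest[len..];
    }
    Some(records)
}

/// Hypothetical tag for a vote whose payload is a single u64 slot.
pub const TAG_VOTE: u8 = 1;

/// Decodes only the records this parser knows; unknown tags are skipped.
pub fn known_vote_slots(records: &[Record]) -> Vec<u64> {
    records
        .iter()
        .filter(|r| r.tag == TAG_VOTE && r.value.len() == 8)
        .map(|r| {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(&r.value);
            u64::from_le_bytes(buf)
        })
        .collect()
}
```

Because the length travels with each record, an older parser can step over a record it does not recognize instead of misreading the bytes that follow. A stricter policy, as suggested in the comment, would return an error on any unknown tag rather than skipping it.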

I find myself agreeing with Ashwin:

Keep the ABI digest as a hard reminder that when we change things here, we MUST update the parsers and versions and such. I can add a comment to that effect.

I'm leaning towards removing the naive versioning so that we:

Don't dupe ourselves into thinking we have a proper solution here

Don't invest resources in putting lipstick on this pig

codecov-commenter · 2025-09-04T22:56:53Z

Codecov Report

❌ Patch coverage is 0% with 162 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.1%. Comparing base (9256b81) to head (f78a576).
⚠️ Report is 37 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff            @@
##           master    #7895     +/-   ##
=========================================
- Coverage    83.1%    83.1%   -0.1%     
=========================================
  Files         810      812      +2     
  Lines      357536   357698    +162     
=========================================
+ Hits       297244   297282     +38     
- Misses      60292    60416    +124

ksn6

LGTM after nits

ksn6 · 2025-09-05T03:40:08Z

+    feature = "frozen-abi",
+    derive(AbiExample),
+    frozen_abi(digest = "G8Nrx3sMYdnLpHsCNark3BGA58BmW2sqNnqjkYhQHtN")
+)]


Might be a good idea to version Vote as well.

alexpyattaev · 2025-09-05T04:46:40Z

+#[derive(Clone, Copy, Debug, PartialEq, Serialize, Deserialize)]
+pub enum Vote {
+    /// A notarization vote
+    Notarize(NotarizationVote),


Can we pleeeease not step on this landmine again and use TLV instead of an enum here? This will inevitably turn into a maintenance nightmare the moment we want to change the format of any of these messages, same as in gossip.

ser land #7694 and it shall be done.
I think it's best to do this on the top level message ConsensusMessage and leave the inner enums unversioned.

+1, I think doing TLV on the top level message is probably better.

alexpyattaev · 2025-09-06T08:16:05Z

-        })
+    #[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
+    pub enum MessageUnknown {
+        /// Vote message, with the vote and the rank of the validator.


I am not convinced the test is faithful, since the Unknown message is the same as the known one, i.e. it will pass deserialization no matter what. To test this properly, we need to include dynamic-length fields such that the "old" deserialize impl gets confused when faced with a "new" message.

The difference is the Unknown Message type

That's how we get deserialization to fail here:
https://github.com/anza-xyz/agave/pull/7895/files#diff-3c82b38f403b9a6bc7ac9073ff52760f7d5265fe8e84db6c57f29191ffa2dc3cR263

Fully acknowledge that we could add a million more tests to ensure versioning works as expected (e.g. adding mock VoteV2, CertificateV2, etc.).

I'm hesitant to roll all of that into this PR until we have consensus on what we want the versioning strategy to look like. Seems like we want to just wait for TLV and scrap the handroll.

The difference is the Unknown Message type

That's how we get deserialization to fail here: https://github.com/anza-xyz/agave/pull/7895/files#diff-3c82b38f403b9a6bc7ac9073ff52760f7d5265fe8e84db6c57f29191ffa2dc3cR263

Yes and if I declare

enum Msg_v1 { VariantA(Vec<u64>) }
enum Msg_v2 { VariantA([u64; 4]) }

then if the v1 parser encounters a v2 message containing a huge number, it would fail to deserialize completely (since it would trigger an out-of-bounds read).
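The Msg_v1 / Msg_v2 scenario can be reproduced with a small hand-written codec. A bincode-like layout is assumed here (a Vec written as a u64 element count followed by the elements, a fixed array written with no prefix); the function names are hypothetical and the enum variant tag is left out to keep the sketch short.

```rust
/// v1 payload: Vec<u64>, written as a u64 element count plus the elements.
pub fn encode_v1(values: &[u64]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(values.len() as u64).to_le_bytes());
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// v2 payload: [u64; 4], written as four values with no length prefix.
pub fn encode_v2(values: [u64; 4]) -> Vec<u8> {
    let mut out = Vec::new();
    for v in values.iter() {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Bounds-checked read of one little-endian u64 at `offset`.
fn read_u64(bytes: &[u8], offset: usize) -> Option<u64> {
    let end = offset.checked_add(8)?;
    let chunk = bytes.get(offset..end)?;
    let mut buf = [0u8; 8];
    buf.copy_from_slice(chunk);
    Some(u64::from_le_bytes(buf))
}

/// v1 parser: trusts the first u64 as the element count.
/// Returns None when the count does not match the bytes actually present.
pub fn decode_v1(bytes: &[u8]) -> Option<Vec<u64>> {
    let len = read_u64(bytes, 0)? as usize;
    let expected = len.checked_mul(8)?.checked_add(8)?;
    if expected != bytes.len() {
        return None;
    }
    let mut out = Vec::with_capacity(len);
    for i in 0..len {
        out.push(read_u64(bytes, 8 + i * 8)?);
    }
    Some(out)
}
```

With this layout, `decode_v1(&encode_v2([u64::MAX, 0, 0, 0]))` fails because the first element is read as an impossible length. The quieter failure is `decode_v1(&encode_v2([3, 10, 20, 30]))`, which succeeds and yields three elements, silently dropping the first value. A faithful compatibility test would cover both cases.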

alexpyattaev · 2025-09-08T15:12:19Z

FYI, the TLV PR #7694 is ready for review.

bw-solana added 5 commits September 4, 2025 20:42

add votor-messages

720b11f

update lib

b89aea2

clean up deps

961f435

fix ABI

063c374

sort

01ba090

bw-solana added this to Alpenglow Sep 4, 2025

bw-solana self-assigned this Sep 4, 2025

bw-solana moved this to In Progress in Alpenglow Sep 4, 2025

bw-solana marked this pull request as ready for review September 4, 2025 21:36

bw-solana requested review from AshwinSekar, ksn6 and wen-coding September 4, 2025 21:36

wen-coding reviewed Sep 4, 2025

ksn6 previously approved these changes Sep 5, 2025

alexpyattaev reviewed Sep 5, 2025

PR feedback

71632ff

bw-solana dismissed ksn6’s stale review via 71632ff September 5, 2025 16:31

wen-coding previously approved these changes Sep 5, 2025

fix ABI hash

3ef21fe

bw-solana dismissed wen-coding’s stale review via 3ef21fe September 5, 2025 16:51

alexpyattaev reviewed Sep 6, 2025

remove versioning

f78a576

AshwinSekar approved these changes Sep 10, 2025

bw-solana merged commit b4c1c2d into anza-xyz:master Sep 10, 2025
54 checks passed

github-project-automation Bot moved this from In Progress to Done in Alpenglow Sep 10, 2025

6 participants
