Skip to content

add votor-messages#7895

Merged
bw-solana merged 8 commits intoanza-xyz:masterfrom
bw-solana:votor_messages
Sep 10, 2025
Merged

add votor-messages#7895
bw-solana merged 8 commits intoanza-xyz:masterfrom
bw-solana:votor_messages

Conversation

@bw-solana
Copy link
Copy Markdown

@bw-solana bw-solana commented Sep 4, 2025

Problem

votor-messages defines Certificate and Vote message types that are used both internally and sent across the wire (post serialization). These primitives are used all over for creating and parsing alpenglow consensus messages. This is a necessary prerequisite to bringing the votor crate into Agave

See #7802

Summary of Changes

Bring over votor-messages crate from Alpenglow repo

@bw-solana bw-solana self-assigned this Sep 4, 2025
@bw-solana bw-solana moved this to In Progress in Alpenglow Sep 4, 2025
@bw-solana bw-solana marked this pull request as ready for review September 4, 2025 21:36
Comment thread votor-messages/src/consensus_message.rs Outdated
Comment thread votor-messages/src/consensus_message.rs Outdated
feature = "frozen-abi",
derive(AbiExample),
frozen_abi(digest = "G8Nrx3sMYdnLpHsCNark3BGA58BmW2sqNnqjkYhQHtN")
)]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need frozen-abi, maybe we should do it for ConsensusMessage as well, otherwise let's remove all of them.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can address separately but it might be a good idea to version ConsensusMessage for future proofing

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a good idea to version Vote as well.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please! Gossip is a mess because it is essentially impossible to deprecate stuff from parser without breaking abi.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, I'm happy to use this PR as a landing spot for this versioning design discussion (i.e. can up-level from precise code semantics for a second). Don't need to approve the PR w/ haste.

Current version:

  1. keeps ABI versioning
  2. renames ConsensusMessage enum --> Message
  3. Introduces ConsensusMessage struct that wraps Message and has major/minor version fields
  4. Test to simulate some future unknown message that passes/fails deserialization

Some questions:

  1. Do we need/want to freeze the ABI if we have versioning?
  2. Are we aligned on just versioning the top level struct?
  3. How big of sticklers do we want to be on rolling version for breaking changes in the near future (before we're running any of this on persistent clusters)?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I suppose if we have versioning from TLV we can do without ABI digest, now we don't write votes and certs into snapshots etc. As long as we have reviewers enforce that any data structure changes bump version number and do a full rollout.
  2. Yes, I vote for versioning the top level struct. I'd love to do "if you know this version x.y, then you know the whole data structure". I don't want it to happen that you accepted a message from wire but figures out you only understand part of it. If things work with only part of the data structure then this means you are probably wrongly combining unrelated stuff in the same data structure. In the future we can pack multiple data structures in the same packet if needed.
  3. I suppose before our real launch on testnet these can change as much as we want, because on test clusters it's not really public, and we don't save it into existing snapshots etc at all.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer keeping the ABI just because it creates a test that will fail if the format is accidentally changed. AFAIK it doesn't add any overhead, when we add a new version we can update the digest.

And I'm inclined to just not version for now and once TLV lands update this rather than hand rolling our own versioning scheme. Since we're not on testnet yet we can play loose with format updates, but we'll ensure we have versioning before launch.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Versioning with current impl is better than no versioning, but I believe your Message field should be Vec and then deserialized if parser for relevant version is available. See also comment on the test you have added. To avoid double deserialization costs, TLV crate can be used. One can trivially enforce that all fields are parsed in TLV packet if desired.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find myself agreeing with Ashwin:

Keep the ABI digest as a hard reminder that when we change things here, we MUST update the parsers and versions and such. I can add a comment to that effect.

I'm leaning towards removing the naive versioning so that we:

  1. Don't dupe ourselves into thinking we have a proper solution here
  2. Don't invest resources in putting lipstick on this pig

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Sep 4, 2025

Codecov Report

❌ Patch coverage is 0% with 162 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.1%. Comparing base (9256b81) to head (f78a576).
⚠️ Report is 37 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master    #7895     +/-   ##
=========================================
- Coverage    83.1%    83.1%   -0.1%     
=========================================
  Files         810      812      +2     
  Lines      357536   357698    +162     
=========================================
+ Hits       297244   297282     +38     
- Misses      60292    60416    +124     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

ksn6
ksn6 previously approved these changes Sep 5, 2025
Copy link
Copy Markdown

@ksn6 ksn6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after nits

Comment thread votor-messages/src/consensus_message.rs Outdated
feature = "frozen-abi",
derive(AbiExample),
frozen_abi(digest = "G8Nrx3sMYdnLpHsCNark3BGA58BmW2sqNnqjkYhQHtN")
)]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a good idea to version Vote as well.

#[derive(Clone, Copy, Debug, PartialEq, Serialize, Deserialize)]
pub enum Vote {
/// A notarization vote
Notarize(NotarizationVote),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pleeeease not step on this landmine again and use TLV instead of a enum here? This will inevitably turn into maintenance nightmare the moment we will want to change format of any of these messages, same as in gossip.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ser land #7694 and it shall be done.
I think it's best to do this on the top level message ConsensusMessage and leave the inner enums unversioned.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I think do TLV on the top level message is probably better.

wen-coding
wen-coding previously approved these changes Sep 5, 2025
Comment thread votor-messages/src/consensus_message.rs Outdated
})
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
pub enum MessageUnknown {
/// Vote message, with the vote and the rank of the validator.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not convinced the test is faithful since Unknown message is the same one as known one, i.e. it will pass deserialize no matter what. To properly test, we need to include dynamic length fields such that "old" deserialize impl would get confused when faced with "new" message.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fully acknowledge that we could add a million more tests to ensure versioning works as expected (e.g. adding mock VoteV2, CertificateV2, etc.).

I'm hesitant to roll all of that into this PR until we have consensus on what we want the versioning strategy to look like. Seems like we want to just wait for TLV and scrap the handroll

Copy link
Copy Markdown

@alexpyattaev alexpyattaev Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is the Unknown Message type

That's how we get deserialization to fail here: https://github.com/anza-xyz/agave/pull/7895/files#diff-3c82b38f403b9a6bc7ac9073ff52760f7d5265fe8e84db6c57f29191ffa2dc3cR263

Yes and if I declare

enum Msg_v1{
VariantA(Vec<u64>)
}

enum Msg_v2{
VariantA([u64; 4])
}

then if parser for v1 encounters a v2 message containing huge number, it would fail to deserialize completely (since it would trigger out-of-bounds read).

@alexpyattaev
Copy link
Copy Markdown

FYI the TLV PR #7694 is here ready for review.

@bw-solana bw-solana merged commit b4c1c2d into anza-xyz:master Sep 10, 2025
54 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Alpenglow Sep 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

6 participants