add votor-messages#7895
Conversation
| feature = "frozen-abi", | ||
| derive(AbiExample), | ||
| frozen_abi(digest = "G8Nrx3sMYdnLpHsCNark3BGA58BmW2sqNnqjkYhQHtN") | ||
| )] |
There was a problem hiding this comment.
If we need frozen-abi, maybe we should do it for ConsensusMessage as well, otherwise let's remove all of them.
There was a problem hiding this comment.
can address separately but it might be a good idea to version ConsensusMessage for future proofing
There was a problem hiding this comment.
@alexpyattaev is submitting TLV crate to do versioning https://github.com/anza-xyz/agave/pull/7694/files/43d7d744ad59158b59d9efd209a501c2eff4bd14..479b657b9d5fea4ccc2196eebc719cf470a296f9
There was a problem hiding this comment.
Might be a good idea to version Vote as well.
There was a problem hiding this comment.
Yes please! Gossip is a mess because it is essentially impossible to deprecate stuff from parser without breaking abi.
There was a problem hiding this comment.
btw, I'm happy to use this PR as a landing spot for this versioning design discussion (i.e. can up-level from precise code semantics for a second). Don't need to approve the PR w/ haste.
Current version:
- keeps ABI versioning
- renames
ConsensusMessageenum -->Message - Introduces
ConsensusMessagestruct that wrapsMessageand has major/minor version fields - Test to simulate some future unknown message that passes/fails deserialization
Some questions:
- Do we need/want to freeze the ABI if we have versioning?
- Are we aligned on just versioning the top level struct?
- How big of sticklers do we want to be on rolling version for breaking changes in the near future (before we're running any of this on persistent clusters)?
There was a problem hiding this comment.
- I suppose if we have versioning from TLV we can do without ABI digest, now we don't write votes and certs into snapshots etc. As long as we have reviewers enforce that any data structure changes bump version number and do a full rollout.
- Yes, I vote for versioning the top level struct. I'd love to do "if you know this version x.y, then you know the whole data structure". I don't want it to happen that you accepted a message from wire but figures out you only understand part of it. If things work with only part of the data structure then this means you are probably wrongly combining unrelated stuff in the same data structure. In the future we can pack multiple data structures in the same packet if needed.
- I suppose before our real launch on testnet these can change as much as we want, because on test clusters it's not really public, and we don't save it into existing snapshots etc at all.
There was a problem hiding this comment.
I'd prefer keeping the ABI just because it creates a test that will fail if the format is accidentally changed. AFAIK it doesn't add any overhead, when we add a new version we can update the digest.
And I'm inclined to just not version for now and once TLV lands update this rather than hand rolling our own versioning scheme. Since we're not on testnet yet we can play loose with format updates, but we'll ensure we have versioning before launch.
There was a problem hiding this comment.
Versioning with current impl is better than no versioning, but I believe your Message field should be Vec and then deserialized if parser for relevant version is available. See also comment on the test you have added. To avoid double deserialization costs, TLV crate can be used. One can trivially enforce that all fields are parsed in TLV packet if desired.
There was a problem hiding this comment.
I find myself agreeing with Ashwin:
Keep the ABI digest as a hard reminder that when we change things here, we MUST update the parsers and versions and such. I can add a comment to that effect.
I'm leaning towards removing the naive versioning so that we:
- Don't dupe ourselves into thinking we have a proper solution here
- Don't invest resources in putting lipstick on this pig
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #7895 +/- ##
=========================================
- Coverage 83.1% 83.1% -0.1%
=========================================
Files 810 812 +2
Lines 357536 357698 +162
=========================================
+ Hits 297244 297282 +38
- Misses 60292 60416 +124 🚀 New features to boost your workflow:
|
| feature = "frozen-abi", | ||
| derive(AbiExample), | ||
| frozen_abi(digest = "G8Nrx3sMYdnLpHsCNark3BGA58BmW2sqNnqjkYhQHtN") | ||
| )] |
There was a problem hiding this comment.
Might be a good idea to version Vote as well.
| #[derive(Clone, Copy, Debug, PartialEq, Serialize, Deserialize)] | ||
| pub enum Vote { | ||
| /// A notarization vote | ||
| Notarize(NotarizationVote), |
There was a problem hiding this comment.
Can we pleeeease not step on this landmine again and use TLV instead of a enum here? This will inevitably turn into maintenance nightmare the moment we will want to change format of any of these messages, same as in gossip.
There was a problem hiding this comment.
ser land #7694 and it shall be done.
I think it's best to do this on the top level message ConsensusMessage and leave the inner enums unversioned.
There was a problem hiding this comment.
+1, I think do TLV on the top level message is probably better.
| }) | ||
| #[derive(Clone, Debug, PartialEq, Serialize, Deserialize)] | ||
| pub enum MessageUnknown { | ||
| /// Vote message, with the vote and the rank of the validator. |
There was a problem hiding this comment.
I am not convinced the test is faithful since Unknown message is the same one as known one, i.e. it will pass deserialize no matter what. To properly test, we need to include dynamic length fields such that "old" deserialize impl would get confused when faced with "new" message.
There was a problem hiding this comment.
The difference is the Unknown Message type
That's how we get deserialization to fail here:
https://github.com/anza-xyz/agave/pull/7895/files#diff-3c82b38f403b9a6bc7ac9073ff52760f7d5265fe8e84db6c57f29191ffa2dc3cR263
There was a problem hiding this comment.
Fully acknowledge that we could add a million more tests to ensure versioning works as expected (e.g. adding mock VoteV2, CertificateV2, etc.).
I'm hesitant to roll all of that into this PR until we have consensus on what we want the versioning strategy to look like. Seems like we want to just wait for TLV and scrap the handroll
There was a problem hiding this comment.
The difference is the
UnknownMessage typeThat's how we get deserialization to fail here: https://github.com/anza-xyz/agave/pull/7895/files#diff-3c82b38f403b9a6bc7ac9073ff52760f7d5265fe8e84db6c57f29191ffa2dc3cR263
Yes and if I declare
enum Msg_v1{
VariantA(Vec<u64>)
}
enum Msg_v2{
VariantA([u64; 4])
}then if parser for v1 encounters a v2 message containing huge number, it would fail to deserialize completely (since it would trigger out-of-bounds read).
|
FYI the TLV PR #7694 is here ready for review. |
Problem
votor-messagesdefines Certificate and Vote message types that are used both internally and sent across the wire (post serialization). These primitives are used all over for creating and parsing alpenglow consensus messages. This is a necessary prerequisite to bringing thevotorcrate into AgaveSee #7802
Summary of Changes
Bring over
votor-messagescrate from Alpenglow repo