snapshot: Airgap TransctionError in the status cache#7559
snapshot: Airgap TransctionError in the status cache#7559joncinque merged 5 commits intoanza-xyz:masterfrom
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #7559 +/- ##
=========================================
- Coverage 83.4% 83.4% -0.1%
=========================================
Files 811 811
Lines 365014 365271 +257
=========================================
+ Hits 304586 304661 +75
- Misses 60428 60610 +182 🚀 New features to boost your workflow:
|
|
I merged #7556 to get master functional again & so we can take our time on proper review for this one without more canaries / teammates hitting the snapshot incompatibility |
7ce45cc to
4c09818
Compare
brooksprumo
left a comment
There was a problem hiding this comment.
With these copies/clones and collects, how's the perf of the changes? Mainly the times for doing the status cache serialization. I wouldn't expect a big change, but I do think it'd be good to know.
| type SnapshotStatus<T> = HashMap<Hash, (usize, Vec<(status_cache::KeySlice, T)>)>; | ||
| type SnapshotSlotDelta<T> = (Slot, bool, SnapshotStatus<T>); | ||
| type SnapshotBankSlotDelta = SnapshotSlotDelta<Result<(), SnapshotTransactionError>>; |
There was a problem hiding this comment.
For these airgap types, we use the name Serde (vs Snapshot here). Not a blocker for this PR, as we can change it after the fact. More just making a note for myself.
There was a problem hiding this comment.
Ah darn sorry, I tried to follow the conventions that I saw
| derive(AbiExample, AbiEnumVisitor) | ||
| )] | ||
| #[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize)] | ||
| enum SnapshotTransactionError { |
There was a problem hiding this comment.
nit: We use the prefix "Serde" for all these snapshot-airgap-like types. I'd recommend we do that here too.
We could also merge this PR as-is and then do the rename separately if there are no other changes need, thus avoiding a CI run.
| .map(|slot_delta| { | ||
| let status_map = slot_delta.2.lock().unwrap(); | ||
| let snapshot_status_map = status_map | ||
| .iter() | ||
| .map(|(key, value)| { | ||
| ( | ||
| *key, | ||
| ( | ||
| value.0, | ||
| value | ||
| .1 | ||
| .iter() | ||
| .map(|(key_slice, result)| { | ||
| ( | ||
| *key_slice, | ||
| result.clone().map_err(SnapshotTransactionError::from), | ||
| ) | ||
| }) | ||
| .collect::<Vec<_>>(), | ||
| ), | ||
| ) | ||
| }) | ||
| .collect::<HashMap<_, _>>(); | ||
| (slot_delta.0, slot_delta.1, snapshot_status_map) | ||
| }) |
There was a problem hiding this comment.
I pulled down the code so that I could actually understand all this. It works! IMO it's not particularly easy to read. Not something that requires a change, but something to note. Maybe we simplify in a subsequent PR.
There was a problem hiding this comment.
Yeah, these types are awful to work with. If you have any suggestions to make it simpler, I'm all for it!
Are there any nodes running this change already? If yes, I can check the metrics. |
Given that it is unmerged, none of the "managed fleet" is. I do have this running on a node right now tho; identity is |
brooksprumo
left a comment
There was a problem hiding this comment.
![]()
Approving. This unblocks updating the SDK to v3 within agave. We can improve this code as needed in subsequent PRs.
I updated I also updated the original bench to: On my dev machine, here's master: Here's this PR: So looks like an extra 5%-10% for just writing the status cache. |
|
Well looks like I got an approval in the interim, so I'll merge this and put in a PR for the changes you suggested. Do you want me to include the updated benches? |
I didn't even know we had a bench for serializing the status cache, hah! The "max" version seems useful. Switching to use |
* snapshot: Rename status cache types, add bench #### Problem As pointed out #7559, the snapshot types were incorrectly named, and there's a bench missing for the specific serialize logic. #### Summary of changes Rename everything from `Snapshot*` to `Serde*`. Add a bench for serializing a max status cache to a file. * Update frozen abi

Problem
The current tip of master fails to deserialize the status cache because
InstructionError::BorshIoErrorno longer contains a string value.Summary of Changes
Gap the SDK's types from the snapshot's types by introducing
SnapshotTransactionErrorandSnapshotInstructionError. Note that the conversion requires some clones, so I'm not sure if this will be a bottleneck during snapshot generation.Long term, the plan should be to remove the TransactionError from snapshots in the status cache, but we can do that in future work.
Addresses #6457.