Use events for adding and removing committees.#3432
Conversation
deuszx
left a comment
There was a problem hiding this comment.
First pass through the code. It looks good overall but I had some more general questions. I think the general approach is slightly hacky (i.e. looping through the storage in search for interesting events). It could be improved if we had an option to subscribe for specific event streams in the client.
I am not giving an approval yet - I want to see the answers and then will re-review it.
|
Closing this in favor of #3453. |
|
Reopening as a draft. I will mark it ready for review when #3453 is fully implemented. |
aa3299f to
016c387
Compare
…nsistencies. (#3489) ## Motivation Working on #3432 I found a few issues that can be fixed separately. ## Proposal * Move the operation inspection code from the two `published_blob_ids` implementations to `Operation` and deduplicate it. * Use the `select!` macro in the client listener. This will make it easier to add another stream, to listen for admin chain events. * Fix a typo where a blob ID was called `chain_id`. * Add a check that events don't get overwritten. Testing this will be complicated, and possibly require more SDK features: #3488 * Treat `stage_new_committee` like other client operations. The issue with conflicting blocks needs to be addressed for all operations anyway, and admin operations are always on a single-user chain in our case. #3172 * Use `stage_new_committee` in the node service` instead of manually creating the operation. There is no reason to have different semantics here in CLI vs. the node service API, and to have the user input the current epoch manually. ## Test Plan CI should catch regressions. I created #3488 for the events check. ## Release Plan - Nothing to do / These changes follow the usual release cycle. ## Links - Found while working on #3432. - [reviewer checklist](https://github.com/linera-io/linera-protocol/blob/main/CONTRIBUTING.md#reviewer-checklist)
| /// The block's validation round. | ||
| Round(Option<u32>), | ||
| /// An event was read. | ||
| Event(EventId, Vec<u8>), |
There was a problem hiding this comment.
Ok so if we don't include the hash of the certificate that created the event, this means that a user event (stream) can become deprecated -- which may be a good thing after all (except for fast blocks).
There was a problem hiding this comment.
Not sure: This oracle response does re-certify the event itself (including its content), just not the certificate that created it.
| let chain_ids = { | ||
| let guard = context.lock().await; | ||
| let mut chain_ids = BTreeSet::from_iter(guard.wallet().chain_ids()); | ||
| chain_ids.insert(guard.wallet().genesis_admin_chain()); |
There was a problem hiding this comment.
I could add a getter to the wallet that returns the wallet's chains plus the admin chain.
| { | ||
| let mut guard = listening.lock().await; | ||
| if guard.contains(&chain_id) { | ||
| if !listening.lock().await.insert(chain_id) { |
| let admin_listener: NotificationStream = if client.admin_id() == chain_id { | ||
| Box::pin(stream::pending()) | ||
| } else { | ||
| Box::pin(client.subscribe_to(client.admin_id()).await?) |
There was a problem hiding this comment.
Is this creating a distinct admin chain listener for all mon-admin chains?
There was a problem hiding this comment.
No, just a local entry in the notifier.
linera-core/src/client/mod.rs
Outdated
| #[cfg(with_metrics)] | ||
| let _latency = metrics::PROCESS_INBOX_WITHOUT_PREPARE_LATENCY.measure_latency(); | ||
|
|
||
| self.synchronize_chain_state(self.admin_id).await?; |
There was a problem hiding this comment.
This could be a bit expensive
There was a problem hiding this comment.
Yes, I need to revisit whether it's actually necessary, once we've got all the tests working.
There was a problem hiding this comment.
It's the only way to learn about admin chain events now. It doesn't send us a message anymore.
We could make it a separate command, but I'd rather not: It means that more users won't end up migrating to new epochs.
(Edit: Actually, maybe it is redundant, since we're already doing this in synchronize_from_validators. Will remove if the tests pass without it.)
dcf3eee to
fbc6f93
Compare
| #[error(transparent)] | ||
| BcsError(#[from] bcs::Error), |
There was a problem hiding this comment.
The BcsError could be put in the ViewError. Non blocker of course.
There was a problem hiding this comment.
Not sure: I've also heard the argument that we should create errors only in their own module.
| bcs::to_bytes(&blob_hash)?, | ||
| ); | ||
| } | ||
| AdminOperation::RemoveCommittee { epoch } => { |
There was a problem hiding this comment.
It is a little strange to have such a formulation.
I would expect RemoveCommittee { committee }.
Then maybe rename RemoveFromCommittee.
Not blocker of course.
There was a problem hiding this comment.
I agree, it should probably be called RevokeEpoch? But then I'd rename the other one also from CreateCommittee to CreateEpoch.
Not sure what the best names are.
| } | ||
|
|
||
| /// Creates a new committee [`BlobContent`] from the provided serialized committee. | ||
| pub fn new_committee(committee: impl Into<Box<[u8]>>) -> Self { |
There was a problem hiding this comment.
This API should be more type-safe. Right now it accepts any bytes blindly a s Committee which can led to problems. Why not pass the actual committee instead and serialise it here?
| /// A blob containing an application description. | ||
| ApplicationDescription, | ||
| /// A blob containing a committee of validators. | ||
| Committee, |
There was a problem hiding this comment.
Since we're adding more and more here I think we should have a discussion and some design doc (agreement) what can/should become a BlobType.
linera-core/src/client/mod.rs
Outdated
| let (mut min_epoch, mut next_epoch) = { | ||
| let query = ChainInfoQuery::new(self.chain_id).with_committees(); | ||
| let info = *self | ||
| .client | ||
| .local_node | ||
| .handle_chain_info_query(query) | ||
| .await? | ||
| .info; | ||
| let committees = info | ||
| .requested_committees | ||
| .ok_or(LocalNodeError::InvalidChainInfoResponse)?; | ||
| let min_epoch = *committees.keys().next().unwrap_or(&Epoch::ZERO); | ||
| let epoch = info.epoch.ok_or(LocalNodeError::InvalidChainInfoResponse)?; | ||
| (min_epoch, epoch.try_add_one()?) | ||
| }; | ||
| let mut epoch_change_ops = Vec::new(); | ||
| while self.has_admin_event(EPOCH_STREAM_NAME, next_epoch).await? { | ||
| epoch_change_ops.push(Operation::System(SystemOperation::ProcessNewEpoch( | ||
| next_epoch, | ||
| ))); | ||
| next_epoch.try_add_assign_one()?; | ||
| } | ||
| while self | ||
| .has_admin_event(REMOVED_EPOCH_STREAM_NAME, min_epoch) | ||
| .await? | ||
| { | ||
| epoch_change_ops.push(Operation::System(SystemOperation::ProcessRemovedEpoch( | ||
| min_epoch, | ||
| ))); | ||
| min_epoch.try_add_assign_one()?; | ||
| } | ||
| let mut epoch_change_ops = epoch_change_ops.into_iter(); |
There was a problem hiding this comment.
Can we extract this to a helper method? Sth like collect_committee_changes ?
| )); | ||
| let blob_hash = committee_blob.id().hash; | ||
| self.storage | ||
| .write_blob(&committee_blob) |
There was a problem hiding this comment.
I think changing the resource policy should not require re-publishing a new committee. There should be a separate mechanism for that. If we change the cost of a Wasm multiplication will we also publish a new committee? It should be stored somewhere in the "system storage" and its updates should require quorum of validators.
There was a problem hiding this comment.
But it also needs to be applied separately to each chain, just like committee changes. It makes sense to me to treat all changes with that property as one thing. Anyway, this wasn't changed in this PR, so it's probably a separate discussion.
| stream_id: StreamId::system(EPOCH_STREAM_NAME), | ||
| key: bcs::to_bytes(&epoch)?, | ||
| }; | ||
| let bytes = match txn_tracker.next_replayed_oracle_response()? { |
There was a problem hiding this comment.
This pattern of checking for oracle response mismatches is now repeated in a few different places, maybe it could somehow be extracted into a method on TransactionTracker? (Although it might not be so simple, because one of the parameters would have to be a pattern... it would need some thought.)
Anyway, not a blocker.
There was a problem hiding this comment.
Yes, that's a good point: We could pass in the closure that extracts the data by matching the pattern.
What's probably less easy to do is to also pass in the closure that uses the actual oracle, since that's going to be async and often borrow a lot of references.
But yes, I'd prefer to do that in a separate PR.
I created #3519.
…3540) ## Motivation Epoch changes were recently changed to use events (#3432), but the `TestValidator::change_resource_control_policy` was still trying to receive epoch change system messages instead of creating an operation to process the new epoch. ## Proposal Use the new `SystemOperation::ProcessNewEpoch` to update the chains to use the new epoch, and skip updating the admin chain because it fails to try to process the same epoch twice. ## Test Plan Tested with a rebased version of the tests in #3509. The tests stop working after #3432 but pass again after this PR. ## Release Plan - These changes follow the usual release cycle, because it fixes an issue with SDK that is not present in any released version. ## Links - [reviewer checklist](https://github.com/linera-io/linera-protocol/blob/main/CONTRIBUTING.md#reviewer-checklist)
## Motivation As of #3432 we are not using system channels anymore. ## Proposal Remove the `SystemChannel` type and any system channel-specific logic, in particular the "off-chain subscription handling". ## Test Plan The `social` example tests make sure that _user_ pub-sub channels still work. ## Release Plan - Nothing to do / These changes follow the usual release cycle. ## Links - [reviewer checklist](https://github.com/linera-io/linera-protocol/blob/main/CONTRIBUTING.md#reviewer-checklist)
Motivation
To make epoch changes scalable, we want to use events instead of messages (#365).
Proposal
Remove the committee messages and add add
ProcessNewEpochandProcessRemovedEpochsystem operations instead. The committee change operations on the admin change create events now, and the two new operations consume those events, i.e. verify that they exist, change the epoch and committees on the local chain, and create anOracleResponse::Evententry in the execution outcome.Clients get notified about new epochs because they always follow/synchronize the admin chain and thus store the epoch change events locally. They will then run
process_inboxwhich, in addition to incoming messages, looks for epoch change events, and includes the new operations in the proposal accordingly.Test Plan
Several existing tests verify that reconfigurations are processed correctly. These check the new mechanism now instead of the message-based one.
Release Plan
Links