-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhance consensus key rotation support #13926
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code seems reasonable to me.
Only big question is where are we planning to document this flow? It seems pretty challenging to get right without detailed instructions 😄
tokio::time::sleep(Duration::from_secs(5)).await; | ||
(operator_addr, new_pk, pop, operator_idx) | ||
} else { | ||
unreachable!() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe just unwrap instead of checking and then doing unreachable!()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still think the if-let is superfluous 😄
} | ||
} | ||
|
||
pub fn overriding_identity_blob_paths_mut(&mut self) -> &mut Vec<PathBuf> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Maybe make this test-only? It seems like it's only used in the smoke tests? Or better yet, don't expose this here if possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
NOTE that we can't reuse feature testing
as crate smoke-test
needs it disabled in addition to overriding_identity_blob_paths_mut
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aah, in that case, don't worry about it -- you can remove the smoke-test
feature. I was just wondering if it was a quick change 😄 Seems like overkill for a new feature.
@@ -123,15 +123,22 @@ impl ConfigSanitizer for SafetyRulesConfig { | |||
pub enum InitialSafetyRulesConfig { | |||
FromFile { | |||
identity_blob_path: PathBuf, | |||
#[serde(skip_serializing_if = "Vec::is_empty", default)] | |||
overriding_identity_paths: Vec<PathBuf>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be a vector? Will 1 path not suffice (if the operator wants to rotate again, they need to move the new key to the old key location, and then add a single override). Would that work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory 2 slots (1 existing, 1 new) is enough. It's just i found n slots can also be supported for free by just replacing Option<PathBuf>
with Vec<PathBuf>
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks reasonable, let's replace the debug logs with meaningful logs. we may want to support truncating old keys too but can do it later
.expect("Failed in loading consensus key for ExecutionProxyClient."); | ||
let signer = Arc::new(ValidatorSigner::new(self.author, consensus_key)); | ||
let consensus_sk = maybe_consensus_key | ||
.expect("consensus key unavailable for ExecutionProxyClient"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given we just panic if the option doesn't exist, we should probably pass in Arc directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we unwrap only if the current node is in the validator set.
For non-validators i guess we can't unwrap outside...
) -> CliTypedResult<TransactionSummary> { | ||
UpdateConsensusKey { | ||
txn_options: self.transaction_options(operator_index, None), | ||
txn_options: self.transaction_options(operator_index, gas_options), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how's this change related?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needed for a smoke test to be less flaky.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
✅ Forge suite
|
✅ Forge suite
|
✅ Forge suite
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me (from the node config and smoke test side). But, I defer the final stamp to @zekun000 for the consensus and validator verifier changes (some of this code is very old 😄).
} | ||
} | ||
|
||
pub fn overriding_identity_blob_paths_mut(&mut self) -> &mut Vec<PathBuf> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aah, in that case, don't worry about it -- you can remove the smoke-test
feature. I was just wondering if it was a quick change 😄 Seems like overkill for a new feature.
} | ||
tokio::time::sleep(Duration::from_secs(1)).await; | ||
} | ||
bail!(""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe put a real error message here? 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, i forgot auto-merge is on...
tokio::time::sleep(Duration::from_secs(5)).await; | ||
(operator_addr, new_pk, pop, operator_idx) | ||
} else { | ||
unreachable!() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still think the if-let is superfluous 😄
} | ||
} | ||
} | ||
info!("Overriding key work time: {:?}", timer.elapsed()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: do we think this is necessary to track/log?
Description
Consensus key rotation is not well-supported today
Today, a validator only uses a single consensus key, and the only way to update the consensus key is to edit the node config and restart the node...
e
then updates local consensus sk in epoche+1
, incorrect augmented keys for randomness has been persisted at the beginning of epoche+1
, and node will crash on restart (until epoche+2
when the incorrect data is cleared).e
and updates local consensus sk also in epoche
, the node won't be able to participate in consensus or randomness until epoche+1
.Proposed user flow
The code changes
NodeConfig::ConsensusConfig::SafetyRulesConfig::InitialSafetyRulesConfig::FromFile::overriding_identity_paths
, which is to hold new validator identitie(s).consensus_<PK_HEX>
wherePK_HEX
is the public key hexlified.consensus_<X>
where X is its on-chain validator pk hexlified. If they don't exist, fall back to the value under keyconsensus
(set by the existing implementation).consensus_key_rotation
to capture the user flow.Type of Change
Which Components or Systems Does This Change Impact?
How Has This Been Tested?
A smoke test case.
Key Areas to Review
The entire PR.
Checklist