-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-7945. Integrate secret keys to SCM snapshot #4549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-7945. Integrate secret keys to SCM snapshot #4549
Conversation
...src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMSecurityProtocolFailoverProxyProvider.java
Show resolved
Hide resolved
...-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/TestSecretKeySnapshot.java
Show resolved
Hide resolved
|
I will go over the integration tests, but is there a need for the follower to fetch secrets twice? Once before the snapshot install and once after to cover any window where the leader created a secret that did not get reflected on the follower? |
…ozone/scm/TestSecretKeySnapshot.java Co-authored-by: Ritesh H Shukla <[email protected]>
|
Can you write down the sequence of events that is expected to happen to check if the code is meeting the intended design. |
I've updated the PR description to include the step. Basically, before the snapshot is installed, the new follower doesn't fetch secret keys from the leader. After the snapshot is installed, any new secret keys update may be synchronized again ( from leader to follower) if there a new rotation happens since the snapshot index. This is fine and expected. |
I've updated the PR description to include the step. Basically, before the snapshot is installed, the new follower doesn't fetch secret keys from the leader. |
|
The overall code looks fine, I will go over the integration test a few more times. |
...-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/TestSecretKeySnapshot.java
Show resolved
Hide resolved
| // Wait & retry for follower to update transactions to leader | ||
| // snapshot index. | ||
| // Timeout error if follower does not load update within 3s | ||
| GenericTestUtils.waitFor(() -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a race but it would be nice if we could validate that Ratis did not have any transactions on start and no keys and then post-sync caught up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a race and the test will fail if the time it takes for the follower to start >= secret key rotation duration (which is now at 30s for this test).
It'd probably not fail on my laptop, but it will a lot in the CI env, in which hardware is much slower.
We can increase the rotation duration to increase the chance of success, e.g. to 90s, but that will slow the test down. I'm not sure if we wanna do that.
...-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/TestSecretKeySnapshot.java
Show resolved
Hide resolved
|
Minor nits and some additional tests. LGTM. |
* master: (73 commits) HDDS-8587. Test that CertificateClient can store multiple rootCA certificates (apache#4852) HDDS-8801. ReplicationManager: Add metric to count how often replication is throttled (apache#4864) HDDS-8477. Unit test for Snapdiff using tombstone entries (apache#4678) HDDS-7507. [Snapshot] Implement List Snapshot API Pagination (apache#4065) (apache#4861) HDDS-8373. Document that setquota doesn't accept decimals (apache#4856) HDDS-8779. Recon - Expose flag for enable/disable of heatmap. (apache#4845) HDDS-8677. Ozone admin OM CLI command for block tokens (apache#4760) HDDS-8164. Authorize secret key APIs (apache#4597) HDDS-7945. Integrate secret keys to SCM snapshot (apache#4549) HDDS-8003. E2E integration test cases for block tokens (apache#4547) HDDS-7831. Use symmetric secret key to sign and verify token (apache#4417) HDDS-7830. SCM API for OM and Datanode to get secret keys (apache#4345) HDDS-7734. Implement symmetric SecretKeys lifescycle management in SCM (apache#4194) HDDS-8679. Add dedicated, configurable thread pool for OM gRPC server (apache#4771) HDDS-8790. Split EC acceptance tests (apache#4855) HDDS-8714. TestScmHAFinalization: mark testFinalizationWithRestart as flaky, enable other test cases HDDS-8787. Reduce ozone sh calls in robot tests (apache#4854) HDDS-8774. Log allocation stack trace for leaked CodecBuffer (apache#4840) HDDS-8729. Add metric for count of blocks pending deletion on datanode (apache#4800) HDDS-8780. Leak of ManagedChannel in HASecurityUtils (apache#4850) ...
* tmp-dir-refactor: (73 commits) HDDS-8587. Test that CertificateClient can store multiple rootCA certificates (apache#4852) HDDS-8801. ReplicationManager: Add metric to count how often replication is throttled (apache#4864) HDDS-8477. Unit test for Snapdiff using tombstone entries (apache#4678) HDDS-7507. [Snapshot] Implement List Snapshot API Pagination (apache#4065) (apache#4861) HDDS-8373. Document that setquota doesn't accept decimals (apache#4856) HDDS-8779. Recon - Expose flag for enable/disable of heatmap. (apache#4845) HDDS-8677. Ozone admin OM CLI command for block tokens (apache#4760) HDDS-8164. Authorize secret key APIs (apache#4597) HDDS-7945. Integrate secret keys to SCM snapshot (apache#4549) HDDS-8003. E2E integration test cases for block tokens (apache#4547) HDDS-7831. Use symmetric secret key to sign and verify token (apache#4417) HDDS-7830. SCM API for OM and Datanode to get secret keys (apache#4345) HDDS-7734. Implement symmetric SecretKeys lifescycle management in SCM (apache#4194) HDDS-8679. Add dedicated, configurable thread pool for OM gRPC server (apache#4771) HDDS-8790. Split EC acceptance tests (apache#4855) HDDS-8714. TestScmHAFinalization: mark testFinalizationWithRestart as flaky, enable other test cases HDDS-8787. Reduce ozone sh calls in robot tests (apache#4854) HDDS-8774. Log allocation stack trace for leaked CodecBuffer (apache#4840) HDDS-8729. Add metric for count of blocks pending deletion on datanode (apache#4800) HDDS-8780. Leak of ManagedChannel in HASecurityUtils (apache#4850) ...
What changes were proposed in this pull request?
Integrate secret keys to SCM snapshot. The taken approach is during snapshot installation, the follower makes a separate call to get secrets from SCM leader.
Protocol description of a new node joining an existing SCM (ratis) ring:
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-7945
How was this patch tested?
Integration test & CI run.