HDDS-5158. Add documentation for SCM HA Security. #2205
Merged
@@ -25,10 +25,6 @@ summary: HA setup for Storage Container Manager to avoid any single point of fai

Ozone has two metadata-manager nodes (*Ozone Manager* for key space management and *Storage Container Manager* for block space management) and multiple storage nodes (Datanodes). Data is replicated between Datanodes with the help of the RAFT consensus algorithm.

<div class="alert alert-warning" role="alert">
Please note that SCM-HA is not ready for production in secure environments. Security work is in progress and will be finished soon.
</div>

To avoid any single point of failure, the metadata-manager nodes should also have an HA setup.

Both Ozone Manager and Storage Container Manager support HA. In this mode, the internal state is replicated via RAFT (with Apache Ratis).

@@ -111,6 +107,70 @@ This can be changed with using `ozone.scm.primordial.node.id`. You can define th

Based on `ozone.scm.primordial.node.id`, the init process is ignored on the second and third nodes, and the bootstrap process is ignored on the primordial node.

## SCM HA Security

In a secure SCM HA cluster, the SCM on which init is performed is called the primordial SCM.
The primordial SCM starts a root CA with a self-signed certificate and uses it to issue signed certificates
to itself and to the other, bootstrapped SCMs. Only the primordial SCM can issue signed certificates for other SCMs,
which gives it a special role in the SCM HA cluster.

The primordial SCM takes the root-CA role and signs a sub-CA certificate for every SCM instance.
Each SCM then uses its sub-CA certificate to sign certificates for OMs and Datanodes.

When an SCM is bootstrapped, it gets a signed certificate from the primordial SCM and starts its sub-CA.

The sub-CAs on the SCMs issue signed certificates to the OMs and Datanodes in the cluster. Only the leader SCM issues certificates to OMs/Datanodes.

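The three-level hierarchy described above (a root CA on the primordial SCM, a sub-CA per SCM, and leaf certificates for OMs and Datanodes) can be illustrated with a minimal `openssl` sketch. This is only an approximation under stated assumptions: Ozone creates and signs these certificates internally rather than through the `openssl` CLI, and the subject names (`root-ca`, `scm1-sub-ca`, `om1`), key sizes, validity periods, and file names below are hypothetical.

```bash
#!/bin/sh
# Illustrative sketch only: Ozone issues these certificates internally.
# This mimics the same chain with the openssl CLI:
#   root CA (primordial SCM) -> sub-CA (an SCM) -> leaf (an OM or Datanode).
# All subject names and file names are hypothetical.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Minimal config: CA extensions for the root CA and sub-CA, non-CA for leaves.
cat > ca.cnf <<'EOF'
[req]
distinguished_name = dn
prompt = no
[dn]
CN = placeholder
[v3_ca]
basicConstraints = critical, CA:TRUE
keyUsage = critical, keyCertSign, cRLSign
[v3_leaf]
basicConstraints = critical, CA:FALSE
EOF

# 1. Root CA: key pair plus self-signed certificate (the primordial SCM role).
openssl req -x509 -config ca.cnf -extensions v3_ca \
  -newkey rsa:2048 -nodes -days 365 -subj "/CN=root-ca" \
  -keyout root.key -out root.crt

# 2. Sub-CA: key pair plus CSR, signed by the root CA.
openssl req -new -config ca.cnf -newkey rsa:2048 -nodes \
  -subj "/CN=scm1-sub-ca" -keyout subca.key -out subca.csr
openssl x509 -req -in subca.csr -CA root.crt -CAkey root.key \
  -CAcreateserial -days 365 -extfile ca.cnf -extensions v3_ca -out subca.crt

# 3. Leaf certificate for an OM, signed by the sub-CA (not by the root CA).
openssl req -new -config ca.cnf -newkey rsa:2048 -nodes \
  -subj "/CN=om1" -keyout om.key -out om.csr
openssl x509 -req -in om.csr -CA subca.crt -CAkey subca.key \
  -CAcreateserial -days 365 -extfile ca.cnf -extensions v3_leaf -out om.crt

# The OM certificate chains back to the root CA through the sub-CA.
openssl verify -CAfile root.crt -untrusted subca.crt om.crt
```

The final `verify` only succeeds when the sub-CA certificate is supplied as an intermediate, which reflects why each SCM needs its own root-signed sub-CA certificate before it can issue certificates to OMs and Datanodes.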
### How to enable security:

```XML
<!-- Turn on Ozone security. -->
<property>
  <name>ozone.security.enable</name>
  <value>true</value>
</property>

<!-- Enable TLS for gRPC communication. -->
<property>
  <name>hdds.grpc.tls.enabled</name>
  <value>true</value>
</property>
```

The above configuration is required in addition to the normal SCM HA configuration.

### Primordial SCM:

The primordial SCM is determined by the `ozone.scm.primordial.node.id` configuration.

Contributor: Can we add some description about how it is determined when `ozone.scm.primordial.node.id` is not defined?

Author: Updated it.

The value can be either the node ID or the hostname of the SCM. If the configuration is
not defined, the node on which init is run is considered the primordial SCM.

{{< highlight bash >}}
bin/ozone scm --init
{{< /highlight >}}

This sets up a public/private key pair and a self-signed certificate for the root CA.
It also generates a public/private key pair and a certificate signing request (CSR) for the sub-CA, which is used to obtain a signed sub-CA certificate from the root CA.

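The primordial-selection rule described in this section can be summarized with a small, hypothetical shell helper. The function name `is_primordial_scm` and its arguments are illustrative assumptions and not part of Ozone; SCM applies the equivalent logic internally when `--init` or `--bootstrap` is run.

```bash
#!/bin/sh
# Hypothetical helper (not part of Ozone) that mirrors the documented rule
# for deciding whether the local SCM is the primordial SCM.
#   $1: value of ozone.scm.primordial.node.id (empty if not defined)
#   $2: local SCM node ID
#   $3: local SCM hostname
#   $4: command being run: "init" or "bootstrap"
# Returns 0 (true) if the local SCM is the primordial SCM.
is_primordial_scm() {
  configured=$1 node_id=$2 host=$3 cmd=$4
  if [ -z "$configured" ]; then
    # Not configured: whichever node runs init is the primordial SCM.
    [ "$cmd" = "init" ]
  else
    # Configured: the value may be either the node ID or the hostname.
    [ "$configured" = "$node_id" ] || [ "$configured" = "$host" ]
  fi
}
```

For example, with the property set to `scm1`, `is_primordial_scm scm1 scm1 scm1.example.com bootstrap` succeeds because the configured value matches the node ID, regardless of which command is run on that node.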
### Bootstrap SCM:

{{< highlight bash >}}
bin/ozone scm --bootstrap
{{< /highlight >}}

This sets up a public/private key pair for the sub-CA and generates a CSR to obtain a
signed sub-CA certificate from the root CA on the primordial SCM.

**Note**: If the primordial SCM is not defined, make sure to run **--init** on only one of the
SCM hosts. Bring up the other SCMs using **--bootstrap**.

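The startup order implied by the note above can be sketched as a hypothetical helper that prints the commands to run on each host: initialize one SCM and start it, then bootstrap the rest, which need the primordial SCM to be running to get their sub-CA certificates signed. The function name and host names are placeholders, and `bin/ozone --daemon start scm` is assumed here as the way the SCM daemon is started; adapt it to how your cluster launches services.

```bash
#!/bin/sh
# Hypothetical helper (not part of Ozone): print the per-host commands for
# bringing up a secure SCM HA cluster when ozone.scm.primordial.node.id is
# not defined. The first host argument acts as the primordial SCM.
print_scm_startup_plan() {
  first=$1
  shift
  # Primordial SCM: create the root CA and its own sub-CA, then start it.
  echo "$first: bin/ozone scm --init"
  echo "$first: bin/ozone --daemon start scm"
  # Every other SCM is bootstrapped; this requires the primordial SCM to be
  # up so that it can sign the new sub-CA certificate.
  for host in "$@"; do
    echo "$host: bin/ozone scm --bootstrap"
    echo "$host: bin/ozone --daemon start scm"
  done
}

print_scm_startup_plan scm1.example.com scm2.example.com scm3.example.com
```

Running `--init` on exactly one host and `--bootstrap` on all others matches the note; the ordering matters because of the first limitation listed below.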
### Current SCM HA Security limitations:
1. When the primordial SCM is down, new SCMs cannot be bootstrapped and cannot join the
quorum.
2. Upgrading an existing secure cluster to a Ratis-enabled secure cluster is not supported.

## Implementation details

SCM HA uses Apache Ratis to replicate state between the members of the SCM HA quorum. Each node maintains the block management metadata in a local RocksDB instance.

Review comment: Can we add some details about how primordial SCM is determined by SCM? Like, the first node in the SCM node list or a specific configuration key?

Reply: Done.