68 changes: 64 additions & 4 deletions hadoop-hdds/docs/content/feature/SCM-HA.md
@@ -25,10 +25,6 @@ summary: HA setup for Storage Container Manager to avoid any single point of fai

Ozone has two metadata-manager nodes (*Ozone Manager* for key space management and *Storage Container Manager* for block space management) and multiple storage nodes (Datanode). Data is replicated between Datanodes with the help of the RAFT consensus algorithm.

<div class="alert alert-warning" role="alert">
Please note that SCM-HA is not ready for production in secure environments. Security work is in progress and will be finished soon.
</div>

To avoid any single point of failure, the metadata-manager nodes should also have an HA setup.

Both Ozone Manager and Storage Container Manager support HA. In this mode the internal state is replicated via RAFT (with Apache Ratis)
@@ -111,6 +107,70 @@ This can be changed with using `ozone.scm.primordial.node.id`. You can define th

Based on `ozone.scm.primordial.node.id`, the init process will be ignored on the second/third nodes and the bootstrap process will be ignored on the primordial node.

## SCM HA Security

![Overview](scm-secure-ha.png)

In a secure SCM HA cluster, the SCM on which init is performed is called the primordial SCM.
The primordial SCM starts a root CA with a self-signed certificate and uses it to issue signed certificates
to itself and to the other bootstrapped SCMs. Only the primordial SCM can issue signed certificates to other SCMs,
which gives it a special role in the SCM HA cluster.

The primordial SCM takes the root-CA role and signs a sub-CA certificate for every SCM instance.
Each SCM uses its sub-CA certificate to sign certificates for OM/Datanodes.

When an SCM is bootstrapped, it gets a signed certificate from the primordial SCM and starts its sub-CA.

The sub-CAs on the SCMs are used to issue signed certificates for OM/DN in the cluster. Only the leader SCM issues certificates to OM/DN.
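
The trust hierarchy described above can be sketched as a small toy model. This is only an illustration of the root-CA, sub-CA, and OM/DN relationship, not Ozone's certificate code: the `Cert` class, the `chain` helper, and the node names (`scm1`, `om1`, `dn1`, and so on) are all hypothetical, and no real cryptography is involved.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Cert:
    subject: str   # who the certificate is for
    issuer: str    # who signed it
    is_ca: bool    # whether the holder may sign other certificates


def chain(cert: Cert, store: Dict[str, Cert]) -> List[str]:
    """Walk from a certificate up to the self-signed root."""
    path = [cert.subject]
    while cert.issuer != cert.subject:
        cert = store[cert.issuer]
        path.append(cert.subject)
    return path


# Hypothetical names, for illustration only.
store: Dict[str, Cert] = {}

# The root CA lives on the primordial SCM and signs itself.
store["root-ca"] = Cert("root-ca", "root-ca", is_ca=True)

# Every SCM (primordial and bootstrapped) gets a sub-CA certificate from the root CA.
for scm in ("scm1", "scm2", "scm3"):
    store[scm] = Cert(scm, "root-ca", is_ca=True)

# Only the current leader SCM issues certificates to OM and Datanodes.
leader = "scm2"
store["om1"] = Cert("om1", leader, is_ca=False)
store["dn1"] = Cert("dn1", leader, is_ca=False)

print(chain(store["dn1"], store))  # ['dn1', 'scm2', 'root-ca']
```

In this model every chain ends at the single root CA on the primordial SCM, which is why new sub-CA certificates cannot be issued while that node is unavailable.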

### How to enable security:

```XML
<property>
<name>ozone.security.enabled</name>
<value>true</value>
</property>

<property>
<name>hdds.grpc.tls.enabled</name>
<value>true</value>
</property>
```

The above configs are needed in addition to the normal SCM HA configuration.
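
As a quick sanity check, a configuration file can be scanned for these two settings before starting the cluster. The following is a minimal sketch, not an Ozone tool: it assumes the usual Hadoop-style `<property><name>…</name><value>…</value></property>` layout and the key spelling `ozone.security.enabled`, and the helper names `read_properties` and `not_enabled` are made up for this example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List


def read_properties(xml_text: str) -> Dict[str, str]:
    """Collect name/value pairs from a Hadoop-style configuration document."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def not_enabled(xml_text: str, keys: Iterable[str]) -> List[str]:
    """Return the keys that are missing or not set to true."""
    props = read_properties(xml_text)
    return [key for key in keys if props.get(key, "").lower() != "true"]


# Sample input: TLS is deliberately left disabled to show a failing check.
SAMPLE = """<configuration>
  <property><name>ozone.security.enabled</name><value>true</value></property>
  <property><name>hdds.grpc.tls.enabled</name><value>false</value></property>
</configuration>"""

REQUIRED = ["ozone.security.enabled", "hdds.grpc.tls.enabled"]
print(not_enabled(SAMPLE, REQUIRED))  # ['hdds.grpc.tls.enabled']
```

An empty list means both settings are present and set to `true`.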

Contributor: Can we add some details about how primordial SCM is determined by SCM? Like, the first node in the SCM node list or a specific configuration key?

Contributor Author: Done


### Primordial SCM:

The primordial SCM is determined from the config `ozone.scm.primordial.node.id`.

Contributor: Can we add some description about how it is determined when ozone.scm.primordial.node.id is not defined?

Contributor Author: Updated it

The value can be the node id or the hostname of the SCM. If the config is
not defined, the node where init is run is considered the primordial SCM.

{{< highlight bash >}}
bin/ozone scm --init
{{< /highlight >}}

This sets up a public/private key pair and a self-signed certificate for the root CA,
and also generates a public/private key pair and a CSR to get a signed sub-CA certificate from the root CA.


### Bootstrap SCM:

{{< highlight bash >}}
bin/ozone scm --bootstrap
{{< /highlight >}}

This sets up a public/private key pair for the sub-CA and generates a CSR to get a
signed sub-CA certificate from the root CA.

**Note**: Make sure to run **--init** on only one of the SCM hosts if the
primordial SCM is not defined. Bring up the other SCMs using **--bootstrap**.
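
The interaction between `ozone.scm.primordial.node.id` and the two commands can be summarized in a short sketch. It is a toy model of the behavior described in this document, not Ozone's implementation: it assumes that, once the config is set, init takes effect only on the primordial node and bootstrap only on the other nodes, and the function names and node names are hypothetical.

```python
from typing import Optional


def is_primordial(primordial_id: Optional[str], node_id: str, hostname: str) -> Optional[bool]:
    """Return True/False when the config decides, or None when it is not defined."""
    if not primordial_id:
        return None
    # The configured value may be either the SCM node id or its hostname.
    return primordial_id in (node_id, hostname)


def takes_effect(command: str, primordial_id: Optional[str], node_id: str, hostname: str) -> bool:
    """Model whether `--init` or `--bootstrap` does anything on this node."""
    if command not in ("init", "bootstrap"):
        raise ValueError(f"unknown command: {command}")
    primordial = is_primordial(primordial_id, node_id, hostname)
    if primordial is None:
        # Config not defined: nothing is skipped automatically, so the
        # operator must run init on exactly one host and bootstrap on the rest.
        return True
    return primordial if command == "init" else not primordial


# Hypothetical two-node example with scm1 configured as primordial.
for node_id, host in [("scm1", "host1"), ("scm2", "host2")]:
    for command in ("init", "bootstrap"):
        print(node_id, command, takes_effect(command, "scm1", node_id, host))
```

With the config defined, the same pair of commands can be run on every host and only the appropriate one has an effect; without it, choosing the single init host is left to the operator.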

### Current SCM HA Security limitations:
1. When the primordial SCM is down, new SCMs cannot be bootstrapped and cannot join the
quorum.
2. Upgrading a secure cluster to a Ratis-enabled secure cluster is not supported.


## Implementation details

SCM HA uses Apache Ratis to replicate state between the members of the SCM HA quorum. Each node maintains the block management metadata in local RocksDB.