HDDS-8689. Rotate Root CA and Sub CA in SCM. #4943

ChenSammi · 2023-06-20T15:53:31Z

https://issues.apache.org/jira/browse/HDDS-8689

The major features in this patch are

RootCARotationManager manages the all ca rotation flow. Only the leader can start the rotation process.

2.RootCARotationHandlerImpl handles all command received through ratis.

RootCARotationMetrics show the metrics of rotation, here is jmx ouput

There are also other changes in this patch

Certificate subject principle is used in the certificate client and below tls layer to query different kind of information. So to avoid the same subject of the old and new certificate, subject creation rule is changed.
For root CA certificate, subject DN is changed from scm@hostname to scm-$number@hostname, number starting from 1, which is also the certificate serial Id of the certificate.
For scm CA certificate, subject DN is changed from scm-sub@hostname to scm-sub-$timestamp@hostname.
A new sequence ID type "rootCertificateId" is added to SequenceIdGenerator for root certificate serial Id generation.
Reject add new scm, scm decommission, scm leader transfer from CLI, dn/om/recon CSR request during CA certificate rotation.

adoroszlai

Please don't duplicate ozonesecure-ha environment as ozonesecure-carotation. Add a new test script ozonesecure-ha/test-certificate-rotation.sh instead.

See https://github.com/apache/ozone/blob/master/hadoop-ozone/dist/src/main/compose/ozonesecure/test-certificate-rotation.sh as an example.

There's also checkstyle failure:

hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/security/x509/certificate/client/DefaultCertificateClient.java
 130: Variable 'scmSecurityProtocolClient' must be private and have accessor methods.
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/security/RootCARotationMetrics.java
 62: 'ms' hides a field.

ChenSammi · 2023-06-22T15:11:12Z

Please don't duplicate ozonesecure-ha environment as ozonesecure-carotation. Add a new test script ozonesecure-ha/test-certificate-rotation.sh instead.

See https://github.com/apache/ozone/blob/master/hadoop-ozone/dist/src/main/compose/ozonesecure/test-certificate-rotation.sh as an example.

@adoroszlai , thanks, I refactored the robot test.

fapifta

Hi @ChenSammi I am really greatful to have the opprtunity to go through the code live with you. Please find some comments inline based on our discussions.

fapifta · 2023-06-29T13:05:06Z

...p-hdds/common/src/main/java/org/apache/hadoop/hdds/security/ssl/ReloadingX509KeyManager.java

We had a long discussion on this one today, and on Tuesday as well, let me summarize what we have found and agreed on.

The initial problem with the workaround from my side was that it just seems inappropriate to preserve all the old key manager references, and go over all of them, I wanted to understand why it works, and what else we can do.

The final cause for the problem as we determined is within the Netty tcnative code as you have commented as well. The following happens:
During the setup of the connections there is an SSLContext to be created, that uses the TrustManager's getAcceptedIssuers() method, and within the ReferenceCountedOpenSslServerContext the results are cached to the native layer via BIO. This cached value is not updated ever after initialization, but based on our debugging, that value is used to present to the KeyManager counterpart for mTLS authentication when this SSLContext is used for communication via the bidirectional Netty channel under the Ratis implementation.
During the communication, the KeyManager's chooseEngineClientAlias(String, Principal[], SSLEngine) method is being called, and the cached data is used to present the Principals to the chooseEngineClientAlias() method.

The original KeyManager we have the reference for has a mechanism to select the certificate and key alias based on the provided principals, and after the first rotation, the new KeyManager will not have anything to provide to the original certificates that were present to the native layer via the getAcceptedIssuers() call, therefore once we have rotated the certificate, the chooseEngineClientAlias() call returned null, and therefore in the Ratis layer the mTLS authentication failed, then was retried without a chance to succeed.

The workaround works, as at the end of the day it goes back in time to the first KeyManager, which still is able to resolve and then return the same alias that was/is used for all the earlier and the current KeyManager to store the certificates to be used after the actual rotation.

Based on these we agreed that this class should not have the workaround implemented this way, but instead we should stick to the alias we use at the creation of the KeyManager in the load methods.

The reason is that, we just put one private key and one certificate to every KeyManager instance when we create it in the loadKeyManager() method, and we can be sure that once there is query for a key and certificate, we want to use the only key and certificate that is present.

We discussed an alternative, where we use the same SubjectDN for all the rootCA certificates, but that causes operational trouble when we want to understand certificate chaining during debugging.

Furthermore I believe that as we have just one key and one certificate in every KeyManager, we can safely return the same alias from all the methods that are there to request an alias (chooseEngineClientAlias, chooseEngineServerAlias, getClientAliases, chooseClientAlias, getServerAliases, chooseServerAlias), and we can safely return the only key and only certificate from the getPrivateKey and getCertificateChain methods, but that is up for you to consider @ChenSammi I am fine with delegating the rest down to a non-custom engine provided KeyManager instance.

Yea, it's a wonder discussion. @fapifta ,thanks for the further digging. I will update the comments with more detail info explain the work around the the issue. My current thought is keeping the workaround in a small scope as possbile, try not to bring any unnecessary side effect. So besides chooseEngineClientAlias which needs the workaround now, for other KeyManager functions which doesn't affect by this issue, I tend to keep them untouched, unless later we found they are affected too.

Agree, we can look into that later, agree to keep the scope low, just wanted to note that this might as well be a good approach.

fapifta · 2023-06-29T13:08:22Z

...hdds/common/src/main/java/org/apache/hadoop/hdds/security/ssl/ReloadingX509TrustManager.java

+        !rootCACerts.stream().filter(
+            c -> !currentRootCACertIds.contains(c.getSerialNumber().toString()))
+            .findAny().isPresent()) {


Suggested change

!rootCACerts.stream().filter(

c -> !currentRootCACertIds.contains(c.getSerialNumber().toString()))

.findAny().isPresent()) {

rootCACerts.stream().allMatch(c ->

currentRootCACertIds.contains(c.getSerialNumber().toString()))) {

Also this we might change in the ReloadingX509KeyManager at L240-242 (loadKeyManager method).

Nice suggestion.

fapifta · 2023-06-29T13:20:59Z

...n/java/org/apache/hadoop/hdds/security/x509/certificate/client/DefaultCertificateClient.java

During testing with small rootCA lifetime, I have noticed that we never do any cleanup on the certificateMap internally in the DefaultCertificateClient.
It is not critical, and I will file a JIRA separately for that as it is not critical with the default values, and in our fairly short tests.
If you have an idea where to add it, I am happy to accept that, if not, then leave this for the future... I don't see any good place at the moment, so I am on the side to defer this question at least in the context of this PR.

The only case I can think of to do some cleanup of certificateMap is when some certificates are expired, especially root certificates, or certificate revocation. Besides those, I don't see other cases we need to cleanup certificateMap. For the expired root certificate case, we can do it in a follower up JIRA. For revocation case, I think we can deal with it when revocation is implemented in future.

agree, let's defer this later, revocation will surely help, as in that case we can revoke the old certificate after the rotation and revocation can handle the removal from the in-memory structures if needed. I forget yesterday, but now I have added a JIRA to track this. (https://issues.apache.org/jira/browse/HDDS-8963)

fapifta · 2023-06-29T13:23:39Z

...n/java/org/apache/hadoop/hdds/security/x509/certificate/client/DefaultCertificateClient.java

  }

+  public static void acquirePermit() throws InterruptedException {
+    semaphore.acquire();


As we discussed, the SCMCertificateClient is the one that requires this semaphore for externally syncronizing the renewal, as that happens from within the SCM state machine in raft as part of the root CA rotation.

Some questions came to my mind regarding this:
Do we need the CertificateLifetimeMonitor within the SCMCertificateClient, if the whole thing is managed centrally via Ratis? Will a standalone SCM go through the procedure correctly, or for that we need some further adjustments in the logic?

A SCM along doing the certificate rotation is not a common case. I can just think one. Here is the flow,
a. SCM A bootstrapped with its sub CA root signed by root certficate 1. Later some how it doesn't start and join the SCM HA immediately.
b. Root certificate was rotated from 1 to 2.
c. SCM A started and joined the SCM HA.
d. Some time later, SCM A found it need to rotate its certificate.

So let's keep the CertificateLifetimeMonitor in SCMCertificateClient.

It might happen together when the grace period is 0 seconds, so the two monitors are kicking in almost at the same time, besides that an other case, when there is a failure, and the wait ack is equal to the grace period, as then the first failure will push the second try to the time when the certificate monitor also kicks in.

These are rare cases, and should not cause any trouble due to the proper locking now, it might result in unnecessary renewals. I am not worried about those, I think it is just more explicit to have the certificate lifetime monitor explicitly disabled showing that it is not used in SCM. But let's leave ti for now as it is, I fine with it this way as well.

fapifta · 2023-06-29T13:42:22Z

...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/security/RootCARotationManager.java

+    }
    try {
      executorService.shutdown();
      if (!executorService.awaitTermination(3, TimeUnit.SECONDS)) {


As we discussed, due to the various calls to scm.shutDown in case of unrecoverable failures during the rootCA rotation process, this wait here is not fortunate, as there is a circular dependency here.

Once we call scm.shutDown, it will call into the stop method of the manager, then wait 3 seconds to see if the manager stops, but that fundamentally blocks the manager, and shutdown is graceful, so there won't be interruptions either.

I think we should call the shutdownNow() method only on the executorService, and whenever we call scm.shutDown also bail out from further execution either by throwing an exception, or by returning to the caller after scm.shutDown is called. This way we can ensure that scm shuts down properly, and none of the rest of the execution that might happen in the executor puts any unexpected garbage into the on-disk structure, while the executor also tears down gracefully.

The rollbacks and cleanups seem to happen properly, but I am worried what happens if we leave the executor to run the thing further, especially because the executorService uses daemon threads. What do you think?

I written one test case to test the effect of System.exit(), it will not continue execute the statement after it. So don't worry that the daemon will continue the execution.

public class TestExecutorService {
static final ScheduledExecutorService executor = Executors.newScheduledThreadPool(1,
new ThreadFactoryBuilder().setNameFormat("Test")
.setDaemon(true).build());

static void test() throws InterruptedException {
executor.submit(new Runnable() {
@OverRide
public void run() {
while(true) {
if (Thread.currentThread().isInterrupted()) {
System.out.println("interrupted " + new Date());
//break;
} else {
System.out.println("not interrupted " + new Date());
}
executor.shutdownNow();
System.out.println("shutdownNow is called ");
System.exit(1);
System.out.println("exit is called ");
}
}
});
Thread.sleep(1000);
System.out.println("Exiting normally...");
}

public static void main(String[] args) {
try {
test();
} catch (InterruptedException e) {
e.printStackTrace();
}
}

And shutdownNow is a good advice.

Right, let's use scm.shutdown in combination with just shutdownNow in the scm.stop() method for the rotation manager. Thank you for looking into it!

nandakumar131

Thanks @ChenSammi for working on this. Overall the change looks good to me.
There are some minor comments, feel free to ignore the NIT.

nandakumar131 · 2023-06-30T07:23:26Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/HASecurityUtils.java

+    if (cert.getIssuerX500Principal().equals(cert.getSubjectX500Principal())) {
+      return true;
+    }
+    return false;


Suggested change

if (cert.getIssuerX500Principal().equals(cert.getSubjectX500Principal())) {

return true;

}

return false;

return cert.getIssuerX500Principal().equals(cert.getSubjectX500Principal());

nandakumar131 · 2023-06-30T07:24:10Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/HASecurityUtils.java

+    if (cert.getBasicConstraints() != -1) {
+      return true;
+    }
+    return false;


Suggested change

if (cert.getBasicConstraints() != -1) {

return true;

}

return false;

return cert.getBasicConstraints() != -1;

nandakumar131 · 2023-06-30T07:59:08Z

...p-hdds/common/src/main/java/org/apache/hadoop/hdds/security/ssl/ReloadingX509KeyManager.java

+    for (int i = 0; i < newCertList.size(); i++) {
+      LOG.info(newCertList.get(i).toString());
+    }


Suggested change

for (int i = 0; i < newCertList.size(); i++) {

LOG.info(newCertList.get(i).toString());

}

for (X509Certificate x509Certificate : newCertList) {

LOG.info(x509Certificate.toString());

}

nandakumar131 · 2023-06-30T08:01:24Z

...hdds/common/src/main/java/org/apache/hadoop/hdds/security/ssl/ReloadingX509TrustManager.java

+    rootCACerts.stream().forEach(
+        c -> currentRootCACertIds.add(c.getSerialNumber().toString()));


Suggested change

rootCACerts.stream().forEach(

c -> currentRootCACertIds.add(c.getSerialNumber().toString()));

rootCACerts.forEach(

c -> currentRootCACertIds.add(c.getSerialNumber().toString()));

nandakumar131 · 2023-06-30T08:05:00Z

.../server-scm/src/main/java/org/apache/hadoop/hdds/scm/security/RootCARotationHandlerImpl.java

+  private String newCAComponent = SCM_ROOT_CA_COMPONENT_NAME +
+      HDDS_NEW_KEY_CERT_DIR_NAME_SUFFIX +
+      HDDS_NEW_KEY_CERT_DIR_NAME_PROGRESS_SUFFIX;


This field is never used, we can remove it.

nandakumar131 · 2023-06-30T08:11:55Z

...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/security/RootCARotationManager.java

+            throw new RuntimeException("Exit the this " +
+                "WaitSubCARotationPrepareAckTask for root certificate " +
+                rootCACertId + " since the rotation is finished execution");


Why are we throwing exception from finally block here?

If I remember well, we want to stop scheduling this task again...

I think we can take the reference of the Future once the task is submitted, and cancel via that one instead.
Nice catch @nandakumar131 we have talked about this with Sammi, I just missed this one from my notes as it seems. Similarly as we cancel the waitAckTimeoutTask.

ChenSammi · 2023-07-01T05:08:02Z

I checked run https://github.com/apache/ozone/actions/runs/5428637052,

actually the ozonesecure-ha/test-root-ca-rotation.sh is passed. The acceptance (HA-secure) is failed when it try to run ozonesecure/test-root-ca-rotation.sh under this HA-secure acceptance. I don't clear why it tried to execute a bash file in another directory.
Hi @adoroszlai , is there any limitation on the bash file naming? In this patch, there are two bash file with the same name test-root-ca-rotation.sh, one under ozonesecure, another under ozonesecure-ha.

adoroszlai · 2023-07-01T05:19:12Z

@ChenSammi Both scripts are executed in acceptance (HA-secure) because they both are marked for this test split:

https://github.com/apache/ozone/blob/ecb32652052f6173188ce5dfbf9dbf9d6d94cbc7/hadoop-ozone/dist/src/main/compose/ozonesecure/test-root-ca-rotation.sh#L18

The non-HA one should be assigned to the secure split:

https://github.com/apache/ozone/blob/ecb32652052f6173188ce5dfbf9dbf9d6d94cbc7/hadoop-ozone/dist/src/main/compose/ozonesecure/test-vault.sh#L18

But I don't think that's causing any failure. The cluster simply times out waiting for exit from safe mode. (Logs are available only until the failed check is re-run...)

ChenSammi · 2023-07-01T06:37:29Z

@ChenSammi Both scripts are executed in acceptance (HA-secure) because they both are marked for this test split:

https://github.com/apache/ozone/blob/ecb32652052f6173188ce5dfbf9dbf9d6d94cbc7/hadoop-ozone/dist/src/main/compose/ozonesecure/test-root-ca-rotation.sh#L18

The non-HA one should be assigned to the secure split:

https://github.com/apache/ozone/blob/ecb32652052f6173188ce5dfbf9dbf9d6d94cbc7/hadoop-ozone/dist/src/main/compose/ozonesecure/test-vault.sh#L18

But I don't think that's causing any failure. The cluster simply times out waiting for exit from safe mode. (Logs are available only until the failed check is re-run...)

Good catch! I see. Thanks @adoroszlai. Let me fix that.

adoroszlai · 2023-07-01T07:18:34Z

hadoop-ozone/dist/src/main/compose/ozonesecure/test-root-ca-rotation.sh

Suggested change

export OM_SERVICE_ID="omservice"

export SCM=scm1.org

export SCM=scm

This is the cause for the safe mode timeout.

SCM actually exits safe mode, but we're checking in the wrong container (scm1.org does not exist):

scm_1 | 2023-07-01 04:10:26,398 [EventQueue-OpenPipelineForHealthyPipelineSafeModeRule] INFO safemode.SCMSafeModeManager: SCM exiting safe mode.

adoroszlai · 2023-07-01T07:21:21Z

hadoop-ozone/dist/src/main/compose/ozonesecure/test-root-ca-rotation.sh

Suggested change

execute_commands_in_container scm1.org "ozone sh volume create /r-v1 && ozone sh bucket create /r-v1/r-b1"

execute_commands_in_container scm "ozone sh volume create /r-v1 && ozone sh bucket create /r-v1/r-b1"

…rocess end

adoroszlai

Thanks @ChenSammi for working on this. Note: I haven't checked the core logic as I'm not familiar with this area.

adoroszlai · 2023-07-03T05:51:39Z

...p-hdds/common/src/main/java/org/apache/hadoop/hdds/security/ssl/ReloadingX509KeyManager.java

Suggested change

strings == null ? "" : Arrays.stream(strings).map(Object::toString)

.collect(Collectors.joining(", ")),

principals == null ? "" : Arrays.stream(principals)

.map(Object::toString).collect(Collectors.joining(", ")),

sslEngine == null ? "" : sslEngine.toString(), ret);

strings == null ? "" : Arrays.toString(strings),

principals == null ? "" : Arrays.toString(principals),

sslEngine == null ? "" : sslEngine, ret);

Also, why do you prefer "" instead of null in the log?

Just a personal habit.

adoroszlai · 2023-07-03T05:52:45Z

...p-hdds/common/src/main/java/org/apache/hadoop/hdds/security/ssl/ReloadingX509KeyManager.java

Suggested change

principals == null ? "" : Arrays.stream(principals)

.map(Object::toString).collect(Collectors.joining(", ")),

sslEngine == null ? "" : sslEngine.toString());

principals == null ? "" : Arrays.toString(principals),

sslEngine == null ? "" : sslEngine);

adoroszlai · 2023-07-03T06:33:30Z

hadoop-ozone/dist/src/main/compose/testlib.sh

+     set +e
+     docker-compose exec -T $container /bin/bash -c "$command"
+     status=$?
+     set -e
+     if [ $status -eq 0 ] ; then
+         echo "$command succeed"
+         return;


This was recently changed, please avoid reverting to previous implementation.

Suggested change

set +e

docker-compose exec -T $container /bin/bash -c "$command"

status=$?

set -e

if [ $status -eq 0 ] ; then

echo "$command succeed"

return;

if docker-compose exec -T $container /bin/bash -c "$command"; then

echo "$command succeed"

adoroszlai · 2023-07-03T06:35:31Z

hadoop-ozone/dist/src/main/compose/testlib.sh

+  set -e
+  # shellcheck disable=SC2068
+  docker-compose exec -T $container /bin/bash -c "$command"
+  set +e


Suggested change

set -e

# shellcheck disable=SC2068

docker-compose exec -T $container /bin/bash -c "$command"

set +e

# shellcheck disable=SC2068

docker-compose exec -T $container /bin/bash -c "$command"

execute_command_in_container was recently changed to avoid changing exit-on-error setting, please apply the same here.

adoroszlai · 2023-07-03T06:42:57Z

hadoop-ozone/dist/src/main/compose/testlib.sh

      -v OZONE_DIR:"${OZONE_DIR}" \
      -v SECURITY_ENABLED:"${SECURITY_ENABLED}" \
      -v SCM:"${SCM}" \
+      -v TARGET_SCM:"${TARGET_SCM:-scm2.org}" \


TARGET_SCM is specific to few tests, it should be passed in ozonesecure-ha/test-root-ca-rotation.sh.

Suggested change

-v TARGET_SCM:"${TARGET_SCM:-scm2.org}" \

adoroszlai · 2023-07-03T06:44:31Z

hadoop-ozone/dist/src/main/compose/ozonesecure-ha/test-root-ca-rotation.sh

+export TARGET_SCM=scm3.org
+execute_robot_test scm4.org scmha/scm-leader-transfer.robot


Suggested change

export TARGET_SCM=scm3.org

execute_robot_test scm4.org scmha/scm-leader-transfer.robot

execute_robot_test scm4.org -v "TARGET_SCM:scm3.org" scmha/scm-leader-transfer.robot

adoroszlai · 2023-07-03T06:45:28Z

hadoop-ozone/dist/src/main/compose/ozonesecure-ha/test-root-ca-rotation.sh

+export TARGET_SCM=scm4.org
+execute_robot_test scm4.org scmha/scm-leader-transfer.robot


Suggested change

export TARGET_SCM=scm4.org

execute_robot_test scm4.org scmha/scm-leader-transfer.robot

execute_robot_test scm4.org -v "TARGET_SCM:scm4.org" scmha/scm-leader-transfer.robot

adoroszlai · 2023-07-03T06:49:30Z

hadoop-ozone/dist/src/main/smoketest/scmha/scm-leader-transfer.robot

 Test Timeout        5 minutes

 *** Variables ***
+${TARGET_SCM}=      %{TARGET_SCM=scm2.org}


Robot variable is passed via -v option. This can be simplified to avoid using an environment variable, too.

Suggested change

${TARGET_SCM}= %{TARGET_SCM=scm2.org}

${TARGET_SCM} scm2.org

adoroszlai · 2023-07-03T06:59:00Z

...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/security/RootCARotationHandler.java

+  @Replicate
+  void rotationPrepare(String rootCertId)
+      throws IOException, TimeoutException;
+
+  @Replicate
+  void rotationPrepareAck(String rootCertId, String scmCertId, String scmId)
+      throws IOException, TimeoutException;
+
+  @Replicate
+  void rotationCommit(String rootCertId)
+      throws IOException, TimeoutException;
+
+  @Replicate
+  void rotationCommitted(String rootCertId)
+      throws IOException, TimeoutException;


@Replicate no longer requires TimeoutException to be declared.

adoroszlai · 2023-07-03T07:00:58Z

...main/java/org/apache/hadoop/hdds/scm/protocol/SCMSecurityProtocolServerSideTranslatorPB.java

    }
    String certificate = impl.getSCMCertificate(request.getScmDetails(),
-        request.getCSR());
+        request.getCSR(), request.hasRenew() ? request.getRenew() : false);


Nit:

Suggested change

request.getCSR(), request.hasRenew() ? request.getRenew() : false);

request.getCSR(), request.hasRenew() && request.getRenew());

Galsza · 2023-07-03T14:10:49Z

Thank you Sammi for working on this, it looks good to me. We can merge it once you fix Attila's comments

fapifta · 2023-07-03T22:08:31Z

Thank you for the changes and continued work on this one @ChenSammi I think from my end we are good, so Attila's open comments remains to be addressed than I also think we are good to go.

… failure because of rotation is fast and some daemons start slowly

Galsza · 2023-07-05T06:19:59Z

@ChenSammi @adoroszlai @nandakumar131 I'm good with merging this patch, could you please take a look?

JacksonYao287 · 2023-07-05T12:01:47Z

LGTM +1， thanks @ChenSammi for working on this!

adoroszlai · 2023-07-05T12:50:39Z

Thanks @ChenSammi for addressing my comments.

ChenSammi · 2023-07-05T13:04:03Z

Thanks @adoroszlai @fapifta @nandakumar131 @Galsza and @JacksonYao287 for the code review.

* master: (36 commits) HDDS-8990. Intermittent timeout waiting on datanode4 9856 to become available (apache#5039) Revert "HDDS-7750. Incorrect WRITE ACL check. (apache#4992)" HDDS-7750. Incorrect WRITE ACL check. (apache#4992) HDDS-8985. Intermittent timeout exiting safe mode in HA secure tests (apache#5033) HDDS-8593. Add RootCARotationPoller to CertClient (apache#5030) HDDS-7645. Kubernetes check should fail fast if cluster cannot start (apache#5028) HDDS-8981. TestRootedOzoneFileSystem runs out of disk space (apache#5029) HDDS-8592. Fetch and save all root certificates during service's certificate rotation. (apache#5025) HDDS-8981. Disable TestRootedOzoneFileSystem#testSafeMode HDDS-8591. Create scheduler to check for new root ca certificates (apache#4961) HDDS-8979. error validating kustomization.yaml (apache#5024) HDDS-8973. Ozone SCM HA should not allocates duplicate IDs when transferring leadership (apache#5018) HDDS-8970. Snapshot Diff should return path relative to bucket root (apache#5015) HDDS-8975. Clarify SCM HA auto-bootstrap doc (apache#5021) HDDS-8689. Rotate Root CA and Sub CA in SCM. (apache#4943) HDDS-8436. Support setSafeMode(), isFileClosed() FileSystem API (apache#4825) HDDS-8880. Intermittent fork timeout in TestOMRatisSnapshots (apache#5022) HDDS-8962. Ensure docker env is stopped (apache#5011) HDDS-7794. [snapshot] SnapshotDiff should throw better error messages for exception handling (apache#5007) HDDS-7922. [FSO] S3G folder support fso layout filestatus s3A compatibility (apache#4448) ...

ChenSammi added the security label Jun 20, 2023

adoroszlai reviewed Jun 20, 2023

View reviewed changes

ChenSammi force-pushed the HDDS-8689 branch from 2a37dce to 7589cd9 Compare June 22, 2023 09:43

ChenSammi force-pushed the HDDS-8689 branch 2 times, most recently from c08a175 to 73c595c Compare June 25, 2023 15:11

swagle requested a review from nandakumar131 June 26, 2023 15:43

fapifta reviewed Jun 29, 2023

View reviewed changes

fapifta mentioned this pull request Jun 29, 2023

HDDS-8591. Create scheduler to check for new root ca certificates #4961

Merged

Galsza mentioned this pull request Jun 29, 2023

HDDS-8593. Add RootCARotationPoller to CertClient #5001

Closed

nandakumar131 reviewed Jun 30, 2023

View reviewed changes

ChenSammi force-pushed the HDDS-8689 branch 3 times, most recently from a90a817 to 483829f Compare June 30, 2023 17:09

adoroszlai reviewed Jul 1, 2023

View reviewed changes

ChenSammi force-pushed the HDDS-8689 branch from 0ec52d7 to 8f01cc6 Compare July 2, 2023 06:49

ChenSammi added 11 commits July 3, 2023 14:06

HDDS-8689. Rotate Root CA and Sub CA in SCM.

7f91ea4

fix checkstyle and refactor robot test

8da1191

fix findbugs

382f9fc

rebase fix

35a6a44

fix test

cd2dd6e

fix findbugs

7f3a4a4

change the way to fetch HA group member count

11c9429

check if rotation can be skipped after get the lock resource

0567024

address comments and improve robot test

05097b0

rollback the changes in ci.yml for previous debug

e5d76d6

disable monitor task in SCMCertificateClient

862c21b

improve robot test and move saving root certificate into rocksdb to p…

7cb7937

…rocess end

ChenSammi force-pushed the HDDS-8689 branch from 8f01cc6 to 7cb7937 Compare July 3, 2023 06:12

adoroszlai reviewed Jul 3, 2023

View reviewed changes

address comments

494ecc9

ChenSammi force-pushed the HDDS-8689 branch from ba3a649 to 494ecc9 Compare July 4, 2023 06:49

ChenSammi added 2 commits July 4, 2023 17:23

override shouldStartCertificateMonitor in SCMCertificateClient

4287822

increase max certificate lifetime to reduce the possibility of random…

0e0b8e4

… failure because of rotation is fast and some daemons start slowly

ChenSammi merged commit 69738e6 into apache:master Jul 5, 2023

tanvipenumudy mentioned this pull request Oct 17, 2023

HDDS-9453. scm-leader-transfer.robot not part of any acceptance test #5441

Merged

		rootCACerts.stream().forEach(
		c -> currentRootCACertIds.add(c.getSerialNumber().toString()));

	export OM_SERVICE_ID="omservice"
	export SCM=scm1.org
	export SCM=scm

	execute_commands_in_container scm1.org "ozone sh volume create /r-v1 && ozone sh bucket create /r-v1/r-b1"
	execute_commands_in_container scm "ozone sh volume create /r-v1 && ozone sh bucket create /r-v1/r-b1"

-          strings == null ? "" : Arrays.stream(strings).map(Object::toString)
-              .collect(Collectors.joining(", ")),
-          principals == null ? "" : Arrays.stream(principals)
-              .map(Object::toString).collect(Collectors.joining(", ")),
-          sslEngine == null ? "" : sslEngine.toString(), ret);
+          strings == null ? "" : Arrays.toString(strings),
+          principals == null ? "" : Arrays.toString(principals),
+          sslEngine == null ? "" : sslEngine, ret);

		export TARGET_SCM=scm3.org
		execute_robot_test scm4.org scmha/scm-leader-transfer.robot

	export TARGET_SCM=scm3.org
	execute_robot_test scm4.org scmha/scm-leader-transfer.robot
	execute_robot_test scm4.org -v "TARGET_SCM:scm3.org" scmha/scm-leader-transfer.robot

		export TARGET_SCM=scm4.org
		execute_robot_test scm4.org scmha/scm-leader-transfer.robot

	request.getCSR(), request.hasRenew() ? request.getRenew() : false);
	request.getCSR(), request.hasRenew() && request.getRenew());

HDDS-8689. Rotate Root CA and Sub CA in SCM. #4943

HDDS-8689. Rotate Root CA and Sub CA in SCM. #4943

Uh oh!

Conversation

ChenSammi commented Jun 20, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

adoroszlai left a comment

Choose a reason for hiding this comment

Uh oh!

ChenSammi commented Jun 22, 2023

Uh oh!

fapifta left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ChenSammi Jun 30, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nandakumar131 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fapifta Jun 30, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ChenSammi commented Jul 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

adoroszlai commented Jul 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ChenSammi commented Jul 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

ChenSammi commented Jun 20, 2023 •

edited

Loading

ChenSammi Jun 30, 2023 •

edited

Loading

fapifta Jun 30, 2023 •

edited

Loading

ChenSammi commented Jul 1, 2023 •

edited

Loading

adoroszlai commented Jul 1, 2023 •

edited

Loading

ChenSammi commented Jul 1, 2023 •

edited

Loading