Skip to content

Conversation

@Galsza
Copy link
Contributor

@Galsza Galsza commented Jun 22, 2023

What changes were proposed in this pull request?

After the SCMs finished their part of root ca rotation the clients also need to get the new root CA certificate. This is done through a simple polling mechanism that asks the SCMs for the root CAs and once it observes a change it invokes a consumer from the clients to notify them. Integrating this change is going to be done in later items.

What is the link to the Apache JIRA

HDDS-8591

How was this patch tested?

Added unit tests. Functional change is not yet observed in this patch as the polling is not actually integrated into anything yet.

@Galsza
Copy link
Contributor Author

Galsza commented Jun 22, 2023

@ChenSammi, @fapifta please take a look at this patch

Copy link
Contributor

@ashishkumar50 ashishkumar50 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Galsza, Thanks for working on this. Please find few comments.

</description>
</property>
<property>
<name>ozone.scm.security.service.address</name>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ozone.scm.security.service.address This is duplicate config, already there is one. Don't see used in the new code, may be by mistake added. Please remove it.

executor.shutdownNow();
}
if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
LOG.error("Unable to shutdown state machine properly.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct log wrt cert and not state machine.

if (!knownRootCerts.containsAll(scmRootCaCerts)) {
rootCaListConsumers.forEach(c -> c.accept(scmRootCaCerts));
knownRootCerts = new HashSet<>();
knownRootCerts.addAll(scmRootCaCerts);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have a log here, when there is a new cert?

scmSecureClient.getAllRootCaCertificates();
List<X509Certificate> scmRootCaCerts =
OzoneSecurityUtil.convertToX509(pemEncodedRootCaList);
if (!knownRootCerts.containsAll(scmRootCaCerts)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Objects(X509Certificate) inside knownRootCerts and scmRootCaCerts are same when both have same content? Can you verify if it is fine. If not it will always enter into this condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two certs are the same when their encoded content is the same. I think this should be good enough. I'm going to discuss this with @fapifta just to be sure and make a local test for it.

@Galsza Galsza marked this pull request as draft June 26, 2023 12:15
@Galsza
Copy link
Contributor Author

Galsza commented Jun 26, 2023

Converted to draft for now because some additional retry functionality needs to be created for each consumer in the poller.

@Galsza Galsza marked this pull request as ready for review June 27, 2023 16:17
@Galsza
Copy link
Contributor Author

Galsza commented Jun 27, 2023

@ashishkumar50 Please take another look!

knownCerts.add(knownCert);
List<String> certsFromScm = new ArrayList<>();
certsFromScm.add(CertificateCodec.getPEMEncodedString(knownCert));
certsFromScm.add(CertificateCodec.getPEMEncodedString(newRootCa));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you have small test by just adding knowCert to ensure both cases are fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you meant by this, but I have refined the first test case to use the knownCert instead of not using certs at all. I hope this fixes it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I meant the same. Thanks for updating it.

Copy link
Contributor

@fapifta fapifta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Galsza for the work on this one, as we discussed today, let's wait with merging this one until we can get #4943 in, and then adjust this and the subsequent ones.

+1 from my end.

@ashishkumar50
Copy link
Contributor

Thanks @Galsza, LGTM +1.

@Galsza Galsza marked this pull request as draft June 30, 2023 13:07
@Galsza
Copy link
Contributor Author

Galsza commented Jun 30, 2023

Moving to draft until Sammi's patch is merged: #4943

@Galsza Galsza force-pushed the HDDS-8591_root_ca_rotation_polling branch from d719380 to a89ce73 Compare July 5, 2023 13:27
@Galsza
Copy link
Contributor Author

Galsza commented Jul 5, 2023

Rebased to Sammi's change, since #4943 is merged now

@Galsza Galsza marked this pull request as ready for review July 5, 2023 13:32
@Galsza
Copy link
Contributor Author

Galsza commented Jul 5, 2023

@fapifta @adoroszlai @ChenSammi this pr is ready now, please take another look!

return StringUtils.join(certs.stream()
.map(X509Certificate::getSerialNumber)
.map(BigInteger::toString)
.collect(Collectors.toList()), ", ");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use stream function to achieve the join, like this
certs.stream().map(X509Certificate::getSerialNumber).map(BigInteger::toString).collect(Collectors.joining(", "));

LOG.error("Unable to shutdown root ca certificate rotation poller.");
}
} catch (InterruptedException e) {
LOG.error("Error attempting to shutdown.", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARN instead of ERROR is more appropriate here, Line 129 too. Here is an example,

LOG.warn("{} couldn't be stopped gracefully", getClass().getSimpleName());

}
} catch (InterruptedException e) {
LOG.error("Error attempting to shutdown.", e);
executor.shutdownNow();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can avoid call the executor.shutdownNow() here, for shutdown is already called, and without the explicitly check isInterrupted() or any wait() like function, shutdownNow will have the same effect as shutdown.

}
});
} catch (IOException e) {
LOG.error("Error while trying to rotate root ca certificate", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error while trying to rotate root ca certificate
->
Error while trying to poll root ca certificates

this.knownRootCerts = initiallyKnownRootCaCerts;
poller = Executors.newScheduledThreadPool(1,
new ThreadFactoryBuilder().setNameFormat(
"RootCaRotationPoller")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better use "this.getClass().getSimpleName()" to avoid hard code class name in case it's changed later.

public static final String HDDS_X509_CA_ROTATION_ACK_TIMEOUT_DEFAULT =
"PT15M";
public static final String HDDS_X509_ROOTCA_CLIENT_POLLING_INTERVAL =
"hdds.x509.rootca.client.polling.interval";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest change the name to "hdds.x509.rootca.certificate.polling.interval".

return caAckTimeout;
}

public Duration getRootCaClientPollingInterval() {
Copy link
Contributor

@ChenSammi ChenSammi Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Galsza , the client -> certificate change is missed in this SecurityConfig.java .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChenSammi thanks I've addressed it now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let's wait for the CI to pass.

Copy link
Contributor

@ChenSammi ChenSammi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last patch LGTM +1.

@ChenSammi ChenSammi merged commit 343df6a into apache:master Jul 6, 2023
errose28 added a commit to errose28/ozone that referenced this pull request Jul 10, 2023
* master: (36 commits)
  HDDS-8990. Intermittent timeout waiting on datanode4 9856 to become available (apache#5039)
  Revert "HDDS-7750. Incorrect WRITE ACL check. (apache#4992)"
  HDDS-7750. Incorrect WRITE ACL check. (apache#4992)
  HDDS-8985. Intermittent timeout exiting safe mode in HA secure tests (apache#5033)
  HDDS-8593. Add RootCARotationPoller to CertClient (apache#5030)
  HDDS-7645. Kubernetes check should fail fast if cluster cannot start (apache#5028)
  HDDS-8981. TestRootedOzoneFileSystem runs out of disk space (apache#5029)
  HDDS-8592. Fetch and save all root certificates during service's certificate rotation. (apache#5025)
  HDDS-8981. Disable TestRootedOzoneFileSystem#testSafeMode
  HDDS-8591. Create scheduler to check for new root ca certificates (apache#4961)
  HDDS-8979. error validating kustomization.yaml (apache#5024)
  HDDS-8973. Ozone SCM HA should not allocates duplicate IDs when transferring leadership (apache#5018)
  HDDS-8970. Snapshot Diff should return path relative to bucket root (apache#5015)
  HDDS-8975. Clarify SCM HA auto-bootstrap doc (apache#5021)
  HDDS-8689. Rotate Root CA and Sub CA in SCM. (apache#4943)
  HDDS-8436. Support setSafeMode(), isFileClosed() FileSystem API (apache#4825)
  HDDS-8880. Intermittent fork timeout in TestOMRatisSnapshots (apache#5022)
  HDDS-8962. Ensure docker env is stopped (apache#5011)
  HDDS-7794. [snapshot] SnapshotDiff should throw better error messages for exception handling (apache#5007)
  HDDS-7922. [FSO] S3G folder support fso layout filestatus s3A compatibility (apache#4448)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants