-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-8591. Create scheduler to check for new root ca certificates #4961
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-8591. Create scheduler to check for new root ca certificates #4961
Conversation
|
@ChenSammi, @fapifta please take a look at this patch |
ashishkumar50
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Galsza, Thanks for working on this. Please find few comments.
| </description> | ||
| </property> | ||
| <property> | ||
| <name>ozone.scm.security.service.address</name> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ozone.scm.security.service.address This is duplicate config, already there is one. Don't see used in the new code, may be by mistake added. Please remove it.
| executor.shutdownNow(); | ||
| } | ||
| if (!executor.awaitTermination(5, TimeUnit.SECONDS)) { | ||
| LOG.error("Unable to shutdown state machine properly."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please correct log wrt cert and not state machine.
| if (!knownRootCerts.containsAll(scmRootCaCerts)) { | ||
| rootCaListConsumers.forEach(c -> c.accept(scmRootCaCerts)); | ||
| knownRootCerts = new HashSet<>(); | ||
| knownRootCerts.addAll(scmRootCaCerts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to have a log here, when there is a new cert?
| scmSecureClient.getAllRootCaCertificates(); | ||
| List<X509Certificate> scmRootCaCerts = | ||
| OzoneSecurityUtil.convertToX509(pemEncodedRootCaList); | ||
| if (!knownRootCerts.containsAll(scmRootCaCerts)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Objects(X509Certificate) inside knownRootCerts and scmRootCaCerts are same when both have same content? Can you verify if it is fine. If not it will always enter into this condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two certs are the same when their encoded content is the same. I think this should be good enough. I'm going to discuss this with @fapifta just to be sure and make a local test for it.
|
Converted to draft for now because some additional retry functionality needs to be created for each consumer in the poller. |
|
@ashishkumar50 Please take another look! |
| knownCerts.add(knownCert); | ||
| List<String> certsFromScm = new ArrayList<>(); | ||
| certsFromScm.add(CertificateCodec.getPEMEncodedString(knownCert)); | ||
| certsFromScm.add(CertificateCodec.getPEMEncodedString(newRootCa)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you have small test by just adding knowCert to ensure both cases are fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you meant by this, but I have refined the first test case to use the knownCert instead of not using certs at all. I hope this fixes it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I meant the same. Thanks for updating it.
fapifta
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Thanks @Galsza, LGTM +1. |
|
Moving to draft until Sammi's patch is merged: #4943 |
d719380 to
a89ce73
Compare
|
Rebased to Sammi's change, since #4943 is merged now |
|
@fapifta @adoroszlai @ChenSammi this pr is ready now, please take another look! |
| return StringUtils.join(certs.stream() | ||
| .map(X509Certificate::getSerialNumber) | ||
| .map(BigInteger::toString) | ||
| .collect(Collectors.toList()), ", "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use stream function to achieve the join, like this
certs.stream().map(X509Certificate::getSerialNumber).map(BigInteger::toString).collect(Collectors.joining(", "));
| LOG.error("Unable to shutdown root ca certificate rotation poller."); | ||
| } | ||
| } catch (InterruptedException e) { | ||
| LOG.error("Error attempting to shutdown.", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WARN instead of ERROR is more appropriate here, Line 129 too. Here is an example,
LOG.warn("{} couldn't be stopped gracefully", getClass().getSimpleName());
| } | ||
| } catch (InterruptedException e) { | ||
| LOG.error("Error attempting to shutdown.", e); | ||
| executor.shutdownNow(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can avoid call the executor.shutdownNow() here, for shutdown is already called, and without the explicitly check isInterrupted() or any wait() like function, shutdownNow will have the same effect as shutdown.
| } | ||
| }); | ||
| } catch (IOException e) { | ||
| LOG.error("Error while trying to rotate root ca certificate", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error while trying to rotate root ca certificate
->
Error while trying to poll root ca certificates
| this.knownRootCerts = initiallyKnownRootCaCerts; | ||
| poller = Executors.newScheduledThreadPool(1, | ||
| new ThreadFactoryBuilder().setNameFormat( | ||
| "RootCaRotationPoller") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better use "this.getClass().getSimpleName()" to avoid hard code class name in case it's changed later.
| public static final String HDDS_X509_CA_ROTATION_ACK_TIMEOUT_DEFAULT = | ||
| "PT15M"; | ||
| public static final String HDDS_X509_ROOTCA_CLIENT_POLLING_INTERVAL = | ||
| "hdds.x509.rootca.client.polling.interval"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest change the name to "hdds.x509.rootca.certificate.polling.interval".
| return caAckTimeout; | ||
| } | ||
|
|
||
| public Duration getRootCaClientPollingInterval() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Galsza , the client -> certificate change is missed in this SecurityConfig.java .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChenSammi thanks I've addressed it now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, let's wait for the CI to pass.
ChenSammi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last patch LGTM +1.
* master: (36 commits) HDDS-8990. Intermittent timeout waiting on datanode4 9856 to become available (apache#5039) Revert "HDDS-7750. Incorrect WRITE ACL check. (apache#4992)" HDDS-7750. Incorrect WRITE ACL check. (apache#4992) HDDS-8985. Intermittent timeout exiting safe mode in HA secure tests (apache#5033) HDDS-8593. Add RootCARotationPoller to CertClient (apache#5030) HDDS-7645. Kubernetes check should fail fast if cluster cannot start (apache#5028) HDDS-8981. TestRootedOzoneFileSystem runs out of disk space (apache#5029) HDDS-8592. Fetch and save all root certificates during service's certificate rotation. (apache#5025) HDDS-8981. Disable TestRootedOzoneFileSystem#testSafeMode HDDS-8591. Create scheduler to check for new root ca certificates (apache#4961) HDDS-8979. error validating kustomization.yaml (apache#5024) HDDS-8973. Ozone SCM HA should not allocates duplicate IDs when transferring leadership (apache#5018) HDDS-8970. Snapshot Diff should return path relative to bucket root (apache#5015) HDDS-8975. Clarify SCM HA auto-bootstrap doc (apache#5021) HDDS-8689. Rotate Root CA and Sub CA in SCM. (apache#4943) HDDS-8436. Support setSafeMode(), isFileClosed() FileSystem API (apache#4825) HDDS-8880. Intermittent fork timeout in TestOMRatisSnapshots (apache#5022) HDDS-8962. Ensure docker env is stopped (apache#5011) HDDS-7794. [snapshot] SnapshotDiff should throw better error messages for exception handling (apache#5007) HDDS-7922. [FSO] S3G folder support fso layout filestatus s3A compatibility (apache#4448) ...
What changes were proposed in this pull request?
After the SCMs finished their part of root ca rotation the clients also need to get the new root CA certificate. This is done through a simple polling mechanism that asks the SCMs for the root CAs and once it observes a change it invokes a consumer from the clients to notify them. Integrating this change is going to be done in later items.
What is the link to the Apache JIRA
HDDS-8591
How was this patch tested?
Added unit tests. Functional change is not yet observed in this patch as the polling is not actually integrated into anything yet.