-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-8689. Rotate Root CA and Sub CA in SCM. #4943
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
15 commits
Select commit
Hold shift + click to select a range
7f91ea4
HDDS-8689. Rotate Root CA and Sub CA in SCM.
ChenSammi 8da1191
fix checkstyle and refactor robot test
ChenSammi 382f9fc
fix findbugs
ChenSammi 35a6a44
rebase fix
ChenSammi cd2dd6e
fix test
ChenSammi 7f3a4a4
fix findbugs
ChenSammi 11c9429
change the way to fetch HA group member count
ChenSammi 0567024
check if rotation can be skipped after get the lock resource
ChenSammi 05097b0
address comments and improve robot test
ChenSammi e5d76d6
rollback the changes in ci.yml for previous debug
ChenSammi 862c21b
disable monitor task in SCMCertificateClient
ChenSammi 7cb7937
improve robot test and move saving root certificate into rocksdb to p…
ChenSammi 494ecc9
address comments
ChenSammi 4287822
override shouldStartCertificateMonitor in SCMCertificateClient
ChenSammi 0e0b8e4
increase max certificate lifetime to reduce the possibility of random…
ChenSammi File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -35,6 +35,7 @@ | |
| import java.security.PrivateKey; | ||
| import java.security.cert.X509Certificate; | ||
| import java.util.ArrayList; | ||
| import java.util.Arrays; | ||
| import java.util.List; | ||
| import java.util.Locale; | ||
| import java.util.concurrent.atomic.AtomicReference; | ||
|
|
@@ -60,12 +61,14 @@ public class ReloadingX509KeyManager extends X509ExtendedKeyManager { | |
| */ | ||
| static final char[] EMPTY_PASSWORD = new char[0]; | ||
| private final AtomicReference<X509ExtendedKeyManager> keyManagerRef; | ||
|
|
||
| /** | ||
| * Current private key and cert used in keyManager. Used to detect if these | ||
| * materials are changed. | ||
| */ | ||
| private PrivateKey currentPrivateKey; | ||
| private List<String> currentCertIdsList = new ArrayList<>(); | ||
| private String alias; | ||
|
|
||
| /** | ||
| * Construct a <code>Reloading509KeystoreManager</code>. | ||
|
|
@@ -85,15 +88,62 @@ public ReloadingX509KeyManager(String type, CertificateClient caClient) | |
| @Override | ||
| public String chooseEngineClientAlias(String[] strings, | ||
| Principal[] principals, SSLEngine sslEngine) { | ||
| return keyManagerRef.get() | ||
| String ret = keyManagerRef.get() | ||
| .chooseEngineClientAlias(strings, principals, sslEngine); | ||
|
|
||
| if (ret == null) { | ||
| /* | ||
| Workaround to address that netty tc-native cannot handle the dynamic | ||
| key and certificate refresh well. What happens is during the setup of | ||
| the grpc channel, an SSLContext is created, which is | ||
| ReferenceCountedOpenSslServerContext in the native tc-native case. | ||
| This class uses the TrustManager's getAcceptedIssuers() as the trusted | ||
| CA certificate list. The list is not updated after channel is built. | ||
| With the list being used to present the Principals during the mTLS | ||
| authentication via the Netty channel under Ratis implementation, | ||
| the counterpart(client) KeyManager's | ||
| chooseEngineClientAlias(String, Principal[], SSLEngine) method is | ||
| called with this old root certificate subject principal, which is now | ||
| not available in the new Key Manager after refreshed, so the method | ||
| will return null, which cause the mutual TLS connection establish | ||
| failure. | ||
|
|
||
| Example error message: | ||
| Engine client aliases for RSA, DH_RSA, EC, EC_RSA, EC_EC, | ||
| O=CID-f9f2b2cf-a784-49d7-8577-5d3b13bf0b46, | ||
| OU=9f52487c-f8f9-45ee-bb56-aca60b56327f, | ||
| [email protected], | ||
| org.apache.ratis.thirdparty.io.netty.handler.ssl.OpenSslEngine@5eec0d10 | ||
| is null | ||
|
|
||
| Example success message: | ||
| Engine client aliases for RSA, DH_RSA, EC, EC_RSA, EC_EC, | ||
| O=CID-f9f2b2cf-a784-49d7-8577-5d3b13bf0b46, | ||
| OU=9f52487c-f8f9-45ee-bb56-aca60b56327f, | ||
| [email protected], | ||
| org.apache.ratis.thirdparty.io.netty.handler.ssl.OpenSslEngine@5eec0d10 | ||
| is scm/sub-ca_key | ||
| */ | ||
| ret = alias; | ||
| LOG.info("Engine client aliases for {}, {}, {} is returned as {}", | ||
| strings == null ? "" : Arrays.toString(strings), | ||
| principals == null ? "" : Arrays.toString(principals), | ||
| sslEngine == null ? "" : sslEngine, ret); | ||
| } | ||
| return ret; | ||
| } | ||
|
|
||
| @Override | ||
| public String chooseEngineServerAlias(String s, Principal[] principals, | ||
| SSLEngine sslEngine) { | ||
| return keyManagerRef.get() | ||
| String ret = keyManagerRef.get() | ||
| .chooseEngineServerAlias(s, principals, sslEngine); | ||
| if (ret == null && LOG.isDebugEnabled()) { | ||
| LOG.debug("Engine server aliases for {}, {}, {} is null", s, | ||
| principals == null ? "" : Arrays.toString(principals), | ||
| sslEngine == null ? "" : sslEngine); | ||
| } | ||
| return ret; | ||
| } | ||
|
|
||
| @Override | ||
|
|
@@ -138,7 +188,7 @@ public ReloadingX509KeyManager loadFrom(CertificateClient caClient) { | |
| try { | ||
| X509ExtendedKeyManager manager = loadKeyManager(caClient); | ||
| if (manager != null) { | ||
| this.keyManagerRef.set(manager); | ||
| keyManagerRef.set(manager); | ||
| LOG.info("ReloadingX509KeyManager is reloaded"); | ||
| } | ||
| } catch (Exception ex) { | ||
|
|
@@ -155,9 +205,8 @@ private X509ExtendedKeyManager loadKeyManager(CertificateClient caClient) | |
| if (currentPrivateKey != null && currentPrivateKey.equals(privateKey) && | ||
| currentCertIdsList.size() > 0 && | ||
| newCertList.size() == currentCertIdsList.size() && | ||
| !newCertList.stream().filter( | ||
| c -> !currentCertIdsList.contains(c.getSerialNumber().toString())) | ||
| .findAny().isPresent()) { | ||
| newCertList.stream().allMatch(c -> | ||
| currentCertIdsList.contains(c.getSerialNumber().toString()))) { | ||
| // Security materials(key and certificates) keep the same. | ||
| return null; | ||
| } | ||
|
|
@@ -166,10 +215,15 @@ private X509ExtendedKeyManager loadKeyManager(CertificateClient caClient) | |
| KeyStore keystore = KeyStore.getInstance(type); | ||
| keystore.load(null, null); | ||
|
|
||
| keystore.setKeyEntry(caClient.getComponentName() + "_key", | ||
| privateKey, EMPTY_PASSWORD, | ||
| alias = caClient.getComponentName() + "_key"; | ||
| keystore.setKeyEntry(alias, privateKey, EMPTY_PASSWORD, | ||
| newCertList.toArray(new X509Certificate[0])); | ||
|
|
||
| LOG.info("Key manager is loaded with certificate chain"); | ||
| for (X509Certificate x509Certificate : newCertList) { | ||
| LOG.info(x509Certificate.toString()); | ||
| } | ||
|
|
||
| KeyManagerFactory keyMgrFactory = KeyManagerFactory.getInstance( | ||
| KeyManagerFactory.getDefaultAlgorithm()); | ||
| keyMgrFactory.init(keystore, EMPTY_PASSWORD); | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had a long discussion on this one today, and on Tuesday as well, let me summarize what we have found and agreed on.
The initial problem with the workaround from my side was that it just seems inappropriate to preserve all the old key manager references, and go over all of them, I wanted to understand why it works, and what else we can do.
The final cause for the problem as we determined is within the Netty tcnative code as you have commented as well. The following happens:
During the setup of the connections there is an
SSLContextto be created, that uses theTrustManager'sgetAcceptedIssuers()method, and within theReferenceCountedOpenSslServerContextthe results are cached to the native layer via BIO. This cached value is not updated ever after initialization, but based on our debugging, that value is used to present to theKeyManagercounterpart for mTLS authentication when thisSSLContextis used for communication via the bidirectional Netty channel under the Ratis implementation.During the communication, the KeyManager's
chooseEngineClientAlias(String, Principal[], SSLEngine)method is being called, and the cached data is used to present the Principals to thechooseEngineClientAlias()method.The original
KeyManagerwe have the reference for has a mechanism to select the certificate and key alias based on the provided principals, and after the first rotation, the newKeyManagerwill not have anything to provide to the original certificates that were present to the native layer via thegetAcceptedIssuers()call, therefore once we have rotated the certificate, thechooseEngineClientAlias()call returned null, and therefore in the Ratis layer the mTLS authentication failed, then was retried without a chance to succeed.The workaround works, as at the end of the day it goes back in time to the first
KeyManager, which still is able to resolve and then return the same alias that was/is used for all the earlier and the currentKeyManagerto store the certificates to be used after the actual rotation.Based on these we agreed that this class should not have the workaround implemented this way, but instead we should stick to the alias we use at the creation of the
KeyManagerin the load methods.The reason is that, we just put one private key and one certificate to every
KeyManagerinstance when we create it in theloadKeyManager()method, and we can be sure that once there is query for a key and certificate, we want to use the only key and certificate that is present.We discussed an alternative, where we use the same
SubjectDNfor all the rootCA certificates, but that causes operational trouble when we want to understand certificate chaining during debugging.Furthermore I believe that as we have just one key and one certificate in every KeyManager, we can safely return the same alias from all the methods that are there to request an alias (
chooseEngineClientAlias,chooseEngineServerAlias,getClientAliases,chooseClientAlias,getServerAliases,chooseServerAlias), and we can safely return the only key and only certificate from thegetPrivateKeyandgetCertificateChainmethods, but that is up for you to consider @ChenSammi I am fine with delegating the rest down to a non-custom engine providedKeyManagerinstance.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, it's a wonder discussion. @fapifta ,thanks for the further digging. I will update the comments with more detail info explain the work around the the issue. My current thought is keeping the workaround in a small scope as possbile, try not to bring any unnecessary side effect. So besides chooseEngineClientAlias which needs the workaround now, for other KeyManager functions which doesn't affect by this issue, I tend to keep them untouched, unless later we found they are affected too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, we can look into that later, agree to keep the scope low, just wanted to note that this might as well be a good approach.