@Galsza Galsza commented Jul 6, 2023

What changes were proposed in this pull request?

Now that all preparation for the Root CA rotation poller is done, it can be started in DefaultCertificateClient. SCMCertificateClient should be omitted from this, since SCM handles Root CA rotation separately.

Note: these are the same changes created in #5001 but due to the rebasing complexity I found it easier to recreate the pull request from scratch

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8593

How was this patch tested?

https://github.com/Galsza/ozone/actions/runs/5478158390
unit/integration tests, more extended docker based testing will be created in a separate jira

  getLogger().info("CertificateLifetimeMonitor is disabled for {}",
      component);
}
startRootCaRotationPoller();
@ChenSammi ChenSammi Jul 7, 2023

SCMCertificateClient doesn't need this poller. You can move this call inside the if statement at line 174. shouldStartCertificateMonitor can be renamed to shouldStartCertificateMonitorService so that it covers both the service's certificate lifetime monitor and rotation, and the Root CA certificate monitor and fetch.
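
The suggestion above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual DefaultCertificateClient code: the class `CertClientStartupSketch`, the method `servicesToStart`, and the service name strings are hypothetical, and the real guard condition may look different. It only shows the idea of starting both background services under one shared condition that is false for SCM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: this class and its method are illustrative and are not
// part of Ozone. Only the guard name mirrors the rename proposed in the review.
public class CertClientStartupSketch {

  /**
   * Returns the names of the background services that would be started.
   * The flag stands in for shouldStartCertificateMonitorService(), which is
   * assumed to return false for the SCM certificate client.
   */
  static List<String> servicesToStart(boolean shouldStartCertificateMonitorService) {
    List<String> started = new ArrayList<>();
    if (shouldStartCertificateMonitorService) {
      // Both services sit behind the same guard, so a client that opts out
      // (such as SCM) gets neither of them.
      started.add("CertificateLifetimeMonitor");
      started.add("RootCaRotationPoller");
    }
    return started;
  }

  public static void main(String[] args) {
    System.out.println("non-SCM client: " + servicesToStart(true));
    System.out.println("SCM client: " + servicesToStart(false));
  }
}
```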

ChenSammi commented Jul 7, 2023

@Galsza, could you add some tests in ozonesecure/test-root-ca-rotation.sh to verify that, after the Root CA rotation is finished, this poller detects the changed Root CA certificates and triggers the leaf service rotation ahead of time, and that the ROOTCA-1.crt and ROOTCA-2.crt files exist on disk under its metadata directory?

Galsza commented Jul 7, 2023

Unfortunately, there is no good way to distinguish a leaf cert rotation caused by cert expiry from one caused by Root CA cert rotation. It is very difficult to choose a grace period/rotation time that keeps the tests short while ensuring that the leaf cert rotation doesn't coincide with the Root CA cert rotation. It could be verified from the datanode logs, but robot tests can't check logs.

I have checked manually in the logs that the CertificateRenewerService is indeed triggered with forceRenewal = true during root ca rotation.

The test is now updated to look for the ROOTCA-2.crt file, which should appear in the datanode's metadata directory.
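
For illustration only, the kind of on-disk check described above could look like the Java sketch below. The real test lives in the robot/shell based acceptance tests, not in Java. The class `RootCaFileCheckSketch`, its methods, and the `certs` subdirectory are assumptions made for the demo; the sketch searches recursively because the exact location of the file under the metadata directory is not stated here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical sketch: not the actual acceptance test, which is robot/shell based.
public class RootCaFileCheckSketch {

  /** Returns true if a regular file with the given name exists anywhere under dir. */
  static boolean containsFile(Path dir, String fileName) {
    if (!Files.isDirectory(dir)) {
      return false;
    }
    try (Stream<Path> paths = Files.walk(dir)) {
      return paths.anyMatch(p -> Files.isRegularFile(p)
          && p.getFileName().toString().equals(fileName));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Simulates a metadata directory before and after a rotated root cert is saved. */
  static boolean runDemo() {
    try {
      Path metadataDir = Files.createTempDirectory("metadata-sketch");
      // "certs" is an assumed subdirectory name, used only for this demo.
      Path certs = Files.createDirectories(metadataDir.resolve("certs"));
      boolean before = containsFile(metadataDir, "ROOTCA-2.crt");
      Files.createFile(certs.resolve("ROOTCA-2.crt"));
      boolean after = containsFile(metadataDir, "ROOTCA-2.crt");
      return !before && after;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("check behaves as expected: " + runDemo());
  }
}
```

A file check like this only confirms that the new root certificate was fetched and saved; as noted above, it cannot tell whether a leaf certificate renewal was triggered by the Root CA rotation or by ordinary expiry.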

@ChenSammi ChenSammi left a comment

Thanks @Galsza, the patch LGTM, +1.

@ChenSammi ChenSammi merged commit 3bf9b69 into apache:master Jul 9, 2023
errose28 added a commit to errose28/ozone that referenced this pull request Jul 10, 2023
* master: (36 commits)
  HDDS-8990. Intermittent timeout waiting on datanode4 9856 to become available (apache#5039)
  Revert "HDDS-7750. Incorrect WRITE ACL check. (apache#4992)"
  HDDS-7750. Incorrect WRITE ACL check. (apache#4992)
  HDDS-8985. Intermittent timeout exiting safe mode in HA secure tests (apache#5033)
  HDDS-8593. Add RootCARotationPoller to CertClient (apache#5030)
  HDDS-7645. Kubernetes check should fail fast if cluster cannot start (apache#5028)
  HDDS-8981. TestRootedOzoneFileSystem runs out of disk space (apache#5029)
  HDDS-8592. Fetch and save all root certificates during service's certificate rotation. (apache#5025)
  HDDS-8981. Disable TestRootedOzoneFileSystem#testSafeMode
  HDDS-8591. Create scheduler to check for new root ca certificates (apache#4961)
  HDDS-8979. error validating kustomization.yaml (apache#5024)
  HDDS-8973. Ozone SCM HA should not allocates duplicate IDs when transferring leadership (apache#5018)
  HDDS-8970. Snapshot Diff should return path relative to bucket root (apache#5015)
  HDDS-8975. Clarify SCM HA auto-bootstrap doc (apache#5021)
  HDDS-8689. Rotate Root CA and Sub CA in SCM. (apache#4943)
  HDDS-8436. Support setSafeMode(), isFileClosed() FileSystem API (apache#4825)
  HDDS-8880. Intermittent fork timeout in TestOMRatisSnapshots (apache#5022)
  HDDS-8962. Ensure docker env is stopped (apache#5011)
  HDDS-7794. [snapshot] SnapshotDiff should throw better error messages for exception handling (apache#5007)
  HDDS-7922. [FSO] S3G folder support fso layout filestatus s3A compatibility (apache#4448)
  ...