@kaijchen
Member

@kaijchen kaijchen commented Apr 19, 2022

What changes were proposed in this pull request?

WIP: this PR depends on #3233 and #3226.

This pull request completes the open key cleanup service outlined in the parent Jira HDDS-4120. It implements the OpenKeyCleanupService class, and starts and stops the service in KeyManagerImpl. The following configurations have been defined to specify the service's behavior:

  1. ozone.open.key.cleanup.service.interval (default value 24 hours)
  2. ozone.open.key.expire.threshold (default value 1 week)
  3. ozone.open.key.cleanup.limit.per.task (default value 1000 keys)

See ozone-default.xml for their corresponding descriptions.
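The interplay of the three settings can be pictured with a minimal, hypothetical Java sketch; it is not the code in this PR. It reads the keys from a plain java.util.Properties (an assumption, since Ozone has its own configuration classes) and parses durations as ISO-8601 strings such as PT24H (also an assumption; the real value syntax may differ). It then selects at most the per-task limit of open keys whose creation time is older than the expiry threshold. The class and method names are illustrative only.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch of the cleanup settings; not Ozone's actual classes. */
class OpenKeyCleanupSketch {
  // Configuration keys as named in the PR description.
  static final String INTERVAL_KEY = "ozone.open.key.cleanup.service.interval";
  static final String THRESHOLD_KEY = "ozone.open.key.expire.threshold";
  static final String LIMIT_KEY = "ozone.open.key.cleanup.limit.per.task";

  // Defaults as listed in the PR description.
  static final Duration DEFAULT_INTERVAL = Duration.ofHours(24);
  static final Duration DEFAULT_THRESHOLD = Duration.ofDays(7);
  static final int DEFAULT_LIMIT = 1000;

  final Duration interval;
  final Duration threshold;
  final int limit;

  OpenKeyCleanupSketch(Properties props) {
    // Assumption: durations are ISO-8601 strings like "PT24H" or "P7D".
    interval = parse(props.getProperty(INTERVAL_KEY), DEFAULT_INTERVAL);
    threshold = parse(props.getProperty(THRESHOLD_KEY), DEFAULT_THRESHOLD);
    String l = props.getProperty(LIMIT_KEY);
    limit = (l == null) ? DEFAULT_LIMIT : Integer.parseInt(l.trim());
  }

  private static Duration parse(String value, Duration fallback) {
    return (value == null) ? fallback : Duration.parse(value.trim());
  }

  /**
   * Returns up to `limit` open key names created before (now - threshold),
   * in the iteration order of the given map.
   */
  List<String> selectExpired(Map<String, Instant> openKeys, Instant now) {
    Instant cutoff = now.minus(threshold);
    List<String> expired = new ArrayList<>();
    for (Map.Entry<String, Instant> e : openKeys.entrySet()) {
      if (expired.size() >= limit) {
        break; // per-task cap reached; remaining keys wait for the next run
      }
      if (e.getValue().isBefore(cutoff)) {
        expired.add(e.getKey());
      }
    }
    return expired;
  }
}
```

Capping the selection per run keeps each deletion request bounded; any expired keys left over are simply picked up on the next interval.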

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4123

How was this patch tested?

Integration test TestOpenKeyCleanupService

@kaijchen kaijchen marked this pull request as ready for review June 6, 2022 12:24
@kaijchen
Member Author

kaijchen commented Jun 6, 2022

@errose28 @captainzmc PTAL.

@kaijchen
Member Author

kaijchen commented Jun 7, 2022

@captainzmc captainzmc requested review from captainzmc and errose28 June 7, 2022 11:22
Member

@captainzmc captainzmc left a comment

Thanks @kaijchen for working on this, LGTM. I just have some minor comments here.

@captainzmc
Member

Hi @errose28, this task is the last part of HDDS-4120; with it, open keys can be cleaned up normally. Could you also help review it?

@kaijchen
Member Author

kaijchen commented Jun 7, 2022

Thanks @captainzmc for the review. I think there is one more task left after this, HDDS-6769.

@captainzmc
Member

Hi @kaijchen, did you test this patch in a local cluster? For example, keep writing keys that fail to commit while the cleanup service keeps deleting them in the background, and run it for a few days as a stability test to see if there are any problems.

@kaijchen
Member Author

Yes, I have tested it in a local cluster. From the log we can see that the open key cleanup service works as expected.
The log level was changed to INFO for testing purposes.

2022-06-17 15:52:14,891 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: running open key cleanup task
2022-06-17 15:52:14,892 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: running open key cleanup task
2022-06-17 16:02:14,891 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: running open key cleanup task
2022-06-17 16:02:14,922 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: Number of expired keys submitted for deletion: 20, elapsed time: 31ms
2022-06-17 16:02:14,922 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: running open key cleanup task
2022-06-17 16:12:14,891 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: running open key cleanup task
2022-06-17 16:12:14,891 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: running open key cleanup task
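The runs above are ten minutes apart, so the service interval appears to have been shortened from the 24-hour default for this test (an inference from the timestamps, not something stated in the thread). As a rough illustration of a service that runs a task on a fixed interval and can be started and stopped, here is a sketch built on java.util.concurrent.ScheduledExecutorService. The class name is hypothetical, and the sketch does not reproduce whatever background-service framework the PR actually uses.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical periodic service sketch; not the PR's OpenKeyCleanupService. */
class PeriodicCleanupService {
  private final Duration interval;
  private final Runnable task;
  private ScheduledExecutorService executor; // null while stopped

  PeriodicCleanupService(Duration interval, Runnable task) {
    this.interval = interval;
    this.task = task;
  }

  /** Starts running the task every interval; a second call is a no-op. */
  synchronized void start() {
    if (executor != null) {
      return;
    }
    executor = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "PeriodicCleanupService");
      t.setDaemon(true); // do not keep the JVM alive on its own
      return t;
    });
    long millis = interval.toMillis();
    // Fixed delay: the next run starts one interval after the previous run
    // finishes, so runs never overlap.
    executor.scheduleWithFixedDelay(task, millis, millis, TimeUnit.MILLISECONDS);
  }

  /** Cancels the schedule, interrupts any in-flight run, waits up to 5s. */
  synchronized void stop() {
    if (executor == null) {
      return;
    }
    executor.shutdownNow();
    try {
      executor.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
    } finally {
      executor = null;
    }
  }

  synchronized boolean isRunning() {
    return executor != null;
  }
}
```

scheduleWithFixedDelay is one choice among several; scheduleAtFixedRate would instead keep runs aligned to wall-clock ticks. Note that with either method, a task that throws an exception suppresses all later runs, so a real cleanup task would need to catch its own errors.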

@kaijchen
Copy link
Member Author

Limiting the number of expired keys is working as expected.

2022-06-18 22:42:14,910 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: running open key cleanup task
2022-06-18 22:42:14,976 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: running open key cleanup task
2022-06-18 22:52:14,910 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: running open key cleanup task
2022-06-18 22:52:14,990 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: Number of expired keys submitted for deletion: 1000, elapsed time: 80ms
2022-06-18 22:52:14,990 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: running open key cleanup task
2022-06-18 23:02:14,910 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: running open key cleanup task
2022-06-18 23:02:14,995 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: Number of expired keys submitted for deletion: 1000, elapsed time: 86ms
2022-06-18 23:02:14,996 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: running open key cleanup task
2022-06-18 23:12:14,910 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: running open key cleanup task
2022-06-18 23:12:14,979 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: Number of expired keys submitted for deletion: 108, elapsed time: 69ms
2022-06-18 23:12:14,979 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: running open key cleanup task
2022-06-18 23:22:14,910 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: running open key cleanup task
2022-06-18 23:22:14,965 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: running open key cleanup task
2022-06-18 23:32:14,910 [OpenKeyCleanupService#0] INFO org.apache.hadoop.ozone.om.OpenKeyCleanupService: running open key cleanup task
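In the log above, successive tasks submitted 1000, 1000 and then 108 keys. Assuming a backlog of about 2108 expired keys and no new expirations between runs (an assumption, not something the log states), that is exactly how the default cap of 1000 keys per task splits the work. The hypothetical helper below reproduces the split:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a per-task cap draining a backlog. */
class CleanupBatchSketch {
  /**
   * Returns how many keys each successive run submits until the backlog is
   * empty, assuming no new keys expire between runs.
   */
  static List<Integer> drain(int backlog, int limitPerTask) {
    if (limitPerTask <= 0) {
      throw new IllegalArgumentException("limitPerTask must be positive");
    }
    List<Integer> perRun = new ArrayList<>();
    int remaining = backlog;
    while (remaining > 0) {
      int submitted = Math.min(remaining, limitPerTask);
      perRun.add(submitted);
      remaining -= submitted;
    }
    return perRun;
  }

  public static void main(String[] args) {
    // A backlog of 2108 keys with a cap of 1000 per task.
    System.out.println(drain(2108, 1000)); // prints [1000, 1000, 108]
  }
}
```

Because each run is bounded by the cap, a large backlog is spread over several intervals instead of being submitted as one large deletion request.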

@captainzmc
Member

Limiting the number of expired keys is working as expected.

The tests look good; thanks @kaijchen for working on this. Let's merge this PR first; if other modifications are needed, we can submit a new patch.

@captainzmc captainzmc merged commit 321f585 into apache:master Jun 20, 2022
@kaijchen
Member Author

Thanks @captainzmc for reviewing and merging this.

@kaijchen kaijchen deleted the HDDS-4123 branch June 20, 2022 02:23
errose28 added a commit to errose28/ozone that referenced this pull request Jun 23, 2022
* master: (34 commits)
  HDDS-6868 Add S3Auth information to thread local (apache#3527)
  HDDS-6877. Keep replication port unchanged when restarting datanode in MiniOzoneCluster (apache#3510)
  HDDS-6907. OFS should create buckets with FILE_SYSTEM_OPTIMIZED layout. (apache#3528)
  HDDS-6875. Migrate parameterized tests in hdds-common to JUnit5 (apache#3513)
  HDDS-6924. OBJECT_STORE isn't flat namespaced (apache#3533)
  HDDS-6899. [EC] Remove warnings and errors from console during online reconstruction of data. (apache#3522)
  HDDS-6695. Enable SCM Ratis by default for new clusters only (apache#3499)
  HDDS-4123. Integrate OM Open Key Cleanup Service Into Existing Code (apache#3319)
  HDDS-6882. Correct exit code for invalid arguments passed to command-line tools. (apache#3517)
  HDDS-6890. EC: Fix potential wrong replica read with over-replicated container. (apache#3523)
  HDDS-6902. Duplicate mockito-core entries in pom.xml (apache#3525)
  HDDS-6752. Migrate tests with rules in hdds-server-scm to JUnit5 (apache#3442)
  HDDS-6806. EC: Implement the EC Reconstruction coordinator. (apache#3504)
  HDDS-6829. Limit the no of inflight replication tasks in SCM. (apache#3482)
  HDDS-6898. [SCM HA finalization] Modify acceptance test configuration to speed up test finalization (apache#3521)
  HDDS-6577. Configurations to reserve HDDS volume space. (apache#3484)
  HDDS-6870 Clean up isTenantAdmin to use UGI (apache#3503)
  HDDS-6872. TestAuthorizationV4QueryParser should pass offline (apache#3506)
  HDDS-6840. Add MetaData volume information to the SCM and OM - UI (apache#3488)
  HDDS-6697. EC: ReplicationManager - create class to detect EC container health issues (apache#3512)
  ...