@smengcl smengcl commented Jul 17, 2025

What changes were proposed in this pull request?

This patch makes ozone.snapshot.filtering.service.interval reconfigurable, so the SST filtering service can be enabled, disabled, or restarted with a new interval on a live OM without restarting the OM.

This PR also includes a fix: any non-positive value (<= 0) for ozone.snapshot.filtering.service.interval now disables the SstFilteringService. Previously, such a value caused scheduleWithFixedDelay to throw IllegalArgumentException.
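
The stop-then-maybe-restart behavior described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this patch: the class ReconfigurableServiceSketch and its methods are hypothetical names, and the real logic lives in KeyManagerImpl and BackgroundService. The one fact it relies on is that ScheduledExecutorService.scheduleWithFixedDelay rejects a delay <= 0 with IllegalArgumentException, which is why the interval is checked first.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the Ozone BackgroundService or KeyManagerImpl.
public class ReconfigurableServiceSketch {
  private final Runnable task;
  private ScheduledExecutorService executor; // null while the service is disabled

  public ReconfigurableServiceSketch(Runnable task) {
    this.task = task;
  }

  /** Stops any running instance, then restarts it unless intervalMs <= 0. */
  public synchronized void reconfigure(long intervalMs) {
    shutdown();
    if (intervalMs <= 0) {
      // Treat non-positive intervals as "disabled" instead of passing them to
      // scheduleWithFixedDelay, which would throw IllegalArgumentException.
      return;
    }
    executor = Executors.newSingleThreadScheduledExecutor();
    executor.scheduleWithFixedDelay(task, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
  }

  public synchronized void shutdown() {
    if (executor != null) {
      executor.shutdownNow();
      executor = null;
    }
  }

  public synchronized boolean isRunning() {
    return executor != null;
  }

  public static void main(String[] args) {
    ReconfigurableServiceSketch service = new ReconfigurableServiceSketch(() -> { });
    service.reconfigure(60_000);
    System.out.println("running after 60000 ms: " + service.isRunning());
    service.reconfigure(-1);
    System.out.println("running after -1: " + service.isRunning());
    service.reconfigure(20_000);
    System.out.println("running after 20000 ms: " + service.isRunning());
    service.shutdown(); // release the worker thread so the JVM can exit
  }
}
```

Shutting down unconditionally before the interval check keeps the three cases (disable, enable, change interval) on a single code path.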

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-13464

How was this patch tested?

  • Added a reconfiguration test case. Example OM log output during reconfiguration:
2025-07-21 19:28:38,278 [main] INFO  utils.BackgroundService (BackgroundService.java:shutdown(178)) - Shutting down service SstFilteringService
2025-07-21 19:28:38,278 [main] INFO  utils.BackgroundService (BackgroundService.java:start(118)) - Starting service SstFilteringService with interval 30000 milliseconds
2025-07-21 19:28:38,282 [main] INFO  utils.BackgroundService (BackgroundService.java:shutdown(178)) - Shutting down service SstFilteringService
2025-07-21 19:28:38,282 [main] INFO  om.KeyManagerImpl (KeyManagerImpl.java:startSnapshotSstFilteringService(377)) - SstFilteringService is disabled.
2025-07-21 19:28:38,282 [main] INFO  utils.BackgroundService (BackgroundService.java:start(118)) - Starting service SstFilteringService with interval 60000 milliseconds
  • Also tested manually in Docker dev env
bash-5.1$ ozone admin reconfig --service=OM --address=om:9862 properties
OM: Node [om:9862] Reconfigurable properties:
ozone.administrators
ozone.directory.deleting.service.interval
ozone.key.deleting.limit.per.task
ozone.om.server.list.max.size
ozone.om.volume.listall.allowed
ozone.readonly.administrators
ozone.snapshot.filtering.service.interval
ozone.thread.number.dir.deletion
bash-5.1$ ozone admin reconfig --service=OM --address=om:9862 status
OM: Reconfiguring status for node [om:9862]: no task was found.

Stopping SstFilteringService:

bash-5.1$ sed -i '/<\/configuration>/i\
<property><name>ozone.snapshot.filtering.service.interval</name><value>-1</value></property>
' /etc/hadoop/ozone-site.xml
bash-5.1$ tail -3 /etc/hadoop/ozone-site.xml
<property><name>ozone.filesystem.snapshot.enabled</name><value>true</value></property>
  <property><name>ozone.snapshot.filtering.service.interval</name><value>-1</value></property>
</configuration>
bash-5.1$ ozone admin reconfig --service=OM --address=om:9862 start
OM: Started reconfiguration task on node [om:9862].
bash-5.1$ ozone admin reconfig --service=OM --address=om:9862 status
OM: Reconfiguring status for node [om:9862]: started at Tue Jul 22 01:55:58 UTC 2025 and finished at Tue Jul 22 01:55:58 UTC 2025.
SUCCESS: Changed property ozone.snapshot.filtering.service.interval
	From: "1m"
	To: "-1"

From OM logs:

...
2025-07-22 02:24:53,341 [Reconfiguration Task] INFO conf.ReconfigurableBase: Change property: ozone.snapshot.filtering.service.interval from "1m" to "-1".
2025-07-22 02:24:53,341 [Reconfiguration Task] INFO utils.BackgroundService: Shutting down service SstFilteringService
2025-07-22 02:24:53,342 [Reconfiguration Task] INFO om.KeyManagerImpl: SstFilteringService is disabled.

Starting SstFilteringService again:

bash-5.1$ sed -i '/<name>ozone\.snapshot\.filtering\.service\.interval<\/name>/{
  N; s|<value>.*</value>|<value>20s</value>|
}' /etc/hadoop/ozone-site.xml
bash-5.1$ tail -3 /etc/hadoop/ozone-site.xml
<property><name>ozone.filesystem.snapshot.enabled</name><value>true</value></property>
  <property><name>ozone.snapshot.filtering.service.interval</name><value>20s</value></property>
</configuration>
bash-5.1$ ozone admin reconfig --service=OM --address=om:9862 start
OM: Started reconfiguration task on node [om:9862].
bash-5.1$ ozone admin reconfig --service=OM --address=om:9862 status
OM: Reconfiguring status for node [om:9862]: started at Tue Jul 22 01:59:51 UTC 2025 and finished at Tue Jul 22 01:59:51 UTC 2025.
SUCCESS: Changed property ozone.snapshot.filtering.service.interval
	From: "-1"
	To: "20s"

OM logs this time:

2025-07-22 02:26:39,834 [Reconfiguration Task] INFO conf.ReconfigurableBase: Change property: ozone.snapshot.filtering.service.interval from "-1" to "20s".
2025-07-22 02:26:39,835 [Reconfiguration Task] INFO utils.BackgroundService: Starting service SstFilteringService with interval 20000 milliseconds
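
The status output and logs above show suffixed values such as "1m" and "20s" ending up as millisecond intervals (20s becomes 20000 milliseconds), while "-1" carries no unit. A simplified sketch of that kind of conversion is below. It is not Hadoop's or Ozone's actual parsing code: DurationSketch and toMillis are hypothetical names, only a few suffixes are handled, and treating a unitless value as milliseconds is an assumption made for this example.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of duration-string parsing; not the Hadoop implementation.
public class DurationSketch {
  // "ms" is listed before "s" so that "5ms" is not misread as seconds.
  private static final String[] SUFFIXES = {"ms", "s", "m", "h", "d"};
  private static final TimeUnit[] UNITS = {
      TimeUnit.MILLISECONDS, TimeUnit.SECONDS, TimeUnit.MINUTES,
      TimeUnit.HOURS, TimeUnit.DAYS};

  /** Converts values like "20s", "1m" or "-1" to milliseconds. */
  static long toMillis(String value) {
    String v = value.trim().toLowerCase(Locale.ROOT);
    for (int i = 0; i < SUFFIXES.length; i++) {
      if (v.endsWith(SUFFIXES[i])) {
        String number = v.substring(0, v.length() - SUFFIXES[i].length()).trim();
        return UNITS[i].toMillis(Long.parseLong(number));
      }
    }
    // Assumption for this sketch: a value without a unit is in milliseconds.
    return Long.parseLong(v);
  }

  public static void main(String[] args) {
    for (String s : new String[] {"1m", "20s", "-1"}) {
      System.out.println(s + " -> " + toMillis(s) + " ms");
    }
  }
}
```

Under these assumptions "-1" stays negative after parsing, so it falls into the "disabled" branch rather than being scheduled.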

Note that if the config is not changed, ReconfigurationHandler is not triggered, so SstFilteringService is not restarted:

bash-5.1$ tail -3 /etc/hadoop/ozone-site.xml
<property><name>ozone.filesystem.snapshot.enabled</name><value>true</value></property>
  <property><name>ozone.snapshot.filtering.service.interval</name><value>20s</value></property>
</configuration>
bash-5.1$ ozone admin reconfig --service=OM --address=om:9862 start
OM: Started reconfiguration task on node [om:9862].
bash-5.1$ ozone admin reconfig --service=OM --address=om:9862 status
OM: Reconfiguring status for node [om:9862]: started at Tue Jul 22 02:01:14 UTC 2025 and finished at Tue Jul 22 02:01:14 UTC 2025.

OM logs:

2025-07-22 02:01:14,455 [Reconfiguration Task] INFO conf.ReconfigurationHandler: Reconfiguration complete. No properties were changed.
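
The "no properties were changed" case can be pictured as a diff between the old and new values of the reconfigurable keys, with handlers invoked only for keys that differ. The sketch below illustrates that idea only; ChangedPropertiesSketch and changedProperties are hypothetical names, plain maps stand in for the configuration objects, and this is not the actual ReconfigurationHandler code.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of change detection; not the actual reconfiguration code.
public class ChangedPropertiesSketch {

  /** Returns key -> {oldValue, newValue} for reconfigurable keys whose value differs. */
  static Map<String, String[]> changedProperties(
      Map<String, String> oldConf,
      Map<String, String> newConf,
      Set<String> reconfigurableKeys) {
    Map<String, String[]> changed = new TreeMap<>();
    for (String key : reconfigurableKeys) {
      String oldValue = oldConf.get(key);
      String newValue = newConf.get(key);
      if (!Objects.equals(oldValue, newValue)) {
        changed.put(key, new String[] {oldValue, newValue});
      }
    }
    return changed;
  }

  public static void main(String[] args) {
    String key = "ozone.snapshot.filtering.service.interval";
    Set<String> keys = Set.of(key);

    // Same value on both sides: nothing to do, so no handler would run.
    System.out.println(changedProperties(Map.of(key, "20s"), Map.of(key, "20s"), keys).isEmpty());

    // Different value: the key is reported along with its old and new values.
    String[] change = changedProperties(Map.of(key, "-1"), Map.of(key, "20s"), keys).get(key);
    System.out.println("From: " + change[0] + " To: " + change[1]);
  }
}
```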

@smengcl smengcl requested a review from swamirishi July 17, 2025 22:50
@jojochuang jojochuang added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Jul 18, 2025

@jojochuang jojochuang left a comment

Looks good to me. But why make it reconfigurable?

@swamirishi swamirishi left a comment

LGTM @smengcl thanks for the patch

swamirishi commented Jul 18, 2025

Looks good to me. But why make it reconfigurable?

This would give us the flexibility to disable/enable the SST filtering service without having to restart OM. As a workaround, with the SST filtering service disabled, we can use the snapshot feature to get a consistent point-in-time view of the OM RocksDB (on all 3 OMs).

@jojochuang

OK, please update the Jira. Right now it's blank.

@smengcl smengcl marked this pull request as ready for review July 22, 2025 02:03

@adoroszlai adoroszlai left a comment

Thanks @smengcl for the patch.

smengcl commented Jul 22, 2025

Thanks @smengcl for the patch.

Thanks @adoroszlai ! I have addressed your comments.

…pshotSSTFilteringServiceInterval for consistency with configuration key naming.
@smengcl smengcl force-pushed the HDDS-13464-reconf branch from ce81733 to 13d5c57 Compare July 22, 2025 16:44
@adoroszlai

Thanks @smengcl for updating the patch.

smengcl commented Jul 22, 2025

Thanks @jojochuang @swamirishi @adoroszlai for the reviews and comments.

@smengcl smengcl merged commit c6c86f9 into apache:master Jul 22, 2025
81 of 82 checks passed
@smengcl smengcl deleted the HDDS-13464-reconf branch July 22, 2025 18:53
errose28 added a commit to errose28/ozone that referenced this pull request Jul 30, 2025
* master: (730 commits)
  HDDS-13083. Handle cases where block deletion generates tree file before scanner (apache#8565)
  HDDS-12982. Reduce log level for snapshot validation failure (apache#8851)
  HDDS-13396. Documentation: Improve the top-level overview page for new users. (apache#8753)
  HDDS-13176. containerIds table value format change to proto from string (apache#8589)
  HDDS-13449. Incorrect Interrupt Handling for DirectoryDeletingService and KeyDeletingService (apache#8817)
  HDDS-2453. Add Freon tests for S3 MPU Keys (apache#8803)
  HDDS-13237. Container data checksum should contain block IDs. (apache#8773)
  HDDS-13489. Fix SCMBlockdeleting unnecessary iteration in corner case. (apache#8847)
  HDDS-13464. Make ozone.snapshot.filtering.service.interval reconfigurable (apache#8825)
  HDDS-13473. Amend validation for OZONE_OM_SNAPSHOT_DB_MAX_OPEN_FILES (apache#8829)
  HDDS-13435. Add an OzoneManagerAuthorizer interface (apache#8840)
  HDDS-8565. Recon memory leak in NSSummary (apache#8823).
  HDDS-12852. Implement a sliding window counter utility (apache#8498)
  HDDS-12000. Add unit test for RatisContainerSafeModeRule and ECContainerSafeModeRule (apache#8801)
  HDDS-13092. Container scanner should trigger volume scan when marking a container unhealthy (apache#8603)
  HDDS-13070. OM Follower changes to create and place sst files from hardlink file. (apache#8761)
  HDDS-13482. Mark testWriteStateMachineDataIdempotencyWithClosedContainer as flaky
  HDDS-13481. Fix success latency metric in SCM panels of deletion grafana dashboard (apache#8835)
  HDDS-13468. Update default value of ozone.scm.ha.dbtransactionbuffer.flush.interval. (apache#8834)
  HDDS-13410. Control block deletion for each DN from SCM. (apache#8767)
  ...

hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerReplicaInfo.java
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/container/ReconcileSubcommand.java
hadoop-ozone/cli-admin/src/test/java/org/apache/hadoop/hdds/scm/cli/container/TestReconcileSubcommand.java
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Jul 31, 2025
swamirishi pushed a commit to swamirishi/ozone that referenced this pull request Dec 3, 2025
…l reconfigurable (apache#8825)

(cherry picked from commit c6c86f9)

Conflicts:
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/fs/ozone/TestOzoneFsSnapshot.java
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOmSnapshot.java
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/reconfig/TestOmReconfiguration.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java