Skip to content

Conversation

@peterxcli
Copy link
Member

@peterxcli peterxcli commented Mar 25, 2025

What changes were proposed in this pull request?

//ozone-default.xml
<property>
  <name>ozone.om.fs.snapshot.max.limit</name>
  <value>1000</value>
  <tag>OZONE, OM, MANAGEMENT</tag>
  <description>
    The maximum number of filesystem snapshot allowed in an Ozone Manager.
  </description>
</property>

The value of the above property is passed to RDBStore but it is never used.

We should verify & limit the number of snapshots created in OMSnapshotCreateRequest.

  • Get non-deleted rows count directly from NumSnapshotActive metrics.
    • It would be prepared on the OM node restart, and be inc/dec on TestOMSnapshotCreateRequest/TestOMSnapshotDeleteRequest. So it would reflect the real snapshotTable row count.
      /**
      * Iterate the Snapshot table, check the status
      * for every snapshot and update OMMetrics.
      */
      private void updateActiveSnapshotMetrics()
      throws IOException {
      long activeGauge = 0;
      long deletedGauge = 0;
      try (TableIterator<String, ? extends
      KeyValue<String, SnapshotInfo>> keyIter =
      metadataManager.getSnapshotInfoTable().iterator()) {
      while (keyIter.hasNext()) {
      SnapshotInfo info = keyIter.next().getValue();
      SnapshotInfo.SnapshotStatus snapshotStatus =
      info.getSnapshotStatus();
      if (snapshotStatus == SnapshotInfo.SnapshotStatus.SNAPSHOT_ACTIVE) {
      activeGauge++;
      } else if (snapshotStatus ==
      SnapshotInfo.SnapshotStatus.SNAPSHOT_DELETED) {
      deletedGauge++;
      }
      }
      }
      metrics.setNumSnapshotActive(activeGauge);
      metrics.setNumSnapshotDeleted(deletedGauge);
      }

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-12596

How was this patch tested?

CI:
https://github.com/peterxcli/ozone/actions/runs/14679627514
https://github.com/peterxcli/ozone/actions/runs/14683601913

@peterxcli
Copy link
Member Author

The value of the above property is passed to RDBStore but it is never used.

https://issues.apache.org/jira/browse/HDDS-12690

@peterxcli
Copy link
Member Author

@szetszwo @SaketaChalamchala please take a look if you have time. Thanks!

Copy link
Contributor

@hemantk-12 hemantk-12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @peterxcli for working on this.

@hemantk-12 hemantk-12 requested a review from swamirishi March 25, 2025 19:05
@hemantk-12
Copy link
Contributor

@SaketaChalamchala can you please take a look?

@peterxcli
Copy link
Member Author

@hemantk-12 Do you know how to sync the changes in interface-client proto file to proto.lock?

@adoroszlai
Copy link
Contributor

Do you know how to sync the changes in interface-client proto file to proto.lock?

You shouldn't. proto.lock captures the previous release's state, so it's updated only before releasing Ozone. It is used as a reference to check for incompatible changes.

@peterxcli
Copy link
Member Author

Do you know how to sync the changes in interface-client proto file to proto.lock?

You shouldn't. proto.lock captures the previous release's state, so it's updated only before releasing Ozone. It is used as a reference to check for incompatible changes.

Got it. Thanks @adoroszlai!

Copy link
Contributor

@hemantk-12 hemantk-12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Source code looks good. Left some comments on tests. Please take a look.

Copy link
Contributor

@swamirishi swamirishi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peterxcli thanks for working on the patch. I have a few suggestions for making this implementation correct in case of concurrent ratis requests.

checkSnapshotActive(toSnapInfo, false);
}

public void validateSnapshotLimit() throws IOException {
Copy link
Contributor

@swamirishi swamirishi Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this method synchronized and have a parameter in memory which keeps the number of snapshots creation in flight in memory. On hindsight I believe we should not put this check inside validateAndUpdateCache since this can impact ratis replays. @hemantk-12
The in memory count should be increased once the pre execute passes inside a synchronized block and should be decremented once the snapshot gets added on to the snapshot chain in validateAndUpdate cache. The total number of snapshots at any time would be:
totalSnapshots = snapshotChainManager.size() + inmemoryCounter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add more details:
Ratis replay would not be deterministic across various om nodes if the config is not matching across all nodes. We should only look at leader. Since validateAndUpdateCache runs on all nodes, this could lead to an inconsistent state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename this function addSnapshotLimitCheck() or something

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach has an issue. preExecute runs only on a single node (the leader node, if I recall correctly), whereas validateAndUpdateCache runs on all nodes. If we decrement in validateAndUpdateCache, other nodes will keep decrementing without a corresponding increment, causing the value to go negative. Additionally, if leadership changes, it could result in exceeding the allowed limit.

IMO, we can simplify this by relying on the SnapshotChain count, which will act as a soft limit initially and eventually become a hard limit. I doubt that applyingTransaction is slower than the write request rate to the extent that it would create a backlog of snapshot creation requests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hemantk-12 We can ignore the inflight counter on the followers. We would be just using this counter on the leader. We don't have to be worried about the value in the followers. On leader switch this number can be reset to 0 again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can ignore the inflight counter on the followers and start from 0, but if there is a leadership change while preExecute has completed and validateAndUpdateCache has not. When a new leader (former follower) tries to apply the transactions (basically executing validateAndUpdateCache), inflight counter will go negative.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@swamirishi and I were discussing this, and one way to fix it is to implement notifyLeaderReady() and notifyNotLeader() similar to SCMStateMachine for OzoneManager and reset the inflight counter on leadership change.

cc: @swamirishi @SaketaChalamchala

@peterxcli peterxcli requested a review from hemantk-12 April 3, 2025 08:58
@swamirishi swamirishi added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Apr 15, 2025
Copy link
Contributor

@hemantk-12 hemantk-12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @peterxcli for addressing all the review comments.
Overall changes look good to me.

@peterxcli
Copy link
Member Author

peterxcli commented Apr 25, 2025

agreed this section is prone to race conditions.
Semaphore would be a more appropriate solution, but it will be require a larger rewrite.

Agreed, this section is prone to race conditions.
A semaphore would be a more appropriate solution, but it would require a larger rewrite.

@jojochuang I have refactored that section by putting it into a synchronized block. I think it now has the same effect as a Semaphore:

Changed to use atomicReference to handle the exception and place the increment logic within the AtomicReference#updateAndGet block

@peterxcli peterxcli requested a review from hemantk-12 April 26, 2025 09:10
…stream/master' into hdds12596-om-fs-snapshot-max-limit-is-not-enforced
Copy link
Contributor

@hemantk-12 hemantk-12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @peterxcli for the improvement.

LGTM+1.

@hemantk-12
Copy link
Contributor

@swamirishi can you please take another look?

@jojochuang
Copy link
Contributor

jojochuang commented May 1, 2025

BTW ozone.om.fs.snapshot.max.limit of 1000 is probably too low. We have users at a few thousand snapshots already and will scale up even further. It looks like 1000 is not a problem. We should raise it a lot higher. That's a separate task.

EDIT: opened https://issues.apache.org/jira/browse/HDDS-12948 to track this task

Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like snapshotLimitCheck() can still be prone to race conditions. But it should be fine in this case. The race is adding currentSnapshotNum to inFlightSnapshotCount. But the result is not saved so it doesn't cause data loss or anything. It's a guardrail and it's okay even if it's off by a few.

@jojochuang jojochuang merged commit f345492 into apache:master May 2, 2025
43 checks passed
@peterxcli peterxcli deleted the hdds12596-om-fs-snapshot-max-limit-is-not-enforced branch May 2, 2025 06:39
@peterxcli
Copy link
Member Author

Thanks @hemantk-12, @swamirishi, @jojochuang and @SaketaChalamchala for the review, and @adoroszlai for the clarification of my proto.lock question!

adoroszlai added a commit that referenced this pull request May 2, 2025
@adoroszlai
Copy link
Contributor

@jojochuang @peterxcli This caused build failure on master due to other recent changes around OM metadata classes. The compile error can be fixed by simply adding the required import, but the test is failing after that. Thus I had to revert this change, sorry.

As a sanity check, I recommend running local build and checkstyle/pmd before merging PRs.

gh pr checkout 8157
git merge --no-commit origin/master
mvn -DskipRecon -DskipShade -DskipTests clean verify
hadoop-ozone/dev-support/checks/checkstyle.sh
hadoop-ozone/dev-support/checks/pmd.sh

(Note: check exit code of pmd, Maven always shows success.)

When there are related changes on master, I also recommend merging it into the PR branch to trigger complete CI.

public void snapshotLimitCheck() throws IOException, OMException {
OmMetadataManagerImpl omMetadataManager = (OmMetadataManagerImpl) ozoneManager.getMetadataManager();
SnapshotChainManager snapshotChainManager = omMetadataManager.getSnapshotChainManager();
int currentSnapshotNum = snapshotChainManager.getGlobalSnapshotChain().size();
Copy link
Contributor

@swamirishi swamirishi May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peterxcli Can't we call this inside inflightSnapshotCount.updateAndGet() function that would make this threadsafe and consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was off the previous week so couldn't review this PR in time. Can you please make this change in a follow up jira.

Copy link
Member Author

@peterxcli peterxcli May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure and thanks for the review! Created HDDS-12952 for this.

vtutrinov pushed a commit to vtutrinov/ozone that referenced this pull request May 13, 2025
vtutrinov pushed a commit to vtutrinov/ozone that referenced this pull request May 13, 2025
aswinshakil added a commit to aswinshakil/ozone that referenced this pull request May 16, 2025
…239-container-reconciliation

Commits: 113 commits

8ceb5c3 Revert "HDDS-11232. Spare InfoBucket RPC call for the FileSystem#getFileStatus calls for the general case. (apache#6988)" (apache#8358)
8ecf4a8 HDDS-12993. Use DatanodeID in ReconNodeManager. (apache#8456)
3e09a11 HDDS-13024. HTTP connect to 0.0.0.0 failed (apache#8439)
c3cb14f HDDS-12914. Bump awssdk to 2.31.40, test ResumableFileDownload (apache#8455)
c6b47e9 HDDS-4677. Document Ozone Ports and Connection End Points (apache#8226)
ae0d757 HDDS-13044. Remove DatanodeDetails#getUuid usages from hdds-common/client/container-service (apache#8462)
2bf3f16 HDDS-12568. Implement MiniOzoneCluster.Service for Recon (apache#8452)
e59d251 HDDS-13046. Add .vscode to .gitignore (apache#8461)
50e6d61 HDDS-13022. Split up exclusive size tracking for key and directory cleanup in SnapshotInfo (apache#8433)
6db647d HDDS-12966. DBDefinitionFactory should not throw InvalidArnException (apache#8454)
fe2875c HDDS-12948. [Snapshot] Increase `ozone.om.fs.snapshot.max.limit` default value to 10k (apache#8376)
215bdc2 HDDS-11904. Fix HealthyPipelineSafeModeRule logic. (apache#8386)
dba7557 HDDS-12989. Throw CodecException for the Codec byte[] methods (apache#8444)
d9c73f0 HDDS-11298. Support owner field in the S3 listBuckets request (apache#8315)
7a5129d HDDS-12810. Check and reserve space atomically in VolumeChoosingPolicy (apache#8360)
d51ccf7 HDDS-13016. Add a getAllNodeCount() method to NodeManager. (apache#8445)
8b27fb9 HDDS-12299. Merge OzoneAclConfig into OmConfig (apache#8383)
96f4f79 HDDS-12501. Remove OMKeyInfo from SST files in backup directory. (apache#8214)
8adb500 HDDS-13021. Make callId unique for each request in AbstractKeyDeletingService (apache#8432)
faeadd6 HDDS-13000. [Snapshot] listSnapshotDiffJobs throws NPE. (apache#8418)
7c957cc HDDS-11954. List keys command gets wrong info from BasicOmKeyInfo (apache#8401)
0697383 HDDS-13027. Workaround for GitHub connection failures (apache#8443)
0d5f933 HDDS-12919. Remove redundant FileLock from ChunkWrite (apache#8435)
395892a HDDS-13012. [Snapshot] Add audit logs for listSnapshotDiffJobs, snapshotDiff and cancelSnapshotDiff. (apache#8425)
d92f0ed HDDS-12653. Add option in `ozone debug log container list` to filter by Health State (apache#8415)
78f8a02 HDDS-12645. Improve output of `ozone debug replicas chunk-info` (apache#8146)
f32470c HDDS-12995. Split ListOptions for better reuse (apache#8413)
cdd1b13 HDDS-13019. Bump jline to 3.30.0 (apache#8428)
1651408 HDDS-13018. Bump common-custom-user-data-maven-extension to 2.0.2 (apache#8429)
8f6222e HDDS-12949. Cleanup SCMSafeModeManager (apache#8390)
adb385f HDDS-13002. Use DatanodeID in ContainerBalancerTask (apache#8421)
2b2a796 HDDS-12969. Use DatanodeID in XceiverClientGrpc (apache#8430)
3c0bafa HDDS-13001. Use DatanodeID in SCMBlockDeletingService. (apache#8420)
8448bc2 HDDS-12187. Use hadoop-dependency-client as parent, not dependency (apache#8416)
582fd03 HDDS-13015. Bump zstd-jni to 1.5.7-3 (apache#8426)
ec12f98 HDDS-12970. Use DatanodeID in Pipeline (apache#8414)
3f48a18 HDDS-12179. Improve Container Log to also capture the Volume Information (apache#7858)
034b497 HDDS-12964. Add minimum required version of Docker Compose to RunningViaDocker page (apache#8385)
c0a6b42 HDDS-12947. Add CodecException (apache#8411)
70949c5 HDDS-11603. Reclaimable Key Filter for Snapshots garbage reclaimation (apache#8392)
08283f3 HDDS-12087. TransactionToDNCommitMap too large causes GC to pause for a long time (apache#8347)
f47df78 HDDS-12958. [Snapshot] Add ACL check regression tests for snapshot operations. (apache#8419)
1cc8445 HDDS-12207. Unify output of ozone debug replicas verify checks (apache#8248)
f087d0b HDDS-12994. Use DatanodeID in ReconSCMDBDefinition. (apache#8417)
d6a7723 HDDS-12776. ozone debug CLI command to list all Duplicate open containers (apache#8409)
c78aeb0 HDDS-12951. EC: Log when falling back to reconstruction read (apache#8408)
412f22d HDDS-12959. Eliminate hdds-hadoop-dependency-server (apache#8384)
df701dc HDDS-12996. Workaround for Docker Compose concurrent map writes (apache#8412)
e6daae4 HDDS-12972. Use DatanodeID in ContainerReplica. (apache#8396)
c1103ae HDDS-12877. Support StorageClass field in the S3 HeadObject request (apache#8351)
4135384 HDDS-12973. Add javadoc for CompactionNode() and make getCompactionNodeGraph return ConcurrentMap (apache#8395)
4f2e13c HDDS-12954. Do not throw IOException for checksum. (apache#8387)
49b8fbd HDDS-12971. Use DatanodeID in Node2PipelineMap (apache#8403)
d2da18f HDDS-12346. Reduce code duplication among TestNSSummaryTask classes (apache#8287)
82b73e3 HDDS-11856. Set DN state machine thread priority higher than command handler thread. (apache#8253)
d4f2734 HDDS-12689. Import BOM for AWS SDK, declare dependencies (apache#8406)
4775e76 HDDS-12975. Fix percentage of blocks deleted in grafana dashboard (apache#8398)
ac0d696 HDDS-12968. [Recon] Fix column visibility issue in Derby during schema upgrade finalization. (apache#8393)
e71dcf6 HDDS-11981. Add annotation for registering feature validator based on a generic version (apache#7603)
254297c HDDS-12562. Reclaimable Directory entry filter for reclaiming deleted directory entries (apache#8055)
4f467c8 HDDS-12978. Remove TestMultipartObjectGet (apache#8400)
a99f207 HDDS-12967. Skip CommonChunkManagerTestCases.testFinishWrite if fuser cannot be started (apache#8389)
d29d76b HDDS-12697. Ozone debug CLI to display details of a single container (apache#8264)
1d1bc88 HDDS-12974. Docker could not parse extra host IP (apache#8397)
7e675d7 HDDS-12053. Make print-log-dag command run locally and offline (apache#8016)
d3faab3 HDDS-12561. Reclaimable Rename entry filter for reclaiming renaming entries (apache#8054)
af1f98c HDDS-10822. Tool to omit raft log in OM. (apache#8154)
3201ca4 HDDS-12952. Make OmSnapshotManager#snapshotLimitCheck thread-safe and consistent (apache#8381)
522c88d HDDS-12963. Clean up io.grpc dependencies (apache#8382)
fa8bd9d HDDS-12916. Support ETag in listObjects response (apache#8356)
fdc77db HDDS-12300. Merge OmUpgradeConfig into OmConfig (apache#8378)
623e144 HDDS-12956. Bump vite to 4.5.14 (apache#8375)
8c8eaf1 HDDS-12944. Reduce timeout for integration check (apache#8374)
40d2e00 HDDS-11141. Avoid log flood due due pipeline close in XceiverServerRatis (apache#8325)
9fe1dba HDDS-12942. Init layout version config should not be public (apache#8373)
452e7aa HDDS-12596. OM fs snapshot max limit is not enforced (apache#8377)
8b095d5 HDDS-12795. Rename heartbeat and first election configuration name (apache#8249)
bee8164 HDDS-12920. Configure log4j to gzip rolled over service log files (apache#8357)
e16a50f HDDS-12934. Split submodule for Freon. (apache#8367)
b1e9511 HDDS-12925. Update datanode volume used space on container deletion (apache#8364)
440bc82 Revert "HDDS-12596. OM fs snapshot max limit is not enforced (apache#8157)"
f345492 HDDS-12596. OM fs snapshot max limit is not enforced (apache#8157)
ee7b1dc HDDS-12901. Introduce EventExecutorMetrics instead of setting the metrics props unsafely (apache#8371)
810e148 HDDS-12939. Remove UnknownPipelineStateException. (apache#8372)
560fcdf HDDS-12728. Add Ozone 2.0.0 to compatibility acceptance tests (apache#8361)
5815a47 HDDS-12933. Remove the table names declared in OmMetadataManagerImpl (apache#8370)
ee32fa5 HDDS-12560. Reclaimable Filter for Snaphost Garbage Collections (apache#8053)
45374ea HDDS-12932. Rewrite OMDBDefinition (apache#8362)
5cb6dd8 HDDS-12575. Set default JUnit5 timeout via property (apache#8348)
8efc0cd HDDS-11633. Delete message body too large, causing SCM to fail writing raft log (apache#8354)
ac9d9fd HDDS-12915. Intermittent failure in testCreatePipelineThrowErrorWithDataNodeLimit (apache#8359)
86039e8 HDDS-12848. Create new submodule for ozone admin (apache#8292)
2d0f8cb HDDS-12833. Remove the CodecRegistry field from DBStoreBuilder (apache#8327)
c71b393 HDDS-12921. UnusedPrivateField violations in tests (apache#8353)
9f3dd01 HDDS-12917. cp: option '--update' doesn't allow an argument (apache#8346)
a14b395 HDDS-12922. Use OMDBDefinition in GeneratorOm and FSORepairTool (apache#8355)
c8a98d6 HDDS-12892. OM Tagging Request incorrectly sets full path as key name for FSO (apache#8345)
ade69e3 HDDS-12649. Include name of volume or bucket in length validation error (apache#8322)
c68308d HDDS-12599. Create an ozone debug CLI command to list all the containers based on final state (apache#8282)
319d5a4 HDDS-12773. bad substitution in bats test (apache#8290)
6f5e02a HDDS-12900. (addendum: fix pmd) Use OMDBDefinition in OmMetadataManagerImpl (apache#8337)
2a7000d HDDS-12900. Use OMDBDefinition in OmMetadataManagerImpl (apache#8337)
9c0c66c HDDS-12915. Mark testCreatePipelineThrowErrorWithDataNodeLimit as flaky
a73e052 HDDS-12907. Enable FieldDeclarationsShouldBeAtStartOfClass PMD rule (apache#8344)
d083f82 HDDS-12905. Move field declarations to start of class in ozone-common (apache#8342)
3f90e1c HDDS-12906. Move field declarations to start of class in ozone-manager module (apache#8343)
403fb97 HDDS-12878. Move field declarations to start of class in tests (apache#8308)
63d5c73 HDDS-12912. Remove deprecated `PipelineManager#closePipeline(Pipeline, boolean)` (apache#8340)
cf1fb88 HDDS-12902. Shutdown executor in CloseContainerCommandHandler and ECReconstructionCoordinator (apache#8341)
825ba02 HDDS-9585. Improve import/export log in ContainerLogger (apache#8330)
b70d35a HDDS-12889. Enable AppendCharacterWithChar PMD rule (apache#8324)
cd308ea HDDS-12904. Move field declarations to start of class in other hdds modules (apache#8336)
4905286 HDDS-12899. Move field declarations to start of class in hdds-server-scm (apache#8332)

Conflicts:
	hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerLogger.java
	hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueHandler.java
	hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerReplica.java
	hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/container/ReconcileSubcommand.java
	hadoop-ozone/recon-codegen/src/main/java/org/apache/ozone/recon/schema/ContainerSchemaDefinition.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants