Conversation

@jojochuang (Owner)

What changes were proposed in this pull request?

Use one .yaml file per Ozone snapshot DB checkpoint to store its local properties, e.g. isSstFiltered.
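
For illustration only, such a per-checkpoint properties file could be written with SnakeYAML along these lines (the file name, map-based layout, and helper class below are assumptions, not the actual implementation):

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import org.yaml.snakeyaml.Yaml;

// Hypothetical sketch: one YAML file of local properties per snapshot DB checkpoint.
public final class SnapshotLocalPropsSketch {
  private SnapshotLocalPropsSketch() { }

  public static void write(Path checkpointDir, boolean isSstFiltered) throws IOException {
    Map<String, Object> props = new LinkedHashMap<>();
    props.put("isSstFiltered", isSstFiltered); // the property named in this PR
    try (Writer w = Files.newBufferedWriter(checkpointDir.resolve("snapshot-local.yaml"))) {
      new Yaml().dump(props, w);
    }
  }
}
```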

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-13006

How was this patch tested?

  • TBD

@gemini-code-assist bot left a comment


Hello @jojochuang, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Gemini here, providing a summary for this pull request. Based on the title and description, the primary goal of this PR is to introduce the use of YAML files to store local properties for Ozone snapshot DB checkpoints. This allows for per-OM properties, such as the isSstFiltered flag, to be stored alongside the snapshot checkpoint data. However, the patches included in this PR also contain several other distinct changes, including enhancements to the Container Balancer configuration validation, additions to the Freon random key generator tool, improvements to the ozone debug replicas verify command, and refactoring within the Ozone Manager's KeyManagerImpl for handling pending deletions.

Highlights

  • Snapshot Local Properties: Introduces an interface (OmSnapshotLocalPropertyManager) and a helper method (OmSnapshotManager.getSnapshotLocalPropertyPath) to manage per-OM local properties for snapshot DB checkpoints using YAML files. A hypothetical sketch of the interface follows this list.
  • Container Balancer Validation: Adds a new validation rule in ContainerBalancer to ensure that hdds.container.balancer.move.timeout minus hdds.container.balancer.move.replication.timeout and hdds.scm.replication.event.timeout.datanode.offset is greater than or equal to 9 minutes. A corresponding test case is added.
  • Freon Random Key Generator Enhancement: Adds a --bucket-layout option to the RandomKeyGenerator tool, allowing users to specify the bucket layout (e.g., OBJECT_STORE, FILE_SYSTEM_OPTIMIZED) when creating buckets. Adds a test for this new option.
  • Ozone Debug Replicas Verify Enhancement: Adds a new verifier (ContainerStateVerifier) to the ozone debug replicas verify command to check the state of containers and their replicas on datanodes and compare it with the state in SCM. Introduces a --container-state flag to enable this check and a --container-cache-size option for caching container info.
  • OM KeyManagerImpl Refactoring: Refactors methods related to retrieving pending deletion entries (getPendingDeletionKeys, getPendingDeletionSubDirs, getPendingDeletionSubFiles) in KeyManagerImpl to use a common helper method (gatherSubPathsWithIterator) and standardizes the use of Table.KeyValue vs KeyValue imports.
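
A plausible shape for the new interface, based only on the property named in this PR (the method signatures here are hypothetical):

```java
import java.io.IOException;
import java.util.UUID;

// Hypothetical sketch of OmSnapshotLocalPropertyManager; only the
// isSstFiltered property is named in this PR, the signatures are assumed.
public interface OmSnapshotLocalPropertyManager {
  /** Persists the SST-filtered flag to the snapshot's local YAML file. */
  void setSstFiltered(UUID snapshotId, boolean sstFiltered) throws IOException;

  /** Reads the SST-filtered flag back from the snapshot's local YAML file. */
  boolean isSstFiltered(UUID snapshotId) throws IOException;
}
```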

Changelog

  • hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancer.java
    • Added imports for Duration and TimeUnit (lines 22, 24).
    • Added a new validation check for balancer timeouts (lines 465-480).
  • hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/balancer/TestContainerBalancer.java
    • Added a test case to verify the new balancer timeout validation rule (lines 208-226).
  • hadoop-ozone/dist/src/main/smoketest/debug/ozone-debug-tests.robot
    • Removed unused variable ${OM_SERVICE_ID} (line 30).
    • Modified ozone debug replicas verify command to include --block-existence and --container-state flags (line 40).
  • hadoop-ozone/freon/src/main/java/org/apache/hadoop/ozone/freon/RandomKeyGenerator.java
    • Added imports for BucketArgs and BucketLayout (lines 63, 71).
    • Added --bucket-layout command line option (lines 196-200).
    • Added getBucketMapSize method for testing (lines 657-665).
    • Modified createBucket to use the specified bucket layout if provided (lines 778-785); a hedged sketch of this flow follows the changelog.
    • Made getBucket method visible for testing (line 909).
  • hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestContainerBalancerOperations.java
    • Adjusted moveReplicationTimeout value in testContainerBalancerCLIOperations (line 85).
    • Adjusted moveReplicationTimeout value in testIfCBCLIOverridesConfigs (line 150).
  • hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/freon/TestRandomKeyGenerator.java
    • Added imports for OzoneBucket and BucketLayout (lines 24, 25).
    • Added testBucketLayoutOption to verify the new bucket layout functionality (lines 191-210).
  • hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/shell/TestOzoneDebugShell.java
    • Added --container-state flag to the replicas verify command in testReplicasVerifyCmd (line 106).
  • hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
    • Added imports for Table.KeyValue and WithParentObjectId (lines 134, 164).
    • Changed type parameters from Table.KeyValue to KeyValue in getPendingDeletionKeys (lines 739, 747, 753, 765).
    • Changed type parameters from Table.KeyValue to KeyValue in getTableEntries (lines 800, 801, 805, 816).
    • Changed type parameters from Table.KeyValue to KeyValue in getRenamesKeyEntries (lines 838, 840, 843).
    • Changed type parameters from Table.KeyValue to KeyValue in getDeletedKeyEntries (lines 887, 889, 892).
    • Changed type parameters from Table.KeyValue to KeyValue in getDeletedDirEntries (line 2192).
    • Refactored getPendingDeletionSubDirs to use gatherSubPathsWithIterator (lines 2198-2203).
    • Added generic helper method gatherSubPathsWithIterator (lines 2205-2238).
    • Refactored getPendingDeletionSubFiles to use gatherSubPathsWithIterator (lines 2240-2247).
    • Changed type parameters from Table.KeyValue to KeyValue in listStatus (line 1856).
    • Changed type parameters from Table.KeyValue to KeyValue in getIteratorForKeyInTableCache (lines 1913, 1918).
    • Changed type parameters from Table.KeyValue to KeyValue in findKeyInDbWithIterator (lines 1936, 1941).
  • hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotLocalPropertyManager.java
    • Added new interface OmSnapshotLocalPropertyManager for managing snapshot local properties (lines 1-88).
  • hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java
    • Added static helper method getSnapshotLocalPropertyPath (lines 767-770).
  • hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMFileRequest.java
    • Added helper method getKeyInfoWithFullPath for OmDirectoryInfo (lines 723-729).
    • Added helper method getKeyInfoWithFullPath for OmKeyInfo (lines 731-736).
  • hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/debug/replicas/BlockExistenceVerifier.java
    • Added setId(datanode.getID()) when building the pipeline (line 59).
  • hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/debug/replicas/ContainerStateVerifier.java
    • Added new class ContainerStateVerifier to check container and replica states (lines 1-194).
  • hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/debug/replicas/ReplicasVerify.java
    • Added --container-cache-size option (lines 70-76).
    • Added ContainerStateVerifier if --container-state is enabled (lines 91-93).
    • Added doExecuteReplicaState flag to Verification inner class (lines 236-241).
  • hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/debug/replicas/chunk/ChunkKeyHandler.java
    • Changed JSON key name from ecIndex to replicaIndex (line 199).
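
As a rough illustration of the createBucket change noted above (the surrounding Freon wiring is paraphrased from the changelog, not copied from the patch):

```java
// Hedged sketch: honor the new --bucket-layout option when creating the bucket.
// `bucketLayout` is the raw option value; null means keep the cluster default.
BucketArgs.Builder builder = BucketArgs.newBuilder();
if (bucketLayout != null) {
  builder.setBucketLayout(BucketLayout.fromString(bucketLayout));
}
volume.createBucket(bucketName, builder.build());
```
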
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize Gemini Code Assist for the GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Code review time is here,
Changes flow, make logic clear.
Fix the bugs, add new features,
Build a system that endures.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot left a comment


Code Review

This pull request introduces the OmSnapshotLocalPropertyManager interface as per HDDS-13006, and also includes several other enhancements and refactorings across ContainerBalancer, Freon, Ozone debug tools, and KeyManagerImpl. The changes are generally well-implemented.

Key highlights:

  • The refactoring in KeyManagerImpl to consolidate logic for deleting sub-paths is a good improvement for maintainability.
  • The new ContainerStateVerifier enhances the capabilities of the ozone debug replicas verify tool.
  • The addition of bucket layout support in Freon is useful for testing.

Please ensure that the testing (marked as TBD in the PR description) is completed and documented. There are a couple of minor points to address, detailed in the specific comments.

Summary of Findings

  • Magic Number in ContainerBalancer: A magic number (9 minutes) is used in ContainerBalancer validation. Consider making it a named constant or configurable for better clarity and maintainability.
  • Logging Practice in ContainerStateVerifier: System.err.println is used in ContainerStateVerifier for an invalid cache size warning. It's recommended to use an SLF4J logger for consistency and better log management.
  • Testing Status: The PR description indicates that testing is 'TBD'. Comprehensive testing is essential before merging.
  • Code Refactoring: The refactoring in KeyManagerImpl to use gatherSubPathsWithIterator is a positive change, reducing code duplication.
  • New Features: The new ContainerStateVerifier and Freon's bucket layout support are valuable additions.
  • Test Configuration: The test configuration in TestContainerBalancerOperations.java will now trigger an exception due to the new validation logic. Consider updating the test configuration to be valid.

Merge Readiness

The pull request introduces valuable features and refactorings. However, the testing status is 'TBD', and there are a couple of medium-severity issues (related to a magic number and test configuration) and a low severity issue (related to logging practice) that should be addressed. Once these points are resolved and testing is completed, the PR should be in good shape for merging. As an AI, I cannot approve PRs; please ensure human reviewers provide final approval.

```java
long datanodeOffset = ozoneConfiguration.getTimeDuration("hdds.scm.replication.event.timeout.datanode.offset",
    Duration.ofMinutes(6).toMillis(), TimeUnit.MILLISECONDS);
if ((conf.getMoveTimeout().toMillis() - conf.getMoveReplicationTimeout().toMillis() - datanodeOffset)
    < Duration.ofMinutes(9).toMillis()) {
```


medium

The magic number 9 (minutes) is used here as a threshold. Could this be defined as a named constant (e.g., MINIMUM_TIME_BUFFER_FOR_MOVE_OPERATIONS_MINUTES = 9) at the class or configuration level? This would improve readability and maintainability, especially if this value has specific significance or might need adjustment in the future.

For example:

```java
private static final Duration MINIMUM_OPERATION_BUFFER_DURATION = Duration.ofMinutes(9);
// ...
if ((conf.getMoveTimeout().toMillis() - conf.getMoveReplicationTimeout().toMillis() - datanodeOffset)
    < MINIMUM_OPERATION_BUFFER_DURATION.toMillis()) {
  // ...
}
```

```diff
  Optional<Integer> balancingInterval = Optional.of(70);
  Optional<Integer> moveTimeout = Optional.of(65);
- Optional<Integer> moveReplicationTimeout = Optional.of(55);
+ Optional<Integer> moveReplicationTimeout = Optional.of(50);
```


medium

The move replication timeout is set to 50 minutes, while the move timeout is set to 65 minutes. Given the new validation logic added to ContainerBalancer.java, this configuration will now throw an InvalidContainerBalancerConfigurationException. It would be good to update this test to ensure the configuration is valid, or loosen the validation to allow for this case.
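
For reference, plugging minute values into the rule quoted earlier (`moveTimeout - moveReplicationTimeout - datanodeOffset >= 9`, with the 6-minute default offset; a back-of-the-envelope check, not output from the code):

```java
long oldGap = 65 - 55 - 6; // = 4 -> below the 9-minute floor, validation fails
long newGap = 65 - 50 - 6; // = 9 -> exactly meets the floor, validation passes
```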


jojochuang and others added 27 commits July 7, 2025 09:11

@jojochuang (Owner, Author)

/gemini review

@github-actions

No such command: `gemini`. Available commands:

  • /close : Close the pending pull request temporarily
  • /help : Show all the available comment commands
  • /label : Add a new label to the issue: /label <label>
  • /pending : Add a REQUESTED_CHANGE type review to mark the issue non-mergeable: /pending <reason>
  • /ready : Dismiss all the blocking reviews by the github-actions bot
  • /retest : Provide help on how to trigger a new CI build

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant number of unrelated changes, making it difficult to review and increasing the risk of introducing unintended issues. The PR's title and description state its purpose is to 'Use yaml files to host Ozone snapshot local properties', but it also includes:

  • Major documentation updates and new design documents for unrelated features (e.g., Full Volume Handling, S3 API documentation).
  • Refactoring of CLI command options into a mixin.
  • Changes to test scripts and test infrastructure (docker-compose, robot framework).
  • Changes to Recon, including UI updates.
  • Changes to byteman integration.

It is highly recommended to split these unrelated changes into separate pull requests. This will make each PR smaller, more focused, and easier to review and merge.

My review focuses on the core changes related to the snapshot YAML properties, but I've also included a few comments on the unrelated changes to highlight the issue.

Comment on lines +170 to +172
```java
public void setLastCompactionTime(Long lastCompactionTime) {
  this.lastCompactionTime = lastCompactionTime;
}
```


high

The setLastCompactionTime method accepts a Long object, but the lastCompactionTime field is a primitive long. If a null value is passed to this method, it will result in a NullPointerException when unboxing. To prevent this, the parameter type should be changed to the primitive long.

Suggested change:

```diff
- public void setLastCompactionTime(Long lastCompactionTime) {
+ public void setLastCompactionTime(long lastCompactionTime) {
    this.lastCompactionTime = lastCompactionTime;
  }
```

Comment on lines +1 to +163
---
title: Full Volume Handling
summary: Immediately trigger Datanode heartbeat on detecting full volume
date: 2025-05-12
jira: HDDS-12929
status: implemented
author: Siddhant Sangwan, Sumit Agrawal
---

<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

> **Note**: The feature described here was implemented in [pull request #8590](https://github.com/apache/ozone/pull/8590). This document reflects the final, merged design.
## Summary
Trigger a datanode heartbeat immediately when, while handling a write request, the container being written to is
(close to) full, the volume is full, or the container is unhealthy. The immediate heartbeat will contain a close
container action. The overall objective is to keep Datanode disks from filling completely and to prevent degradation
of write performance.

## Problem
When a Datanode volume is close to full, the SCM may not be immediately aware of this because storage reports are only
sent to it once a minute (`HDDS_NODE_REPORT_INTERVAL_DEFAULT = "60s"`). Additionally, SCM only has stale
information about the current size of a container because container size is only updated when an Incremental Container
Report (event based, for example when a container transitions from open to closing state) is received or a Full
Container Report (`HDDS_CONTAINER_REPORT_INTERVAL_DEFAULT = "60m"`) is received. This can lead to the SCM
over-allocating blocks to containers on a Datanode volume that has already reached the min free space boundary.

In the future, in https://issues.apache.org/jira/browse/HDDS-12151, we plan to fail writes for containers on a volume
that has reached the min free space boundary. When the Datanode starts failing writes for such a volume, overall write
performance will drop because the client will have to request a different set of blocks.

Before this change, a close container action was queued to be sent in the next heartbeat, when:
1. Container was at 90% capacity.
2. Volume was full, __counting committed space__. That is `available - reserved - committed - min free space <= 0`.
3. Container was `UNHEALTHY`.

But since the next heartbeat could be sent up to 30 seconds later in the worst case, this reaction time was too slow.
This design proposes sending the Datanode heartbeat immediately so the SCM can get the close container action
right away. This will help reduce the performance drop caused by write failures once
https://issues.apache.org/jira/browse/HDDS-12151 is implemented in the future.

### The definition of a full volume
Previously, a volume was considered full if the following method returned true. It accounts for available space,
committed space, min free space and reserved space (`available` already considers `reserved` space):
```java
private static long getUsableSpace(
    long available, long committed, long minFreeSpace) {
  return available - committed - minFreeSpace;
}
```
Counting committed space here, _when sending close container action_, is a bug - we only want to close the container if
`available - reserved - minFreeSpace <= 0`.
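
A minimal sketch of the corrected condition (parameter names are illustrative; the doc notes `available` may already account for `reserved`):

```java
// Sketch only: the close-container decision should ignore committed space,
// i.e. close when available - reserved - minFreeSpace <= 0.
static boolean isVolumeFullForClose(long available, long reserved, long minFreeSpace) {
  return available - reserved - minFreeSpace <= 0;
}
```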

## Non Goals
Failing the write if it exceeds the min free space boundary (https://issues.apache.org/jira/browse/HDDS-12151) is not
discussed here.

## Proposed Solution

### Proposal for immediately triggering Datanode heartbeat

We will immediately trigger the heartbeat when:
1. The container is (close to) full (this is existing behaviour; the container full check already exists).
2. The volume is __full EXCLUDING committed space__ (`available - reserved - min free <= 0`). This is because when a
volume is full INCLUDING committed space (`available - reserved - committed - min free <= 0`), open containers can
still accept writes. So the current behaviour of sending a close container action when the volume is full including
committed space is a bug.
3. The container is unhealthy (this is existing behaviour).

Logic to trigger a heartbeat immediately already exists - we just need to call the method when needed. So, in
`HddsDispatcher`, when handling a request:
1. For every write request:
   1. Check the above three conditions.
   2. If true, queue the `ContainerAction` to the context as before, then immediately trigger the heartbeat using:
```
context.getParent().triggerHeartbeat();
```

#### Throttling
Throttling is required so the Datanode doesn't send multiple immediate heartbeats for the same container. We can have
per-container throttling, where we trigger an immediate heartbeat once for a particular container, and after that
container actions for that container can only be sent in the regular, scheduled heartbeats. Meanwhile, immediate
heartbeats can still be triggered for different containers.

Here's a visualisation to show this. The letters (A, B, C etc.) denote events and timestamp is the time at which
an event occurs.
```
Write Call 1:
/ A, timestamp: 0/-------------/B, timestamp: 5/
Write Call 2, in-parallel with 1:
------------------------------ /C, timestamp: 5/
Write Call 3, in-parallel with 1 and 2:
---------------------------------------/D, timestamp: 7/
Write Call 4:
------------------------------------------------------------------------/E, timestamp: 30/
Events:
A: Last, regular heartbeat
B: Volume 1 detected as full while writing to Container 1, heartbeat triggered
C: Volume 1 again detected as full while writing to Container 2, heartbeat triggered
D: Container 1 detected as full, heartbeat throttled
E: Volume 1 detected as full while writing to Container 2, Container Action sent in regular heartbeat
```

A simple and thread-safe way to implement this is to have an `AtomicBoolean` for each container in the `ContainerData`
class that can be used to check whether an immediate heartbeat has already been triggered for that container (a sketch
follows the list below). The memory impact of introducing a new member variable in `ContainerData` for each container
is quite small. Consider:

- A Datanode with 600 TB disk space.
- Container size is 5 GB.
- Number of containers = 600 TB / 5 GB = 120,000.
- For each container, we'll have an `AtomicBoolean`, which has just one instance field: `private volatile int value`.
- Static fields don't count.
- An int is 4 bytes in Java; assume other object overhead brings the size of the object to ~20 bytes.
- 20 bytes * 120,000 = ~2.4 MB.
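
A minimal sketch of the per-container throttle described above (class and method names here are illustrative, not the merged implementation):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative stand-in for a field that would live in ContainerData.
final class ImmediateHeartbeatThrottle {
  private final AtomicBoolean triggered = new AtomicBoolean(false);

  // Returns true only for the first caller; subsequent container actions
  // for this container ride along with the regular scheduled heartbeats.
  boolean tryTrigger() {
    return triggered.compareAndSet(false, true);
  }
}
```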

For code implementation, see https://github.com/apache/ozone/pull/8590.

## Alternatives
### Preventing over-allocation of blocks in the SCM
The other part of the problem is that SCM has stale information about the size of the container and ends up
over-allocating blocks (beyond the container's 5 GB size) to the same container. Solving this problem is complicated.
We could track how much space we've allocated to a container in the SCM - this is doable on the surface but won't
actually work well. That's because SCM is asked for a block (256 MB), but doesn't know how much data a client will
actually write to that block file. The client may only write 1 MB, for example. So SCM could track that it has already
allocated 5 GB to a container and open another container for incoming requests, even though the client may actually
only write 1 GB. This would lead to a lot of open containers when we have 10k requests/second.

At this point, we've decided not to do anything about this.

### Regularly sending open container reports
Sending open container reports regularly (every 30 seconds for example) can help a little bit, but won't solve the
problem. We won't take this approach for now.

### Sending storage reports in immediate heartbeats
We considered triggering an immediate heartbeat every time the Datanode detects a volume is full
while handling a write request, with per-volume throttling. To each immediate heartbeat, we would attach the storage
reports of all volumes in the Datanode plus the Close Container Action for the particular container being written to.
While this would update the storage stats of a volume in the SCM faster, which the SCM could subsequently use to decide
whether to include that Datanode in a new pipeline, the per-volume throttling has a drawback: it wouldn't let us
send close container actions for other containers. We decided not to take this approach.

## Implementation Plan
1. HDDS-13045: For triggering immediate heartbeat. Already merged - https://github.com/apache/ozone/pull/8590.
2. HDDS-12151: Fail a write call if it exceeds min free space boundary (not the focus of this doc, just mentioned
here).


medium

This new design document for 'Full Volume Handling' (JIRA HDDS-12929) appears to be unrelated to the main goal of this pull request, which is about using YAML files for snapshot properties (JIRA HDDS-13006). Please move this and other unrelated documentation changes to a separate pull request to keep this one focused.

Comment on lines +32 to +36
```java
public String getNodeId() {
  return !Strings.isNullOrEmpty(selection.nodeId) ? selection.nodeId :
      !Strings.isNullOrEmpty(selection.id) ? selection.id :
      !Strings.isNullOrEmpty(selection.uuid) ? selection.uuid : "";
}
```


medium

This change seems unrelated to the main purpose of this pull request (snapshot properties in YAML). It would be better to move this and other related CLI refactoring changes to a separate pull request for better focus and reviewability.

Additionally, the chained ternary operator can be difficult to read. A series of if statements would improve readability.

```java
public String getNodeId() {
  if (!Strings.isNullOrEmpty(selection.nodeId)) {
    return selection.nodeId;
  }
  if (!Strings.isNullOrEmpty(selection.id)) {
    return selection.id;
  }
  if (!Strings.isNullOrEmpty(selection.uuid)) {
    return selection.uuid;
  }
  return "";
}
```


```java
/**
 * Adds an entry to the compacted SST file list.
 * @param ver Version number (TODO: to be clarified)
```


medium

This Javadoc comment contains a TODO to clarify the ver parameter. This should be resolved before merging to improve code clarity and maintainability.

Comment on lines +228 to +258
```java
public static OmSnapshotLocalDataYaml getFromYamlStream(InputStream input) throws IOException {
  OmSnapshotLocalDataYaml dataYaml;

  PropertyUtils propertyUtils = new PropertyUtils();
  propertyUtils.setBeanAccess(BeanAccess.FIELD);
  propertyUtils.setAllowReadOnlyProperties(true);

  DumperOptions options = new DumperOptions();
  Representer representer = new Representer(options);
  representer.setPropertyUtils(propertyUtils);

  SafeConstructor snapshotDataConstructor = new SnapshotLocalDataConstructor();

  Yaml yaml = new Yaml(snapshotDataConstructor, representer);

  try {
    dataYaml = yaml.load(input);
  } catch (YAMLException ex) {
    // Unchecked exception. Convert to IOException
    throw new IOException(ex);
  }

  if (dataYaml == null) {
    // If Yaml#load returned null, then the file is empty. This is valid yaml
    // but considered an error in this case since we have lost data about
    // the snapshot.
    throw new IOException("Failed to load snapshot file. File is empty.");
  }

  return dataYaml;
}
```


medium

The logic for creating a Yaml instance is duplicated from the getYamlForSnapshotLocalData() method. To avoid code duplication and improve maintainability, this method should be simplified to call getYamlForSnapshotLocalData() directly.

```java
public static OmSnapshotLocalDataYaml getFromYamlStream(InputStream input) throws IOException {
  OmSnapshotLocalDataYaml dataYaml;

  final Yaml yaml = getYamlForSnapshotLocalData();

  try {
    dataYaml = yaml.load(input);
  } catch (YAMLException ex) {
    // Unchecked exception. Convert to IOException
    throw new IOException(ex);
  }

  if (dataYaml == null) {
    // If Yaml#load returned null, then the file is empty. This is valid yaml
    // but considered an error in this case since we have lost data about
    // the snapshot.
    throw new IOException("Failed to load snapshot file. File is empty.");
  }

  return dataYaml;
}
```
