Skip to content

Conversation

@smengcl
Copy link
Contributor

@smengcl smengcl commented Jun 3, 2025

What changes were proposed in this pull request?

  • Use one .yaml file for each Ozone snapshot DB to store local properties for each Ozone snapshop DB checkpoint. e.g. isSstFiltered
  • Tweaked SstFilteringService to use the new implementation, while keeping the old logic around until we have upgrade logic written.

Written with aid with Claude 3.7 Sonnet.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13006

How was this patch tested?

  • Added unit tests in TestOmSnapshotLocalPropertyYamlImpl.

@smengcl smengcl added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Jun 3, 2025
@swamirishi
Copy link
Contributor

Thanks for sharing the base interface for the snapshot property. I have left some comments inline

@jojochuang
Copy link
Contributor

Please take a look at Gemini's review report too: jojochuang#569 (review)

Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what I'm looking at... are you missing a few source code files? @smengcl

@smengcl
Copy link
Contributor Author

smengcl commented Jun 4, 2025

Not sure what I'm looking at... are you missing a few source code files? @smengcl

Yup this is intentional. This is a (real) draft. As discussed offline with @swamirishi and @SaketaChalamchala earlier, the first step would be to determine the interface so that @SaketaChalamchala can use it.

smengcl added 7 commits June 7, 2025 23:02
Generated-by: Claude 3.7 Sonnet (then manually reviewed and tweaked)
Generated-by: Claude 3.7 Sonnet (then manually reviewed and tweaked)
…ocalPropertyYamlImpl`.

Generated-by: Claude 3.7 Sonnet (then manually reviewed and tweaked)
@smengcl
Copy link
Contributor Author

smengcl commented Jun 8, 2025

This one is ready for review now. @swamirishi @jojochuang

Copy link
Contributor

@swamirishi swamirishi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on the patch @smengcl IMO we can make the changes in the interface to tightly coupled with the schema of SnapshotMetaData which is not going to change much

// First try to read the flag from YAML file
String yamlPath = OmSnapshotManager.getSnapshotLocalPropertyYamlPath(ozoneConfiguration, snapshotInfo);

try (OmSnapshotLocalProperty localProperties = new OmSnapshotLocalPropertyYamlImpl(yamlPath)) {
Copy link
Contributor

@swamirishi swamirishi Jun 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could be a potential race condition if multiple threads parallely try to open. It would be better if this comes from snapshotCache or make the cache abstract (ReferenceCountedCache) enough to load any resource and have one implementation for loading OmSnapshot and one reference counted cache implementation for OmSnapshotLocalProperty

Copy link
Contributor

@swamirishi swamirishi Jun 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OmSnapshotLocalProperty can also take a write lock on the snapshotId to prevent SNAPSHOT_PROPERTY_LOCK from getting updated by multiple threads. This could be simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. If we want it to be race-free, an easy thought would be to guard r/w access behind SnapshotCache r/w lock. But one thing I don't like is that it implies we are opening the snapshot DB just for reading the yaml metadata (assuming no major refactoring of the SnapshotCache).

Let me see if flock would work..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah lock would be simpler. We can have duplicate objects in that case

Copy link
Contributor

@SaketaChalamchala SaketaChalamchala Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would also need to take a bootstrap lock here

BootstrapStateHandler.java

Copy link
Contributor

@swamirishi swamirishi Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might not need bootstrap handler lock. Since we would be taking a lock on all the background services which would be updating the metadata file and no ratis transaction excepting for create snapshot is going to update this metadata file.

@smengcl
Copy link
Contributor Author

smengcl commented Jun 9, 2025

TODO:

  • Use POJO
  • Add checksum field
  • Add R/W lock

// Dump yaml data into a string to compute its checksum
String snapshotDataYamlStr = yaml.dump(this);

this.checksum = getChecksum(snapshotDataYamlStr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the YARML library changes output format, or if we add additional fields to it, it can break on upgrade.
I'm not sure how to handle it better though.

Copy link
Contributor Author

@smengcl smengcl Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Checksum should be calculated before it is deserialized into a OmSnapshotLocalData* somehow. I didn't have an elegant way to handle this on top of my mind.

In other words, this behavior might be broken for ContainerData (ContainerDataYaml) as well. Need to check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline we can do this in a separate jira that handles the upgrade. Basically I have an idea that we can use the version field together with the SnapshotLocalDataRepresenter that I removed in 73c1702 (#8555) to make sure that checksum calculation is not broken with new fields being added in later yaml versions.

* @param yamlFile The file to write to
* @throws IOException If there's an error writing to the file
*/
public void writeToYaml(File yamlFile) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there atomicity or consistency requirements using this class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We might use existing SNAPSHOT_DB_LOCK or a new SNAPSHOT_LOCAL_DATA_LOCK lock to lock it during read/write but that would be up to the caller to implement that. It is out of the scope of this PR.

@smengcl smengcl requested a review from swamirishi June 24, 2025 15:20
@smengcl
Copy link
Contributor Author

smengcl commented Jun 24, 2025

The implementation is done in the scope of the jira. Checksum-versioning would be added in a follow-up jira. Pls take another look at this. Thx. @SaketaChalamchala @jojochuang

Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A few comments can be addressed later. Merging it now.

* Sets the last compaction time, in epoch milliseconds.
* @param lastCompactionTime Timestamp of the last compaction
*/
public void setLastCompactionTime(Long lastCompactionTime) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor issue: The setLastCompactionTime method accepts a Long object, but the lastCompactionTime field is a primitive long. If a null value is passed to this method, it will result in a NullPointerException when unboxing. To prevent this, the parameter type should be changed to the primitive long.

Comment on lines +231 to +241
PropertyUtils propertyUtils = new PropertyUtils();
propertyUtils.setBeanAccess(BeanAccess.FIELD);
propertyUtils.setAllowReadOnlyProperties(true);

DumperOptions options = new DumperOptions();
Representer representer = new Representer(options);
representer.setPropertyUtils(propertyUtils);

SafeConstructor snapshotDataConstructor = new SnapshotLocalDataConstructor();

Yaml yaml = new Yaml(snapshotDataConstructor, representer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It seems this can be simplified.

Suggested change
PropertyUtils propertyUtils = new PropertyUtils();
propertyUtils.setBeanAccess(BeanAccess.FIELD);
propertyUtils.setAllowReadOnlyProperties(true);
DumperOptions options = new DumperOptions();
Representer representer = new Representer(options);
representer.setPropertyUtils(propertyUtils);
SafeConstructor snapshotDataConstructor = new SnapshotLocalDataConstructor();
Yaml yaml = new Yaml(snapshotDataConstructor, representer);
final Yaml yaml = getYamlForSnapshotLocalData();

However getYamlForSnapshotLocalData() also invokes "representer.addClassTag(OmSnapshotLocalDataYaml.class, SNAPSHOT_YAML_TAG);" which is not repeated here.


/**
* Adds an entry to the compacted SST file list.
* @param ver Version number (TODO: to be clarified)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO

@jojochuang jojochuang marked this pull request as ready for review July 10, 2025 01:03
@jojochuang
Copy link
Contributor

Error:  Used undeclared dependencies found:
Error:     org.yaml:snakeyaml:jar:2.0:compile
Error:  Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:3.8.1:analyze-only (analyze) on project ozone-manager: Dependency problems found -> [Help 1]

Change-Id: I6f258b5f8eb08affd4438f640471d927a6599c3d
@jojochuang jojochuang merged commit e08b001 into apache:master Jul 10, 2025
42 checks passed
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Jul 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants