@ChenSammi ChenSammi commented Feb 19, 2025

What changes were proposed in this pull request?

Remove chunksPath and metadataPath from the container YAML file.

Changes made in this patch:

  1. Add a DATANODE_SCHEMA_V4 datanode layout feature.
  2. Add schemaVersion "4" for container schema V4.
  3. Behavior:
    1. Before the DATANODE_SCHEMA_V4 feature is finalized, new createContainerRequest/importContainer calls create a V3 container.
    2. After the DATANODE_SCHEMA_V4 feature is finalized, createContainerRequest creates a V4 container. An imported schema V3 container is automatically converted to schema V4. An existing V3 container is also converted to V4 automatically the next time its YAML file is updated. A schema V2 container is not changed and remains a V2 container.

So a V2 or V3 schema container has chunksPath and metadataPath in its YAML file, while a V4 schema container does not.
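
The behavior rules above can be sketched roughly as follows. This is only an illustration of the description, not the code in the patch: the class and method names are hypothetical, and details such as whether schema V3 is enabled at all are ignored.

```java
/** Hypothetical helper; names are illustrative and not taken from the patch. */
final class SchemaSelectionSketch {
  static final String SCHEMA_V2 = "2";
  static final String SCHEMA_V3 = "3";
  static final String SCHEMA_V4 = "4";

  private SchemaSelectionSketch() { }

  /** Schema used for a newly created container. */
  static String schemaForNewContainer(boolean v4Finalized) {
    return v4Finalized ? SCHEMA_V4 : SCHEMA_V3;
  }

  /** Schema a container ends up with after an import or a YAML rewrite. */
  static String schemaAfterRewrite(String current, boolean v4Finalized) {
    // Only V3 containers are upgraded, and only once the feature is finalized.
    if (v4Finalized && SCHEMA_V3.equals(current)) {
      return SCHEMA_V4;
    }
    return current;
  }

  /** Whether the YAML file for this schema carries chunksPath and metadataPath. */
  static boolean yamlHasPathFields(String schema) {
    return !SCHEMA_V4.equals(schema);
  }
}
```

With these rules, a V2 container stays V2 regardless of finalization, and only V4 containers omit the two path fields.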

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6611

How was this patch tested?

  1. new unit tests
  2. enabled Schema V4 for some existing unit tests which test container

@ChenSammi ChenSammi marked this pull request as draft February 19, 2025 10:09
@ChenSammi ChenSammi marked this pull request as ready for review February 19, 2025 10:32

@adoroszlai adoroszlai left a comment

Thanks @ChenSammi for the patch. Mostly LGTM, a few minor improvements suggested.

Comment on lines 350 to 354
if (isFinalized(HDDSLayoutFeature.DATANODE_SCHEMA_V4)) {
  return KV_YAML_FIELDS_SCHEMA_V4;
} else {
  return Collections.unmodifiableList(KV_YAML_FIELDS);
}

It should return unmodifiableList in both cases.

Suggested change
- if (isFinalized(HDDSLayoutFeature.DATANODE_SCHEMA_V4)) {
-   return KV_YAML_FIELDS_SCHEMA_V4;
- } else {
-   return Collections.unmodifiableList(KV_YAML_FIELDS);
- }
+ List<String> list = isFinalized(HDDSLayoutFeature.DATANODE_SCHEMA_V4)
+     ? KV_YAML_FIELDS_SCHEMA_V4
+     : KV_YAML_FIELDS;
+ return Collections.unmodifiableList(list);
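
To illustrate why both branches should be wrapped, here is a minimal standalone sketch. The class name, the boolean parameter, and the field names in the lists are placeholders, not the real contents of KV_YAML_FIELDS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the accessor under review. */
final class YamlFieldsSketch {
  // Placeholder field names; the real lists live in the Ozone code base.
  private static final List<String> KV_YAML_FIELDS =
      new ArrayList<>(Arrays.asList("containerID", "metadataPath", "chunksPath"));
  private static final List<String> KV_YAML_FIELDS_SCHEMA_V4 =
      new ArrayList<>(Arrays.asList("containerID"));

  private YamlFieldsSketch() { }

  static List<String> getYamlFields(boolean v4Finalized) {
    List<String> list = v4Finalized ? KV_YAML_FIELDS_SCHEMA_V4 : KV_YAML_FIELDS;
    // Wrapping once here means callers can never mutate either shared list.
    return Collections.unmodifiableList(list);
  }
}
```

Any attempt to call add() on the returned list throws UnsupportedOperationException, whichever branch was taken.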

Comment on lines 323 to 325
if (!kvData.hasSchema(SCHEMA_V4)) {
kvData.setMetadataPath((String) nodes.get(OzoneConsts.METADATA_PATH));
kvData.setChunksPath((String) nodes.get(OzoneConsts.CHUNKS_PATH));

This condition would need to be updated if SCHEMA_V5 is introduced. Can it be changed to something like: kvData.hasSchemaBefore(SCHEMA_V4)? Please feel free to come up with a better name.

(Same applies to conversion to YAML.)
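
One possible shape for such a check, as a sketch only: it assumes schema versions are numeric strings such as "3" and "4", and it uses a hypothetical class in place of KeyValueContainerData.

```java
/** Hypothetical stand-in for the container data class; not the Ozone implementation. */
final class SchemaOrderSketch {
  private final String schemaVersion;

  SchemaOrderSketch(String schemaVersion) {
    this.schemaVersion = schemaVersion;
  }

  /** Exact match, as in the hasSchema check discussed above. */
  boolean hasSchema(String version) {
    return schemaVersion.equals(version);
  }

  /** True if this container's schema is strictly older than the given version. */
  boolean olderSchemaThan(String version) {
    // Compare numerically so that a future "10" is not treated as older than "4".
    return Integer.parseInt(schemaVersion) < Integer.parseInt(version);
  }
}
```

A check written as olderSchemaThan("4") keeps working unchanged if a later schema version is added, whereas !hasSchema("4") would start matching the newer schema as well.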

Comment on lines 87 to 88
conf.setBoolean(OzoneConfigKeys.HDDS_CONTAINER_RATIS_DATASTREAM_ENABLED, true);
conf.setBoolean(OzoneConfigKeys.HDDS_CONTAINER_RATIS_DATASTREAM_RANDOM_PORT, true);

Is datastream used in the test? Is it needed?

Comment on lines 340 to 341
// sleep 1s to make sure creationTime will change
Thread.sleep(1000);

Can we avoid this? Not only does it make tests slower, assertions about timing are flaky when relying on sleep.

If prod code already uses Clock, we can use TestClock in the tests. Otherwise, I'd propose removing assertions about creationTime.

@ChenSammi ChenSammi Feb 24, 2025

The sleep here is for detecting that the container directory is recreated during container import. Without it, the creationTime of the old directory and the new directory will be the same most of the time. I prefer to keep the creationTime assertion here. As creationTime is set by the OS file system, I guess TestClock is not applicable here.
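
For comparison, where a timestamp is produced by application code that accepts a java.time.Clock (which, as noted above, is not the case for a file-system creationTime), a test can advance time without sleeping. The sketch below uses a hand-written clock and a hypothetical record class; it is not Ozone's TestClock.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

/** Minimal manually advanced clock for tests (illustrative only). */
final class MutableClockSketch extends Clock {
  private Instant now;

  MutableClockSketch(Instant start) {
    this.now = start;
  }

  /** Moves the clock forward without any real waiting. */
  void advance(Duration amount) {
    now = now.plus(amount);
  }

  @Override
  public ZoneId getZone() {
    return ZoneOffset.UTC;
  }

  @Override
  public Clock withZone(ZoneId zone) {
    // Simplification: the zone is irrelevant for Instant-based assertions.
    return this;
  }

  @Override
  public Instant instant() {
    return now;
  }
}

/** Hypothetical production-side object that records its creation time from a Clock. */
final class TimestampedRecordSketch {
  private final Instant createdAt;

  TimestampedRecordSketch(Clock clock) {
    this.createdAt = clock.instant();
  }

  Instant getCreatedAt() {
    return createdAt;
  }
}
```

A test would create one record, call advance(Duration.ofSeconds(1)), create a second record, and assert that the two timestamps differ.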

Comment on lines 223 to 227
assertTrue(new File(newContainerData.getContainerPath()).exists());
assertTrue(new File(newContainerData.getChunksPath()).exists());
assertTrue(new File(newContainerData.getMetadataPath()).exists());
if (schemaV3Enabled) {
assertTrue(newContainerData.getDbFile().exists());

nit: assertThat(<some File>).exists(); provides a better error message if the assertion fails.

Comment on lines 346 to 354
assertEquals(newContainerData.getContainerDBType(), oldContainerData.getContainerDBType());
assertEquals(newContainerData.getState(), oldContainerData.getState());
assertEquals(newContainerData.getBlockCount(), oldContainerData.getBlockCount());
assertEquals(newContainerData.getLayoutVersion(), oldContainerData.getLayoutVersion());
assertEquals(newContainerData.getMaxSize(), oldContainerData.getMaxSize());
assertEquals(newContainerData.getBytesUsed(), oldContainerData.getBytesUsed());
assertEquals(newContainerData.getMetadataPath(), oldContainerData.getMetadataPath());
assertEquals(newContainerData.getChunksPath(), oldContainerData.getChunksPath());
assertEquals(newContainerData.getContainerPath(), oldContainerData.getContainerPath());

nit: Please extract a method to avoid duplicating a bunch of assertions in test cases.

Comment on lines 76 to 78
public static final String CONTAINER_SCHEMA_V4_ENABLED =
"hdds.datanode.container.schema.v4.enabled";
public static final boolean CONTAINER_SCHEMA_V4_ENABLED_DEFAULT = true;

We had issues with the schema v3 feature flag in the past because if it was enabled then disabled, the schema v3 containers would not get loaded. I don't think we should have config keys for container schema versions.

Contributor Author

It's merely for some existing test cases specifically for V3. Let me check if they can be refactored.

@ChenSammi
Contributor Author

@adoroszlai @errose28 , thanks for the review. Comments are addressed. Would you like to take another look?

@adoroszlai
Contributor

Thanks @ChenSammi for updating the patch. Let's wait for @errose28 to take a look.

  HBASE_SUPPORT(8, "Datanode RocksDB Schema Version 3 has an extra table " +
-     "for the last chunk of blocks to support HBase.)");
+     "for the last chunk of blocks to support HBase.)"),
+ DATANODE_SCHEMA_V4(9, "Container yaml file doesn't require chunksPath and metadataPath");

Suggested change
- DATANODE_SCHEMA_V4(9, "Container yaml file doesn't require chunksPath and metadataPath");
+ DATANODE_SCHEMA_V4(9, "Container YAML file doesn't require chunksPath and metadataPath");

Comment on lines +246 to +247
// V4: Column families is same as V3,
// removed chunkPath and metadataPath in .container file

Suggested change
- // V4: Column families is same as V3,
- // removed chunkPath and metadataPath in .container file
+ /**
+  * Schema version 4 for Ozone container files.
+  * <p>
+  * The column families remain the same as defined in {@link #SCHEMA_V3}.
+  * However, the {@code chunkPath} and {@code metadataPath}
+  * fields have been removed in this version of the .container files.
+  * </p>
+  */


  Yaml yaml = ContainerDataYaml.getYamlForContainerType(
-     containerData.getContainerType(),
+     containerData.getContainerType(), containerData,

Suggested change
- containerData.getContainerType(), containerData,
+ containerData.getContainerType(),
+ containerData,

- return readContainer(inputFileStream);
+ KeyValueContainerData containerData = (KeyValueContainerData) readContainer(inputFileStream);
+ if (containerData.getChunksPath() == null) {
+   containerData.setChunksPath(containerFile.getParentFile().getParentFile().getAbsolutePath()

Please move containerFile.getParentFile().getParentFile().getAbsolutePath() .concat(OZONE_URI_DELIMITER).concat(STORAGE_DIR_CHUNKS) into separate variable.
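
A sketch of what the extracted computation might look like. The class and method names are hypothetical, and the values "/" and "chunks" for OZONE_URI_DELIMITER and STORAGE_DIR_CHUNKS are assumptions made for this illustration; the directory layout is inferred from the expression quoted above.

```java
import java.io.File;

/** Hypothetical helper deriving container paths from the .container file location. */
final class ContainerPathSketch {
  // Assumed values for illustration; the real constants are defined in Ozone.
  static final String OZONE_URI_DELIMITER = "/";
  static final String STORAGE_DIR_CHUNKS = "chunks";

  private ContainerPathSketch() { }

  /** Assumes the layout <containerDir>/<metadataDir>/<file>.container. */
  static String chunksPathFor(File containerFile) {
    // Named variable instead of one long chained expression.
    String containerDir =
        containerFile.getParentFile().getParentFile().getAbsolutePath();
    return containerDir.concat(OZONE_URI_DELIMITER).concat(STORAGE_DIR_CHUNKS);
  }

  /** The metadata directory is the directory holding the .container file. */
  static String metadataPathFor(File containerFile) {
    return containerFile.getParentFile().getAbsolutePath();
  }
}
```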

yamlFields = new ArrayList<>(yamlFields);
yamlFields.add(REPLICA_INDEX);
}
if (((KeyValueContainerData)containerData).olderSchemaThan(SCHEMA_V4)) {

Suggested change
- if (((KeyValueContainerData)containerData).olderSchemaThan(SCHEMA_V4)) {
+ if (((KeyValueContainerData) containerData).olderSchemaThan(SCHEMA_V4)) {

KeyValueContainerData newContainerData =
new KeyValueContainerData(containerID1,
oldContainerData.getLayoutVersion(),
oldContainerData.getMaxSize(), pipeline.getId().getId().toString(),

Suggested change
- oldContainerData.getMaxSize(), pipeline.getId().getId().toString(),
+ oldContainerData.getMaxSize(),
+ pipeline.getId().getId().toString(),

Comment on lines +331 to +332
KeyValueContainerData data = (KeyValueContainerData) ContainerDataYaml
.readContainer(containerDescriptorYaml);

Suggested change
- KeyValueContainerData data = (KeyValueContainerData) ContainerDataYaml
-     .readContainer(containerDescriptorYaml);
+ KeyValueContainerData data = (KeyValueContainerData) ContainerDataYaml.readContainer(containerDescriptorYaml);

newContainer.importContainerData(fis, packer);
}

assertTrue(isContainerEqual(newContainerData, oldContainerData));

Please use assertThat

newContainer.getContainerData().getChunksPath());
assertEquals(yamlFile.getParentFile().getAbsolutePath(), newContainer.getContainerData().getMetadataPath());
FileTime creationTime2 = (FileTime) Files.getAttribute(
Paths.get(newContainer.getContainerData().getContainerPath()), "creationTime");

Please move into a separate variable

assertNotEquals(creationTime1.toInstant(), creationTime2.toInstant());
}

private boolean isContainerEqual(KeyValueContainerData containerData1, KeyValueContainerData containerData2) {

We do not need such method. Usually matchers have field-by-field comparison methods.

@errose28
Contributor

@ChenSammi I've reviewed the code in context with your comment on the Jira and I still do not think this requires a schema version. Actually, the change should be much simpler without it:

  • For new containers, don't write the values to the yaml
  • For old containers, ignore the existing values except during hash calculation, where we will load all yaml fields.

Checking usages of OzoneConsts#SCHEMA_V4, we see that most uses are boilerplate for adding a new schema in general. The places making decisions based on schema version are:

  • container import
    • I don't see any advantage to migrating old containers on import. If we are finalized we can simply ignore the fields if they are present.
  • container file rewrite
    • Same as above, I do not think we should be doing migration here.
  • Yaml hash calculation
    • We can check if the path fields are present to make the same decisions without needing to check a schema version. This is already done in the current patch in ContainerDataYaml#readContainerFile.
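
The presence-based decision in the last bullet could look roughly like this. It is a sketch only: the helper name and the parsedYaml/baseFields parameters are hypothetical, the key names "metadataPath" and "chunksPath" are assumed to match the YAML keys, and field ordering (which may matter for the real checksum) is ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; decides by field presence instead of schema version. */
final class PathFieldPresenceSketch {
  static final String METADATA_PATH = "metadataPath";
  static final String CHUNKS_PATH = "chunksPath";

  private PathFieldPresenceSketch() { }

  /** Returns the fields to hash: the base fields plus any path field actually present. */
  static List<String> fieldsForChecksum(Map<String, ?> parsedYaml, List<String> baseFields) {
    List<String> fields = new ArrayList<>(baseFields);
    if (parsedYaml.containsKey(METADATA_PATH)) {
      fields.add(METADATA_PATH);
    }
    if (parsedYaml.containsKey(CHUNKS_PATH)) {
      fields.add(CHUNKS_PATH);
    }
    return fields;
  }
}
```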

@ChenSammi
Contributor Author

ChenSammi commented Feb 26, 2025

  * We can check if the path fields are present to make the same decisions without needing to check a schema version. This is already done in the current patch in `ContainerDataYaml#readContainerFile`.

@errose28, it's different. You can check ContainerUtils#verifyChecksum, where the container YAML file checksum is verified. I would love to adopt this approach if it can verify container YAML files both with and without these two fields in an elegant way. One possible solution is to read the entire container file into a String first, check whether the String contains "chunksPath" and "metadataPath", and then create the Yaml object.
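
A rough sketch of the "read into a String first" idea. The class and method names are hypothetical, and a plain substring check like this can give a false positive if either name appears inside another value, so treat it as an illustration rather than a robust detector.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical pre-check run on the raw file before a Yaml object is built. */
final class RawYamlInspectionSketch {
  private RawYamlInspectionSketch() { }

  /** True if the raw YAML text mentions both path fields. */
  static boolean hasPathFields(String yamlText) {
    return yamlText.contains("chunksPath") && yamlText.contains("metadataPath");
  }

  /** Reads the whole container file and applies the same check. */
  static boolean hasPathFields(Path containerFile) throws IOException {
    String text = new String(Files.readAllBytes(containerFile), StandardCharsets.UTF_8);
    return hasPathFields(text);
  }
}
```

The result would then decide which field list is used when constructing the Yaml object for checksum verification.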

@ChenSammi
Contributor Author

ChenSammi commented Feb 26, 2025

@ivanzlenko, thank you for the review. I will try to address some of the comments in the next patch. I would like to keep the rest as they are, for example keeping multiple parameters on one line. Actually, one parameter per line is not a recommended style in Ozone, and we should avoid using it. The same applies to the style-related comments on the test code.

@ivanzlenko
Contributor

ivanzlenko commented Apr 2, 2025

Actually one line one parameter is not a recommended style in Ozone, we should avoid to use that. And those style related comments in test case codes.

In many cases, putting several parameters on one line really hurts readability:

callFunc(param1, param2, param3,
    param4, param5,
    aVeryLongParameterNameWhichTakesAlmostAllSpace,
    param7, param8);

Compared to:

callFunc(
    param1,
    param2,
    param3,
    param4,
    param5,
    aVeryLongParameterNameWhichTakesAlmostAllSpace,
    param7,
    param8
);

It is much easier to understand what is going on and which parameters are involved in a function call.
However, it may be worth moving this to a GitHub discussion.

@adoroszlai
Contributor

/pending conflicts, reviews

@github-actions github-actions bot left a comment

Marking this issue as un-mergeable as requested.

Please use /ready comment when it's resolved.

Please note that the PR will be closed after 21 days of inactivity from now. (But can be re-opened anytime later...)

@github-actions

Thank you very much for the patch. I am closing this PR temporarily as there was no activity recently and it is waiting for response from its author.

It doesn't mean that this PR is not important or ignored: feel free to reopen the PR at any time.

It only means that attention of committers is not required. We prefer to keep the review queue clean. This ensures PRs in need of review are more visible, which results in faster feedback for all PRs.

If you need ANY help to finish this PR, please contact the community on the mailing list or the Slack channel.
