HDDS-13176. containerIds table value format change to proto from string #8589
Pull Request Overview
This pull request changes the containerIds table value type from a String to a proto-based ContainerCreateInfo, enabling the future extension of additional fields in the container metadata. Key changes include:
- Adding a new proto message (ContainerCreateInfo) to represent container creation info.
- Updating tests and codec implementations to support the new ContainerCreateInfo type.
- Modifying existing container metadata management classes and upgrade actions to use the new type.
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| hadoop-hdds/interface-client/src/main/proto/DatanodeClientProtocol.proto | Introduces the ContainerCreateInfo message with a state field. |
| hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/upgrade/TestDatanodeUpgradeToContainerIdsTable.java | Updates test logic to read/write ContainerCreateInfo instead of string. |
| hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/metadata/TestContainerCreateInfo.java | Adds tests for protobuf conversion, codec serialization/deserialization, and codec compatibility. |
| hadoop-hdds/container-service/src/java/org/apache/hadoop/ozone/container/upgrade/ContainerTableSchemaFinalizeAction.java | Revises the upgrade action to re-write table entries using the new type. |
| Other files | Update type definitions and usages for containerIdsTable from String to ContainerCreateInfo. |
Comments suppressed due to low confidence (2)
hadoop-hdds/interface-client/src/main/proto/DatanodeClientProtocol.proto:562
- If using proto3, consider removing the 'required' modifier on the state field (line 563) since proto3 does not support 'required'. Verify that the intended proto syntax version aligns with this usage.
message ContainerCreateInfo {
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/ContainerCreateInfo.java:106
- Consider adding error handling for cases where the deserialized string does not match any valid state enum value to avoid potential runtime exceptions.
String val = StringCodec.get().fromPersistedFormat(rawData);
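A hedged sketch of the defensive parsing Copilot suggests here (the enum values and class names are hypothetical; the real codec works with HDDS container states): translate an unrecognized persisted string into a checked IOException instead of letting `IllegalArgumentException` escape the codec.

```java
import java.io.IOException;

// Hypothetical state enum standing in for the real container state type.
enum State { OPEN, CLOSED, QUASI_CLOSED }

class StateParser {
    // Map a persisted string to a state enum; an unknown value becomes a
    // checked IOException rather than an unchecked IllegalArgumentException.
    static State parse(String val) throws IOException {
        try {
            return State.valueOf(val);
        } catch (IllegalArgumentException e) {
            throw new IOException("Unknown persisted container state: " + val, e);
        }
    }
}
```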
swamirishi left a comment:
@sumitagrawl Thank you for working on the patch. I like the overall set of changes. However, I have left some inline comments about potential correctness issues.
@errose28 Can you also please take a look at the upgrade part of the patch?
@swamirishi As discussed, the datanode upgrade flow is non-rolling, covering all nodes together. As such, no write operation occurs while a datanode is in the implicit upgrade phase, but a check was still added to the write path as a precaution. Updated all comments.
@sumitagrawl Atomicity is still a problem. We need to update the version file and the RocksDB column family atomically, which is only possible if we create a new column family and read from the new column family after the finalize.
@sumitagrawl Once we finalize the approach, can you run some tests on how long this reformat will take and any memory impact it might have? Assume a dense 400 TB node, which would have ~80,000 container entries in this DB.
swamirishi left a comment:
@sumitagrawl Overall, the changes are heading in the right direction. I have some opinions on the design; please make the changes if you agree with the suggestions.
Commented on:

```java
    return containerIdsTable;
  }

  public PreviousVersionBasedTable getPreviousVersionBasedTable() {
```
How about having a TreeMap that loads different definitions keyed by the software layout version when initializing the store? NavigableMap.floorEntry(currentSoftwareLayoutVersion) can give the right definition.
I believe this could live in WitnessedContainerDBDefinition, which would just return the correct definition based on the version.
This would be a much cleaner implementation, and WitnessedContainerDBStore could stay the same.
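The floorEntry idea above can be sketched with a plain TreeMap (the class name and the string values are hypothetical; the real map would hold DBColumnFamilyDefinition instances keyed by layout version):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: pick the newest table definition whose introduction
// version is <= the current software layout version.
class VersionedDefinitions {
    private final NavigableMap<Integer, String> defs = new TreeMap<>();

    void register(int sinceLayoutVersion, String definition) {
        defs.put(sinceLayoutVersion, definition);
    }

    // floorEntry returns the entry with the greatest key <= the argument,
    // i.e. the newest definition already introduced at this layout version.
    String forVersion(int currentLayoutVersion) {
        return defs.floorEntry(currentLayoutVersion).getValue();
    }
}
```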
On the change:

```diff
- DBColumnFamilyDefinition<ContainerID, String> getContainerIdsTable() {
+ DBColumnFamilyDefinition<ContainerID, ContainerCreateInfo> getContainerIdsTable() {
```
This constructor can initialize from the map of definitions. I couldn't comment on the constructor directly:

```java
DBColumnFamilyDefinition<ContainerID, ContainerCreateInfo> getContainerIdsTable() {
  Map<String, DBColumnFamilyDefinition> columnFamilies = definitions.floorEntry(currentVersion)
  super(columnFamilies);
}
```
We do not need versioning in the definition at creation time. Currently it is a simple map of the current definition, handled throughout the code.
Code readability would suffer, and the map would need updating every time a LayoutFeature version is added, which is error prone.
The current code handles a specific version only during upgrade, and it is easy to see which version it refers to. Considering the current case, this is not required, and such cases are rare.
If this issue were frequent, that would indicate a design problem in the table, namely that it cannot support backward compatibility, and we should use proto to handle that.
IMO, this change is not required.
80,000 entries shouldn't be that big of a deal. It should run in a few seconds. Loading the JVM might take longer than the upgrade process itself.
@errose28 The finalize feature was run on a local Mac and takes only 1-2 seconds. The code sample modified in TestDatanodeUpgradeToContainerIdsTable#testContainerTableAccessBeforeAndAfterUpgrade measured the time taken for the upgrade at 1233 ms.
swamirishi left a comment:
@sumitagrawl Thanks, the changes almost look good to go. I have a few nitpicks and one major thing that needs to be fixed; please check the inline comments.
Commented on:

```java
WitnessedContainerMetadataStore metadataStore = arg.getContainer().getWitnessedContainerMetadataStore();
Table<ContainerID, ContainerCreateInfo> previousTable
    = ((WitnessedContainerMetadataStoreImpl) metadataStore).getPreviousVersionBasedTable().getContainerIdsTable();
Table<ContainerID, ContainerCreateInfo> currTable =
```
We need to truncate the entire table before copying entries. We could do it with individual deletes, or maybe use the deleteRange API, which would cover the entire long key range.
Updated with deleteRange.
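The truncate-then-copy step discussed above can be sketched in memory (hypothetical names; the real finalize action operates on RocksDB-backed tables, where clear() corresponds to deleteRange over the full key range):

```java
import java.util.Map;

// Hypothetical in-memory sketch of the finalize step: clear the current
// table first so no stale rows survive, then copy every entry from the
// previous-version table in the new format.
class FinalizeSketch {
    static <K, V> void truncateAndCopy(Map<K, V> currTable, Map<K, V> prevTable) {
        currTable.clear();           // stands in for deleteRange over the full key range
        currTable.putAll(prevTable); // re-write every entry
    }
}
```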
swamirishi left a comment:
@sumitagrawl I have a better idea for implementing the changed containerIds table; right now we are touching multiple classes unnecessarily.
On the change:

```diff
- public Table<ContainerID, String> getContainerIdsTable() {
-   return containerIdsTable;
+ public Table<ContainerID, ContainerCreateInfo> getContainerCreateInfoTable() {
+   if (!VersionedDatanodeFeatures.isFinalized(HDDSLayoutFeature.WITNESSED_CONTAINER_DB_PROTO_VALUE)) {
```
Let us create a wrapper Table implementation that automatically switches to the finalized table once the layout upgrade is finalized. We can return the wrapped Table interface, which means the caller of this method can keep the value in memory and we wouldn't have to change any caller behaviour. Right now the caller has to call witnessedMetadataStore.getContainerCreateInfoTable().put or delete every time; we can abstract this out completely.
We could have this wrapped object present as a member in the class itself, so on finalize all we need to update is a boolean flag in the wrapped table.
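A minimal sketch of this wrapper idea, under assumed names (the real wrapper would implement the Table interface and delegate every method; SwitchingSupplier here is hypothetical): callers cache one reference, and a single flag flip after finalize redirects them to the new backing table.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: a holder that switches from the pre-finalize object
// to the post-finalize object when a single boolean is flipped.
class SwitchingSupplier<T> {
    private final T preFinalize;
    private final T postFinalize;
    private final AtomicBoolean finalized = new AtomicBoolean(false);

    SwitchingSupplier(T preFinalize, T postFinalize) {
        this.preFinalize = preFinalize;
        this.postFinalize = postFinalize;
    }

    // Called once by the upgrade finalize action.
    void markFinalized() {
        finalized.set(true);
    }

    // Callers may cache the SwitchingSupplier itself; current() always
    // resolves to the table matching the current layout state.
    T current() {
        return finalized.get() ? postFinalize : preFinalize;
    }
}
```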
This can lead to bugs if called incorrectly. It is important that we abstract this out.
Created the above JIRA for reference. I do not agree with caching the table, as the reference would need to be re-read after upgrade, or in other future cases that update it.
It can be discussed further if there is real value.
We have multiple instances in the code base where we cache a reference to the table. The expectation, as I understand it, is that once the metadata store is initialized the caller is free to cache the reference, and right now this model doesn't prevent that. We should either come up with a model that prevents caching the reference altogether, which would mean redesigning the interface, or make the implementation abstracted so that it is foolproof against reference caching.
A few instances...
ozone/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ListIterator.java, lines 300 to 302 in 5d1b43d:

```java
MinHeapIterator(OMMetadataManager omMetadataManager, String prefixKey,
    String startKey, String volumeName, String bucketName,
    Table... tables) throws IOException {
```
Lines 336 to 339 in 48c985f:

```java
private <VALUE> void recalculateUsages(
    Table<String, VALUE> table, Map<String, CountPair> prefixUsageMap,
    String strType, boolean haveValue) throws UncheckedIOException,
    UncheckedExecutionException {
```
Lines 60 to 65 in 48c985f:

```java
public PipelineStateManagerImpl(
    Table<PipelineID, Pipeline> pipelineStore, NodeManager nodeManager,
    DBTransactionBuffer buffer) throws IOException {
  this.pipelineStateMap = new PipelineStateMap();
  this.nodeManager = nodeManager;
  this.pipelineStore = pipelineStore;
```
Such use cases are spread throughout our code base.
If you are proposing something different, then we should design the interface to prevent future bugs.
It can be taken up in the JIRA mentioned above for designing and defining the interface.
swamirishi left a comment:
As long as we are tracking this as a known issue, I am fine with merging this PR. Thanks for the patch. LGTM @sumitagrawl
What changes were proposed in this pull request?
The containerIds table format is currently <Long, String>, where the key is the containerId and the value is the state. This format does not allow any further data, such as replicationIndex, to be stored in the value without string concatenation.
To allow the value to carry additional fields in a compatible manner, the String is changed to ContainerCreateInfo, which has state as one of its members.
Later, on a need basis, further fields such as replication index can be added to the proto.
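A rough sketch of what such an extensible message might look like (hypothetical; only the state field exists per this PR, the actual definition lives in DatanodeClientProtocol.proto, and proto2 syntax is assumed here since review comments mention the 'required' modifier):

```proto
// Hypothetical sketch only. 'state' exists today; the commented field
// illustrates how a compatible extension would be added later.
message ContainerCreateInfo {
    required State state = 1;
    // optional int32 replicationIndex = 2;  // possible future addition
}
```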
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-13176
How was this patch tested?