HDDS-13176. containerIds table value format change to proto from string #8589
Pull Request Overview
This pull request changes the containerIds table value type from a String to a proto-based ContainerCreateInfo, enabling the future extension of additional fields in the container metadata. Key changes include:
- Adding a new proto message (ContainerCreateInfo) to represent container creation info.
- Updating tests and codec implementations to support the new ContainerCreateInfo type.
- Modifying existing container metadata management classes and upgrade actions to use the new type.
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| hadoop-hdds/interface-client/src/main/proto/DatanodeClientProtocol.proto | Introduces the ContainerCreateInfo message with a state field. |
| hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/upgrade/TestDatanodeUpgradeToContainerIdsTable.java | Updates test logic to read/write ContainerCreateInfo instead of string. |
| hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/metadata/TestContainerCreateInfo.java | Adds tests for protobuf conversion, codec serialization/deserialization, and codec compatibility. |
| hadoop-hdds/container-service/src/java/org/apache/hadoop/ozone/container/upgrade/ContainerTableSchemaFinalizeAction.java | Revises the upgrade action to re-write table entries using the new type. |
| Other files | Update type definitions and usages for containerIdsTable from String to ContainerCreateInfo. |
Comments suppressed due to low confidence (2)
hadoop-hdds/interface-client/src/main/proto/DatanodeClientProtocol.proto:562
- If using proto3, consider removing the 'required' modifier on the state field (line 563) since proto3 does not support 'required'. Verify that the intended proto syntax version aligns with this usage.
message ContainerCreateInfo {
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/ContainerCreateInfo.java:106
- Consider adding error handling for cases where the deserialized string does not match any valid state enum value to avoid potential runtime exceptions.
String val = StringCodec.get().fromPersistedFormat(rawData);
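A hedged sketch of the defensive parsing Copilot suggests here (the enum values and class names are hypothetical; the real codec works with HDDS container states): translate an unrecognized persisted string into a checked IOException instead of letting `IllegalArgumentException` escape the codec.

```java
import java.io.IOException;

// Hypothetical state enum standing in for the real container state type.
enum State { OPEN, CLOSED, QUASI_CLOSED }

class StateParser {
    // Map a persisted string to a state enum; an unknown value becomes a
    // checked IOException rather than an unchecked IllegalArgumentException.
    static State parse(String val) throws IOException {
        try {
            return State.valueOf(val);
        } catch (IllegalArgumentException e) {
            throw new IOException("Unknown persisted container state: " + val, e);
        }
    }
}
```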
swamirishi left a comment:
@sumitagrawl Thank you for working on the patch. I like the overall set of changes. However, I have left some inline comments about potential correctness issues.
@errose28 Can you also please take a look at the upgrade part of the patch?
@swamirishi As discussed, the datanode upgrade flow is non-rolling, covering all nodes together. As such, no write operation occurs while a datanode is in the implicit upgrade phase, but a check was still added to the write path as a precaution. Updated all comments.
@sumitagrawl Atomicity is still a problem. We need to update the version file and the RocksDB column family atomically, which is only possible if we create a new column family and read from the new column family after the finalize.
@sumitagrawl Once we finalize the approach, can you run some tests on how long this reformat will take and any memory impact it might have? Assume a dense 400 TB node, which would have ~80,000 container entries in this DB.
swamirishi left a comment:
@sumitagrawl Overall, the changes are heading in the right direction. I have some opinions on the design; please make the changes if you agree with the suggestions.
Commented on:

```java
    return containerIdsTable;
  }

  public PreviousVersionBasedTable getPreviousVersionBasedTable() {
```
How about having a TreeMap that loads different definitions keyed by the software layout version when initializing the store? NavigableMap.floorEntry(currentSoftwareLayoutVersion) can give the right definition.
I believe this could live in WitnessedContainerDBDefinition, which would just return the correct definition based on the version.
This would be a much cleaner implementation, and WitnessedContainerDBStore could stay the same.
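The floorEntry idea above can be sketched with a plain TreeMap (the class name and the string values are hypothetical; the real map would hold DBColumnFamilyDefinition instances keyed by layout version):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: pick the newest table definition whose introduction
// version is <= the current software layout version.
class VersionedDefinitions {
    private final NavigableMap<Integer, String> defs = new TreeMap<>();

    void register(int sinceLayoutVersion, String definition) {
        defs.put(sinceLayoutVersion, definition);
    }

    // floorEntry returns the entry with the greatest key <= the argument,
    // i.e. the newest definition already introduced at this layout version.
    String forVersion(int currentLayoutVersion) {
        return defs.floorEntry(currentLayoutVersion).getValue();
    }
}
```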
On the change:

```diff
- DBColumnFamilyDefinition<ContainerID, String> getContainerIdsTable() {
+ DBColumnFamilyDefinition<ContainerID, ContainerCreateInfo> getContainerIdsTable() {
```
This constructor can initialize from the map of definitions. I couldn't comment on the constructor directly:

```java
DBColumnFamilyDefinition<ContainerID, ContainerCreateInfo> getContainerIdsTable() {
  Map<String, DBColumnFamilyDefinition> columnFamilies = definitions.floorEntry(currentVersion)
  super(columnFamilies);
}
```
We do not need versioning in the definition at creation time. Currently it is a simple map of the current definition, handled throughout the code.
Code readability would suffer, and the map would need updating every time a LayoutFeature version is added, which is error prone.
The current code handles a specific version only during upgrade, and it is easy to see which version it refers to. Considering the current case, this is not required, and such cases are rare.
If this issue were frequent, that would indicate a design problem in the table, namely that it cannot support backward compatibility, and we should use proto to handle that.
IMO, this change is not required.
80,000 entries shouldn't be that big of a deal. It should run in a few seconds. Loading the JVM might take longer than the upgrade process itself.
@errose28 The finalize feature was run on a local Mac and takes only 1-2 seconds. The code sample modified in TestDatanodeUpgradeToContainerIdsTable#testContainerTableAccessBeforeAndAfterUpgrade measured the time taken for the upgrade at 1233 ms.
swamirishi left a comment:
@sumitagrawl Thanks, the changes almost look good to go. I have a few nitpicks and one major thing that needs to be fixed; please check the inline comments.
Commented on:

```java
WitnessedContainerMetadataStore metadataStore = arg.getContainer().getWitnessedContainerMetadataStore();
Table<ContainerID, ContainerCreateInfo> previousTable
    = ((WitnessedContainerMetadataStoreImpl) metadataStore).getPreviousVersionBasedTable().getContainerIdsTable();
Table<ContainerID, ContainerCreateInfo> currTable =
```
We need to truncate the entire table before copying entries. We could do it with individual deletes, or maybe use the deleteRange API, which would cover the entire long key range.
Updated with deleteRange.
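The truncate-then-copy step discussed above can be sketched in memory (hypothetical names; the real finalize action operates on RocksDB-backed tables, where clear() corresponds to deleteRange over the full key range):

```java
import java.util.Map;

// Hypothetical in-memory sketch of the finalize step: clear the current
// table first so no stale rows survive, then copy every entry from the
// previous-version table in the new format.
class FinalizeSketch {
    static <K, V> void truncateAndCopy(Map<K, V> currTable, Map<K, V> prevTable) {
        currTable.clear();           // stands in for deleteRange over the full key range
        currTable.putAll(prevTable); // re-write every entry
    }
}
```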
swamirishi left a comment:
@sumitagrawl I have a better idea for implementing the changed containerIds table; right now we are touching multiple classes unnecessarily.
On the change:

```diff
- public Table<ContainerID, String> getContainerIdsTable() {
-   return containerIdsTable;
+ public Table<ContainerID, ContainerCreateInfo> getContainerCreateInfoTable() {
+   if (!VersionedDatanodeFeatures.isFinalized(HDDSLayoutFeature.WITNESSED_CONTAINER_DB_PROTO_VALUE)) {
```
Let us create a wrapper Table implementation that automatically switches to the finalized table once the layout upgrade is finalized. We can return the wrapped Table interface, which means the caller of this method can keep the value in memory and we wouldn't have to change any caller behaviour. Right now the caller has to call witnessedMetadataStore.getContainerCreateInfoTable().put or delete every time; we can abstract this out completely.
We could have this wrapped object present as a member in the class itself, so on finalize all we need to update is a boolean flag in the wrapped table.
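A minimal sketch of this wrapper idea, under assumed names (the real wrapper would implement the Table interface and delegate every method; SwitchingSupplier here is hypothetical): callers cache one reference, and a single flag flip after finalize redirects them to the new backing table.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: a holder that switches from the pre-finalize object
// to the post-finalize object when a single boolean is flipped.
class SwitchingSupplier<T> {
    private final T preFinalize;
    private final T postFinalize;
    private final AtomicBoolean finalized = new AtomicBoolean(false);

    SwitchingSupplier(T preFinalize, T postFinalize) {
        this.preFinalize = preFinalize;
        this.postFinalize = postFinalize;
    }

    // Called once by the upgrade finalize action.
    void markFinalized() {
        finalized.set(true);
    }

    // Callers may cache the SwitchingSupplier itself; current() always
    // resolves to the table matching the current layout state.
    T current() {
        return finalized.get() ? postFinalize : preFinalize;
    }
}
```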
This can lead to bugs if called incorrectly. It is important that we abstract this out.
Created the above JIRA for reference. I do not agree with caching the table, as the reference would need to be re-read after upgrade, or in other future cases that update it.
It can be discussed further if there is real value.
We have multiple instances in the code base where we cache a reference to the table. The expectation, as I understand it, is that once the metadata store is initialized the caller is free to cache the reference, and right now this model doesn't prevent that. We should either come up with a model that prevents caching the reference altogether, which would mean redesigning the interface, or make the implementation abstracted so that it is foolproof against reference caching.
A few instances...
ozone/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ListIterator.java, lines 300 to 302 in 5d1b43d:

```java
MinHeapIterator(OMMetadataManager omMetadataManager, String prefixKey,
    String startKey, String volumeName, String bucketName,
    Table... tables) throws IOException {
```
Lines 336 to 339 in 48c985f:

```java
private <VALUE> void recalculateUsages(
    Table<String, VALUE> table, Map<String, CountPair> prefixUsageMap,
    String strType, boolean haveValue) throws UncheckedIOException,
    UncheckedExecutionException {
```
Lines 60 to 65 in 48c985f:

```java
public PipelineStateManagerImpl(
    Table<PipelineID, Pipeline> pipelineStore, NodeManager nodeManager,
    DBTransactionBuffer buffer) throws IOException {
  this.pipelineStateMap = new PipelineStateMap();
  this.nodeManager = nodeManager;
  this.pipelineStore = pipelineStore;
```
Such use cases are spread throughout our code base.
If you are proposing something different, then we should design the interface to prevent future bugs.
It can be taken up in the JIRA mentioned above for designing and defining the interface.
swamirishi left a comment:
As long as we are tracking this as a known issue, I am fine with merging this PR. Thanks for the patch. LGTM @sumitagrawl
What changes were proposed in this pull request?
The containerIds table format is currently <Long, String>, where the key is the containerId and the value is the state. This format does not allow any further data, such as replicationIndex, to be stored in the value without string concatenation.
To allow the value to carry additional fields in a compatible manner, the String is changed to ContainerCreateInfo, which has state as one of its members.
Later, on a need basis, further fields such as replication index can be added to the proto.
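A rough sketch of what such an extensible message might look like (hypothetical; only the state field exists per this PR, the actual definition lives in DatanodeClientProtocol.proto, and proto2 syntax is assumed here since review comments mention the 'required' modifier):

```proto
// Hypothetical sketch only. 'state' exists today; the commented field
// illustrates how a compatible extension would be added later.
message ContainerCreateInfo {
    required State state = 1;
    // optional int32 replicationIndex = 2;  // possible future addition
}
```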
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-13176
How was this patch tested?