-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-11259. [hsync] DataNode should verify HBASE_SUPPORT layout version for every PutBlock. #7012
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…on for every PutBlock. Change-Id: I7aab81d99238dad92d94a354f87b607c686c8735
ashishkumar50
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jojochuang, Thanks for the patch.
|
|
||
| db.getStore().putBlockByID(batch, incrementalEnabled, localID, data, | ||
| if (incrementalEnabled.get() && !VersionedDatanodeFeatures.isFinalized( | ||
| HDDSLayoutFeature.HBASE_SUPPORT)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
finalizeBlock(..) in BlockManagerImpl can also have this check and return error before finalize.
if (!VersionedDatanodeFeatures.isFinalized(
HDDSLayoutFeature.HBASE_SUPPORT)) { }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But since OM also checks upgrade finalize before and call will fail in OM itself, So in this case client will not call DN and I think this check is not required in DN in case of finalizeBlock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. We should add the HDDSLayoutFeature.HBASE_SUPPORT check in finalizeBlock too. Although there is check in OM, the check in DN is also preferred. It's like the double insurance.
| HDDSLayoutFeature.HBASE_SUPPORT)) { | ||
| LOG.warn("DataNode has not finalized upgrading to a version that " + | ||
| "supports incremental chunk list. Fallback to full chunk list"); | ||
| incrementalEnabled.set(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is chance that if pubBlock is first called before HDDSLayoutFeature.HBASE_SUPPORT is finalized, then incrementalEnabled will be set as false. In this case, incrementalEnabled will not come back as "true" without the DN restart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jojochuang , while going through the code, I found we have two properties currently for incremental chunk list,
"ozone.client.incremental.chunk.list" // client side configure, default is false. For Hbase ozone client, this should be true. For other application ozone client, this can be false.
"ozone.incremental.chunk.list" // server side configure. Can we remove this property? Because it's default value is false, once HBASE_SUPPORT finalized, user needs to change it's value to true, and then restart the DN for that. Once "HDDSLayoutFeature.HBASE_SUPPORT" feature is finalized, we expect this property always be true, right? So If it can be removed, then we only need to check whether to reject a partial chunk list based on "HDDSLayoutFeature.HBASE_SUPPORT" feature is finalized or not on DN side.
BTW, there is one case, client sends a partial chunk list, server side "HDDSLayoutFeature.HBASE_SUPPORT" feature is not finalized yet, then this client request should fail, looks like DatanodeStoreWithIncrementalChunkList#putBlockByID doesn't cover this case yet.
|
BTW, before the "HDDSLayoutFeature.HBASE_SUPPORT" is finalized, the FINALIZE_BLOCKS and LAST_CHUNK_INFO column family will be created by DN due to the RocksDB option setCreateMissingColumnFamilies(true), which I think is not perfect, but is fine. |
Yes IIRC this is true with all layout features that add new column families. The older versions just ignore them. |
1. reject finalizeBlock if upgrade is not finalized. 2. Remove DataNode-side ozone.incremental.chunk.list configuration. 3. Reject client if it attempts to send chunks in incremental chunk list format before upgrade finalizes. Change-Id: I51cfd922937ab8fb1d49b55615b09291a8f7c71f
|
Updated the PR. Address review comments. |
Change-Id: Ia5e0871d5eec9a80a2f3d59b00a9f2b31d202ca1
ashishkumar50
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jojochuang, LGTM.
| boolean incrementalEnabled = true; | ||
| if (!VersionedDatanodeFeatures.isFinalized(HDDSLayoutFeature.HBASE_SUPPORT)) { | ||
| if (isPartialChunkList(data)) { | ||
| throw new UnsupportedOperationException("DataNode has not finalized " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the exception is correct here. On the client side we use the unchecked UnsupportedOperationException because it OK to crash the client at that point. On the server we want to throw an exception type and result code that will be translated back to the client. I think this would be StorageContainerException with a new result code similar to NOT_SUPPORTED_OPERATION_PRIOR_FINALIZATION which exists on the OM.
Note that there's no tests in the PR. Per the mailing thread it looks like we are going to do those after the merge which is ok, but we should try to make sure this is working correctly here because I think this is the first time we have rejected a request based on a layout version on the datanode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah agreed. I think there are DataNode layout upgrade test code somewhere. Let me find out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opening a new jira to track adding the test code to validate layout version upgrade behavior. https://issues.apache.org/jira/browse/HDDS-11270
… BlockManagerImpl. Change-Id: I1aed656fbef1a3c3db7a2d5db9b11bf541a23a81
…. It's client's fault. Change-Id: I180d72cac58e400e0b62d30cc404a45ff49936ed
Change-Id: If4be0008dbb4429bd4329a68f239c5851c8dd6f4
errose28
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @jojochuang LGTM from an upgrades perspective. I'm not as familiar with which requests need to be blocked, but I think Ashish is so with both our +1s this is probably good to go.
|
Merged. Thanks @errose28 and @ashishkumar50. |
* master: (181 commits) HDDS-11289. Bump docker-maven-plugin to 0.45.0 (apache#7024) HDDS-11287. Code cleanup in XceiverClientSpi (apache#7043) HDDS-11283. Refactor KeyValueStreamDataChannel to avoid spurious IDE build issues (apache#7040) HDDS-11257. Ozone write does not work when http proxy is set for the JVM. (apache#7036) HDDS-11249. Bump ozone-runner to 20240729-jdk17-1 (apache#7003) HDDS-10517. Recon - Add a UI component for showing DN decommissioning detailed status and info (apache#6724) HDDS-11270. [hsync] Add DN layout version (HBASE_SUPPORT/version 8) upgrade test. (apache#7021) HDDS-11272. Statistics some node status information (apache#7025) HDDS-11278. Move code out of Hadoop util package (apache#7028) HDDS-11274. (addendum) Replace Hadoop annotations/configs with Ozone-specific ones HDDS-11274. Replace Hadoop annotations/configs with Ozone-specific ones (apache#7026) HDDS-11280. Add Synchronize in AbstractCommitWatcher.addAckDataLength (apache#7032) HDDS-11235. Spare InfoBucket RPC call in FileSystem#mkdir() call. (apache#6990) HDDS-11273. Bump commons-compress to 1.26.2 (apache#7023) HDDS-11225. Increase ipc.server.read.threadpool.size (apache#7007) HDDS-11224. Increase hdds.datanode.handler.count (apache#7011) HDDS-11259. [hsync] DataNode should verify HBASE_SUPPORT layout version for every PutBlock. (apache#7012) HDDS-11214. Added config to set rocksDB's max log file size and num of log files (apache#7014) HDDS-11226. Make ExponentialBackoffPolicy maxRetries configurable (apache#6985) HDDS-11260. [hsync] Add Ozone Manager protocol version (apache#7015) ... Conflicts: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/audit/DNAction.java hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java hadoop-hdds/interface-client/src/main/proto/DatanodeClientProtocol.proto hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/TestContainerReportHandler.java
…p-supervisor Merge conflicts are resolved but the change does not yet build. * HDDS-10239-container-reconciliation: (183 commits) HDDS-10376. Add a Datanode API to supply a merkle tree for a given container. (apache#6945) HDDS-11289. Bump docker-maven-plugin to 0.45.0 (apache#7024) HDDS-11287. Code cleanup in XceiverClientSpi (apache#7043) HDDS-11283. Refactor KeyValueStreamDataChannel to avoid spurious IDE build issues (apache#7040) HDDS-11257. Ozone write does not work when http proxy is set for the JVM. (apache#7036) HDDS-11249. Bump ozone-runner to 20240729-jdk17-1 (apache#7003) HDDS-10517. Recon - Add a UI component for showing DN decommissioning detailed status and info (apache#6724) HDDS-10926. Block deletion should update container merkle tree. (apache#6875) HDDS-11270. [hsync] Add DN layout version (HBASE_SUPPORT/version 8) upgrade test. (apache#7021) HDDS-11272. Statistics some node status information (apache#7025) HDDS-11278. Move code out of Hadoop util package (apache#7028) HDDS-11274. (addendum) Replace Hadoop annotations/configs with Ozone-specific ones HDDS-11274. Replace Hadoop annotations/configs with Ozone-specific ones (apache#7026) HDDS-11280. Add Synchronize in AbstractCommitWatcher.addAckDataLength (apache#7032) HDDS-11235. Spare InfoBucket RPC call in FileSystem#mkdir() call. (apache#6990) HDDS-11273. Bump commons-compress to 1.26.2 (apache#7023) HDDS-11225. Increase ipc.server.read.threadpool.size (apache#7007) HDDS-11224. Increase hdds.datanode.handler.count (apache#7011) HDDS-11259. [hsync] DataNode should verify HBASE_SUPPORT layout version for every PutBlock. (apache#7012) HDDS-11214. Added config to set rocksDB's max log file size and num of log files (apache#7014) ... Conflicts: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/OzoneContainer.java
…rrupt-files * HDDS-10239-container-reconciliation: (183 commits) HDDS-10376. Add a Datanode API to supply a merkle tree for a given container. (apache#6945) HDDS-11289. Bump docker-maven-plugin to 0.45.0 (apache#7024) HDDS-11287. Code cleanup in XceiverClientSpi (apache#7043) HDDS-11283. Refactor KeyValueStreamDataChannel to avoid spurious IDE build issues (apache#7040) HDDS-11257. Ozone write does not work when http proxy is set for the JVM. (apache#7036) HDDS-11249. Bump ozone-runner to 20240729-jdk17-1 (apache#7003) HDDS-10517. Recon - Add a UI component for showing DN decommissioning detailed status and info (apache#6724) HDDS-10926. Block deletion should update container merkle tree. (apache#6875) HDDS-11270. [hsync] Add DN layout version (HBASE_SUPPORT/version 8) upgrade test. (apache#7021) HDDS-11272. Statistics some node status information (apache#7025) HDDS-11278. Move code out of Hadoop util package (apache#7028) HDDS-11274. (addendum) Replace Hadoop annotations/configs with Ozone-specific ones HDDS-11274. Replace Hadoop annotations/configs with Ozone-specific ones (apache#7026) HDDS-11280. Add Synchronize in AbstractCommitWatcher.addAckDataLength (apache#7032) HDDS-11235. Spare InfoBucket RPC call in FileSystem#mkdir() call. (apache#6990) HDDS-11273. Bump commons-compress to 1.26.2 (apache#7023) HDDS-11225. Increase ipc.server.read.threadpool.size (apache#7007) HDDS-11224. Increase hdds.datanode.handler.count (apache#7011) HDDS-11259. [hsync] DataNode should verify HBASE_SUPPORT layout version for every PutBlock. (apache#7012) HDDS-11214. Added config to set rocksDB's max log file size and num of log files (apache#7014) ... Conflicts: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/checksum/TestContainerChecksumTreeManager.java
What changes were proposed in this pull request?
HDDS-11259. [hsync] DataNode should verify HBASE_SUPPORT layout version for every PutBlock.
The layout version check for incremental chunk list support is performed at DataNode startup only, which means that DataNode would reject any such clients until the next restart after finalization.
The check should be moved later inside PutBlocks handling.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-11259
How was this patch tested?