-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-11254. Reconcile commands should be handled by datanode ReplicationSupervisor #7076
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
kerneltime
merged 13 commits into
apache:HDDS-10239-container-reconciliation
from
errose28:HDDS-11254-rep-supervisor
Aug 21, 2024
Merged
HDDS-11254. Reconcile commands should be handled by datanode ReplicationSupervisor #7076
kerneltime
merged 13 commits into
apache:HDDS-10239-container-reconciliation
from
errose28:HDDS-11254-rep-supervisor
Aug 21, 2024
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Depends on PRs that are still open
…p-supervisor Merge conflicts are resolved but the change does not yet build. * HDDS-10239-container-reconciliation: (183 commits) HDDS-10376. Add a Datanode API to supply a merkle tree for a given container. (apache#6945) HDDS-11289. Bump docker-maven-plugin to 0.45.0 (apache#7024) HDDS-11287. Code cleanup in XceiverClientSpi (apache#7043) HDDS-11283. Refactor KeyValueStreamDataChannel to avoid spurious IDE build issues (apache#7040) HDDS-11257. Ozone write does not work when http proxy is set for the JVM. (apache#7036) HDDS-11249. Bump ozone-runner to 20240729-jdk17-1 (apache#7003) HDDS-10517. Recon - Add a UI component for showing DN decommissioning detailed status and info (apache#6724) HDDS-10926. Block deletion should update container merkle tree. (apache#6875) HDDS-11270. [hsync] Add DN layout version (HBASE_SUPPORT/version 8) upgrade test. (apache#7021) HDDS-11272. Statistics some node status information (apache#7025) HDDS-11278. Move code out of Hadoop util package (apache#7028) HDDS-11274. (addendum) Replace Hadoop annotations/configs with Ozone-specific ones HDDS-11274. Replace Hadoop annotations/configs with Ozone-specific ones (apache#7026) HDDS-11280. Add Synchronize in AbstractCommitWatcher.addAckDataLength (apache#7032) HDDS-11235. Spare InfoBucket RPC call in FileSystem#mkdir() call. (apache#6990) HDDS-11273. Bump commons-compress to 1.26.2 (apache#7023) HDDS-11225. Increase ipc.server.read.threadpool.size (apache#7007) HDDS-11224. Increase hdds.datanode.handler.count (apache#7011) HDDS-11259. [hsync] DataNode should verify HBASE_SUPPORT layout version for every PutBlock. (apache#7012) HDDS-11214. Added config to set rocksDB's max log file size and num of log files (apache#7014) ... Conflicts: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/OzoneContainer.java
kerneltime
reviewed
Aug 14, 2024
...e/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java
Show resolved
Hide resolved
...service/src/main/java/org/apache/hadoop/ozone/container/checksum/ReconcileContainerTask.java
Show resolved
Hide resolved
...rvice/src/main/java/org/apache/hadoop/ozone/protocol/commands/ReconcileContainerCommand.java
Outdated
Show resolved
Hide resolved
Contributor
|
This looks good, once the CI passes on your fork we can make it ready to merge. |
Contributor
Author
|
Passing on my fork here |
errose28
added a commit
to errose28/ozone
that referenced
this pull request
Aug 26, 2024
…an-on-error * HDDS-10239-container-reconciliation: (428 commits) HDDS-11081. Use thread-local instance of FileSystem in Freon tests (apache#7091) HDDS-11333. Avoid hard-coded current version in upgrade/xcompat tests (apache#7089) Mark TestPipelineManagerMXBean#testPipelineInfo as flaky Mark TestAddRemoveOzoneManager#testForceBootstrap as flaky HDDS-11352. HDDS-11353. Mark TestOzoneManagerHAWithStoppedNodes as flaky HDDS-11354. Mark TestOzoneManagerSnapshotAcl#testLookupKeyWithNotAllowedUserForPrefixAcl as flaky HDDS-11355. Mark TestMultiBlockWritesWithDnFailures#testMultiBlockWritesWithIntermittentDnFailures as flaky HDDS-11227. Use server default key provider to encrypt/decrypt keys from multiple OMs. (apache#7081) HDDS-11316. Improve Create Key and Chunk IO Dashboards (apache#7075) HDDS-11239. Fix KeyOutputStream's exception handling when calling hsync concurrently (apache#7047) HDDS-11254. Reconcile commands should be handled by datanode ReplicationSupervisor (apache#7076) HDDS-11331. Fix Datanode unable to report for a long time (apache#7090) HDDS-11346. FS CLI gives incorrect recursive volume deletion prompt (apache#7102) HDDS-11349. Add NullPointer handling when volume/bucket tables are not initialized (apache#7103) HDDS-11209. Avoid insufficient EC pipelines in the container pipeline cache (apache#6974) HDDS-11284. refactor quota repair non-blocking while upgrade (apache#7035) HDDS-9790. Add tests for Overview page (apache#6983) HDDS-10904. [hsync] Enable PutBlock piggybacking and incremental chunk list by default (apache#7074) HDDS-11322. [hsync] Block ECKeyOutputStream from calling hsync and hflush (apache#7098) HDDS-11325. Intermittent failure in TestBlockOutputStreamWithFailures#testContainerClose (apache#7099) ... Conflicts: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/OzoneContainer.java
errose28
added a commit
to errose28/ozone
that referenced
this pull request
Aug 28, 2024
…rrupt-files * HDDS-10239-container-reconciliation: (61 commits) HDDS-11081. Use thread-local instance of FileSystem in Freon tests (apache#7091) HDDS-11333. Avoid hard-coded current version in upgrade/xcompat tests (apache#7089) Mark TestPipelineManagerMXBean#testPipelineInfo as flaky Mark TestAddRemoveOzoneManager#testForceBootstrap as flaky HDDS-11352. HDDS-11353. Mark TestOzoneManagerHAWithStoppedNodes as flaky HDDS-11354. Mark TestOzoneManagerSnapshotAcl#testLookupKeyWithNotAllowedUserForPrefixAcl as flaky HDDS-11355. Mark TestMultiBlockWritesWithDnFailures#testMultiBlockWritesWithIntermittentDnFailures as flaky HDDS-11227. Use server default key provider to encrypt/decrypt keys from multiple OMs. (apache#7081) HDDS-11316. Improve Create Key and Chunk IO Dashboards (apache#7075) HDDS-11239. Fix KeyOutputStream's exception handling when calling hsync concurrently (apache#7047) HDDS-11254. Reconcile commands should be handled by datanode ReplicationSupervisor (apache#7076) HDDS-11331. Fix Datanode unable to report for a long time (apache#7090) HDDS-11346. FS CLI gives incorrect recursive volume deletion prompt (apache#7102) HDDS-11349. Add NullPointer handling when volume/bucket tables are not initialized (apache#7103) HDDS-11209. Avoid insufficient EC pipelines in the container pipeline cache (apache#6974) HDDS-11284. refactor quota repair non-blocking while upgrade (apache#7035) HDDS-9790. Add tests for Overview page (apache#6983) HDDS-10904. [hsync] Enable PutBlock piggybacking and incremental chunk list by default (apache#7074) HDDS-11322. [hsync] Block ECKeyOutputStream from calling hsync and hflush (apache#7098) HDDS-11325. Intermittent failure in TestBlockOutputStreamWithFailures#testContainerClose (apache#7099) ... Conflicts: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Summary
In the placeholder reconciliation implementation, handling of reconcile container commands from SCM to datanodes are done in the
KeyValueHandler. In the final implementation they should go through theReplicationSupervisorlike replication and reconstruction commands for proper scheduling.This change also makes sure the same
ContainerChecksumTreeManagerinstance is available in all the places it is needed within the datanode, and closes its metrics properly instead of the workaround added in HDDS-10373.Specific Changes
ContainerChecksumTreeManageris unified to one instance for the whole datanode.ReplicationSupervisorshould drop duplicate commands.ReconcileContainerCommandHandlerusesReplicationSupervisorfor queueing. It does not implement it's own thread pool and queue.ReplicationSupervisorfor metrics.ReconstructECContainersCommandHandlerandReplicateContainerCommandHandlerare implemented (total time and invocation count are unused in that class)ReplicationSupervisorMetrics, but that would be a broader change outside the scope of this PR.KeyValueHandlerandHddsDispatcherinstances when only a susbet of the functionality is required were added and existing tests were updated.KeyValueHandlerconstructor allowing us to make changes here to get theContainerChecksumTreeManagerwhere we need it.What is the link to the Apache JIRA
HDDS-11254
How was this patch tested?
TestReconcileCommandHandlertests for metrics were removed since those are now covered byTestReplicationSupervisor.TestReconcileContainerTaskthat it properly detects task failure and equality.