HDDS-12810. Check and reserve space atomically in VolumeChoosingPolicy #8360
…umeChoosingPolicy#chooseVolume
...src/main/java/org/apache/hadoop/ozone/container/replication/SendContainerRequestHandler.java
...src/main/java/org/apache/hadoop/ozone/container/replication/DownloadAndImportReplicator.java
...r-service/src/main/java/org/apache/hadoop/ozone/container/replication/ContainerImporter.java
...ainer-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/ContainerReader.java
I noticed that the push/pull replicator reserves twice the container size retrieved from the local configuration ( @ChenSammi, what are your thoughts on this approach? Does it seem reasonable to you?
...iner-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java
Good point about using the actual container size instead, but we don't know the actual size (when committing space) before downloading the container, right?
...ainer-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/ContainerReader.java
@peterxcli Thanks for taking this up, it's a good fix. I've left a few review comments.
...src/main/java/org/apache/hadoop/ozone/container/replication/SendContainerRequestHandler.java
...test/java/org/apache/hadoop/ozone/container/replication/TestSendContainerRequestHandler.java
...test/java/org/apache/hadoop/ozone/container/replication/TestSendContainerRequestHandler.java
Right, so I think that information should be carried in the proto message.
@siddhantsangwan @ChenSammi I have addressed all review comments, please take another look. Thanks!
Agree. We can do this in a new JIRA. Generally, the tar/zip file of a container is smaller than the container's real size, so reserving 2 * max container size is conservative enough in most cases. But there are over-allocated containers, where the container size can be double or triple the max container size, as @siddhantsangwan saw in a user's environment. Having an accurate container size would handle this case well.
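The reservation arithmetic discussed above can be sketched as follows. This is only an illustration, not the Ozone implementation: `spaceToReserve`, the 5 GB maximum container size, and the choice to double a size reported by the sender are all assumptions made for the example.

```java
import java.util.OptionalLong;

public class ReservationSizeSketch {

  /**
   * Hypothetical helper: bytes to reserve on the target volume before an import.
   * Falls back to 2 * maxContainerSize when the sender did not report a size.
   */
  static long spaceToReserve(long maxContainerSize, OptionalLong reportedSize) {
    long base = reportedSize.isPresent() ? reportedSize.getAsLong() : maxContainerSize;
    // multiplyExact throws on overflow instead of silently wrapping around.
    return Math.multiplyExact(2L, base);
  }

  public static void main(String[] args) {
    long gb = 1024L * 1024 * 1024;
    long maxContainerSize = 5 * gb; // assumed value, for illustration only

    // No size reported: reserve twice the configured maximum.
    System.out.println(spaceToReserve(maxContainerSize, OptionalLong.empty()) / gb + " GB");

    // Over-allocated container at triple the maximum: a fixed 2x reservation
    // (10 GB) would be smaller than the container itself (15 GB).
    System.out.println(spaceToReserve(maxContainerSize, OptionalLong.of(15 * gb)) / gb + " GB");
  }
}
```

The second call shows why a fixed reservation based on the configured maximum can fall short for over-allocated containers, and why a size carried in the request would help.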
      exceptionThrown = true;
      return ContainerUtils.logAndReturnError(LOG, ex, request);
    } finally {
      if (exceptionThrown) {
1. Please move this newContainerData.releaseCommitSpace() call directly into the exception handling.
2. We might first check whether space is committed before releasing it, because newContainer.create can also throw an exception from volumeChoosingPolicy.chooseVolume.
2. We might first check whether space is committed before releasing it, because newContainer.create can also throw an exception from volumeChoosingPolicy.chooseVolume.
releaseCommitSpace itself already checks that:
Lines 359 to 361 in 57f254d
    if (unused > 0 && committedSpace) {
      getVolume().incCommittedBytes(0 - unused);
    }
Okay.
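The exchange above can be illustrated with a small sketch: the release call sits in the exception handler, and it is safe to call even when the failure happened before any space was committed, because the release itself checks the flag. `Volume`, `ContainerData`, and `createContainer` here are simplified, hypothetical stand-ins, not the Ozone classes; only the guard inside `releaseCommitSpace` mirrors the quoted lines.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

public class ReleaseOnFailureSketch {

  /** Hypothetical volume that only tracks committed bytes. */
  static final class Volume {
    private final AtomicLong committedBytes = new AtomicLong();

    void incCommittedBytes(long delta) {
      committedBytes.addAndGet(delta);
    }

    long getCommittedBytes() {
      return committedBytes.get();
    }
  }

  /** Hypothetical container metadata holding the reservation state. */
  static final class ContainerData {
    private final long maxSize;
    private long bytesUsed = 0;
    private Volume volume;
    private boolean committedSpace = false;

    ContainerData(long maxSize) {
      this.maxSize = maxSize;
    }

    void commitSpace(Volume target) {
      target.incCommittedBytes(maxSize);
      volume = target;
      committedSpace = true;
    }

    void releaseCommitSpace() {
      long unused = maxSize - bytesUsed;
      // The flag is tested before the volume is touched, so a container that
      // never got a volume (choosing failed) is handled without a null check.
      if (unused > 0 && committedSpace) {
        volume.incCommittedBytes(0 - unused);
      }
      committedSpace = false;
    }
  }

  /** Returns true on success; releases any reservation in the catch block. */
  static boolean createContainer(Volume volume, ContainerData data,
      boolean failChoosing, boolean failWriting) {
    try {
      if (failChoosing) {
        throw new IOException("no volume with enough space");
      }
      data.commitSpace(volume);
      if (failWriting) {
        throw new IOException("writing container metadata failed");
      }
      return true;
    } catch (IOException ex) {
      data.releaseCommitSpace();
      return false;
    }
  }

  public static void main(String[] args) {
    Volume volume = new Volume();
    createContainer(volume, new ContainerData(100), true, false);
    createContainer(volume, new ContainerData(100), false, true);
    System.out.println("committed after two failures: " + volume.getCommittedBytes());
    createContainer(volume, new ContainerData(100), false, false);
    System.out.println("committed after one success: " + volume.getCommittedBytes());
  }
}
```

Placing the release in the catch block removes the need for a separate `exceptionThrown` flag and a `finally` branch.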
...ainer-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/ContainerReader.java
...src/main/java/org/apache/hadoop/ozone/container/replication/DownloadAndImportReplicator.java
…ityVolumeChoosingPolicy is thread-safe now
Created one: https://issues.apache.org/jira/browse/HDDS-12998. Also, thank you for spending time running the tests and finding ways to make the benchmark data better!
I'm curious why capacity-based is several times slower than round-robin.
It seems that these two lines are the cause: Lines 58 to 60 in f52069b and Lines 82 to 83 in f52069b.
Round-robin doesn't iterate or use randomness; in an ideal case, each call should just run one
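The referenced lines are not reproduced here, so the following is only a rough illustration of the kind of per-call cost difference being described. It assumes a capacity-based policy that first filters the volume list and then compares two randomly sampled candidates, while round-robin advances a single counter. All names (`RoundRobin`, `chooseByCapacity`) and the array-of-free-bytes model are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

public class ChoosingCostSketch {

  /** Round-robin: in the ideal case, one counter increment and one lookup. */
  static final class RoundRobin {
    private final AtomicInteger next = new AtomicInteger();

    int choose(long[] available, long required) {
      for (int tries = 0; tries < available.length; tries++) {
        int idx = Math.floorMod(next.getAndIncrement(), available.length);
        if (available[idx] >= required) {
          return idx;
        }
      }
      return -1; // no volume has enough space
    }
  }

  /** Capacity-based (assumed shape): full scan, then two random samples. */
  static int chooseByCapacity(long[] available, long required) {
    List<Integer> candidates = new ArrayList<>();
    for (int i = 0; i < available.length; i++) {
      if (available[i] >= required) {
        candidates.add(i); // every call walks the whole volume list
      }
    }
    if (candidates.isEmpty()) {
      return -1;
    }
    ThreadLocalRandom rnd = ThreadLocalRandom.current();
    int a = candidates.get(rnd.nextInt(candidates.size()));
    int b = candidates.get(rnd.nextInt(candidates.size()));
    // Prefer whichever sampled volume has more free space.
    return available[a] >= available[b] ? a : b;
  }

  public static void main(String[] args) {
    long[] available = {10, 50, 5};
    RoundRobin rr = new RoundRobin();
    System.out.println("round-robin picks: " + rr.choose(available, 1)
        + ", " + rr.choose(available, 1) + ", " + rr.choose(available, 1));
    System.out.println("capacity-based pick: " + chooseByCapacity(available, 8));
  }
}
```

Under this assumption the capacity-based path allocates a list, scans every volume, and draws two random numbers per call, which is consistent with it being noticeably slower than a single counter increment.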
Final benchmark result (NUM_VOLUMES=10, NUM_THREADS=100, NUM_ITERATIONS=100000): [table and charts not preserved]
cc @ChenSammi
@peterxcli, the data looks the same as the last set I shared. Are you getting the same data?
@ChenSammi I asked GPT to generate the table and result for me and gave it the table image you shared earlier as a format example, and I didn't notice that it used the data in that image directly 😥. Sorry, I didn't check it. BTW, I have updated the test result, and I have no idea why the synchronized choosing policies perform better than the unsynchronized ones. Maybe f52069b really helped.
sumitagrawl
left a comment
@peterxcli Thanks for working on this. I have a few minor comments.
...iner-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java
...ainer-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/ContainerReader.java
...ainer-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerData.java
526acc0 to c9d7470
Yes, the optimization effect is really good. Here is the data I collected for 100000000 operations.
The last patch LGTM. Thanks @peterxcli. @siddhantsangwan, @sumitagrawl, would you like to take another look?
@ChenSammi One comment is still pending; once it is fixed, this is OK to merge.
…lume-space-check-and-reservation-as-an-atomic-operation-in-volume-choosing-policy
Thanks for catching this! Done.
...iner-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java
sumitagrawl
left a comment
LGTM
Thanks @ChenSammi for reviewing, optimizing and running the benchmark for this patch! And thanks @siddhantsangwan, @sumitagrawl for the review!
…239-container-reconciliation Commits: 113 commits, including 7a5129d HDDS-12810. Check and reserve space atomically in VolumeChoosingPolicy (apache#8360)
…anner-builds-mt * HDDS-10239-container-reconciliation: (433 commits), including HDDS-12810. Check and reserve space atomically in VolumeChoosingPolicy (apache#8360) ...
…hoosingPolicy (apache#8360)
(cherry picked from commit 7a5129d)
Conflicts:
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerData.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/RoundRobinVolumeChoosingPolicy.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/ContainerReader.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/replication/ContainerImporter.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/replication/DownloadAndImportReplicator.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/replication/SendContainerRequestHandler.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/volume/TestRoundRobinVolumeChoosingPolicy.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueContainer.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueHandler.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestContainerReader.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/replication/TestReplicationSupervisor.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/replication/TestSendContainerRequestHandler.java




What changes were proposed in this pull request?
If VolumeChoosingPolicy#chooseVolume is not synchronized and the caller requires space to be reserved before chooseVolume returns, there is no easy way to guarantee that the chosen volume will have enough space. VolumeChoosingPolicy#chooseVolume is used for container creation and container import. Making VolumeChoosingPolicy#chooseVolume synchronized, or part of it synchronized, will add some latency, which is negligible for container import but may be noticeable for container creation. We can compare the container creation latency metrics with and without the synchronization.
The Javadoc of VolumeChoosingPolicy#chooseVolume says "The implementations of this interface must be thread-safe." The space-full check and the space reservation can therefore be done as one atomic operation inside chooseVolume, so that there is no over-allocation of space due to concurrent container creation and container import.
We also need to add tests that check committedBytes in the container creation, container import, and container replication unit tests. In addition, unit tests need to verify that if container creation, container import, or container replication fails, the reserved committedBytes are released.
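The idea of doing the space check and the reservation as one atomic step can be sketched as below. This is a minimal illustration under stated assumptions, not the patch itself: `Volume`, `RoundRobinChooser`, and `reserveConcurrently` are hypothetical names, and `IllegalStateException` stands in for whatever out-of-space exception the real policy throws.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class AtomicVolumeChoiceSketch {

  /** Hypothetical stand-in for a datanode volume; not the Ozone volume API. */
  static final class Volume {
    private final long capacity;
    private final AtomicLong committedBytes = new AtomicLong();

    Volume(long capacity) {
      this.capacity = capacity;
    }

    long available() {
      return capacity - committedBytes.get();
    }

    void incCommittedBytes(long delta) {
      committedBytes.addAndGet(delta);
    }

    long getCommittedBytes() {
      return committedBytes.get();
    }
  }

  /** Round-robin chooser that checks and reserves space under a single lock. */
  static final class RoundRobinChooser {
    private int next = 0;

    synchronized Volume chooseVolume(List<Volume> volumes, long required) {
      for (int i = 0; i < volumes.size(); i++) {
        int idx = (next + i) % volumes.size();
        Volume candidate = volumes.get(idx);
        if (candidate.available() >= required) {
          // Reserve before releasing the lock, so no other caller can pass
          // the same check against space that is already spoken for.
          candidate.incCommittedBytes(required);
          next = (idx + 1) % volumes.size();
          return candidate;
        }
      }
      throw new IllegalStateException("No volume has " + required + " bytes available");
    }
  }

  /** Runs many reservation attempts in parallel and counts the successes. */
  static int reserveConcurrently(RoundRobinChooser chooser, List<Volume> volumes,
      int attempts, long required) {
    ExecutorService pool = Executors.newFixedThreadPool(8);
    AtomicInteger successes = new AtomicInteger();
    for (int i = 0; i < attempts; i++) {
      pool.execute(() -> {
        try {
          chooser.chooseVolume(volumes, required);
          successes.incrementAndGet();
        } catch (IllegalStateException e) {
          // All volumes are full; this attempt reserved nothing.
        }
      });
    }
    pool.shutdown();
    try {
      pool.awaitTermination(30, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return successes.get();
  }

  public static void main(String[] args) {
    List<Volume> volumes = List.of(new Volume(100), new Volume(100));
    int ok = reserveConcurrently(new RoundRobinChooser(), volumes, 50, 10);
    System.out.println("successful reservations: " + ok);
    for (Volume v : volumes) {
      System.out.println("committed bytes: " + v.getCommittedBytes());
    }
  }
}
```

With two 100-byte volumes and 10-byte requests, the total committed space can never exceed 200 bytes, however many threads race. If the check and the reservation were separate steps, two callers could both pass the check on the last free 10 bytes and over-commit the volume.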
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-12810
How was this patch tested?
Added coverage verifying that committed space is reserved and released correctly when creating or importing containers, in TestKeyValueContainer, TestDownloadAndImportReplicator and TestSendContainerRequestHandler. Some existing tests also cover this change, such as TestContainerPersistence and TestOzoneContainer.
CI:
https://github.com/peterxcli/ozone/actions/runs/14911933939
https://github.com/peterxcli/ozone/actions/runs/15001277972