
Conversation

@guihecheng
Contributor

What changes were proposed in this pull request?

Pipeline placement policy should filter out those datanodes that do not even have enough space for a single container, otherwise even if the pipeline creation succeeds, a datanode may quickly report that it fails to create a container and there will be successive failures because we will choose that node again and again.
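For illustration only, a minimal sketch of the kind of check described above; the class and method names are hypothetical, not the actual PipelinePlacementPolicy code:

```java
/**
 * Hypothetical helper illustrating the proposed filter: a datanode is a
 * viable pipeline member only if it can still hold at least one container.
 */
final class ContainerSpaceFilter {
  static boolean hasSpaceForOneContainer(long remainingBytes, long containerSizeBytes) {
    // Without such a check, pipeline creation may succeed but the first
    // container allocation on the node fails, and SCM can keep picking
    // the same full node again and again.
    return remainingBytes >= containerSizeBytes;
  }
}
```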

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-5252

How was this patch tested?

A new unit test.
A manual test.

@guihecheng
Contributor Author

@bshashikant @ChenSammi please help review this one, thanks~

@GlenGeng-awx
Contributor

Thanks @guihecheng for the fix!
I agree with the necessity of the solution. My concern is the sizeRequired, which is 5GB by default; that does not seem quite viable for a production cluster.

How about adjusting it to be based on the usage percentage of the datanode? For example, a datanode with usage greater than 80% is not a good candidate for a new pipeline.

@guihecheng
Contributor Author

@GlenGeng Thanks for your comments. I understand your concern, but I think the case here is a bit different.
A datanode with usage > 80% is indeed not a good candidate and should have a lower priority when nodes are chosen, but it should still have a chance to be chosen. It is possible that all datanodes in the cluster have usage > 80%; if we filter them out completely, we no longer have a writable cluster even though ~20% of the space is actually left.

Here we try to solve the problem that if there is not even enough space for a single container, that datanode should not be chosen at all.
Maybe we could implement a new placement policy for pipelines with a priority defined by the available space of the datanodes, to solve the problem you mentioned? A rough sketch of that idea is below.
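A minimal sketch of the priority idea mentioned above, using hypothetical types rather than the real placement policy API: nodes without room for a single container are filtered out (the hard requirement from this PR), while the remaining nodes are only sorted so that emptier ones are preferred.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class SpaceAwareSelection {

  /** Hypothetical view of a datanode's space; not an Ozone class. */
  static final class NodeSpace {
    final String host;
    final long capacity;
    final long remaining;

    NodeSpace(String host, long capacity, long remaining) {
      this.host = host;
      this.capacity = capacity;
      this.remaining = remaining;
    }

    double usedRatio() {
      return capacity == 0 ? 1.0 : 1.0 - (double) remaining / capacity;
    }
  }

  /**
   * Hard filter: drop nodes that cannot host even one container, then
   * soft-prioritize the rest by usage, so nodes above 80% stay eligible
   * but are chosen last.
   */
  static List<NodeSpace> orderCandidates(List<NodeSpace> nodes, long containerSize) {
    return nodes.stream()
        .filter(n -> n.remaining >= containerSize)                 // hard requirement
        .sorted(Comparator.comparingDouble(NodeSpace::usedRatio))  // soft preference
        .collect(Collectors.toList());
  }
}
```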

@GlenGeng-awx
Contributor

Thanks @guihecheng for the clarification.

One concern about the solution based on sizeRequired: assume there are 3 DN docker containers for a test, each with 1GB of storage space. Will SCM be able to create a pipeline from them?

@guihecheng
Contributor Author

@GlenGeng Oh, I think not. After this change we will have to set a proper container size for test deployments.

Contributor

@bshashikant bshashikant left a comment


I think the solution is reasonable. Any container allocation will fail on a datanode anyway if it does not have space equal to the container size. It is better not to choose such nodes while allocating a new container.
For docker tests, I think the container size is already set to a lower threshold, but we can double-check on that.

@guihecheng
Contributor Author

@bshashikant Ah, just now I had a talk with @GlenGeng, and he suggested that we could do better here.

We could turn the sizeRequired into a new config item such as 'pipeline.reserved.space', maybe defaulting to 0.
With this new item, we could do the following things (a rough sketch of the idea follows after this list):

  1. Use the default; then we keep the same behavior as before, so users/developers who just want to run small tests or try out Ozone will not be annoyed by a warning saying that they have no space for pipeline creation.
  2. Configure it to some value for space reservation (preferably > container size); then we could solve the problem here too.
    Moreover, when this reserved space is larger than the container size, say 20% of the volume capacity, we can filter out the nodes whose volumes all have usage > 80%, which may not be good choices for pipeline allocation in production, as Glen mentioned.
    It seems we could get overall control over the space usage of the cluster with this new config item.
    And this should not conflict with the datanode config 'du.reserved', since that one's use case is multiple consumers sharing a single disk.
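A minimal sketch of the idea, assuming a hypothetical 'pipeline.reserved.space' setting and made-up method names (this is not existing Ozone configuration or code): with the proposed default of 0 every node passes and the current behavior is preserved, while a non-zero value reserves space on candidate datanodes.

```java
/**
 * Hypothetical illustration of the proposed 'pipeline.reserved.space' idea;
 * the key name and method name are invented for this sketch.
 */
final class PipelineSpaceCheck {

  /** With the proposed default of 0, every node passes and behavior is unchanged. */
  static boolean eligibleForNewPipeline(long remainingBytes, long reservedSpaceBytes) {
    return remainingBytes >= reservedSpaceBytes;
  }
}
```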

@bshashikant
Contributor

> @bshashikant Ah, just now I had a talk with @GlenGeng, and he suggested that we could do better here.
>
> We could turn the sizeRequired into a new config item such as 'pipeline.reserved.space', maybe defaulting to 0.
> With this new item, we could do the following things:
>
> 1. Use the default; then we keep the same behavior as before, so users/developers who just want to run small tests or try out Ozone will not be annoyed by a warning saying that they have no space for pipeline creation.
> 2. Configure it to some value for space reservation (preferably > container size); then we could solve the problem here too. Moreover, when this reserved space is larger than the container size, say 20% of the volume capacity, we can filter out the nodes whose volumes all have usage > 80%, which may not be good choices for pipeline allocation in production, as Glen mentioned. It seems we could get overall control over the space usage of the cluster with this new config item. And this should not conflict with the datanode config 'du.reserved', since that one's use case is multiple consumers sharing a single disk.

I guess we need to be more precise here. A container creation won't succeed on a datanode unless it has remaining space >= container size, since there is a hard check there (see RoundRobinVolumeChoosingPolicy#chooseVolume()). It is better not to choose such datanodes up front during pipeline allocation in any case.

The pipeline reserved space notion should also consider the Ratis volume size specifically, so that we do not allocate any new pipelines on nodes that are very low on Ratis log disk space, and so that all constituent datanodes of a pipeline have enough space for the existing pipelines to get closed, since closing a pipeline takes a snapshot and does I/O on the Raft log disk as well.

This patch itself looks good to me and can be committed; the pipeline reserve semantics can then be built on top of it.
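For context, a simplified illustration of the hard check referenced above; the real logic lives in RoundRobinVolumeChoosingPolicy#chooseVolume(), and the types and details below are made up for this sketch:

```java
import java.io.IOException;
import java.util.List;

/** Sketch of a round-robin volume choice with a hard space check (hypothetical types). */
final class VolumeChoiceSketch {

  interface Volume {
    long getAvailable();
  }

  private int nextIndex = 0;

  Volume chooseVolume(List<Volume> volumes, long containerSize) throws IOException {
    // Walk the volumes round-robin; a volume is only acceptable if it can
    // still hold a full container, otherwise container creation would fail.
    for (int i = 0; i < volumes.size(); i++) {
      Volume candidate = volumes.get((nextIndex + i) % volumes.size());
      if (candidate.getAvailable() >= containerSize) {
        nextIndex = (nextIndex + i + 1) % volumes.size();
        return candidate;
      }
    }
    throw new IOException("No volume has enough space for a container of size " + containerSize);
  }
}
```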

@guihecheng
Contributor Author

guihecheng commented May 24, 2021

> I guess we need to be more precise here. A container creation won't succeed on a datanode unless it has remaining space >= container size, since there is a hard check there (see RoundRobinVolumeChoosingPolicy#chooseVolume()). It is better not to choose such datanodes up front during pipeline allocation in any case.

Sure, even if we have the proposed config item, we should ensure that if the configured value is < container size, we drop it and use the container size instead (with a warning log), as sketched at the end of this comment.

> The pipeline reserved space notion should also consider the Ratis volume size specifically, so that we do not allocate any new pipelines on nodes that are very low on Ratis log disk space, and so that all constituent datanodes of a pipeline have enough space for the existing pipelines to get closed, since closing a pipeline takes a snapshot and does I/O on the Raft log disk as well.

Yes, the Ratis log disk should be taken into consideration.

> This patch itself looks good to me and can be committed; the pipeline reserve semantics can then be built on top of it.

OK, then I'm not going to update this PR but will open another PR for the proposed solution.
Thanks very much @bshashikant
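A small sketch of the clamping idea from the comment above, assuming hypothetical names (not existing Ozone code): if the configured reservation is smaller than the container size, fall back to the container size and log a warning.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class ReservedSpaceClamp {
  private static final Logger LOG = LoggerFactory.getLogger(ReservedSpaceClamp.class);

  /** Never reserve less than one container size, otherwise the original fix is lost. */
  static long clamp(long configuredReservedBytes, long containerSizeBytes) {
    if (configuredReservedBytes < containerSizeBytes) {
      LOG.warn("Configured pipeline reserved space {} is smaller than the container size {}; "
          + "using the container size instead.", configuredReservedBytes, containerSizeBytes);
      return containerSizeBytes;
    }
    return configuredReservedBytes;
  }
}
```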

@bshashikant bshashikant merged commit 000e7c5 into apache:master May 26, 2021
@guihecheng guihecheng deleted the HDDS-5252 branch May 26, 2021 06:56
errose28 added a commit to errose28/ozone that referenced this pull request Jun 1, 2021
…ing-upgrade-master-merge

* upstream/master: (76 commits)
  HDDS-5280. Make XceiverClientManager creation when necessary in ContainerOperationClient (apache#2289)
  HDDS-5272. Make ozonefs.robot execution repeatable (apache#2280)
  HDDS-5123. Use the pre-created apache/ozone-testkrb5 image during secure acceptance tests (apache#2165)
  HDDS-4993. Add guardrail for reserved buffer size when DN reads a chunk (apache#2058)
  HDDS-4936. Change ozone groupId from org.apache.hadoop to org.apache.ozone (apache#2018)
  HDDS-4043. allow deletion from Trash directory without -skipTrash option (apache#2110)
  HDDS-4927. Determine over and under utilized datanodes in Container Balancer. (apache#2230)
  HDDS-5273. Handle unsecure cluster convert to secure cluster for SCM. (apache#2281)
  HDDS-5158. Add documentation for SCM HA Security. (apache#2205)
  HDDS-5275. Datanode Report Publisher publishes one extra report after DN shutdown (apache#2283)
  HDDS-5241. SCM UI should have leader/follower and Primordial SCM information (apache#2260)
  HDDS-5219. Limit number of bad volumes by dfs.datanode.failed.volumes.tolerated. (apache#2243)
  HDDS-5252. PipelinePlacementPolicy filter out datanodes with not enough space. (apache#2271)
  HDDS-5191. Increase default pvc storage size (apache#2219)
  HDDS-5073. Use ReplicationConfig on client side  (apache#2136)
  HDDS-5250. Build integration tests with Maven cache (apache#2269)
  HDDS-5236. Require block token for more operations (apache#2254)
  HDDS-5266 Misspelt words in S3MultipartUploadCommitPartRequest.java line 202 (apache#2279)
  HDDS-5249. Race Condition between Full and Incremental Container Reports (apache#2268)
  HDDS-5142. Make generic streaming client/service for container re-replication, data read, scm/om snapshot download (apache#2256)
  ...

Conflicts:
	hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocol.java
	hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java
	hadoop-hdds/interface-admin/src/main/proto/ScmAdminProtocol.proto
	hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
	hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/MockNodeManager.java
	hadoop-ozone/dist/src/main/compose/testlib.sh
	hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestStorageContainerManager.java
	hadoop-ozone/interface-client/src/main/proto/OmClientProtocol.proto
	hadoop-ozone/ozone-manager/pom.xml
	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/utils/OzoneManagerRatisUtils.java
	hadoop-ozone/s3gateway/pom.xml