
Conversation

Contributor

@sadanand48 sadanand48 commented Apr 18, 2023

What changes were proposed in this pull request?

Close containers when a volume reaches its utilisation threshold. If the volume is configured with reserved space, the soft limit is hit when (capacity - reserved) - used <= minFreeSpaceOnVolume.
By default the minimum free space value is 5GB.
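As a rough illustration of that condition (a minimal sketch; capacity, reserved, used and minFreeSpaceOnVolume are stand-ins for the corresponding volume fields, not the exact identifiers in the patch):

```java
// Illustrative sketch of the soft-limit condition only; not the code in this patch.
final class VolumeSoftLimitSketch {

  /** Default minimum free space to keep on a volume (5 GB), as proposed in this PR. */
  static final long DEFAULT_MIN_FREE_SPACE = 5L * 1024 * 1024 * 1024;

  /** True when (capacity - reserved) - used <= minFreeSpaceOnVolume. */
  static boolean softLimitReached(long capacity, long reserved, long used,
      long minFreeSpaceOnVolume) {
    return (capacity - reserved) - used <= minFreeSpaceOnVolume;
  }

  public static void main(String[] args) {
    long gb = 1024L * 1024 * 1024;
    // Example: 100 GB volume, 10 GB reserved, 86 GB used -> only 4 GB usable is left.
    System.out.println(
        softLimitReached(100 * gb, 10 * gb, 86 * gb, DEFAULT_MIN_FREE_SPACE)); // true
  }
}
```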

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8254

How was this patch tested?

Unit tests

  • Send a close-container action when the volume reaches the threshold.
  • Make the Replication Manager aware of this volume utilisation limit. (No change is required here: both PushReplicator and DownloadAndImportReplicator, when replicating a container to a target datanode, respect the VolumeChoosingPolicy, which checks whether the volume has enough space to place the container; if the required space exceeds the available space (which accounts for reserved space), the volume is not chosen. See the sketch below.)
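A simplified sketch of that kind of space check during volume selection (the VolumeLike type and the chooser below are hypothetical stand-ins, not Ozone's actual VolumeChoosingPolicy API):

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified stand-in for a datanode volume; not the real HddsVolume type.
record VolumeLike(String path, long available, long minFreeSpace) {}

final class SimpleVolumeChooser {
  /**
   * Picks the first volume that can still hold a container of the requested size
   * while keeping its minimum free space; nearly full volumes are skipped, which is
   * why replication to such volumes is avoided as well.
   */
  static Optional<VolumeLike> choose(List<VolumeLike> volumes, long containerSize) {
    return volumes.stream()
        .filter(v -> v.available() - containerSize >= v.minFreeSpace())
        .findFirst();
  }
}
```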

@sadanand48 sadanand48 marked this pull request as ready for review April 21, 2023 06:42
Contributor

@sumitagrawl sumitagrawl left a comment


@sadanand48 Thanks for working on this. IMO, the flow in this PR is currently:

  1. If the condition is met, the DN notifies SCM to close the container, and SCM triggers the close after some time.
  2. Writes continue in the meantime; with hdds.datanode.du.reserved alone, new containers are still created and written to.

So this will have an impact:

  • many small containers get created
  • the client may face failures on closed containers

Writes will still happen until hdds.datanode.du.reserved is reached when allocating containers, so this property will not provide much value in that regard.

@sadanand48 sadanand48 marked this pull request as draft April 21, 2023 17:46
@sadanand48 sadanand48 marked this pull request as ready for review April 24, 2023 06:34
@sadanand48
Contributor Author

Thanks @sumitagrawl for the review. I have now made the config take effect during container allocation on a volume as well, so the flow would be:

  1. The volume reaches the threshold usage ratio.
  2. Any further writes to containers on that volume cause the DN to send a container action to SCM (see the sketch at the end of this comment).
  3. SCM receives the action and closes these containers.
  4. New client writes trigger container creation, and the VolumeChoosingPolicy checks this property too, making sure these volumes are not selected. This eliminates the problem of small containers getting created.

the client may face failures on closed containers

In this case, new containers should be allocated when the client retries.
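A rough sketch of step 2 above, i.e. the datanode queueing a close-container action for SCM once the volume crosses the threshold. The types and names below are made up for illustration; the actual patch goes through Ozone's container action / heartbeat machinery:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the datanode's action queue; illustration only.
final class CloseActionQueue {
  private final Set<Long> pendingClose = ConcurrentHashMap.newKeySet();

  /** Queues a close action for the container at most once; SCM picks it up later. */
  void requestClose(long containerId) {
    if (pendingClose.add(containerId)) {
      System.out.println("Queued close-container action for container " + containerId);
    }
  }
}

final class WritePathSketch {
  /** Called on the write path: if the volume is over the soft limit, ask SCM to close. */
  static void onWriteChunk(long containerId, long volumeAvailable,
      long volumeFreeSpaceToSpare, CloseActionQueue queue) {
    if (volumeAvailable <= volumeFreeSpaceToSpare) {
      queue.requestClose(containerId);
    }
    // The write itself still proceeds; SCM closes the container asynchronously.
  }
}
```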

@siddhantsangwan
Contributor

For bigger volumes like 20TB, the default soft limit of 0.9 still leaves 2TB of space that's available for writes. I'm wondering if it's better to define the limit in a different format such as the raw available capacity - something like 5GB?

@ChenSammi
Contributor

Hi @sadanand48, can we reuse the "hdds.datanode.storage.utilization.critical.threshold" property?

@sadanand48
Contributor Author

For bigger volumes like 20TB, the default soft limit of 0.9 still leaves 2TB of space that's available for writes. I'm wondering if it's better to define the limit in a different format such as the raw available capacity - something like 5GB?

Thanks @siddhantsangwan for the comment. This makes sense; I have updated my patch to use a capacity value instead of a percentage. Please take a look.

@sadanand48
Contributor Author

can we reuse the "hdds.datanode.storage.utilization.critical.threshold" property?

Thanks @ChenSammi for the comment. Now that I have changed the patch to use a capacity value, should I still switch to this property, given that hdds.datanode.storage.utilization.critical.threshold takes a float that represents a percentage?
Another thing I noticed is that this property is not used anywhere except to filter output in the SCM JMX. If we don't reuse it here, I feel we can remove it, as it is misleading: configuring it does nothing.

@ChenSammi
Contributor

can we reuse the "hdds.datanode.storage.utilization.critical.threshold" property?

Thanks @ChenSammi for the comment. Now that I have changed the patch to use a capacity value, should I still switch to this property, given that hdds.datanode.storage.utilization.critical.threshold takes a float that represents a percentage? Another thing I noticed is that this property is not used anywhere except to filter output in the SCM JMX. If we don't reuse it here, I feel we can remove it, as it is misleading: configuring it does nothing.

Hi @sadanand48, we already have the following properties in Ozone:

  1. hdds.datanode.dir.du.reserved // storage space
  2. hdds.datanode.dir.du.reserved.percent // storage percentage
  3. hdds.datanode.storage.utilization.warning.threshold // threshold
  4. hdds.datanode.storage.utilization.critical.threshold // threshold

From the user's point of view, "hdds.datanode.volume.min.free.space" looks functionally very similar to "hdds.datanode.dir.du.reserved", like another kind of reserved space. Maybe we can use "hdds.datanode.dir.du.reserved" directly? Currently the default values of "hdds.datanode.dir.du.reserved" and "hdds.datanode.dir.du.reserved.percent" are 0. We could change their defaults (5GB, like in this patch, and 0.95 for the percent property). What do you think?
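For reference, a minimal sketch of how the new property would be set, using the name and 5GB default from this discussion (the percent variant's name below is an assumption, since the thread only says "both options" were added):

```java
import org.apache.hadoop.hdds.conf.OzoneConfiguration;

// Sketch only: shows the property under discussion; not taken from the merged patch.
public final class MinFreeSpaceConfigSketch {
  public static void main(String[] args) {
    OzoneConfiguration conf = new OzoneConfiguration();

    // The soft-limit property proposed in this PR, with its 5GB default.
    conf.set("hdds.datanode.volume.min.free.space", "5GB");
    // Assumed name for the percentage-based variant discussed later in the thread.
    conf.set("hdds.datanode.volume.min.free.space.percent", "0.001");

    System.out.println(conf.get("hdds.datanode.volume.min.free.space"));
  }
}
```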

@sadanand48
Contributor Author

sadanand48 commented Apr 26, 2023

From the user's point of view, "hdds.datanode.volume.min.free.space" looks functionally very similar to "hdds.datanode.dir.du.reserved", like another kind of reserved space.

I gave this some thought and realised that we introduced this property because the flow is as follows:

  1. The datanode runs DU/DF periodically and stores the disk stats in a cache, so at any point in time the stats might be behind the actual usage. (The cache is updated on every write chunk, so block data is accounted for, but RocksDB/Raft log metadata is not.)
  2. If we only configure a default for the reserved space, containers will close when usage crosses the reserved space. There may be a delay before SCM receives the close-container action and asks the DNs in the pipeline to close the container; during this delay the client may still write to that container and violate the reserved space.

If we are okay with crossing the reserved space by a little, we can set a default for the reserved space; otherwise we need a small buffer, like the one defined in this PR, that takes effect before the reserved space is reached.

@ChenSammi
Contributor

From the user's point of view, "hdds.datanode.volume.min.free.space" looks functionally very similar to "hdds.datanode.dir.du.reserved", like another kind of reserved space.

I gave this some thought and realised that we introduced this property because the flow is as follows:

1. The datanode runs DU/DF periodically and stores the disk stats in a cache, so at any point in time the stats might be behind the actual usage. (The cache is updated on every write chunk, so block data is accounted for, but RocksDB/Raft log metadata is not.)

2. If we only configure a default for the reserved space, containers will close when usage crosses the reserved space. There may be a delay before SCM receives the close-container action and asks the DNs in the pipeline to close the container; during this delay the client may still write to that container and violate the reserved space.

If we are okay with crossing the reserved space by a little, we can set a default for the reserved space; otherwise we need a small buffer, like the one defined in this PR, that takes effect before the reserved space is reached.

Had an offline discussion with @sadanand48; here are the agreed points:

  1. This patch helps reduce the possibility of a DN running out of disk space.
  2. Container quota management (lease) will help prevent DNs from running out of disk space from the SCM side. @sumitagrawl.
  3. Besides the container block data, there are the RocksDB directory and the Ratis directory, whose storage usage is not tracked.
  4. Metrics for RocksDB directory usage and Ratis directory usage will help estimate how much space should be reserved for them so that the DN does not run out of space.

@siddhantsangwan
Contributor

@sadanand48 is this PR ready for review now or are you planning to push more commits?

@sadanand48
Contributor Author

@siddhantsangwan, this is ready for review now. I have added both options, a percentage and an absolute value, and the user can choose either.
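One plausible way a helper could resolve the two options into a single minimum-free-space value, sketched under the assumption that an explicitly set absolute value takes precedence (the names and the precedence rule are assumptions, not necessarily what the merged patch does):

```java
// Sketch only: combining an absolute min-free-space setting with a percentage one.
final class MinFreeSpaceResolverSketch {

  static long resolve(long configuredMinFreeSpace,  // e.g. 5 GB, or 0 if unset
                      double configuredPercent,     // e.g. 0.001, or 0 if unset
                      long volumeCapacity) {
    if (configuredMinFreeSpace > 0) {
      return configuredMinFreeSpace;                // absolute value wins (assumption)
    }
    return (long) (volumeCapacity * configuredPercent);
  }

  public static void main(String[] args) {
    long gb = 1024L * 1024 * 1024;
    // 20 TB volume with only the percentage configured -> roughly 20 GB kept free.
    System.out.println(resolve(0, 0.001, 20L * 1024 * gb) / gb + " GB");
  }
}
```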

@siddhantsangwan siddhantsangwan self-requested a review May 3, 2023 11:18
Contributor

@siddhantsangwan siddhantsangwan left a comment


@sadanand48 Thanks for working on this. In general, I think we should use the available space instead of capacity - used. When reserved space hasn't been configured, capacity - used just subtracts the space used by Ozone from the total capacity; it doesn't take into account space used by other applications besides Ozone. VolumeInfo#getAvailable does take this into account, so it is likely to be more accurate. What do you think?
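A small numeric illustration of the difference (made-up numbers; the exact formula behind VolumeInfo#getAvailable is simplified here):

```java
// Illustrates why "capacity - used" can overstate the space that is really free.
public final class AvailableVsCapacityMinusUsed {
  public static void main(String[] args) {
    long gb = 1024L * 1024 * 1024;
    long capacity  = 100 * gb;  // total volume capacity
    long ozoneUsed = 40 * gb;   // space used by Ozone containers
    long otherUsed = 30 * gb;   // space used by other applications on the same disk
    long reserved  = 5 * gb;    // configured reserved space

    // Only subtracts Ozone's own usage: reports 60 GB "free".
    long capacityMinusUsed = capacity - ozoneUsed;
    // Also subtracts other applications' usage and the reserve: 25 GB actually usable.
    long available = capacity - ozoneUsed - otherUsed - reserved;

    System.out.println((capacityMinusUsed / gb) + " GB vs " + (available / gb) + " GB");
  }
}
```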

@sadanand48
Contributor Author

I think we should use available space instead of capacity - used

Thanks, good catch. Updated the patch.

@sadanand48 sadanand48 requested a review from siddhantsangwan May 9, 2023 06:41
Contributor

@sumitagrawl sumitagrawl left a comment


@sadanand48 LGTM +1

Contributor

@siddhantsangwan siddhantsangwan left a comment


LGTM, thanks for the work @sadanand48

@adoroszlai adoroszlai marked this pull request as draft May 15, 2023 13:09
@ChenSammi ChenSammi marked this pull request as ready for review May 16, 2023 06:23
long volumeFreeSpaceToSpare =
    VolumeUsage.getMinVolumeFreeSpace(conf, volumeCapacity);
long volumeAvailable = volume.getAvailable();
return (volumeAvailable <= volumeFreeSpaceToSpare);
Contributor


@sadanand48 , "- vol.getCommittedBytes()" is missing here.

Contributor Author


Done.
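For clarity, a hedged sketch of the check after this fix, i.e. with the committed bytes subtracted as requested above (the volume type here is a minimal stand-in, and this is not a verbatim copy of the merged code):

```java
// Minimal stand-in for the real volume type; illustration only.
interface HddsVolumeLike {
  long getAvailable();
  long getCommittedBytes();
}

final class VolumeFullCheckSketch {
  /** Volume is considered full when available minus committed space drops to the spare limit. */
  static boolean isVolumeFull(HddsVolumeLike volume, long volumeFreeSpaceToSpare) {
    long volumeAvailable = volume.getAvailable() - volume.getCommittedBytes();
    return volumeAvailable <= volumeFreeSpaceToSpare;
  }
}
```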

Contributor


Thanks @sadanand48. The last patch LGTM, +1.

@ChenSammi ChenSammi merged commit 03aec4d into apache:master May 16, 2023
errose28 added a commit to errose28/ozone that referenced this pull request May 17, 2023
k5342 pushed a commit to pfnet/ozone that referenced this pull request Aug 14, 2023
kuenishi pushed a commit to pfnet/ozone that referenced this pull request Aug 14, 2023
