Skip to content

Conversation

@vtutrinov
Copy link
Contributor

What changes were proposed in this pull request?

The algorithm of detection of the datanode availability for a new container was different between the SCM and datanodes in case of EC buckets. The SCM doesn't take into account the committed bytes count and serves the client with an invalid pipeline and then the datanodes from the provided pipeline will fail to create new containers

The PR proposes a common way of computing the datanode availability for a new container allocation

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9807

How was this patch tested?

  • manually in a way described in the jira ticket,
  • new unit test that checks the datanode is not available to allocate a new container in case of increasing committed bytes count

@adoroszlai
Copy link
Contributor

@xichen01 please take a look

@adoroszlai adoroszlai changed the title HDDS-9807. Take into account the hdds volume committed bytes on detecting the datanode availability to allocate a new container on SCM side HDDS-9807. Consider volume committed space when checking if datanode can host new container Dec 4, 2023

long getScmUsed();

long getRemaining();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vtutrinov @adoroszlai
What is the definition of remaining, I noticed that remaining is actually Volume available. available=capacity - reserved - used.

  • So does committed have to be reported to the SCM separately.
  • Or is it added to remaining, which is defined as the amount of space on the Volume that is guaranteed to be available for data storage? available=capacity - reserved - used - commited.
    We use available in a lot of places, and it seems like we should take commited into available.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems it makes sense, I will try to check the approach

Copy link
Contributor Author

@vtutrinov vtutrinov Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xichen01
I've pushed a new commit with the proposed approach and the build is green, as such as the manual test from the jira ticket. But I'm concerned about the datanode storage metrics (output of the ozone admin datanode usageinfo command) and recon's reports - from the user's perspective it will seen contr ordinal:

  • I've written a 200KiB key to the EC-bucket with rs-6-3-1024k replication (the cluster has 10 datanodes with 2Gb storages mounted to /data)
  • datanodes report before the key was written:
Usage Information (1 Datanodes)

UUID         : 8ac06e73-a434-4fd3-9be5-bc271b8f6bf7 
IP Address   : 172.23.0.15 
Hostname     : ozone-datanode10-1.ozone_default 
Capacity     : 2147483648 B (2 GB) 
Total Used   : 31571968 B (30.11 MB) 
Total Used % : 1.47% 
Ozone Used   : 0 B (0 B) 
Ozone Used % : 0.00% 
Remaining    : 2115911680 B (1.97 GB) 
Remaining %  : 98.53% 
Container(s) : 0 

Usage Information (1 Datanodes)

UUID         : 9e3bf76a-8c98-4fa3-b034-c68973d7560b 
IP Address   : 172.23.0.10 
Hostname     : ozone-datanode3-1.ozone_default 
Capacity     : 2147483648 B (2 GB) 
Total Used   : 31571968 B (30.11 MB) 
Total Used % : 1.47% 
Ozone Used   : 0 B (0 B) 
Ozone Used % : 0.00% 
Remaining    : 2115911680 B (1.97 GB) 
Remaining %  : 98.53% 
Container(s) : 0 

Usage Information (1 Datanodes)

UUID         : a5df4b7d-92b3-407a-b56d-aaeb924f0cd6 
IP Address   : 172.23.0.7 
Hostname     : ozone-datanode2-1.ozone_default 
Capacity     : 2147483648 B (2 GB) 
Total Used   : 27365376 B (26.10 MB) 
Total Used % : 1.27% 
Ozone Used   : 0 B (0 B) 
Ozone Used % : 0.00% 
Remaining    : 2120118272 B (1.97 GB) 
Remaining %  : 98.73% 
Container(s) : 0 

Usage Information (1 Datanodes)

UUID         : e5a630fc-ce77-4211-89dc-15ab0ca67fde 
IP Address   : 172.23.0.6 
Hostname     : ozone-datanode8-1.ozone_default 
Capacity     : 2147483648 B (2 GB) 
Total Used   : 31571968 B (30.11 MB) 
Total Used % : 1.47% 
Ozone Used   : 0 B (0 B) 
Ozone Used % : 0.00% 
Remaining    : 2115911680 B (1.97 GB) 
Remaining %  : 98.53% 
Container(s) : 0 

Usage Information (1 Datanodes)

UUID         : 2d00f163-7f22-4395-b82b-e8353c74698b 
IP Address   : 172.23.0.5 
Hostname     : ozone-datanode7-1.ozone_default 
Capacity     : 2147483648 B (2 GB) 
Total Used   : 31571968 B (30.11 MB) 
Total Used % : 1.47% 
Ozone Used   : 0 B (0 B) 
Ozone Used % : 0.00% 
Remaining    : 2115911680 B (1.97 GB) 
Remaining %  : 98.53% 
Container(s) : 0 

Usage Information (1 Datanodes)

UUID         : 3a22dc09-acc5-4270-8007-135056031ed9 
IP Address   : 172.23.0.12 
Hostname     : ozone-datanode9-1.ozone_default 
Capacity     : 2147483648 B (2 GB) 
Total Used   : 31571968 B (30.11 MB) 
Total Used % : 1.47% 
Ozone Used   : 0 B (0 B) 
Ozone Used % : 0.00% 
Remaining    : 2115911680 B (1.97 GB) 
Remaining %  : 98.53% 
Container(s) : 0 

Usage Information (1 Datanodes)

UUID         : 4478a423-0416-429e-b41c-5e58b8ec7ed5 
IP Address   : 172.23.0.16 
Hostname     : ozone-datanode5-1.ozone_default 
Capacity     : 2147483648 B (2 GB) 
Total Used   : 31571968 B (30.11 MB) 
Total Used % : 1.47% 
Ozone Used   : 0 B (0 B) 
Ozone Used % : 0.00% 
Remaining    : 2115911680 B (1.97 GB) 
Remaining %  : 98.53% 
Container(s) : 0 

Usage Information (1 Datanodes)

UUID         : 5ffda4af-e5dd-4b11-a945-7fef0b53356b 
IP Address   : 172.23.0.17 
Hostname     : ozone-datanode1-1.ozone_default 
Capacity     : 2147483648 B (2 GB) 
Total Used   : 31571968 B (30.11 MB) 
Total Used % : 1.47% 
Ozone Used   : 0 B (0 B) 
Ozone Used % : 0.00% 
Remaining    : 2115911680 B (1.97 GB) 
Remaining %  : 98.53% 
Container(s) : 0 

Usage Information (1 Datanodes)

UUID         : 6fe5b830-5031-4a4c-a8fa-ae909c55d187 
IP Address   : 172.23.0.3 
Hostname     : ozone-datanode6-1.ozone_default 
Capacity     : 2147483648 B (2 GB) 
Total Used   : 31571968 B (30.11 MB) 
Total Used % : 1.47% 
Ozone Used   : 0 B (0 B) 
Ozone Used % : 0.00% 
Remaining    : 2115911680 B (1.97 GB) 
Remaining %  : 98.53% 
Container(s) : 0 

Usage Information (1 Datanodes)

UUID         : 7ce23e7c-41bf-4c5e-aea7-63af06049ad0 
IP Address   : 172.23.0.19 
Hostname     : ozone-datanode4-1.ozone_default 
Capacity     : 2147483648 B (2 GB) 
Total Used   : 31571968 B (30.11 MB) 
Total Used % : 1.47% 
Ozone Used   : 0 B (0 B) 
Ozone Used % : 0.00% 
Remaining    : 2115911680 B (1.97 GB) 
Remaining %  : 98.53% 
Container(s) : 0 
  • datanode report after the key was written:
Usage Information (1 Datanodes)

UUID         : 8ac06e73-a434-4fd3-9be5-bc271b8f6bf7 
IP Address   : 172.23.0.15 
Hostname     : ozone-datanode10-1.ozone_default 
Capacity     : 2147483648 B (2 GB) 
Total Used   : 1252982784 B (1.17 GB) 
Total Used % : 58.35% 
Ozone Used   : 0 B (0 B) 
Ozone Used % : 0.00% 
Remaining    : 894500864 B (853.06 MB) 
Remaining %  : 41.65% 
Container(s) : 1 

Usage Information (1 Datanodes)

UUID         : 9e3bf76a-8c98-4fa3-b034-c68973d7560b 
IP Address   : 172.23.0.10 
Hostname     : ozone-datanode3-1.ozone_default 
Capacity     : 2147483648 B (2 GB) 
Total Used   : 1252982784 B (1.17 GB) 
Total Used % : 58.35% 
Ozone Used   : 0 B (0 B) 
Ozone Used % : 0.00% 
Remaining    : 894500864 B (853.06 MB) 
Remaining %  : 41.65% 
Container(s) : 1 

Usage Information (1 Datanodes)

UUID         : a5df4b7d-92b3-407a-b56d-aaeb924f0cd6 
IP Address   : 172.23.0.7 
Hostname     : ozone-datanode2-1.ozone_default 
Capacity     : 2147483648 B (2 GB) 
Total Used   : 1248776192 B (1.16 GB) 
Total Used % : 58.15% 
Ozone Used   : 204800 B (200 KB) 
Ozone Used % : 0.01% 
Remaining    : 898707456 B (857.07 MB) 
Remaining %  : 41.85% 
Container(s) : 1 

Usage Information (1 Datanodes)

UUID         : e5a630fc-ce77-4211-89dc-15ab0ca67fde 
IP Address   : 172.23.0.6 
Hostname     : ozone-datanode8-1.ozone_default 
Capacity     : 2147483648 B (2 GB) 
Total Used   : 31596544 B (30.13 MB) 
Total Used % : 1.47% 
Ozone Used   : 0 B (0 B) 
Ozone Used % : 0.00% 
Remaining    : 2115887104 B (1.97 GB) 
Remaining %  : 98.53% 
Container(s) : 0 

Usage Information (1 Datanodes)

UUID         : 2d00f163-7f22-4395-b82b-e8353c74698b 
IP Address   : 172.23.0.5 
Hostname     : ozone-datanode7-1.ozone_default 
Capacity     : 2147483648 B (2 GB) 
Total Used   : 1252982784 B (1.17 GB) 
Total Used % : 58.35% 
Ozone Used   : 204800 B (200 KB) 
Ozone Used % : 0.01% 
Remaining    : 894500864 B (853.06 MB) 
Remaining %  : 41.65% 
Container(s) : 1 

Usage Information (1 Datanodes)

UUID         : 3a22dc09-acc5-4270-8007-135056031ed9 
IP Address   : 172.23.0.12 
Hostname     : ozone-datanode9-1.ozone_default 
Capacity     : 2147483648 B (2 GB) 
Total Used   : 1252982784 B (1.17 GB) 
Total Used % : 58.35% 
Ozone Used   : 0 B (0 B) 
Ozone Used % : 0.00% 
Remaining    : 894500864 B (853.06 MB) 
Remaining %  : 41.65% 
Container(s) : 1 

Usage Information (1 Datanodes)

UUID         : 4478a423-0416-429e-b41c-5e58b8ec7ed5 
IP Address   : 172.23.0.16 
Hostname     : ozone-datanode5-1.ozone_default 
Capacity     : 2147483648 B (2 GB) 
Total Used   : 1252982784 B (1.17 GB) 
Total Used % : 58.35% 
Ozone Used   : 0 B (0 B) 
Ozone Used % : 0.00% 
Remaining    : 894500864 B (853.06 MB) 
Remaining %  : 41.65% 
Container(s) : 1 

Usage Information (1 Datanodes)

UUID         : 5ffda4af-e5dd-4b11-a945-7fef0b53356b 
IP Address   : 172.23.0.17 
Hostname     : ozone-datanode1-1.ozone_default 
Capacity     : 2147483648 B (2 GB) 
Total Used   : 1252982784 B (1.17 GB) 
Total Used % : 58.35% 
Ozone Used   : 204800 B (200 KB) 
Ozone Used % : 0.01% 
Remaining    : 894500864 B (853.06 MB) 
Remaining %  : 41.65% 
Container(s) : 1 

Usage Information (1 Datanodes)

UUID         : 6fe5b830-5031-4a4c-a8fa-ae909c55d187 
IP Address   : 172.23.0.3 
Hostname     : ozone-datanode6-1.ozone_default 
Capacity     : 2147483648 B (2 GB) 
Total Used   : 1252982784 B (1.17 GB) 
Total Used % : 58.35% 
Ozone Used   : 204800 B (200 KB) 
Ozone Used % : 0.01% 
Remaining    : 894500864 B (853.06 MB) 
Remaining %  : 41.65% 
Container(s) : 1 

Usage Information (1 Datanodes)

UUID         : 7ce23e7c-41bf-4c5e-aea7-63af06049ad0 
IP Address   : 172.23.0.19 
Hostname     : ozone-datanode4-1.ozone_default 
Capacity     : 2147483648 B (2 GB) 
Total Used   : 1252982784 B (1.17 GB) 
Total Used % : 58.35% 
Ozone Used   : 0 B (0 B) 
Ozone Used % : 0.00% 
Remaining    : 894500864 B (853.06 MB) 
Remaining %  : 41.65% 
Container(s) : 1 

WHAT ??? 200KiB written key has changed the remaining storage size of 9 datanodes from 1.97GiB to 853MB, looks too strange, doesn't it?
I would report the committed bytes as a separate report data field as the remaining storage size will be more obvious for the user and the cluster admins

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is because we include commited in the remaining, so when a Container is created, the space that the Container might use will be deducted from the remaining.

If we don't include commited in the remaining, the problem is that the datanode list command shows that there is enough space remaining to create a Container, but in reality it is not possible to create a new Container, because inside the Datanode, the creation of a Container has to take into account commited.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, and that's why I would provide the committed bytes count in reports (e.g. in datanode heartbeat requests, datanode usageifo or in datanode list cli commands) as a separate field and don't mix it with the remaining bytes count (Devide et impera policy :) )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it looks like your initial solution is the more appropriate choice.

Slava Tutrinov added 2 commits December 6, 2023 09:39
…ailability as on datanode to allocate a new container
… providing the committed bytes metric separately
@vtutrinov vtutrinov force-pushed the HDDS-9807-scm-correct-algo-to-detect-datanode-availability branch from b1e2f73 to cfcc024 Compare December 6, 2023 08:53
Copy link
Contributor

@xichen01 xichen01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vtutrinov. The patch looks good, there are only a few Comments you can refer to.

when(confing.getStorageSize(eq(HDDS_DATANODE_VOLUME_MIN_FREE_SPACE),
eq(HDDS_DATANODE_VOLUME_MIN_FREE_SPACE_DEFAULT),
eq(StorageUnit.BYTES))).thenReturn(100000.0);
UUID datanodeUuid = spy(UUID.randomUUID());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spy Is it necessary for UUID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it isn't, fixed

when(datanodeInfo.getNodeStatus()).thenReturn(nodeStatus);
when(nodeMngr.getNodeByUuid(eq(datanodeUuid))).thenReturn(datanodeInfo);

StorageContainerDatanodeProtocolProtos.StorageReportProto storageReport1 =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can consider using the org.apache.hadoop.hdds.scm.HddsTestUtils#createStorageReport

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

.thenReturn(Collections.singletonList(metaReport));


// 500 committed bytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It would be good to comment on the process of calculating.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done



// 500 committed bytes
assertTrue(placementPolicy.isValidNode(datanodeDetails, 100, 4000));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can directly use hasEnoughSpace to assert

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree that it would be correct. We check the placement policy's behavior here and its implementations (e.g. org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackScatter) use isValid method to check the datanode availability

", volumes: " + fullVolumes;
}

public static boolean hasVolumeEnoughSpace(long volumeCapacity,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can put it into VolumeUsage, just like getMinVolumeFreeSpace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if (reportProto.getRemaining() > dataSizeRequired) {
if (hasVolumeEnoughSpace(reportProto.getCapacity(),
reportProto.getRemaining(), reportProto.getCommitted(),
dataSizeRequired, conf)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The configuration used in getMinVolumeFreeSpace is the Datanode configuration, what if the configuration is different for different Datanodes or not available in SCM?
Maybe volumeFreeSpaceToSpare should also report

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

+ " B", StringUtils.byteDesc(info.getRemaining()));
System.out.printf("%-13s: %s %n", "Remaining %",
PERCENT_FORMAT.format(info.getRemainingRatio()));
System.out.printf("%-13s: %s (%s) %n", "Committed", info.getCommitted()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Committed doesn't seem to be easy to understand outside of the code context, suggesting: Container Preallocated.
And add a Remaining Allocatable (remaining - commited)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

…t, tiny changes in test to check datanode availability depending on committed bytes count
@adoroszlai
Copy link
Contributor

Thanks @vtutrinov for updating the patch. There is a test failure in TestSCMCommonPlacementPolicy.testDatanodeIsInvalidInCaseOfIncreasingCommittedBytes: https://github.com/vtutrinov/ozone/actions/runs/7156214054/job/19485712139#step:5:1750

@vtutrinov
Copy link
Contributor Author

Thanks @vtutrinov for updating the patch. There is a test failure in TestSCMCommonPlacementPolicy.testDatanodeIsInvalidInCaseOfIncreasingCommittedBytes: https://github.com/vtutrinov/ozone/actions/runs/7156214054/job/19485712139#step:5:1750

Fixed

@xichen01
Copy link
Contributor

@vtutrinov The patch looks good.
The only thing we might want to consider at this point is FreeSpaceToSpare. Requires the "remaining allocatable space" greater than FreeSpaceToSpare when create a new Container, so maybe we should add that to the output of the UsageInfo command as well, what do you think?

@vtutrinov
Copy link
Contributor Author

vtutrinov commented Dec 11, 2023

@vtutrinov The patch looks good. The only thing we might want to consider at this point is FreeSpaceToSpare. Requires the "remaining allocatable space" greater than FreeSpaceToSpare when create a new Container, so maybe we should add that to the output of the UsageInfo command as well, what do you think?

Yep, it makes sense, I will do
UPD: done

Copy link
Contributor

@sumitagrawl sumitagrawl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vtutrinov Thanks for working over this, overall LGTM

long committed = vol.getCommittedBytes();
long available = free - committed;
long volumeFreeSpace =
long volumeFreeSpaceTpSpare =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: spelling incorrect, volumeFreeSpaceToSpare

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

System.out.printf("%-13s: %s (%s) %n", "Remaining Allocatable",
(info.getRemaining() - info.getCommitted()) + " B",
StringUtils.byteDesc((info.getRemaining() - info.getCommitted())));
System.out.printf("%-13s: %s (%s) %n", "Free Space To Spare",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make this newly added lines aligned
Such as:

//..
Remaining    : 4751560704 B (4.43 GB)
Remaining %  : 45.02%
Container Pre-allocated : 0 B (0 B)
Remaining Allocatable   : 4751560704 B (4.43 GB)
Free Space To Spare     : 0 B (0 B)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@vtutrinov vtutrinov force-pushed the HDDS-9807-scm-correct-algo-to-detect-datanode-availability branch from 3b13629 to a6ff69a Compare December 11, 2023 13:00
@xichen01
Copy link
Contributor

@vtutrinov Thank for updating the patch. LGTM + 1 .

@adoroszlai
Copy link
Contributor

@aswinshakil @sadanand48 please review

1 similar comment
@myskov
Copy link
Contributor

myskov commented Dec 19, 2023

@aswinshakil @sadanand48 please review

@adoroszlai adoroszlai merged commit 0e07225 into apache:master Dec 20, 2023
@adoroszlai
Copy link
Contributor

Thanks @vtutrinov for the patch, @sumitagrawl, @xichen01 for the review.

(Since release branch for 1.4 has been cut, we can fix any post-commit comments later, without holding back the upcoming release.)

@symious
Copy link
Contributor

symious commented Dec 21, 2023

@adoroszlai Could you help to confirm if this PR needs to be cherry-picked to ozone-1.4?

@adoroszlai
Copy link
Contributor

adoroszlai commented Dec 21, 2023

@symious @myskov I suggest not including it in 1.4.0, as this had little time to be tested. We can work on a patch release 1.4.1 soon afterwards.

@symious
Copy link
Contributor

symious commented Dec 21, 2023

@adoroszlai Thank you for the confirmation.

@szetszwo
Copy link
Contributor

This may cause NPE; see HDDS-10027.

xichen01 pushed a commit to xichen01/ozone that referenced this pull request Jul 17, 2024
xichen01 pushed a commit to xichen01/ozone that referenced this pull request Jul 17, 2024
xichen01 pushed a commit to xichen01/ozone that referenced this pull request Jul 17, 2024
vtutrinov pushed a commit to vtutrinov/ozone that referenced this pull request Jul 15, 2025
…can host new container (apache#5721)

(cherry picked from commit a74b09e0cdcc7234b883a31e4daea6039788f36d)
swamirishi pushed a commit to swamirishi/ozone that referenced this pull request Dec 3, 2025
…if datanode can host new container (apache#5721)

(cherry picked from commit 0e07225)

 Conflicts:
	hadoop-hdds/docs/themes/ozonedoc/static/swagger-resources/recon-api.yaml
	hadoop-ozone/recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestNSSummaryEndpointWithFSO.java
	hadoop-ozone/recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestNSSummaryEndpointWithLegacy.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants