-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-9807. Consider volume committed space when checking if datanode can host new container #5721
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-9807. Consider volume committed space when checking if datanode can host new container #5721
Conversation
|
@xichen01 please take a look |
|
|
||
| long getScmUsed(); | ||
|
|
||
| long getRemaining(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vtutrinov @adoroszlai
What is the definition of remaining, I noticed that remaining is actually Volume available. available=capacity - reserved - used.
- So does
committedhave to be reported to the SCM separately. - Or is it added to
remaining, which is defined as the amount of space on the Volume that is guaranteed to be available for data storage?available=capacity - reserved - used - commited.
We useavailablein a lot of places, and it seems like we should takecommitedintoavailable.
Line 473 in 9b3d045
| remaining = volumeInfo.get().getAvailable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems it makes sense, I will try to check the approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xichen01
I've pushed a new commit with the proposed approach and the build is green, as such as the manual test from the jira ticket. But I'm concerned about the datanode storage metrics (output of the ozone admin datanode usageinfo command) and recon's reports - from the user's perspective it will seen contr ordinal:
- I've written a 200KiB key to the EC-bucket with rs-6-3-1024k replication (the cluster has 10 datanodes with 2Gb storages mounted to /data)
- datanodes report before the key was written:
Usage Information (1 Datanodes)
UUID : 8ac06e73-a434-4fd3-9be5-bc271b8f6bf7
IP Address : 172.23.0.15
Hostname : ozone-datanode10-1.ozone_default
Capacity : 2147483648 B (2 GB)
Total Used : 31571968 B (30.11 MB)
Total Used % : 1.47%
Ozone Used : 0 B (0 B)
Ozone Used % : 0.00%
Remaining : 2115911680 B (1.97 GB)
Remaining % : 98.53%
Container(s) : 0
Usage Information (1 Datanodes)
UUID : 9e3bf76a-8c98-4fa3-b034-c68973d7560b
IP Address : 172.23.0.10
Hostname : ozone-datanode3-1.ozone_default
Capacity : 2147483648 B (2 GB)
Total Used : 31571968 B (30.11 MB)
Total Used % : 1.47%
Ozone Used : 0 B (0 B)
Ozone Used % : 0.00%
Remaining : 2115911680 B (1.97 GB)
Remaining % : 98.53%
Container(s) : 0
Usage Information (1 Datanodes)
UUID : a5df4b7d-92b3-407a-b56d-aaeb924f0cd6
IP Address : 172.23.0.7
Hostname : ozone-datanode2-1.ozone_default
Capacity : 2147483648 B (2 GB)
Total Used : 27365376 B (26.10 MB)
Total Used % : 1.27%
Ozone Used : 0 B (0 B)
Ozone Used % : 0.00%
Remaining : 2120118272 B (1.97 GB)
Remaining % : 98.73%
Container(s) : 0
Usage Information (1 Datanodes)
UUID : e5a630fc-ce77-4211-89dc-15ab0ca67fde
IP Address : 172.23.0.6
Hostname : ozone-datanode8-1.ozone_default
Capacity : 2147483648 B (2 GB)
Total Used : 31571968 B (30.11 MB)
Total Used % : 1.47%
Ozone Used : 0 B (0 B)
Ozone Used % : 0.00%
Remaining : 2115911680 B (1.97 GB)
Remaining % : 98.53%
Container(s) : 0
Usage Information (1 Datanodes)
UUID : 2d00f163-7f22-4395-b82b-e8353c74698b
IP Address : 172.23.0.5
Hostname : ozone-datanode7-1.ozone_default
Capacity : 2147483648 B (2 GB)
Total Used : 31571968 B (30.11 MB)
Total Used % : 1.47%
Ozone Used : 0 B (0 B)
Ozone Used % : 0.00%
Remaining : 2115911680 B (1.97 GB)
Remaining % : 98.53%
Container(s) : 0
Usage Information (1 Datanodes)
UUID : 3a22dc09-acc5-4270-8007-135056031ed9
IP Address : 172.23.0.12
Hostname : ozone-datanode9-1.ozone_default
Capacity : 2147483648 B (2 GB)
Total Used : 31571968 B (30.11 MB)
Total Used % : 1.47%
Ozone Used : 0 B (0 B)
Ozone Used % : 0.00%
Remaining : 2115911680 B (1.97 GB)
Remaining % : 98.53%
Container(s) : 0
Usage Information (1 Datanodes)
UUID : 4478a423-0416-429e-b41c-5e58b8ec7ed5
IP Address : 172.23.0.16
Hostname : ozone-datanode5-1.ozone_default
Capacity : 2147483648 B (2 GB)
Total Used : 31571968 B (30.11 MB)
Total Used % : 1.47%
Ozone Used : 0 B (0 B)
Ozone Used % : 0.00%
Remaining : 2115911680 B (1.97 GB)
Remaining % : 98.53%
Container(s) : 0
Usage Information (1 Datanodes)
UUID : 5ffda4af-e5dd-4b11-a945-7fef0b53356b
IP Address : 172.23.0.17
Hostname : ozone-datanode1-1.ozone_default
Capacity : 2147483648 B (2 GB)
Total Used : 31571968 B (30.11 MB)
Total Used % : 1.47%
Ozone Used : 0 B (0 B)
Ozone Used % : 0.00%
Remaining : 2115911680 B (1.97 GB)
Remaining % : 98.53%
Container(s) : 0
Usage Information (1 Datanodes)
UUID : 6fe5b830-5031-4a4c-a8fa-ae909c55d187
IP Address : 172.23.0.3
Hostname : ozone-datanode6-1.ozone_default
Capacity : 2147483648 B (2 GB)
Total Used : 31571968 B (30.11 MB)
Total Used % : 1.47%
Ozone Used : 0 B (0 B)
Ozone Used % : 0.00%
Remaining : 2115911680 B (1.97 GB)
Remaining % : 98.53%
Container(s) : 0
Usage Information (1 Datanodes)
UUID : 7ce23e7c-41bf-4c5e-aea7-63af06049ad0
IP Address : 172.23.0.19
Hostname : ozone-datanode4-1.ozone_default
Capacity : 2147483648 B (2 GB)
Total Used : 31571968 B (30.11 MB)
Total Used % : 1.47%
Ozone Used : 0 B (0 B)
Ozone Used % : 0.00%
Remaining : 2115911680 B (1.97 GB)
Remaining % : 98.53%
Container(s) : 0
- datanode report after the key was written:
Usage Information (1 Datanodes)
UUID : 8ac06e73-a434-4fd3-9be5-bc271b8f6bf7
IP Address : 172.23.0.15
Hostname : ozone-datanode10-1.ozone_default
Capacity : 2147483648 B (2 GB)
Total Used : 1252982784 B (1.17 GB)
Total Used % : 58.35%
Ozone Used : 0 B (0 B)
Ozone Used % : 0.00%
Remaining : 894500864 B (853.06 MB)
Remaining % : 41.65%
Container(s) : 1
Usage Information (1 Datanodes)
UUID : 9e3bf76a-8c98-4fa3-b034-c68973d7560b
IP Address : 172.23.0.10
Hostname : ozone-datanode3-1.ozone_default
Capacity : 2147483648 B (2 GB)
Total Used : 1252982784 B (1.17 GB)
Total Used % : 58.35%
Ozone Used : 0 B (0 B)
Ozone Used % : 0.00%
Remaining : 894500864 B (853.06 MB)
Remaining % : 41.65%
Container(s) : 1
Usage Information (1 Datanodes)
UUID : a5df4b7d-92b3-407a-b56d-aaeb924f0cd6
IP Address : 172.23.0.7
Hostname : ozone-datanode2-1.ozone_default
Capacity : 2147483648 B (2 GB)
Total Used : 1248776192 B (1.16 GB)
Total Used % : 58.15%
Ozone Used : 204800 B (200 KB)
Ozone Used % : 0.01%
Remaining : 898707456 B (857.07 MB)
Remaining % : 41.85%
Container(s) : 1
Usage Information (1 Datanodes)
UUID : e5a630fc-ce77-4211-89dc-15ab0ca67fde
IP Address : 172.23.0.6
Hostname : ozone-datanode8-1.ozone_default
Capacity : 2147483648 B (2 GB)
Total Used : 31596544 B (30.13 MB)
Total Used % : 1.47%
Ozone Used : 0 B (0 B)
Ozone Used % : 0.00%
Remaining : 2115887104 B (1.97 GB)
Remaining % : 98.53%
Container(s) : 0
Usage Information (1 Datanodes)
UUID : 2d00f163-7f22-4395-b82b-e8353c74698b
IP Address : 172.23.0.5
Hostname : ozone-datanode7-1.ozone_default
Capacity : 2147483648 B (2 GB)
Total Used : 1252982784 B (1.17 GB)
Total Used % : 58.35%
Ozone Used : 204800 B (200 KB)
Ozone Used % : 0.01%
Remaining : 894500864 B (853.06 MB)
Remaining % : 41.65%
Container(s) : 1
Usage Information (1 Datanodes)
UUID : 3a22dc09-acc5-4270-8007-135056031ed9
IP Address : 172.23.0.12
Hostname : ozone-datanode9-1.ozone_default
Capacity : 2147483648 B (2 GB)
Total Used : 1252982784 B (1.17 GB)
Total Used % : 58.35%
Ozone Used : 0 B (0 B)
Ozone Used % : 0.00%
Remaining : 894500864 B (853.06 MB)
Remaining % : 41.65%
Container(s) : 1
Usage Information (1 Datanodes)
UUID : 4478a423-0416-429e-b41c-5e58b8ec7ed5
IP Address : 172.23.0.16
Hostname : ozone-datanode5-1.ozone_default
Capacity : 2147483648 B (2 GB)
Total Used : 1252982784 B (1.17 GB)
Total Used % : 58.35%
Ozone Used : 0 B (0 B)
Ozone Used % : 0.00%
Remaining : 894500864 B (853.06 MB)
Remaining % : 41.65%
Container(s) : 1
Usage Information (1 Datanodes)
UUID : 5ffda4af-e5dd-4b11-a945-7fef0b53356b
IP Address : 172.23.0.17
Hostname : ozone-datanode1-1.ozone_default
Capacity : 2147483648 B (2 GB)
Total Used : 1252982784 B (1.17 GB)
Total Used % : 58.35%
Ozone Used : 204800 B (200 KB)
Ozone Used % : 0.01%
Remaining : 894500864 B (853.06 MB)
Remaining % : 41.65%
Container(s) : 1
Usage Information (1 Datanodes)
UUID : 6fe5b830-5031-4a4c-a8fa-ae909c55d187
IP Address : 172.23.0.3
Hostname : ozone-datanode6-1.ozone_default
Capacity : 2147483648 B (2 GB)
Total Used : 1252982784 B (1.17 GB)
Total Used % : 58.35%
Ozone Used : 204800 B (200 KB)
Ozone Used % : 0.01%
Remaining : 894500864 B (853.06 MB)
Remaining % : 41.65%
Container(s) : 1
Usage Information (1 Datanodes)
UUID : 7ce23e7c-41bf-4c5e-aea7-63af06049ad0
IP Address : 172.23.0.19
Hostname : ozone-datanode4-1.ozone_default
Capacity : 2147483648 B (2 GB)
Total Used : 1252982784 B (1.17 GB)
Total Used % : 58.35%
Ozone Used : 0 B (0 B)
Ozone Used % : 0.00%
Remaining : 894500864 B (853.06 MB)
Remaining % : 41.65%
Container(s) : 1
WHAT ??? 200KiB written key has changed the remaining storage size of 9 datanodes from 1.97GiB to 853MB, looks too strange, doesn't it?
I would report the committed bytes as a separate report data field as the remaining storage size will be more obvious for the user and the cluster admins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is because we include commited in the remaining, so when a Container is created, the space that the Container might use will be deducted from the remaining.
If we don't include commited in the remaining, the problem is that the datanode list command shows that there is enough space remaining to create a Container, but in reality it is not possible to create a new Container, because inside the Datanode, the creation of a Container has to take into account commited.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, and that's why I would provide the committed bytes count in reports (e.g. in datanode heartbeat requests, datanode usageifo or in datanode list cli commands) as a separate field and don't mix it with the remaining bytes count (Devide et impera policy :) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it looks like your initial solution is the more appropriate choice.
…ailability as on datanode to allocate a new container
… providing the committed bytes metric separately
b1e2f73 to
cfcc024
Compare
…stead of providing the committed bytes metric separately" This reverts commit cfcc024.
…port and datanode usageinfo
xichen01
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vtutrinov. The patch looks good, there are only a few Comments you can refer to.
| when(confing.getStorageSize(eq(HDDS_DATANODE_VOLUME_MIN_FREE_SPACE), | ||
| eq(HDDS_DATANODE_VOLUME_MIN_FREE_SPACE_DEFAULT), | ||
| eq(StorageUnit.BYTES))).thenReturn(100000.0); | ||
| UUID datanodeUuid = spy(UUID.randomUUID()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spy Is it necessary for UUID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, it isn't, fixed
| when(datanodeInfo.getNodeStatus()).thenReturn(nodeStatus); | ||
| when(nodeMngr.getNodeByUuid(eq(datanodeUuid))).thenReturn(datanodeInfo); | ||
|
|
||
| StorageContainerDatanodeProtocolProtos.StorageReportProto storageReport1 = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can consider using the org.apache.hadoop.hdds.scm.HddsTestUtils#createStorageReport
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| .thenReturn(Collections.singletonList(metaReport)); | ||
|
|
||
|
|
||
| // 500 committed bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It would be good to comment on the process of calculating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
|
||
|
|
||
| // 500 committed bytes | ||
| assertTrue(placementPolicy.isValidNode(datanodeDetails, 100, 4000)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can directly use hasEnoughSpace to assert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree that it would be correct. We check the placement policy's behavior here and its implementations (e.g. org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackScatter) use isValid method to check the datanode availability
| ", volumes: " + fullVolumes; | ||
| } | ||
|
|
||
| public static boolean hasVolumeEnoughSpace(long volumeCapacity, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can put it into VolumeUsage, just like getMinVolumeFreeSpace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| if (reportProto.getRemaining() > dataSizeRequired) { | ||
| if (hasVolumeEnoughSpace(reportProto.getCapacity(), | ||
| reportProto.getRemaining(), reportProto.getCommitted(), | ||
| dataSizeRequired, conf)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The configuration used in getMinVolumeFreeSpace is the Datanode configuration, what if the configuration is different for different Datanodes or not available in SCM?
Maybe volumeFreeSpaceToSpare should also report
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| + " B", StringUtils.byteDesc(info.getRemaining())); | ||
| System.out.printf("%-13s: %s %n", "Remaining %", | ||
| PERCENT_FORMAT.format(info.getRemainingRatio())); | ||
| System.out.printf("%-13s: %s (%s) %n", "Committed", info.getCommitted() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Committed doesn't seem to be easy to understand outside of the code context, suggesting: Container Preallocated.
And add a Remaining Allocatable (remaining - commited)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
…t, tiny changes in test to check datanode availability depending on committed bytes count
|
Thanks @vtutrinov for updating the patch. There is a test failure in |
…not relevant anymore)
Fixed |
|
@vtutrinov The patch looks good. |
Yep, it makes sense, I will do |
sumitagrawl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vtutrinov Thanks for working over this, overall LGTM
| long committed = vol.getCommittedBytes(); | ||
| long available = free - committed; | ||
| long volumeFreeSpace = | ||
| long volumeFreeSpaceTpSpare = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: spelling incorrect, volumeFreeSpaceToSpare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
…e usageinfo command output
| System.out.printf("%-13s: %s (%s) %n", "Remaining Allocatable", | ||
| (info.getRemaining() - info.getCommitted()) + " B", | ||
| StringUtils.byteDesc((info.getRemaining() - info.getCommitted()))); | ||
| System.out.printf("%-13s: %s (%s) %n", "Free Space To Spare", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make this newly added lines aligned
Such as:
//..
Remaining : 4751560704 B (4.43 GB)
Remaining % : 45.02%
Container Pre-allocated : 0 B (0 B)
Remaining Allocatable : 4751560704 B (4.43 GB)
Free Space To Spare : 0 B (0 B)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
3b13629 to
a6ff69a
Compare
|
@vtutrinov Thank for updating the patch. LGTM + 1 . |
…ct-algo-to-detect-datanode-availability
|
@aswinshakil @sadanand48 please review |
1 similar comment
|
@aswinshakil @sadanand48 please review |
|
Thanks @vtutrinov for the patch, @sumitagrawl, @xichen01 for the review. (Since release branch for 1.4 has been cut, we can fix any post-commit comments later, without holding back the upcoming release.) |
|
@adoroszlai Could you help to confirm if this PR needs to be cherry-picked to ozone-1.4? |
|
@adoroszlai Thank you for the confirmation. |
|
This may cause NPE; see HDDS-10027. |
…can host new container (apache#5721) (cherry picked from commit 0e07225)
…can host new container (apache#5721) (cherry picked from commit 0e07225)
…can host new container (apache#5721) (cherry picked from commit 0e07225)
…can host new container (apache#5721) (cherry picked from commit a74b09e0cdcc7234b883a31e4daea6039788f36d)
…if datanode can host new container (apache#5721) (cherry picked from commit 0e07225) Conflicts: hadoop-hdds/docs/themes/ozonedoc/static/swagger-resources/recon-api.yaml hadoop-ozone/recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestNSSummaryEndpointWithFSO.java hadoop-ozone/recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestNSSummaryEndpointWithLegacy.java
What changes were proposed in this pull request?
The algorithm of detection of the datanode availability for a new container was different between the SCM and datanodes in case of EC buckets. The SCM doesn't take into account the committed bytes count and serves the client with an invalid pipeline and then the datanodes from the provided pipeline will fail to create new containers
The PR proposes a common way of computing the datanode availability for a new container allocation
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-9807
How was this patch tested?