@whbing
Contributor

@whbing whbing commented Sep 23, 2023

What changes were proposed in this pull request?

Add a new pipeline choose policy implementation, CapacityPipelineChoosePolicy.

Consider the following scenario:
Our cluster is often scaled out with new nodes while the old nodes are already nearly full. The balancer is usually slow, so new writes should go to nodes with lower usage whenever possible.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9345

How was this patch tested?

  1. Unit tests pass.
  2. Works well in a production cluster environment; details follow.

Current datanode storage info:

Usage Information (100 Datanodes)

UUID         : e844c80f-f86a-458e-9d5b-998ccab26f6b
Capacity     : 191985138794496 B (174.61 TB)
Total Used   : 191880094604207 B (174.51 TB)
Total Used % : 99.95%
Ozone Used % : 99.88%
Remaining    : 105044190289 B (97.83 GB)
Remaining %  : 0.05%

UUID         : a80fad73-d92a-4ae3-a38a-627183550213
Capacity     : 95993097879552 B (87.31 TB)
Total Used   : 95881322125358 B (87.20 TB)
Total Used % : 99.88%
Ozone Used   : 95707847367083 B (87.05 TB)
Ozone Used % : 99.70%
Remaining    : 111775754194 B (104.10 GB)
Remaining %  : 0.12%

UUID         : 8a4b3279-c447-4bd1-815b-53f2b9cbbfcd
Capacity     : 191985138794496 B (174.61 TB)
Total Used   : 191672656451807 B (174.33 TB)
Total Used % : 99.84%
Ozone Used   : 191540205531706 B (174.20 TB)
Ozone Used % : 99.77%
Remaining    : 312482342689 B (291.02 GB)
Remaining %  : 0.16%

UUID         : 27604988-5cb8-4aca-99eb-909af2b5e65b
Capacity     : 95993097879552 B (87.31 TB)
Total Used   : 95823788713950 B (87.15 TB)
Total Used % : 99.82%
Ozone Used   : 95749906064930 B (87.08 TB)
Ozone Used % : 99.75%
Remaining    : 169309165602 B (157.68 GB)
Remaining %  : 0.18%
...
(some datanodes omitted)
...
UUID         : bd536fd8-979f-4103-afbc-48927f3d1c7c
Capacity     : 159987615662080 B (145.51 TB)
Total Used   : 25659199434752 B (23.34 TB)
Total Used % : 16.04%
Ozone Used   : 24361891610799 B (22.16 TB)
Ozone Used % : 15.23%
Remaining    : 134328416227328 B (122.17 TB)
Remaining %  : 83.96%

UUID         : 0a07b46c-b4a9-4608-96c5-7312dc80be61
Capacity     : 159987615662080 B (145.51 TB)
Total Used   : 25511656828928 B (23.20 TB)
Total Used % : 15.95%
Ozone Used   : 24156249565246 B (21.97 TB)
Ozone Used % : 15.10%
Remaining    : 134475958833152 B (122.31 TB)
Remaining %  : 84.05%

UUID         : 55c4eb7a-67d7-44b9-9226-16b34cfc9875
Capacity     : 159987615662080 B (145.51 TB)
Total Used   : 25475661058048 B (23.17 TB)
Total Used % : 15.92%
Ozone Used   : 24307327014485 B (22.11 TB)
Ozone Used % : 15.19%
Remaining    : 134511954604032 B (122.34 TB)
Remaining %  : 84.08%

UUID         : 82b96d23-05a0-492f-aae4-c749c5d2e92e
Capacity     : 159987615662080 B (145.51 TB)
Total Used   : 25472433516544 B (23.17 TB)
Total Used % : 15.92%
Ozone Used   : 24204556646585 B (22.01 TB)
Ozone Used % : 15.13%
Remaining    : 134515182145536 B (122.34 TB)
Remaining %  : 84.08%

   <property>
      <name>hdds.scm.pipeline.choose.policy.impl</name>
      <value>org.apache.hadoop.hdds.scm.pipeline.choose.algorithms.CapacityPipelineChoosePolicy</value>
   </property>

With this PR, the debug log shows that the pipeline whose nodes have lower storage usage is selected:

2023-09-25 19:40:12,194 [IPC Server handler 94 on 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Compare the max datanode storage in the two pipelines, 
first : SCMNodeStat{capacity=95993097879552, scmUsed=93070479413248, remaining=2904901464064}, 
second : SCMNodeStat{capacity=159987615662080, scmUsed=51905289928704, remaining=106962467737600}, 
and chosen the second pipeline = Pipeline[ Id: ae66a1e3-5ddf-493c-9751-3998b72182c7, Nodes: b6485c4a-c079-4c04-906d-9087fd785e2f{ip: 10.xxx.xxx.39, host: bigdata-xxx, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default-rack, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}61e5e505-c0de-4c33-850f-a21d9b1971be{ip: 10.xxx.xxx.38, host: bigdata-xxx, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default-rack, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}e3f99e46-b5f2-46fe-8d13-506498a06ad2{ip: 10.xxx.xxx.39, host: bigdata-xxx, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default-rack, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, ReplicationConfig: RATIS/THREE, State:OPEN, leaderId:e3f99e46-b5f2-46fe-8d13-506498a06ad2, CreationTimestamp2023-09-25T18:59:36.139+08:00[Asia/Shanghai]]

@sodonnel
Contributor

Please add some description to the PR about how this new policy would work, why it is needed etc.

@whbing
Contributor Author

whbing commented Sep 25, 2023

> Please add some description to the PR about how this new policy would work, why it is needed etc.

Added to the PR description at the top of this page.

Contributor

@siddhantsangwan siddhantsangwan left a comment

The overall idea looks good to me. I have some comments below. Haven't checked the tests yet.

Comment on lines 67 to 68
targetPipeline =
    !metric1.isGreater(metric2.get()) ? pipeline1 : pipeline2;
Contributor

It's possible that we're checking two pipelines which share a datanode (multi raft), and that datanode is the most used one in both the pipelines. This will result in a tie and we'll choose the first pipeline. I'm wondering if it's better to break the tie by comparing the second most used node in that case.

Contributor Author

> It's possible that we're checking two pipelines which share a datanode (multi raft), and that datanode is the most used one in both the pipelines. This will result in a tie and we'll choose the first pipeline. I'm wondering if it's better to break the tie by comparing the second most used node in that case.

@siddhantsangwan Thanks for the review! It's a good idea to consider a second node. I'll update the code later.
(Also, I don't think we need to consider a third node, as that would make the logic needlessly complex.)
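
The tie-break discussed in this thread can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the PR: `UsageTieBreak`, `sortedDescending`, and `chooseFirst` are hypothetical names, and each pipeline is reduced to an array of per-node `scmUsed` values. The sketch compares the most used node of each pipeline first and falls back to the second most used node only on a tie; a third node is deliberately not consulted.

```java
import java.util.Arrays;

/** Hypothetical helper; not the PR's actual implementation. */
final class UsageTieBreak {

  /** Returns a copy of the values sorted from most used to least used. */
  static long[] sortedDescending(long[] scmUsed) {
    long[] copy = scmUsed.clone();
    Arrays.sort(copy);
    for (int i = 0, j = copy.length - 1; i < j; i++, j--) {
      long tmp = copy[i];
      copy[i] = copy[j];
      copy[j] = tmp;
    }
    return copy;
  }

  /**
   * Returns true if the first pipeline should be chosen. Compares the most
   * used node, then the second most used node; any remaining tie keeps the
   * first pipeline.
   */
  static boolean chooseFirst(long[] firstScmUsed, long[] secondScmUsed) {
    long[] a = sortedDescending(firstScmUsed);
    long[] b = sortedDescending(secondScmUsed);
    int rounds = Math.min(2, Math.min(a.length, b.length));
    for (int i = 0; i < rounds; i++) {
      if (a[i] != b[i]) {
        return a[i] < b[i];
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Made-up values: the most used nodes tie at 900, so the second most
    // used nodes (400 vs. 700) decide the comparison.
    long[] first = {900L, 400L, 100L};
    long[] second = {900L, 700L, 50L};
    System.out.println(chooseFirst(first, second) ? "first" : "second");
  }
}
```

Limiting the comparison to two rounds keeps the cost constant per selection, at the price of leaving some ties unresolved.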

Contributor Author

I rebased onto the master branch and added a commit, "add second compare logic". I tested it in a test environment; the debug log is shown below.
@siddhantsangwan If you have time, PTAL again. Thanks!

@whbing
Contributor Author

whbing commented Nov 22, 2023

The debug log is as follows:

2023-11-22 15:45:10,855 [IPC Server handler 95 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Compare scmUsed in pipelines, first : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=623083520, remaining=86085607424}}, SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257377201, remaining=92930936832}}, SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257315761, remaining=93191069696}}], second : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3412590592, remaining=88673959936}}, SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3342024704, remaining=97891262464}}, SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=623083520, remaining=86085607424}}]
2023-11-22 15:45:10,856 [IPC Server handler 95 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Chosen the first pipeline by compared scmUsed
2023-11-22 15:45:16,467 [IPC Server handler 0 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Compare scmUsed in pipelines, first : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=623083520, remaining=86085607424}}, SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257377201, remaining=92930936832}}, SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257315761, remaining=93191049216}}], second : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3412590592, remaining=88673959936}}, SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3342024704, remaining=97891262464}}, SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3319808000, remaining=93105278976}}]
2023-11-22 15:45:16,468 [IPC Server handler 0 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Chosen the first pipeline by compared scmUsed
2023-11-22 15:45:22,142 [IPC Server handler 14 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Compare scmUsed in pipelines, first : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3412590592, remaining=88673959936}}, SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3342024704, remaining=97891262464}}, SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=623083520, remaining=86085607424}}], second : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3412590592, remaining=88673959936}}, SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3342024704, remaining=97891262464}}, SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3319808000, remaining=93105278976}}]
2023-11-22 15:45:22,143 [IPC Server handler 14 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Secondary compare because the first round is the same
2023-11-22 15:45:22,143 [IPC Server handler 14 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Chosen the first pipeline by compared scmUsed
2023-11-22 15:45:27,689 [IPC Server handler 95 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Compare the same pipeline Pipeline[ Id: f1458feb-0472-4d5f-a490-97029b65dcf5, Nodes: c187d45d-e703-4b6d-a7e7-ec125f5d59f6(zk3/10.96.xx.178)67c72d5b-6fff-4f39-8e9d-ca1ad3628bc3(zk2/10.96.xx.24)43dc44df-f27c-4ade-9651-501fd881a8d6(hadoop3/10.190.xx.5), ReplicationConfig: RATIS/THREE, State:OPEN, leaderId:43dc44df-f27c-4ade-9651-501fd881a8d6, CreationTimestamp2023-11-22T15:43:54.022+08:00[Asia/Chongqing]]
2023-11-22 15:45:27,690 [IPC Server handler 95 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Chosen the first pipeline by compared scmUsed
2023-11-22 15:45:33,414 [IPC Server handler 0 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Compare scmUsed in pipelines, first : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=623083520, remaining=86085607424}}, SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257377201, remaining=92930936832}}, SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257315761, remaining=93191049216}}], second : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3319808000, remaining=93105278976}}, SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257377201, remaining=92930936832}}, SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257315761, remaining=93191049216}}]
2023-11-22 15:45:33,415 [IPC Server handler 0 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Chosen the first pipeline by compared scmUsed
2023-11-22 15:45:39,070 [IPC Server handler 14 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Compare scmUsed in pipelines, first : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3319808000, remaining=93105278976}}, SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257377201, remaining=92930936832}}, SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257315761, remaining=93191049216}}], second : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=623083520, remaining=86085607424}}, SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257377201, remaining=92930936832}}, SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257315761, remaining=93191049216}}]
2023-11-22 15:45:39,071 [IPC Server handler 14 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Chosen the second pipeline by compared scmUsed
2023-11-22 15:45:44,801 [IPC Server handler 97 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Compare scmUsed in pipelines, first : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3319808000, remaining=93105278976}}, SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257377201, remaining=92930936832}}, SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257317475, remaining=93190995968}}], second : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=623083520, remaining=86085607424}}, SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257377201, remaining=92930936832}}, SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257317475, remaining=93190995968}}]
2023-11-22 15:45:44,802 [IPC Server handler 97 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Chosen the second pipeline by compared scmUsed
2023-11-22 15:45:51,010 [IPC Server handler 14 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Compare scmUsed in pipelines, first : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3412590592, remaining=88673959936}}, SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3342024704, remaining=97895714816}}, SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=623083520, remaining=86085607424}}], second : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3412590592, remaining=88673959936}}, SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3342024704, remaining=97895714816}}, SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3319808000, remaining=93105278976}}]
2023-11-22 15:45:51,011 [IPC Server handler 14 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Secondary compare because the first round is the same
2023-11-22 15:45:51,011 [IPC Server handler 14 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Chosen the first pipeline by compared scmUsed

The same log, formatted for easier analysis:

2023-11-22 15:45:10,855 [IPC Server handler 95 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Compare scmUsed in pipelines, 
first :  [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=623083520, remaining=86085607424}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257377201, remaining=92930936832}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257315761, remaining=93191069696}}], 
second : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3412590592, remaining=88673959936}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3342024704, remaining=97891262464}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=623083520, remaining=86085607424}}]
2023-11-22 15:45:10,856 [IPC Server handler 95 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Chosen the first pipeline by compared scmUsed

2023-11-22 15:45:16,467 [IPC Server handler 0 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Compare scmUsed in pipelines,
first :  [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=623083520, remaining=86085607424}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257377201, remaining=92930936832}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257315761, remaining=93191049216}}],
second : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3412590592, remaining=88673959936}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3342024704, remaining=97891262464}},
          SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3319808000, remaining=93105278976}}]
2023-11-22 15:45:16,468 [IPC Server handler 0 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Chosen the first pipeline by compared scmUsed

2023-11-22 15:45:22,142 [IPC Server handler 14 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Compare scmUsed in pipelines,
first :  [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3412590592, remaining=88673959936}},
          SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3342024704, remaining=97891262464}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=623083520, remaining=86085607424}}],
second : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3412590592, remaining=88673959936}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3342024704, remaining=97891262464}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3319808000, remaining=93105278976}}]
2023-11-22 15:45:22,143 [IPC Server handler 14 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Secondary compare because the first round is the same
2023-11-22 15:45:22,143 [IPC Server handler 14 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Chosen the first pipeline by compared scmUsed

2023-11-22 15:45:27,689 [IPC Server handler 95 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Compare the same pipeline Pipeline[ Id: f1458feb-0472-4d5f-a490-97029b65dcf5, Nodes: c187d45d-e703-4b6d-a7e7-ec125f5d59f6(zk3/10.96.xx.178)67c72d5b-6fff-4f39-8e9d-ca1ad3628bc3(zk2/10.96.xx.24)43dc44df-f27c-4ade-9651-501fd881a8d6(hadoop3/10.190.xx.5), ReplicationConfig: RATIS/THREE, State:OPEN, leaderId:43dc44df-f27c-4ade-9651-501fd881a8d6, CreationTimestamp2023-11-22T15:43:54.022+08:00[Asia/Chongqing]]
2023-11-22 15:45:27,690 [IPC Server handler 95 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Chosen the first pipeline by compared scmUsed

2023-11-22 15:45:33,414 [IPC Server handler 0 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Compare scmUsed in pipelines,
first :  [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=623083520, remaining=86085607424}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257377201, remaining=92930936832}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257315761, remaining=93191049216}}], 
second : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3319808000, remaining=93105278976}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257377201, remaining=92930936832}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257315761, remaining=93191049216}}]
2023-11-22 15:45:33,415 [IPC Server handler 0 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Chosen the first pipeline by compared scmUsed

2023-11-22 15:45:39,070 [IPC Server handler 14 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Compare scmUsed in pipelines,
first :  [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3319808000, remaining=93105278976}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257377201, remaining=92930936832}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257315761, remaining=93191049216}}],
second : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=623083520, remaining=86085607424}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257377201, remaining=92930936832}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257315761, remaining=93191049216}}]
2023-11-22 15:45:39,071 [IPC Server handler 14 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Chosen the second pipeline by compared scmUsed

2023-11-22 15:45:44,801 [IPC Server handler 97 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Compare scmUsed in pipelines,
first :  [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3319808000, remaining=93105278976}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257377201, remaining=92930936832}},
          SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257317475, remaining=93190995968}}],
second : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=623083520, remaining=86085607424}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257377201, remaining=92930936832}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107313369088, scmUsed=257317475, remaining=93190995968}}]
2023-11-22 15:45:44,802 [IPC Server handler 97 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Chosen the second pipeline by compared scmUsed

2023-11-22 15:45:51,010 [IPC Server handler 14 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Compare scmUsed in pipelines,
first :  [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3412590592, remaining=88673959936}},
          SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3342024704, remaining=97895714816}},
          SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=623083520, remaining=86085607424}}],
second : [SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3412590592, remaining=88673959936}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3342024704, remaining=97895714816}}, 
          SCMNodeMetric{SCMNodeStat{capacity=107374182400, scmUsed=3319808000, remaining=93105278976}}]
2023-11-22 15:45:51,011 [IPC Server handler 14 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Secondary compare because the first round is the same
2023-11-22 15:45:51,011 [IPC Server handler 14 on default port 9863] DEBUG org.apache.hadoop.hdds.scm.PipelineChoosePolicy: Chosen the first pipeline by compared scmUsed

The results meet expectations.

Contributor

@xichen01 xichen01 left a comment

@whbing The change looks good. Just a few comments for your reference.

@Override
public Pipeline choosePipeline(List<Pipeline> pipelineList,
    PipelineRequestInformation pri) {
  Pipeline pipeline1 = healthPolicy.choosePipeline(pipelineList, pri);
Contributor

In some clusters there may be close to a hundred pipelines, but we compare only two pipelines here.
Doesn't this make it unlikely that the largest (in capacity) pipeline is selected?

One possible solution is to add a configuration that determines what fraction of the pipelines is compared at a time, with a value in [0, 1]:

  • When it is 0, only one pipeline is selected at a time, which is basically equivalent to RandomPipelineChoosePolicy.
  • When it is 1, all pipelines are compared and the largest pipeline in the whole cluster is always chosen.

PS: Even if this feature is needed, I think it can be done in another PR. Once this PR is merged, the current solution will already work for small clusters.
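
One way to read this suggestion is sketched below. Everything in it is hypothetical: the PR does not add such a setting, and `SampledChooser`, `sampleSize`, and `choose` are illustrative names. The sketch assumes that each pipeline is summarized by a single usage value and that "largest" means "least used". A ratio in [0, 1] is mapped to the number of pipelines to sample, and the least used pipeline in the sample wins.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch; no such setting exists in this PR. */
final class SampledChooser {

  /** Maps a ratio in [0, 1] to a sample size between 1 and n. */
  static int sampleSize(double ratio, int n) {
    if (ratio < 0.0 || ratio > 1.0 || n < 1) {
      throw new IllegalArgumentException("ratio must be in [0, 1] and n >= 1");
    }
    int k = (int) Math.ceil(ratio * n);
    return Math.max(1, Math.min(n, k));
  }

  /** Returns the index of the least used pipeline in a random sample. */
  static int choose(long[] used, double ratio, Random random) {
    int k = sampleSize(ratio, used.length);
    List<Integer> order = new ArrayList<>();
    for (int i = 0; i < used.length; i++) {
      order.add(i);
    }
    // The first k entries of the shuffled list form the sample.
    Collections.shuffle(order, random);
    int best = order.get(0);
    for (int i = 1; i < k; i++) {
      int candidate = order.get(i);
      if (used[candidate] < used[best]) {
        best = candidate;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    long[] used = {500L, 300L, 900L, 100L};
    Random random = new Random();
    System.out.println("ratio 0.0 -> index " + choose(used, 0.0, random));
    System.out.println("ratio 1.0 -> index " + choose(used, 1.0, random));
  }
}
```

With a ratio of 0 the sample has one element, so the choice is uniformly random; with a ratio of 1 every pipeline is inspected, which costs time linear in the number of pipelines on each selection.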

Contributor Author

> In some clusters there may be close to a hundred pipelines, but we compare only two pipelines here. Doesn't this make it unlikely that the largest (in capacity) pipeline is selected?
>
> One possible solution is to add a configuration that determines what fraction of the pipelines is compared at a time, with a value in [0, 1]:
>
>   • When it is 0, only one pipeline is selected at a time, which is basically equivalent to RandomPipelineChoosePolicy.
>   • When it is 1, all pipelines are compared and the largest pipeline in the whole cluster is always chosen.
>
> PS: Even if this feature is needed, I think it can be done in another PR. Once this PR is merged, the current solution will already work for small clusters.

@xichen01 Thanks for the review! Regarding the selection logic, HDFS-11564 links to the original papers. The algorithm of choosing two random nodes and then placing the container on the node with lower utilization is discussed in great depth in this survey paper:
https://pdfs.semanticscholar.org/3597/66cb47572028eb70c797115e987ff203e83f.pdf
In addition, SCMContainerPlacementCapacity#chooseNode uses the same algorithm. So perhaps it is not necessary to find the pipeline with the minimum storage usage every time?
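
The "two random choices" selection described here can be sketched as below. This is a minimal illustration, assuming each pipeline is summarized by one utilization value between 0 and 1; `TwoChoices` and `pick` are hypothetical names and not part of Ozone. Both candidates are drawn independently, so the same pipeline can be drawn twice.

```java
import java.util.Random;

/** Hypothetical sketch of "two random choices"; not Ozone code. */
final class TwoChoices {

  /**
   * Draws two candidates independently and returns the index of the one
   * with lower utilization. The first draw wins a tie.
   */
  static int pick(double[] utilization, Random random) {
    int first = random.nextInt(utilization.length);
    int second = random.nextInt(utilization.length);
    return utilization[first] <= utilization[second] ? first : second;
  }

  public static void main(String[] args) {
    // Two nearly full pipelines and two mostly empty ones (made-up values).
    double[] utilization = {0.99, 0.98, 0.16, 0.15};
    int chosen = pick(utilization, new Random());
    System.out.println("chosen pipeline index: " + chosen);
  }
}
```

Each selection inspects only two candidates regardless of cluster size, which is the main appeal of this approach over scanning every pipeline for the minimum.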

Contributor

Do we have any test results comparing this algorithm with the random healthy node policy? Just to see how effective the algorithm is.

Contributor

> @xichen01 Thanks for the review! Regarding the selection logic, HDFS-11564 links to the original papers. The algorithm of choosing two random nodes and then placing the container on the node with lower utilization is discussed in great depth in this survey paper:
> https://pdfs.semanticscholar.org/3597/66cb47572028eb70c797115e987ff203e83f.pdf
> In addition, SCMContainerPlacementCapacity#chooseNode uses the same algorithm. So perhaps it is not necessary to find the pipeline with the minimum storage usage every time?

@whbing Understood. For a fairly balanced cluster, such as a new one, this strategy can work very well, providing similar loads to all DataNodes.
However, for a significantly unbalanced cluster, such as one where new nodes have just been added, this strategy may be of limited benefit, especially in larger clusters.
But for the latter case (adding new nodes), we can also rebalance the cluster using the balancer.

@whbing
Contributor Author

whbing commented Nov 23, 2023

Contributor

@sumitagrawl sumitagrawl left a comment

@whbing Thanks for working on this. IMO this looks like another approach to replace the Random policy. I have a few questions...

@Override
public Pipeline choosePipeline(List<Pipeline> pipelineList,
    PipelineRequestInformation pri) {
  Pipeline pipeline1 = healthPolicy.choosePipeline(pipelineList, pri);
Contributor

Do we have any test results comparing this algorithm with the random healthy node policy? Just to see how effective the algorithm is.

@xichen01
Contributor

@whbing Thanks for your update, LGTM +1

@whbing
Contributor Author

whbing commented Nov 30, 2023

I ran a test and got selection counts like these:

pipeline0 selected count: 62
pipeline1 selected count: 205
pipeline2 selected count: 308
pipeline3 selected count: 425
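
For comparison, the expected counts under a simple model can be computed as follows. The model is an assumption and is not stated in this thread: both candidates are drawn independently and uniformly (so the same pipeline may be drawn twice), the less used candidate always wins, and the pipelines are indexed from most used (pipeline0) to least used (pipeline3). Under that model, the pipeline at index i out of n is selected with probability (2i + 1) / n^2. `ExpectedCounts` and `expected` are hypothetical names.

```java
/** Hypothetical helper for expected selection counts; not Ozone code. */
final class ExpectedCounts {

  /**
   * Expected selections per pipeline when two candidates are drawn
   * independently and uniformly and the less used one wins. Index 0 is
   * the most used pipeline, index n - 1 the least used.
   */
  static double[] expected(int pipelines, int trials) {
    double[] result = new double[pipelines];
    double pairs = (double) pipelines * pipelines;
    for (int i = 0; i < pipelines; i++) {
      // Index i wins when it is the larger index of the two draws.
      result[i] = trials * (2 * i + 1) / pairs;
    }
    return result;
  }

  public static void main(String[] args) {
    double[] counts = expected(4, 1000);
    for (int i = 0; i < counts.length; i++) {
      System.out.println("pipeline" + i + " expected count: " + counts[i]);
    }
  }
}
```

The observed counts sum to 1000. For 4 pipelines and 1000 selections the model gives 62.5, 187.5, 312.5, and 437.5, which is close to the observed 62, 205, 308, and 425.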

Contributor

@sumitagrawl sumitagrawl left a comment

@whbing The rest LGTM.
Are we planning to make this policy the default, or is it just an option? We may need to have it defined in the docs.

@whbing
Contributor Author

whbing commented Dec 5, 2023

> Are we planning to make this policy the default, or is it just an option? We may need to have it defined in the docs.

@sumitagrawl It is provided as an option and does NOT change the default value. A description was added in ScmConfig.java.

Contributor

@adoroszlai adoroszlai left a comment

Thanks @whbing for working on this.

@adoroszlai adoroszlai dismissed their stale review January 18, 2024 18:26

patch updated

Contributor

@adoroszlai adoroszlai left a comment

Thanks a lot @whbing for updating the patch, LGTM.

Comment on lines +34 to +37
default PipelineChoosePolicy init(final NodeManager nodeManager) {
  // override if the policy requires nodeManager
  return this;
}
Contributor

Nice!

@adoroszlai
Contributor

@siddhantsangwan @sodonnel please take another look

Contributor

@siddhantsangwan siddhantsangwan left a comment

LGTM. Does anyone else have any comments? Let's get this committed, it's been open for a long time now.

@adoroszlai adoroszlai merged commit 73e6f90 into apache:master Jan 22, 2024
@adoroszlai
Contributor

Thanks @whbing for the patch, @siddhantsangwan, @sodonnel, @sumitagrawl, @xichen01 for the review.

Tejaskriya pushed a commit to Tejaskriya/ozone that referenced this pull request Jan 24, 2024
k5342 pushed a commit to pfnet/ozone that referenced this pull request Apr 17, 2025