HDDS-4475. Extend DatanodeChunkGenerator to write all on all pipelines… #1600
Conversation
@sadanand48, as discussed offline, can you make the write chunk tests run concurrently rather than sequentially? Also, can we add support for multiple datanodes along with pipeline IDs?
@elek, can you please have a look at this?
elek left a comment
This will allow passing a list of pipelines, comma-separated by their pipeline IDs, and the load will be generated on the DNs of the provided pipelines.
Thanks for the patch @sadanand48. I don't fully understand this line. Can you please explain the goals in more detail?
I have one guess: the goal is to define the pipeline with the help of a datanode host name.
Today we have two options:
- If pipelineId is defined, that will be used
- If pipelineId is not defined, we check the SCM and will use the first open Ratis/THREE pipeline
In both cases the first datanode in the pipeline will be used to save the chunk.
If I understood well, this patch would like to offer a third option: if the datanode is set, the list of the pipelines will be further restricted to the pipelines where the specified datanode is a member.
If this is the goal, it should be possible by adjusting the filter conditions:
temp = pipelinesFromSCM.stream()
    .filter(p -> p.getFactor() == ReplicationFactor.THREE)
    .filter(p -> datanodeHosts.size() == 0
        || pipelineContainsDatanode(p, datanodeHosts))
    .findFirst()
    .orElseThrow(() -> new IllegalArgumentException(
        "Pipeline ID is NOT defined, and no pipeline " +
        "has been found with factor=THREE"));
(see the second filter)
I agree with @bshashikant, we shouldn't remove the logic of the parallel execution for this specific case.
(But please let me know if I misunderstood something)
Thanks @elek for the comments. The current tool takes only one pipeline ID as a parameter and generates chunks on that pipeline. If no arguments are provided, it will by default select the RATIS/THREE pipeline and write to it.
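A rough sketch of the current behaviour described here (variable names such as pipelineId and pipelinesFromSCM are illustrative and reused from the snippet above; this is not the committed code):

    Pipeline pipeline;
    if (pipelineId != null && !pipelineId.isEmpty()) {
      // A pipeline ID was passed on the command line: use that pipeline.
      pipeline = pipelinesFromSCM.stream()
          .filter(p -> p.getId().getId().toString().equals(pipelineId))
          .findFirst()
          .orElseThrow(() -> new IllegalArgumentException(
              "Pipeline " + pipelineId + " was not found"));
    } else {
      // No arguments: fall back to the first open RATIS/THREE pipeline.
      pipeline = pipelinesFromSCM.stream()
          .filter(p -> p.getFactor() == ReplicationFactor.THREE)
          .findFirst()
          .orElseThrow(() -> new IllegalArgumentException(
              "No open pipeline has been found with factor=THREE"));
    }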
Thanks for explaining it, now I got it. In that case you need an additional modification to use the current loop: you should either select the first one OR the pipelines assigned to the datanode. (You should also initialize the full list of clients and close them...) Also: how would you like to handle the case where the selected datanode is the leader of both pipelines?
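A rough sketch of what acquiring and closing the full list of clients could mean in practice (the xceiverClientManager field and the surrounding structure are assumptions based on the snippets quoted later in this review, not the committed code):

    // Sketch: acquire one client per selected pipeline up front, then release
    // them all once the generator is done.
    List<XceiverClientSpi> clients = new ArrayList<>();
    try {
      for (Pipeline pipeline : pipelines) {
        LOG.info("Writing to pipeline {}", pipeline.getId());
        clients.add(xceiverClientManager.acquireClient(pipeline));
      }
      // ... run the concurrent chunk-write tasks against every client ...
    } finally {
      for (XceiverClientSpi client : clients) {
        xceiverClientManager.releaseClient(client, false);
      }
    }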
(You can ping me offline if it is not clear, you need help, or if my understanding is still not right ;-) )
Thanks for the update @sadanand48. This part seems to be suspicious: it has multiple
elek left a comment
+1, thanks for the update @sadanand48
I have 3 minor/typo comments, but I like this approach
    }
    if (pipelines.isEmpty()) {
      throw new IllegalArgumentException(
          "Coudln't find the any/the selected pipeline");
| "Coudln't find the any/the selected pipeline"); | |
| "Couldn't find the any/the selected pipeline"); |
        .acquireClient(firstPipeline);
    xceiverClients = new ArrayList<>();
    xceiverClients.add(xceiverClientSpi);
    LOG.info("Using pipeline {}", firstPipeline.getId());
| LOG.info("Using pipeline {}", firstPipeline.getId()); |
You don't need this line as you log the same information in the loop below.
    private XceiverClientSpi xceiverClientSpi;
    @Option(names = {"-d", "--datanodes"},
        description = "Datanodes to use. ",
| description = "Datanodes to use. ", | |
| description = "Datanodes to use. Test will write to all the existing pipelines which this datanode is member of.", |
Merging it. Thanks for the continuous improvement @sadanand48
What changes were proposed in this pull request?
Currently, DatanodeChunkGenerator takes a single pipeline as a parameter. This will allow passing a list of pipelines, comma-separated by their pipeline IDs, and the load will be generated on the DNs of the provided pipelines.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-4475
How was this patch tested?
Tested on Docker.