HDDS-2646. Start acceptance tests only if at least one THREE pipeline is available #282
@ChenSammi You are more experienced with this area. Can you please review this approach / patch?
OZONE-SITE.XML_ozone.metadata.dirs=/data/metadata
OZONE-SITE.XML_ozone.scm.client.address=scm
OZONE-SITE.XML_ozone.replication=3
OZONE-SITE.XML_hdds.scm.safemode.min.datanode=3
I'm not very familiar with docker-compose. Where do we tell docker-compose to start three datanodes with all these configurations?
test.sh calls start_docker_env from testlib.sh, which calls docker-compose scale datanode=3.
Unfortunately, there is no easy way to define the expected number of containers in docker-compose.yaml. (There is a deploy / replicas setting, but it is available only for docker swarm and not for docker-compose.)
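A minimal sketch of the start-up step described above, for readers who have not seen testlib.sh. The function body is an assumption, not the actual implementation: it uses docker-compose up -d --scale rather than the standalone scale command mentioned in the comment, and COMPOSE_CMD is a hypothetical override added only so the function can be dry-run without Docker.

```shell
# Hypothetical sketch; the real start_docker_env in testlib.sh may differ.
# COMPOSE_CMD is an illustrative override (not part of the project) that
# lets the function be dry-run, e.g. with COMPOSE_CMD=echo.
start_docker_env() {
  # Number of datanode containers to start; defaults to 3.
  datanode_count="${1:-3}"
  ${COMPOSE_CMD:-docker-compose} up -d --scale datanode="${datanode_count}"
}

# Usage (requires Docker and a docker-compose.yaml with a datanode service):
#   start_docker_env      # three datanodes
#   start_docker_env 1    # a single datanode
```

The datanode count is passed on the command line because, as noted above, the compose file itself has no easy way to declare it.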
I see. Thanks for the explanation.
adoroszlai
left a comment
Thanks @elek for working on fixing acceptance test flakiness.
OZONE-SITE.XML_ozone.replication=3
OZONE-SITE.XML_hdds.datanode.dir=/data/hdds
OZONE-SITE.XML_hdds.profiler.endpoint.enabled=true
OZONE-SITE.XML_hdds.scm.safemode.min.datanode=3
This will make it impossible to use these environments with a single datanode without modifying the config locally.
I would like to propose an alternative solution:
- define this config in the environment section in docker-compose.yaml using a variable that defaults to 1:
      - "OZONE-SITE.XML_hdds.scm.safemode.min.datanode=${SAFEMODE_MIN_DATANODES:-1}"
- set the variable to 3 in testlib.sh:
      export SAFEMODE_MIN_DATANODES=3
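To illustrate how the proposed default behaves, here is a small shell sketch of the ${VAR:-default} expansion. docker-compose supports the same substitution syntax in docker-compose.yaml; the helper name render_min_datanode is hypothetical and exists only for this demonstration.

```shell
# Hypothetical helper: prints the config line as it would look after
# substitution. The ${VAR:-default} form falls back to the default when
# the variable is unset or empty.
render_min_datanode() {
  echo "OZONE-SITE.XML_hdds.scm.safemode.min.datanode=${SAFEMODE_MIN_DATANODES:-1}"
}

# Variable not set: the default of 1 applies (single-datanode use).
unset SAFEMODE_MIN_DATANODES
render_min_datanode

# Variable exported, as testlib.sh would do for acceptance tests.
export SAFEMODE_MIN_DATANODES=3
render_min_datanode
```

With this approach the compose environments stay usable with one datanode by default, while the acceptance tests opt in to the stricter value.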
Yes, this is the problem I tried to describe in #238. I am fine with the suggested approach, but it makes the definition more complex.
What I am thinking of is creating a simple compose definition that can work with one datanode (where we can also adjust the replication factor and the s3 storage type).
Almost all of the tested functionality requires datanode=3, so it seems to be enough to have one cluster that can work with one datanode...
OK, I'm fine with the hard-coded values in order to get the acceptance tests into good shape. We can refine it later. Until then, the config can be edited locally if needed.
During an offline conversation I understood your use case: in some cases it can be useful to make the compose folder usable with just one datanode (e.g. when the UI / Recon or shell scripts are tested).
I pushed a new commit to experiment with your proposal.
@elek and @adoroszlai Thanks for explaining this patch and the next patch in the pipeline to me. Appreciate it. I have committed this patch to master. @ChenSammi Thanks for the review.
@adoroszlai found a small typo. Fixed with an addendum commit (6c1a9ff)
What changes were proposed in this pull request?
After HDDS-2034 (or even before?), pipeline creation (or rather the status transition from ALLOCATE to OPEN) requires at least one pipeline report from all of the datanodes. This means that the cluster might not be usable even if it is out of safe mode AND there are at least three datanodes.
It makes all the acceptance tests unstable.
For example in this run.
As you can see, the pipeline is created but the cluster is not usable, because the pipeline has not yet been reported back by datanode_2:
The quick fix is to configure all the compose clusters to wait until (at least) one pipeline is available. This can be done by adjusting the number of the required datanodes:
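Judging from the diff context quoted in the review comments, the adjusted settings in the compose configuration look roughly like the fragment below. This is reconstructed from those quoted lines only; which files are touched, and which of these lines already existed before the patch, is not shown on this page.

```
OZONE-SITE.XML_ozone.replication=3
OZONE-SITE.XML_hdds.scm.safemode.min.datanode=3
```

With the minimum datanode count equal to the replication factor, SCM stays in safe mode until three datanodes have registered.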
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-2646
How was this patch tested?
If something is wrong, the acceptance tests will fail. We need a green run from the CI.