-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-2588. Consolidate compose environments #238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| OZONE-SITE.XML_ozone.metadata.dirs=/data/metadata | ||
| OZONE-SITE.XML_ozone.handler.type=distributed | ||
| OZONE-SITE.XML_ozone.recon.db.dir=/data/metadata/recon | ||
| OZONE-SITE.XML_ozone.recon.om.db.dir=/data/metadata/recon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/data/metadata/reon -> /data/metadata/om
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bharatviswa504 for spotting this. These config values from ozone-recon env as they were.
@avijayanhwx @swagle can Recon use the same directory for both ozone.recon.db.dir and ozone.recon.om.db.dir?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adoroszlai Yes the same directory can be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @avijayanhwx. Then I think we can keep it as is.
|
Good idea, overall LGTM. One minor comment. |
|
Thanks @bharatviswa504 for the review.
Unfortunately cannot do that, since |
|
Thanks to work on this @adoroszlai . I always felt the same, that we have too many environments, but I didn't know the right answer. therefore I just share my thoughts this is not a 👍 or 👎 as I don't know which one is the better. 1.THE PHILOSOPHY It's -- at least partially -- a philosophical question, what is Ozone. (And as it's philosophy, I am interested about the opinion of our philosopher of Ozone cc @anuengineer) By default I would say:
But we can also say what what this patch says:
(at lest for Apache Ozone, this is not true for proprietary distributions).
The other question is usability: Do we need Prometheus and Jaeger all the time. Do we need to start them when we would like to test any of the features of ozone? Prometheus: maybe
My third (actual) problem is the guarantee of a cluster. To avoid same flakiness I would like to declare that we need at least 3 datanodes for ozone compose clusters (see. HDDS-2606). But I am not happy with the usability that Ozone can't work if you don't do a We may need a smaller subset of ozone ( |
|
Thanks @elek for your thoughts. I think (1) and (2) are addressed by the followup commit, which extracts monitoring and profiling into separate configs. (Let me know if any of the configs are miscategorized.) They can be mixed in as desired by one of: I need to think (or talk) about your third point (3) more. |
Thanks the update @adoroszlai This approach is very smart, but I have some fear how easy is to understand it. (One additional function of the compose folders to provide simple examples to use ozone.) But let's try out this approach. I am fine with it. Can you please update the README.txt inside |
|
Thanks for the feedback @elek.
Sure, will do, but didn't want to write doc until the code is OK-ed. ;) |
|
Changes in last two commits besides README update:
So |
elek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @adoroszlai the update.
I am happy with this approach with setting the scale with OZONE_REPLICATION_FACTOR but wouldn't it be more clean to do it inside #282 / HDDS-2646 for all the environments together?
| ports: | ||
| - 9876:9876 | ||
| environment: | ||
| ENSURE_SCM_INITIALIZED: /data/metadata/scm/current/VERSION |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it better to move out this to a separated file? I think it's easier to overview if it's in this file. Do we really need a separated file to store this one line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hated moving it out, but the following considerations together made me do it:
- Variable substitution only works in the yaml files, not in configs passed via
env_file. Sosafemode.min.datanodeandozone.replicationneed to be inenvironment. - When merging
common-config, one complete dict overrides the other: depending on the order eithercommon-configor the specific service gets to defineenvironment.
So I moved out these one-liners to avoid duplicating the "two-liners". Plus it seems unlikely that anyone ever wants to change these infrastructure related variables.
Now another alternative occurred to me: we might define a separate dict for the configs to be merged into environment. Let me experiment a bit with this. If it works, we could avoid the separate config files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer the separated common configs. Especially as we have only a few lines of settings they can be included in the common configs all together. But it's not a blocker for now, we can commit it (thanks to explain the reason behind the small files...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ae2bc55 moves these one-liners back to docker-compose.yaml and the separate files are no longer needed.
|
|
||
| declare -ix OZONE_REPLICATION_FACTOR | ||
| : ${OZONE_REPLICATION_FACTOR:=1} | ||
| docker-compose up --scale datanode=${OZONE_REPLICATION_FACTOR} --no-recreate "$@" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please help me to understand why do we need --no-recreate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without --no-recreate, if I run docker-compose up a second time to start Freon after cluster is up, Docker Compose may recreate all existing containers. I noticed that this happens when running the second command from another terminal, while following logs in the original terminal. Not sure if this is really the cause, or what other conditions could trigger it. Using the flag helps avoid this situation.
Do you think it may cause problems in other cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, while I agree the call for re-create is correct here. @adoroszlai You did ask the question. The issue can happen if we have a cluster that was running that has error-ed out. Then re-running this command will not reset the system. But it is probably something that we can live with, or fix much later. I predict for a long time, when someone reports an issue, we will say " make sure you kill all running docker containers". But then, traditionally that is our first debugging step whenever someone tells us that docker based stuff is not stable for them.
Just to make sure, I am +1 and ok with this change. Just responding to your question; that is all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my experience the docker-compose up worked well even from other terminal if nothing has been changed and the docker-compose file set was the same.
Can we start the scm first with docker-compose up -d scm and after everything else with docker-compose up -d with this no-recreate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if nothing has been changed and the docker-compose file set was the same.
But the readme says freon compose file should be added only when datanodes are up, so the set is not the same.
I'm fine with reworking this one to accommodate whatever changes are done in #282. I wanted to avoid blocking the acceptance test fix for a usability improvement. |
I am okay with what this patch says -- in reality, irrespective of what we say, there will be monitoring , tracing and logging collectors in place for most data centers. So irrespective of what we do (Prometheus, Jaeger, Grafana, Fluentd) the system admins will do the right thing for them. We are just show casing that fact that, it is trivial to do this with Ozone. So when someone is evaluating Ozone, the question of how can I really run this service in production is answered via the presence of these tools. I would go a step ahead add these as recipes in the Ozone documentation too. |
elek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Let me commit it now.
It's a newer approach and we can improve it in follow-up jiras if needed.
Thanks @bharatviswa504 and @anuengineer the review and @adoroszlai the patch.
|
Thanks @anuengineer, @avijayanhwx, @bharatviswa504 for review, and @elek for review and commit. |
What changes were proposed in this pull request?
There are a few slightly different sample docker compose environments:
ozone,ozoneperf,ozones3,ozone-recon. This change proposes to merge these 4 by minor additions toozoneperf:reconservice fromozone-reconPlus: also run
ozone-shelltest (frombasicsuite), which is currently run only inozonesecureI also propose to rename
ozoneperftoozonefor simplicity.Consolidating these 4 environments would slightly reduce both code duplication and the time needed for acceptance tests.
https://issues.apache.org/jira/browse/HDDS-2588
How was this patch tested?
Ran acceptance test in
ozonedir. Generated keys using freon, verified that Jaeger, Prometheus, Grafana reflect the operations.Clean CI in private branch.