HDDS-2588. Consolidate compose environments #238

adoroszlai · 2019-11-20T19:36:05Z

What changes were proposed in this pull request?

There are a few slightly different sample docker compose environments: ozone, ozoneperf, ozones3, ozone-recon. This change proposes to merge these 4 by minor additions to ozoneperf:

add recon service from ozone-recon
run GDPR and S3 tests
expose datanode web port (eg. for profiling)

Plus: also run the ozone-shell test (from the basic suite), which is currently run only in ozonesecure.

I also propose to rename ozoneperf to ozone for simplicity.

Consolidating these 4 environments would slightly reduce both code duplication and the time needed for acceptance tests.

https://issues.apache.org/jira/browse/HDDS-2588

How was this patch tested?

Ran acceptance tests in the ozone dir. Generated keys using Freon and verified that Jaeger, Prometheus, and Grafana reflect the operations.

Clean CI in private branch.

bharatviswa504 · 2019-11-21T04:09:18Z

hadoop-ozone/dist/src/main/compose/ozone/docker-config

 OZONE-SITE.XML_ozone.metadata.dirs=/data/metadata
-OZONE-SITE.XML_ozone.handler.type=distributed
+OZONE-SITE.XML_ozone.recon.db.dir=/data/metadata/recon
+OZONE-SITE.XML_ozone.recon.om.db.dir=/data/metadata/recon


/data/metadata/recon -> /data/metadata/om

Thanks @bharatviswa504 for spotting this. These config values were copied from the ozone-recon env as they were.

@avijayanhwx @swagle can Recon use the same directory for both ozone.recon.db.dir and ozone.recon.om.db.dir?

@adoroszlai Yes the same directory can be used.

Thanks @avijayanhwx. Then I think we can keep it as is.

bharatviswa504 · 2019-11-21T04:12:29Z

Good idea, overall LGTM. One minor comment.
One more suggestion: in a similar way, we can also merge ozone-om-ha and ozone-om-ha-s3.

adoroszlai · 2019-11-21T07:20:01Z

Thanks @bharatviswa504 for the review.

In a similar way, we can merge ozone-om-ha and ozone-om-ha-s3 also.

Unfortunately we cannot do that, since ozone-om-ha is very different (custom docker image with SSH, starts/stops OM manually, etc.).

elek · 2019-11-21T16:06:53Z

Thanks for working on this, @adoroszlai. I have always felt the same, that we have too many environments, but I didn't know the right answer. Therefore I am just sharing my thoughts; this is not a 👍 or 👎, as I don't know which option is better.

1. THE PHILOSOPHY

It's -- at least partially -- a philosophical question, what is Ozone. (And as it's philosophy, I am interested in the opinion of our philosopher of Ozone cc @anuengineer)

By default I would say:

ozone = scm/om/datanode + s3g + recon
ozone-perf = ozone + all the tools to monitor

But we can also say what this patch says:

ozone: the full ozone experience (scm/om/datanode + s3g + recon + all the monitoring tools).

(at least for Apache Ozone; this is not true for proprietary distributions).

2. THE USABILITY

The other question is usability: Do we need Prometheus and Jaeger all the time? Do we need to start them when we would like to test any of the features of Ozone?

Prometheus: maybe
Jaeger: I am not sure.

3. THE GUARANTEE

My third (actual) problem is the guarantee of a cluster. To avoid some flakiness, I would like to declare that we need at least 3 datanodes for ozone compose clusters (see HDDS-2606). But I am not happy with the usability if Ozone can't work unless you do a docker-compose scale datanode=3.

We may need a smaller subset of Ozone (ozone-core) where the three datanodes are not required (but which is not used to run all the different acceptance tests). In this case the usability problem (2) can be solved, as we can have a leaner Ozone cluster to test locally.

adoroszlai · 2019-11-21T20:36:22Z

Thanks @elek for your thoughts. I think (1) and (2) are addressed by the followup commit, which extracts monitoring and profiling into separate configs. (Let me know if any of the configs are miscategorized.) They can be mixed in as desired by one of:

# no COMPOSE_FILE var                                                  # => only Ozone
export COMPOSE_FILE=docker-compose.yaml:monitoring.yaml                # => add monitoring
export COMPOSE_FILE=docker-compose.yaml:profiling.yaml                 # => add profiling
export COMPOSE_FILE=docker-compose.yaml:monitoring.yaml:profiling.yaml # => add both
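A minimal sketch of how the `COMPOSE_FILE` value above could be assembled from optional add-ons. The helper name `compose_file_for` is hypothetical (not part of this PR), and it assumes each add-on lives next to `docker-compose.yaml` as `<name>.yaml`, like `monitoring.yaml` and `profiling.yaml`. Docker Compose reads `COMPOSE_FILE` as a colon-separated list on Linux and macOS.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): prints a COMPOSE_FILE value made of
# the base docker-compose.yaml followed by "<addon>.yaml" for each argument.
# Example: export COMPOSE_FILE="$(compose_file_for monitoring profiling)"
#          docker-compose up -d
compose_file_for() {
  local files="docker-compose.yaml"
  local addon
  for addon in "$@"; do
    files="${files}:${addon}.yaml"
  done
  printf '%s\n' "${files}"
}
```

Calling it with no arguments yields just `docker-compose.yaml`, which is equivalent to leaving `COMPOSE_FILE` unset.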

I need to think (or talk) about your third point (3) more.

elek · 2019-11-28T12:16:21Z

I think (1) and (2) are addressed by the followup commit, which extracts monitoring and profiling into separate configs

Thanks for the update, @adoroszlai. This approach is very smart, but I am somewhat afraid it is not easy to understand. (One additional function of the compose folders is to provide simple examples of how to use Ozone.)

But let's try out this approach. I am fine with it.

Can you please update the README.txt inside compose/ozone (currently it's the original ozoneperf README; it can be simplified, but we need to add information about the COMPOSE_FILE=... trick)?

adoroszlai · 2019-11-28T12:34:18Z

Thanks for the feedback @elek.

Can you please update the README.txt

Sure, will do, but I didn't want to write docs until the code is OK-ed. ;)


adoroszlai · 2019-12-02T07:27:43Z

Changes in last two commits besides README update:

fix freon-*.yaml:
- versions need to match other yamls
- monitoring config is required for getting Freon spans
set safemode.min.datanode and ozone.replication to the value of the OZONE_REPLICATION_FACTOR environment variable
add run.sh to make startup simpler (scale datanodes based on OZONE_REPLICATION_FACTOR)

So docker-compose up without --scale still starts a cluster with 1 datanode, but ozone sh key put works OK, whereas previously it required explicit -r ONE. And getting a 3-datanode cluster is as simple as OZONE_REPLICATION_FACTOR=3 ./run.sh.

elek

Thank you for the update, @adoroszlai.

I am happy with this approach of setting the scale with OZONE_REPLICATION_FACTOR, but wouldn't it be cleaner to do it inside #282 / HDDS-2646 for all the environments together?

elek · 2019-12-03T12:54:36Z

hadoop-ozone/dist/src/main/compose/ozone/docker-compose.yaml

    ports:
      - 9876:9876
-    environment:
-      ENSURE_SCM_INITIALIZED: /data/metadata/scm/current/VERSION


Why is it better to move this out to a separate file? I think it's easier to get an overview if it's in this file. Do we really need a separate file to store this one line?

I hated moving it out, but the following considerations together made me do it:

Variable substitution only works in the yaml files, not in configs passed via env_file. So safemode.min.datanode and ozone.replication need to be in environment.

When merging common-config, one complete dict overrides the other: depending on the order, either common-config or the specific service gets to define environment.

So I moved out these one-liners to avoid duplicating the "two-liners". Plus it seems unlikely that anyone ever wants to change these infrastructure related variables.

Now another alternative occurred to me: we might define a separate dict for the configs to be merged into environment. Let me experiment a bit with this. If it works, we could avoid the separate config files.
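One way the "separate dict" idea could look is a Compose extension field (`x-...`, supported from Compose file format 3.4) carrying a YAML anchor that each service merges into its own `environment` with the `<<` merge key. The sketch below only writes such a fragment to a hypothetical `replication-example.yaml`; the `x-replication` name and the image name are assumptions for illustration, not necessarily what the follow-up commit uses. Quoting the heredoc delimiter keeps `${OZONE_REPLICATION_FACTOR:-1}` literal, so Compose (not the shell) performs the substitution, which also works around the `env_file` limitation mentioned above.

```shell
#!/usr/bin/env bash
# Writes an illustrative Compose fragment (hypothetical file name).
# 'EOF' is quoted, so the shell does not expand ${...}; Compose does.
cat > replication-example.yaml <<'EOF'
version: "3.4"
x-replication: &replication
  OZONE-SITE.XML_ozone.replication: ${OZONE_REPLICATION_FACTOR:-1}
services:
  scm:
    image: apache/ozone-runner  # image name assumed for illustration
    environment:
      <<: *replication
      ENSURE_SCM_INITIALIZED: /data/metadata/scm/current/VERSION
EOF
```

Running `docker-compose -f replication-example.yaml config` would print the resolved configuration, which is a quick way to check that the anchored keys end up in the service's `environment` next to its own entries.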

I would prefer the separate common configs. Especially as we have only a few lines of settings, they can all be included in the common configs together. But it's not a blocker for now; we can commit it (thanks for explaining the reason behind the small files...)

ae2bc55 moves these one-liners back to docker-compose.yaml and the separate files are no longer needed.

elek · 2019-12-03T12:56:30Z

hadoop-ozone/dist/src/main/compose/ozone/run.sh

+
+declare -ix OZONE_REPLICATION_FACTOR
+: ${OZONE_REPLICATION_FACTOR:=1}
+docker-compose up --scale datanode=${OZONE_REPLICATION_FACTOR} --no-recreate "$@"
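To see what `run.sh` ends up executing without starting any containers, here is a dry-run sketch. It keeps the defaulting idiom from the diff (`: ${OZONE_REPLICATION_FACTOR:=1}` assigns 1 only if the variable is unset or empty) but drops `declare -ix` (which in run.sh marks the variable as an exported integer) to stay POSIX-compatible. The function name `run_cmd` and the `DRY_RUN` switch are hypothetical additions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run variant of run.sh: prints the docker-compose command
# when DRY_RUN=1, otherwise executes it.
run_cmd() {
  # Default the replication factor to 1 if unset or empty (same idiom as run.sh).
  : "${OZONE_REPLICATION_FACTOR:=1}"
  # Rebuild the positional parameters as the full command line, keeping extra args.
  set -- docker-compose up --scale "datanode=${OZONE_REPLICATION_FACTOR}" --no-recreate "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, `OZONE_REPLICATION_FACTOR=3 DRY_RUN=1 run_cmd -d` prints the command for a detached 3-datanode cluster.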


Can you please help me understand why we need --no-recreate?

Without --no-recreate, if I run docker-compose up a second time to start Freon after cluster is up, Docker Compose may recreate all existing containers. I noticed that this happens when running the second command from another terminal, while following logs in the original terminal. Not sure if this is really the cause, or what other conditions could trigger it. Using the flag helps avoid this situation.

Do you think it may cause problems in other cases?

Well, I agree the call regarding re-creation is correct here. @adoroszlai, since you asked the question: the issue can happen if we have a running cluster that has errored out. Then re-running this command will not reset the system. But it is probably something that we can live with, or fix much later. I predict that for a long time, when someone reports an issue, we will say "make sure you kill all running docker containers". But then, traditionally that is our first debugging step whenever someone tells us that docker-based stuff is not stable for them.

Just to make sure, I am +1 and ok with this change. Just responding to your question; that is all.

In my experience docker-compose up worked well even from another terminal if nothing had been changed and the docker-compose file set was the same.

Can we start the scm first with docker-compose up -d scm and then everything else with docker-compose up -d with this no-recreate?
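The staged startup suggested here could look roughly like the sketch below. The function name `start_staged` and the `COMPOSE` override (so the commands can be printed with `COMPOSE=echo` instead of run) are hypothetical; only `docker-compose up -d scm`, `up -d`, and `--no-recreate` come from the discussion. Whether the second stage should keep `--no-recreate` is exactly the open question, so treat that flag as one option rather than a recommendation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: start SCM first, then bring up the remaining services.
# Set COMPOSE=echo to print the commands instead of running them.
start_staged() {
  local compose="${COMPOSE:-docker-compose}"
  # Stage 1: only the SCM service; stop if it fails to start.
  $compose up -d scm || return 1
  # Stage 2: everything else; --no-recreate leaves existing containers (SCM) as they are.
  $compose up -d --no-recreate "$@"
}
```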

if nothing has been changed and the docker-compose file set was the same.

But the README says the Freon compose file should be added only after the datanodes are up, so the set is not the same.

https://github.com/apache/hadoop-ozone/blob/76ad638b47232761a1732281188162e5c31308d8/hadoop-ozone/dist/src/main/compose/ozoneperf/README.md#L47-L51

adoroszlai · 2019-12-03T13:44:29Z

I am happy with this approach of setting the scale with OZONE_REPLICATION_FACTOR, but wouldn't it be cleaner to do it inside #282 / HDDS-2646 for all the environments together?

I'm fine with reworking this one to accommodate whatever changes are done in #282. I wanted to avoid blocking the acceptance test fix for a usability improvement.

anuengineer · 2019-12-04T19:54:11Z

It's -- at least partially -- a philosophical question, what is Ozone. (And as it's philosophy, I am interested in the opinion of our philosopher of Ozone cc @anuengineer)

I am okay with what this patch says: in reality, irrespective of what we say, there will be monitoring, tracing, and logging collectors in place in most data centers. So irrespective of what we do (Prometheus, Jaeger, Grafana, Fluentd), the system admins will do the right thing for them.

We are just showcasing the fact that it is trivial to do this with Ozone. So when someone is evaluating Ozone, the question of how to really run this service in production is answered by the presence of these tools. I would go a step further and add these as recipes in the Ozone documentation too.

elek

+1. Let me commit it now.

It's a newer approach and we can improve it in follow-up jiras if needed.

Thanks @bharatviswa504 and @anuengineer for the review and @adoroszlai for the patch.

adoroszlai · 2019-12-12T11:01:25Z

Thanks @anuengineer, @avijayanhwx, @bharatviswa504 for review, and @elek for review and commit.

adoroszlai added 2 commits November 20, 2019 16:26

HDDS-2588. Consolidate compose environments

800cda2

HDDS-2588. Rename ozoneperf compose env to ozone

b4b5866

dineshchitlangia requested a review from elek November 20, 2019 20:55

bharatviswa504 reviewed Nov 21, 2019


HDDS-2588. Extract monitoring and profiling into composable parts

1677175

elek mentioned this pull request Nov 29, 2019

HDDS-2646. Start acceptance tests only if at least one THREE pipeline is available #282

Merged

adoroszlai added 3 commits November 30, 2019 20:22

Merge remote-tracking branch 'origin/master' into HDDS-2588

2fe6bb8

HDDS-2588. Update README, fix Freon

246e35b

HDDS-2588. Add run.sh for convenience; control replication factor via env. variable

382e2a5

elek reviewed Dec 3, 2019


HDDS-2588. No need for one-liner config files

ae2bc55

Merge remote-tracking branch 'origin/master' into HDDS-2588

537da78

elek approved these changes Dec 12, 2019


elek closed this in e14f709 Dec 12, 2019

adoroszlai deleted the HDDS-2588 branch December 12, 2019 11:01
