Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to ES 5 #8

Open
wants to merge 15 commits into
base: master
Choose a base branch
from
Open

Conversation

sberryman
Copy link

Please don’t merge this yet, this is a work in progress. Doing this to get feedback from you guys.

@sberryman
Copy link
Author

  1. I'm not really a big fan of elasticsearch-master, elasticsearch-data and elasticsearch (with a service name of elasticsearch-master). I'm guessing this has to do with elasticsearch_master being a "gateway" where you would expose the ports? Wouldn't it be better to have HA running and reverse proxy to the elastic masters?
  2. It is ugly how it just keeps hitting consul every second to see if there are any other services registered as 'elasticsearch-master'. Maybe creating a lock is a good idea?
  3. Seeing warnings for consol 0.8.0 now, any blockers to upgrading that you are aware of?
  4. I'm not cleaning up unzip or tar yet, that will be uncommented in the future.
  5. You had ES_BOOTSTRAP_HOST as an env var for discovery.zen.ping.unicast.hosts: ["${ES_BOOTSTRAP_HOST}"] but it isn't defined in the project.
  6. When it comes to security I lack a lot of common sense :) How does the following code block in Dockerfile look?
# Create and take ownership over required directories
RUN mkdir -p /opt/consul/config \
    && mkdir -p /opt/consul/data \
    && chmod 770 /opt/consul/data \
    && chown -R elasticsearch:elasticsearch /opt/consul \
    && mkdir -p /etc/containerpilot \
    && chmod -R g+w /etc/containerpilot \
    && chmod +x /usr/local/bin/elastic-server.sh \
    && chown -R elasticsearch:elasticsearch /etc/containerpilot

Dockerfile Outdated
COPY /bin/* /usr/local/bin/

# Should we remove unzip?
# RUN apk del unzip tar
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing anything in a different layer doesn't reduce total image size.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, does it if you squash it?

@@ -0,0 +1,6 @@
#!/bin/sh -xe
manage.sh onStart #|| exit $?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything like this should be in a preStart, not wrapping the main app. That would also eliminate the need for this shell script.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this from the redis autopilot example.
https://github.com/autopilotpattern/redis/blob/master/bin/redis-server-sentinel.sh Where it obviously does a bit more.

Does preStart fire BEFORE you start the co-processes? I'm asking because we need the co-process consul agent to start before we start the onStart function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some research and unfortunately this has to be a separate wrapper as the coprocess doesn't start until immediately after preStart. We need the coprocess (consul agent) to start prior to onStart. onStart's job is there to check whether it has joined the consul cluster at which point we can verify if other services exist. If we did this in preStart, the consul agent won't be running and will assume every container should become the master.

Coprocesses are started immediately following the exit of the preStart hook, before polling for health or backend hooks, and before the main application starts. - Containerpilot Docs

I think this could be confusing for developers like me who are just getting started with containerpilot and consul agent running as a coprocess. Is there a reason you guys use CONSUL_AGENT=1 in most of the examples. What is the purpose of making that optional? Meaning, why wouldn't you always use consul agent on each container?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bah, no you're right. This will be fixed in CPv3 because we're forcing people to admit they need a real init system. But in the meantime we're stuck with it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would a "real" init system change this? I'm not sure what you mean by real init though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In CPv3 we're supporting multiple processes as first-class, rather than having a "main" process and coprocesses. The end user will create the chain of dependencies explicitly, so in this case we'll be able to have everything wait on the consul agent. This is pretty much how every modern init system works, but was something we were avoiding in ContainerPilot originally because of community... obstinance(?) around the notion of there being more than one process in a container. Turns out that's exactly what you want a lot of the time. 😀

Dockerfile Outdated
@@ -1,54 +1,65 @@
FROM debian:jessie
FROM docker.elastic.co/elasticsearch/elasticsearch:5.3.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why swap out for the official image from Elastic? We were using the debian image because it's lean and gives us a good starting point.

There are a couple problems with the official image for us here:

  • according to the Elastic docs this image includes X-Pack, which has only a trial license. I don't even want to start down the road of reconciling this repo's MPL license with that. 😀 So I don't think this is appropriate for us to distribute.
  • I can't find the Dockerfile for this image anywhere, or at least not easily, so even if we did use this image we should include a link on where to find the Dockerfile.

Copy link
Author

@sberryman sberryman Apr 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had a feeling this would be an issue. This is due to the "official" elasticsearch image being depreciated.

This image will receive no further updates after 2017-06-20 (June 20, 2017). Please adjust your usage accordingly. - Docker Hub

X-Pack is definitely part of the official image and has been turned off via env vars. I can remove it if that will help? Or obviously drop back to debian and build it.

Funny you mentioned not being able to find the Dockerfiles, I remember coming across a github issue mentioning the same thing. I can't seem to find that issue right now though.

Dockerfile
Base Dockerfile

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

X-Pack is definitely part of the official image and has been turned off via env vars. I can remove it if that will help?

Removing it doesn't remove it from the shipped binary blobs because of the layered file system. If we going to try to use their current build I'd say we should just fork from what they have in their Dockerfiles.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, would you like to fork based on their dockerfile? I would think copying the "official" repository from Elastic would be the best option for long term support.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use FROM docker.elastic.co/elasticsearch/elasticsearch-alpine-base:latest but after that I imagine we could just copy in the contents of https://github.com/elastic/elasticsearch-docker/blob/master/build/elasticsearch/Dockerfile into this directory and remove the bits we don't want. We'd need to put the LICENSE notice for their repo at the top of that file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I'll work on that tomorrow morning

# leave this here then we get the ports as well-known environment variables
# for purposes of linking.
EXPOSE 9200
EXPOSE 9300
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were these removed just because the official image has them already? (Again, not sure because I can't find the Dockerfile.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bin/manage.sh Outdated
consulCommand() {
consul-cli --quiet --consul="${CONSUL_HOST}:8500" $*
}

preStart() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something this preStart doesn't do anything, right? Almost everything you have in the onStart should really be in the preStart I think.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing you mean preStart not the highlighted consulCommand... But correct, I just left it in there as a placeholder, will remove it now.

bin/manage.sh Outdated

if [[ -z ${CONSUL} ]]; then
readonly lockPath=service/elasticsearch-master/locks/master
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for? It's unused anywhere else I think.

Copy link
Author

@sberryman sberryman Apr 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to pull it, it was left in there when I was testing using a lock

- ES_NODE_DATA=true

consul:
image: autopilotpattern/consul:0.7.2-r0.8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing warnings for consol 0.8.0 now, any blockers to upgrading that you are aware of?

ContainerPilot has only 0.7 bindings, so that shouldn't be a problem but it's a possible source of problems. Also we don't have our Consul image updated yet, so we'd want to verify that the updated 0.8 Consul agent works with 0.7 first too. I'd suggest not doing this in this PR to keep the changeset manageable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌🏼

- ES_NODE_DATA=true
links:
- consul:consul
elasticsearch_master:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really a big fan of elasticsearch-master, elasticsearch-data and elasticsearch (with a service name of elasticsearch-master). I'm guessing this has to do with elasticsearch_master being a "gateway" where you would expose the ports? Wouldn't it be better to have HA running and reverse proxy to the elastic masters?

The reason we have 3 variants is for flexibility in deploying the 3 roles. In a small-to-middling-size environment one would just deploy all the ES instances as master+data. In a large production environment typically you'll want to send all the writes thru masters and do all the reads on data-only instances. We didn't want to be overly opinionated about the topology in this blueprint.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know, I've never run a big ES environment. My experience is limited to < 1 TB.

bin/manage.sh Outdated
n=$((n+1))
else

# wait for a healthy master
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is ugly how it just keeps hitting consul every second to see if there are any other services registered as 'elasticsearch-master'. Maybe creating a lock is a good idea?

You mean here? That's only during startup, so while I suppose it's ugly it's also rather simple and it goes away after the initial startup. How does creating a lock help us out here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't now that I've had a chance to sleep on it.

#
# gateway.recover_after_nodes: 3
discovery.zen.ping.unicast.hosts: ["${ES_BOOTSTRAP_HOST}"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You had ES_BOOTSTRAP_HOST as an env var for discovery.zen.ping.unicast.hosts: ["${ES_BOOTSTRAP_HOST}"] but it isn't defined in the project.

True, and this should be better documented. It's not used by default but it's an option for a user who has an existing cluster they want to bootstrap from. We didn't end up using it in https://github.com/autopilotpattern/elk and it seems to be of dubious use. I'd be fine with eliminating that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'll comment out that line and the function replace (which has been renamed replaceZenHosts to be a little more verbose) will place the host and uncomment it.

Something I just realized is that we only populate it with a single host. If that container restarts and the referenced node is no longer part of the cluster it will fail to join the cluster.

@sberryman
Copy link
Author

@tgross I've implemented several changes based on your feedback. Thanks for being so detailed and giving it a thorough review.

@sberryman
Copy link
Author

sberryman commented Apr 8, 2017

Joyent issue:

+ exec /usr/share/elasticsearch/bin/es-docker
time="2017-04-08T20:57:01Z" level=info msg="    2017/04/08 20:57:01 [INFO] agent: Synced node info" 
[2017-04-08T20:57:06,064][WARN ][o.e.b.JNANatives         ] unable to install syscall filter: 
java.lang.UnsupportedOperationException: seccomp unavailable: requires kernel 3.5+ with CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER compiled in
	at org.elasticsearch.bootstrap.SystemCallFilter.linuxImpl(SystemCallFilter.java:350) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.bootstrap.SystemCallFilter.init(SystemCallFilter.java:638) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.bootstrap.JNANatives.tryInstallSystemCallFilter(JNANatives.java:215) [elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.bootstrap.Natives.tryInstallSystemCallFilter(Natives.java:99) [elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:111) [elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:204) [elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:360) [elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) [elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:114) [elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:58) [elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) [elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.cli.Command.main(Command.java:88) [elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91) [elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84) [elasticsearch-5.3.0.jar:5.3.0]

This error does NOT show up when running on Docker for Mac (Version 17.03.1-ce-mac5 (16048)).

Update: https://www.elastic.co/guide/en/elasticsearch/reference/current/system-call-filter-check.html
Update 2: This does appear to be a blocking issue.

@tgross
Copy link
Contributor

tgross commented Apr 10, 2017

Joyent issue:

Thanks. Let me open a ticket with the appropriate team here at Joyent and see what they say.

@tgross
Copy link
Contributor

tgross commented Apr 10, 2017

Taking a deeper look I would not expect their bootstrap check to work. Seccomp isn't currently supported by LX/sdc-docker. It's not even supported in all Linux distros, so more enterprisey Linux has a similar problem. (See also elasticsearch GH #23159, puppet GH #798, and elk-docker GH #107)

Update 2: This does appear to be a blocking issue.

Yeah looks like this was a breaking change for 5.2, where they changed it from a warning to an error. For now it looks like we can disable this check. Elastic says:

To pass the system call filter check you must either fix any configuration errors on your system that prevented system call filters from installing (check your logs), or at your own risk disable system call filters by setting bootstrap.system_call_filter to false.

@sberryman
Copy link
Author

@tgross any suggestion on how to check in preStart or onStart to see if Seccomp is available? I would prefer not to turn it off completely. Ideally it should be left on for hosts that support it and disabled for hosts such as Joyent?

But to confirm I was able to disable system call filter checks via config and have it running on Triton this weekend.

@tgross
Copy link
Contributor

tgross commented Apr 10, 2017

@tgross any suggestion on how to check in preStart or onStart to see if Seccomp is available? I would prefer not to turn it off completely. Ideally it should be left on for hosts that support it and disabled for hosts such as Joyent?

Not sure. Maybe we could hijack whatever mechanism Elastic is using? Do we know if they're running a shell script to do that check or is it all in Java?

@sberryman
Copy link
Author

This is what I found on seccomp:

Running on triton:

bash-4.3$ head /proc/config.gz
head: /proc/config.gz: No such file or directory
bash-4.3$ cat /boot/config-`uname -r`
cat: can't open '/boot/config-3.13.0': No such file or directory

Using /proc/config.gz worked when testing on Docker for Mac. It did NOT disable seccomp when running on docker locally. It properly disabled it on Joyent but I have not tested anywhere else.

Requires us to duplicate a few files from dockers official image. Also
add in a seccomp check to disable it on systems that don’t have it
@sberryman
Copy link
Author

@tgross: I've changed the base docker image and added checks for seccomp. Unfortunately we have to duplicate a few files from their official repo.

I wasn't sure what license you were talking about included on the top of the dockerfile so I took a guess.

I have it running on Joyent and scaling up and down just fine

@@ -0,0 +1,39 @@
#!/bin/bash

# Run Elasticsearch and allow setting default settings via env vars
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we control the command we're using to call ES along with its environment, we don't really need this shell script. We can include its effects entirely in the docker-compose.yml's command section.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can move it to elastic-server.sh to reduce the number of files but it doesn't really simplify it that much does it? I don't think it is smart to remove the code, anyone who has used the official docker repo for Elasticsearch will expect that functionality to continue. I would expect switching to containerpilot from the official image to work without any changes. I would want all ENV variables I used to configure elastic to continue working as is with containerpilot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would want all ENV variables I used to configure elastic to continue working as is with containerpilot.

Sure but there's literally only one line that doesn't work as-is: https://github.com/autopilotpattern/elasticsearch/pull/8/files/2f807599ea62bf41d3fc811586ffd0648ac40496#diff-35c179adb2b0c25802542935645540a7R37

exit 0
fi
consulCommand() {
consul-cli --quiet --consul="${CONSUL_HOST}:8500" $*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already a consul binary in the container. We don't need this wrapper.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you want me to pull this out and make the following changes?

Service addresses

Use curl and jq to get service addresses
https://github.com/sberryman/elasticsearch/blob/wip-v5/bin/manage.sh#L28

waitForLeader

waitForLeader using consul members -status alive | grep server

Replace: https://github.com/sberryman/elasticsearch/blob/wip-v5/bin/manage.sh#L48-L64
With: https://github.com/sberryman/autopilotpattern-nats/blob/master/bin/manage.sh#L28-L44

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that should do it.

bin/manage.sh Outdated
replace() {
REPLACEMENT=$(printf 's/^discovery\.zen\.ping\.unicast\.hosts.*$/discovery.zen.ping.unicast.hosts: ["%s"]/' ${MASTER})
sed -i "${REPLACEMENT}" /etc/elasticsearch/elasticsearch.yml
# seccomp (not supported on joyent)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to nitpick this comment slightly: # disable seccomp (only supported on newer Linux kernels)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants