This repository has been archived by the owner on Feb 22, 2022. It is now read-only.

Harden aspects of the elasticsearch chart
* Added configmap to explicitly provide cluster configurations and scripts

* Replaced deprecated `ES_HEAP_SIZE` with `ES_JAVA_OPTS` to position for ES v5 support

* Removed alpha storage class operators

* Removed catastrophic liveness probe that checked the entire cluster's health

* Readiness probe now inspects local node health

* Added termination grace period (defaults to 15m) to allow pre-stop-script.sh time to gracefully migrate shards

* Added init container on data nodes to configure `vm.max_map_count`

* Updated elasticsearch.yaml:
  * Added `PROCESSORS` configuration to prevent large-cluster garbage collection issues leading to node eviction
  * Added configurable gateway defaults to help avoid split-brain, requiring two masters online and in consensus before recovery can continue

* Updated pre-stop-script.sh:
  * Check `v1beta1` `statefulset` endpoint
  * Evaluate `.spec.replicas` for the statefulset's desired size
  * Clear the `_cluster/settings` IP exclusion prior to shutdown to avoid a possible (random) IP match when the cluster is later expanded

* Data nodes now use the default storage class if one is not specified
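
The heap-sizing change in the second bullet can be sketched as follows — a standalone illustration, using the chart's `DataHeapSize` default of `1536m` (the IPv4 flag matches what the templates in this commit set):

```shell
# Sketch of the ES_HEAP_SIZE -> ES_JAVA_OPTS migration described above.
HEAP_SIZE="1536m"
# Old style, deprecated ahead of ES v5:
#   ES_HEAP_SIZE="${HEAP_SIZE}"
# New style: equivalent fixed min/max heap, plus IPv4 preference:
ES_JAVA_OPTS="-Djava.net.preferIPv4Stack=true -Xms${HEAP_SIZE} -Xmx${HEAP_SIZE}"
echo "${ES_JAVA_OPTS}"
```

Pinning `-Xms` equal to `-Xmx` avoids heap resizing pauses, which is why both flags are derived from the single `*HeapSize` chart value.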
icereval committed May 13, 2017
1 parent 50bcbe7 commit 918f888
Showing 8 changed files with 230 additions and 93 deletions.
6 changes: 5 additions & 1 deletion stable/elasticsearch/Chart.yaml
@@ -1,11 +1,15 @@
name: elasticsearch
home: https://www.elastic.co/products/elasticsearch
version: 0.1.5
version: 0.1.6
description: Flexible and powerful open source, distributed real-time search and analytics engine.
icon: https://static-www.elastic.co/assets/blteb1c97719574938d/logo-elastic-elasticsearch-lt.svg
sources:
- https://www.elastic.co/products/elasticsearch
- https://github.com/jetstack/elasticsearch-pet
- https://github.com/giantswarm/kubernetes-elastic-stack
- https://github.com/GoogleCloudPlatform/elasticsearch-docker
maintainers:
- name: Christian Simon
email: [email protected]
- name: Michael Haselton
email: [email protected]
65 changes: 32 additions & 33 deletions stable/elasticsearch/README.md
@@ -25,9 +25,9 @@ elasticsearch and their
## Chart Details
This chart will do the following:

* Implemented a dynamically scalable elasticsearch cluster using Kubernetes PetSets/Deployments
* Implemented a dynamically scalable elasticsearch cluster using Kubernetes StatefulSets/Deployments
* Multi-role deployment: master, client and data nodes
* PetSet Supports scaling down without degrading the cluster
* Statefulset Supports scaling down without degrading the cluster

## Installing the Chart

@@ -51,33 +51,33 @@ $ kubectl delete pvcs -l release=my-release,type=data

The following table lists the configurable parameters of the elasticsearch chart and their default values.

| Parameter | Description | Default |
|---------------------------|-----------------------------------|----------------------------------------------------------|
| `Image` | Container image name | `jetstack/elasticsearch-pet` |
| `ImageTag` | Container image tag | `2.3.4` |
| `ImagePullPolicy` | Container pull policy | `Always` |
| `ClientReplicas` | Client node replicas (deployment) | `2` |
| `ClientCpuRequests` | Client node requested cpu | `25m` |
| `ClientMemoryRequests` | Client node requested memory | `256Mi` |
| `ClientCpuLimits` | Client node requested cpu | `100m` |
| `ClientMemoryLimits` | Client node requested memory | `512Mi` |
| `ClientHeapSize` | Client node heap size | `128m` |
| `MasterReplicas` | Master node replicas (deployment) | `2` |
| `MasterCpuRequests` | Master node requested cpu | `25m` |
| `MasterMemoryRequests` | Master node requested memory | `256Mi` |
| `MasterCpuLimits` | Master node requested cpu | `100m` |
| `MasterMemoryLimits` | Master node requested memory | `512Mi` |
| `MasterHeapSize` | Master node heap size | `128m` |
| `DataReplicas` | Data node replicas (petset) | `3` |
| `DataCpuRequests` | Data node requested cpu | `250m` |
| `DataMemoryRequests` | Data node requested memory | `2Gi` |
| `DataCpuLimits` | Data node requested cpu | `1` |
| `DataMemoryLimits` | Data node requested memory | `4Gi` |
| `DataHeapSize` | Data node heap size | `1536m` |
| `DataStorage` | Data persistent volume size | `30Gi` |
| `DataStorageClass` | Data persistent volume Class | `anything` |
| `DataStorageClassVersion` | Version of StorageClass | `alpha` |
| `Component` | Selector Key | `elasticsearch` |
| Parameter | Description | Default |
| ----------------------------------- | --------------------------------------- | ---------------------------- |
| `Image` | Container image name | `jetstack/elasticsearch-pet` |
| `ImageTag` | Container image tag | `2.4.0` |
| `ImagePullPolicy` | Container pull policy | `Always` |
| `ClientReplicas` | Client node replicas (deployment) | `2` |
| `ClientCpuRequests` | Client node requested cpu | `25m` |
| `ClientMemoryRequests` | Client node requested memory | `256Mi` |
| `ClientCpuLimits` | Client node requested cpu | `1` must be an integer |
| `ClientMemoryLimits` | Client node requested memory | `512Mi` |
| `ClientHeapSize` | Client node heap size | `128m` |
| `MasterReplicas` | Master node replicas (deployment) | `2` |
| `MasterCpuRequests` | Master node requested cpu | `25m` |
| `MasterMemoryRequests` | Master node requested memory | `256Mi` |
| `MasterCpuLimits` | Master node requested cpu | `1` must be an integer |
| `MasterMemoryLimits` | Master node requested memory | `512Mi` |
| `MasterHeapSize` | Master node heap size | `128m` |
| `DataReplicas` | Data node replicas (statefulset) | `3` |
| `DataCpuRequests` | Data node requested cpu | `250m` |
| `DataMemoryRequests` | Data node requested memory | `2Gi` |
| `DataCpuLimits` | Data node requested cpu | `1` must be an integer |
| `DataMemoryLimits` | Data node requested memory | `4Gi` |
| `DataHeapSize` | Data node heap size | `1536m` |
| `DataStorage` | Data persistent volume size | `30Gi` |
| `DataStorageClass` | Data persistent volume Class | `nil` |
| `DataTerminationGracePeriodSeconds` | Data termination grace period (seconds) | `900` |
| `Component` | Selector Key | `elasticsearch` |

Specify each parameter using the `--set key=value[,key=value]` argument to `helm install`.
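
For example (hypothetical override values — the parameter names come from the table above), each comma-separated `key=value` pair overrides one chart default:

```shell
# Hypothetical overrides for a larger data tier.
SET_ARGS="DataReplicas=5,DataHeapSize=2g,DataStorage=100Gi"
# helm install incubator/elasticsearch --name my-release --set "${SET_ARGS}"
# The comma-separated string expands to one override per parameter:
echo "${SET_ARGS}" | tr ',' '\n'
```

Keep `DataHeapSize` at roughly half of `DataMemoryRequests` so the JVM heap leaves room for Lucene's off-heap file cache.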

@@ -102,7 +102,7 @@ would degrade performance heavily. The issue is tracked in

## Select right storage class for SSD volumes

### GCE + Kubernetes 1.4
### GCE + Kubernetes 1.5

Create StorageClass for SSD-PD

@@ -117,9 +117,8 @@ parameters:
type: pd-ssd
EOF
```
Create cluster with Storage class `ssd` on Kubernetes 1.4+
Create cluster with Storage class `ssd` on Kubernetes 1.5+

```
$ helm install incubator/elasticsearch --name my-release --set DataStorageClass=ssd,DataStorageClassVersion=beta
$ helm install incubator/elasticsearch --name my-release --set DataStorageClass=ssd,DataStorage=100Gi
```
32 changes: 20 additions & 12 deletions stable/elasticsearch/templates/elasticsearch-client-deployment.yaml
@@ -1,4 +1,4 @@
apiVersion: extensions/v1beta1
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: "{{ printf "%s-client-%s" .Release.Name .Values.Name | trunc 24 }}"
@@ -33,8 +33,12 @@ spec:
value: "false"
- name: NODE_MASTER
value: "false"
- name: ES_HEAP_SIZE
value: "{{.Values.ClientHeapSize}}"
- name: PROCESSORS
valueFrom:
resourceFieldRef:
resource: limits.cpu
- name: ES_JAVA_OPTS
value: "-Djava.net.preferIPv4Stack=true -Xms{{.Values.ClientHeapSize}} -Xmx{{.Values.ClientHeapSize}}"
- name: KUBERNETES_MASTER
value: kubernetes.default.svc.cluster.local
resources:
@@ -44,22 +44,26 @@ spec:
limits:
cpu: "{{.Values.ClientCpuLimits}}"
memory: "{{.Values.ClientMemoryLimits}}"
livenessProbe:
httpGet:
path: /
port: 9200
initialDelaySeconds: 90
timeoutSeconds: 10
readinessProbe:
httpGet:
path: /
path: /_cluster/health?local=true
port: 9200
initialDelaySeconds: 90
timeoutSeconds: 10
initialDelaySeconds: 5
image: "{{.Values.Image}}:{{.Values.ImageTag}}"
imagePullPolicy: "{{.Values.ImagePullPolicy}}"
ports:
- containerPort: 9200
name: http
- containerPort: 9300
name: transport
volumeMounts:
- mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
name: elasticsearch-config
subPath: elasticsearch.yml
- mountPath: /usr/share/elasticsearch/config/logging.yml
name: elasticsearch-config
subPath: logging.yml
volumes:
- name: elasticsearch-config
configMap:
name: "{{ printf "%s-%s" .Release.Name .Values.Name | trunc 24 }}"
111 changes: 111 additions & 0 deletions stable/elasticsearch/templates/elasticsearch-configmap.yaml
@@ -0,0 +1,111 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: "{{ printf "%s-%s" .Release.Name .Values.Name | trunc 24 }}"
labels:
app: "{{ printf "%s-%s" .Release.Name .Values.Name | trunc 24 }}"
chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
release: "{{ .Release.Name }}"
heritage: "{{ .Release.Service }}"
data:
pre-stop-hook.sh: |-
#!/bin/bash
set -e
SERVICE_ACCOUNT_PATH=/var/run/secrets/kubernetes.io/serviceaccount
KUBE_TOKEN=$(<${SERVICE_ACCOUNT_PATH}/token)
KUBE_NAMESPACE=$(<${SERVICE_ACCOUNT_PATH}/namespace)
STATEFULSET_NAME=$(echo "${HOSTNAME}" | sed 's/-[0-9]*$//g')
INSTANCE_ID=$(echo "${HOSTNAME}" | grep -o '[0-9]*$')
echo "Prepare stopping of Pet ${KUBE_NAMESPACE}/${HOSTNAME} of StatefulSet ${KUBE_NAMESPACE}/${STATEFULSET_NAME} instance_id ${INSTANCE_ID}"
INSTANCES_DESIRED=$(curl -s \
--cacert ${SERVICE_ACCOUNT_PATH}/ca.crt \
-H "Authorization: Bearer $KUBE_TOKEN" \
"https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_PORT_443_TCP_PORT}/apis/apps/v1beta1/namespaces/${KUBE_NAMESPACE}/statefulsets/${STATEFULSET_NAME}/status" | jq -r '.spec.replicas')
echo "Desired instance count is ${INSTANCES_DESIRED}"
if [ "${INSTANCE_ID}" -lt "${INSTANCES_DESIRED}" ]; then
echo "No data migration needed"
exit 0
fi
echo "Prepare to migrate data of the node"
NODE_STATS=$(curl -s -XGET 'http://localhost:9200/_nodes/stats')
NODE_IP=$(echo "${NODE_STATS}" | jq -r ".nodes[] | select(.name==\"${HOSTNAME}\") | .host")
echo "Move all data from node ${NODE_IP}"
curl -s -XPUT localhost:9200/_cluster/settings -d "{
\"transient\" :{
\"cluster.routing.allocation.exclude._ip\" : \"${NODE_IP}\"
}
}"
echo
echo "Wait for node to become empty"
DOC_COUNT=$(echo "${NODE_STATS}" | jq ".nodes[] | select(.name==\"${HOSTNAME}\") | .indices.docs.count")
while [ "${DOC_COUNT}" -gt 0 ]; do
NODE_STATS=$(curl -s -XGET 'http://localhost:9200/_nodes/stats')
DOC_COUNT=$(echo "${NODE_STATS}" | jq -r ".nodes[] | select(.name==\"${HOSTNAME}\") | .indices.docs.count")
echo "Node contains ${DOC_COUNT} documents"
sleep 1
done
curl -s -XPUT localhost:9200/_cluster/settings -d "{
\"transient\" :{
\"cluster.routing.allocation.exclude._ip\" : \"\"
}
}"
echo
echo "Node clear to shutdown"
elasticsearch.yml: |-
node.data: ${NODE_DATA:true}
node.master: ${NODE_MASTER:true}
node.name: ${HOSTNAME}
# see https://github.com/kubernetes/kubernetes/issues/3595
bootstrap.mlockall: ${BOOTSTRAP_MLOCKALL:false}
network.host: 0.0.0.0
cloud:
kubernetes:
service: ${SERVICE}
namespace: ${KUBERNETES_NAMESPACE}
discovery:
type: kubernetes
zen:
minimum_master_nodes: 2
# see https://github.com/elastic/elasticsearch-definitive-guide/pull/679
processors: ${PROCESSORS:}
# avoid split-brain w/ a minimum consensus of two masters plus a data node
gateway.expected_master_nodes: ${EXPECTED_MASTER_NODES:2}
gateway.expected_data_nodes: ${EXPECTED_DATA_NODES:1}
gateway.recover_after_time: ${RECOVER_AFTER_TIME:5m}
gateway.recover_after_master_nodes: ${RECOVER_AFTER_MASTER_NODES:2}
gateway.recover_after_data_nodes: ${RECOVER_AFTER_DATA_NODES:1}
logging.yml: |-
# you can override this using by setting a system property, for example -Des.logger.level=DEBUG
es.logger.level: INFO
rootLogger: ${es.logger.level}, console
logger:
# log action execution errors for easier debugging
action: DEBUG
# reduce the logging for aws, too much is logged under the default INFO
com.amazonaws: WARN
appender:
console:
type: console
layout:
type: consolePattern
conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
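
The drain decision in `pre-stop-hook.sh` above hinges on parsing the StatefulSet ordinal out of the pod hostname. A standalone sketch (the pod name is hypothetical):

```shell
# Mirrors the name/ordinal parsing in pre-stop-hook.sh above.
HOSTNAME="my-release-data-2"
STATEFULSET_NAME=$(echo "${HOSTNAME}" | sed 's/-[0-9]*$//g')
INSTANCE_ID=$(echo "${HOSTNAME}" | grep -o '[0-9]*$')
echo "${STATEFULSET_NAME} ${INSTANCE_ID}"
# The hook only migrates shards when INSTANCE_ID >= the desired replica
# count, i.e. when the StatefulSet is scaling down past this pod; an
# ordinary pod restart exits early with "No data migration needed".
```

This is why the hook queries `.spec.replicas` rather than `.status.replicas`: during a scale-down the spec already holds the target size while the status still counts the departing pod.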
51 changes: 38 additions & 13 deletions stable/elasticsearch/templates/elasticsearch-data-statefulset.yaml
@@ -21,6 +21,19 @@ spec:
type: data
annotations:
pod.alpha.kubernetes.io/initialized: "true"
# see https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html
# and https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration-memory.html#mlockall
pod.alpha.kubernetes.io/init-containers: '[
{
"name": "sysctl",
"image": "busybox",
"imagePullPolicy": "Always",
"command": ["sysctl", "-w", "vm.max_map_count=262144"],
"securityContext": {
"privileged": true
}
}
]'
spec:
serviceAccountName: "{{ printf "%s-%s" .Release.Name .Values.Name | trunc 24 }}"
containers:
@@ -38,8 +38,12 @@ spec:
fieldPath: metadata.namespace
- name: NODE_MASTER
value: "false"
- name: ES_HEAP_SIZE
value: "{{.Values.DataHeapSize}}"
- name: PROCESSORS
valueFrom:
resourceFieldRef:
resource: limits.cpu
- name: ES_JAVA_OPTS
value: "-Djava.net.preferIPv4Stack=true -Xms{{.Values.DataHeapSize}} -Xmx{{.Values.DataHeapSize}}"
- name: KUBERNETES_MASTER
value: kubernetes.default.svc.cluster.local
image: "{{.Values.Image}}:{{.Values.ImageTag}}"
@@ -54,32 +54,40 @@ spec:
limits:
cpu: "{{.Values.DataCpuLimits}}"
memory: "{{.Values.DataMemoryLimits}}"
livenessProbe:
httpGet:
path: /
port: 9200
initialDelaySeconds: 90
timeoutSeconds: 10
readinessProbe:
httpGet:
path: /
path: /_cluster/health?local=true
port: 9200
initialDelaySeconds: 90
timeoutSeconds: 10
initialDelaySeconds: 5
volumeMounts:
- mountPath: /usr/share/elasticsearch/data
name: elasticsearch-data
- mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
name: elasticsearch-config
subPath: elasticsearch.yml
- mountPath: /usr/share/elasticsearch/config/logging.yml
name: elasticsearch-config
subPath: logging.yml
- name: elasticsearch-config
mountPath: /pre-stop-hook.sh
subPath: pre-stop-hook.sh
lifecycle:
preStop:
exec:
command: ["/bin/bash","/pre-stop-hook.sh"]
terminationGracePeriodSeconds: {{.Values.DataTerminationGracePeriodSeconds}}
volumes:
- name: elasticsearch-config
configMap:
name: "{{ printf "%s-%s" .Release.Name .Values.Name | trunc 24 }}"
volumeClaimTemplates:
- metadata:
name: elasticsearch-data
annotations:
volume.{{ .Values.DataStorageClassVersion }}.kubernetes.io/storage-class: "{{ .Values.DataStorageClass }}"
spec:
accessModes: [ ReadWriteOnce ]
{{- if .Values.DataStorageClass }}
storageClassName: "{{.Values.DataStorageClass}}"
{{- end }}
resources:
requests:
storage: "{{.Values.DataStorage}}"
18 changes: 0 additions & 18 deletions stable/elasticsearch/templates/elasticsearch-data-svc.yaml

This file was deleted.
